INTERSPEECH 2011

A Transcription Task for Crowdsourcing with Automatic Quality Control

Chia-ying Lee and James Glass
MIT Computer Science and Artificial Intelligence Laboratory
Cambridge, Massachusetts 02139 USA
{chiaying,glass}@csail.mit.edu

Abstract

In this paper, we propose a two-stage transcription task design for crowdsourcing with an automatic quality control mechanism embedded in each stage. For the first stage, a support vector machine (SVM) classifier is used to quickly filter out poor-quality transcripts based on acoustic cues and language patterns in the transcript. In the second stage, word-level confidence scores are used to estimate transcription quality and to provide instantaneous feedback to the transcriber. The proposed design was evaluated using Amazon Mechanical Turk (MTurk) and tested on seven hours of academic lecture speech, which is typically conversational in nature and contains technical material. Compared with baseline transcripts that were also collected from MTurk using a ROVER-based method, the new method resulted in higher-quality transcripts while requiring less transcriber effort.

Index Terms: transcription, crowdsourcing, quality control

1. Introduction

Transcribing speech data has historically been an expensive and …

… poor-quality transcripts and allows them to be screened out quickly during post-processing. In the second stage, word-level confidence scores are used to provide instantaneous feedback to workers regarding their performance on the transcription task. This two-stage transcription task was tested on academic lecture speech via MTurk and compared to transcripts collected by using ROVER with three workers [2, 9].

2. Two-stage Transcription Task

The transcription task we set up has two phases: first, a Short Transcription stage, and second, a Transcript Refinement stage. Each stage has an ASR-enabled quality controller that measures the quality of the input and makes it easy to filter out poor-quality transcripts. Each phase is described in detail below.

2.1. Short Transcription

For the first stage of transcription, we wanted to create a set of small tasks that would require a consistent amount of effort from workers. Since the audio recordings we were transcribing were on the order of an hour or more in length, we decided to automatically …
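The first-stage filter in the abstract applies an SVM classifier to features derived from acoustic cues and language patterns. The sketch below shows only how an already-trained linear SVM could be applied at screening time: each transcript is reduced to a feature vector, scored with the decision function w · x + b, and kept when the score is non-negative. The feature names (`oov_rate`, and `rate_mismatch`, meant as how far the transcript's length is from what the clip duration would suggest), the weights, and the bias are all hypothetical placeholders rather than the paper's features or learned values, and training the SVM is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature order; the paper's actual acoustic/language features are not given here.
FEATURES = ("oov_rate", "rate_mismatch")


@dataclass(frozen=True)
class LinearSVM:
    """Decision function of an already-trained linear SVM: f(x) = w . x + b."""

    weights: Sequence[float]
    bias: float

    def decision(self, x: Sequence[float]) -> float:
        if len(x) != len(self.weights):
            raise ValueError("feature vector length does not match the weights")
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def accept(self, x: Sequence[float]) -> bool:
        # In this sketch the non-negative side of the hyperplane means "acceptable transcript".
        return self.decision(x) >= 0.0


def featurize(stats: Dict[str, float]) -> List[float]:
    """Order a per-transcript feature dict according to FEATURES."""
    return [stats[name] for name in FEATURES]


def screen(
    transcripts: Dict[str, Dict[str, float]], model: LinearSVM
) -> Tuple[List[str], List[str]]:
    """Split transcript ids into (kept, rejected) using the SVM decision."""
    kept: List[str] = []
    rejected: List[str] = []
    for tid, stats in transcripts.items():
        (kept if model.accept(featurize(stats)) else rejected).append(tid)
    return kept, rejected


if __name__ == "__main__":
    # Placeholder weights: a higher OOV rate or rate mismatch pushes toward rejection.
    model = LinearSVM(weights=(-4.0, -2.0), bias=1.0)
    batch = {
        "hit-01": {"oov_rate": 0.05, "rate_mismatch": 0.10},
        "hit-02": {"oov_rate": 0.50, "rate_mismatch": 0.40},
    }
    print(screen(batch, model))  # (['hit-01'], ['hit-02'])
```

Because scoring is a single dot product per transcript, a filter of this shape is cheap enough to run on every submission as it arrives, which matches the abstract's emphasis on screening quickly.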
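For the second stage, the abstract says word-level confidence scores are used to estimate transcription quality and give the transcriber instantaneous feedback. One simple way to do that, assumed here rather than taken from the paper, is to average the per-word confidences into a transcript-level estimate and point the worker at words whose confidence falls below a cutoff. How the confidences are produced (for example, by an ASR system scoring the worker's words against the audio) is outside this sketch, and the 0.5 cutoff and 0.8 target are hypothetical values.

```python
from typing import List, Sequence, Tuple

WordScore = Tuple[str, float]  # (word typed by the worker, confidence in [0, 1])


def estimate_quality(scores: Sequence[WordScore]) -> float:
    """Transcript-level quality estimate: mean word confidence (0.0 if empty)."""
    if not scores:
        return 0.0
    return sum(conf for _, conf in scores) / len(scores)


def low_confidence_words(scores: Sequence[WordScore], cutoff: float = 0.5) -> List[str]:
    """Words the worker may want to re-check, in transcript order."""
    return [word for word, conf in scores if conf < cutoff]


def feedback(scores: Sequence[WordScore], target: float = 0.8, cutoff: float = 0.5) -> str:
    """Build the instant-feedback message shown right after a submission."""
    quality = estimate_quality(scores)
    flagged = low_confidence_words(scores, cutoff)
    if flagged:
        return f"Estimated quality {quality:.2f}. Please re-check: {', '.join(flagged)}."
    if quality < target:
        return f"Estimated quality {quality:.2f}. Please review the clip once more."
    return f"Estimated quality {quality:.2f}. Looks good."


if __name__ == "__main__":
    submission = [("the", 0.9), ("eigen", 0.3), ("value", 0.6), ("is", 0.8)]
    print(feedback(submission))
```

Flagging individual words, rather than reporting only the average, tells the worker where to look, which is the point of giving feedback while the task is still open.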
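The baseline combines transcripts from three workers using ROVER. Full ROVER first aligns the hypotheses into a word transition network by dynamic programming and then votes within each slot. The sketch below covers only that voting step: it assumes the alignment has already been done (equal-length word lists, with an empty string marking "no word" in a slot) and uses plain frequency voting. The tie-break rule (earliest hypothesis wins) and the example word lists are illustrative assumptions.

```python
from typing import Dict, List, Sequence

NO_WORD = ""  # placeholder for a slot where a hypothesis has no word (deletion)


def vote_slot(candidates: Sequence[str]) -> str:
    """Majority vote within one aligned slot; ties go to the earliest hypothesis."""
    counts: Dict[str, int] = {}
    for word in candidates:
        counts[word] = counts.get(word, 0) + 1
    best = candidates[0]
    for word in candidates:
        if counts[word] > counts[best]:
            best = word
    return best


def rover_vote(hypotheses: Sequence[Sequence[str]]) -> List[str]:
    """Combine pre-aligned hypotheses slot by slot, dropping NO_WORD winners."""
    if not hypotheses:
        return []
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("hypotheses must already be aligned to the same length")
    combined = [vote_slot(slot) for slot in zip(*hypotheses)]
    return [word for word in combined if word != NO_WORD]


if __name__ == "__main__":
    workers = [
        ["we", "compute", "the", NO_WORD, "fourier", "transform"],
        ["we", "compute", "a", NO_WORD, "fourier", "transform"],
        ["we", "computed", "the", "uh", NO_WORD, "transform"],
    ]
    print(" ".join(rover_vote(workers)))  # we compute the fourier transform
```

With three workers, any word that two of them agree on wins its slot, so each worker's individual errors tend to be outvoted; the cost is that every clip must be transcribed three times.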