Ranking annotators for crowdsourced labeling tasks

Vikas C. Raykar, Siemens Healthcare, Malvern, PA, USA (vikas.raykar@siemens.com)
Shipeng Yu, Siemens Healthcare, Malvern, PA, USA (shipeng.yu@siemens.com)

Abstract
With the advent of crowdsourcing services it has become quite cheap and reasonably effective to get a dataset labeled by multiple annotators in a short amount of time. Various methods have been proposed to estimate the consensus labels by correcting for the bias of annotators with different kinds of expertise. Often we have low-quality annotators or spammers: annotators who assign labels randomly (e.g., without actually looking at the instance). Spammers can drive up the cost of acquiring labels and can potentially degrade the quality of the consensus labels. In this paper we formalize the notion of a spammer and define a score that can be used to rank the annotators, with spammers having a score close to zero and good annotators having a high score close to one.

1 Spammers in crowdsourced labeling tasks
Annotating an unlabeled dataset is one of the bottlenecks in using supervised learning to build good predictive models. Getting a dataset labeled by experts can be expensive and time-consuming. With the advent of crowdsourcing services (Amazon's Mechanical Turk being a prime example) it has become quite easy and inexpensive to acquire labels from a large number of annotators in a short amount of time (see [8], [10], and [11] for some computer vision and natural language processing case studies). One drawback of most crowdsourcing services is that we do not have tight control over the quality of the annotators. The
annotators can come from a diverse pool including genuine experts, novices, biased annotators, malicious annotators, and spammers. Hence, to get good-quality labels, requesters typically have each instance labeled by multiple annotators, and these multiple annotations are then consolidated either by simple majority voting or by more sophisticated methods that model and correct for the annotator biases [3, 9, 6, 7, 14] and/or task complexity [2, 13, 12]. In this paper we are interested in ranking annotators based on how spammer-like each annotator is. In our context, a spammer is a low-quality annotator who assigns random labels (maybe because the annotator does not understand the labeling