Reducing Label Cost by Combining Feature Labels and Crowdsourcing

Jay Pujara (jay@cs.umd.edu)
Dept. of Computer Science, University of Maryland, College Park, MD, USA 20742

Ben London (blondon@cs.umd.edu)
Dept. of Computer Science, University of Maryland, College Park, MD, USA 20742

Lise Getoor (getoor@cs.umd.edu)
Dept. of Computer Science, University of Maryland, College Park, MD, USA 20742

Abstract

Decreasing technology costs, increasing computational power and ubiquitous network connectivity are contributing to an unprecedented increase in the amount of publicly available data. Yet this surge of data has not been accompanied by a complementary increase in annotation. This lack of annotated data complicates data mining tasks in which supervised learning is preferred or required. In response, researchers have proposed many approaches to cheaply construct training sets. One approach, referred to as feature labels (McCallum & Nigam, 1999), chooses features that strongly correlate with the label space and uses instances containing those features as labeled data for training a classifier. These high precision examples help bootstrap the learning process. Another technique, crowdsourcing, exploits our ever-increasing connectivity to request annotation from a broader community (who may or may not be domain experts), thereby refining and [...]mal cost. We demonstrate the efficacy of our approach through a sentiment analysis task on data collected from the Twitter microblog service.

1. Introduction

A longstanding problem in supervised learning is finding labeled data. Data is produced in high volumes from sources as varied as sensor networks to mobile phone users. Each dataset can be used for many possible applications from intrusion detection to sentiment analysis. Even if labor is expended to meticulously label data, the training set may not adequately represent the distribution of test instances. For all these reasons, methods to produce training data cheaply are an important component of machine learning research.

Many researchers have considered the problem of scarce training data. Approaches can be broadly divided between those that find cheaper ways of acquiring training labels and those that design algorithms that benefit from unlabeled data, with a large body of research that combines both approaches.
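
To make the feature-label idea from the abstract concrete, the following is a minimal Python sketch of how instances containing strongly class-correlated features might be turned into a training set. It is an illustration under stated assumptions, not the method evaluated in this paper: the emoticon features, the label names, and the helper functions `label_by_features` and `build_training_set` are all hypothetical, and the rule of discarding instances whose features disagree is one simple choice among several.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical feature labels (an assumption for illustration): tokens
# presumed to correlate strongly with one sentiment class.
FEATURE_LABELS: Dict[str, str] = {
    ":)": "positive",
    ":(": "negative",
}


def label_by_features(text: str, feature_labels: Dict[str, str]) -> Optional[str]:
    """Return the label implied by the labeled features found in `text`.

    Returns None when no labeled feature occurs, or when the features
    that occur point to different labels.
    """
    tokens = text.split()
    labels = {feature_labels[tok] for tok in tokens if tok in feature_labels}
    return labels.pop() if len(labels) == 1 else None


def build_training_set(
    texts: List[str], feature_labels: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Keep only the instances that receive an unambiguous feature label."""
    labeled = []
    for text in texts:
        label = label_by_features(text, feature_labels)
        if label is not None:
            labeled.append((text, label))
    return labeled


if __name__ == "__main__":
    tweets = ["great game :)", "missed my bus :(", "just landed", "hmm :) :("]
    for text, label in build_training_set(tweets, FEATURE_LABELS):
        print(label, "|", text)
```

Instances without a labeled feature, or with conflicting ones, are left unlabeled in this sketch. This keeps the automatically labeled examples high in precision at the expense of coverage.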