ITC-UT: Tweet Categorization by Query Categorization for On-line Reputation Management

Minoru Yoshida, Shin Matsushima, Shingo Ono, Issei Sato, and Hiroshi Nakagawa
University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033
{mino,masin,ono,sato,nakagawa}@r.dl.itc.u-tokyo.ac.jp

Abstract. This paper describes our system, called ITC-UT, for task-2 (the on-line reputation management task) in WePS-3. Our idea is to categorize each query into 3 or 4 classes according to how often the tweets retrieved by the query contain true entity names, that is, names that actually refer to the target entity. We then categorize each tweet using the rules defined for its query's class. We show the evaluation results for our system along with details of the query categorization results.

Keywords: Organization Name Disambiguation, Two-Stage Algorithm, Naive Bayes, Twitter

1 Introduction

This paper reports the algorithms and results of the ITC-UT (Information Technology Center, the University of Tokyo) team for WePS-3 task-2 (the on-line reputation management task). The task assumes a scenario in which a user searches Twitter for the reputation of some organization. Assuming that tweets are retrieved by the organization-name query, the problem is to decide whether each organization name found in each tweet refers to the target organization or not (for the query “Apple”, “Apple PC” is an example of the former and “Apple Pie” of the latter). This is one of the name disambiguation problems that have been extensively studied in previous WePS workshops [1,2]. However, the current task setting is challenging because each tweet is generally short and provides little context for disambiguation.

Our algorithm is based on the intuition that organization names can be classified into “organization-like names” and “general-word-like names”, such as “McDonald’s” for the former and “Pioneer” for the latter. This intuition is supported by the fact that the ratio of TRUE¹ (or FALSE) tweets in the training data varies widely from entity to entity. For example, over 98% of tweets were labeled TRUE for entity “nikon”, while the ratio for entity “renaissance technologies” (for which the query term was “Renais

¹ TRUE indicates that the tweet mentions the target organization (as defined in the next section). FALSE indicates the opposite.
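The two-stage idea in the abstract can be sketched in Python. First the query is classified, then each tweet is labeled with a rule specific to that class. This is a minimal illustration under stated assumptions, not the authors' system. The thresholds `ORG_LIKE_MIN` and `GENERAL_MAX`, the three class names, and the per-class rules are all hypothetical, because this excerpt does not give the paper's actual 3 or 4 classes or their rules. Stage 1 uses the fraction of TRUE-labeled training tweets for a query, which the introduction says varies widely across entities. For the in-between class, stage 2 falls back to a small multinomial Naive Bayes classifier. The Keywords mention Naive Bayes, but how the paper applies it is not shown here.

```python
import math
from collections import Counter

# HYPOTHETICAL thresholds on the fraction of TRUE tweets for a query.
ORG_LIKE_MIN = 0.8  # assumption: at or above this, the name acts like an organization name
GENERAL_MAX = 0.2   # assumption: at or below this, the name acts like a general word


def categorize_query(labels):
    """Stage 1: map a query's training labels (True/False) to a hypothetical query class."""
    ratio = sum(labels) / len(labels)
    if ratio >= ORG_LIKE_MIN:
        return "organization-like"
    if ratio <= GENERAL_MAX:
        return "general-word-like"
    return "ambiguous"


class NaiveBayes:
    """Multinomial Naive Bayes over lowercased whitespace tokens, with add-one smoothing."""

    def fit(self, tweets, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(tweets, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (True, False):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Smoothed log prior, so a class with no training tweets never gives log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in text.lower().split():
                # Smoothed log likelihood of each token under this class.
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores[True] >= scores[False]


def categorize_tweet(query_class, text, model):
    """Stage 2: apply the HYPOTHETICAL rule attached to the query's class."""
    if query_class == "organization-like":
        return True
    if query_class == "general-word-like":
        return False
    return model.predict(text)


# Toy training data for the query "Apple" (invented for illustration).
train = [
    ("new apple macbook pro released", True),
    ("apple stock rises after earnings", True),
    ("baking an apple pie tonight", False),
    ("apple pie recipe with cinnamon", False),
]
texts = [t for t, _ in train]
labels = [lab for _, lab in train]
qclass = categorize_query(labels)  # TRUE ratio 0.5 falls between the thresholds
model = NaiveBayes().fit(texts, labels)
print(qclass, categorize_tweet(qclass, "apple macbook deal", model))  # prints: ambiguous True
```

In this sketch the learned classifier is consulted only when the TRUE ratio alone is uninformative. Names whose training tweets are almost all TRUE (like “nikon”) or almost all FALSE are labeled directly by the rule for their class.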