Bootstrapping Websites for Classification of Organization Names on Twitter

Paul Kalmar
Kalmar Research
paul@KalmarResearch.com

Abstract. There has been a growing interest in monitoring the social media presence of companies for improved marketing. Many public APIs are available for tapping into the data, and there are companies that will collect all posts related to a given set of keywords. But with so much data, who is to say that all of the posts are relevant, especially when so many company and product names are highly ambiguous? In the context of the WePS Task 2, we aim to reduce noise by collecting only the relevant tweets about a company given the company's website and a set of Twitter data. In a real-world situation, any company that wanted to identify tweets about itself could provide a short list of labeled tweets and use this as a base set for training data. Given that for this task we were given a large list of companies with no such training data, it would have been unrealistic to create such data for each company. We chose to use the company's website as surrogate training data. Because the websites come from a different register than Twitter, we used the initial model to bootstrap a model from the actual tweets. As it is the simplest data to acquire, the features we chose to use were the co-occurring words in each tweet. To compute the relevance of each word to a given company, we computed the pointwise mutual information between the word and the target's label. The results show that our approach was successful, yet with room for improvement.

Keywords: bootstrap, unsupervised, Twitter, disambiguation

1 Introduction

There has been a growing interest in monitoring the social media presence of companies for improved marketing. Many public APIs are available for tapping into the data, and there are companies that will collect all posts related to a given set of keywords. But with so much data, who is to say that all of the posts are relevant, especially when so many company and product names are highly ambiguous? In the context of the WePS Task 2, we aim to reduce noise by collecting only the relevant tweets about a company given the company's website and a set of Twitter data.

2 Method

For the task of classification, there needs to be at least one well-defined class. In a real-world situation, any company that wanted to identify tweets about itself cou