Micro-blog Retrieval Based on Semantic Similarity Computation and the Twitter Storm Platform: A Study
Abstract

With the rapid development of the Internet industry, micro-blogging products are gaining popularity both at home and abroad. By providing users with centralized and open social networking services, they have gradually developed into a new and increasingly influential type of media. Given the large scale and real-time nature of micro-blogging data, delivering the information users are interested in from a massive, dynamically updated stream of micro-blogs has become particularly important. The micro-blog retrieval and ranking method discussed in this paper is based on short-text feature expansion and similarity calculation.

The paper is structured as follows. First, each micro-blog (here, a tweet) is expanded, that is, lengthened, to enrich its semantic features, which helps ensure the relatedness between the query text and the retrieved results. Second, we compute the similarity between micro-blogs with relatively high precision and recall using the WordNet dictionary. Third, the similarity value computed in the previous step is used as the ranking criterion to simulate a real-time micro-blog retrieval environment, which completes retrieval and ranking and provides a list of related micro-blogs for each micro-blog retrieved.

To enrich the semantic features of micro-blogs, we take the nouns in a micro-blog as representative keywords that express its topics, and expand these nouns with associated words and phrases to lengthen the micro-blog. Specifically, Wikipedia is chosen as the source of semantic features for expansion. For each noun in a micro-blog, we submit it as a query to Wikipedia, locate the "category" entry on the search result page, and add the words listed under "category" (the categories under which the noun is classified) to the original micro-blog as additional explanatory words. Experiments are also conducted to show that this expansion improves the quality of the similarity calculation to a certain degree.

To obtain higher accuracy and precision, this paper takes full advantage of the special structure of WordNet, an online English lexical database, in computing semantic similarity between micro-blogs. Specifically, we use the path-length-based method proposed in [37], which takes into consi