Research on Micro-blog Retrieval Based on Semantic Similarity Computation and the Twitter Storm Platform
Abstract

With the rapid development of the Internet industry, micro-blogging products are gaining popularity both at home and abroad. By providing users with centralized and open social networking services, they have gradually developed into a new type of media with increasingly high influence. Given the large scale and real-time nature of micro-blogging data, providing information that interests users from massive, dynamically updated micro-blogging data is now particularly important. The micro-blog retrieval and sorting method discussed in this paper is based on short-text feature expansion and similarity calculation. The paper is structured as follows: first, each micro-blog (here, a tweet) is expanded (made longer) to enrich its semantic features, which provides a solid basis for the relatedness between the query text and the retrieved results; second, we compute similarity between micro-blogs with relatively high precision and recall using the WordNet dictionary; third, the similarity value computed in the previous step is used as the sorting criterion to simulate a real-time micro-blog retrieval environment, which completes micro-blog retrieval and sorting and provides a list of related micro-blogs for each micro-blog retrieved.

To enrich the semantic features of micro-blogs, we take the nouns in a micro-blog as representative keywords that express its topic, and expand these nouns with associated words and phrases to enlarge the micro-blog. Specifically, Wikipedia is chosen as the source of semantic features for expansion. For each noun in a micro-blog, we use it as a query in Wikipedia, find the "category" entry in the search result page, and take the words listed under "category" (the categories into which the noun is classified) as additional explanatory words that are added to the original micro-blog. Experiments are also conducted to show that this extension improves the quality of similarity calculation to a certain degree.

To obtain higher accuracy and precision, this paper takes full advantage of the special structure of WordNet, an online English word database, in computing semantic similarity between micro-blogs. Specifically, we use the path-length-based method proposed in [37], which takes into consi