Research on a Microblog Topic Detection Algorithm Combining the Vector Space Model and the LDA Model
Southwest Jiaotong University Master's Degree Thesis

Abstract

In recent years, with the rapid development and widespread adoption of Internet technology, the speed and volume of information dissemination on the network have reached an unprecedented scale. As a new Internet medium with high penetration among Internet users, microblog has become one of the main sources of information on the Internet. It is very different from other network text. Firstly, it has relatively simple content (its main body usually includes fewer than 140 characters). In addition, it can be posted in real time by mobile phone, instant messaging software and so on, which results in large amounts of data in a short period of time. This kind of data is often huge, messy and chaotic, which means that dealing with it requires a considerable workload. Besides, under these circumstances, it is extremely difficult to find the required information accurately and efficiently. Topic detection technology can merge the information distributed under the same topic, which greatly reduces the repetition rate of information. It can help users conveniently understand the linkages between different topics and quickly find the information they most need.

Although the topic detection algorithm based on the traditional VSM (Vector Space Model) has achieved good results and facilitated a wide range of applications, it has obvious shortcomings when dealing with large-scale microblog short text. First of all, the traditional VSM makes no special provision for short and sparse microblog data. This leads to inaccurate calculation of the similarity between texts, thereby affecting the quality of topic detection. Moreover, the traditional VSM assumes that the more words two documents share, the more similar they are to each other. In fact, however, the similarity of different documents depends not only on literal word repetition but also on the semantic association of the context.

Under these circumstances, according to the characteristics of microblog, the paper utilizes the Latent Dirichlet Allocation (LDA) model to extract the hidden microblog topic information from the data set. Then, it is possible to get the topic distribution by Gibbs sampling and combine it with the VSM. At last, the final topics could be detected by using the multi-layer clustering
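
The idea of combining a word-level VSM similarity with an LDA topic-level similarity can be illustrated with a minimal Python sketch. The abstract does not state how the two are combined, so the linear weighting with a parameter `lam`, the smoothed TF-IDF weighting, and all function names here are assumptions made for illustration only. The per-document topic distributions are hard-coded, hypothetical values standing in for what Gibbs sampling on an LDA model might produce; the sketch does not run LDA itself.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight dicts) for tokenized docs."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF so that no term weight collapses to zero.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_dense(p, q):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def combined_similarity(vsm_a, vsm_b, theta_a, theta_b, lam=0.5):
    """Assumed combination: lam * VSM similarity + (1 - lam) * topic similarity."""
    return lam * cosine_sparse(vsm_a, vsm_b) + (1 - lam) * cosine_dense(theta_a, theta_b)


# Three toy "microblogs"; the first two share a theme but no literal words.
docs = [
    ["earthquake", "rescue", "team"],
    ["quake", "relief", "workers"],
    ["football", "match", "score"],
]
vsm = tfidf_vectors(docs)

# Hypothetical topic distributions (two topics), NOT computed by real LDA.
theta = [[0.90, 0.10], [0.85, 0.15], [0.10, 0.90]]

word_only = cosine_sparse(vsm[0], vsm[1])
same_theme = combined_similarity(vsm[0], vsm[1], theta[0], theta[1])
other_theme = combined_similarity(vsm[0], vsm[2], theta[0], theta[2])
print(word_only, same_theme, other_theme)
```

In this toy data the word-level similarity between the first two texts is zero because they share no terms, which is the short-text sparsity problem described above. Once the assumed topic distributions are mixed in, the first pair scores higher than the unrelated pair. A clustering step could then operate on the combined similarity, though the specific multi-layer clustering procedure is not described in this excerpt.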