ID: 33524266
Size: 6.04 MB
Pages: 47
Date: 2019-02-26
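"Research on Microblog Overlapping Topic Detection Based on Topic Models and Mixture Models" (《基于主题模型和混合模型的微博客交叉话题发现的研究》), uploaded and shared by a member of 天天文库 under the academic papers category.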
Southwest Jiaotong University Master's Degree Thesis

Abstract

Microblogging is a platform for sharing, disseminating, and accessing information based on user relationships. It has become one of the main sources of information on the Internet. Microblog text differs considerably from other online text. First, its content is relatively simple (the main body usually contains fewer than 140 characters). In addition, it can be posted in real time from mobile phones, instant messaging software, and so on, which produces large amounts of data in a short period of time. This kind of data is often huge, messy, and chaotic, so it is extremely difficult to find interesting information accurately and efficiently.

Topic detection technology is a new research field of natural language processing. It focuses on helping users collect and merge information distributed under the same topic, so that users can find the information they are interested in quickly and accurately. Although traditional topic detection algorithms based on the VSM (Vector Space Model) and clustering algorithms have achieved good results and facilitated a wide range of applications, they have some shortcomings when dealing with large-scale microblog short text. First, documents represented as feature vectors suffer from high dimensionality, sparsity, and synonymy problems. In addition, most clustering algorithms used in traditional topic extraction are partitioning methods, which do not consider the relationships between topics, so they have some limitations.

Under these circumstances, the topic model is proposed as the text representation model according to the characteristics of microblogs. There are three main topic models: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA). LDA is one of the most popular and commonly used topic models today, so this thesis uses the LDA model to extract hidden microblog topic information from the dataset. Then, an overlapping topic detection algorithm based on a mixture model is proposed to address the insufficiency of traditional topic detection algorithms. Finally, a microblog overlapping topic detection system is established. Experimental results on real datasets show the feasibility and validity of the algorithm.

Keywords: Microblog; topic model; overlapping topic detection
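
The contrast the abstract draws between partitioning methods and overlapping topic detection can be pictured with a small sketch. It compares a hard assignment with a soft, threshold-based assignment over per-post topic proportions. The proportions, the post names, the function names, and the 0.3 threshold are all hypothetical. This is not the thesis's mixture-model algorithm, only a minimal picture of what "overlapping" means here.

```python
# Hypothetical per-post topic proportions, of the kind a topic model such as
# LDA produces. The numbers are made up for illustration.
doc_topics = {
    "post_a": [0.85, 0.10, 0.05],
    "post_b": [0.45, 0.50, 0.05],
    "post_c": [0.05, 0.15, 0.80],
}


def hard_assign(proportions):
    """Partitioning view: each post belongs to exactly one topic."""
    return max(range(len(proportions)), key=lambda k: proportions[k])


def overlapping_assign(proportions, threshold=0.3):
    """Overlapping view: a post belongs to every topic at or above the threshold.

    The threshold value is an assumption chosen for this example.
    """
    return [k for k, p in enumerate(proportions) if p >= threshold]


for name, proportions in doc_topics.items():
    print(name, hard_assign(proportions), overlapping_assign(proportions))
```

In this toy data, post_b sits between topics 0 and 1. The hard assignment forces it into topic 1 alone, while the overlapping assignment keeps both memberships, which is the kind of relationship between topics that a partitioning method discards.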