《基于微博的热点话题发现》 (Hot Topic Extraction from Microblogs)
Hot Topic Extraction from Microblogs

Major: Computer Application
Author: Ying Zhu
Supervisor: Prof. Li Li

Abstract

With the development of information technology, Internet data and resources are increasing massively. In order to manage and utilize this kind of information effectively, content-based information retrieval and data mining have become hot research topics. Latent semantic analysis computes the topic distribution of each document and the word distribution of each topic from the known word distribution of the documents in order to obtain latent topics. It plays an important role in information retrieval and text mining, and is applied extensively in text classification and clustering, information organization and management, and hot topic extraction.

In recent years, with the rise of Web 2.0, social networks such as Renren, Facebook, Twitter, and Sina Weibo have not only become very popular but have also become a part of modern life. Widely used social media produce massive amounts of user-generated content (UGC), of which more than 80% is natural language text. Detecting hot topics can be very helpful, and indeed necessary, for people to get essential information quickly. However, these texts are special and have their own characteristics, so many traditional topic analysis models cannot achieve good results unless they are augmented with new features.

Texts from social networks have four salient characteristics: they are high-dimensional, sparse, and non-standard, and their topics are unevenly distributed. First, large numbers of messages are posted every minute, and these texts are likely to produce vectors with more than ten thousand dimensions, which makes topic extraction too time-consuming. Second, compared with long texts, these texts contain even fewer keywords, which produces a sparse "document-word" matrix and makes it difficult to extract effective features and to exploit the correlation between features. Third, abbreviations and catchwords are used extensively in social networks, which increases the number of synonyms in the texts and makes topic identification harder. Finally, few messages on microblogs are valuable for hot topic detection, because the vast majority are about users' daily life, such as weather, food, emotions, and so on. Therefore, whether a term is hot or not is not determined by its frequency of occurrence.

Southwest University
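The sparsity and frequency problems described in the abstract can be illustrated with a minimal sketch. It is not the method of the thesis: the example messages are made up, whitespace tokenization is an assumption (Chinese microblog text would need a word segmenter), and the helper names `build_document_word_matrix` and `sparsity` are hypothetical.

```python
from collections import Counter


def build_document_word_matrix(messages):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in message i."""
    # Assumption: whitespace tokenization is good enough for this toy example.
    tokenized = [message.lower().split() for message in messages]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    column = {word: j for j, word in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in tokenized]
    for i, tokens in enumerate(tokenized):
        for word, count in Counter(tokens).items():
            matrix[i][column[word]] = count
    return vocabulary, matrix


def sparsity(matrix):
    """Fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total if total else 0.0


# Made-up short messages, not data from the thesis.
messages = [
    "rainy weather again today",
    "lunch was great today",
    "earthquake reported near the coast",
    "feeling tired today",
]

vocabulary, matrix = build_document_word_matrix(messages)

# Total count of each word over all messages (column sums).
term_totals = {
    word: sum(row[j] for row in matrix) for j, word in enumerate(vocabulary)
}
most_frequent = max(term_totals, key=term_totals.get)

print(f"{len(vocabulary)} distinct words, sparsity = {sparsity(matrix):.2f}")
print(f"most frequent word: {most_frequent}")
```

Even with four messages, most entries of the matrix are zero, and the most frequent word comes from everyday chatter rather than from the one newsworthy message, which is why raw frequency is a weak signal of hotness.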