Study and Implementation on Data Cleaning and Sentiment Analysis Techniques for Chinese Microblog

Abstract

As a new information carrier, the microblog has become popular and indispensable in people's daily life. Microblogs contain a large number of valuable comments on celebrities, events and products, which can express users' sentiment orientation and play an increasingly important role in the emergence and propagation of Web public opinion. Addressing the new characteristics of Chinese microblogs, this thesis studies data cleaning, sentiment orientation analysis, and their related techniques.

Firstly, for the problem of spam and near-duplicate microblogs, the thesis studies a microblog data cleaning approach. Recently, large numbers of spam microblogs and near-duplicate microblogs have come to cover every corner of the microblog space. They have had adverse effects on the accuracy of information retrieval and have affected the credibility of further analysis. Eliminating spam and near-duplicate microblogs has become a serious problem in the related research area. To tackle this problem, the characteristics of spam and near-duplicate microblogs are analyzed based on the statistical analysis results of massive real-world microblog data, and a filtering approach with feature selection and double content similarity detection for microblog text streams is proposed. The proposed method first filters out spam microblogs through URL links, character rates and high-frequency words. Then the near-duplicate microblogs are eliminated through subsection-based and index-based filters. Experiments show that the proposed method can effectively purify the microblogs by filtering out the spam and near-duplicate microblogs.

Secondly, according to the characteristics of "straightforward emotion expression", this thesis studies a sentiment orientation analysis method for Chinese microblogs. "Straightforward emotion expression" means that people writing microblog posts tend to use emoticons, interjections, and degree adverbs to express emotion. The existing sentiment lexicons and previous sentiment analysis methods can be applied to the sentiment analysis task for Chinese microblogs, but these methods usually ignore the new characteristics of microblog content. There is a lack of
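
The two-stage cleaning pipeline described in the abstract (a feature-based spam filter followed by near-duplicate elimination) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not give the exact features, thresholds, or the design of the subsection-based and index-based filters. The sketch assumes a URL-count limit, a ratio of word-like characters, and a small spam keyword list for the first stage, and it swaps in character n-gram shingling with Jaccard similarity over an inverted index for the second stage. All names (`is_spam`, `NearDuplicateFilter`, `clean_stream`) and all threshold values are hypothetical.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")


def is_spam(text, spam_words, max_urls=1, min_text_ratio=0.5, max_spam_hits=2):
    """Heuristic spam check (assumed features: URL count, character ratio, keywords)."""
    if len(URL_RE.findall(text)) > max_urls:
        return True
    stripped = URL_RE.sub("", text)
    if not stripped.strip():
        return True  # nothing left besides links
    # Share of characters that are letters, digits, or CJK rather than symbols.
    wordlike = sum(ch.isalnum() for ch in stripped)
    if wordlike / len(stripped) < min_text_ratio:
        return True
    hits = sum(1 for w in spam_words if w in stripped)
    return hits >= max_spam_hits


def shingles(text, size=4):
    """Overlapping character n-grams of the text with URLs and whitespace removed."""
    norm = re.sub(r"\s+", "", URL_RE.sub("", text))
    return {norm[i:i + size] for i in range(max(len(norm) - size + 1, 1))}


class NearDuplicateFilter:
    """Streaming near-duplicate check backed by an inverted index of shingles."""

    def __init__(self, size=4, threshold=0.8):
        self.size = size
        self.threshold = threshold      # assumed Jaccard cut-off
        self.index = defaultdict(set)   # shingle -> ids of posts kept so far
        self.kept = []                  # shingle set of each kept post

    def is_duplicate(self, text):
        sh = shingles(text, self.size)
        # Only posts sharing at least one shingle need a full comparison.
        candidates = set()
        for s in sh:
            candidates |= self.index.get(s, set())
        for cid in candidates:
            other = self.kept[cid]
            if len(sh & other) / len(sh | other) >= self.threshold:
                return True
        new_id = len(self.kept)
        self.kept.append(sh)
        for s in sh:
            self.index[s].add(new_id)
        return False


def clean_stream(posts, spam_words):
    """Drop spam first, then drop posts that nearly duplicate an earlier kept post."""
    dedup = NearDuplicateFilter()
    return [p for p in posts
            if not is_spam(p, spam_words) and not dedup.is_duplicate(p)]
```

Running the spam check before the duplicate check keeps spam posts out of the index, so a later legitimate post is never discarded merely because it resembles earlier spam. The inverted index limits full similarity comparisons to posts that share at least one shingle, which matters for a text stream of any real size.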