Research on Algorithms of Topic Based Keyword Extraction

ID: 9206230

Size: 991.28 KB

Pages: 50

Date: 2018-04-22



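Abstract (translated from the Chinese)

Keywords provide summary information about a document. They are increasingly used in information retrieval, text clustering, and classification systems, and keyword extraction algorithms have drawn growing attention as a result. Traditional methods rely mainly on the statistical information of words. After reviewing keyword extraction methods, this paper surveys, from the perspective of document topics, three topic-based keyword extraction algorithms: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA). LSA maps documents from a sparse, high-dimensional word space to a low-dimensional vector space, mainly by means of singular value decomposition (SVD). PLSA expresses LSA in probabilistic terms and introduces a latent semantic layer (the topic layer) between documents and words. The basic idea of LDA is that, through probabilistic inference, a single document can be represented as a collection of latent topics, and each topic can in turn be viewed as a probability distribution over words. To verify how well these three methods perform, and to combine theory with practice, this paper compares the three topic-based keyword extraction algorithms with the TF-IDF method experimentally and summarizes the results. The experiments show that all three methods outperform TF-IDF in both recall and precision and can recommend keywords effectively.

Keywords: keyword extraction; LSA; PLSA; LDA

Research on Algorithms of Topic Based Keyword Extraction

Abstract

Keywords provide semantic metadata that gives an overview of the content of a document. They are widely used in information retrieval, text clustering, and classification systems. As a result, keyword extraction algorithms receive a lot of attention. Traditional methods for keyword extraction simply rank keywords according to the statistical information of words. After reviewing some methods of keyword extraction, this article summarizes three topic-based methods of keyword extraction: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA). LSA maps a document from a sparse high-dimensional space to a low-dimensional vector space, mainly through singular value decomposition (SVD). PLSA brings in a latent semantic layer, called the theme layer, between documents and words. It explains LSA in a probabilistic way. The basic idea of LDA is that a document can be regarded as a combination of several potential themes. A single document can be described as a collection of the underlying themes in a probabilistic way. To test whether the topic-based keyword extraction algorithms are effective, this article combines theory with practice. Through experiments, it compares these three algorithms with the traditional TF-IDF algorithm. The results show that both recall and precision improve. The three topic-based keyword extraction algorithms do well in keyword extraction.

Key Words: keyword extraction; LSA; PLSA; LDA

1 Introduction

1.1 Research Background and Significance

1.1.1 Research Background

With the rapid growth of online information, people's demand for information quality keeps rising. This has greatly changed the way information is organized and accessed, and it also poses great challenges. A sharp increase in the volume of information, rich and varied content, complex and changing structure, and faster transmission, together with a broad user base, diverse needs, and more transparent and easier-to-use ways of processing information, have become the main characteristics of the new Internet environment [1]. On January 16, 2012, China Internet…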
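The abstract describes LSA as mapping documents from a sparse word space to a low-dimensional space through SVD, and names TF-IDF as the baseline. The sketch below illustrates that idea on a toy corpus. It is not the thesis's implementation, which the preview does not show. The function names, the whitespace tokenizer, the `tf * log(N / df)` weighting, and the choice to rank a document's terms by their weight in the rank-k reconstruction are all assumptions made for illustration.

```python
import numpy as np


def term_document_matrix(docs):
    """Return (vocab, counts); counts[i, j] is how often term i occurs in doc j."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w], j] += 1
    return vocab, counts


def tfidf(counts):
    """Baseline weighting (assumed form): term frequency times log(N / df)."""
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)           # documents containing each term
    idf = np.log(n_docs / df)
    tf = counts / counts.sum(axis=0, keepdims=True)  # normalise by document length
    return tf * idf[:, None]


def lsa_scores(weights, k):
    """Rank-k approximation of the term-document matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def top_keywords(scores, vocab, doc, n=3):
    """The n terms with the highest score in column `doc`."""
    order = np.argsort(-scores[:, doc], kind="stable")
    return [vocab[i] for i in order[:n]]


if __name__ == "__main__":
    # Hypothetical toy corpus, whitespace-tokenised.
    docs = [
        "topic model keyword extraction topic",
        "keyword extraction tfidf baseline",
        "svd latent semantic analysis topic",
    ]
    vocab, counts = term_document_matrix(docs)
    weights = tfidf(counts)
    print("TF-IDF:", top_keywords(weights, vocab, doc=0))
    print("LSA   :", top_keywords(lsa_scores(weights, k=2), vocab, doc=0))
```

Keeping only the top k singular values smooths the matrix, so a term can receive weight in a document where it never occurs if it co-occurs with that document's terms elsewhere. That is the sense in which LSA works at the level of topics rather than raw word statistics.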
