LDA(for topic modeling)

ID:40083033

Size: 408.20 KB

Pages: 30

Date: 2019-07-20

Resource description:

Journal of Machine Learning Research 3 (2003) 993-1022
Submitted 2/02; Published 1/03

Latent Dirichlet Allocation

David M. Blei
BLEI@CS.BERKELEY.EDU
Computer Science Division, University of California, Berkeley, CA 94720, USA

Andrew Y. Ng
ANG@CS.STANFORD.EDU
Computer Science Department, Stanford University, Stanford, CA 94305, USA

Michael I. Jordan
JORDAN@CS.BERKELEY.EDU
Computer Science Division and Department of Statistics, University of California, Berkeley, CA 94720, USA

Editor: John Lafferty

Abstract

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

1. Introduction

In this paper we consider the problem of modeling text corpora and other collections of discrete data. The goal is to find short descriptions of the members of a collection that enable efficient processing of large collections while preserving the essential statistical relationships that are useful for basic tasks such as classification, novelty detection, summarization, and similarity and relevance judgments.

Significant progress has been made on this problem by researchers in the field of information retrieval (IR) (Baeza-Yates and Ribeiro-Neto, 1999). The basic methodology proposed by IR researchers for text corpora, a methodology successfully deployed in modern Internet search engines, reduces each document in the corpus to a vector of real numbers, each of which represents ratios of counts. In the popular tf-idf scheme (Salton and McGill, 1983), a basic vocabulary of “words” or “terms” is chosen, and, for each document in the corpus, a count is formed of the number of occurrences of each word. After suitabl
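
The counting step described in the introduction (a fixed vocabulary is chosen, then occurrences of each vocabulary word are counted per document) can be illustrated with a minimal Python sketch. The helper name `term_counts`, the toy documents, and the lowercase whitespace tokenization are assumptions made for illustration only; none of them come from the paper, and the normalization step that follows the counting is not shown because the preview text ends before describing it.

```python
from collections import Counter


def term_counts(documents, vocabulary):
    """Return one count vector per document, ordered like `vocabulary`.

    Hypothetical helper: tokenization is a plain lowercase whitespace
    split, which is an assumption rather than anything the paper states.
    """
    vectors = []
    for text in documents:
        # Count every token in the document once.
        counts = Counter(text.lower().split())
        # Keep only vocabulary words; a missing word counts as 0.
        vectors.append([counts[word] for word in vocabulary])
    return vectors


# Toy corpus and vocabulary, invented for this example.
docs = ["the cat sat on the mat", "the dog chased the cat"]
vocab = ["cat", "dog", "mat", "the"]
vectors = term_counts(docs, vocab)

for doc, vec in zip(docs, vectors):
    print(doc, "->", vec)
```

Each document becomes a fixed-length vector whose length equals the vocabulary size, which is the sense in which this methodology reduces a document to a vector of numbers.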
