Online Variational Inference for the Hierarchical Dirichlet Process (2011)

Chong Wang, John Paisley, David M. Blei
Computer Science Department, Princeton University
{chongw,jpaisley,blei}@cs.princeton.edu

Abstract

The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that can be used to model mixed-membership data with a potentially infinite number of components. It has been applied widely in probabilistic topic modeling, where the data are documents and the components are distributions of terms that reflect recurring patterns (or "topics") in the collection. Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions. One limitation of HDP analysis is that existing posterior inference algorithms require multiple passes through all the data; these algorithms are intractable for very large scale applications. We propose an online variational inference algorithm for the HDP, an algorithm that is easily applicable to massive and streaming data. Our algorithm is significantly faster than traditional inference algorithms for the HD

... like classification, exploration, and summarization. Unlike its finite counterpart, latent Dirichlet allocation [2], the HDP topic model infers the number of topics from the data.

Posterior inference for the HDP is intractable, and much research is dedicated to developing approximate inference algorithms [1, 3, 4]. These methods are limited for massive scale applications, however, because they require multiple passes through the data and are not easily applicable to streaming data.¹

In this paper, we develop a new approximate inference algorithm for the HDP. Our algorithm is designed to analyze much larger data sets than the existing state-of-the-art allows and, further, can be used to analyze streams of data. This is particularly apt to the HDP topic model. Topic models promise to help summarize and organize large archives of texts that cannot be easily analyzed by hand and, further, could be better exploited if available on streams of texts such as web APIs or news feeds.

Our method, online variational Bayes for the HDP, was inspired by the recent online variational Bayes algorithm for LDA [7]. Online LDA allows LDA models to be fit to massive and streaming data, and enjoys significant improve-