Journal of Machine Learning Research 6 (2005) 1-35
Submitted 7/03; Revised 2/04; Published 1/05

Asymptotic Model Selection for Naive Bayesian Networks

Dmitry Rusakov (RUSAKOV@CS.TECHNION.AC.IL)
Dan Geiger (DANG@CS.TECHNION.AC.IL)
Computer Science Department
Technion - Israel Institute of Technology
Haifa, 32000, Israel

Editor: David Madigan

Abstract

We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features. This formula deviates from the standard BIC score. Our work provides a concrete example that the BIC score is generally incorrect for statistical models that belong to stratified exponential families. This claim stands in contrast to linear and curved exponential families, where the BIC score has been proven to provide a correct asymptotic approximation for the marginal likelihood.

Keywords: Bayesian networks, asymptotic model selection, Bayesian information criterion (BIC)

1. Introduction

Statisticians are often faced with the problem of choosing the appropriate model that best fits a given set of observations. One example of such problem is the choice of structure in learning of Bayesian networks (Heckerman et al., 1995; Cooper and Herskovits, 1992). In such cases the maximum likelihood principle would tend to select the model of highest possible dimension, contrary to the intuitive notion of choosing the right model. Penalized likelihood approaches such as AIC have been proposed to remedy this deficiency (Akaike, 1974).

We focus on the Bayesian approach to model selection by which a model $M$ is chosen according to the maximum posteriori probability given the observed data $D$:

\[
P(M \mid D) \propto P(M, D) = P(M)\,P(D \mid M) = P(M) \int_W P(D \mid M, w)\,P(w \mid M)\,dw,
\]

where $w$ denotes the model parameters and $W$ denotes the domain of the model parameters. In particular, we focus on model selection using large sample approximation for $P(M \mid D)$, called BIC - Bayesian Information Criterion.

The critical computational part in using this criterion is evaluating the marginal likelihood integral $P(D \mid M) = \int_W P(D \mid M, w)\,P(w \mid M)\,dw$. Given an exponential model $M$ we write $P(D \mid M)$ as a function of the averaged sufficient statistics $Y_D$ of the data $D$, and the number $N$ of data points in $D$:

\[
I[N, Y_D, M] = \int_W e^{L(Y_D, N \mid w, M)}\,\mu(w \mid M)\,dw, \tag{1}
\]

where $\mu(w \mid M)$ is the prior parameter densi