Weka[28] EM Source Code Analysis

Author: Koala++ / 屈伟

The EM algorithm lives under clusterers. I mention this because I did not expect to find it there, and because its name is rather too grand: here it is only an algorithm combined with SimpleKMeans. Quoting Andrew Ng's lecture notes "Mixtures of Gaussians and the EM algorithm":

The EM-algorithm is also reminiscent of the K-means clustering algorithm, except that instead of the "hard" cluster assignments c(i), we instead have the "soft" assignments w_j^(i). Similar to K-means, it is also susceptible to local optima, so reinitializing at several different initial parameters may be a good idea.

"Soft" means that our guesses are probabilities, taking values in the interval [0, 1]. A "hard" guess, by contrast, is a single best guess, taking values in {0, 1} or {1, …, k}. In the original English:

The term "soft" refers to our guesses being probabilities and taking values in [0, 1]; in contrast, a "hard" guess is one that represents a single best guess (such as taking values in {0, 1} or {1, …, k}).

The figures below come from Andrew Ng and Christopher Bishop. In the first set of figures, K-Means guesses two points, whereas in the second set EM guesses probabilities. The other point, mentioned in the passage above, is the use of several initialization points, which also shows up in the code.

After proving that the EM algorithm converges, Ng explains:

Hence, EM causes the likelihood to converge monotonically. In our description of the EM algorithm, we said we'd run it until convergence. Given the result that we just showed, one reasonable convergence test would be to check if the increase in l(theta) between successive iterations is smaller than some tolerance parameter, and to declare convergence if EM is improving l(theta) too slowly.

We start from buildClusterer:

    if (data.checkForStringAttributes()) {
      throw new Exception("Can't handle string attributes!");
    }

    m_replaceMissing = new ReplaceMissingValues();
    Instances instances = new Instances(data);
    instances.setClassIndex(-1);
    m_replaceMissing.setInputFormat(instances);
    data = weka.filters.Filter.useFilter(instances, m_replaceMissing);
    instances = null;

    m_theInstances = data;

    // calculate min and max values for attributes
    m_minValues = new double[m_theInstances.numAttributes()];
    m_maxValues = new double[m_theInstances.numAttributes()];
    for (int i = 0; i…

… the maximum-value array.

    private void updateMinMax(Instance instance) {
      for (int j = 0; j…
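The listing above breaks off inside updateMinMax, so the following is only a minimal sketch of the general idea of keeping per-attribute minimum and maximum arrays. It is not Weka's code. The class MinMaxSketch, its update method, the use of plain double[] rows in place of Instance, and the use of NaN to mean "no value seen yet" are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for per-attribute min/max bookkeeping.
// Plain double[] rows replace Weka's Instance; this is an assumption, not Weka's API.
class MinMaxSketch {
    final double[] minValues;
    final double[] maxValues;

    MinMaxSketch(int numAttributes) {
        minValues = new double[numAttributes];
        maxValues = new double[numAttributes];
        // Assumption: NaN marks "no value seen yet" for an attribute.
        Arrays.fill(minValues, Double.NaN);
        Arrays.fill(maxValues, Double.NaN);
    }

    // Fold one row into the running minima and maxima.
    // NaN entries in the row are treated as missing and skipped.
    void update(double[] row) {
        for (int j = 0; j < row.length; j++) {
            double v = row[j];
            if (Double.isNaN(v)) {
                continue;
            }
            if (Double.isNaN(minValues[j])) {
                // First observed value initializes both bounds.
                minValues[j] = v;
                maxValues[j] = v;
            } else {
                minValues[j] = Math.min(minValues[j], v);
                maxValues[j] = Math.max(maxValues[j], v);
            }
        }
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 5.0}, {3.0, Double.NaN}, {-2.0, 4.0} };
        MinMaxSketch mm = new MinMaxSketch(2);
        for (double[] row : data) {
            mm.update(row);
        }
        System.out.println(Arrays.toString(mm.minValues)); // [-2.0, 4.0]
        System.out.println(Arrays.toString(mm.maxValues)); // [3.0, 5.0]
    }
}
```

In the code quoted from buildClusterer, missing values have already been replaced by the ReplaceMissingValues filter before the min/max pass, so the missing-value branch in this sketch may never be needed there.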
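The two ideas quoted from Ng's notes, soft versus hard assignments and the tolerance-based convergence test on l(theta), can be illustrated with a small one-dimensional, two-component Gaussian mixture. This is a sketch under stated assumptions, not Weka's EM implementation: the class SoftEmSketch, its method names, the fixed two components, the initial unit variances, and the variance floor are all hypothetical choices made for the example.

```java
// Hypothetical 1-D, two-component Gaussian mixture fitted with EM.
// All names here are illustrative; this is not Weka's EM class.
class SoftEmSketch {
    double[] weight = {0.5, 0.5};
    double[] mean;
    double[] var = {1.0, 1.0};

    SoftEmSketch(double mean0, double mean1) {
        mean = new double[] {mean0, mean1};
    }

    static double gaussian(double x, double mu, double variance) {
        double d = x - mu;
        return Math.exp(-d * d / (2 * variance)) / Math.sqrt(2 * Math.PI * variance);
    }

    // "Soft" assignment: posterior probability of each component given x, each in [0, 1].
    double[] responsibilities(double x) {
        double[] w = new double[2];
        double sum = 0;
        for (int j = 0; j < 2; j++) {
            w[j] = weight[j] * gaussian(x, mean[j], var[j]);
            sum += w[j];
        }
        for (int j = 0; j < 2; j++) {
            w[j] /= sum;
        }
        return w;
    }

    // "Hard" assignment: the single best guess, a value in {0, 1}.
    int hardAssignment(double x) {
        double[] w = responsibilities(x);
        return w[0] >= w[1] ? 0 : 1;
    }

    // l(theta): log-likelihood of the data under the current parameters.
    double logLikelihood(double[] xs) {
        double ll = 0;
        for (double x : xs) {
            double p = 0;
            for (int j = 0; j < 2; j++) {
                p += weight[j] * gaussian(x, mean[j], var[j]);
            }
            ll += Math.log(p);
        }
        return ll;
    }

    // One EM iteration: E-step (responsibilities) followed by M-step (re-estimation).
    void step(double[] xs) {
        double[][] w = new double[xs.length][];
        double[] n = new double[2];
        double[] sx = new double[2];
        for (int i = 0; i < xs.length; i++) {
            w[i] = responsibilities(xs[i]);
            for (int j = 0; j < 2; j++) {
                n[j] += w[i][j];
                sx[j] += w[i][j] * xs[i];
            }
        }
        for (int j = 0; j < 2; j++) {
            mean[j] = sx[j] / n[j];
            weight[j] = n[j] / xs.length;
            double sv = 0;
            for (int i = 0; i < xs.length; i++) {
                double d = xs[i] - mean[j];
                sv += w[i][j] * d * d;
            }
            // Assumption: a small floor keeps the variance from collapsing to zero.
            var[j] = Math.max(sv / n[j], 1e-6);
        }
    }

    // Iterate until the increase in l(theta) falls below tol; returns iterations used.
    int fit(double[] xs, double tol, int maxIter) {
        double prev = logLikelihood(xs);
        for (int it = 1; it <= maxIter; it++) {
            step(xs);
            double cur = logLikelihood(xs);
            if (cur - prev < tol) {
                return it;
            }
            prev = cur;
        }
        return maxIter;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1};
        SoftEmSketch em = new SoftEmSketch(0.0, 6.0);
        int iterations = em.fit(xs, 1e-6, 100);
        System.out.println("iterations: " + iterations);
        System.out.println("means: " + em.mean[0] + ", " + em.mean[1]);
        double[] w = em.responsibilities(3.0);
        System.out.println("soft assignment of 3.0: " + w[0] + ", " + w[1]);
        System.out.println("hard assignment of 3.0: " + em.hardAssignment(3.0));
    }
}
```

Because EM can settle in a local optimum, a caller might run fit from several different starting means and keep the run with the highest log-likelihood, which is the reinitialization idea mentioned in the quoted notes.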