ISSN 1000-9825, CODEN RUXUEW
E-mail: jos@iscas.ac.cn
Journal of Software, Vol.19, No.1, January 2008, pp.62−72
http://www.jos.org.cn
DOI: 10.3724/SP.J.1001.2008.00062
Tel/Fax: +86-10-62562563
© 2008 by Journal of Software. All rights reserved.

A Hierarchical Method for Determining the Number of Clusters∗
(基于层次划分的最佳聚类数确定方法)

CHEN Li-Fei (陈黎飞)1, JIANG Qing-Shan (姜青山)2+, WANG Sheng-Rui (王声瑞)3

1 (Department of Computer Science, Xiamen University, Xiamen 361005, China)
2 (School of Software, Xiamen University, Xiamen 361005, China)
3 (Department of Computer Science, University of Sherbrooke, J1K 2R1, Canada)
+ Corresponding author: Phn: +86-592-2186707, E-mail: qjiang@xmu.edu.cn, http://software.xmu.edu.cn/View/shizi/jqs.htm

Chen LF, Jiang QS, Wang SR. A hierarchical method for determining the number of clusters. Journal of Software, 2008,19(1):62−72. http://www.jos.org.cn/1000-9825/19/62.htm

Abstract: A fundamental and difficult problem in cluster analysis is determining the "true" number of clusters in a dataset. The common trial-and-error method generally depends on a particular clustering algorithm and is inefficient on large datasets. This paper proposes a hierarchical method that avoids repeatedly clustering a large dataset. The method first obtains the CF (clustering feature) by scanning the dataset and agglomeratively generates hierarchical partitions of the dataset; a curve of clustering quality with respect to the varying partitions is then constructed incrementally. Finally, the partition corresponding to the extremum of the curve is used to estimate the number of clusters. A new validity index is also presented to quantify clustering quality. The index is independent of the clustering algorithm and emphasizes the geometric features of clusters, so it handles noisy data and arbitrarily shaped clusters efficiently. Experimental results on both real-world and synthetic datasets demonstrate that the new method outperforms recently published approaches while significantly improving efficiency.

Key words: clustering; clustering validity index; statistics; number of clusters; hierarchical clustering

摘要: 确定数据集的