ISSN 1000-9825, CODEN RUXUEW    E-mail: jos@iscas.ac.cn
Journal of Software, Vol.19, No.7, July 2008, pp.1683−1692    http://www.jos.org.cn
DOI: 10.3724/SP.J.1001.2008.01683    Tel/Fax: +86-10-62562563
© 2008 by Journal of Software. All rights reserved.

An Efficient Clustering Algorithm Based on Local Optimality of K-Means

LEI Xiao-Feng1,2+, XIE Kun-Qing1, LIN Fan1, XIA Zheng-Yi3

1(Department of Intelligence Science/National Laboratory on Machine Perception, Peking University, Beijing 100871, China)
2(School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China)
3(Logistics Science and Technology Institute, P.L.A. Chief Logistics Department, Beijing 100071, China)
+ Corresponding author: E-mail: leiyunhui@gmail.com

Lei XF, Xie KQ, Lin F, Xia ZY. An efficient clustering algorithm based on local optimality of K-Means. Journal of Software, 2008, 19(7): 1683−1692. http://www.jos.org.cn/1000-9825/19/1683.htm

Abstract: K-Means is the most popular clustering algorithm, but it only converges to one of numerous local minima, which makes its results highly sensitive to the choice of initial representatives. Much research has tried to overcome this sensitivity. This paper instead exploits the local optimality and sensitivity of K-Means to build a novel clustering algorithm called K-MeanSCAN. The core idea is to build connectivity between sub-clusters from multiple K-Means clustering results, which differ from one another precisely because K-Means is locally optimal and sensitive to initialization. A weighted connected graph of the sub-clusters is then constructed from this connectivity, and the sub-clusters are merged by a graph search algorithm. Theoretical analysis and experiments show that K-MeanSCAN outperforms existing algorithms in clustering quality and efficiency.

Keywords: K-MeanSCAN; density-based; K-Means; clustering; connectivity

Abstract (Chinese version, translated): The K-Means clustering algorithm is only guaranteed to converge to a local optimum, so its clustering results are very sensitive to the choice of initial representatives. Much research has focused on reducing this sensitivity. The local optimality and result sensitivity of K-Means, however, form the basis of the K-MeanSCAN clustering algorithm. K-MeanSCAN samples the data set multiple times and pre-clusters each sample with K-Means to produce several different clustering results; sub-clusters from different clustering results will necessarily intersect. The core idea of the algorithm
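The abstracts describe the pipeline only at a high level: repeated sampling with K-Means pre-clustering, connectivity between intersecting sub-clusters, a weighted graph of sub-clusters, and merging by graph search. The Python sketch below is a minimal, hypothetical illustration of that pipeline, not the authors' method. This excerpt does not define the paper's connectivity measure or its graph search, so the sketch assumes that the edge weight between two sub-clusters from different runs is the number of points they share, that edges below a hypothetical threshold `min_shared` are dropped, that merging means finding connected components with breadth-first search, and that each point finally takes the majority component of its per-run sub-clusters. All function and parameter names (`kmeanscan_sketch`, `k_sub`, `runs`, `sample_frac`, `min_shared`) are invented for illustration.

```python
import random
from collections import Counter, defaultdict, deque


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def kmeans_centers(points, k, rng, max_iter=100):
    """Plain Lloyd's K-Means with random initial representatives.

    The local optimum reached depends on the random seeds; this is the
    sensitivity the sketch relies on to get differing partitions.
    """
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments stable: a local optimum
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty sub-cluster keeps its old center
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers


def kmeanscan_sketch(points, k_sub=4, runs=5, sample_frac=0.8, min_shared=1, seed=0):
    """Hypothetical sketch: sample + K-Means several times, link intersecting
    sub-clusters in a weighted graph, merge connected sub-clusters by BFS."""
    rng = random.Random(seed)
    n = len(points)
    # 1. Pre-clustering: each run fits K-Means on a random sample, then labels
    #    every point with its nearest sub-cluster center.
    run_labels = []
    for _ in range(runs):
        sample = rng.sample(points, max(k_sub, int(sample_frac * n)))
        centers = kmeans_centers(sample, k_sub, rng)
        run_labels.append([nearest(p, centers) for p in points])
    # 2. Weighted graph: node = (run, sub-cluster); weight = number of shared points.
    weight = defaultdict(int)
    for i in range(n):
        for r in range(runs):
            for s in range(r + 1, runs):
                weight[((r, run_labels[r][i]), (s, run_labels[s][i]))] += 1
    adj = defaultdict(set)
    for (u, v), w in weight.items():
        if w >= min_shared:  # assumed rule: keep only sufficiently strong links
            adj[u].add(v)
            adj[v].add(u)
    # 3. Merge: each connected component of sub-clusters becomes one cluster.
    component, next_id = {}, 0
    for start in sorted({(r, lab) for r in range(runs) for lab in run_labels[r]}):
        if start in component:
            continue
        component[start] = next_id
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in component:
                    component[v] = next_id
                    queue.append(v)
        next_id += 1
    # 4. Final label of a point: majority component over its per-run sub-clusters.
    return [
        Counter(component[(r, run_labels[r][i])] for r in range(runs)).most_common(1)[0][0]
        for i in range(n)
    ]


# Usage: two well-separated 5x4 grids of points.
blob_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
blob_b = [(10 + x, 10 + y) for x, y in blob_a]
labels = kmeanscan_sketch(blob_a + blob_b)
print(labels)
```

Because sub-clusters should never straddle two blobs this far apart, no edge links them and the blobs should receive different labels. Whether each blob collapses into a single cluster depends on how much the runs' partitions differ, which is exactly the sensitivity the abstract builds on. With `min_shared=1` any shared point links two sub-clusters, so the number of final clusters cannot exceed `k_sub`; raising the threshold makes merging more conservative.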