A Method for Determining the Optimal Number of Clusters Based on Density

[Abstract] Determining the correct number of clusters is a fundamental problem in cluster analysis. Commonly used methods for determining the number of clusters usually depend on a specific clustering algorithm and perform poorly in the presence of subclusters. In this paper, we propose a new index for determining the optimal number of clusters. The index focuses on the geometric structure of clusters and measures the compactness of clusters and the separation between them. It is insensitive to noise and can identify the clusters in a data set. Experimental results on real and synthetic data show that the new index performs better than other widely used indexes.

[Keywords] cluster evaluation, number of clusters, clustering validity index

0 Introduction

Clustering is an important method in data mining research. Its purpose is to group the objects in a data set into classes so that objects in the same class are similar, while objects in different classes are dissimilar. So far, researchers have proposed a large number of clustering algorithms, which have been widely used in business intelligence, graphics analysis, bioinformatics, and other fields. Because clustering is an unsupervised learning method, the results it produces need to be evaluated. Many clustering algorithms require the user to specify the number of clusters in the data set, but in practice this number is usually not known beforehand. Determining the number of clusters in a data set therefore remains one of the fundamental problems in cluster analysis [1][2].

Clustering evaluation assesses the quality of clustering results and is considered one of the important factors affecting the success of cluster analysis [3]. Its place in the cluster analysis process is shown in Figure 1. Important issues in clustering evaluation include assessing the clustering tendency, determining the number of clusters in a data set, and comparing clustering results objectively against known results; this paper addresses determining the optimal number of clusters. Usually, the optimal number of clusters is determined by the following computational p