Journal of Computer Research and Development, ISSN 1000-1239 / CN 11-1777/TP, 47(6): 1044-1052, 2010

A Density-Based Clustering Algorithm Concerning Neighborhood Balance

Wu Jiawei, Li Xiongfei, Sun Tao, and Li Wei
(Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012)
(wjw7251@163.com)

Abstract: Clustering is an important analytical tool in data mining. Density-based clustering is a clustering method in demand for processing very large databases. Existing density-based clustering algorithms have limitations, in particular when handling data of varying density and clusters with indistinct boundaries. After analyzing these limitations, the paper introduces definitions of projection points, neighborhood balance, balanceable core points, and boundary sparse points. It then analyzes how core points and the points in their neighborhoods are distributed. On that basis it proposes bDBSCAN, a density-based clustering algorithm that improves DBSCAN by taking the neighborhood balance of core points into account. For each core point, the algorithm projects the points in its neighborhood to judge whether the core point is balanceable. Only balanceable core points can be expanded to form clusters. The algorithm discovers clusters of arbitrary shape and with various data distribution characteristics effectively and efficiently, and it eliminates noise such as boundary sparse points. Theoretical analysis and experimental results indicate three things. The algorithm improves clustering accuracy and gives better clustering results on various data sets. It also overcomes difficulties in clustering high-dimensional spatial data, such as indistinct boundaries between clusters and too many noise points. The choice of the algorithm's parameter and its impact are also discussed.

Keywords: projection point; neighborhood balance; balanceable core point; boundary sparse point; density-based clustering algorithm

Abstract (Chinese): Clustering is an important analytical tool in data mining. After analyzing the distribution characteristics of core objects and of the objects in their neighborhoods, the paper introduces several concepts: an object's projection point, an object's neighborhood balance, balanced core objects, and boundary sparse objects. It then proposes a new density-based clustering algorithm, bDBSCAN (balance-DBSCAN). The algorithm projects the objects in a core object's neighborhood and normalizes the resulting vectors to unit length. It uses them to examine the neighborhood balance of the core object. It then groups the objects that are balance-density-reachable from a balanced core object into one cluster. Theoretical analysis and experimental results show that the algorithm can handle clusters of arbitrary shape and effectively excludes noise such as boundary sparse objects, and that it can resolve high-dimensional
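The abstracts describe the core step only in outline. The algorithm projects a core point's neighbors, normalizes the difference vectors to unit length, and lets only balanced core points expand clusters. The Python sketch below is one possible reading under stated assumptions, not the authors' implementation. Its balance criterion is hypothetical: the mean of the neighbors' unit vectors must be shorter than a threshold `tau`. The names `region_query`, `is_balanced`, `bdbscan_sketch` and `tau` are also hypothetical. The paper's exact definitions of projection point and neighborhood balance are not part of this excerpt. Everything else is plain DBSCAN, with a radius `eps` and a count `min_pts` that includes the point itself.

```python
import math
from typing import List, Sequence

Points = Sequence[Sequence[float]]


def region_query(data: Points, i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of data[i], excluding i (O(n) scan)."""
    return [j for j in range(len(data)) if j != i and math.dist(data[i], data[j]) <= eps]


def is_balanced(data: Points, i: int, neighbors: List[int], tau: float) -> bool:
    """Hypothetical balance test (an assumption, not the paper's definition).

    Each neighbor is mapped to the unit vector pointing from data[i] towards it
    (its 'projection' onto the unit sphere around data[i]). data[i] counts as
    balanced when the mean of those unit vectors is shorter than tau: neighbors
    on all sides largely cancel out, while a boundary point sees them on one side.
    """
    dim = len(data[i])
    total = [0.0] * dim
    count = 0
    for j in neighbors:
        diff = [b - a for a, b in zip(data[i], data[j])]
        length = math.hypot(*diff)
        if length == 0.0:
            continue  # duplicate coordinates carry no direction
        for k in range(dim):
            total[k] += diff[k] / length
        count += 1
    if count == 0:
        return False
    return math.hypot(*total) / count < tau


def bdbscan_sketch(data: Points, eps: float, min_pts: int, tau: float) -> List[int]:
    """DBSCAN-style expansion in which only core points that also pass
    is_balanced may grow a cluster. Returns one label per point; -1 is noise."""
    n = len(data)
    neigh = [region_query(data, i, eps) for i in range(n)]
    # Core test counts the point itself, as in the usual DBSCAN convention.
    expandable = [
        len(neigh[i]) + 1 >= min_pts and is_balanced(data, i, neigh[i], tau)
        for i in range(n)
    ]
    labels: List[int] = [-2] * n  # -2 = not yet assigned
    cluster = -1
    for i in range(n):
        if labels[i] != -2 or not expandable[i]:
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(neigh[i])
        while stack:
            j = stack.pop()
            if labels[j] != -2:
                continue  # already assigned to a cluster
            labels[j] = cluster
            # Unbalanced or non-core members join the cluster but do not expand it.
            if expandable[j]:
                stack.extend(neigh[j])
    # Points never reached from a balanced core point are reported as noise.
    return [-1 if label == -2 else label for label in labels]
```

For example, take a 5-by-5 unit grid plus one distant point, with the illustrative values `eps=1.5`, `min_pts=5` and `tau=0.3`. Only the nine interior grid points pass the balance test and expand. The edge and corner points join the same cluster without expanding it, and the distant point is labelled -1. Keeping unbalanced points as non-expanding members mirrors how DBSCAN treats border points. The paper's own handling of boundary sparse points may differ.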