J. of Zhengzhou Univ. (Nat. Sci. Ed.), Vol. 38, No. 4, Dec. 2006

Cluster-PC Based Parallel Distributed Data Clustering and Its Applications

XIA Sheng-ping, LÜ Xiao-jun, LIU Jian-jun, YUAN Zhen-tao, YU Wen-xian
(ATR State Key Laboratory, National University of Defense Technology, Changsha 410073, China)

Abstract: Clustering high-dimensional data requires high-performance computers to obtain results in a reasonable amount of time, particularly for extremely large-scale databases. To address this, the recursive SOM (RSOM) tree method is proposed. An RSOM tree is a hierarchy of clusters and sub-clusters that incorporates the cluster representation into the index structure. It provides a practical way to index a clustered data set, and it supports effective and efficient nearest-neighbor retrieval without a linear search of a large high-dimensional database. An incremental RSOM tree-based clustering algorithm is also proposed. Because the RSOM tree is inherently parallel, it can be implemented on scalable parallel computers; a distributed parallel algorithm for the incremental RSOM tree on a cluster system is therefore proposed. The performance of the method has been tested with high-dimensional feature sets extracted from a large image database.

Key words: parallel distributed clustering; recursive SOM tree; cluster-computer; incremental clustering

CLC number: TP311
Article ID: 1671-6841(2006)04-0033-08

0 Introduction

Data clustering is an important and basic technology in domains such as data mining, image processing and pattern recognition, and it has been widely studied for a long time. Various clustering algorithms [1][2-3] have been proposed, and their methods can be grouped into partition-based, hierarchy-based, grid-based [4][5] and subspace-based. All of these algorithms require the whole data set to be processed serially at one time. In day-to-day clustering applications, however, the acquired data change frequently and may exceed tens of millions of items, and both the amount of data and the number of patterns in them grow dynamically. There are two solutions for handling updated data. The first is to rerun the algorithm. The time and space complexity of clustering high-dimensional data is very high. If all data ar