ISSN 1000-9825, CODEN RUXUEW
Journal of Software, Vol.21, No.11, November 2010, pp.2802-2813
doi: 10.3724/SP.J.1001.2010.03677
E-mail: jos@iscas.ac.cn
http://www.jos.org.cn
Tel/Fax: +86-10-62562563
© by Institute of Software, the Chinese Academy of Sciences. All rights reserved.

Clustering Algorithm Based on the Distributions of Intrinsic Clusters*
(Chinese title: 一种基于语料特性的聚类算法, literally "A Clustering Algorithm Based on Corpus Characteristics")

ZENG Yi-Ling1,2+, XU Hong-Bo1, WU Gao-Wei1, BAI Shuo1

1 (Key Laboratory of Network Science and Technology, Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100190, China)
2 (Graduate University, The Chinese Academy of Sciences, Beijing 100049, China)
+ Corresponding author: E-mail: zengyiling@software.ict.ac.cn, http://www.ict.ac.cn

* Supported by the National Natural Science Foundation of China under Grant No.60933005; the National Basic Research Program of China (973) under Grant Nos.2007CB311100, 2004CB318109; the National High-Tech Research and Development Plan of China (863) under Grant No.2007AA01Z441

Received 2008-10-22; Revised 2009-03-05; Accepted 2009-07-07

Zeng YL, Xu HB, Wu GW, Bai S. Clustering algorithm based on the distributions of intrinsic clusters. Journal of Software, 2010,21(11):2802-2813. http://www.jos.org.cn/1000-9825/3677.htm

Abstract: To provide a flexible approach to the model misfit problem, a clustering algorithm based on the distributions of intrinsic clusters (CADIC) is proposed. CADIC implicitly integrates distribution characteristics into the clustering framework by applying rescaling operations. During clustering, a set of discriminative directions is chosen to construct the CADIC coordinate system, under which the distribution characteristics are analyzed in order to design rescaling functions. Along every axis, the rescaling functions are applied to implicitly normalize the data distribution so that more reasonable clustering decisions can be made. As a result, the reliability of clustering decisions is improved. By using a K-means-like iteration strategy, CADIC retains the same time complexity as K-means. Experiments on well-known benchmark evaluation datasets show that the fra