JOURNAL OF SOFTWARE, VOL. 6, NO. 12, DECEMBER 2011, p. 2361

A Hybrid Method for XML Clustering by Structure and Content

Yong Piao and Xiu-kun Wang
School of Electronic and Information Engineering, Dalian University of Technology, Dalian, China
Email: {piaoy, jsjwxk}@dlut.edu.cn

Abstract—This paper presents an effective XML clustering method, the neighbor center clustering algorithm (NCC), whose similarity measure draws on both the structural and the content information contained in XML files. Structural similarity is first measured with a frequency-path model, and a similarity calculation algorithm with position and frequency weights, based on the longest common subsequence, is introduced. To improve performance and precision, the frequency-path model is further extended to consider structure and content information simultaneously. Experiments show that NCC embedded with the hybrid similarity calculation method obtains high purity and F-measure values and is effective and applicable for clustering XML with both homogeneous and heterogeneous structures.

Index Terms—neighbor center clustering, position and frequency weight, longest common subsequence, hybrid similarity calculation

I. INTRODUCTION

XML (eXtensible Markup Language), as a common [...]

[...] document clustering is not so satisfactory, and besides, they do not distinguish noise or isolated points effectively. In terms of computational complexity, the time traditional methods spend searching for cluster centers increases rapidly, which is an obstacle to better performance in XML clustering. In addition, traditional partitioning methods, represented by K-Means and K-Medoids, require the number of clusters K to be specified in advance. For these reasons, a neighbor center clustering algorithm with similarity (NCC) is proposed. It is not only simple but can also find non-spherical structures among documents and distinguish noise or isolated points effectively.

Similarity among documents is the key issue in the field of document clustering. So far, the methods proposed for this purpose can be roughly classified into three types: graph matching, edit distance, and the tree path model. Reference [4] describes an XML document with a directed graph and calculates similarity between XML documents by graph matching [...]