IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 14, NO. 4, JULY/AUGUST 2002, p. 731

Finding Patterns in Three-Dimensional Graphs: Algorithms and Applications to Scientific Data Mining

Xiong Wang, Member, IEEE, Jason T.L. Wang, Member, IEEE, Dennis Shasha, Bruce A. Shapiro, Isidore Rigoutsos, Member, IEEE, and Kaizhong Zhang

Abstract: This paper presents a method for finding patterns in 3D graphs. Each node in a graph is an undecomposable or atomic unit and has a label. Edges are links between the atomic units. Patterns are rigid substructures that may occur in a graph after allowing for an arbitrary number of whole-structure rotations and translations as well as a small number (specified by the user) of edit operations in the patterns or in the graph. (When a pattern appears in a graph only after the graph has been modified, we call that appearance "approximate occurrence.") The edit operations include relabeling a node, deleting a node and inserting a node. The proposed method is based on the geometric hashing technique, which hashes node-triplets of the graphs into a 3D table and compresses the label-triplets in the table. To demonstrate the utility of our algorithms, we discuss two applications of them in scientific data mining. First, we apply the method to locating frequently occurring motifs in two families of proteins pertaining to RNA-directed DNA Polymerase and Thymidylate Synthase and use the motifs to classify the proteins. Then, we apply the method to clustering chemical compounds pertaining to aromatic, bicyclicalkanes, and photosynthesis. Experimental results indicate the good performance of our algorithms and high recall and precision rates for both classification and clustering.

Index Terms: KDD, classification and clustering, data mining, geometric hashing, structural pattern discovery, biochemistry, medicine.

1 INTRODUCTION

STRUCTURAL pattern discovery finds many applications in natural sciences, computer-aided design, and image processing [8], [33]. For instance, detecting repeatedly occurring structures in molecules can help biologists to understand functions of t

... the data mining field, where automated discovery of patterns, classification and clustering rules is one of the main tasks. We establish a framework for structural pattern discovery in the graphs and apply our approach to
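
The abstract describes hashing node-triplets of 3D graphs into a 3D table so that patterns can be found regardless of whole-structure rotations and translations. The following is only a minimal sketch of that general idea, not the paper's algorithm: it assumes each triplet is keyed by the three quantized, sorted side lengths of the triangle its nodes form (side lengths are unchanged by rotation and translation). The names `triplet_key`, `build_table`, `vote`, and the `bin_width` parameter are hypothetical, and the label-triplet compression and edit operations mentioned in the abstract are not modeled.

```python
from collections import defaultdict
from itertools import combinations
import math


def triplet_key(p, q, r, bin_width=0.5):
    """Return a 3D table index: the quantized, sorted triangle side lengths.

    Distances do not change under rotation or translation, so the key is
    the same for any rigid placement of the three points (up to binning).
    """
    sides = sorted((math.dist(p, q), math.dist(q, r), math.dist(p, r)))
    return tuple(int(round(s / bin_width)) for s in sides)


def build_table(graphs, bin_width=0.5):
    """Hash every node-triplet of every graph into the table.

    `graphs` maps a graph id to a list of (label, (x, y, z)) nodes.
    Each table entry stores the graph id, the node indices, and the
    sorted labels of the triplet (a simplification made for this sketch).
    """
    table = defaultdict(list)
    for gid, nodes in graphs.items():
        for (i, a), (j, b), (k, c) in combinations(enumerate(nodes), 3):
            labels = tuple(sorted((a[0], b[0], c[0])))
            key = triplet_key(a[1], b[1], c[1], bin_width)
            table[key].append((gid, (i, j, k), labels))
    return table


def vote(table, pattern, bin_width=0.5):
    """Count, per graph, the pattern triplets that hit a matching entry."""
    votes = defaultdict(int)
    for a, b, c in combinations(pattern, 3):
        labels = tuple(sorted((a[0], b[0], c[0])))
        key = triplet_key(a[1], b[1], c[1], bin_width)
        for gid, _, stored_labels in table.get(key, ()):
            if stored_labels == labels:
                votes[gid] += 1
    return dict(votes)
```

A graph that collects many votes is a candidate for containing the pattern; a full method would still need to verify that the matched triplets are mutually consistent under a single rigid transformation, and would need extra machinery to tolerate the node relabelings, deletions, and insertions the abstract allows.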