SAS Global Forum 2008
SAS Presents
Paper 320-2008

TWO-STAGE VARIABLE CLUSTERING FOR LARGE DATA SETS

Taiyeong Lee, David Duling, Song Liu, and Dominique Latour
SAS Institute Inc., Cary, NC

ABSTRACT

In data mining, principal component analysis is a popular dimension reduction technique. It also provides a good remedy for the multicollinearity problem, but the resulting components are hard to interpret in terms of the input space. To overcome the interpretation problem, principal components (cluster components) are obtained through variable clustering, which was implemented with PROC VARCLUS. The procedure uses oblique principal components analysis and binary iterative splits for variable clustering, and it provides non-orthogonal principal components. Although this procedure sacrifices orthogonality among the principal components, it provides readily interpretable principal components and well-explained cluster structures of variables. However, the PROC VARCLUS implementation is inefficient for high-dimensional data. We introduce the two-stage variable clustering technique for large data sets. This technique uses global clusters, sub-clusters, and their principal components.

INTRODUCTION

Dimension reduction is one of the most important data mining tasks for handling data sets with a very large number of variables. Some easy and common supervised dimension reduction tasks can be achieved through simple linear regression, that is, by using R-squares between dependent and independent variables, stepwise regression, and other variants of the regression method. These methods are also used as preprocessing steps for some novel dimension reduction techniques when the number of variables is extremely large.

Another popular method is an unsupervised technique that uses principal components analysis. This technique gives very successful dimension reduction results and remedies the multicollinearity problem. However, its components are hard to interpret in terms of the input space, and the eigenvalue calculation runs into computational problems when the dimension of the input space is very large.

To overcome those difficulties, we can use a method that combines supervised and unsupervised methods, for example, a simple variable selection that uses an R-square or a chi-square test against the target variable, then another dimension reduction technique