IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 23, NO. 1, JANUARY 2001

A New Method for Mining Regression Classes in Large Data Sets

Yee Leung, Jiang-Hong Ma, and Wen-Xiu Zhang

Abstract: Extracting patterns and models of interest from large databases is attracting much attention in a variety of disciplines. Knowledge discovery in databases (KDD) and data mining (DM) are areas of common interest to researchers in machine learning, pattern recognition, statistics, artificial intelligence, and high performance computing. An effective and robust method, coined regression-class mixture decomposition (RCMD) method, is proposed in this paper for the mining of regression classes in large data sets, especially those contaminated by noise. A new concept, called "regression class" which is defined as a subset of the data set that is subject to a regression model, is proposed as a basic building block on which the mining process is based. A large data set is treated as a mixture population in which there are many such regression classes and others not accounted for by the regression models. Iterative and genetic-based algorithms for the optimization of the objective function in the RCMD method are also constructed. It is demonstrated that the RCMD method can resist a very large proportion of noisy data, identify each regression class, assign an inlier set of data points supporting each identified regression class, and determine the a priori unknown number of statistically valid models in the data set. Although the models are extracted sequentially, the final result is almost independent of the extraction order due to a novel dynamic classification strategy employed in the handling of overlapping regression classes. The effectiveness and robustness of the RCMD method are substantiated by a set of simulation experiments and a real-life application showing the way it can be used to fit mixed data to linear regression classes and nonlinear structures in various situations.

Index Terms: Data mining, genetic algorithm, maximum likelihood method, mixture modeling, RCMD method, regression class, robustness.

1 INTRODUCTION

It is well-known that statistics is the art and science of extracting useful information and patterns of interest [...] such as computer vision, pattern recognition, remote sensing, marketing, and fi