Classification and Regression Trees

36-350, Data Mining
6 November 2009

Contents

1 Prediction Trees
2 Regression Trees
  2.1 Example: California Real Estate Again
  2.2 Regression Tree Fitting
    2.2.1 Cross-Validation and Pruning in R
  2.3 Uncertainty in Regression Trees
3 Classification Trees
  3.1 Measuring Information
  3.2 Making Predictions
  3.3 Measuring Error
    3.3.1 Misclassification Rate
    3.3.2 Average Loss
    3.3.3 Likelihood and Cross-Entropy
    3.3.4 Neyman-Pearson Approach
4 Further Reading
5 Exercises

Reading: Principles of Data Mining, sections 10.5 and 5.2 (in that order); Berk, chapter 3

Having built up increasingly complicated models for regression, I'll now switch gears and introduce a class of nonlinear predictive model which at first seems too simple to possibly work, namely prediction trees. These have two varieties, regression trees and classification trees.

1 Prediction Trees

The basic idea is very simple. We want to predict a response or class $Y$ from inputs $X_1, X_2, \ldots, X_p$. We do this by growing a binary tree. At each internal node in the tree, we apply a test to one of the inputs, say $X_i$. Depending on the outcome of the test, we go to either the left or the right sub-branch of the tree. Eventually we come to a leaf node, where we make a prediction. This prediction aggregates or averages all the training data points which reach that leaf. Figure 1 should help clarify this.

Why do this? Predictors like linear or polynomial regression are global models, where a single predictive formula is supposed to hold over the entire data space. When the data has lots of features which interact in complicated, nonlinear ways, assembling a single global model can be very difficult, and hopelessly confusing when you do succeed. Some of the non-parametric smoothers try to fit models locally and then paste them together, but again they can be hard to interpret. (Additive models are at least pretty easy to grasp.)

An alternative approach to nonlinear regression is to sub-divide, or partition, the space into smaller regions, where the interactions are more manageabl