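Tree-Based Models
Kwok-Leung Tsui
Systems Engineering & Engineering Management
City University of Hong Kong

Regression Models
1. Classical linear model
2. Generalized linear model
3. Additive model ($S_j$: smooth nonparametric function)
4. Generalized additive models

Classification and Regression Tree (CART)
Original data (Input): variables $x_1, x_2, \dots, x_n$ and response $Y$.
Algorithm:
• Split the data into two or more subsets based on the values of the variables.
• Continuously split each subset into finer subsets.
• Stop growing trees or prune trees.
Output:
[Figure: a tree whose root node tests "X1 > c1?"; its NO/YES branches lead to a terminal node and to an intermediate node that tests "X2 > c2?", whose NO/YES branches both lead to terminal nodes.]

Weather Data: Play or not Play?

Outlook    Temperature  Humidity  Windy  Play?
sunny      hot          high      false  No
sunny      hot          high      true   No
overcast   hot          high      false  Yes
rain       mild         high      false  Yes
rain       cool         normal    false  Yes
rain       cool         normal    true   No
overcast   cool         normal    true   Yes
sunny      mild         high      false  No
sunny      cool         normal    false  Yes
rain       mild         normal    false  Yes
sunny      mild         normal    true   Yes
overcast   mild         high      true   Yes
overcast   hot          normal    false  Yes
rain       mild         high      true   No

Example Tree for "Play?"

Outlook
  sunny → Humidity
    high → No
    normal → Yes
  overcast → Yes
  rain → Windy
    true → No
    false → Yes

Tree-Based Models
• Basic idea
  • Partition the feature space into a set of rectangles (for simplicity, by recursive binary partitioning).
  • Fit a simple model (e.g., a constant) in each one.
[Figure: two panels, "Binary Tree" and "Binary Partition". The tree splits at the points t1 to t4; the partition divides the (x1, x2) plane into regions R1 to R5 with fitted constants c1 to c5.]

Tree-Based Models
• Models
  • $c_m$: the regression model's predicted value for the region $R_m$

$$f(x) = \sum_{m=1}^{5} c_m I\{(x_1, x_2) \in R_m\} = c_1 I\{(x_1, x_2) \in R_1\} + c_2 I\{(x_1, x_2) \in R_2\} + c_3 I\{(x_1, x_2) \in R_3\} + c_4 I\{(x_1, x_2) \in R_4\} + c_5 I\{(x_1, x_2) \in R_5\}$$

• Two types
  • Regression trees
  • Classification trees
• Fundamental issues in tree-based models
  • How to decide the splitting point? (Tree growing)
  • How to control the size of the tree? (Tree pruning)

Regression Trees
• For each of $N$ observations, the input is $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})$ and the output is a continuous $y_i$.
• Partition the space into $M$ regions $R_1, R_2, \dots, R_M$:

$$f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)$$

• The best partition minimizes the sum of squared errors:

$$\sum_{i=1}^{N} \bigl(y_i - f(x_i)\bigr)^2, \qquad \hat{c}_m = \operatorname{average}(y_i \mid x_i \in R_m)$$

Classification Trees
• For regression trees, the impurity measure $Q_m(T)$ is

$$Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} (y_i - \hat{c}_m)^2, \quad \text{where } \hat{c}_m = \frac{1}{N_m} \sum_{x_i \in R_m} y_i$$

• For classification trees, the impurity measure $Q_m(T)$ is the misclassification error:

$$\frac{1}{N_m} \sum_{i \in R_m} I\bigl(y_i \neq k(m)\bigr) = 1 - p_{m k(m)}$$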
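
As a rough illustration of the two impurity measures, the sketch below evaluates them in Python and applies the misclassification error to the weather data. The function names (`squared_error_impurity`, `misclassification_error`, `weighted_error_after_split`) are made up for this example, and weighting each child node's error by its share of the observations is an assumption; the slides do not say how child impurities are combined.

```python
from collections import Counter

# Weather data from the "Play or not Play?" table:
# (outlook, temperature, humidity, windy, play)
WEATHER = [
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rain", "mild", "high", "false", "Yes"),
    ("rain", "cool", "normal", "false", "Yes"),
    ("rain", "cool", "normal", "true", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rain", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "true", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rain", "mild", "high", "true", "No"),
]
COLUMNS = ["Outlook", "Temperature", "Humidity", "Windy"]


def squared_error_impurity(ys):
    """Regression node impurity: mean squared deviation from the node mean."""
    c_hat = sum(ys) / len(ys)  # node mean, the fitted constant for the region
    return sum((y - c_hat) ** 2 for y in ys) / len(ys)


def misclassification_error(labels):
    """Classification node impurity: 1 minus the share of the majority class k(m)."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1 - majority_count / len(labels)


def weighted_error_after_split(rows, column):
    """Size-weighted misclassification error of the child nodes (assumed weighting)."""
    groups = {}
    for row in rows:
        # Group the class labels (last field) by the value of the split column.
        groups.setdefault(row[column], []).append(row[-1])
    n = len(rows)
    return sum(len(g) / n * misclassification_error(g) for g in groups.values())


if __name__ == "__main__":
    labels = [row[-1] for row in WEATHER]
    print("root node:", misclassification_error(labels))
    for index, name in enumerate(COLUMNS):
        print(name, weighted_error_after_split(WEATHER, index))
```

Under that weighting, splitting on Outlook or on Humidity lowers the misclassification error from 5/14 at the root to 4/14, while Temperature and Windy leave it at 5/14. Outlook is the variable at the root of the example tree.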