Paper SD-09

KNN Classification and Regression using SAS®
Liang Xie, The Travelers Companies, Inc.

ABSTRACT
K-Nearest Neighbor (KNN) classification and regression are two widely used analytic methods in the predictive modeling and data mining fields. They provide a way to model highly nonlinear decision boundaries and to fulfill many other analytical tasks, such as missing value imputation and local smoothing. In this paper, we discuss ways in SAS® to conduct KNN classification and KNN regression. Specifically, PROC DISCRIM is used to build multi-class KNN classification and PROC KRIGE2D is used for KNN regression tasks. Technical details such as tuning parameter selection are discussed. We also discuss tips and tricks in using these two procedures for KNN classification and regression. Examples are presented to demonstrate the full process flow in applying KNN classification and regression in real-world business projects.

INTRODUCTION
kNN stands for k Nearest Neighbor. In data mining and predictive modeling, it refers to a memory-based (or instance-based) algorithm for classification and regression problems. It is a widely used algorithm with many successful applications in medical research, business applications, etc. In fact, according to Google Analytics, the kNN post is the second most viewed article on my SAS programming blog of all time, with more than 2200 views a year.
In classification problems, the label of a potential object is determined by the labels of the closest training data points in the feature space. The determination process is through either "majority voting" or "averaging". In "majority voting", the object is assigned the label that is most frequent among the k closest training examples.
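As an illustration only, and not the SAS implementation this paper goes on to use, the majority-voting rule can be sketched in a few lines of Python. The function name `knn_majority_vote`, the choice of Euclidean distance, and the toy data are assumptions made for this sketch. Ties are broken in favor of the label encountered first among the nearest neighbors.

```python
from collections import Counter
import math


def knn_majority_vote(train_points, train_labels, query, k):
    """Return the most frequent label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k nearest neighbors, sorting on distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Count the labels among those neighbors and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two points of class "a" near the origin,
# three points of class "b" near (1, 1).
train_points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_labels = ["a", "a", "b", "b", "b"]
print(knn_majority_vote(train_points, train_labels, (0.2, 0.1), k=3))  # prints "a"
```

With k=3 the query near the origin has two "a" neighbors and one "b" neighbor, so it is labeled "a". Raising k to 5 uses every training point and flips the result to "b", which shows why k is treated as a tuning parameter.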
In "averaging", the object is not assigned a label but instead receives the ratio of each class among the k closest training data points. In regression problems, the property of the object is obtained via a similar "averaging" process, where the value of the object is the average value of the k closest training points. In practice, both the "majority voting" and "averaging" processes can be refined by adding weights to the k closest training points, where the weights are inversely proportional to the distance between the object and the training point. In this way, the closest points will have the biggest influence on the final results. In fact, the SAS implementa