Data Mining: Practical Machine Learning Tools and Techniques, 07 Transformation Techniques (PPT slides)
Data Mining: Practical Machine Learning Tools and Techniques
Slides for Chapter 7 of Data Mining by I. H. Witten, E. Frank and M. A. Hall

Data Mining: Practical Machine Learning Tools and Techniques (Chapter 7)

Data transformations
  - Attribute selection
      Scheme-independent, scheme-specific
  - Attribute discretization
      Unsupervised, supervised, error- vs entropy-based, converse of discretization
  - Projections
      Principal component analysis, random projections, partial least-squares, text, time series
  - Sampling
      Reservoir sampling
  - Dirty data
      Data cleansing, robust regression, anomaly detection
  - Transforming multiple classes to binary ones
      Simple approaches, error-correcting codes, ensembles of nested dichotomies
  - Calibrating class probabilities

Just apply a learner? NO!
  - Scheme/parameter selection
      Treat selection process as part of the learning process
  - Modifying the input
      Data engineering to make learning possible or easier
  - Modifying the output
      Re-calibrating probability estimates

Attribute selection
  - Adding a random (i.e. irrelevant) attribute can significantly degrade C4.5's performance
      Problem: attribute selection based on smaller and smaller amounts of data
  - IBL very susceptible to irrelevant attributes
      Number of training instances required increases exponentially with number of irrelevant attributes
  - Naïve Bayes doesn't have this problem
  - Relevant attributes can also be harmful

Scheme-independent attribute selection
  - Filter approach: assess based on general characteristics of the data
  - One method: find smallest subset of attributes that separates data
  - Another method: use different learning scheme
      e.g. use attributes selected by C4.5 and 1R, or coefficients of linear model, possibly applied recursively (recursive feature elimination)
  - IBL-based attribute weighting techniques
      can't find redundant attributes (but fix has been suggested)
  - Correlation-based Feature Selection (CFS):
      correlation between attributes measured by symmetric uncertainty: (formula not preserved in this extract)
      goodness of subset of attributes measured by (breaking ties in favor of smaller subsets): (formula not preserved in this extract)
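
The two CFS formulas are missing from the extracted slide text. The accompanying textbook gives symmetric uncertainty as U(A, B) = 2 [H(A) + H(B) - H(A, B)] / [H(A) + H(B)], a value between 0 and 1 where H is entropy, and the goodness of a subset as sum_j U(A_j, C) / sqrt(sum_i sum_j U(A_i, A_j)), where C is the class attribute. The sketch below computes both for nominal attributes. The function names, the toy data, the choice to return 0 when both attributes are constant, and the exhaustive search are illustrative assumptions, not part of the slides; practical CFS implementations usually search the subset space heuristically rather than exhaustively.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(a, b):
    """U(A, B) = 2 * [H(A) + H(B) - H(A, B)] / [H(A) + H(B)]."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0  # assumption: two constant attributes count as uncorrelated
    h_ab = entropy(list(zip(a, b)))  # joint entropy of the value pairs
    return 2.0 * (h_a + h_b - h_ab) / (h_a + h_b)


def cfs_merit(columns, class_values, subset):
    """sum_j U(A_j, C) / sqrt(sum_i sum_j U(A_i, A_j)) over attributes in subset."""
    relevance = sum(symmetric_uncertainty(columns[j], class_values) for j in subset)
    redundancy = sum(
        symmetric_uncertainty(columns[i], columns[j]) for i in subset for j in subset
    )
    return relevance / math.sqrt(redundancy) if redundancy > 0 else 0.0


def best_subset(columns, class_values):
    """Exhaustive search (illustration only); ties go to the smaller subset."""
    best, best_merit = (), 0.0
    for size in range(1, len(columns) + 1):  # smaller subsets are tried first
        for subset in combinations(range(len(columns)), size):
            merit = cfs_merit(columns, class_values, subset)
            if merit > best_merit:  # strict comparison keeps the earlier, smaller subset
                best, best_merit = subset, merit
    return best, best_merit


# Hypothetical toy data: attribute 0 predicts the class, attribute 1 duplicates
# attribute 0 (redundant), attribute 2 is unrelated to the class (irrelevant).
columns = [
    ["x", "x", "y", "y"],
    ["x", "x", "y", "y"],
    ["p", "q", "p", "q"],
]
class_values = [0, 0, 1, 1]

subset, merit = best_subset(columns, class_values)
print(subset, merit)
```

On this toy data the redundant copy adds nothing to the merit and the irrelevant attribute lowers it, so the single predictive attribute is selected.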
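
For the first filter method on the scheme-independent slide (find the smallest subset of attributes that separates the data), a brute-force sketch might look like the following. It assumes nominal attributes and treats a subset as "separating" when no two instances agree on every attribute in the subset yet carry different class labels. The function names and the toy weather-style data are hypothetical, and the exhaustive search is exponential in the number of attributes, so this is meant only to make the idea concrete.

```python
from itertools import combinations


def separates(rows, labels, subset):
    """True if instances that match on all attributes in subset share one class."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        # setdefault stores the first label seen for this key and returns it
        if seen.setdefault(key, label) != label:
            return False
    return True


def smallest_separating_subset(rows, labels):
    """Try subsets in order of increasing size; None if even all attributes fail."""
    n_attrs = len(rows[0])
    for size in range(n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            if separates(rows, labels, subset):
                return subset
    return None


# Hypothetical toy data: (outlook, temperature, humidity) -> play
rows = [
    ("sunny", "hot", "high"),
    ("sunny", "mild", "normal"),
    ("rainy", "hot", "high"),
    ("rainy", "mild", "normal"),
]
labels = ["no", "yes", "no", "yes"]

print(smallest_separating_subset(rows, labels))
```

Because the criterion demands perfect consistency with the training data, a single noisy instance can force extra attributes into the subset, or leave no separating subset at all.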