Discover Feature Engineering
How to Engineer Features and How to Get Good at It

Importance of Feature Engineering
● Better features mean flexibility.
● Better features mean simpler models.
● Better features mean better results.

What is Feature Engineering?
● Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

Sub-Problems of Feature Engineering
● Feature Importance (correlation, random forest)
– An estimate of the usefulness of a feature
● Feature Extraction (PCA)
– The automatic construction of new features from raw data
● Feature Selection (ranking score, wrapper, LASSO)
– From many features to a few that are useful
● Feature Construction
– The manual construction of new features from raw data
● Feature Learning
– The automatic identification and use of features in raw data

Iterative Process of Feature Engineering
● Brainstorm features
● Devise features
● Select features
● Evaluate models

General Examples of Feature Engineering
● Decompose Categorical Attributes
– "Item_Color" that can be Red, Blue or Unknown.
● Decompose a Date-Time
– 2014-09-20T20:45:40Z
● Reframe Numerical Quantities
– Num_Customer_Purchases can be reframed as Purchases_Summer, Purchases_Fall

Feature selection in sklearn
● Removing features with low variance
– VarianceThreshold
● Univariate feature selection
– Regression: p-values
– Classification: ANOVA F-value

Variable Ranking
● Correlation Criteria
– Pearson correlation coefficient
● Single Variable Classifiers
– ROC (x: FPR, y: TPR), AUC
● Information Theoretic Ranking Criteria
● Noisy (non-informative) features
● Applying univariate feature selection before the SVM increases the SVM weight attributed to the significant features

Limitations of variable ranking
● Can Presumably Redundant Variables Help Each Other?
● How Does Correlation Impact Variable Redundancy?
● Can a Variable that is Useless by Itself be Useful with Others?

Feature selection in sklearn
● Recursive feature elimination
– Start with all features; the features whose absolute weights are the smallest are pruned (SVC)
● L1-based feature selection
– Lasso (the higher the alpha, the fewer features)
– SVMs and logistic regression (the smaller the C, the fewer features)
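
The sklearn techniques named in the two "Feature selection in sklearn" lists (VarianceThreshold, univariate selection with the ANOVA F-value, recursive feature elimination with an SVC, and L1-based selection) might be combined as in the sketch below. This is not the slides' own code: the synthetic dataset, the appended constant column, the helper count_l1_selected, and the values k=5, C=0.05 and C=1.0 are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    RFE,
    SelectFromModel,
    SelectKBest,
    VarianceThreshold,
    f_classif,
)
from sklearn.svm import SVC, LinearSVC

# Synthetic data (an assumption for illustration): 20 features, 5 informative.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_redundant=0, random_state=0
)
# Append a constant column so there is a zero-variance feature to remove.
X_raw = np.hstack([X, np.ones((X.shape[0], 1))])

# 1. Remove features with low variance (the default threshold drops
#    features whose variance is exactly zero).
X_vt = VarianceThreshold().fit_transform(X_raw)

# 2. Univariate selection: keep the 5 features with the highest ANOVA F-value.
X_kbest = SelectKBest(score_func=f_classif, k=5).fit_transform(X_vt, y)

# 3. Recursive feature elimination: fit a linear SVC on all features, prune
#    the feature with the smallest weight magnitude, and repeat until 5 remain.
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5, step=1)
rfe.fit(X_vt, y)

# 4. L1-based selection: an L1-penalised linear SVM drives some weights to
#    zero. count_l1_selected is a hypothetical helper for this sketch.
def count_l1_selected(C):
    model = LinearSVC(C=C, penalty="l1", dual=False, max_iter=5000)
    selector = SelectFromModel(model).fit(X_vt, y)
    return int(selector.get_support().sum())

n_small_c = count_l1_selected(0.05)
n_large_c = count_l1_selected(1.0)

print("after VarianceThreshold:", X_vt.shape)
print("after SelectKBest:", X_kbest.shape)
print("RFE kept feature indices:", np.flatnonzero(rfe.support_))
print("L1 features kept (C=0.05, C=1.0):", n_small_c, n_large_c)
```

Comparing the two L1 counts on your own data is a quick way to check the slide's rule of thumb that a smaller C tends to keep fewer features.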
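
The question "Can a variable that is useless by itself be useful with others?" can be illustrated with an XOR-style target. The sketch below is one standard illustration and is not taken from the slides: the balanced binary design and the helper pearson are assumptions. Each feature alone has zero Pearson correlation with the label, so a correlation-based ranking would discard both, yet the two features together determine the label exactly.

```python
import numpy as np

# Balanced XOR design (an illustrative assumption): every combination of the
# two binary features appears equally often.
x1 = np.array([0, 0, 1, 1] * 25)
x2 = np.array([0, 1, 0, 1] * 25)
y = x1 ^ x2  # the label is 1 exactly when the two features differ

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Ranked individually, both features look useless.
corr_x1 = pearson(x1, y)
corr_x2 = pearson(x2, y)

# A feature constructed from both of them matches the label perfectly.
corr_joint = pearson((x1 != x2).astype(int), y)

print("x1 alone:", corr_x1)
print("x2 alone:", corr_x2)
print("x1 and x2 combined:", corr_joint)
```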
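
The three items under "General Examples of Feature Engineering" can be sketched in plain Python as follows. The function names, the output feature names other than Purchases_Summer and Purchases_Fall, and the month-to-season mapping (June to August as summer, September to November as fall) are assumptions made for illustration, not definitions from the slides.

```python
from datetime import datetime, timezone

def one_hot_color(value, categories=("Red", "Blue", "Unknown")):
    """Decompose a categorical value into one binary feature per category."""
    return {f"Item_Color_{c}": int(value == c) for c in categories}

def decompose_timestamp(ts):
    """Split an ISO 8601 UTC timestamp into simpler numeric features."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return {
        "hour": dt.hour,
        "weekday": dt.weekday(),  # Monday is 0, Sunday is 6
        "month": dt.month,
        "is_weekend": int(dt.weekday() >= 5),
    }

def purchases_by_season(purchase_months):
    """Reframe a total purchase count as per-season counts.

    The season boundaries are an assumption of this sketch.
    """
    summer = sum(1 for m in purchase_months if m in (6, 7, 8))
    fall = sum(1 for m in purchase_months if m in (9, 10, 11))
    return {"Purchases_Summer": summer, "Purchases_Fall": fall}

color_features = one_hot_color("Blue")
time_features = decompose_timestamp("2014-09-20T20:45:40Z")
season_features = purchases_by_season([6, 7, 9, 12])

print(color_features)
print(time_features)
print(season_features)
```

Which derived features are worth keeping depends on the problem: an hour-of-day feature helps only if the target actually varies with the time of day.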