NESUG 2006    Data Manipulation and Analysis

Variable Selection in the Credit Card Industry
Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal Bank of Scotland, Bridgeport, CT

ABSTRACT
The credit card industry is particular in its need for a wide variety of models and in the wealth of data collected on customers and prospects. We propose a methodology to select variables for predictive modeling purposes out of the plethora of data available using a combination of Oblique Component Analysis (PROC VARCLUS), Information Value (IV) and Weight Of Evidence (WOE) analysis, and business intelligence. Our tools enable us to quickly identify the most informative variables for logistic regression models.

INTRODUCTION
Data mining has become central to the financial services industry as the competition for consumers has intensified in recent years. As a result, a growing volume of data is collected on consumers. The three major bureaus (Equifax, TransUnion, and Experian) now offer thousands of variables that can be used for analytical purposes. For instance, Equifax provides over 1,200 credit and demographic attributes which can be used for various modeling and analytical projects. In addition, thanks to powerful data warehouses, financial institutions have managed to collect large amounts of data on customers and prospects which can be used for various purposes (direct marketing, retention, fraud, risk management, customer segmentation, revenue and profit forecasts, etc.).

Such a wealth of data can be problematic, as modelers need to sift through all these variables. It is thus important to develop mechanisms and processes which help analysts and modelers navigate through the maze of data and identify a smaller set of variables. Models can be built in several different ways, but the model development process has several common major phases, including variable reduction and transformation and model development, as highlighted in Figure 1.

[Figure 1. Typical phases of model development process, in order: define problem / launch project; pull / sample data; clean data; recode / transform variables; reduce variables; build / validate model; report / assess results; production code.]

In this paper, we are concerned with Step 5, that is reducing the number of variables to a smaller manageable set that the analyst or modeler ca