Logistic regression (with R)
Christopher Manning
4 November 2007

1 Theory

We can transform the output of a linear regression to be suitable for probabilities by using a logit link function on the lhs as follows:

$$\operatorname{logit} p = \log o = \log \frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \tag{1}$$

The odds can vary on a scale of $(0, \infty)$, so the log odds can vary on the scale of $(-\infty, \infty)$, which is precisely what we get from the rhs of the linear model. For a real-valued explanatory variable $x_i$, the intuition here is that a unit additive change in the value of the variable should change the odds by a constant multiplicative amount.

Exponentiating, this is equivalent to: [1]

$$e^{\operatorname{logit} p} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k} \tag{2}$$

$$o = \frac{p}{1-p} = e^{\beta_0} e^{\beta_1 x_1} e^{\beta_2 x_2} \cdots e^{\beta_k x_k} \tag{3}$$

The inverse of the logit function is the logistic function. If $\operatorname{logit}(\pi) = z$, then

$$\pi = \frac{e^z}{1 + e^z}$$

The logistic function will map any value of the right hand side ($z$) to a proportion value between 0 and 1, as shown in figure 1.

Note a common case with categorical data: if our explanatory variables $x_i$ are all binary, then for the ones that are false (0), we get $e^0 = 1$ and the term disappears. Similarly, if $x_i = 1$, $e^{\beta_i x_i} = e^{\beta_i}$. So we are left with terms for only the $x_i$ that are true (1). For instance, if $x_3, x_4, x_7 = 1$ only, we have:

$$\operatorname{logit} p = \beta_0 + \beta_3 + \beta_4 + \beta_7 \tag{4}$$

$$o = e^{\beta_0} e^{\beta_3} e^{\beta_4} e^{\beta_7} \tag{5}$$

The intuition here is that if I know that a certain fact is true of a data point, then that will produce a constant change in the odds of the outcome ("If he's European, that doubles the odds that he smokes").

Let $L = L(D; B)$ be the likelihood of the data $D$ given the model, where $B = \{\beta_0, \ldots, \beta_k\}$ are the parameters of the model. The parameters are estimated by the principle of maximum likelihood. Technical point: there is no error term in a logistic regression, unlike in linear regressions.

[1] Note that we can convert freely between a probability $p$ and odds $o$ for an event versus its complement:

$$o = \frac{p}{1-p} \qquad p = \frac{o}{o+1}$$

[Figure 1: The logistic function, plotted for inputs from -6 to 6 with values between 0 and 1]

2 Basic R logistic regression models

We will illustrate with the Cedegren dataset on the website.

    cedegren <- read.table("cedegren.txt", header=T)

You need to create a two-column matrix of success/failure counts for your response variable. You cannot just use percentages. (You can give percentages but then weight them by a count of success + failures.)

    attach(cedegren)
    ced.del <- cbind(sDel, sNoDel)

Make the logistic regression model. The short