reilly media] r cookbook useful tricks

ID:7294771

Size: 480.62 KB

Pages: 22

Date: 2018-02-10

Uploader: U-5649
CHAPTER 12
Useful Tricks

Introduction

The recipes in this chapter are neither obscure numerical calculations nor deep statistical techniques. Yet they are useful functions and idioms that you will likely need at one time or another.

12.1 Peeking at Your Data

Problem

You have a lot of data, too much to display at once. Nonetheless, you want to see some of the data.

Solution

Use head to view the first few elements or rows:

    > head(x)

Use tail to view the last few elements or rows:

    > tail(x)

Discussion

Printing a large data set is pointless because everything just rolls off your screen. Use head to see a little bit of the data:

    > head(dfrm)
               x           y          z
    1  0.7533110  0.57562846 -0.1710760
    2  2.0143547  0.83312274  0.3698584
    3 -0.3551345  0.57471542  2.0132348
    4  2.0281678  0.78945319 -0.5378854
    5 -2.2168745  0.01758024  1.8344879
    6  0.7583962 -1.78214755  2.2848990

Use tail to see the last few rows and the number of rows. Here, we see that this data frame has 10,120 rows:

    > tail(dfrm)
                   x           y          z
    10115 -0.0314354 -0.74988291 -0.2048963
    10116 -0.4779001  0.93407510  1.0509977
    10117 -1.1314402  1.89308417  1.7207972
    10118  0.4891881 -1.20792811 -1.4630227
    10119  1.2349013 -0.09615198 -0.9887513
    10120 -1.3763834 -2.25309628  0.9296106

See Also

See Recipe 12.15 for seeing the structure of your variable's contents.

12.2 Widen Your Output

Problem

You are printing wide data sets. R is wrapping the output, making it hard to read.

Solution

Set the width option to reflect the true number of columns in your output window:

    > options(width=numcols)

Discussion

By default, R formats your output to be 80 columns wide. It's frustrating when you print a wide data set and R wraps the output lines according to its assumed limitation of 80 columns. Your expensive, wide-screen display can probably accommodate many more columns than that.

If your display is 120 characters wide, for example, then ask R to use the entire 120 columns by setting the width option:

    > options(width=120)

Now R will wrap its lines after 120 characters, not 80, letting you comfortably view wider data sets.

See Also

See Recipe 3.16 for more about setting options.

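The two recipes above can be tried together on made-up data. The following is only a sketch: the data frame dfrm is synthetic (generated with rnorm purely for illustration), and the n argument of head and tail is used to ask for three rows rather than the default six. It also shows one way to restore the previous width afterwards, relying on the fact that options returns the old settings.

```r
# Synthetic data frame, for illustration only (not the book's data)
set.seed(42)
dfrm <- data.frame(x = rnorm(10120), y = rnorm(10120), z = rnorm(10120))

first3 <- head(dfrm, n = 3)   # first three rows instead of the default six
last3  <- tail(dfrm, n = 3)   # last three rows; row names reveal the row count

# Widen the output temporarily, print, then restore the earlier setting
old <- options(width = 120)   # options() returns the previous values
print(first3)
print(last3)
options(old)
```

Saving and restoring the old value is a design choice rather than a requirement; in an interactive session you may simply want to leave the wider setting in place.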
12.3 Printing the Result of an Assignment

Problem

You are assigning a value to a variable and you want to see its value.

Solution

Simply put parentheses around the assignment:

    > x <- 1/pi      # Prints nothing
    > (x <- 1/pi)    # Prints assigned value
    [1] 0.3183099

Discussion

Normally, R inhibits printing when it sees you enter a simple assignment. When you surround the assignment with parentheses, however, it is no longer a simple assignment and so R prints the value.

See Also

See Recipe 2.1 for more ways to print things.

12.4 Summing Rows and Columns

Problem

You want to sum the rows or columns of a matrix or data frame.

Solution

Use rowSums to sum the rows:

    > rowSums(m)

Use colSums to sum the columns:

    > colSums(m)

Discussion

This is a mundane recipe, but it's so common that it deserves mentioning. I use this recipe, for example, when producing reports that include column totals. In this example, daily.prod is a record of this week's factory production and we want totals by product and by day:

    > daily.prod
        Widgets Gadgets Thingys
    Mon     179     167     182
    Tue     153     193     166
    Wed     183     190     170
    Thu     153     161     171
    Fri     154     181     186
    > colSums(daily.prod)
    Widgets Gadgets Thingys
        822     892     875
    > rowSums(daily.prod)
    Mon Tue Wed Thu Fri
    528 512 543 485 521

These functions return a vector. In the case of column sums, we can append the vector to the matrix and thereby neatly print the data and totals together:

    > rbind(daily.prod, Totals=colSums(daily.prod))
           Widgets Gadgets Thingys
    Mon        179     167     182
    Tue        153     193     166
    Wed        183     190     170
    Thu        153     161     171
    Fri        154     181     186
    Totals     822     892     875

12.5 Printing Data in Columns

Problem

You have several parallel data vectors, and you want to print them in columns.

Solution

Use cbind to form the data into columns; then print the result.

Discussion

When you have parallel vectors, it's difficult to see their relationship if you print them separately:

    > print(x)
     [1] -0.3142317  2.4033751 -0.7182640 -1.7606110 -1.1252812 -0.7195406  1.3102240
     [8]  0.4518985  0.1521774  0.6533708
    > print(y)
     [1] -0.9473355 -1.0710863 -0.1760594  1.8720502 -0.5103384  1.4388904  1.1531710
     [8] -0.1548398 -0.2599055  0.4713668

Use the cbind function to form them into columns that, when printed, show the data's structure:

    > print(cbind(x,y))
                   x          y
     [1,] -0.3142317 -0.9473355
     [2,]  2.4033751 -1.0710863
     [3,] -0.7182640 -0.1760594
     [4,] -1.7606110  1.8720502
     [5,] -1.1252812 -0.5103384
     [6,] -0.7195406  1.4388904
     [7,]  1.3102240  1.1531710
     [8,]  0.4518985 -0.1548398
     [9,]  0.1521774 -0.2599055
    [10,]  0.6533708  0.4713668

You can include expressions in the output, too. Use a tag to give them a column heading:

    > print(cbind(x,y,Total=x+y))
                   x          y      Total
     [1,] -0.3142317 -0.9473355 -1.2615672
     [2,]  2.4033751 -1.0710863  1.3322888
     [3,] -0.7182640 -0.1760594 -0.8943233
     [4,] -1.7606110  1.8720502  0.1114392
     [5,] -1.1252812 -0.5103384 -1.6356196
     [6,] -0.7195406  1.4388904  0.7193499
     [7,]  1.3102240  1.1531710  2.4633949
     [8,]  0.4518985 -0.1548398  0.2970587
     [9,]  0.1521774 -0.2599055 -0.1077281
    [10,]  0.6533708  0.4713668  1.1247376

12.6 Binning Your Data

Problem

You have a vector, and you want to split the data into groups according to intervals. Statisticians call this binning your data.

Solution

Use the cut function. You must define a vector, say breaks, which gives the ranges of the intervals. The cut function will group your data according to those intervals. It returns a factor whose levels (elements) identify each datum's group:

    > f <- cut(x, breaks)

Discussion

This example generates 1,000 random numbers that have a standard Normal distribution. It breaks them into six groups by defining intervals at ±1, ±2, and ±3 standard deviations:

    > x <- rnorm(1000)
    > breaks <- c(-3,-2,-1,0,1,2,3)
    > f <- cut(x, breaks)

The result is a factor, f, that identifies the groups. The summary function shows the number of elements by level. R creates names for each level, using the mathematical notation for an interval:

    > summary(f)
    (-3,-2] (-2,-1]  (-1,0]   (0,1]   (1,2]   (2,3]    NA's
         18     119     348     355     137      21       2

The results are bell-shaped, which is what we expect. There are two NA values, indicating that two values in x fell outside the defined intervals.

We can use the labels parameter to give nice, predefined names to the six groups instead of the funky synthesized names:

    > f <- cut(x, breaks, labels=c("Bottom", "Low", "Neg", "Pos", "High", "Top"))

Now the summary function uses our names:

    > summary(f)
    Bottom    Low    Neg    Pos   High    Top   NA's
        18    119    348    355    137     21      2

Don't Throw Away That Information

Binning is useful for summaries such as histograms. But it results in information loss, which can be harmful in modeling. Consider the extreme case of binning a continuous variable into two values, "high" and "low". The binned data has only two possible values, so you have replaced a rich source of information with one bit of information. Where the continuous variable might be a powerful predictor, the binned variable can distinguish at most two states and so will likely have only a fraction of the original power. Before binning, I suggest exploring other transformations that are less lossy.

12.7 Finding the Position of a Particular Value

Problem

You have a vector. You know a particular value occurs in the contents, and you want to know its position.

Solution

The match function will search a vector for a particular value and return the position:

    > vec <- c(100,90,80,70,60,50,40,30,20,10)
    > match(80, vec)
    [1] 3

Here match returns 3, which is the position of 80 within vec.

Discussion

There are special functions for finding the location of the minimum and maximum values: which.min and which.max, respectively:

    > vec <- c(100,90,80,70,60,50,40,30,20,10)
    > which.min(vec)    # Position of smallest element
    [1] 10
    > which.max(vec)    # Position of largest element
    [1] 1

See Also

This technique is used in Recipe 11.12.

12.8 Selecting Every nth Element of a Vector

Problem

You want to select every nth element of a vector.

Solution

Create a logical indexing vector that is TRUE for every nth element. One approach is to find all subscripts that equal zero when taken modulo n:

    > v[ seq_along(v) %% n == 0 ]

Discussion

This problem arises in systematic sampling: we want to sample a data set by selecting every nth element. The seq_along(v) function generates the sequence of integers that can index v; it is equivalent to 1:length(v). We compute each index value modulo n by the expression:

    seq_along(v) %% n

Then we find those values that equal zero:

    seq_along(v) %% n == 0

The result is a logical vector, the same length as v and with TRUE at every nth element, that can index v to select the desired elements:

    v[ seq_along(v) %% n == 0 ]

If you just want something simple like every second element, you can use the Recycling Rule in a clever way. Index v with a two-element logical vector, like this:

    v[ c(FALSE, TRUE) ]

If v has more than two elements then the indexing vector is too short. Hence R will invoke the Recycling Rule and expand the index vector to the length of v, recycling its contents. That gives an index vector that is FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, and so forth. Voilà! The final result is every second element of v.

See Also

See Recipe 5.3 for more about the Recycling Rule.

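As a quick check of Recipe 12.8, the sketch below applies the modulo idiom to a small vector and compares it with a second route that builds the subscripts directly with seq. The seq variant is not from the recipe; it is an alternative shown here on the assumption that v has at least n elements. The values of v and n are arbitrary.

```r
v <- 1:20   # arbitrary example vector
n <- 5      # select every 5th element

# The recipe's approach: logical index that is TRUE at every nth position
by_mod <- v[seq_along(v) %% n == 0]

# Alternative (assumes length(v) >= n): generate the subscripts n, 2n, 3n, ...
by_seq <- v[seq(n, length(v), by = n)]

# Recycling a two-element logical vector picks every second element
every_second <- v[c(FALSE, TRUE)]

print(by_mod)         # 5 10 15 20
print(every_second)   # 2 4 6 8 10 12 14 16 18 20
```

The modulo form works unchanged on an empty or very short vector, which is one reason to prefer it when the length of v is not known in advance.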
12.9 Finding Pairwise Minimums or Maximums

Problem

You have two vectors, v and w, and you want to find the minimums or the maximums of pairwise elements. That is, you want to calculate:

    min(v1, w1), min(v2, w2), min(v3, w3), ...

or:

    max(v1, w1), max(v2, w2), max(v3, w3), ...

Solution

R calls these the parallel minimum and the parallel maximum. The calculation is performed by pmin(v,w) and pmax(v,w), respectively:

    > pmin(1:5, 5:1)    # Find the element-by-element minimum
    [1] 1 2 3 2 1
    > pmax(1:5, 5:1)    # Find the element-by-element maximum
    [1] 5 4 3 4 5

Discussion

When an R beginner wants pairwise minimums or maximums, a common mistake is to write min(v,w) or max(v,w). Those are not pairwise operations: min(v,w) returns a single value, the minimum over all of v and w. Likewise, max(v,w) returns a single value from all of v and w.

The pmin and pmax functions compare their arguments in parallel, picking the minimum or maximum for each subscript. They return a vector that matches the length of the inputs.

You can combine pmin and pmax with the Recycling Rule to perform useful hacks. Suppose the vector v contains both positive and negative values, and you want to reset the negative values to zero. This does the trick:

    > v <- pmax(v, 0)

By the Recycling Rule, R expands the zero-valued scalar into a vector of zeros that is the same length as v. Then pmax does an element-by-element comparison, taking the larger of zero and each element of v:

    > v <- c(1, -2, 3, -4, 5, -6, 7)
    > pmax(v, 0)
    [1] 1 0 3 0 5 0 7

Actually, pmin and pmax are more powerful than the Solution indicates. They can take more than two vectors, comparing all vectors in parallel.

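The closing remark of the Discussion, that pmin and pmax accept more than two vectors, can be sketched as follows. The vectors a, b, and d are made up for illustration. The last lines combine the two functions to clamp values into a range, an extension of the reset-to-zero trick that is not in the recipe itself.

```r
# Three arbitrary vectors of equal length
a <- c(1, 9, 3)
b <- c(4, 2, 8)
d <- c(7, 5, 0)

lowest  <- pmin(a, b, d)   # element-by-element minimum across all three
highest <- pmax(a, b, d)   # element-by-element maximum across all three
print(lowest)    # 1 2 0
print(highest)   # 7 9 8

# Clamp values into the range [0, 10]; the scalars are recycled
v <- c(-5, 2, 12, 7)
clamped <- pmin(pmax(v, 0), 10)
print(clamped)   # 0 2 10 7
```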
See Also

See Recipe 5.3 for more about the Recycling Rule.

12.10 Generating All Combinations of Several Factors

Problem

You have two or more factors. You want to generate all combinations of their levels, also known as their Cartesian product.

Solution

Use the expand.grid function. Here, f and g are two factors:

    > expand.grid(f, g)

Discussion

This code snippet creates two factors: sides represents the two sides of a coin, and faces represents the six faces of a die (those little spots on a die are called pips):

    > sides <- factor(c("Heads", "Tails"))
    > faces <- factor(c("1 pip", paste(2:6, "pips")))

We can use expand.grid to find all combinations of one roll of the die and one coin toss:

    > expand.grid(faces, sides)
         Var1  Var2
    1   1 pip Heads
    2  2 pips Heads
    3  3 pips Heads
    4  4 pips Heads
    5  5 pips Heads
    6  6 pips Heads
    7   1 pip Tails
    8  2 pips Tails
    9  3 pips Tails
    10 4 pips Tails
    11 5 pips Tails
    12 6 pips Tails

Similarly, we can find all combinations of two dice:

    > expand.grid(faces, faces)
         Var1   Var2
    1   1 pip  1 pip
    2  2 pips  1 pip
    3  3 pips  1 pip
    4  4 pips  1 pip
    5  5 pips  1 pip
    6  6 pips  1 pip
    7   1 pip 2 pips
    8  2 pips 2 pips
    .
    . (etc.)
    .
    34 4 pips 6 pips
    35 5 pips 6 pips
    36 6 pips 6 pips

The result is a data frame. R automatically provides the row names and column names.

The Solution and the example show the Cartesian product of two factors, but expand.grid can handle three or more factors, too.

See Also

If you're working with strings and not factors, then you can also use Recipe 7.7 to generate combinations.

12.11 Flatten a Data Frame

Problem

You have a data frame of numeric values. You want to process all its elements together, not as separate columns: for example, to find the mean across all values.

Solution

Convert the data frame to a matrix and then process the matrix. This example finds the mean of all elements in the data frame dfrm:

    > mean(as.matrix(dfrm))

It is sometimes necessary then to convert the matrix to a vector. In that case, use as.vector(as.matrix(dfrm)).

Discussion

Suppose we have a data frame, such as the factory production data from Recipe 12.4:

    > daily.prod
        Widgets Gadgets Thingys
    Mon     179     167     182
    Tue     153     193     166
    Wed     183     190     170
    Thu     153     161     171
    Fri     154     181     186

Suppose also that we want the average daily production across all days and products. This won't work:

    > mean(daily.prod)
    Widgets Gadgets Thingys
      164.4   178.4   175.0

The mean function thinks it should average individual columns, which is usually what you want. (Recent versions of R no longer average the columns here; they return NA with a warning instead. Either way, the call does not produce the overall mean.) When you want the average across all values, first collapse the data frame down to a matrix:

    > mean(as.matrix(daily.prod))
    [1] 172.6

This recipe works only on data frames with all-numeric data. Recall that converting a data frame with mixed data (numeric columns mixed with character columns or factors) into a matrix forces all columns to be converted to characters.

See Also

See Recipe 5.33 for more about converting between data types.

12.12 Sorting a Data Frame

Problem

You have a data frame. You want to sort the contents, using one column as the sort key.

Solution

Use the order function on the sort key, and then rearrange the data frame rows according to that ordering:

    > dfrm <- dfrm[order(dfrm$key),]

Here dfrm is a data frame and dfrm$key is the sort-key column.

Discussion

The sort function is great for vectors but is ineffective for data frames. Sorting a data frame requires an extra step. Suppose we have the following data frame and we want to sort by month:

    > print(dfrm)
      month day outcome
    1     7  11     Win
    2     8  10    Lose
    3     8  25     Tie
    4     6  27     Tie
    5     7  22     Win

The order function tells us how to rearrange the months into ascending order:

    > order(dfrm$month)
    [1] 4 1 5 2 3

It does not return data values; rather, it returns the subscripts of dfrm$month that would sort it. We use those subscripts to rearrange the entire data frame:

    > dfrm[order(dfrm$month),]
      month day outcome
    4     6  27     Tie
    1     7  11     Win
    5     7  22     Win
    2     8  10    Lose
    3     8  25     Tie

After rearranging the data frame, the month column is in ascending order, just as we wanted.

12.13 Sorting by Two Columns

Problem

You want to sort the contents of a data frame, using two columns as sort keys.

Solution

Sorting by two columns is similar to sorting by one column. The order function accepts multiple arguments, so we can give it two sort keys:

    > dfrm <- dfrm[order(dfrm$key1,dfrm$key2),]

Discussion

In Recipe 12.12 we used the order function to rearrange data frame rows. We can give a second sort key to order, and it will use the additional key to break ties in the first key.

Continuing our example from Recipe 12.12, we can sort on both month and day by giving both those columns to order:

    > dfrm[order(dfrm$month,dfrm$day),]
      month day outcome
    4     6  27     Tie
    1     7  11     Win
    5     7  22     Win
    2     8  10    Lose
    3     8  25     Tie

Within months 7 and 8, the days are now sorted into ascending order.

See Also

See Recipe 12.12.

12.14 Stripping Attributes from a Variable

Problem

A variable is carrying around old attributes. You want to remove some or all of them.

Solution

To remove all attributes, assign NULL to the variable's attributes property:

    > attributes(x) <- NULL

To remove a single attribute, select the attribute using the attr function, and set it to NULL:

    > attr(x, "attributeName") <- NULL

Discussion

Any variable in R can have attributes. An attribute is simply a name/value pair, and the variable can have many of them. A common example is the dimensions of a matrix variable, which are stored in an attribute. The attribute name is dim and the attribute value is a two-element vector giving the number of rows and columns.

You can view the attributes of x by printing attributes(x) or str(x).

Sometimes you want just a number and R insists on giving it attributes. This can happen when you fit a simple linear model and extract the slope, which is the second regression coefficient:

    > m <- lm(resp ~ pred)
    > slope <- coef(m)[2]
    > slope
        pred
    1.011960

When we print slope, R also prints "pred". That is a name attribute given by lm to the coefficient (because it's the coefficient for the pred variable). We can see that more clearly by printing the internals of slope, which reveals a "names" attribute:

    > str(slope)
     Named num 1.01
     - attr(*, "names")= chr "pred"

Stripping off all the attributes is easy, after which the slope value becomes simply a number:

    > attributes(slope) <- NULL    # Strip off all attributes
    > str(slope)                   # Now the "names" attribute is gone
     num 1.01
    > slope                        # And the number prints cleanly without a label
    [1] 1.011960

Alternatively, we could have stripped the single offending attribute this way:

    > attr(slope, "names") <- NULL

Remember that a matrix is a vector (or list) with a dim attribute. If you strip all the attributes from a matrix, that will strip away the dimensions and thereby turn it into a mere vector (or list). Furthermore, stripping the attributes from an object (specifically, an S3 object) can render it useless. So remove attributes with care.

See Also

See Recipe 12.15 for more about seeing attributes.

12.15 Revealing the Structure of an Object

Problem

You called a function that returned something. Now you want to look inside that something and learn more about it.

Solution

Use class to determine the thing's object class:

    > class(x)

Use mode to strip away the object-oriented features and reveal the underlying structure:

    > mode(x)

Use str to show the internal structure and contents:

    > str(x)

Discussion

I am amazed how often I call a function, get something back, and wonder: "What the heck is this thing?" Theoretically, the function documentation should explain the returned value, but somehow I feel better when I can see its structure and contents myself. This is especially true for objects with a nested structure: objects within objects.

Let's dissect the value returned by lm (the linear modeling function) in the simplest linear regression recipe, Recipe 11.1:

    > m <- lm(y ~ x)
    > print(m)

    Call:
    lm(formula = y ~ x)

    Coefficients:
    (Intercept)            x
          17.72         3.25

Always start by checking the thing's class. The class indicates if it's a vector, matrix, list, data frame, or object:

    > class(m)
    [1] "lm"

Hmmm. It seems that m is an object of class lm. That doesn't mean anything to me. I know, however, that all object classes are built upon the native data structures (vector, matrix, list, or data frame), so I use mode to strip away the object facade and reveal the underlying structure:

    > mode(m)
    [1] "list"

Ah-ha! It seems that m is built on a list structure. Now I can use list functions and operators to dig into its contents. First, I want to know the names of its list elements:

    > names(m)
     [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values"
     [6] "assign"        "qr"            "df.residual"   "xlevels"       "call"
    [11] "terms"         "model"

The first list element is called "coefficients". I'm guessing those are the regression coefficients. Let's have a look:

    > m$coefficients
    (Intercept)           x
      17.722623    3.249677

Yes, that's what they are. I recognize those values.

I could continue digging into the list structure of m, but that would get tedious. The str function does a good job of revealing the internal structure of any variable. The dump of m is shown in Example 12-1.

Example 12-1. Dumping the structure of a variable

    > str(m)
    List of 12
     $ coefficients : Named num [1:2] 17.72 3.25
      ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
     $ residuals    : Named num [1:30] -12.471 -3.983 -2.899 0.192 14.302 ...
      ..- attr(*, "names")= chr [1:30] "1" "2" "3" "4" ...
     $ effects      : Named num [1:30] -372.443 155.463 -0.614 2.418 16.552 ...
      ..- attr(*, "names")= chr [1:30] "(Intercept)" "x" "" "" ...
     $ rank         : int 2
     $ fitted.values: Named num [1:30] 17.9 23.9 26.8 32.2 30 ...
      ..- attr(*, "names")= chr [1:30] "1" "2" "3" "4" ...
     $ assign       : int [1:2] 0 1
     $ qr           :List of 5
      ..$ qr   : num [1:30, 1:2] -5.477 0.183 0.183 0.183 0.183 ...
      .. ..- attr(*, "dimnames")=List of 2
      .. .. ..$ : chr [1:30] "1" "2" "3" "4" ...
      .. .. ..$ : chr [1:2] "(Intercept)" "x"
      .. ..- attr(*, "assign")= int [1:2] 0 1
      ..$ qraux: num [1:2] 1.18 1.23
      ..$ pivot: int [1:2] 1 2
      ..$ tol  : num 1e-07
      ..$ rank : int 2
      ..- attr(*, "class")= chr "qr"
     $ df.residual  : int 28
     $ xlevels      : list()
     $ call         : language lm(formula = y ~ x)
     $ terms        :Classes 'terms', 'formula' length 3 y ~ x
      .. ..- attr(*, "variables")= language list(y, x)
      .. ..- attr(*, "factors")= int [1:2, 1] 0 1
      .. .. ..- attr(*, "dimnames")=List of 2
      .. .. .. ..$ : chr [1:2] "y" "x"
      .. .. .. ..$ : chr "x"
      .. ..- attr(*, "term.labels")= chr "x"
      .. ..- attr(*, "order")= int 1
      .. ..- attr(*, "intercept")= int 1
      .. ..- attr(*, "response")= int 1
      .. ..- attr(*, ".Environment")=
      .. ..- attr(*, "predvars")= language list(y, x)
      .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
      .. .. ..- attr(*, "names")= chr [1:2] "y" "x"
     $ model        :'data.frame': 30 obs. of 2 variables:
      ..$ y: num [1:30] 5.41 19.94 23.92 32.43 44.26 ...
      ..$ x: num [1:30] 0.0478 1.9086 2.7999 4.4676 3.7649 ...
      ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 y ~ x
      .. .. ..- attr(*, "variables")= language list(y, x)
      .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
      .. .. .. ..- attr(*, "dimnames")=List of 2
      .. .. .. .. ..$ : chr [1:2] "y" "x"
      .. .. .. .. ..$ : chr "x"
      .. .. ..- attr(*, "term.labels")= chr "x"
      .. .. ..- attr(*, "order")= int 1
      .. .. ..- attr(*, "intercept")= int 1
      .. .. ..- attr(*, "response")= int 1
      .. .. ..- attr(*, ".Environment")=
      .. .. ..- attr(*, "predvars")= language list(y, x)
      .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
      .. .. .. ..- attr(*, "names")= chr [1:2] "y" "x"
     - attr(*, "class")= chr "lm"

Notice that str shows all the elements of m and then recursively dumps each element's contents and attributes. Long vectors and lists are truncated to keep the output manageable.

There is an art to exploring an R object. Use class, mode, and str to dig through the layers.

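To try the class, mode, and str sequence yourself, here is a small sketch. The data are synthetic (the book's x and y are not available, so they are simulated here and the numbers will differ from Example 12-1). The max.level argument of str is used to show only the top layer of the list, which is one way to keep the dump short.

```r
# Synthetic regression data, for illustration only
set.seed(1)
x <- 1:30
y <- 3 * x + rnorm(30)
m <- lm(y ~ x)

obj_class <- class(m)   # the object class: "lm"
obj_mode  <- mode(m)    # the underlying structure: "list"
elements  <- names(m)   # names of the list elements

print(obj_class)
print(obj_mode)
print(elements)

# Show only the top level of the structure instead of the full recursive dump
str(m, max.level = 1)
```

From there, individual elements such as m$coefficients can be inspected one at a time with the same three functions.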
12.16 Timing Your Code

Problem

You want to know how much time is required to run your code. This is useful, for example, when you are optimizing your code and need "before" and "after" numbers to measure the improvement.

Solution

The system.time function will execute your code and then report the execution time:

    > system.time(aLongRunningExpression)

The output is three to five numbers, depending on your platform. The first three are usually the most important:

User CPU time
    Number of CPU seconds spent running R

System CPU time
    Number of CPU seconds spent running the operating system, typically for input/output

Elapsed time
    Number of seconds on the clock

Discussion

Suppose we want to know the time required to generate 10,000,000 random normal variates and sum them:

    > system.time(sum(rnorm(10000000)))
    [1] 3.06 0.01 3.14   NA   NA

The output from system.time shows that R used 3.06 seconds of CPU time; the operating system used 0.01 seconds of CPU time; and 3.14 seconds elapsed during the test. (The NA values are subprocess times; in this example, there were no subprocesses.)

The third value, elapsed time, is not simply the sum of User CPU time and System CPU time. If your computer is busy running other processes, then more clock time will elapse while R is sharing the CPU with those other processes.

12.17 Suppressing Warnings and Error Messages

Problem

A function is producing annoying error messages or warning messages. You don't want to see them.

Solution

Surround the function call with suppressMessages(...) or suppressWarnings(...):

    > suppressMessages(annoyingFunction())
    > suppressWarnings(annoyingFunction())

Discussion

I use the adf.test function quite a bit. However, it produces an annoying warning message, shown here at the bottom of the output, when the p-value is below 0.01:

    > library(tseries)
    > adf.test(x)

            Augmented Dickey-Fuller Test

    data:  x
    Dickey-Fuller = -4.3188, Lag order = 4, p-value = 0.01
    alternative hypothesis: stationary

    Warning message:
    In adf.test(x) : p-value smaller than printed p-value

Fortunately, I can muzzle the function by calling it inside suppressWarnings(...):

    > suppressWarnings(adf.test(x))

            Augmented Dickey-Fuller Test

    data:  x
    Dickey-Fuller = -4.3188, Lag order = 4, p-value = 0.01
    alternative hypothesis: stationary

Notice that the warning message disappeared. The message is not entirely lost because R retains it internally. I can retrieve the message at my leisure by using the warnings function:

    > warnings()
    Warning message:
    In adf.test(x) : p-value smaller than printed p-value

Some functions also produce "messages" (in R terminology), which are even more benign than warnings. Typically, they are merely informative and not signals of problems. If such a message is annoying you, call the function inside suppressMessages(...), and the message will disappear.

See Also

See the options function for other ways to control the reporting of errors and warnings.

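The difference between a message and a warning is easy to see with a toy function. The function noisy below is hypothetical, written only for this sketch; it emits one of each and then returns a value. Wrapping the call in both suppressMessages and suppressWarnings silences the output without affecting the result.

```r
# Hypothetical function that emits a message and a warning, then returns a value
noisy <- function(x) {
  message("starting work")            # informational message
  warning("this is only a warning")   # warning, not an error
  x * 2
}

# Only the message is muffled here; the warning would still be reported
only_warns <- suppressWarnings(suppressMessages(noisy(1)))

# Both wrappers together: no message, no warning, same return value
result <- suppressMessages(suppressWarnings(noisy(21)))
print(result)   # 42
```

Neither wrapper hides genuine errors raised with stop; those still halt the call.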
12.18 Taking Function Arguments from a List

Problem

Your data is captured in a list structure. You want to pass the data to a function, but the function does not accept a list.

Solution

In simple cases, convert the list to a vector. For more complex cases, the do.call function can break the list into individual arguments and call your function:

    > do.call(function, list)

Discussion

If your data is captured in a vector, life is simple and most R functions work as expected:

    > vec <- c(1, 3, 5, 7, 9)
    > mean(vec)
    [1] 5

If your data is captured in a list, then some functions complain and return a useless result, like this:

    > numbers <- list(1, 3, 5, 7, 9)
    > mean(numbers)
    [1] NA
    Warning message:
    In mean.default(numbers) :
      argument is not numeric or logical: returning NA

The numbers list is a simple, one-level list, so we can just convert it to a vector and call the function:

    > mean(unlist(numbers))
    [1] 5

The big headaches come when you have multilevel list structures: lists within lists. These can occur within complex data structures. Here is a list of lists in which each sublist is a column of data:

    > lists <- list(col1=list(7,8,9), col2=list(70,80,90), col3=list(700,800,900))

Suppose we want to form this data into a matrix. The cbind function is supposed to create data columns, but it gets confused by the list structure and returns something useless:

    > cbind(lists)
         lists
    col1 List,3
    col2 List,3
    col3 List,3

If we unlist the data then we just get one big, long column:

    > cbind(unlist(lists))
          [,1]
    col11    7
    col12    8
    col13    9
    col21   70
    col22   80
    col23   90
    col31  700
    col32  800
    col33  900

The solution is to use do.call, which splits the list into individual items and then calls cbind on those items:

    > do.call(cbind, lists)
         col1 col2 col3
    [1,] 7    70   700
    [2,] 8    80   800
    [3,] 9    90   900

Using do.call in that way is functionally identical to calling cbind like this:

    > cbind(lists[[1]], lists[[2]], lists[[3]])
         [,1] [,2] [,3]
    [1,] 7    70   700
    [2,] 8    80   800
    [3,] 9    90   900

Be careful if the list elements have names. In that case, do.call interprets the element names as names of parameters to the function, which might cause trouble.

This recipe presents the most basic use of do.call. The function is quite powerful and has many other uses. See the help page for more details.

See Also

See Recipe 5.33 for converting between data types.

12.19 Defining Your Own Binary Operators

Problem

You want to define your own binary operators, making your R code more streamlined and readable.

Solution

R recognizes any text between percent signs (%...%) as a binary operator. Create and define a new binary operator by assigning a two-argument function to it.

Discussion

R contains an interesting feature that lets you define your own binary operators. Any text between two percent signs (%...%) is automatically interpreted by R as a binary operator. R predefines several such operators, such as %/% for integer division and %*% for matrix multiplication.

You can create a new binary operator by assigning a function to it. This example creates an operator, %+-%:

    > '%+-%' <- function(x,margin) x + c(-1,+1)*margin

The expression x %+-% m calculates x ± m. Here it calculates 100 ± (1.96 × 15), the two-standard-deviation range of a standard IQ test:

    > 100 %+-% (1.96 * 15)
    [1]  70.6 129.4

Notice that we quote the binary operator when defining it but not when using it.

The pleasure of defining your own operators is that you can wrap commonly used operations inside a succinct syntax. If your application frequently concatenates strings without an intervening blank, then you might define a binary concatenation operator for that purpose:

    > '%+%' <- function(s1,s2) paste(s1,s2,sep="")
    > "Hello" %+% "World"
    [1] "HelloWorld"
    > "limit=" %+% round(qnorm(1-0.05/2), 2)
    [1] "limit=1.96"

A danger of defining your own operators is that the code becomes less portable to other environments. Bring the definitions along with the code in which they are used; otherwise, R will complain about undefined operators.

All user-defined operators have the same precedence and are listed collectively in Table 2-1 as %any%. Their precedence is fairly high: higher than multiplication and division but lower than exponentiation and sequence creation. As a result, it's easy to misexpress yourself. If we omit parentheses from the "%+-%" example, we get an unexpected result:

    > 100 %+-% 1.96 * 15
    [1] 1470.6 1529.4

R interpreted the expression as (100 %+-% 1.96) * 15.

See Also

See Recipe 2.11 for more about operator precedence and Recipe 2.12 for how to define a function.
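The precedence pitfall described above can be verified directly. This sketch redefines the %+-% operator from the Discussion (using backticks, which work the same way as quotes when defining it) and confirms that the unparenthesized form is parsed as (100 %+-% 1.96) * 15. The variable names are illustrative.

```r
# Same operator as in the Discussion, defined with backticks
`%+-%` <- function(x, margin) x + c(-1, +1) * margin

with_parens    <- 100 %+-% (1.96 * 15)   # intended: 100 plus or minus 29.4
without_parens <- 100 %+-% 1.96 * 15     # parsed as (100 %+-% 1.96) * 15

print(with_parens)      # 70.6 129.4
print(without_parens)   # 1470.6 1529.4

# The unparenthesized form matches the explicit grouping
same_grouping <- isTRUE(all.equal(without_parens, (100 %+-% 1.96) * 15))
print(same_grouping)    # TRUE
```

When in doubt, parenthesize the right-hand operand of a user-defined operator; it costs nothing and removes the ambiguity for readers.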
