R Statistical Application Development: Exploratory Analysis
4 Exploratory Analysis

Tukey (1977), in his benchmark book Exploratory Data Analysis, popularly abbreviated as EDA, describes "best methods" as follows: "We do not guarantee to introduce you to the "best" tools, particularly since we are not sure that there can be unique bests."

The goal of this chapter is to emphasize EDA and its strengths. In the previous chapter, we saw visualization techniques for data of different characteristics. Analytical insight is also important, and this chapter considers EDA techniques. The more popular measures include the mean, the standard error, and so on. The mean has several well-documented drawbacks, one being that it is very sensitive to outliers and extreme values. Thus, in exploratory analysis the focus is on measures that are robust to extremes. Many techniques considered in this chapter are discussed in more detail by Velleman and Hoaglin (1981), and an eBook has kindly been made available at http://dspace.library.cornell.edu/handle/1813/62. In the first section, we take a look at the measures most often used in exploratory analysis.

The main learnings from this chapter are as follows:

- Summary statistics based on the median and its variants, which are robust to outliers
- Visualization techniques: stem-and-leaf plots, letter values, and bagplots
- A first regression model in the resistant line, and refined methods in smoothing data and median polish

Essential summary statistics

We have seen the useful summary statistics of mean and variance in the Discrete distributions and Continuous distributions sections of Chapter 1, Data Characteristics. The concepts therein have their own utility. The drawback of such statistical metrics is that they are very sensitive to outliers, in the sense that a single observation may completely distort the entire story. In this section, we discuss some exploratory analysis metrics that are intuitive and more robust than metrics such as the mean and variance.

Percentiles, quantiles, and median

For a given dataset and a number 0 < k < 1, the 100k% percentile divides the dataset into two partitions, with 100k% of the values below it and 100(1-k)% of the values above it. The fraction k is referred to as a quantile. In statistics, quantiles are used more often than percentiles. The difference is that quantiles vary over…
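
The two ideas in this section, that a single outlier can distort the mean while barely moving the median, and that the quantile at fraction k leaves roughly 100k% of the values below it, can be checked numerically. The following is a minimal sketch in Python using only the standard library. The data, the helper names quantile and fraction_below, and the linear-interpolation rule are illustrative assumptions rather than anything taken from the chapter, and other quantile conventions exist that give slightly different values on small samples.

```python
import math
import statistics


def quantile(data, k):
    """Return the quantile of data at fraction k, with 0 < k < 1.

    Interpolates linearly between neighbouring order statistics; this is
    one common convention among several, chosen here for illustration.
    """
    if not 0 < k < 1:
        raise ValueError("k must lie strictly between 0 and 1")
    ordered = sorted(data)
    position = k * (len(ordered) - 1)  # fractional index into the sorted data
    lower = math.floor(position)
    weight = position - lower
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def fraction_below(data, value):
    """Share of observations strictly below value."""
    return sum(x < value for x in data) / len(data)


# Hypothetical measurements; the second list replaces the last entry
# with a mistyped value to act as an outlier.
clean = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
contaminated = clean[:-1] + [1300]

print("mean:  ", statistics.mean(clean), "->", statistics.mean(contaminated))
print("median:", statistics.median(clean), "->", statistics.median(contaminated))

# Check the partition property on the integers 1..100.
values = list(range(1, 101))
for k in (0.25, 0.5, 0.75):
    q = quantile(values, k)
    print(k, q, fraction_below(values, q))
```

With these made-up numbers, the single bad entry moves the mean from 13.5 to 142.2, while the median only moves from 13.5 to 14. On the integers 1 to 100, the fractions of values below the computed quantiles come out as 0.25, 0.5, and 0.75, matching the definition above.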