Date: 2019-03-04
Large-Scale Parallel Statistical Forecasting Computations in R
Murray Stokely, Farzan Rohani, Eric Tassone

Abstract
We demonstrate the utility of massively parallel computational infrastructure for statistical computing using the MapReduce paradigm for R. This framework allows users to write computations in a high-level language that are then broken up and distributed to worker tasks in Google data centers. Results are collected in a scalable, distributed datastore and returned to the interactive user session. We apply our approach to a forecasting application that fits a variety of models, which precludes an analytical description of the statistical uncertainty associated with the overall forecast. To overcome this, we generate simulation-based uncertainty bands, which necessitates a large number of computationally intensive realizations. Our technique cut total runtime by a factor of 300. Distributing the computation across many machines permits analysts to focus on statistical issues while answering questions that would be intractable without significant parallel computational infrastructure. We present real-world performance characteristics from our application to allow practitioners to better understand the nature of massively parallel statistical simulations in R.

Key Words: Statistical Computing, R, forecasting, time series, parallelism

1. Introduction
Large-scale statistical computing has become widespread at Internet companies in recent years, and the rapid growth of available data has increased the importance of scaling the tools for data analysis. Significant progress has been made in designing distributed systems to take advantage of massive clusters of shared machines for long-running batch jobs. However, the development of higher-level abstractions and tools for interactive statistical analysis using this infrastructure has lagged. It is particularly vital that analysts are able to iterate quickly when engaged in data exploration, model fitting, and visualization on these very large datasets. Supporting interactive analysis of datasets that are far larger than the available memory and disk space on a single machine requires a high degree of parallelism. At Google, parallelism is implemented using shared clusters of commodity machines [5]. This paper describes a statistical comput…
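The abstract describes splitting a large number of independent simulation realizations across worker tasks, then collecting the results to form uncertainty bands. The Python sketch below illustrates that map/reduce pattern under stated assumptions. It is not the authors' R framework or their forecasting models.

Several choices in the sketch are hypothetical and made only for illustration:
- the two toy "models" (`simulate_random_walk`, `simulate_mean_model`);
- the equal-weight averaging used to combine them;
- the shard sizes and per-shard seeds;
- the nearest-rank quantile rule.

Each shard is one "map" task. The "reduce" step pools all realizations and takes pointwise quantiles at every forecast horizon.

```python
import random
import statistics

# Hypothetical toy ensemble (NOT the paper's models): each function simulates
# one future path of length `horizon` from the observed `history`.
def simulate_random_walk(history, horizon, rng):
    # Random walk with drift; drift and noise scale come from first differences.
    diffs = [b - a for a, b in zip(history, history[1:])]
    drift, sigma = statistics.fmean(diffs), statistics.stdev(diffs)
    level, path = history[-1], []
    for _ in range(horizon):
        level += drift + rng.gauss(0.0, sigma)
        path.append(level)
    return path

def simulate_mean_model(history, horizon, rng):
    # Independent normal draws around the historical mean.
    mu, sigma = statistics.fmean(history), statistics.stdev(history)
    return [mu + rng.gauss(0.0, sigma) for _ in range(horizon)]

MODELS = (simulate_random_walk, simulate_mean_model)

def run_shard(task):
    # "Map" step: one worker task runs a shard of independent realizations.
    history, horizon, n_realizations, seed = task
    rng = random.Random(seed)  # per-shard seed: reproducible, distinct streams
    realizations = []
    for _ in range(n_realizations):
        paths = [model(history, horizon, rng) for model in MODELS]
        # Assumed combination rule: average the model paths step by step.
        realizations.append([statistics.fmean(step) for step in zip(*paths)])
    return realizations

def quantile(sorted_values, q):
    # Nearest-rank empirical quantile of an already-sorted list.
    return sorted_values[round(q * (len(sorted_values) - 1))]

def uncertainty_bands(history, horizon, n_shards=8, per_shard=250,
                      level=0.9, map_fn=map):
    # `map_fn` is where distribution happens: built-in `map` runs locally;
    # a process pool's `.map` would send shards to separate workers.
    tasks = [(history, horizon, per_shard, seed) for seed in range(n_shards)]
    shards = list(map_fn(run_shard, tasks))
    # "Reduce" step: pool every realization, then take pointwise quantiles.
    pooled = [path for shard in shards for path in shard]
    alpha = (1.0 - level) / 2.0
    bands = []
    for step_values in zip(*pooled):
        s = sorted(step_values)
        bands.append((quantile(s, alpha), statistics.median(s),
                      quantile(s, 1.0 - alpha)))
    return bands  # one (lower, median, upper) tuple per forecast step

if __name__ == "__main__":
    history = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0]
    for h, (lo, mid, hi) in enumerate(uncertainty_bands(history, horizon=4), start=1):
        print(f"h={h}: {lo:.1f} <= {mid:.1f} <= {hi:.1f}")
```

Because the shards are independent, `map_fn` is the only place where distribution enters. Passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` (called under an `if __name__ == "__main__":` guard) runs the shards in separate processes. A cluster scheduler could play the same role at larger scale. The per-shard seeds keep the pooled result reproducible regardless of which worker runs which shard.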