Automatic CPU-GPU Communication Management and Optimization

Thomas B. Jablin, Prakash Prabhu, James A. Jablin†, Nick P. Johnson, Stephen R. Beard, David I. August
Princeton University, Princeton, NJ
†Brown University, Providence, RI
{tjablin, pprabhu, npjohnso, sbeard, august}@cs.princeton.edu, jjablin@cs.brown.edu

Abstract

The performance benefits of GPU parallelism can be enormous, but unlocking this performance potential is challenging. The applicability and performance of GPU parallelizations is limited by the complexities of CPU-GPU communication. To address these communications problems, this paper presents the first fully automatic system for managing and optimizing CPU-GPU communication. This system, called the CPU-GPU Communication Manager (CGCM), consists of a run-time library and a set of compiler transformations that work together to manage and optimize CPU-GPU communication without depending on the strength of static compile-time analyses or on programmer-supplied annotations. CGCM eases manual GPU parallelizations and improves the applicability and performance of automatic GPU parallelizations. For 24 programs, CGCM-enabled automatic GPU parallelization yields a whole program geomean speedup of 5.36x over the best sequential CPU-only execution.

memories. Unfortunately, not all communication management is efficient; cyclic communication patterns are frequently orders of magnitude slower than acyclic patterns [15]. Transforming cyclic communication patterns to acyclic patterns is called Optimizing Communication. Naïvely copying data to GPU memory, spawning a GPU function, and copying the results back to CPU memory yields cyclic communication patterns. Copying data to the GPU in the preheader, spawning many GPU functions, and copying the result back to CPU memory in the loop exit yields an acyclic communication pattern. Incorrect communication optimization causes programs to access stale or inconsistent data.

This paper presents CPU-GPU Communication Manager (CGCM), the first fully automatic system for managing and optimizing CPU-GPU communication. Automatically managing and optimizing communication increases programmer efficiency and program correctness. It also improves the applicability and performance of automatic GPU parallelization.

CGCM manage