Technical News from The Portland Group
February 2010

Tuning a Monte Carlo Algorithm on GPUs
by Mathew Colgrove, PGI Customer Support Engineering

Since the introduction of PGI CUDA Fortran late last year, we've seen a dramatic rise in the number of customers using this new extension to the Fortran language. As the moderator of the PGI User Forum, I have been very busy answering questions about the language and noting the questions that seem to come up often or that may interest the wider community. For this installment of the PGInsider, I have implemented the Monte Carlo Integration algorithm to highlight some of the tips, tricks, and traps of programming for the GPU.

Monte Carlo Integration

For my sample code, I chose a simple Monte Carlo Integration algorithm to compute the approximate value of pi. The code first creates a list of random points within a square. Each point is evaluated using the function:

    f(x,y) = (x^2 + y^2 < 1) ? 1 : 0

The resulting values of f(x,y) are then summed. The approximate value of pi is calculated by multiplying four times the volume of the square by the mean value of f(x,y).

The code itself is very simple to understand, and the algorithm is highly parallel because each f(x,y) calculation can be performed independently. Beyond that, the algorithm uses a sum reduction and requires a random number generator (RNG), both of which present interesting problems.

Source code for the following examples is available for download from the PGI website. Here is a basic Fortran implementation of a Monte Carlo Integration algorithm:

    ! Perform the function f(x,y) = (x^2 + y^2 < 1) ? 1 : 0
    do i = 1, N
       tempVal = X(i)*X(i) + Y(i)*Y(i)
       if (tempVal < 1) then
          temp(i) = 1
       else
          temp(i) = 0
       endif
    end do

    ! Sum the results
    sumA = 0
    sumSq = 0
    do i = 1, N
       sumA = sumA + temp(i)
       sumSq = sumSq + (temp(i)*temp(i))
    end do

    ! Calculate the mean
    meanA = sumA/real(N)
    meanSq = sumSq/real(N)

    ! Approximate pi
    results%estimate = meanA*volume*4
    results%variance = (meanSq - meanA*meanA)/(N-1)

Baseline Performance

First, let's start with the host version of the code to get our baseline performance. We compiled the code with auto-parallelization enabled and ran it with four threads. The system we're using is an Intel Core i7 (single-socket, four-core Nehalem) with an attached NVIDIA S1070 (four Tesla C1060 cards). The compiler version used is PGI 10.2. (For a listing of the flags used i…
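The last four statements of the Monte Carlo Integration code shown earlier compute the estimator written out below. The symbols are our own labels for the code's variables (V for volume, \bar{f} for meanA, \overline{f^2} for meanSq), and the worked number at the end assumes points drawn from the unit square, where V = 1 and the expected value of f is pi/4.

```latex
\bar{f} = \frac{1}{N}\sum_{i=1}^{N} f(x_i, y_i), \qquad
\overline{f^2} = \frac{1}{N}\sum_{i=1}^{N} f(x_i, y_i)^2

\hat{\pi} = 4\,V\,\bar{f}, \qquad
\widehat{\operatorname{Var}}(\bar{f}) = \frac{\overline{f^2} - \bar{f}^{\,2}}{N - 1}

% Because f \in \{0, 1\}, f^2 = f and therefore \overline{f^2} = \bar{f}:
\widehat{\operatorname{Var}}(\bar{f}) = \frac{\bar{f}\,(1 - \bar{f})}{N - 1}

% Worked example with V = 1:  \bar{f} = 0.785 \;\Rightarrow\; \hat{\pi} = 4 \times 1 \times 0.785 = 3.14
```

As written, results%variance is the estimated variance of the mean of f; the variance of the pi estimate itself would be (4V)^2 times that value.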
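The Fortran fragment shown earlier omits its declarations and the generation of the random points. The following is a minimal, self-contained host-side sketch of the same algorithm, built under a few assumptions. Points are drawn uniformly from the unit square, so the volume is 1. The Fortran intrinsic random_number stands in for the article's RNG, which this excerpt does not show. Sums are accumulated in double precision. The module name mc_pi_sketch and the subroutine estimate_pi are hypothetical and are not taken from PGI's downloadable source.

```fortran
! Hypothetical module: a host-side sketch of the Monte Carlo pi estimate.
module mc_pi_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision for the sums
contains
  ! Estimate pi by Monte Carlo integration over the unit square [0,1)^2.
  ! Assumption: the intrinsic random_number stands in for the article's RNG.
  subroutine estimate_pi(n, estimate, variance)
    integer,  intent(in)  :: n           ! number of random points (n > 1)
    real(dp), intent(out) :: estimate    ! approximate value of pi
    real(dp), intent(out) :: variance    ! estimated variance of the mean of f
    real(dp), parameter   :: volume = 1.0_dp   ! area of the unit square
    real(dp), allocatable :: x(:), y(:), temp(:)
    real(dp) :: sumA, sumSq, meanA, meanSq
    integer  :: i

    allocate(x(n), y(n), temp(n))
    call random_number(x)                ! uniform samples in [0,1)
    call random_number(y)

    ! f(x,y) = (x^2 + y^2 < 1) ? 1 : 0, one independent evaluation per point
    do i = 1, n
      if (x(i)*x(i) + y(i)*y(i) < 1.0_dp) then
        temp(i) = 1.0_dp
      else
        temp(i) = 0.0_dp
      end if
    end do

    ! Sum reduction over f and f^2
    sumA  = 0.0_dp
    sumSq = 0.0_dp
    do i = 1, n
      sumA  = sumA  + temp(i)
      sumSq = sumSq + temp(i)*temp(i)
    end do

    ! Means, pi estimate, and variance of the mean, as in the fragment
    meanA  = sumA  / real(n, dp)
    meanSq = sumSq / real(n, dp)
    estimate = meanA * volume * 4.0_dp
    variance = (meanSq - meanA*meanA) / real(n - 1, dp)

    deallocate(x, y, temp)
  end subroutine estimate_pi
end module mc_pi_sketch
```

A caller would declare real(dp) :: est, var and write call estimate_pi(1000000, est, var). The first loop is the part that maps naturally onto one GPU thread per point. The second loop is the sum reduction that the article flags as a harder problem.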
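The sum reduction mentioned in the Monte Carlo Integration section is where a serial loop and a GPU version differ most. One common GPU approach is a pairwise (tree) reduction, sketched here in plain Fortran to illustrate the technique. It is not the reduction used in the PGI sample code, and the names tree_reduce_sketch and tree_sum are hypothetical.

```fortran
! Hypothetical module: a pairwise (tree) sum reduction, the pattern a GPU
! thread block commonly uses in shared memory. Generic illustration only.
module tree_reduce_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Sum v(:) by repeatedly adding the upper half onto the lower half.
  ! Each pass roughly halves the active length, so about log2(n) passes run.
  function tree_sum(v) result(total)
    real(dp), intent(in) :: v(:)
    real(dp) :: total
    real(dp), allocatable :: work(:)
    integer :: active, half, i

    if (size(v) == 0) then
      total = 0.0_dp
      return
    end if
    work = v                       ! copy so the input is left untouched
    active = size(work)
    do while (active > 1)
      half = (active + 1) / 2      ! ceiling division handles odd lengths
      ! Writes go to 1..active-half and reads come from half+1..active,
      ! so the additions within one pass are independent of each other.
      do i = 1, active - half
        work(i) = work(i) + work(i + half)
      end do
      active = half
    end do
    total = work(1)
  end function tree_sum
end module tree_reduce_sketch
```

On a GPU, each iteration of the inner loop would be handled by its own thread. Each pass of the outer loop then corresponds to a synchronization point inside the thread block, which is the extra care this reduction needs and the serial loop does not.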