Performance Tuning for CPU (Marat Dukhan).pdf: 55 pages, 1.12 MB, uploaded 2019-03-08.
Performance Tuning for CPU
Part 3: Memory Optimizations
Marat Dukhan

Example 0: List Traversal

    size_t traverse_list(const list_node* list) {
        size_t list_length = 0;
        while (list != 0) {
            list = list->next;
            list_length += 1;
        }
        return list_length;
    }

O(N)? The loop runs N times, but every iteration must wait for the list = list->next load, and the latency of that load depends on which level of the memory hierarchy holds the node. So the running time is O(anything)!

Memory Subsystem: Overview
[Image from www.anandtech.com/show/2960/2]
[Image from www.anandtech.com/show/2658]

Cache Hierarchy: Overview
Three levels of cache:
● Level 1 (the fastest, per core)
● Level 2 (per core)
● Level 3 (aka Last-Level Cache in Intel docs, shared between all cores)
[Image from www.anandtech.com/show/2594/9]

Cache Hierarchy: Level 1
● Separate data and instruction caches
● L1 data cache:
  ○ Almost as fast as registers
  ○ The lowest latency of all caches
  ○ The highest bandwidth
  ○ Throughput: at least 1 load per cycle
    ■ Often more: 1 load + 1 store per cycle on Nehalem
  ○ Latency: 3-4 cycles
● L1 instruction cache:
  ○ Intended only for machine code (not data)
    ■ Read-only
    ■ Do not mix code and data in the same section

Cache Hierarchy: Performance
Remember latency and throughput:
● Latency = how long to wait for the result
● Throughput = how much work can be done per second or per CPU cycle
Back to the memory hierarchy:
● Latency = the number of cycles (or nanoseconds) needed to bring in the requested data
● Throughput = how many megabytes per second can be read or written

Cache Hierarchy: Latency
Latency always increases from the lower-level caches to RAM. E.g., on Nehalem:
● 4 cycles for L1 cache
● 10 cycles for L2 cache
● 17 cycles for L3 cache
● 198(!) cycles for RAM
This data is from users.atw.hu/instlatx64/GenuineIntel00206C1_Gulftown_MemLatX64.txt

Cache Hierarchy: Throughput
Throughput decreases from the lower-level caches to RAM. However, the changes are not as sharp as with latency. E.g., on Nehalem:
● 32 bytes/cycle for L1
  ○ One 16-byte read + one 16-byte write
● 32 bytes/cycle for L2
  ○ On average
  ○ Delivers 64 bytes every other cycle
● Unkn…