Memphis: Finding and Fixing NUMA-related Performance Problems on Multi-core Platforms

Collin McCurdy and Jeffrey Vetter
Future Technologies Group
Oak Ridge National Laboratory
Oak Ridge TN, USA
{cmccurdy,vetter}@ornl.gov

Abstract: Until recently, most high-end scientific applications have been immune to performance problems caused by Non-Uniform Memory Access (NUMA). However, current trends in micro-processor design are pushing NUMA to smaller and smaller scales. This paper examines the current state of NUMA and makes several contributions. First, we summarize the performance problems that NUMA can present for multi-threaded applications and describe methods of addressing them. Second, we demonstrate that NUMA can indeed be a significant problem for scientific applications, showing that it can mean the difference between an application scaling perfectly and failing to scale at all. Third, we describe, in increasing order of usefulness, three methods of using hardware performance counters to aid in finding NUMA-related problems. Finally, we introduce Memphis, a data-centric toolset that uses Instruction Based Sampling to help pinpoint problematic memory accesses, and demonstrate how we used it to improve the performance of several production-level codes (HYCOM, XGC1 and CAM) by 13%, 23% and 24% respectively.

However, two trends in micro-processor design are bringing NUMA to SMPs. First is the movement toward putting the memory controller on the same die as the CPU, beginning with AMD's Opteron processor [2], and continuing with IBM's Power5 [3] and, most recently, Intel's Nehalem [4]. On-chip memory controllers can substantially increase memory performance by reducing the number of chip boundary crossings required by memory references that miss in cache from two (once when crossing the memory bus to get to the memory controller, and then again to get to DRAM) to one.

Removing the memory bus, however, makes the design of a true SMP platform, historically built on top of the memory bus with multiple processors on one side and memory on the other, much more challenging, if not impossible. Multi-processors built out of CPUs with on-chip memory controllers typically implement a shared physical address space via a point-to-point network between processing elements, and a modified cache-coherence protocol that e