Research on Performance Optimization of the MapReduce Model on the Hadoop Platform
ABSTRACT

With the rapidly increasing data volume and the growing burden of data access in the Big Data era, the demand for computing performance is soaring. As an effective solution, cloud computing has developed rapidly since it was proposed, and its almost unlimited storage capacity and computing power have led information technology into a new arena. Hadoop has also gained wide acknowledgement and adoption as a current mainstream platform for cloud computing. Hadoop is a high-performance data processing platform with high usability, good scalability, and excellent extensibility, and it also has the advantages of low cost and open source. Hadoop has two core components: HDFS (Hadoop Distributed File System) and MapReduce. HDFS is a distributed file system that supports very large files, streaming access, and high throughput; MapReduce is a fast parallel programming model that makes the implementation of parallel access transparent and exposes only simple interfaces to users.

Firstly, this dissertation introduces the background of the Hadoop platform, including its origin and technical development, and its applications and prospects. Then the crucial technologies of the Hadoop platform (HDFS, MapReduce, and the Scheduler) are studied. Building on this groundwork, the dissertation identifies three levels of optimization: program level, parameter level, and system level. There are many options at the system level and the parameter level, which are elaborated in Chapter Three.

Resource management in Hadoop binds memory and CPU resources together and then divides the resources into the MapSlot model and the ReduceSlot model according to task type. This mechanism is simple to implement, but has the drawbacks of resource hoarding and a low utilization rate. Chapter Four of this dissertation defines two resource models, memSlot and cpuSlot, to unbind the resources. The proposed models assign resources based on the actual requirements of Map/Reduce tasks. In a cluster of seven PCs, the proposed scheme achieves 3.5% memory and 4.3% CPU utilization improvements when 21 GB of log data is processed, showing its effectiveness in resolving the hoarding phenomenon.

Because MapReduce produces a lot of sorting that is usually recursive, the performance consumpti