Hadoop: A Framework for Data-Intensive Distributed Computing
CS561 - Spring 2012
WPI, Mohamed Y. Eltabakh

What is Hadoop?
• Hadoop is a software framework for distributed processing of large datasets across large clusters of computers
• Hadoop is an open-source implementation of Google's MapReduce
• Hadoop is based on a simple programming model called MapReduce
• Hadoop is based on a simple data model: any data will fit
• The Hadoop framework consists of two main layers
  • Distributed file system (HDFS)
  • Execution engine (MapReduce)

Hadoop Infrastructure
• Hadoop is a distributed system, like distributed databases
• However, there are several key differences between the two infrastructures
  • Data model
  • Computing model
  • Cost model
  • Design objectives

How is the Data Model Different?
Distributed Databases
• Deal with tables and relations
• Must have a schema for data
• Data fragmentation & partitioning
Hadoop
• Deals with flat files in any format
• No schema for data
• Files are divided automatically into blocks

How is the Computing Model Different?
Distributed Databases
• Notion of a transaction
• Transaction properties: ACID
• Distributed transaction
Hadoop
• Notion of a job divided into tasks
• Map-Reduce computing model
• Every task is either a map or a reduce

Hadoop: Big Picture
• High-level languages
• Execution engine
• Distributed light-weight DB
• Centralized tool for coordination
• Distributed file system
HDFS + MapReduce are enough to have things working

What is Next?
• Hadoop Distributed File System (HDFS)
• MapReduce Layer
• Examples
  • Word Count
  • Join
• Fault Tolerance in Hadoop

HDFS: Hadoop Distributed File System
• Single namenode and many datanodes
• Namenode maintains the file system metadata
• Files are split into fixed-size blocks and stored on datanodes (default 64 MB)
• Data blocks are replicated for fault tolerance and fast access (default is 3)
• Datanodes periodically send heartbeats to the namenode
• HDFS is a master-slave architecture
  • Master: namenode
  • Slaves: datanodes (100s or 1000s of nodes)

HDFS: Data Placement and Replication
Datanodes can be organized into racks
• Default placement policy: where to put a given block?
  • First copy is written to the node creating the file (write affinity)
  • Second copy is written to a datanode within the same rack
  • Third copy is written to a datanode in a different rack
• Objectives: load balancing, fa
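
The block splitting and placement policy described on the HDFS slides can be illustrated with a small sketch. This is not HDFS code. It is a toy Python model that assumes the defaults stated on the slides (64 MB blocks, three copies). The names `num_blocks`, `place_replicas`, and the rack and datanode labels are hypothetical, and picking the first eligible candidate is a simplification.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # default block size stated on the slide (64 MB)


def num_blocks(file_size, block_size=BLOCK_SIZE):
    """Number of fixed-size blocks needed to hold file_size bytes."""
    # Integer ceiling division: a partially filled last block still counts.
    return (file_size + block_size - 1) // block_size


def place_replicas(writer, racks):
    """Pick three datanodes for one block, following the slide's policy.

    racks maps a rack name to the list of datanode names in that rack.
    Assumes the writer's rack has at least two nodes and that a second
    rack exists. The first eligible node is chosen (a simplification).
    """
    writer_rack = next(r for r, nodes in racks.items() if writer in nodes)
    # First copy: the node creating the file (write affinity).
    first = writer
    # Second copy: another datanode within the same rack.
    second = next(n for n in racks[writer_rack] if n != writer)
    # Third copy: a datanode in a different rack.
    third = next(
        n for r, nodes in racks.items() if r != writer_rack for n in nodes
    )
    return [first, second, third]
```

For example, under these assumptions a 200 MB file occupies four blocks, and a block written from a hypothetical node `dn1` in `rack1` gets one more copy in `rack1` and one copy in another rack.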