Learning Hive
李建奇

1 Study notes

After reading part of the code, my impression is that Hive is fairly complex and its use cases are limited; in most cases Hadoop's native MapReduce is sufficient.

1.1 Version

0.6

1.2 Purpose

To learn from the experience of Facebook and others in applying Hive, so that it can be applied at our company. The purpose of studying the code is to use Hive more effectively, for example for debugging, tuning, and applying new patches.

2 Pig + Hive: ETL + data warehouse

The data preparation phase is often known as ETL (Extract Transform Load) or the data factory. "Factory" is a good analogy because it captures the essence of what is being done in this stage: just as a physical factory brings in raw materials and outputs products ready for consumers, so a data factory brings in raw data and produces data sets ready for data users to consume. Raw data is loaded in, cleaned up, conformed to the selected data model, joined with other data sources, and so on. Users in this phase are generally engineers, data specialists, or researchers.

The data presentation phase is usually referred to as the data warehouse. A warehouse stores products ready for consumers; they need only come and select the proper products off the shelves. In this phase, users may be engineers using the data for their systems, analysts, or decision makers.

Given the different workloads and different users for each phase, we have found that different tools work best in each phase. Pig (combined with a workflow system such as Oozie) is best suited for the data factory, and Hive for the data warehouse.

2.1 Data warehouse

Data warehouse use cases

In the data warehouse phase of processing, we see two dominant use cases: business-intelligence analysis and ad-hoc queries. In the first case, users connect the data to business intelligence (BI) tools, such as MicroStrategy, to generate reports or do further analysis. In the second case, users run ad-hoc queries issued by data analysts or decision makers. In both cases, the relational model and SQL are the best fit. Indeed, data warehousing has been one of the core use cases for SQL through much of its history. It has the right constructs to support the types of queries and tools that analysts want to use. And it is already in use by both the tools and users in the field.

2.2 Facebook's deployment architecture

[Figure: Facebook Deployment. Components shown: Web Servers, Scribe MidTier, Production Hive-Hadoop Cluster, Sharded MySQL, Scribe-Hadoop Clusters, Adhoc Hive-Hadoop Cluster, Hive replication]

3 Hive

3.1 Architecture

[Figure: Hive Architecture. Components shown: Metastore, Query Engine, CLI, Hive Thrift API, Metastore Thrift API, JDBC/ODBC clients, Hadoop Map/Reduce + HDFS Cluster]
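
The split between the data factory (ETL) and the data warehouse described in the Pig + Hive section can be illustrated with a small sketch. This is not Pig or Hive code: plain Python stands in for the ETL stage, and Python's built-in sqlite3 module stands in for the SQL warehouse. The sample records, the table name page_views, and all function names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical raw log lines in the form "user,page,latency_ms".
# Some lines are deliberately malformed.
RAW_LINES = [
    "alice,/home,120",
    "bob,/home,80",
    "broken line",
    "alice,/search,200",
    "bob,/search,not_a_number",
]


def data_factory(lines):
    """ETL stage (the role the text assigns to Pig): parse and clean raw data."""
    rows = []
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3:
            continue  # drop records that do not match the expected layout
        user_name, page, latency = parts
        if not latency.isdigit():
            continue  # drop records whose latency is not a non-negative integer
        rows.append((user_name, page, int(latency)))
    return rows


def load_warehouse(rows):
    """Load cleaned rows into a relational table (the role assigned to Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views (user_name TEXT, page TEXT, latency_ms INTEGER)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    return conn


def adhoc_query(conn):
    """An ad-hoc SQL aggregate of the kind an analyst might issue."""
    cur = conn.execute(
        "SELECT page, COUNT(*), AVG(latency_ms) "
        "FROM page_views GROUP BY page ORDER BY page"
    )
    return cur.fetchall()


if __name__ == "__main__":
    warehouse = load_warehouse(data_factory(RAW_LINES))
    for row in adhoc_query(warehouse):
        print(row)
```

The cleaning logic lives entirely in the first stage, so the query stage only sees rows that already conform to the schema. That mirrors the text's point that each phase has different users and workloads.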