Hive Quick Start
© 2010 Cloudera, Inc.

Background
• Started at Facebook
• Data was collected by nightly cron jobs into Oracle DB
• “ETL” via hand-coded Python
• Grew from 10s of GBs (2006) to 1 TB/day new data (2007), now 10x that.

Hadoop as Enterprise Data Warehouse
• Scribe and MySQL data loaded into Hadoop HDFS
• Hadoop MapReduce jobs to process data
• Missing components:
  – Command-line interface for “end users”
  – Ad-hoc query support
    • … without writing full MapReduce jobs
  – Schema information

Hive Applications
• Log processing
• Text mining
• Document indexing
• Customer-facing business intelligence (e.g., Google Analytics)
• Predictive modeling, hypothesis testing

Hive Architecture

Data Model
• Tables
  – Typed columns (int, float, string, date, boolean)
  – Also, array/map/struct for JSON-like data
• Partitions
  – e.g., to range-partition tables by date
• Buckets
  – Hash partitions within ranges (useful for sampling, join optimization)

Column Data Types
CREATE TABLE t (s STRING, f FLOAT, a ARRAY
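As a rough illustration of the Data Model slide, the following Python sketch places rows first by a partition value (a date) and then by a hash bucket within that partition. It is not Hive's implementation: the column names `dt` and `user`, the bucket count, and the use of CRC32 as the hash function are all assumptions chosen for the example.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen for the example


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Return a stable bucket number for a key.

    CRC32 is a stand-in hash for this sketch, not the hash Hive uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def place_rows(rows, num_buckets=NUM_BUCKETS):
    """Group rows by (partition value, bucket number)."""
    layout = defaultdict(list)
    for row in rows:
        slot = (row["dt"], bucket_for(row["user"], num_buckets))
        layout[slot].append(row)
    return dict(layout)


# Hypothetical rows: one partition per date, buckets keyed on user.
rows = [
    {"dt": "2010-01-01", "user": "alice", "url": "/home"},
    {"dt": "2010-01-01", "user": "bob", "url": "/search"},
    {"dt": "2010-01-02", "user": "alice", "url": "/cart"},
    {"dt": "2010-01-02", "user": "alice", "url": "/checkout"},
]

layout = place_rows(rows)
for (dt, bucket), members in sorted(layout.items()):
    print(dt, bucket, [r["url"] for r in members])
```

Because a given key always lands in the same bucket, reading a single bucket gives a repeatable sample of the keys, and two tables bucketed the same way on a join key can in principle be joined bucket by bucket.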
…ectories)
• Statistics
• Implemented with DataNucleus ORM. Runs on Derby, MySQL, and many other relational databases

Physical Layout
• Warehouse directory in HDFS
  – e.g., /user/hive/warehouse
• Table row data stored in subdirectories of warehouse
• Partitions form subdirectories of table directories
• Actual data stored in flat files
  – Control char-delimited text, or SequenceFiles
  – With custom SerDe, can use arbitrary format

Installing Hive
From a Release Tarball:
$ wget http://archive.apache.org/dist/hadoop/hive/hive-0.5.0/hive-0.5.0-bin.tar.gz
$ tar xvzf hive-0.5.0-bin.tar.gz
$ cd hive-0.5.0-bin
$ export HIVE_HOME=$PWD
$ export PATH=$HIVE_HOME/bin:$PATH

Installing Hive
Building from Source:
$ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive
$ cd hive
$ ant package
$ cd build/dist
$ export HIVE_HOME=$PWD
$ export PATH=$HIVE_HOME/bin:$PATH

Installing Hive
Other Options:
• Use a Git Mi
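
To make the Physical Layout slide more concrete, here is a small Python sketch that builds the directory a table or partition would occupy under the warehouse directory. The table name `page_views` and the partition column `dt` are hypothetical, and the `key=value` directory naming is the convention Hive typically uses rather than something stated in the slides; treat this as an illustration of the nesting, not a description of Hive internals.

```python
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"  # example location from the slide


def table_dir(table, warehouse=WAREHOUSE_DIR):
    """Tables live in subdirectories of the warehouse directory."""
    return posixpath.join(warehouse, table)


def partition_dir(table, partition_spec, warehouse=WAREHOUSE_DIR):
    """Partitions form subdirectories of the table directory.

    partition_spec is an ordered list of (column, value) pairs; the
    key=value naming is an assumption made for this sketch.
    """
    levels = ["{}={}".format(key, value) for key, value in partition_spec]
    return posixpath.join(table_dir(table, warehouse), *levels)


print(table_dir("page_views"))
# /user/hive/warehouse/page_views
print(partition_dir("page_views", [("dt", "2010-01-01")]))
# /user/hive/warehouse/page_views/dt=2010-01-01
```

The flat data files themselves (delimited text or SequenceFiles) would then sit inside the innermost directory.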