Hive Quick Start Tutorial.pdf

ID: 34520671

Size: 577.04 KB

Pages: 36

Date: 2019-03-07

Hive Quick Start
© 2010 Cloudera, Inc.

Background
• Started at Facebook
• Data was collected by nightly cron jobs into Oracle DB
• "ETL" via hand-coded Python
• Grew from 10s of GBs (2006) to 1 TB/day of new data (2007), now 10x that

Hadoop as Enterprise Data Warehouse
• Scribe and MySQL data loaded into Hadoop HDFS
• Hadoop MapReduce jobs to process data
• Missing components:
  – Command-line interface for "end users"
  – Ad-hoc query support
    • …without writing full MapReduce jobs
  – Schema information

Hive Applications
• Log processing
• Text mining
• Document indexing
• Customer-facing business intelligence (e.g., Google Analytics)
• Predictive modeling, hypothesis testing

Hive Architecture
[Figure: Hive architecture diagram]

Data Model
• Tables
  – Typed columns (int, float, string, date, boolean)
  – Also, array/map/struct for JSON-like data
• Partitions
  – e.g., to range-partition tables by date
• Buckets
  – Hash partitions within ranges (useful for sampling, join optimization)

Column Data Types
CREATE TABLE t (s STRING, f FLOAT, a ARRAY<…>);
SELECT s, f, a[0]['foobar'].p2 FROM t;

Metastore
• Database: namespace containing a set of tables
• Holds Table/Partition definitions (column types, mappings to HDFS directories)
• Statistics
• Implemented with DataNucleus ORM. Runs on Derby, MySQL, and many other relational databases

Physical Layout
• Warehouse directory in HDFS
  – e.g., /user/hive/warehouse
• Table row data stored in subdirectories of warehouse
• Partitions form subdirectories of table directories
• Actual data stored in flat files
  – Control char-delimited text, or SequenceFiles
  – With custom SerDe, can use arbitrary format

Installing Hive
From a Release Tarball:
$ wget http://archive.apache.org/dist/hadoop/hive/hive-0.5.0/hive-0.5.0-bin.tar.gz
$ tar xvzf hive-0.5.0-bin.tar.gz
$ cd hive-0.5.0-bin
$ export HIVE_HOME=$PWD
$ export PATH=$HIVE_HOME/bin:$PATH

Installing Hive
Building from Source:
$ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive
$ cd hive
$ ant package
$ cd build/dist
$ export HIVE_HOME=$PWD
$ export PATH=$HIVE_HOME/bin:$PATH

Installing Hive
Other Options:
• Use a Git Mi
