Beyond SQL: Speeding up Spark with DataFrames
Michael Armbrust - @michaelarmbrust
March 2015, Spark Summit East

About Me and SQL

Spark SQL
Graduated from Alpha in 1.3
• Part of the core distribution since Spark 1.0 (April 2014)
• Runs SQL / HiveQL queries, optionally alongside or replacing existing Hive deployments
    SELECT COUNT(*) FROM hiveTable WHERE hive_udf(data)
• Connect existing BI tools to Spark through JDBC
• Bindings in Python, Scala, and Java

[Figure: # of Unique Contributors and # of Commits Per Month]

@michaelarmbrust
• Lead developer of Spark SQL @databricks

The not-so-secret truth...
SQL is not about SQL.

Execution Engine Performance
[Figure: TPC-DS Performance, Shark vs. Spark SQL]

The not-so-secret truth...
SQL is about more than SQL.

Spark SQL: The whole story
Creating and Running Spark Programs Faster:
• Write less code
• Read less data
• Let the optimizer do the hard work

DataFrame
noun - [dey-tuh-freym]
1. A distributed collection of rows organized into named columns.
2. An abstraction for selecting, filtering, aggregating and plotting structured data (cf. R, Pandas).
3. Archaic: Previously SchemaRDD (cf. Spark < 1.3).

Write Less Code: Input & Output
Spark SQL's Data Source API can read and write DataFrames using a variety of formats.
[Figure: logos of built-in and external data sources, including { JSON }, JDBC, and more…]

Write Less Code: High-Level Operations
Common operations can be expressed concisely as calls to the DataFrame API:
• Selecting required columns
• Joining different data sources
• Aggregation (count, sum, average, etc)
• Fi
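
To make the operations named in the deck (selecting, joining, aggregating, and the filtering mentioned in the DataFrame definition) concrete, here is a minimal single-machine sketch in plain Python. It is not Spark's DataFrame API and it is not distributed: `ToyFrame`, its methods, and the sample `users` / `events` data are all hypothetical names invented for illustration. It only shows how such operations can compose as chained calls over rows with named columns.

```python
from collections import defaultdict


class ToyFrame:
    """Hypothetical in-memory stand-in for a DataFrame: rows with named columns."""

    def __init__(self, rows):
        # Copy each row so later operations never mutate the caller's data.
        self.rows = [dict(r) for r in rows]

    def select(self, *cols):
        # Selecting required columns.
        return ToyFrame({c: r[c] for c in cols} for r in self.rows)

    def where(self, predicate):
        # Filtering: keep only rows for which the predicate holds.
        return ToyFrame(r for r in self.rows if predicate(r))

    def join(self, other, on):
        # Inner equi-join on one shared column, via a hash index on `other`.
        index = defaultdict(list)
        for r in other.rows:
            index[r[on]].append(r)
        return ToyFrame(
            {**left, **right}
            for left in self.rows
            for right in index.get(left[on], [])
        )

    def group_avg(self, key, value):
        # Aggregation: average of `value` for each distinct `key`.
        groups = defaultdict(list)
        for r in self.rows:
            groups[r[key]].append(r[value])
        return ToyFrame(
            {key: k, "avg_" + value: sum(v) / len(v)} for k, v in groups.items()
        )


# Hypothetical sample data standing in for two different data sources.
users = ToyFrame([
    {"user_id": 1, "dept": "eng"},
    {"user_id": 2, "dept": "eng"},
    {"user_id": 3, "dept": "ops"},
])
events = ToyFrame([
    {"user_id": 1, "latency_ms": 10},
    {"user_id": 2, "latency_ms": 30},
    {"user_id": 3, "latency_ms": 50},
    {"user_id": 3, "latency_ms": 70},
])

result = (
    events
    .where(lambda r: r["latency_ms"] > 0)
    .join(users, on="user_id")
    .select("dept", "latency_ms")
    .group_avg("dept", "latency_ms")
)
print(result.rows)
```

Each method returns a new `ToyFrame`, which is what lets the calls chain. A real engine such as Spark SQL would additionally plan and optimize the chain before executing it, which this sketch does not attempt.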