Spark-for-scala-meeting

ID: 40402968

Size: 435.04 KB

Pages: 20

Date: 2019-08-01

SPARK
@CrazyJvm

What is Spark
● a fast and general-purpose cluster computing system
● high efficiency
● high-level API (Scala, Java, Python)

How to run
● Local
● Standalone
● Mesos
● YARN
and an interactive shell (Scala supported)

About Scala
● JVM based
● Statically typed
● Interoperates with Java (and vice versa)
Try it in the interactive shell!

The most important concept: RDD
RDDs: resilient distributed datasets
Internally, each RDD is characterized by five main properties:
- A list of partitions
- A function for computing each split
- A list of dependencies on other RDDs
- Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
- Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)

The most important concept: RDD
● immutable collections of objects spread across a cluster
● built through parallel transformations
● automatically rebuilt on failure
● different storage levels (memory management)

Overview
● RDDs
● Transformations (lazy evaluation!!!)
● Actions (def runJob[T, U: ClassManifest](rdd: RDD[T], func: Iterator[T] => U): Array[U])

RDD: transformations & actions

Spark runtime

Components

Just do it
● interactive shell
● local mode (get local data)
● standalone mode (get data from HDFS)
● programming in an IDE (Eclipse, IDEA)

WordCount
    val text = sc.textFile("README.md")
    val wc = text.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wc.collect
Notice: reduceByKey is called through an implicit conversion:
    implicit def rddToPairRDDFunctions[K: ClassManifest, V: ClassManifest](rdd: RDD[(K, V)]) = new PairRDDFunctions(rdd)
If we call wc.cache, what will happen?

RDD Lineage
● Narrow dependency: each partition of the parent RDD is used by at most one partition of the child RDD.
● Wide dependency: multiple child partitions may depend on a partition of the parent RDD.

RDD Lineage
● optimization? -> pipeline
● the importance of co-partitioning

Task scheduler
● runs general task graphs
● pipelines functions where possible
● cache-aware data reuse and locality
● partitioning-aware to avoid shuffles

Task scheduler

Scheduling Process

Schedule process
● RDD objects
● DAGScheduler
● TaskScheduler
● Worker

RDD fault tolerance
● recovery by lineage
● checkpoint

Q&A
thanks!
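
The WordCount slide and its question about wc.cache can be tried outside the shell with a small standalone program. The following is a minimal sketch, not the speaker's code. It assumes a recent Spark release on the classpath, where the implicit conversion to PairRDDFunctions is picked up without an extra import. The object name WordCountSketch, the helper countWords, and the in-memory sample lines are hypothetical and stand in for sc.textFile("README.md").

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch of the WordCount slide, run in local mode.
object WordCountSketch {

  // Counts words in the given lines; `lines` replaces sc.textFile("README.md").
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    val text = sc.parallelize(lines)

    // Transformations are lazy: these calls only record lineage.
    val wc = text
      .flatMap(_.split(" "))   // narrow dependency
      .map((_, 1))             // narrow dependency, pipelined with flatMap
      .reduceByKey(_ + _)      // wide dependency, introduces a shuffle

    // cache() is lazy as well: it only marks wc for in-memory storage.
    wc.cache()

    // First action: runs the job and populates the cache.
    val counts = wc.collect().toMap

    // Later actions on wc can reuse the cached partitions.
    println(s"distinct words: ${wc.count()}")

    // Prints the lineage Spark would use to rebuild lost partitions.
    println(wc.toDebugString)

    counts
  }

  def main(args: Array[String]): Unit = {
    // "local[2]" runs Spark in-process with two worker threads.
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sample = Seq("spark is fast", "spark is general purpose")
      println(countWords(sc, sample))
    } finally {
      sc.stop()
    }
  }
}
```

Because cache is lazy, calling it alone triggers no computation. The data is materialized by the first action (collect here), and the toDebugString output shows the shuffle boundary that reduceByKey adds to the lineage.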
