Spark-for-scala-meeting
SPARK
@CrazyJvm

What is Spark
● a fast and general-purpose cluster computing system
● high efficiency
● high-level API (Scala, Java, Python)

How to run
● Local
● Standalone
● Mesos
● YARN
and an interactive shell (Scala supported)

About Scala
● JVM based
● Statically typed
● Interoperates with Java (and vice versa)
try it in the interactive shell!

The most important concept: RDD
RDDs: resilient distributed datasets
Internally, each RDD is characterized by five main properties:
*- A list of partitions
*- A function for computing each split
*- A list of dependencies on other RDDs
*- Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
*- Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)

The most important concept: RDD
● immutable collections of objects spread across a cluster
● built through parallel transformations
● automatically rebuilt on failure
● different storage levels (memory management)

Overview
● RDDs
● Transformations (Lazy evaluation!!!)
● Action (def runJob[T, U: ClassManifest](rdd: RDD[T], func: Iterator[T] => U): Array[U])

RDD: transformations & actions

Spark runtime

Components

Just do it
● interactive shell
● local mode (get local data)
● standalone mode (get data from hdfs)
● programming in IDE (eclipse, idea)

WordCount
val text = sc.textFile("README.md")
val wc = text.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
wc.collect

notice: reduceByKey is called by implicit conversion
implicit def rddToPairRDDFunctions[K: ClassManifest, V: ClassManifest](rdd: RDD[(K, V)]) = new PairRDDFunctions(rdd)

if we let wc.cache, what will happen?

RDD Lineage
● Narrow dependency: each partition of the parent RDD is used by at most one partition of the child RDD.
● Wide dependency: multiple child partitions may depend on a partition of the parent RDD.

RDD Lineage
● optimization? -> pipeline
● the importance of co-partitioning

Task scheduler
● run general task graphs
● pipeline functions where possible
● Cache-aware data reuse and locality
● Partitioning-aware to avoid shuffles

Task scheduler

Scheduling Process

Schedule process
● RDD objects
● DAGScheduler
● TaskScheduler
● Worker

RDD fault tolerance
● recovery by lineage
● checkpoint

Q&A

thanks!
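
Appendix: WordCount as a standalone program
The WordCount slide can also be run outside the interactive shell. The following is a minimal sketch, not the presenter's code. It assumes a recent Spark release with spark-core on the classpath, where the implicit conversion to PairRDDFunctions is picked up without an extra import. The object name WordCountSketch, the helper wordCount, and the in-memory sample lines (used in place of README.md) are hypothetical names introduced for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  // Hypothetical helper: counts words in `lines` and returns the result on the driver.
  def wordCount(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    // Stand-in for sc.textFile("README.md") so the sketch needs no input file.
    val text = sc.parallelize(lines)

    // Transformations are lazy: nothing runs until an action is called.
    val wc = text
      .flatMap(_.split(" "))   // narrow dependency
      .filter(_.nonEmpty)      // drop empty tokens from repeated spaces
      .map((_, 1))             // narrow dependency
      .reduceByKey(_ + _)      // wide dependency: requires a shuffle
      .cache()                 // mark for reuse; materialized by the first action

    // Print the lineage (chain of parent RDDs) that Spark would use for recovery.
    println(wc.toDebugString)

    // collect is an action: it triggers the job and returns the results.
    wc.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Local mode, using all available cores.
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = wordCount(sc, Seq("a b a", "b c"))
      counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
    } finally {
      sc.stop()
    }
  }
}
```

On the question "if we let wc.cache, what will happen?": cache only marks the RDD for storage. The first action computes and stores its partitions, and later actions on wc reuse them instead of recomputing the lineage from the input.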