In-Stream Big Data Processing
CC: http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
Ilya Katsov

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing are an immediate need in many practical applications. In recent years, this idea got a lot of traction, and a whole bunch of solutions like Twitter's Storm, Yahoo's S4, Cloudera's Impala, Apache Spark, and Apache Tez appeared and joined the army of Big Data and NoSQL systems. This article is an effort to explore techniques used by developers of in-stream data processing systems, trace the connections of these techniques to massive batch processing and OLTP/OLAP databases, and discuss how one unified query engine can support in-stream, batch, and OLAP processing at the same time.

At Grid Dynamics, we recently faced the need to build an in-stream data processing system that aimed to crunch about 8 billion events daily while providing fault tolerance and strict transactionality, i.e., none of these events can be lost or duplicated. This system was designed to supplement and succeed the existing Hadoop-based system, which had too high a data processing latency and too high maintenance costs. The requirements and the system itself were so generic and typical that we describe it below as a canonical model, just like an abstract problem statement. A high-level overview of the environment we worked with is shown in the figure below:

One can see that this environment is a typical Big Data installation: there is a set of applications that produce the raw data in multiple datacenters, the data is shipped by means of a Data Collection subsystem to HDFS located in the central facility, then the raw data is aggregated and analyzed using the standard Hadoop stack (MapReduce, Pig, Hive), and the aggregated results are stored in HDFS and NoSQL, imported to the OLAP database, and accessed by custom user applications. Our goal was to equip all facilities with a new in-stream engine (shown at the bottom of the figure) that processes the most intensive data flows and ships the pre-aggregated data to the central facility, thus decreasing the amount of raw data and heavy batch jobs in Hadoop. The design of the in-stream processing engi