Journal of Software (软件学报), ISSN 1000-9825, CODEN RUXUEW
E-mail: jos@iscas.ac.cn
Journal of Software, 2012, 23(1):32-45 [doi: 10.3724/SP.J.1001.2012.04091]
http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved.
Tel/Fax: +86-10-62562563

Big Data Analysis: Competition and Symbiosis of RDBMS and MapReduce

QIN Xiong-Pai (覃雄派) [1,2,+], WANG Hui-Ju (王会举) [1,2], DU Xiao-Yong (杜小勇) [1,2], WANG Shan (王珊) [1,2]

[1] MOE Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Beijing 100872, China
[2] School of Information, Renmin University of China, Beijing 100872, China
[+] Corresponding author: E-mail: qxp199@sina.com

Qin XP, Wang HJ, Du XY, Wang S. Big data analysis: Competition and symbiosis of RDBMS and MapReduce. Journal of Software, 2012, 23(1):32-45. http://www.jos.org.cn/1000-9825/4091.htm

Abstract: In many areas, such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed is growing rapidly. Parallel techniques that can be scaled out cost-effectively are needed to deal with this big data. Relational data management techniques have a history of nearly 40 years. They now face a tough scalability obstacle: relational techniques cannot easily handle large data. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and are expanding their application from Web search into territories that used to be occupied by relational database systems. They confront relational techniques with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the big deal of Web search, has begun to learn from MapReduce. MapReduce, in turn, borrows valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with each other and learn from each other; a new data analysis platform and a new data analysis ecosystem are emerging. Eventually, the two camps will each find their right place in the new ecosystem of big data analysis.

Key words: big data; deep analysis; relational data management technique; MapReduce

Abstract (translated from the Chinese 摘要): In many application areas, such as scientific research, computer simulation, Internet applications, and e-commerce, data volumes are growing extremely fast. To analyze and make use of these huge data resources, it is necessary to rely on effective data