History of massive-scale sorting experiments at Google [translation]

Uploaded: 2018-11-26


Original link: https://cloud.google.com/blog/big-data/2016/02/history-of-massive-scale-sorting-experiments-at-google

Author: Marian Dvorsky, Software Engineer, Google Cloud Platform

History of massive-scale sorting experiments at Google

Thursday, February 18, 2016

We’ve tested MapReduce by sorting large amounts of random data ever since we created the tool. We like sorting, because it’s easy to generate an arbitrary amount of data, and it’s easy to validate that the output is correct.

Even the original MapReduce paper reports a TeraSort result. Engineers run 1TB or 10TB sorts as regression tests on a regular basis, because obscure bugs tend to be more visible on a large scale. However, the real fun begins when we increase the scale even further. In this post I’ll talk about our experience with some petabyte-scale sorting experiments we did a few years ago, including what we believe to be the largest MapReduce job ever: a 50PB sort.

These days, GraySort is the large scale sorting benchmark of choice. In GraySort, you must sort at least 100TB of data (as 100-byte records with the first 10 bytes being the key), lexicographically, as fast as possible. The site sortbenchmark.org tracks official winners for this benchmark. We never entered the official competition.

MapReduce happens to be a good fit for solving this problem, because the way it implements reduce is by sorting the keys. With the appropriate (lexicographic) sharding function, the output of MapReduce is a sequence of files comprising the final sorted dataset.

Once in a while, when a new cluster in a datacenter came up (typically for use by the search indexing team), we in the MapReduce team got the opportunity to play for a few weeks before the real workload moved in. This is when we had a chance to “burn in”
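To make the sorting setup described earlier concrete (100-byte records keyed on their first 10 bytes, with a lexicographic sharding function so that the per-shard outputs concatenate into one sorted dataset), here is a minimal single-process sketch in Python. It is not Google's MapReduce code: the function names, the in-memory lists standing in for output files, and the choice to shard on the first two key bytes are all assumptions made for illustration.

```python
import random

RECORD_SIZE = 100  # bytes per record, as in the GraySort description
KEY_SIZE = 10      # the first 10 bytes of each record form the key


def generate_records(count, seed=0):
    """Generate `count` random 100-byte records (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        bytes(rng.getrandbits(8) for _ in range(RECORD_SIZE))
        for _ in range(count)
    ]


def shard_for_key(key, num_shards):
    """Range-partition a key so that shard order follows key order.

    Assumption: keys are uniformly random, so splitting the space of the
    first two key bytes into equal ranges gives roughly balanced shards.
    """
    prefix = int.from_bytes(key[:2], "big")  # value in 0..65535
    return prefix * num_shards // 65536


def sharded_sort(records, num_shards):
    """Route each record to a shard, then sort every shard by key.

    Each inner list stands in for one output file of the job.
    """
    shards = [[] for _ in range(num_shards)]
    for record in records:
        shards[shard_for_key(record[:KEY_SIZE], num_shards)].append(record)
    return [sorted(shard, key=lambda r: r[:KEY_SIZE]) for shard in shards]


def is_sorted(records):
    """Validate that records are in non-decreasing key order."""
    return all(
        a[:KEY_SIZE] <= b[:KEY_SIZE] for a, b in zip(records, records[1:])
    )
```

Because Python compares `bytes` values lexicographically and the shard index never decreases as the key grows, reading the shards in order yields the fully sorted sequence. Calling `sharded_sort(generate_records(1000), 8)` and passing the concatenated shards to `is_sorted` also shows the easy output validation the post mentions.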
