欢迎来到天天文库
浏览记录
ID:57924586
大小:969.71 KB
页数:10页
时间:2020-04-14
《基于指令级并行的倒排索引压缩算法-论文.pdf》由会员上传分享,免费在线阅读,更多相关内容在行业资料-天天文库。
1、计算机研究与发展DOI:10.7544/issn1000—1239.2015.20131548JournalofComputerResearchandDevelopment52(5):995—1004,2015基于指令级并行的倒排索引压缩算法闫宏飞张旭东单栋栋。毛先领。赵鑫(北京大学网络与信息系统研究所北京’100871)。(淘宝(中国)软件有限公司杭州312000)。(北京理工大学北京100081)(yhf@net.pku.edu.cn)SIMD—BasedInvertedIndexCompressionAlgorithmsYanHongfei,ZhangX
2、udong,ShanDongdong。,MaoXianling。,andZhaoXin(InstituteofNetworkComputingandInformationSystems,PekingUniversity,Beijing100871)(Taobao(China)SoftwareCo.,Ltd,Hangzhou312000)。(BeijingInstituteofTechnology,Beijing100081)AbstractTherapidgrowthoftextinformaUonhasbroughtaboutnewchallengestot
3、raditionalinformationretrieva1.Inlargesearchengines,indexingisrequiredtohelpusersacquireimportantdatatheyneed,andtechniquesofinvertedindexhavegreatinfluenceontheefficiencyofqueryprocessinginsuchsystems.Thedataininvertedindexisstoredintheformofarraysofintegers,andtechniquesofcompress
4、ionarerequiredtoreducethecostofstoringsuchdataindisksandmemory,aswellastoboostthehitrateofCPUcacheandspeeduptransferringdata.Therefore,itisnecessarytochooseahighlyefficientcompressionalgorithmtoprocessqueryeffectively.Inthispaper,weproposetwoinstruction—level—parallelizedalgorithms,
5、i.e.SIMD-PBandSIMD—PFD,whichimprovetwocompetitivecompressionalgorithmsrespectively,i.e.PackedBinaryandPForDelta,andexploitSIMDinstructionstoacceleratethePackandUnpackprocedureinthealgorithms.ExperimentsbasedonpublicdatasetsofGOV2andClueWeb09Bshowthatournovelalgorithmshavegoodperform
6、anceonencodinganddecodingspeedwithoutimpairingthecompressionratio,andoutperformtheformerfastestinvertedlistcompressionalgorithmsbyatmost17,withrespecttodecompressionspeed.Furthermore。experimentsindicatethatournovelalgorithmshavebetterperformanceonIongerposting1istandlargerblocksizew
7、.r.t.decodingspeed.Keywordssingleinstructionmultipledata(SIMD);invertedindex;compression;integerencoding;informationretrieval摘要文本信息数量的快速增长给传统的信息检索技术带来了新的挑战.搜索引擎通常使用倒排索引来高效地处理查询.为了减少存储开销和加快访问速度,倒排索引通常被压缩存储.因此,如何选择一个高性能的压缩算法对高效查询处理是非常有必要的.在已有倒排链压缩算法PackedBinary和PForDelta的基础上,利用CPU的超标量
8、特性和SIMD向量指令集,将其压缩和解
此文档下载收益归作者所有