资源描述:
《Very Sparse Random Projections.pdf》由会员上传分享,免费在线阅读,更多相关内容在学术论文-天天文库。
1、VerySparseRandomProjectionsPingLiTrevorJ.HastieKennethW.ChurchDepartmentofStatisticsDepartmentofStatisticsMicrosoftResearchStanfordUniversityStanfordUniversityMicrosoftCorporationStanfordCA94305,USAStanfordCA94305,USARedmondWA98052,USApingli@stat.stanford.eduhastie@stanfor
2、d.educhurch@microsoft.comT2ABSTRACTbecomputedasAA,atthecostoftimeO(nD),whichisoftenprohibitiveforlargenandD,inmoderndataminingTherehasbeenconsiderableinterestinrandomprojections,andinformationretrievalapplications.anapproximatealgorithmforestimatingdistancesbetweenTospeedu
3、pthecomputations,onecangeneratearan-pairsofpointsinahigh-dimensionalvectorspace.LetDknDdomprojectionmatrixR2RandmultiplyitwiththeA2RbeournpointsinDdimensions.ThemethodnDDkoriginalmatrixA2RtoobtainaprojecteddatamatrixmultipliesAbyarandommatrixR2R,reducingtheDdimensionsd
4、owntojustkforspeedingupthecompu-1nktation.RtypicallyconsistsofentriesofstandardnormalB=pAR2R;kmin(n;D):(1)N(0;1).Itiswellknownthatrandomprojectionspreservekpairwisedistances(intheexpectation).AchlioptasproposedThe(muchsmaller)matrixBpreservesallpairwisedis-sparserandompr
5、ojectionsbyreplacingtheN(0;1)entries121tancesofAinexpectations,providedthatRconsistsofinRwithentriesinf 1;0;1gwithprobabilitiesf;;g,636i.i.d.entrieswithzeromeanandconstantvariance.Thus,achievingathreefoldspeedupinprocessingtime.wecanachieveasubstantialcostreductionforcompu
6、tingWerecommendusingRofentriesinf 1;0;1gwithprob-pAAT,fromO(n2D)toO(nDk+n2k).111abilitiesfp;1 p;pgforachievingasignicantD-2DD2DIninformationretrieval,weoftendonothavetomateri-foldspeedup,withlittlelossinaccuracy.TalizeAA.Instead,databasesandsearchenginesareinter-estedinst
7、oringtheprojecteddataBinmainmemoryforCategoriesandSubjectDescriptorsecientlyrespondingtoinputqueries.WhiletheoriginalH.2.8[DatabaseApplications]:DataMiningdatamatrixAisoftentoolarge,theprojecteddatamatrixBcanbesmallenoughtoresideinthemainmemory.DkTheentriesofR(denotedbyfr
8、jigj=1i=1)shouldbei.i.d.GeneralTermswithzeromean.Infact,thisistheonlynecessarycondi-Algor