Mining of Massive Datasets

Anand Rajaraman
Jure Leskovec
Stanford Univ.
Jeffrey D. Ullman
Stanford Univ.

Copyright © 2010, 2011, 2012, 2013 Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman

Preface

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled “Web Mining,” was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates. When Jure Leskovec joined the Stanford faculty, we reorganized the material considerably. He introduced a new course CS224W on network analysis and added material to CS345A, which was renumbered CS246. The three authors also introduced a large-scale data-mining project course, CS341. The book now contains material taught in all three courses.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort. The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.
7. Two key problems for Web applications: managing advertising and recommendation systems.
8. Algorithms for analyzing and mining the structure of very large graphs, especially social-network graphs.

Prerequisites

To appreciate fully the material in this book, we recommend the following pre