Mining of Massive Datasets

Anand Rajaraman
Jure Leskovec, Stanford Univ.
Jeffrey D. Ullman, Stanford Univ.

Copyright © 2010, 2011, 2012, 2013 Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman

Preface

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled “Web Mining,” was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates. When Jure Leskovec joined the Stanford faculty, we reorganized the material considerably. He introduced a new course CS224W on network analysis and added material to CS345A, which was renumbered CS246. The three authors also introduced a large-scale data-mining project course, CS341. The book now contains material taught in all three courses.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort. The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.
7. Two key problems for Web applications: managing advertising and recommendation systems.
8. Algorithms for analyzing and mining the structure of very large graphs, especially social-network graphs.

Prerequisites

To appreciate fully the material in this book, we recommend the following pre