www.semtrain.com.cn www.semtrain.cn

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Sergey Brin and Lawrence Page
Computer Science Department, Stanford University, Stanford, CA 94305, USA
sergey@cs.stanford.edu and page@cs.stanford.edu

Abstract

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/

To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date.

Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

Keywords

World Wide Web, Search Engines, Information Retrieval, PageRank, Google

1. Introduction

(Note: There are two versions of this paper -- a longer full version and a shorter printed version. The full version is available on the web and the conference CD-ROM.)

The web creates new challenges for information retrieval. The amount of information on the web is growing rapidly, as well as the number of new users inexperienced in the art of web research. People are likely t