What Snippets Say About Pages in Federated Web Search

Thomas Demeester1, Dong Nguyen2, Dolf Trieschnigg2, Chris Develder1, and Djoerd Hiemstra2

1 Ghent University, Ghent, Belgium
{tdmeeste,cdvelder}@intec.ugent.be
2 University of Twente, Enschede, The Netherlands
{d.nguyen,d.trieschnigg,d.hiemstra}@utwente.nl

Abstract. What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new federated IR test collection that contains search results from over a hundred search engines on the internet, we are able to investigate such research questions from a global perspective. Our test collection covers the main Web search engines like Google, Yahoo!, and Bing, as well as a number of smaller search engines dedicated to multimedia, shopping, etc., and as such reflects a realistic Web environment. Using a large set of relevance assessments, we are able to investigate the connection between snippet quality and page relevance. The dataset is strongly inhomogeneous, and although the assessors’ consistency is shown to be satisfying, care is required when comparing resources. To this end, a number of probabilistic quantities, based on snippet and page relevance, are introduced and evaluated.

Keywords: Web search, test collection, relevance judgments, federated information retrieval, evaluation, snippet.

1 Introduction

Finding our way around among the vast quantities of data on the Web would be unthinkable without the use of Web search engines. Apart from a limited number of very large search engines that constantly crawl the Web for publicly available data, a large amount of smaller and more focused search engines exist, specialized in specific information goals or data types (e.g., online shopping, news, multimedia, social media). The goal of Federated Information Retrieval (FIR) [1] is to combine multiple existing search engines into a single search system. With the wide variety of existing resources, including those that are not directly accessible by Web crawlers, federated search on the Web has an enormous potential, but is a huge research challenge all the same. A number of FIR research collections have been created in the past, but they are mostly artificial and do not represent the heterogeneous Web environment, i.e., search engines wi