Querying Structured Data in Deep Links (40 pages; uploaded 2019-10-09)
Deep Web Integration: Querying Structured Data on the Deep Web
Fangjiao Jiang

Outline
Background
Access the Deep Web
MetaQuerier
Metasearch engine vs. MetaQuerier
Related research groups
Conclusion… Some suggestions

Part 1: Background

The previous Web: things are just on the surface.

The current Web: getting "deeper". A large amount of data is hidden behind query forms.

The problem of accessing data on the Deep Web
"Deep" = not accessible through traditional search engines.

Why is it important?
More than 10 million distinct forms.

Why is it important?
Up to 5,000 billion dynamic result pages.

Why is it important? Google's recent survey [CIDR 2007]
If there are 1 billion web pages, there are 25 million potential Deep Web sources.

Challenge: how can we enable effective access to the Deep Web?
Example: Cars.com

Part 2: Access the Deep Web

Three different manners:
(1) Warehouse-like approach: Web databases … are collected into a repository, and queries are run against the repository.
(2) MetaQuerier: Web databases are queried through an integrated query interface.
(3) Surfacing the Deep Web: 1) pre-compute appropriate queries over the forms; 2) insert the resulting pages into a web-search index.

(1) Warehouse-like approach
Sources in the diagram: Chinese Journal Full-text Database (中文期刊全文数据库), National Natural Science Foundation Information Database (国家自然基金信息库), … and other Web databases; PDF, PS, and DOC files; journal homepages, author homepages, and conference homepages.

(2) MetaQuerier
Back-end: semantics discovery. Front-end: query execution.
Components: Database Crawler, Interface Extraction, Source Clustering, Schema Matching, Interface Integration, Source Selection, Query Translation, Result Processing.
Deep Web Repository: query interfaces, query capabilities, subject domains, unified interfaces.
Tasks: find Web databases; query Web databases.
MetaQuerier is what we focus on.

(3) Surfacing the Deep Web [VLDB'08]
Viewpoint: many domains and many languages; no human in the loop, no site-specific scripts.
Main idea: predicting input values for text boxes; predicting input combinations.
Google's Deep-Web crawling system: affects more than 1,000 queries per second; enables access to more than a million Deep-Web sites; spans 50+ languages and 100+ domains.

Part 3: MetaQuerier

A Survey on the Deep Web [SIGMOD 2006]
How many Deep-Web sources are out there? 307,000 sites, 450,000 databases, 1,258,000 interfaces.
How structured is the Deep Web? 348,000 (structured) : 102,000 (text) ≈ 3:1.
How do search engines cover them? covered 10% sou…
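The warehouse-like approach in Part 2 copies data from many sources into one repository and answers queries from that local copy. A minimal sketch of the idea using Python's built-in `sqlite3` module; the source names, records, and the `docs` table layout are hypothetical, and the crawling step that would produce the records is assumed to have already happened.

```python
import sqlite3

def build_warehouse(sources):
    """Copy every source's records into one local table (warehouse-like approach).

    sources maps a hypothetical source name to a list of (title, author, kind)
    tuples, standing in for data that has already been crawled.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (source TEXT, title TEXT, author TEXT, kind TEXT)")
    for name, records in sources.items():
        conn.executemany(
            "INSERT INTO docs VALUES (?, ?, ?, ?)",
            [(name, *record) for record in records],
        )
    conn.commit()
    return conn

# Hypothetical crawled records from two kinds of sources.
crawled = {
    "journal_db": [("Deep Web crawling", "Author A", "PDF"),
                   ("Schema matching", "Author B", "PDF")],
    "author_homepage": [("Deep Web crawling", "Author A", "DOC")],
}
wh = build_warehouse(crawled)

# The query runs against the local copy, not against the live sources.
rows = wh.execute(
    "SELECT source, kind FROM docs WHERE title = ? ORDER BY source",
    ("Deep Web crawling",),
).fetchall()
print(rows)  # [('author_homepage', 'DOC'), ('journal_db', 'PDF')]
```

Queries against the repository are fast and uniform, but the answers are only as fresh as the last crawl; that is the usual trade-off against querying live sources at run time.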
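Source Clustering in the MetaQuerier back-end groups query interfaces into subject domains. As a simple stand-in (not the MetaQuerier algorithm), the sketch below clusters interfaces greedily by the Jaccard overlap of their attribute names; the interface data and the `threshold=0.3` value are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of attribute names (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_interfaces(interfaces, threshold=0.3):
    """Greedy single-pass clustering of query interfaces by attribute overlap.

    interfaces maps a source name to its set of normalized attribute names.
    Each source joins the first cluster whose pooled attributes reach the
    (hypothetical) threshold; otherwise it starts a new cluster.
    """
    clusters = []  # list of (member names, pooled attribute set)
    for name, attrs in interfaces.items():
        for members, pooled in clusters:
            if jaccard(attrs, pooled) >= threshold:
                members.append(name)
                pooled |= attrs  # in-place set update, so the cluster's pool grows
                break
        else:  # no existing cluster was similar enough
            clusters.append(([name], set(attrs)))
    return [members for members, _ in clusters]

# Hypothetical interfaces from two subject domains.
interfaces = {
    "books_a": {"title", "author", "isbn", "publisher"},
    "books_b": {"title", "author", "keyword"},
    "cars_a": {"make", "model", "price", "zip"},
    "cars_b": {"make", "model", "year"},
}
print(cluster_interfaces(interfaces))  # [['books_a', 'books_b'], ['cars_a', 'cars_b']]
```

Because the pass is greedy, the clusters can depend on the order in which sources are visited; a real system would need something more robust.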
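Query Translation in the MetaQuerier front-end rewrites a query posed on the unified interface into each source's own form fields, using attribute correspondences found by schema matching. A minimal sketch, assuming those correspondences are already available as a plain dictionary; the attribute and field names are hypothetical. Constraints the source cannot express are returned separately so that result processing could apply them afterwards (an assumption about the division of work, not something the slides specify).

```python
def translate_query(unified_query, field_map):
    """Rewrite a unified-interface query into one source's form fields.

    field_map holds hypothetical correspondences (unified attribute -> source
    field name), as schema matching might produce them.
    Returns (source query, constraints the source cannot express).
    """
    source_query = {}
    unsupported = {}
    for attr, value in unified_query.items():
        field = field_map.get(attr)
        if field is None:
            unsupported[attr] = value  # left for filtering after results return
        else:
            source_query[field] = value
    return source_query, unsupported

book_source = {"title": "ti", "author": "au"}  # hypothetical source schema
q = {"title": "deep web", "author": "smith", "year": "2004"}
print(translate_query(q, book_source))
# ({'ti': 'deep web', 'au': 'smith'}, {'year': '2004'})
```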
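Surfacing, step 1) in Part 2, pre-computes queries over a form so the result pages can later be indexed. A minimal sketch of generating those queries, assuming a GET form whose select menus have known option values; the form URL `http://example.com/search` and the field names `make` and `type` are hypothetical, and fetching and indexing the pages is left out.

```python
from itertools import product
from urllib.parse import urlencode

def surface_urls(action, inputs):
    """Return one GET URL per combination of input values.

    action is the form's target URL and inputs maps each field name to the
    values to try; both are hypothetical here.
    """
    names = sorted(inputs)  # fixed field order keeps the URLs reproducible
    urls = []
    for values in product(*(inputs[name] for name in names)):
        urls.append(action + "?" + urlencode(list(zip(names, values))))
    return urls

form_inputs = {"make": ["ford", "honda"], "type": ["new", "used"]}
for url in surface_urls("http://example.com/search", form_inputs):
    print(url)
# first line printed: http://example.com/search?make=ford&type=new
```

The number of URLs is the product of the option counts (here 2 × 2 = 4), so it grows quickly as inputs are added; this is why the VLDB'08 system predicts useful input combinations instead of enumerating all of them.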
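"Predicting input combinations" in the Surfacing work can be approximated by probing a combination and checking whether its result pages actually differ. A sketch under stated assumptions: a page's signature is simply a SHA-1 hash of its whitespace-normalized text, and a combination counts as informative if distinct signatures make up at least a hypothetical `min_ratio=0.25` of the probes. This follows the general idea only; it is not Google's implementation.

```python
import hashlib

def signature(page_text):
    """Crude page signature: SHA-1 of the whitespace-normalized text."""
    normalized = " ".join(page_text.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def is_informative(result_pages, min_ratio=0.25):
    """True if the fraction of distinct result pages reaches min_ratio."""
    if not result_pages:
        return False
    distinct = {signature(page) for page in result_pages}
    return len(distinct) / len(result_pages) >= min_ratio

# A combination whose probes always return the same page is not worth surfacing.
print(is_informative(["No results found."] * 8))  # False (1 distinct of 8)
pages = [f"Listings near {z}" for z in ("10001", "60601", "94105", "73301")]
print(is_informative(pages))  # True (4 distinct of 4)
```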
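The ratios behind the survey figures, worked out from the slides' own numbers (the "3:1" is a rounding of about 3.4:1):

```latex
\[
\frac{348{,}000}{102{,}000} \approx 3.41 \approx 3{:}1, \qquad
\frac{450{,}000\ \text{DBs}}{307{,}000\ \text{sites}} \approx 1.47\ \text{DBs per site}, \qquad
\frac{1{,}258{,}000\ \text{interfaces}}{450{,}000\ \text{DBs}} \approx 2.80\ \text{interfaces per DB}
\]
\[
\frac{25 \times 10^{6}\ \text{Deep Web sources}}{10^{9}\ \text{web pages}} = 2.5\%
\]
```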