Ogmios: a scalable NLP platform for annotating large web document collections

Thierry Hamon, Julien Derivière, Adeline Nazarenko
LIPN – UMR 7030 CNRS – Université Paris 13
99 av. J.B. Clément, F-93430 Villetaneuse, FRANCE
Tél.: 33 1 49 40 28 32, Fax.: 33 1 48 26 07 12
firstname.lastname@lipn.univ-paris13.fr

1 Introduction

Search engines like Google or Yahoo offer access to billions of textual web pages. These tools are very popular and seem to be sufficient for a large number of general user queries on the Internet. However, some other queries are more complex, requiring specific knowledge or processing strategies: no really satisfactory solution exists for these requests. There is thus a need for more specific search engines dedicated to specialised domains or users.

Considering the case of text mining in Microbiology, for example, it is clear that one needs more than existing search engines, given the specificity and the reliability of the information that is sought by scientists. Even if recent developments in biology and biomedicine are reported in large bibliographical databases (e.g. Flybase, which specialises in Drosophila melanogaster, or Medline), such databases and the associated searching functionalities are not sufficient to satisfy biologists' specific information needs, such as finding information on gene interactions in order to progressively figure out a whole interaction network. We previously argued that looking for this kind of relational information requires a domain-specific linguistic analysis and parsing of the documents (Alphonse et al., 2004).

The ALVIS project aims at developing an open source search engine with extended semantic search facilities. Compared to state-of-the-art search engines (like Google, the most popular one), the ALVIS search engine is domain specific. It relies on a specialised crawler, which selects the web pages on terminological grounds. Indexing exploits various types of linguistic and domain-specific annotation (cf. figure 1). A dedicated interface helps users to refine queries and analyse the content of the retrieved documents. The ALVIS search engine processes the query more accurately, taking into account the topic and the context of search to refine both the query and the document analysis.

This paper focuses on the design and the development of the text processing platform, Ogmios, w