A Class Library for the Integration of NLP Tools: Definition and implementation of an Abstract Data Type Collection for the manipulation of SGML documents in a context of stand-off linguistic annotation

X. Artola, A. Díaz de Ilarraza, N. Ezeiza, K. Gojenola, G. Hernández, A. Soroa
Faculty of Computer Science
University of the Basque Country (UPV/EHU)
649 p.k., 20080 Donostia (The Basque Country)
jiparzux@si.ehu.es

Abstract

In this paper we present a program library conceived and implemented to represent and manipulate the information exchanged in the process of integration of NLP tools. It is currently used to integrate the tools developed for Basque processing during the last ten years at our research group. In our opinion, the program library is general enough to be used in similar processes of integration of NLP tools or in the design of new applications built on them. The program library constitutes a class library that provides the programmer with the elements s/he needs when manipulating SGML documents in a context of stand-off linguistic annotation, where linguistic analyses obtained at different phases (morphology, lemmatization, processing of multiword lexical units, surface syntax, and so on) are represented by well-defined typed feature structures. Due to the complexity of the information to be exchanged among the different tools, feature structures (FS) are used to represent it. Feature structures provide us with a well-formalized basis for the exchange of linguistic information among the different text analysis tools. Feature structures are coded in SGML following the TEI’s DTD for FSs, and Feature-System Declarations (FSD) have been thoroughly specified. So, TEI-P3 conformant feature structures constitute the representation schema for the different documents that convey the information from one linguistic tool to the next in the language processing chain. The tools integrated so far are a lexical database, a tokenizer, a wide-coverage morphosyntactic analyzer, a general purpose tagger/lemmatizer and a shallow syntactic parser. The type of information contained in the documents exchanged among these tools has been analyzed and characterized using a set of Abstract Data Types.

1. Introduction

In this

1. EDBL, a lexical database, which at the moment contains more than 80,000 entries (Aldezabal et al.,