A comparative evaluation of modern English corpus grammatical annotation schemes

Eric Atwell, George Demetriou, John Hughes, Amanda Schiffrin, Clive Souter and Sean Wilcock
Centre for Computer Analysis of Language and Speech (CCALAS)
ICAME Journal No. 24

1 Introduction

Many English Corpus Linguistics projects reported in ICAME Journal and elsewhere involve grammatical analysis or tagging of English texts (eg Atwell 1983, Leech et al 1983, Booth 1985, Owen 1987, Souter 1989a, O’Donoghue 1991, Belmore 1991, Kytö and Voutilainen 1995, Aarts 1996, Qiao and Huang 1998). Each new project has to review existing tagging schemes, and decide which to adopt and/or adapt. The AMALGAM project can help in this decision, by providing descriptions and analyses of a range of tagging schemes, and an internet-based service for researchers to try out the range of tagging schemes on their own data.

The project AMALGAM (Automatic Mapping Among Lexico-Grammatical Annotation Models) explored a range of Part-of-Speech tagsets and phrase structure parsing schemes used in modern English corpus-based research. The PoS-tagging schemes include: Brown (Greene and Rubin 1981), LOB (Atwell 1982, Johansson et al 1986), Parts (man 1986), SEC (Taylor and Knowles 1988), POW (Souter 1989b), UPenn (Santorini 1990), LLC (Eeg-Olofsson 1991), ICE (Greenbaum 1993), and BNC (Garside 1996). The parsing schemes include some which have been used for hand annotation of corpora or manual post-editing of automatic parsers, and others which are unedited output of a parsing program. Project deliverables include:

– a detailed description of each PoS-tagging scheme, at a comparable level of detail. This includes a list of PoS-tags with descriptions and example uses from the source Corpus. The description of the use of PoS-tags is also illustrated in a multi-tagged corpus: a set of sample texts PoS-tagged in parallel with each PoS-tagset (and proofread by experts), for comparative studies
– an analysis of the different lexical tokenization rules used in the source Corpora, to arrive at a ‘Corpus-neutral’ tokenization scheme (and consequent adjustments to the PoS-tagsets in our study to accept modified tokenization)
– an implementation of each PoS-tagset in conjunction with our standardised tokenizer, as a family of PoS-taggers, one for each PoS-tagset
– a method