Unit 4 Corpus annotation

4.1 Introduction

Corpus annotation is closely related to corpus markup. One important reason for using corpora in linguistic research is to extract the linguistic information present in those corpora. Often, however, such information can be extracted from a corpus only after a linguistic analysis has been encoded in it. The process of ‘adding such interpretative, linguistic information to an electronic corpus of spoken and/or written language data’ is referred to as corpus annotation (Leech 1997a: 2). Corpus annotation adds value to a corpus because it considerably extends the range of research questions that the corpus can readily address.

In a broad sense, corpus annotation may refer to the encoding of both textual/contextual information and interpretative linguistic analysis, and the literature often conflates the two. Here, however, the term is used in a narrow sense. It refers solely to the encoding of linguistic analyses in a corpus text, such as part-of-speech (POS) tagging and syntactic parsing. Annotation in this narrow sense is fundamentally distinct from corpus markup as discussed in unit 3. Corpus markup provides relatively objectively verifiable information about the components of a corpus and the textual structure of each text. Corpus annotation, in contrast, is concerned with interpretative linguistic information. ‘By calling annotation “interpretative”, we signal that annotation is, at least in some degree, the product of the human mind’s understanding of the text’ (Leech 1997a: 2). For example, the part of speech of a word may be ambiguous, so it is more readily treated as corpus annotation than as corpus markup. The sex of a speaker or writer, on the other hand, is normally objectively verifiable and is therefore a matter of markup, not annotation.

This unit will first discuss the advantages and disadvantages of corpus annotation, and then how corpus annotation is achieved. Next, we will introduce the most commonly used types of corpus annotation. Finally, we will briefly review stand-alone corpus annotation, as proposed by the Corpus Encoding Standard (CES; see unit 3.3).

4.2 Corpus annotation = added value

Like corpus markup, annotation adds value to a corpus. Leech (1997a: 2) maintains that corpus annotation