A survey of named entity recognition and classification

David Nadeau, Satoshi Sekine
National Research Council Canada / New York University

Introduction

The term “Named Entity”, now widely used in Natural Language Processing, was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim 1996). At that time, MUC was focusing on Information Extraction (IE) tasks where structured information of company activities and defense related activities is extracted from unstructured text, such as newspaper articles. In defining the task, people noticed that it is essential to recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions. Identifying references to these entities in text was recognized as one of the important sub-tasks of IE and was called “Named Entity Recognition and Classification (NERC)”.

We present here a survey of fifteen years of research in the NERC field, from 1991 to 2006. While early systems were making use of handcrafted rule-based algorithms, modern systems most often resort to machine learning techniques. We survey these techniques as well as other critical aspects of NERC such as features and evaluation methods. It was indeed concluded in a recent conference that the choice of features is at least as important as the choice of technique for obtaining a good NERC system (E. Tjong Kim Sang & De Meulder 2003). Moreover, the way NERC systems are evaluated and compared is essential to progress in the field. To the best of our knowledge, NERC features, techniques, and evaluation methods have not been surveyed extensively yet.

The first section of this survey presents some observations on published work from the point of view of activity per year, supported languages, preferred textual genre and domain, and supported entity types. It was collected from the review of a hundred English language papers sampled from the major conferences and journals. We do not claim this review to be exhaustive or representative of all the research in all languages, but we believe it gives a good feel for the breadth and depth of previous work. Section 2 covers the algorithmic techniques that were proposed for addressing the NERC task. Most techniques are borrowed from the Machine Learning (ML) field. Inst