Using the Distribution of Performance for Studying Statistical NLP Systems and Corpora

Yuval Krymolowski
Department of Mathematics and Computer Science
Bar-Ilan University
52900 Ramat Gan, Israel

arXiv:cs/0106043v1 [cs.CL] 20 Jun 2001

Abstract

Statistical NLP systems are frequently evaluated and compared on the basis of their performances on a single split of training and test data. Results obtained using a single split are, however, subject to sampling noise. In this paper we argue in favour of reporting a distribution of performance figures, obtained by resampling the training data, rather than a single number. The additional information from distributions can be used to make statistically quantified statements about differences across parameter settings, systems, and corpora.

1 Introduction

The common practice in evaluating statistical NLP systems is using a standard corpus (e.g., Penn TreeBank for parsing, Reuters for text categorization) along with a standard split between training and test data. As systems improve, it becomes harder to achieve additional improvements, and the performance of various state-of-the-art systems is approximately identical. This makes performance compar-

[...]

classifier A better than classifier B for given training and test data?

Q2: Adequacy of training data to test data: Is a system trained on dataset X adequate for analysing dataset Y? Are features from X indicative in Y?

Q3: Comparing datasets with a given system: If a different training set improves the result of system A on dataset Y1, will this be the case on dataset Y2 as well?

The answers to these questions can provide useful insight into statistical NLP systems, in particular about sensitivity to features in the training data, and transferability. These properties can be different even when similar performance is reported.

A statistical treatment of Question 1 is presented by Yeh (2000). He tests for the significance of performance differences on fixed training and test datasets. In other related works, Martin and Hirschberg (1996) provide an overview of significance tests of error differences in small samples, and Dietterich (1998) discusses results of a number of tests.

Questions 2 and 3 have been frequently raised in NLP, but not explicitly addressed, since the prevailing evaluation methods provide no means of addres