A Language Modelling Tool for Statistical NLP

Daniel Bastos Pereira, Ivandré Paraboni

Escola de Artes, Ciências e Humanidades – Universidade de São Paulo (EACH/USP)
Av. Arlindo Bettio, 1000 - 03828-000, São Paulo, Brazil.
{daniel.bastos,ivandre}@usp.br

Abstract. In recent years the use of statistical language models (SLMs) has become widespread in most NLP fields. In this work we introduce jNina, a basic language modelling tool to aid the development of Machine Translation systems and many other text-generating applications. The tool allows for the quick comparison of multiple text outputs (e.g., alternative translations of a single source) based on a given SLM, and enables the user to build and evaluate her own SLMs from any corpora provided.

1. Introduction

Consider a Portuguese native speaker who wants to say (in English) that she is thinking of buying a new car, but who is not sure about the verb choice. In that case, a reasonable (and helplessly wrong) guess would be to mimic the equivalent Portuguese structure “Eu estou querendo comprar um carro novo” to produce (a) instead of the correct form (b):

a. I am wanting to buy a new car.
b. I am thinking of buying a new car / I am considering buying a new car.

What actually makes (b) more appropriate than (a) is immaterial to the present discussion, but there is one thing that we can safely assume to be true: very frequently we hear “I am” being followed by verb forms such as “thinking” or “considering”, but very seldom do we hear “I am” being followed by “wanting”, even though “wanting” is a proper verb form in its own right.

The simple idea that some word sequences are more frequent than others has led to the concept of language modelling that is now part of mainstream NLP research. In particular, Statistical Language Models (SLMs) based on n-grams are now widely used in most NLP fields, not to mention the large amount of work on Statistical MT in which they play a central role (e.g., Brown et al., 1990; 1993). Besides their power to represent some crucial aspects of language, SLMs are quick and easy to implement, and do not require the labour-intensive development of NLP resources such as dictionaries or grammars, making them ideal for research in relatively resource-poor languages such as Portuguese.

The uses of SLMs stretch well beyond the design of components of NLP applications. For example, Machine Tra