Spell Correction & Machine Translation

Content
  Spell Correction
  Machine Translation Introduction
  Statistical Machine Translation: IBM Models
  Phrase-Based Translation Models

Spell Correction Introduction

Given a word, we are trying to choose the most likely spelling correction for that word (the "correction" may be the original word itself). There is no way to know for sure (for example, should "lates" be corrected to "late" or "latest"?), which suggests we use probabilities. We will say that we are trying to find the correction c, out of all possible corrections, that maximizes the probability of c given the original word w:

  argmax_c P(c | w)

By Bayes' Theorem this is equivalent to:

  argmax_c P(w | c) P(c) / P(w)

Since P(w) is the same for every possible c, we can ignore it, giving:

  argmax_c P(w | c) P(c)

There are three parts of this expression.
  P(c), the probability that a proposed correction c stands on its own. This is called the language model. So P("the") would have a relatively high probability, while P("zxzxzxzyyy") would be near zero.
  P(w | c), the probability that w would be typed in a text when the author meant c. This is the error model.
  argmax_c, the control mechanism, which says to enumerate all feasible values of c, and then choose the one that gives the best combined probability score.

How to work out P(c)
  1. We will read a big text file, big.txt, which consists of about a million words.
  2. Extract the individual words from the file.
  3. Train a probability model, which is a fancy way of saying we count how many times each word occurs.

Enumerating the possible corrections c of a given word w
  Edit distance: the number of edits it would take to turn one word into the other.
  The literature on spelling correction claims that 80 to 95% of spelling errors are at an edit distance of 1 from the intended word.
  For a word of length n, there will be n deletions, n-1 transpositions, 26n alterations, and 26(n+1) insertions, for a total of 54n+25 (of which a few are typically duplicates).

How to work out P(w | c)
  Mistaking one vowel for another is more probable than mistaking two consonants; making an error on the first letter of a word is less probable, etc.
  We define a trivial model that says all known words of edit distance 1 are infinitely more probable than known words of edit distance 2, and infinitely less probable than a known word of edit distance 0.
  The functio
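The steps described in this section (counting words to estimate P(c), enumerating candidates at edit distance 1, and the trivial error model) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the slides' own code: the names SAMPLE_TEXT, words, train, prob, edits1, known and correct are chosen here for illustration, and a one-sentence sample string stands in for big.txt so the example runs without any file.

```python
import re
from collections import Counter

# Assumption: a tiny stand-in corpus. The slides use big.txt (about a million words).
SAMPLE_TEXT = "the late train was the latest of the late trains to arrive"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def words(text):
    """Extract the individual lowercase words from raw text."""
    return re.findall(r"[a-z]+", text.lower())

def train(text):
    """Language model: count how many times each word occurs."""
    return Counter(words(text))

def prob(word, counts):
    """Estimate P(c) as the relative frequency of the word in the corpus."""
    return counts[word] / sum(counts.values())

def edits1(word):
    """All strings one edit away from word (duplicates kept, so the
    list length is exactly 54n + 25 for a word of length n)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    alterations = [a + ch + b[1:] for a, b in splits if b for ch in ALPHABET]
    inserts = [a + ch + b for a, b in splits for ch in ALPHABET]
    return deletes + transposes + alterations + inserts

def known(candidates, counts):
    """Keep only the candidates that occur in the corpus."""
    return {w for w in candidates if w in counts}

def correct(word, counts):
    """Trivial error model: prefer a known word at edit distance 0,
    then distance 1, then distance 2; break ties within a tier by count."""
    one_edit = edits1(word)
    candidates = (known([word], counts)
                  or known(one_edit, counts)
                  or known((e2 for e1 in one_edit for e2 in edits1(e1)), counts)
                  or {word})  # nothing known nearby: leave the word unchanged
    return max(candidates, key=lambda c: counts[c])

counts = train(SAMPLE_TEXT)
print(correct("lates", counts))  # "late" (count 2) beats "latest" (count 1)
```

With this sample corpus, both "late" and "latest" are one edit away from "lates", so the language model alone decides between them. A more realistic error model would weight individual edits (for example, vowel confusions) instead of treating every edit at the same distance as equally likely.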