Language Modeling with Gated Convolutional Networks

Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier
Facebook AI Research

Abstract

The predominant approach to language modeling to date is based on recurrent neural networks. In this paper we present a convolutional approach to language modeling. We introduce a novel gating mechanism that eases gradient propagation and performs better than the LSTM-style gating of Oord et al. (2016b) despite being simpler. We achieve a new state of the art on WikiText-103 as well as a new best single-GPU result on the Google Billion Word benchmark. In settings where latency is important, our model achieves an order of magnitude speed-up compared to a recurrent baseline since computation can be parallelized over time. To our knowledge, this is the first time a non-recurrent approach outperforms strong recurrent models on these tasks.

1. Introduction

Statistical language models estimate the probability distribution of a sequence of words. This amounts to modeling the probability of the next word given the preceding words, [...]

[...] embedding words in continuous space over which a neural network is applied. The current state of the art in language modeling is based on long short-term memory networks (LSTM; Hochreiter et al., 1997), which can model potentially arbitrarily long dependencies.

In this paper, we introduce gated convolutional networks and apply them to language modeling. Convolutional networks can be stacked to represent large context sizes and extract hierarchical features over larger and larger contexts with more abstractive features (LeCun & Bengio, 1995). This allows them to model long-term dependencies by applying O(N/k) operations over a context of size N and kernel width k. In contrast, recurrent networks view the input as a chain structure and therefore require a linear number O(N) of operations.

Analyzing the input hierarchically bears resemblance to classical grammar formalisms which build syntactic tree structure of increasing granularity, e.g., sentences consist of noun phrases and verb phrases, each comprising further internal structure (Manning & Schütze, 1999; Steedman, 2002). Hierarchical structure also eases learning since the number of non-linearities for a given context size is reduced compared to a chain structure, thereby mitigating the vanishing gradient [...]
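
The contrast between O(N/k) operations for stacked convolutions and O(N) for a recurrent chain can be made concrete with a small sketch. It rests on assumptions the text does not state: stride-1, undilated convolutions, where each layer widens the receptive field by k - 1 tokens, and "operations" read as sequential layers or time steps. The function names are hypothetical and chosen only for illustration.

```python
def conv_layers_needed(context_size: int, kernel_width: int) -> int:
    """Stacked conv layers needed for one output to see `context_size` tokens.

    Assumes stride 1 and no dilation (an assumption of this sketch):
    with zero layers one token is visible, and each layer adds
    (kernel_width - 1) more tokens to the receptive field.
    """
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    if kernel_width < 2:
        raise ValueError("kernel_width must be at least 2 to grow the context")
    extra_tokens = context_size - 1
    growth_per_layer = kernel_width - 1
    # Integer ceiling division: smallest layer count covering extra_tokens.
    return -(-extra_tokens // growth_per_layer)


def recurrent_steps_needed(context_size: int) -> int:
    """Sequential steps a recurrent chain needs to consume the same context."""
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    return context_size


if __name__ == "__main__":
    for n, k in [(25, 4), (100, 5)]:
        print(n, k, conv_layers_needed(n, k), recurrent_steps_needed(n))
```

Under these assumptions the layer count grows roughly as N/k while the recurrent step count grows as N, which is the gap the paragraph describes. The work inside each convolutional layer can also run in parallel across positions.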