Published as a conference paper at ICLR 2017
arXiv:1703.03130v1 [cs.CL] 9 Mar 2017

A STRUCTURED SELF-ATTENTIVE SENTENCE EMBEDDING

Zhouhan Lin‡, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou & Yoshua Bengio‡†
IBM Watson
‡Montreal Institute for Learning Algorithms (MILA), Université de Montréal
†CIFAR Senior Fellow
lin.zhouhan@gmail.com
{mfeng,cicerons,yum,bingxia,zhou}@us.ibm.com

ABSTRACT

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending to a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on three different tasks: author profiling, sentiment classification and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods on all three tasks.

1 INTRODUCTION

Much progress has been made in learning semantically meaningful distributed representations of individual words, also known as word embeddings (Bengio et al., 2001; Mikolov et al., 2013). On the other hand, much remains to be done to obtain satisfying representations of phrases and sentences. Existing methods generally fall into two categories. The first consists of universal sentence embeddings, usually trained by unsupervised learning (Hill et al., 2016). This includes SkipThought vectors (Kiros et al., 2015), ParagraphVector (Le & Mikolov, 2014), recursive auto-encoders (Socher et al., 2011; 2013), Sequential Denoising Autoencoders (SDAE), FastSent (Hill et al., 2016), etc. The other category consists of models trained specifically for a certain task. They are usually combined with downstream applications and trained by supervised learning. One generally finds that specifically trained sentence embeddings perform better than generic ones, although generic ones can be used in a semi-supervised setting, exploiting large unlabeled corpora. Several models have been proposed along this line, by using recurrent networks (Hochreiter & Schmidhuber, 199