博士学位论文
基于深度学习模型的CCG超标注
CCG SUPERTAGGING BASED ON DEEP LEARNING MODELS
REKIA KADARI
哈尔滨工业大学
2018年06月

国内图书分类号:TP391.1  学校代码:10213
国际图书分类号:681.324  密级:公开

工学博士学位论文
基于深度学习模型的CCG超标注

博士研究生:REKIA KADARI
导师:刘挺 教授
申请学位:工学博士
学科:计算机科学与技术
所在单位:计算机科学与技术学院
答辩日期:2018年06月
授予学位单位:哈尔滨工业大学

Classified Index: TP391.1
U.D.C: 681.324

Dissertation for the Doctoral Degree in Engineering
CCG SUPERTAGGING BASED ON DEEP LEARNING MODELS

Candidate: REKIA KADARI
Supervisor: Prof. Liu Ting
Academic Degree Applied for: Doctor of Engineering
Specialty: Computer Science
Affiliation: School of Computer Science and Technology
Date of Defence: June, 2018
Degree-Conferring-Institution: Harbin Institute of Technology

摘要

如何让计算机理解并处理人类语言是人工智能领域长盛不衰的研究课题。使用自然语言与具有人工智能的计算机交互常被称为自然语言处理(NLP)。自然语言处理在我们日常生活中应用十分广泛。序列标注是自然语言处理领域中历史最悠久的研究课题之一,包括词性标注(part-of-speech tagging)和CCG超标注(Combinatory Categorial Grammar supertagging,组合范畴语法超标注)。CCG超标注是许多自然语言处理任务的前序步骤,例如组块(chunking)和句法解析(parsing)。CCG超标注可定义为:给定一个由词构成的序列,给序列中的每个词赋予一个CCG超标签。CCG超标注的最大挑战在于超标签的总数巨大,以及每个词可赋予的超标签数目众多,这使得许多应用非常复杂。前人提出过许多方法来应对这一问题,这些方法通常基于不同的统计机器学习方法,并且通常依赖大量人工设计的表示和输入特征来取得较好的实验效果。因此,如何自动地提取输入的表示特征也是研究的重点。深度学习可以看成是机器学习和表示学习的结合,可以自动学习有用的特征和输入表示。因此我们尝试使用深度学习技术处理CCG超标注任务。

在本文中,我们专注于CCG超标注这一任务,提出了一些可以减少赋予每个输入词的词法类别数目的技术。我们的目标是开发简单而准确的模型来解决CCG超标注的挑战,同时利用深度神经网络学习必要的中间表示,以避免复杂的人工特征选择。我们认为现有的CCG超标注方法有三个主要问题。第一个问题是长序列建模问题,即循环神经网络(RNNs)只能记忆较少的步骤,难以建模较长的序列。第二个问题是,深度学习模型能从输入的依存中受益,而统计机器学习算法能够从输出的依存中受益;对于CCG超标注这一结构预测任务,如何同时从输入和输出依存中学习是非常必要的。最后,第三个问题是未登录词(OOV)的问题,即未登录词和罕见词会降低模型的准确率。因此,本文的主要目标是使用深度学习技术解决上述CCG超标注任务中的问题,并有效降低所预测的超标签的个数,同时避免使用词法特征以及其他手工构建的特征。特别地,本文着重考虑以下问题:

1) 如何记忆序列信息是许多序列标注问题的关键任务,CCG超标注亦是如此。我们提出了一个基于门限循环单元(GRU)网络的新方法。为了同时保存从左到右和从右到左的信息,我们应用了双向门限循环单元。此外,我们采用了深度结构来学习输入间的复杂交互。实验结果表明,所提方法提升了CCG语法的超标注和多标注性能。

2) 我们为CCG超标注提出了一个新的方法,叫做"后向-双向长短时记忆网络(Backward-BLSTM)"。长短时记忆网络(LSTM)是一个比门限循环单元更有效的模型,它能更好地记忆信息以及预测最可能的超标签。我们提出的结构对于CCG语法的超标注和多标注都是有效的。实验结果表明我们所提出的方法能有效地建模长序列,同时能达到领先的性能。

3) 前人为CCG超标注这一任务提出了许多模型。然而这些模型要么是使用基于手工构建特征的机器学习方法,要么虽然是基于深度学习的模型但却忽略了临近输出标签之间的依存关系,而这一关系对于预测当前标签十分重要。因此,如何利用临近的输出标签来预测当前位置的标签是关键。在这项工作中,我们同时利用了条件随机场(CRF)和双向长短时记忆网络。该模型首先使用双向长短时记忆网络学习句子表示,同时获取过去和未来的输入并长距离地记忆这些信息;然后使用条件随机场处理句子级别的标签信息并输出预测。这个模型能够同时从输入和输出中受益,性能优于当前最好的方法。实验结果表明所提方法在CCG超标注和多标注上超越了现有的方法。

4) 尽管许多工作已经利用深度学习模型来解决CCG超标注的问题,仍然没有一项研究深入解决未登录词的问题。考虑到这一点,我们提出了一种简洁而有效的方法来探索不同的输入表示。为表示词间的形态信息,首先使用预训练的词向量来提取词之间的相似度;然后使用字符级别的输入表示,建立字符与向量间的检索表;再把字符级别和词级别的表示拼接到一起,送入双向长短时记忆网络来产生输出。实验结果表明我们的方法在领域内和领域外的数据集上都优于仅使用基于词的输入表示的模型。

对于CCG超标注这一问题,我们进行了深入研究,并指出了现有公开技术的局限。基于这一分析,我们有条理地提出并实现了解决问题的新方法,并在若干数据集上验证了方法的有效性。实验结果证明了所有提出技术的有效性。

关键词:自然语言处理,组合范畴语法,CCG超标注,深度学习,神经网络
Abstract

Making computers understand and manipulate human languages has been a subject of research in Artificial Intelligence (AI) for many years. Interacting with AI-enabled computers using natural languages is often referred to as Natural Language Processing (NLP). NLP has many applications that are widely used in our daily lives. Sequence labeling is one of the oldest fields in NLP and includes tasks such as part-of-speech tagging and Combinatory Categorial Grammar (CCG) supertagging.

CCG supertagging serves as an important first stage for many NLP applications, after which further processing such as chunking and parsing is done. CCG supertagging can be defined as follows: given a sequence of words, the goal is to assign a CCG supertag to each word in the sequence. The major challenge of CCG supertagging is the huge size of the category set and the large number of categories that can be assigned to each item, which makes many applications very complex. This has become a critical task in the NLP community. Considerable approaches have been proposed to deal with the CCG supertagging problem, and the solutions are often based on different statistical machine learning models. However, most current machine learning methods work well only because of carefully human-designed representations and input features. In recent research, automatically extracting features that capture information about input representations has become very important. Deep learning can be seen as putting representation learning back together with machine learning: it attempts to jointly learn good features and input representations.

In this thesis, we focus on the CCG supertagging task and propose and develop techniques that reduce the number of lexical categories assigned to each word of an input. Our goal is the development of simple and accurate models that can solve the challenging problem of CCG supertagging and, based on deep learning models, learn the necessary intermediate representations of input entries without the need for extensive feature engineering.

We believe that there are three main problems with current CCG supertagging models. The first problem is modeling long sequences, where Recurrent Neural Networks (RNNs) fail and tend to memorize information for only a few time steps.
The second problem is related to output dependencies: because deep learning models benefit from input dependencies and statistical machine learning algorithms benefit from output dependencies, a model that can benefit from both is necessary for CCG supertagging as a structured prediction task. The third problem is related to Out-Of-Vocabulary (OOV) words, where the accuracy of existing models decreases in the presence of unseen and rare words. For these reasons, the general objective of this thesis is to propose novel techniques for the CCG supertagging problem based on deep learning methods, in order to improve the capability to reduce the number of predicted supertags and solve the above-mentioned problems. Furthermore, no lexical or hand-crafted features are required. In particular, the following specific issues are considered in this work:

1) How to memorize information from sequential data is still a critical problem for many sequence tagging tasks, and for CCG supertagging in particular. We present a new method for CCG supertagging based on Gated Recurrent Unit (GRU) networks. In order to capture input data from both the left and the right direction, a Bidirectional GRU (BGRU) model is used. Moreover, a deep architecture is adopted in order to learn complex interactions between input entries. The reported results of the proposed model improve the supertagging and multi-tagging performance for the CCG grammar.

2) We present a new method named "Backward-BLSTM" for CCG supertagging. Long Short-Term Memory (LSTM) networks are adopted as a more powerful method than GRU networks to memorize information and to select the most likely predicted supertag. The proposed architecture proves its efficiency for both supertagging and multi-tagging for the CCG grammar. The experimental results show that the proposed model is able to model long sequences efficiently and achieves better performance than state-of-the-art models.

3) Many approaches have been proposed for the CCG supertagging task. However, these models either use many hand-crafted features (in the case of machine learning strategies) or process a sequence at the sentence level without modeling the correlation between neighboring labels, which has a great influence on predicting the current label (in the case of deep learning models). Labeling a given sequence with a set of CCG syntactic categories while taking the tag level into account is a very critical point. In this work, we use a combination of Conditional Random Fields (CRF) and BLSTM models.
The model first learns a sentence representation, where we can gain from both past and future input features and store the data for long periods thanks to the BLSTM architecture. Afterward, the model uses sentence-level tag information thanks to a CRF layer, which is regarded as the output predictor. The model benefits from both input and output entries and is more competent than state-of-the-art methods. The achieved results demonstrate that the proposed model outperforms the existing approaches for both CCG supertagging and multi-tagging.

4) Even though some work has taken advantage of deep learning models for CCG supertagging, there is still no comprehensive research on how to deal with OOV entries. With this in mind, we present a new method which explores the strengths of different embeddings in a simple and effective way. To represent morphological information between words, pre-trained word embeddings are used to extract informative similarity between words. Then we use character embeddings, which are mapped in character lookup tables. BLSTM networks are used for both the character and the word embeddings, which are then concatenated together to generate the final outputs. The experimental results show that our method produces better performance than word-embedding-based models on both in-domain and out-of-domain datasets.

For the CCG supertagging problem, a deep study of the literature is carried out, and the limitations of the currently published techniques are highlighted. Starting from this analysis, novel approaches are theoretically proposed, implemented and tested on several datasets to verify their effectiveness. The achieved experimental results confirm the effectiveness of all the proposed techniques.

Keywords: Natural Language Processing, Combinatory Categorial Grammar, CCG Supertagging, Deep Learning, Neural Networks

Contents

Abstract (In Chinese)
Abstract (In English)
Index of figures
Index of tables
Chapter 1 Introduction
1.1 Motivation
1.2 The CCG Supertagging Task
1.3 Applications of CCG Supertagging
1.4 Categorial Grammar
1.5 Combinatory Categorial Grammar
1.5.1 Application Combinators
1.5.2 Composition Combinators
1.5.3 Type-raising Combinators
1.6 Literature Review
1.6.1 Supertagging
1.6.2 CCG supertagging
1.7 Evaluation Metric
1.8 Dataset
1.9 Thesis Contributions
1.10 Organization of the Thesis
Chapter 2 Gated Recurrent Units for the CCG Supertagging task
2.1 Introduction
2.2 Neural Networks
2.2.1 Deep Learning
2.2.2 Recurrent Neural Networks
2.2.3 Bidirectional RNN
2.2.4 Gated Recurrent Units
2.3 BGRU proposed model for the CCG Supertagging task
2.3.1 Input Layer
2.3.2 GRU Neural Network
2.3.3 Output Layer
2.4 Experiment Settings
2.4.1 Dataset
2.4.2 Data Preprocessing
2.4.3 Hyper-Parameters and Training
2.4.4 Word embeddings Settings
2.4.5 Learning Algorithm
2.4.6 Dropout
2.5 Results and Analysis
2.5.1 Supertagging Results
2.5.2 Multi-tagging Results
2.6 Summary
Chapter 3 Backward-BLSTM model for the CCG Supertagging task
3.1 Introduction
3.1.1 Long Short Term Memory Networks
3.2 Backward-BLSTM proposed model for the CCG Supertagging task
3.2.1 Input Layer
3.2.2 Neural Network
3.2.3 Output layer
3.3 Experiments Settings
3.3.1 Experimental Data
3.3.2 Data Preprocessing
3.3.3 Implementation
3.3.4 Hyper-Parameters
3.3.5 Learning Algorithm
3.3.6 Dropout
3.4 Experiment Results
3.4.1 Supertagging Results
3.4.2 Multi-tagging Results
3.5 Summary
Chapter 4 BLSTM-CRF model for the CCG Supertagging task
4.1 Introduction
4.2 Model Description
4.2.1 BLSTM Network
4.2.2 Conditional Random Fields
4.2.3 BLSTM-CRF proposed model for the CCG Supertagging task
4.3 Experiment Settings
4.3.1 Datasets
4.3.2 Word embeddings
4.3.3 Optimization Algorithm
4.3.4 Dropout Training
4.3.5 Hyper-Parameters Tuning
4.4 Results and Analysis
4.4.1 Supertagging Results
4.4.2 Multi-tagging Results
4.5 Summary
Chapter 5 Character-Word embeddings for the CCG Supertagging task
5.1 Introduction
5.2 Character-Word embeddings proposed model for the CCG Supertagging task
5.2.1 Word-Level Neural Network
5.2.2 Character-Level Neural Network
5.2.3 Concatenation
5.3 Experiments settings
5.3.1 Datasets
5.3.2 Hyper-Parameters
5.4 Results and Analysis
5.4.1 Supertagging results
5.4.2 Multi-tagging Results
5.5 Summary
Conclusions
References
Papers published in the period of Ph.D. education
Statement of copyright and Letter of authorization
Acknowledgements
Resume
插图索引

图1-1 Example of POS tagged sentence
图1-2 Example of CCG Supertagged sentence
图1-3 Example from section 00 of the CCGBank corpus
图1-4 Dissertation outlines
图2-1 An example of an Artificial Neural Network
图2-2 An example of a Deep Neural Network
图2-3 General structure of simple RNNs
图2-4 General structure of a simple RNN unfolded for three time steps
图2-5 General structure of BRNN unfolded for three time steps
图2-6 Illustration of the vanishing gradient problem
图2-7 Gated Recurrent Units architecture [64]
图2-8 BGRU proposed model for the CCG supertagging
图3-1 From RNN to LSTM [87]
图3-2 Long Short-Term Memory network architecture
图3-3 Backward-BLSTM model for the CCG supertagging
图3-4 1-best accuracy of our Backward-BLSTM proposed model on the development set with and without dropout
图4-1 Deep BLSTM architecture with 2-BLSTM Layers
图4-2 CRF Graph
图4-3 The neural net mechanism
图4-4 BLSTM-CRF network model for the CCG supertagging
图5-1 Word level neural network
图5-2 Character level neural network
图5-3 Word-Character based embeddings model for the CCG supertagging
表格索引

表2-1 The final chosen hyper-parameters
表2-2 Performance comparison with state-of-the-art methods on the development set
表2-3 Performance comparison with state-of-the-art methods on the test set
表2-4 Performance comparison of different models for multi-tagging accuracy on Section 00 for different levels
表3-1 Comparison of the accuracy results on the development set using different word embeddings
表3-2 1-best accuracy results with and without dropout on development and test data
表3-3 The final chosen hyper-parameters
表3-4 1-best accuracy on the development set (Section 00)
表3-5 1-best accuracy on the test set
表3-6 1-best accuracy comparison
表3-7 Performance comparison of different models for multi-tagging accuracy on Section 00 for different levels
表4-1 The final hyper-parameters settings for our model
表4-2 Performance comparison with state-of-the-art methods on the development set
表4-3 Performance comparison with state-of-the-art methods on the test set
表4-4 Performance comparison of different models for multi-tagging accuracy on Section 00 for different levels
表5-1 Accuracy results on the development set
表5-2 Accuracy results on the test set
表5-3 Performance comparison of different models for multi-tagging accuracy on Section 00 for different levels
第1章 Introduction

1.1 Motivation

Nowadays, computers play an integral role in daily human life as one of the most brilliant gifts of science. Computational Linguistics (CL) is a specialized discipline concerned with the application of computers to the processing of natural human languages. The main goal of CL is to enable computers to understand and manipulate human languages, drawing on knowledge from linguistics, computer science, logic, cognitive science as well as other sciences. CL can be divided into many subfields that branch into several research areas such as Machine Translation (MT), zero pronoun resolution, Question Answering (QA), Natural Language Understanding (NLU), speech recognition and parsing. These tasks are considered NP-complete NLP problems. To build those high-level tasks, many preliminary tasks should be taken into account, such as tokenization, Information Extraction (IE), anaphora resolution, and sequence labeling tasks, among others.

Sequence labeling, or structured prediction, is required in many areas such as NLP and bioinformatics (e.g., protein secondary structure prediction). Structured learning corresponds to the task of assigning a label to each element of an input sequence. In NLP, sequence prediction corresponds to a vast range of problems. The earliest and most famous sequence labeling problem is probably Part-Of-Speech (POS) tagging, where each word in a sentence is labeled with a POS class such as Noun (N), Verb (VB), Adjective (JJ), Pronoun (PRP), Adverb (RB), etc. [1]. Another example is IE, which addresses the problem of identifying instances of classes and includes Named Entity Recognition (NER), which consists of identifying entity information such as person, location, time, organization, etc. [2]. There is also coreference resolution, which aims at identifying multiple references to the same entity in a text, whether a name, a pronominal, etc. [3]. Yet another example of a sequence labeling problem is supertagging, which refers to assigning a single appropriate supertag to each word of an input sentence.

Many of the early pioneers of CL research were interested in the area of sequence labeling, since it is useful for so many tasks. In the last few decades, supertagging has attracted the attention of several researchers and has become more and more important for many NLP tasks as a primary step before applications such as parsing [4], language modeling [5] and text simplification [6].
Supertagging resembles POS tagging in that each word in a sentence is tagged with a supertag category. It was initially proposed for Lexicalized Tree Adjoining Grammar (LTAG) [7] and then applied to other grammar formalisms such as Probabilistic Context-Free Grammar (PCFG) [8] and the CCG grammar [9]. CCG has been argued to be the grammar formalism used by humans [10], providing a natural linkage between syntactic structure and semantic representation. Furthermore, compared to other grammars, it offers high flexibility because it allows deriving the structure of any part of a sentence without the need to derive the structure of the whole sentence. The application of supertagging to the CCG grammar is often referred to as "CCG supertagging" and consists of assigning a CCG syntactic category to each word in an input sequence. However, the main challenge of the CCG supertagging task compared to POS tagging (both are considered sequence labeling problems) is the huge size of the CCG supertag category set compared to the POS tag set, as supertags contain much richer information. Moreover, many words take multiple CCG supertags, so the number of predicted supertags may be very large.

In the literature, dominant approaches based on machine learning methods have been proposed for the CCG supertagging task, such as Hidden Markov Models (HMM) and Conditional Random Fields (CRF). However, the use of machine learning in NLP has been mostly limited to the numerical optimization of weights for humanly designed representations and features extracted from the text data. The need to automatically learn features or representations from raw text is crucial for a wide range of NLP tasks. During the past several years there has been a long history of using Neural Networks (NNs), which have made major advances in solving problems that resisted the best attempts of statistical machine learning methods for many years and have impacted a wide range of information processing. NN-based methods have been shown to perform well for the CCG supertagging task. The most attractive aspect of NN methods is their ability to perform these tasks without external hand-designed resources or time-intensive feature engineering. To this end, Artificial Neural Networks (ANNs) develop and make use of an important concept called "embeddings" [11], which has proved very effective and has been widely adopted in the NLP community; it consists of turning inputs (i.e., words) into a representation (i.e., a vector of floats) that NNs can manipulate.
In recent years, deep neural networks, more commonly called deep learning, have emerged as a new area of machine learning research that allows the proposal of strong models to overcome the shortcomings of both statistical machine learning and shallow NNs, with agreeable models such as RNNs and Long Short-Term Memory (LSTM) networks. Today, deep learning has become the standard approach for developing high-performance models and has been shown to significantly improve the efficiency of numerous systems.

In this thesis, our main objective is to use deep learning techniques to solve a sequence labeling problem. We focus on the task of supertagging for the CCG grammar. The most important problems in the CCG supertagging process are learning long sequences, the dependence between inputs and outputs, and the large number of CCG lexical categories. Recurrent networks such as Gated Recurrent Units (GRUs) and LSTMs were chosen for this work because, among the family of deep learning techniques, LSTMs and GRUs are rated as the best for modeling sequential data thanks to their capability to store information for a long time, which is very useful for our task.

1.2 The CCG Supertagging Task

In NLP research, learning tasks are complicated to perform: we are usually required to solve a set of necessary problems together, with respect to some elementary structure, in order to solve other problems. This is usually called structured learning. Structured learning, or sequence labeling, tasks are among the most well studied problems in the NLP literature, as the generic task of assigning labels to the elements of a sequence. Sequence labeling corresponds to a wide range of real-world problems. The most popular sequence labeling problem is POS tagging, where each word is labeled with a POS tag. However, it is known that natural language grammar is ambiguous. In other words, given a natural language grammar, one sentence might have several valid structures and each word may take multiple tags. Figure 1-1 shows an example of a sentence with the corresponding POS tags, where each word is associated with multiple POS tags (tags in double boxes are the correct tags) [12].

图1-1 Example of POS tagged sentence.

The term "supertagging", now widely used in NLP, was coined by Joshi and Bangalore [13], and the beginning period of CCG supertagging research was in 2000-02. In defining the task, similarly to POS tagging, CCG supertagging can be viewed as the process of assigning each word in a text to a particular CCG lexical category. CCG supertagging is a supervised sequence labeling task: a user provides a training set of sentences with their corresponding labels and wants to learn and train a model able to label new, unseen sequences. Training examples consist of pairs (x, y), where x ∈ X is an input sequence of elements (x_1, x_2, ..., x_t) and y ∈ Y is the corresponding sequence of labels (y_1, y_2, ..., y_t); each label y_t corresponds to the element x_t, and the labels y_t belong to the label dictionary denoted by L. CCG supertagging can be formulated as follows: given a sequence of input words (x_1, x_2, ..., x_n), we aim to produce the corresponding CCG outputs (y_1, y_2, ..., y_n) from the set of labels {L} which guarantee:

S = \arg\max P(y_1, y_2, \ldots, y_n \mid x_1, x_2, \ldots, x_n)  (1-1)

Compared to standard POS tagging, CCG supertagging is much more complicated, as the POS tag set is much smaller than the CCG lexical category set used for CCG supertagging: CCG supertags include long-distance dependencies and contain much richer information than POS tags. In other words, there are many more CCG supertags per word than POS tags. Figure 1-2 gives an example of a CCG supertagged sentence. Since we use a bigger supertag set compared to the size of the POS tag set, the number of CCG supertags possibly associated with each word increases.

图1-2 Example of CCG Supertagged sentence.
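To make the formulation in Eq. (1-1) concrete, the following minimal Python sketch treats supertagging as a per-word argmax over a toy score table. The lexicon, categories and scores are invented for illustration only; they are not part of the thesis models, which learn these scores from the CCGBank training sections.

```python
# Minimal sketch of the CCG supertagging task as sequence labeling (Eq. 1-1).
# The tiny lexicon and scores below are invented for illustration only; a real
# supertagger learns P(y_1..y_n | x_1..x_n) from training data.

from typing import Dict, List

# Toy per-word scores over a handful of CCG lexical categories.
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "Anna":  {"NP": 0.9, "N": 0.1},
    "loves": {"(S\\NP)/NP": 0.8, "S\\NP": 0.2},
    "David": {"NP": 0.85, "N": 0.15},
}

def supertag(sentence: List[str]) -> List[str]:
    """Assign to each word the highest-scoring CCG category (greedy argmax)."""
    tags = []
    for word in sentence:
        scores = TOY_SCORES.get(word, {"N": 1.0})  # back off for unknown words
        tags.append(max(scores, key=scores.get))
    return tags

if __name__ == "__main__":
    words = ["Anna", "loves", "David"]
    print(list(zip(words, supertag(words))))
    # [('Anna', 'NP'), ('loves', '(S\\NP)/NP'), ('David', 'NP')]
```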
1.3 Applications of CCG Supertagging

Most NLP applications are constituted by a set of different components, and each module is crucial for a specific analysis of natural language text. CCG supertagging is one of the fundamental NLP tasks and is very important because it influences various applications. In the following, we briefly discuss some of the applications that benefit from CCG supertagging.

• Parsing: parsing is the task of retrieving a valid structure for a string or list of tokens given a natural language grammar. In NLP, parsing is central to many tasks such as QA, MT and Information Retrieval (IR). CCG supertagging is a preliminary step that should be taken into account before full parsing, as the information encoded in CCG supertags makes parsing much simpler. CCG supertagging serves as the input to many parsers, such as the C&C parser [14]; it provides excellent performance and reduces parsing complexity.

• Machine Translation: syntax-based methods relying on powerful grammar formalisms promise to model translation in a more natural way [15]. CCG supertagging is also a crucial part of MT systems: by mapping words to their corresponding CCG supertags, it helps to model explicit target syntax in Neural Machine Translation (NMT) systems [16] and to benefit from the structurally rich CCG syntactic categories, thanks to the CCG grammar's ability to give a syntactic treatment to non-constituents, which frequently occur in both source and target languages in MT systems [15].

• Question Answering: one of the most interesting NLP tasks tackled by the CL community is that of knowledge Question Answering (QA) systems. In QA systems, CCG supertagging has proven useful in the parsing of questions; it increases parsing accuracy on questions, producing parsers suitable for the questions of QA systems [17], and it helps to extract pieces of information from the question that make it easy to retrieve the right answers. The main advantage of using CCG supertagging for QA systems is that we can directly obtain semantic representations of questions.

1.4 Categorial Grammar

Categorial Grammar (CG) [18] covers a family of the oldest lexicalized grammars proposed for the syntax and semantics of natural languages as well as logical and mathematical languages [19]. In CG, the main and entire responsibility for defining the syntactic form is carried by the lexicon, as in other grammars such as Head-Driven Phrase Structure Grammar (HPSG), Tree Adjoining Grammar (TAG), Lexical Functional Grammar (LFG), etc. A CG grammar consists of two parts: a lexicon, which assigns a category to each basic symbol, and a set of inference rules; it regroups a number of syntactic and semantic theories in which all expressions are classified by a syntactic type identifying them as functions or arguments, built from atomic and elementary arguments. One of the earliest extensions of CG was "Combinatory CG", which extends the core of CG with functional operations on adjacent categories, such as functional composition [20].

1.5 Combinatory Categorial Grammar

There are various grammar frameworks proposed for natural languages. CCG constitutes an important class of CG lexicalized grammar formalisms that has been argued to be the formalism used by humans, because it provides a natural linkage between syntactic structure and semantic representation [10]. Moreover, CCG offers higher flexibility compared to other grammars: it can derive the structure of any part of a sentence without deriving the structure of the whole sentence [21]. The CCG grammar associates rich syntactic types with words. In the last few decades, CCG has been used in several aspects of natural language understanding, e.g., parsing [22][23][24], semantics [25][26], and a vast range of NLP applications such as MT [27][16].

The CCG grammar is based on the CG grammar formalism and was developed by Steedman [28]. The primitive elements of CCG are categories. The syntactic types of the CCG grammar come in two kinds, atomic or complex:
1. Atomic categories: the basic vocabulary of simple categories, such as Sentence (S), Noun (N), Noun Phrase (NP) and Prepositional Phrase (PP).

2. Complex categories: complex types are of the form A/B and A\B, representing functions that combine an argument of type B to yield A as a result. They are built by combining atomic categories, or complex categories themselves, with slashes indicating whether the B argument precedes (\) or follows (/) the functor. In other words, A/B means that the argument should appear to the right, while A\B indicates that the argument should appear on the left.

In the CCG grammar, a lexical category is assigned to each symbol of a sequence (i.e., to each word). The following are examples of English entries associated with their possible CCG lexical categories:

{he, girl, lunch, ...} → N
{good, the, eating, ...} → N/N
{sleeps, ate, eating, ...} → S\N
{sees, ate, ...} → (S\N)/N
{quickly, today, ...} → S\S
{good, the, ...} → (S\N)/(S\N)

Unlike context-free grammars, which encode the information about structure with rules like S → NP VP and VP → V NP, the CCG grammar encodes structure in the categories, so such rules are not needed. Instead, the lexical categories associated with the words of a sequence determine how these words can be combined with other categories so as to appear in an acceptable order. Thereby the concept of combinators was introduced, whereby elementary categories are combined by combinators. The CCG grammar defines a number of combinators that allow combining one or two categories into a new category. In the following, three different types of combinators will be introduced. The most common CCG combinators that combine elementary categories are the Application, Composition, and lastly Type-raising combinators.
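The atomic/complex category notation defined above can be made concrete with a small data structure. The following Python sketch is illustrative only (it is not the thesis implementation) and encodes a few of the example lexicon entries listed above.

```python
# Minimal sketch of CCG categories as a recursive data structure: atomic
# categories (S, N, NP, PP) and complex categories A/B or A\B.

from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Atomic:
    name: str                      # e.g. "S", "N", "NP", "PP"
    def __str__(self): return self.name

@dataclass(frozen=True)
class Complex:
    result: "Category"             # the A in A/B or A\B
    slash: str                     # "/" means the argument follows, "\" means it precedes
    argument: "Category"           # the B in A/B or A\B
    def __str__(self): return f"({self.result}{self.slash}{self.argument})"

Category = Union[Atomic, Complex]

# Example lexicon entries from Section 1.5: 'girl' -> N, 'good' -> N/N, 'sleeps' -> S\N
N, S = Atomic("N"), Atomic("S")
lexicon = {"girl": N, "good": Complex(N, "/", N), "sleeps": Complex(S, "\\", N)}
print({w: str(c) for w, c in lexicon.items()})
```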
1.5.1 Application Combinators

CCG operates by first assigning a syntactic CCG category to each symbol (i.e., word) in a given sequence, and the backward and forward slashes determine how lexical categories may combine, as described in the preceding section. Given the category assignments, a derivation that combines the words' categories proceeds by combining the categories using combinators. Among the CCG combinators, the simplest are the application combinators: forward and backward application, often denoted by > and <, respectively.

For the forward application, a syntactic category of the form A/B indicates that the argument B should appear on the right. In other words, a syntactic category of type A/B takes B as an argument on the right, and the combination "A/B B" results in the category A. Mathematically:

A/B : f    B : a  →  A : f(a)  (1-2)

As an example, the CCG lexical category associated with the word 'powerful' corresponds to a function that maps from the domain of nouns N into the range of nouns N, resulting in "N/N". The association of this item with a function is represented by writing:

powerful → N/N

and the word 'girl' can be associated with the atomic category N:

girl → N

The argument N of the function N/N of the word 'powerful' appears to the right of the forward slash, and the result N is on the left. The fact that the slash in the functional type N/N slants rightward indicates that a noun must appear to the right of the category with which it will be combined. Application of the function N/N associated with the word 'powerful' to the category N of the word 'girl' results in the substring 'powerful girl' being combined into an atomic category with syntactic type N. The forward application for this example may be represented as follows:

In contrast, for the backward application, a syntactic category of the form A\B indicates that the argument B should appear on the left: a syntactic category of type A\B takes B as an argument on the left, and the combination "B A\B" results in the category A. Mathematically, the backward application is defined as follows:

B : a    A\B : f  →  A : f(a)  (1-3)

For example, the word 'day' can be associated with the syntactic category type S\NP; the backslash in S\NP indicates that an NP argument must be to the left. If the item 'nice' is associated with the atomic type NP, then the string 'nice day' can be combined as a sentence, with atomic syntactic type S. In this instance, the backward application operation can be represented as follows:

1.5.2 Composition Combinators

Composition combinators are combinatory operations that are needed for the arrangement of input sentences. The input to the composition combinators is two complex categories, and the output is also a complex syntactic type category. Similarly to the application combinators, both forward and backward composition combinators are defined, schematically as (>B) and (<B):

A/B    B/C  →  A/C  (>B)  (1-4)
B\C    A\B  →  A\C  (<B)  (1-5)

For the forward composition, noted as >B, the domain of the first lexical category should correspond to the range of the second category, resulting in a new function with the range of the first lexical category and the domain of the second. For example, the item "the", associated with the category NP/N, indicates that an N argument must appear in the range of a second category, such as the word "beautiful" with the syntactic type N/N. Then the string 'the beautiful' can be combined with the forward composition combinator as follows:

For the backward composition, noted as <B, the combination is defined analogously by equation (1-5).

1.5.3 Type-raising Combinators

Type-raising combinators turn an argument category into a function over the functions that take it as an argument. They are noted as >T for forward type-raising and <T for backward type-raising:

Forward type-raising:  A  →  T/(T\A)  (>T)  (1-6)
Backward type-raising: A  →  T\(T/A)  (<T)  (1-7)

where T is a variable type; in general, the variable T represents the S (Sentence) category type. For example, the category of syntactic type NP assigned to the word 'grammar' becomes a functional category with the forward type-raising combinator as follows:

and the application of the backward type-raising results in the following function:

To sum up, the following example illustrates the use of the three combinators to combine the CCG lexical categories associated with each word of the sentence "Anna loves David": a forward type-raising function is applied to the NP syntactic type mapped to the word "Anna", resulting in the complex category of type S/(S\NP), so that it can be combined with the category (S\NP)/NP by forward composition, giving a category of type S/NP; finally, the resulting category S/NP can be combined with the category NP assigned to the item "David" by forward application, resulting in an S category as the final result, as follows:
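The three combinator families can likewise be sketched in a few lines of Python. The sketch below is illustrative only (not the thesis implementation): it encodes complex categories as (result, slash, argument) tuples and reproduces the "Anna loves David" derivation with forward type-raising, forward composition and forward application.

```python
# Minimal sketch of the CCG combinator families from Section 1.5, using
# (result, slash, argument) tuples for complex categories and plain strings
# for atomic ones.

def fapply(left, right):
    """Forward application (>): A/B  B  ->  A   (Eq. 1-2)."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]

def bapply(left, right):
    """Backward application (<): B  A\\B  ->  A   (Eq. 1-3)."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]

def fcompose(left, right):
    """Forward composition (>B): A/B  B/C  ->  A/C   (Eq. 1-4)."""
    if (isinstance(left, tuple) and isinstance(right, tuple)
            and left[1] == right[1] == "/" and left[2] == right[0]):
        return (left[0], "/", right[2])

def ftype_raise(cat, t="S"):
    """Forward type-raising (>T): A  ->  T/(T\\A)   (Eq. 1-6)."""
    return (t, "/", (t, "\\", cat))

# "Anna loves David": NP, (S\NP)/NP, NP -- derive S with >T, >B, > as in 1.5.3.
anna, loves, david = "NP", (("S", "\\", "NP"), "/", "NP"), "NP"
raised = ftype_raise(anna)                 # S/(S\NP)
partial = fcompose(raised, loves)          # S/NP
print(fapply(partial, david))              # S
```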
1.6 Literature Review

The area of supertagging has been enriched over the last few decades by contributions from several researchers. Since its inception at the end of the nineties [7], many new concepts have been introduced to improve the efficiency of supertagging and to construct supertaggers for several grammars [29] and languages [30][31]. More recently, several models have been used for the supertagging task to provide adaptive taggers. Sophisticated machine learning algorithms have been developed that acquire more robust information. In general, all machine learning models rely on hand-crafted features to provide good results. Hence, some of the recent works focus on deep learning models to cope with the problem of feature extraction. Finally, combinations of several machine learning and deep learning models have been used in the current research direction.

This section provides a brief review of prior work on supertagging. To be concise, we do not aim to give a comprehensive review of the related work. Instead, we provide a brief review of the different techniques used in supertagging and then focus on a detailed review of existing CCG supertagging methods. Firstly, we provide a brief discussion of the work performed on supertagging in general. Then, we discuss the application of machine learning algorithms to the CCG supertagging problem. Lastly, we discuss the most recent efforts in this area.

1.6.1 Supertagging

Supertagging was first proposed by Joshi and Bangalore [13] for Lexicalized Tree-Adjoining Grammar (LTAG) as the analogue of POS tagging for phrasal grammars, with the difference that the sets of POS tags are smaller than the sets of supertags used in lexicalized grammars. Compared to POS tags, supertags contain much more detailed syntactic information. To furnish this supplementary information, the sets of supertags must be much larger. Usually, a supertag set contains hundreds of tags. For instance, the set of LTAG supertags had 3964 tags [32], whereas most POS tag sets contain fewer than fifty possible tags [1]. When supertagging, even if the set of tags available for each word is restricted to those observed in the training data, the set of supertags that could be assigned to each word is still large.

Statistical machine learning methods were used for standard POS tagging disambiguation; in the same way, the earliest works on supertagging use local statistical information in the form of n-gram models of the distribution of supertags. The first and simplest model for supertag disambiguation uses the unigram model and selects a single tag for each word based on its local context [13]. The main objective of this model was to determine, for each word, the supertag with which it is most often associated. Unfortunately, the main problem with the unigram model is that it does not account for context, which is the source of many of the errors this model makes. Later appeared trigram models, called the trigram approximation because the resulting probability uses the two preceding tags t_{i-1} and t_{i-2} as context when predicting the probability of the current tag t_i. By doing so, the current tag t_i is conditioned on two previous tags of context.
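Written out explicitly, such a trigram (second-order Markov) model scores a tag sequence roughly as follows. This is a standard HMM-style formulation consistent with the description above; the exact formulas in the cited papers may differ in their conditioning and smoothing details.

```latex
% Trigram approximation for supertag disambiguation: the current supertag is
% conditioned on the two preceding supertags, and the best sequence is the
% argmax over all candidate sequences.
\begin{equation*}
\hat{t}_1^{\,n} \;=\; \arg\max_{t_1^{\,n}} \; \prod_{i=1}^{n}
  P(t_i \mid t_{i-1}, t_{i-2}) \, P(w_i \mid t_i)
\end{equation*}
```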
After that, the Two-Pass Head Trigram model [33] was proposed: by making a different contextual approximation than the trigram model, the two-pass head trigram model [33] attempts to overcome some of the mistakes that the trigram model makes. Unlike the trigram model, which always conditions the probability of the current supertag on the supertags of the two immediately preceding words, the two-pass head trigram model conditions on the supertags of the two immediately preceding head words.

All those works show that local supertag features are effective in supertag disambiguation. Since supertags encode dependency information, information about the distribution of distances between a given supertag and its dependent supertags can also be used. Chen [34] shows that long-distance "head supertag" features are also effective. Chen [34] redefines the notion of headedness in terms of supertags themselves, which enables the development of the one-pass head model. He also shows that not only structural (supertag) features but also lexical features can be important. Chen [34] shows that it is not only important to identify informative features, but also to design an appropriate framework in order to use those features effectively. Similarly to Ratnaparkhi [35] for POS tagging, Chen [34] developed an MEMM model for supertagging. Moreover, Chen [34] implemented several supertaggers based on distinct feature sets.

1.6.2 CCG supertagging

In the literature, the most popular approaches to solving sequence labeling problems use statistical machine learning techniques. These approaches primarily consist of building statistical models that assign to a word sequence the most probable tag sequence given the sequence of words, in a maximum likelihood approach. Furthermore, feature-based classification algorithms (e.g., Maximum Entropy (ME) models, CRF, Support Vector Machines (SVM), etc.) have been widely used and have achieved good results.
enceisdefinedas-14- 第1章Introductiontheproductoftheindividualprobabilitiesforeachcategory,asfollows:P¹cjhº=iP¹cijhiº(1-10)Duringtraining,thesupertaggerconsultsatag-dictionary,whichcontains,foreachword,thesetofcategoriesthewordwasseenwithinthedata.IfawordappearsatleastKtimes,thesupertaggeronlyconsidersthecategoriesintheword’scategoryset.IfawordappearslessthanKtimes,allcategoriesareconsidered.AfterabeamsearchalgorithmisusedtoretainonlytheN=10sequences.Clark[9]showshowthemodelcanbeusedtodefineamulti-taggerwhichcanassignmorethanonecategorytoeachword.ClarkandCurran[14]followClark[9]andassumealog-linearMEmodelwhereanaturalcombinationofseveralfeatureshasbeenincorporated.ClarkandCurran’s[14]modelusewordsandPOStagsplusthetwopreviouslyassignedlexicalcategoriestotheleftasfeaturesinthefive-wordwindowtodefineadistributionoverthelexicalcategorysetforeachlocalcontextcontainingthetargetword.Theyalsousedatagdictionarywhereeachentryisalistofallthecategoriesseenwiththewordinthetrainingdata.Thesupertaggerassignscategorieswhichhavebeenseenwiththewordinthedataforwordsseenatleastk=20timesandassigncategorieswhichhavebeenseenwiththePOStaginthedatatothewordsseenlessthanktimes.ThesetofthelexicalcategoriesusedbyClarkandCurran[14]isthesetofcategoriesthatappearatleastten(10)timesinSections02–21oftheCCGBankcorpus[38][39]resultingin425categoriesbecauseithasveryhighcoverageonunseendata[40].ThismodelreliesheavilyonPOStagstocome-upwithunknownandunseenwordsandisverysensitivetothequalityofthosetags;thisiswhythatitsperformancedecreasesaggressivelyoutsideofitstargetdomainwiththepresenceofunseenandrarewords.Following[41][11]forPOStagging,LewisandSteedman[23]werethefirsttoexplorefeed-forwardNeuralNets(NN)withunsupervisedwordembeddingsasfeaturesinsu-pervisedmodelsfortheCCGsupertaggingtask.Theuseofunsupervisedvector-spaceembeddingsofwordsallowsthemodeltobetterassignlexicalcategorieswithoutde-pendingonPOS-tagsasfeatures.Thenetworkusesfeaturesof3-wordcontextwindowsurroundingaword.ThekeyfeatureiswordembeddingsratherthanPOStags,initial--15- 
Following [41][11] for POS tagging, Lewis and Steedman [23] were the first to explore feed-forward Neural Nets (NN) with unsupervised word embeddings as features in supervised models for the CCG supertagging task. The use of unsupervised vector-space embeddings of words allows the model to better assign lexical categories without depending on POS tags as features. The network uses features from a 3-word context window surrounding a word. The key feature is word embeddings rather than POS tags, initialized with the 50-dimensional embeddings trained in [41] and fine-tuned during supervised training; words which do not have an entry in the word embeddings are replaced by an "unknown" embedding. The model also uses 2-character suffixes and capitalization features with some simple preprocessing techniques (i.e., words are lower-cased and all digits are replaced with 0; if an unknown word is hyphenated, the model backs off to the substring after the hyphen). Lewis and Steedman [23] predict CCG lexical categories with a neural network similar to that used by Collobert et al. [11] for POS tagging, using lookup tables. Word embeddings and non-embedding features are implemented with lookup tables which map each feature onto a vector in a fixed-dimensional space. The neural net consists of three layers: the lookup layer, which maps words and discrete features into vector embeddings of a fixed dimension; the hidden layer, with a hard-tanh activation function that makes the classifier non-linear; and the Softmax transfer function, which takes those inputs and outputs a probability distribution over lexical categories for the word in the center of the context window. Lewis and Steedman [23] follow Turian et al. [41] in using a linear-chain CRF, so that the probability of each supertag is conditioned on the surrounding supertags [42]. Thus the probability of predicting a category depends on word embeddings, capitalization and suffixes as features, as well as on the previously predicted category.

When traditional NNs are used, all inputs and outputs are independent of each other, and only a fixed number of predecessor words is used to predict the probability of the current word being assigned a specific supertag, although it would be desirable to take all the previous computations into account. For instance, in the CCG supertagging task, to predict the CCG lexical category of a given word in a sentence, it clearly helps to know the previous information, as each output depends on the previous computations. For this reason, the current direction of research includes the use of more sophisticated models to process sequential information, mainly based on deep learning methods.

RNNs were proposed in the 80's [43][44] for modeling time series and sequential data. The structure of RNNs is similar to that of a standard multilayer perceptron, with the distinction that it allows connections among hidden units associated with a time delay. Through these connections, the model can save and keep information from the past and perform the same process for every element of a sequence, with the output depending on the previous computations, enabling it to discover temporal correlations between inputs that are far away from each other in the data.
Recently, a lot of work has taken place on the construction of powerful CCG supertaggers. Xu et al. [45] exploited RNNs for the CCG supertagging task. In theory, when an RNN is used in the CCG supertagging task, the full sequence of predecessor computations is considered when predicting the current category. The model of Xu et al. [45] was based on three main features, similarly to Lewis and Steedman [23]: capitalization, suffixes and the use of word embeddings [41], which free the model from depending on any lexical or hand-crafted features; they also perform some data preprocessing (e.g., all words are lower-cased, all digits are replaced by a single digit, etc.). Their work revealed the effectiveness of recurrent networks for the CCG supertagging task.

For CCG supertagging, as well as for many sequence labeling tasks, it is beneficial to have access to future as well as past context. Bidirectional Recurrent Neural Networks (BRNNs) [46][47][48] offer a more elegant solution. The basic idea of BRNNs is to present each training sequence forwards and backwards to two separate recurrent hidden layers, both of which are connected to the same output layer. This provides the network with complete past and future context for every point in the input sequence. In another research work, Xu [49] proves that BRNNs consistently outperform unidirectional RNNs on the CCG prediction problem.

While in principle recurrent networks are simple and powerful and can learn from long sequences, retaining information in their hidden state for a long time, in practice they are very difficult to train properly and to get to use this ability to memorize information over long distances efficiently. Among the main reasons why RNN models are so unwieldy are the vanishing gradient and exploding gradient problems described in Bengio et al. [50]. To avoid the vanishing/exploding gradient problems associated with RNNs, many authors made numerous attempts to address this issue, such as skip connections [51][52], hierarchical architectures [53], leaky integrators [54], second-order methods [55], and regularization [56]. Among all of these, LSTM networks, invented by Hochreiter and Schmidhuber [57], were the best proposed recurrent networks to cope with the difficulty of training vanilla RNNs and solve the vanishing gradient problem.
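The recurrences discussed above can be summarized in a short NumPy sketch: a vanilla (Elman) RNN read left-to-right, and a bidirectional variant that concatenates the left-to-right and right-to-left hidden states for each position. The dimensions, the random initialization and the sharing of weights across directions are arbitrary illustrative choices, not the configurations used later in this thesis.

```python
# Minimal numpy sketch of a vanilla RNN and a bidirectional variant.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, T = 8, 16, 5                      # input size, hidden size, sequence length
Wx, Wh, b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)

def rnn(xs, Wx, Wh, b):
    """h_t = tanh(Wx x_t + Wh h_{t-1} + b); returns all hidden states."""
    h, states = np.zeros(d_h), []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return states

xs = [rng.normal(size=d_in) for _ in range(T)]
fwd = rnn(xs, Wx, Wh, b)                      # past context for each position
bwd = rnn(xs[::-1], Wx, Wh, b)[::-1]          # future context for each position
bidir = [np.concatenate([f, g]) for f, g in zip(fwd, bwd)]
print(len(bidir), bidir[0].shape)             # 5 (32,)
```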
Lastly, the work of Lewis et al. [58] and Vaswani et al. [59] was among the first to use LSTM recurrent networks for the CCG supertagging task, to overcome the drawbacks of the RNN-based models. Lewis et al. [58] and Vaswani et al. [59] used BLSTM recurrent networks because they are best suited to the structured CCG supertagging learning task, processing and predicting time series with time lags from both the left and the right direction. Although Lewis et al. [58] and Vaswani et al. [59] used different architectures, their findings prove that LSTM networks can learn from much longer historical input information than traditional RNNs.

1.7 Evaluation Metric

The goal of machine learning models is to learn to generalize well to unseen examples instead of just memorizing the data used during training. Once a model has been built, it is essential to decide whether it performs well; the most important question that arises is how good the model is. Evaluating the model is therefore one of the most important tasks in a data science project, as it delineates how good the predictions are. Many metrics are used in machine learning to measure the predictive accuracy of a model, and the choice of metric depends on the machine learning task. In multi-label problems such as POS tagging and CCG supertagging, "accuracy" is precisely the effectiveness measure and the most common evaluation metric used in the area. Accuracy for CCG supertagging can be defined as the proportion of correctly predicted labels to the total number of labels for that instance. To compute it, we use the CCG supertagger to assign lexical categories to each symbol in the test dataset and then compare the predicted categories to the ground-truth supertags. Overall accuracy is the percentage of supertags correctly labeled with respect to the gold labeled set, as follows:

\text{Accuracy} = \frac{\text{number of correctly supertagged words reported by the system}}{\text{total number of instances}}  (1-11)

where instances refers to the number of supertagged words.

1.8 Dataset

The Penn TreeBank (PTB) is the common dataset used for many NLP tasks. The PTB has been translated to support many linguistic formalisms, such as TAG [60][61], LFG [62], HPSG [63] and CCG [38][39]. To be comparable with the results reported by previous work on the CCG supertagging task [14][23][45][49][58][59], we experimented with the same dataset, the "CCGBank" corpus [38][39]. The CCGBank is a treebank of CCG normal-form derivations, created from the PTB (Marcus et al., 1993) with a semi-automatic conversion process. Hockenmaier [38] gives a detailed description of the procedure used to create the CCGBank dataset. The CCGBank corpus provides the lexical category set used by the supertagger. Figure 1-3 shows an example of a supertagged sentence from the CCGBank corpus.

图1-3 Example from section 00 of the CCGBank corpus.

We follow the standard split and divide the CCGBank dataset into Sections 02-21 as the training set to train our models, Section 00 as the development set, and Section 23 as the test set used for evaluating the performance of our models.
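A minimal sketch of the evaluation in Eq. (1-11) and of the standard CCGBank split is given below; the gold and predicted tag sequences are toy examples.

```python
# Minimal sketch of the supertagging accuracy of Eq. (1-11) and the standard
# CCGBank split described above; the gold/predicted tag lists are toy examples.

def supertag_accuracy(gold, predicted):
    """Fraction of words whose predicted CCG category matches the gold one."""
    assert len(gold) == len(predicted)
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Standard CCGBank split used throughout the thesis.
SPLIT = {"train": "sections 02-21", "dev": "section 00", "test": "section 23"}

gold = ["NP", "(S\\NP)/NP", "NP/N", "N"]
pred = ["NP", "(S\\NP)/NP", "NP/N", "S\\NP"]
print(SPLIT["dev"], supertag_accuracy(gold, pred))   # section 00 0.75
```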
1.9 Thesis Contributions

The main contributions of this thesis are efficient models for the CCG supertagging problem. To this end, it is necessary to develop and propose new techniques based on deep learning methods, with the aim of reducing the number of assigned CCG syntactic categories, which is beneficial to many real-world applications. In particular, the following specific issues are considered in this work:

a) Gated Recurrent Units for CCG Supertagging
b) Backward-BLSTM for CCG Supertagging
c) BLSTM-CRF model for CCG Supertagging
d) Character-Word embeddings for CCG Supertagging

To address the above-mentioned issues, we develop novel approaches and methods for CCG supertagging. The main goals of these approaches are briefly introduced in the following:

a) Gated Recurrent Units for CCG Supertagging

In contrast to previous studies based on machine learning algorithms for CCG supertagging, which require extensive feature engineering, the application of deep learning techniques to the CCG structured prediction problem is the basic objective of our work. Unlike the last proposed CCG supertagger based on simple RNNs, we propose a novel approach for CCG supertagging in which we apply GRU networks. This method uses word embeddings for each input entry, and then a deep GRU architecture is introduced. Unlike the existing RNN method, which uses a single direction for the input representation, we propose a two-directional method that reads inputs from both left and right positions using BGRU networks. Moreover, we use a deep architecture that is more suitable for capturing interactions between words. The experimental results show that the proposed architecture is an efficient model and achieves better performance than the state-of-the-art methods on both supertagging and multi-tagging.

b) Backward-BLSTM for CCG Supertagging

In this approach, a more efficient recurrent network is used, based on LSTM networks, which are proven to be more effective at memorizing input data over long periods. We introduce a combined architecture based on backward and BLSTM networks. The input entry representations are first fed into a Backward-LSTM layer, and then a BLSTM layer is used to better save historical entries from both directions. After that, a Softmax activation function is used to decode each output probability into its corresponding CCG category.
Our method was tested on three different datasets. The experiments demonstrate that our method achieves better results.

c) BLSTM-CRF model for CCG Supertagging

In this model, a new approach to CCG supertagging as a sequence labeling problem is presented. The proposed method combines the benefits of both machine learning and deep learning techniques: deep learning methods are used to automatically extract input feature representations, whereas traditional statistical models based on machine learning algorithms benefit from knowledge about neighboring predictions. An efficient method developed for CCG supertagging is introduced based on LSTM and CRF models. We combine the two strategies with the aim of benefiting from both input representations and prior output predictions. The experimental results on different datasets show that the proposed technique is efficient for the CCG supertagging task. The proposed model achieves better performance than the current state-of-the-art methods for both supertagging and multi-tagging.

d) Character-Word embeddings for CCG Supertagging

Different LSTM-based architectures have been proposed for the CCG supertagging task and achieve good results. However, existing models still suffer from the OOV problem, where unknown and rare words do not appear in the pre-trained word embeddings. In this work, we propose to exploit the strengths of different embeddings in a simple but effective way to deal with the OOV problem. In the proposed model, we combine both word embeddings and character embeddings in separate BLSTM networks to obtain efficient input representations, which proved accurate on out-of-domain datasets.
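As a preview of contribution d), the following sketch shows the general idea of concatenating a word-level embedding with a character-level summary vector before the sequence model. The lookup tables and dimensions are invented, and mean pooling over characters is used only to keep the sketch short; the model proposed in Chapter 5 runs a BLSTM over the characters instead.

```python
# Minimal sketch of a character-word input representation: a word embedding is
# concatenated with a pooled character representation. Sizes, tables and mean
# pooling are illustrative choices, not the thesis configuration.

import numpy as np

rng = np.random.default_rng(1)
WORD_DIM, CHAR_DIM = 50, 20
word_table = {"the": rng.normal(size=WORD_DIM)}           # pre-trained word vectors
char_table = {c: rng.normal(size=CHAR_DIM) for c in "abcdefghijklmnopqrstuvwxyz"}
UNK_WORD = np.zeros(WORD_DIM)

def represent(word):
    """Concatenate the word embedding with a pooled character representation."""
    word_vec = word_table.get(word.lower(), UNK_WORD)
    char_vecs = [char_table[c] for c in word.lower() if c in char_table]
    char_vec = np.mean(char_vecs, axis=0) if char_vecs else np.zeros(CHAR_DIM)
    return np.concatenate([word_vec, char_vec])

print(represent("the").shape, represent("supertagging").shape)   # (70,) (70,)
```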
1.10 Organization of the Thesis

This thesis is organized in five chapters. The present chapter introduces a brief overview of the CCG supertagging problem, along with the background, the motivation and the main contributions of this thesis, which are important for the rest of the document. It also presents the literature review of the supertagging techniques developed in general, as well as the approaches proposed specifically for CCG supertagging, such as ME, NN and other methods. The remaining chapters address the contributions discussed in Section 1.9, where we present the techniques developed during our Ph.D. study.

Chapter 2 provides a brief review of neural networks. We do not aim to give a comprehensive review of NNs and RNNs; instead, we briefly review RNNs for sequence tagging problems. We also describe our approach of applying GRUs to overcome the drawbacks of traditional RNNs. We outline the general model architecture and our implementation. The evaluation and the experimental results are presented at the end of the chapter.

Chapter 3 introduces the proposed LSTM-based architecture for the CCG supertagging task. First, we describe our neural network, where a Backward-LSTM layer is used for the input representations; the outputs of the Backward-LSTM layer are fed as input to a BLSTM network and then to a Softmax activation function as the final output generator, which decodes each probability into its corresponding CCG category. Second, we give details about our experimental settings. We also evaluate our model's efficiency on different datasets. Experimental results are presented at the end of the chapter.

Chapter 4 describes the third contribution of this dissertation. In this chapter, we first describe the architecture combining machine learning and deep learning strategies. We propose to exploit the strengths of both approaches in a simple but effective way so as to benefit from both input and output information. In this work, we use BLSTMs to model input entries and Conditional Random Fields (CRF) to model the tag outputs, which brings further improvement to the supertagging accuracy of the model. Secondly, we discuss our experimental settings and parameter tuning. Finally, we evaluate our model on both in-domain and out-of-domain datasets; the achieved results are discussed at the end of the chapter.

In Chapter 5, the problem of unseen words is addressed. The OOV problem can greatly influence the performance of the supertagger when tested on out-of-domain datasets, as well as on in-domain datasets with rare and infrequent words. Thus, we propose an effective and simple model based on character and word embeddings, in order to gain more information about input entries that do not appear in the pre-trained word embeddings. The first section of that chapter is devoted to our model description. Next, we report our experimental settings. In addition, we discuss our experimental results, and finally we conclude the chapter.

In the end, we conclude the thesis. Furthermore, future works of the research activity are discussed. Figure 1-4 shows the outline of the dissertation.

图1-4 Dissertation outlines.
Chapter 2 Gated Recurrent Units for the CCG Supertagging Task

2.1 Introduction
In the previous chapter, we presented the background of the CCG supertagging task and reviewed the previously proposed methods to solve it. Statistical machine learning methods work well for CCG supertagging only because of extensively designed input representations such as lexical features. Deep learning has emerged as a way to jointly learn good features. RNNs have been proposed for the CCG supertagging task as simple recurrent models that use a memory of the input representation instead of relying only on lexical features. However, vanilla RNN-based supertaggers are difficult to train to memorize long input sequences. Since RNNs cannot effectively memorize information, in this chapter we explore a more sophisticated recurrent model for the CCG supertagging problem. We present a deep learning approach based on Gated Recurrent Unit (GRU) networks to improve the performance of the supertagger. In this work, we make use of an efficient model that can memorize information not only over a long history but also from both past and future input sequences.
The organization of the chapter is as follows: Section 2.2 describes some basic definitions and notation for deep neural networks. Section 2.3 is devoted to our particular approach to CCG supertagging using GRU networks. Next, Section 2.4 describes the different experiments conducted for the task. In addition, Section 2.5 presents the experimental results on supertagging and multi-tagging. Finally, Section 2.6 provides the conclusion.

2.2 Neural Networks
In this section we briefly review some of the deep neural networks used for processing sequential data, including RNNs [43][44], BRNNs [46][47][48], and GRUs [64].

2.2.1 Deep Learning
Statistical machine learning strategies have been widely used for solving many NLP tasks such as sequence tagging problems, and more specifically for the CCG supertagging problem [14]. Most machine learning based models have exploited shallow structured architectures, which typically contain at most one or two layers. Gaussian Mixture Models (GMMs), linear or nonlinear dynamical systems, CRFs, ME models, Support Vector Machines (SVMs), and Multi-Layer Perceptrons (MLPs) are some examples of shallow architectures. Despite the effectiveness of shallow architectures in solving many simple or constrained problems, their main disadvantage is their limited modeling and representational power, which causes difficulties when dealing with more complicated real-world applications. Moreover, those methods require the design and selection of an appropriate feature space by experts, which is costly and difficult in terms of computational time and expert knowledge. As an alternative, automatically learning the features can be considered a relevant choice.
Artificial Neural Network (ANN) models have been introduced over decades if not centuries. Earlier studies with ANNs started in the late 1950s with the introduction of the perceptron, a two-layer network used for simple operations, and grew in the late 1960s with the development of an efficient gradient descent method called the back-propagation algorithm [43], applied to NNs for efficient training of multilayer networks. ANNs represent a class of machine learning models. In ANNs, the artificial neuron forms the computational unit of the model and the network describes how these units are connected to one another. The simplest version of ANNs is the feed-forward NN. Basically, a feed-forward NN receives a set of inputs and maps them to outputs. Each NN is constructed from several interconnected neurons, organized in layers with associated weights. An example of a feed-forward NN is shown in Figure 2-1; it consists of three layers: the input layer, which reads the inputs and transfers them to the hidden layer, which performs computations whose results are transferred to the output layer. Feed-forward NNs excel at solving the CCG supertagging task [23] and at predicting the most likely
tags for the words in a given sentence without requiring any lexical features. However, the main disadvantage of NNs is the huge number of free parameters (the weights) to be learned. Moreover, for the CCG supertagging task, feed-forward NNs can only map from input to output vectors with no cyclic connections, and the output may only depend directly on the current input at that time step, without any information about the surrounding inputs.

Figure 2-1 An example of an Artificial Neural Network.

There has been a resurrection of interest starting from the mid-2000s, with the inception of the fast-learning algorithm by G. Hinton [65] and the introduction of GPUs, roughly in 2011, for massive numeric computation, which opened the route for modern deep learning as the new generation of NNs characterized by deep architectures.

Figure 2-2 An example of a Deep Neural Network.

Feed-forward NNs or Multi-Layer Perceptrons (MLPs) with many hidden layers, often referred to as deep neural networks (DNNs), are examples of models with a deep architecture, i.e., with more than two layers: an input layer, one or more so-called "hidden" layers, and an output layer, as depicted in Figure 2-2. A few years ago, researchers called networks with 3-5 layers "deep"; now the depth has gone up to 100-200 layers. Deep learning has appeared as a new area of machine learning research [65][66] with the objective of moving machine learning towards its original goal: AI. Modern deep learning networks have been applied with success to many NP-complete problems. By adding more levels (layers), researchers have reported positive experimental results for several tasks [67][68][69][70]. From the mid-2000s to nowadays, the techniques developed in deep learning research have been impacting a wide range of signal and information processing work, within both the traditional and the new key aspects of machine learning and AI [66][71][72][73][74].

2.2.2 Recurrent Neural Networks
In CCG supertagging, the observation sequence may depend on multiple inputs through long historical dependencies. Models and NNs that can map from the entire history of inputs to predict each output, and that allow recurrent connections, are therefore necessary. One way to satisfy these criteria is to use RNNs to estimate the output probabilities based on the current and past inputs, allowing cyclic connections with a sufficient number of hidden units [75]. Recurrent networks are the most important models for CCG supertagging.
RNNs are very flexible methods: a family of ANN architectures that have the ability to make use of sequential information, performing the same action for each element in a sequence, where the output at a given time step is related to that of previous time steps over long-distance dependencies. The primary advantage of RNNs is the memory of their recurrent connections, which captures information and stores previous inputs in the internal network state in order to influence the network output. An RNN can be viewed as an NN specialized for processing a sequence of symbols (x_1, x_2, ..., x_t). Most recurrent networks can also process sequences of variable length, and much longer sequences than feed-forward NNs. Different variants of RNNs have been proposed, such as Elman networks [44], Jordan networks [76], time delay neural networks [77] and echo state networks [78]. The structure of the RNN models widely used for sequence tagging problems consists of an input layer, a hidden layer, and an output layer, as depicted in Figure 2-3.
Figure 2-3 General structure of simple RNNs.

A useful way to visualize RNNs is by 'unfolding' the cyclic connections of the network over the input sequence. Figure 2-4 is an example of an RNN unfolded for three (3) time steps. In Figure 2-4, Section A represents the folded state of the RNN, with its corresponding unfolded version in Section B, obtained by unrolling the network structure over the complete input sequence at different, discrete times; in this example it contains a three-layer neural network and can be referred to as a deep neural network because it has more than one hidden layer. Note that the unfolded graph, unlike the folded graph, contains no cycles. In Figure 2-4, the U weights are the weights of the neurons between the inputs x and the hidden state h, the W weights are the weights of the neurons between hidden states h, and the V weights are the weights of the neurons between the hidden states h and the output O. Each node represents a layer of network units at a single time step.
The formulas that govern the computation happening in an RNN are as follows:
• x_t is the input at time step t.
• h_t is the hidden state at time step t. It is the "memory" of the network. h_t is calculated based on the previous hidden state and the input at the current step:

    h_t = f(U x_t + W h_{t−1})    (2-1)

The function f is usually a nonlinearity such as tanh or ReLU. h_{t−1} is required to calculate the first hidden state and is typically initialized to zeros.
• O_t is the output at time step t.

Figure 2-4 General structure of a simple RNN unfolded for three time steps.

In CCG supertagging, the hidden layer h_t is updated based on the input x_t, which represents the input features, and on the previous hidden state h_{t−1}, and the output layer y_t represents the predicted lexical categories. Formally, the RNN computes the hidden layer h_t and the output layer y_t as follows:

    h_t = f(U x_t + W h_{t−1})    (2-2)
    y_t = g(V h_t)                (2-3)

where U, W, and V are the connection weights, and g is the activation function.

2.2.3 Bidirectional RNN
The RNNs we have presented have "causal" structures, i.e., the current input is influenced by the past, but not the future. In sequential data, the state of the output depends on the previous inputs as well as on the future state. There is a special category of RNNs in which the state of the system at time step t depends not only on the inputs learned from the past but also on the inputs from the future. This sort of RNN, which can capture information from the whole sequence, is known as the Bidirectional RNN (BRNN). As its name suggests, a BRNN contains two RNNs to process the sequence from the two directions, so that we have information from the whole sequence. In BRNNs, at each time step, we have two hidden states: one hidden state captures information from left to right, while the second captures information in the opposite direction, from right to left.
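To make equations (2-2) and (2-3) and the bidirectional idea more concrete, the following NumPy sketch shows one recurrent step and a bidirectional pass that concatenates the two hidden states at every position. It is only an illustration under simplifying assumptions (tanh as the nonlinearity f, and shared weights for both directions, whereas a real BRNN learns separate parameters per direction); it is not the implementation used in this thesis.

    import numpy as np

    def rnn_step(x_t, h_prev, U, W, V):
        """One step of the simple RNN in equations (2-2) and (2-3)."""
        h_t = np.tanh(U @ x_t + W @ h_prev)   # hidden state, f = tanh
        y_t = V @ h_t                         # output scores (before any Softmax)
        return h_t, y_t

    def brnn_states(xs, U, W, hidden_dim):
        """Bidirectional pass: run the recurrence left-to-right and right-to-left
        and concatenate the two hidden states at every time step."""
        forward, backward = [], []
        h = np.zeros(hidden_dim)
        for x in xs:                          # left-to-right pass
            h = np.tanh(U @ x + W @ h)
            forward.append(h)
        h = np.zeros(hidden_dim)
        for x in reversed(xs):                # right-to-left pass
            h = np.tanh(U @ x + W @ h)
            backward.append(h)
        backward.reverse()
        return [np.concatenate([f, b]) for f, b in zip(forward, backward)]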
An unfolded graphical representation of a BRNN is depicted in Figure 2-5.

Figure 2-5 General structure of a BRNN unfolded for three time steps.

It has been shown that both simple and bidirectional RNN based models do better than feed-forward NN models for the CCG supertagging task [45][49] and have achieved the state of the art. While traditional RNNs are able to use contextual information when mapping between input and output sequences, the length of the context that can be memorized in practice is quite limited. The main complication with vanilla RNNs is that the model cannot concentrate on longer-term predictions: the influence of inputs on the hidden layers, and therefore on the network output, either decays or blows up exponentially as it cycles around the network's recurrent connections. In other words, RNNs have a reasonable memory but no capacity to remember things over long-distance dependencies. This makes it harder for the model to learn long-term dependencies in the input sequence, which is often referred to as the vanishing gradient problem [79][80][50], as illustrated schematically in Figure 2-6.

Figure 2-6 Illustration of the vanishing gradient problem.

Several alternative recurrent cells have been proposed to satisfy the above criteria, i.e., to train easily while avoiding the vanishing gradient problem. One variation is the Gated Recurrent Unit network recently proposed by Cho et al. [64], which avoids the vanishing gradient problem and is easier to train than traditional RNNs thanks to its gating mechanism.

2.2.4 Gated Recurrent Units
GRU networks are a useful family of recurrent deep neural networks for processing sequential data. GRUs were proposed by Cho et al. [64] to overcome the shortcomings of RNNs. The main component of GRUs is the "memory cell", which decides how much information to keep in memory from the previous states. GRUs are known to be good at preserving long-distance dependencies, using additional parameters that control when and how their memory is updated. Conceptually, GRU networks have reset and update gates that help to protect the memory, so that the network is able to make longer-term predictions; they control the information as follows:
• The reset gate r determines how to combine the new input with the previous memory and decides whether the past sequence is relevant for the future or not.
• The update gate z defines how much of the previous memory information to keep around.
Mathematically, the GRU hidden state h_t given an input x is calculated as described by the equations below:

    z_t = σ(W_z [h_{t−1}, x_t] + b_z)              (2-4)
    r_t = σ(W_r [h_{t−1}, x_t] + b_r)              (2-5)
    h̃_t = tanh(W_h [r_t ⊙ h_{t−1}, x_t] + b_h)     (2-6)
    h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t          (2-7)

where σ is the logistic sigmoid function, r and z are respectively the reset and update gates, ⊙ stands for element-wise multiplication, the W are the weight matrices, and the b terms denote bias vectors. Figure 2-7 illustrates the GRU components, where r and z are the reset and update gates, respectively, and h and h̃ are the activation and the candidate activation.

Figure 2-7 Gated Recurrent Units architecture [64].
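A direct transcription of equations (2-4)-(2-7) into NumPy may help make the gating mechanism concrete. This is only an illustrative sketch, with the weight matrices acting on the concatenation [h_{t−1}, x_t] as above; it is not the Keras implementation used in our experiments.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
        """One GRU step following equations (2-4)-(2-7)."""
        hx = np.concatenate([h_prev, x_t])
        z = sigmoid(Wz @ hx + bz)                                        # update gate, eq. (2-4)
        r = sigmoid(Wr @ hx + br)                                        # reset gate,  eq. (2-5)
        h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)   # candidate,   eq. (2-6)
        return (1.0 - z) * h_prev + z * h_tilde                          # new state,   eq. (2-7)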
2.3 The proposed BGRU model for the CCG supertagging task
Recurrent networks are considered a class of deep networks for supervised as well as sequence learning tasks, where the depth can be as large as the length of the input data sequence. In CCG supertagging, we want to output a prediction y which may depend on the whole input sequence. The information from both the past and the future directions of an input entry is very important for the prediction of the current output. For this reason, it is reasonable to use models that are able to capture both previous and future input information. An elegant solution for modeling the CCG sequential data, which has achieved high accuracies in many sequence labeling tasks, is bidirectional models. In our proposed model, we use a bidirectional architecture based on GRU networks. The idea behind Bidirectional GRUs (BGRUs) is to present each sequence to two separate layers so as to capture information from the two sides of an input entry: one layer processes the data from right to left and retains the previous information, whereas the second layer uses the opposite direction and processes the data from left to right, saving the future context, which is well suited to our task. Finally, the two outputs from the layers are concatenated to form the final output.
Our proposed method consists of three main components to predict the final CCG output supertags: the Input Layer, the GRU Neural Network, and the Output Layer.

2.3.1 Input Layer
With the emergence of NNs, the notion of "embedding" has been introduced and widely used; it refers to the representation of symbolic information in natural language text, mapped from sparse vectors of very high dimension (i.e., the vocabulary size V) into low-dimensional, real-valued vectors via a neural network, which are then processed by NN layers. The early work highlighting the importance of word embeddings came from Collobert and Weston [68], Turian et al. [41], and Collobert et al. [11], although the original form came from Bengio et al. [81] as a side product of language modeling. Given a sentence of N words (W_1, W_2, ..., W_N), the embedding feature of a word w_t, W: words → R^n, is a parameterized function mapping words in some language to vectors; it is obtained by projecting the word into an n-dimensional vector space through a lookup table. Each dimension describes syntactic or semantic properties of the word. Word embeddings have been exceptionally successful and play a vital role in improving the performance of many NLP tasks such as sequence tagging problems [11][23]. The key advantage of using a continuous space to represent words (or phrases) is its distributed nature, which enables sharing or grouping the representations of words with a similar meaning.
In the input layer of our model, we make use of word embeddings, which have been proven to be useful for the CCG supertagging task [23]. We make use of two kinds of word embeddings, where each word is transformed into an id (i.e., identification) defined in a lookup dictionary; the dictionary consists of the words in the training set, which are then embedded into a low-dimensional representation.
1. Word index (task-specific) embeddings: we use a task-specific word embeddings model, because several misspelled words, abbreviations, and compositions of words occur in the training data. These words are identified as 'UNKNOWN' words by a pre-trained word embeddings model. We build our task-specific word embedding model using the 'EMBEDDING' layer of the Keras [82] library. The embedding layer takes as input a 2-dimensional matrix of integers representing each word in the corpus (the index of the word in the corpus) and outputs a 3-dimensional matrix, which represents the word embedding model that maps the integer
inputs to the vectors found at the corresponding index in the embedding matrix [82].
2. Pre-trained word embeddings: our best model uses the pre-trained Google Word2Vec 300-dimensional embeddings trained on 100 billion words from Google News [83]. Following Collobert et al. [11], all words are lowercased before passing through the lookup tables, which convert them into their corresponding embeddings, and all numbers are replaced by the single digit '0'. For a word that does not have an entry in the pre-trained word embeddings, the 'UNKNOWN' entry from the pre-trained embeddings is used.
Moreover, following Lewis and Steedman [23], two sets of features are used, namely suffixes and capitalization:
1. Capitalization feature: the capitalization feature has only two values, indicating whether a given word is capitalized or not.
2. Suffix feature: we follow most of the state-of-the-art existing CCG supertaggers in using suffixes of size two.
The lookup tables are first concatenated in the input layer and then fed into the network.

2.3.2 GRU Neural Network
CCG supertagging is performed using a BGRU architecture. As the name suggests, our model has a bidirectional architecture which combines two GRU layers: the first GRU layer moves forward through time, from the start of the sequence to its end, and the second GRU layer moves in the opposite direction and processes the sequence from its end to its beginning. This allows the output units O(t) to compute a representation that depends on both the past and the future.
In our proposed model, the inputs encoded by the preceding process in the input layer are fed to a BGRU neural network. A forward GRU layer processes the input sequence from left to right and computes the hidden state (→h_t) to save information from the past, and the backward GRU layer saves the future information of a given input by processing the sequence starting from its end and calculating the hidden state (←h_t). Deep architectures have proved to be fruitful for many tasks. Therefore, in our model, we investigate a 2-BGRU network architecture, which is more convenient for capturing complex interactions in the context between words. The outputs from each GRU (backward and forward) are then fed into another pair of backward and forward GRU layers. Finally, the outputs from each layer at each time step are concatenated [→h_t, ←h_t] and fed through the output layer. The architecture of our model is shown in Figure 2-8.

Figure 2-8 The proposed BGRU model for CCG supertagging.

2.3.3 Output Layer
Equation (2-8) gives the Softmax function used for our CCG supertagging task as a multiclass prediction problem: it is the ratio of the exponential of an input value to the sum of the exponentials of all input values, and it outputs the probability of each CCG class over all possible classes. In the CCG supertagging task, the Softmax activation function returns the probability of each class, which is helpful for determining the final, most likely CCG supertag to predict, i.e., the one with the highest probability.

    Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)    (2-8)

The main advantage of using Softmax is that it ensures that the output probabilities range between 0 and 1 and that the sum of all the probabilities is equal to one. As a result, the output at each time step from the BGRU architecture is fed through a Softmax layer, which decodes it into probabilities for each supertag, forming the final output of the network.
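A minimal Keras-style sketch of the architecture described in Sections 2.3.1-2.3.3 is given below. Only the word-embedding input is kept (the suffix and capitalization lookup tables are omitted for brevity), the vocabulary size and padded sentence length are placeholder assumptions, and a recent Keras API is assumed, so argument names may differ slightly from the Keras 1.2.2 release used in our experiments.

    from keras.models import Sequential
    from keras.layers import Embedding, Dropout, GRU, Bidirectional, TimeDistributed, Dense

    VOCAB_SIZE = 50000     # assumption: size of the word lookup dictionary
    MAX_LEN = 100          # assumption: padded sentence length
    N_SUPERTAGS = 1286     # CCG lexical categories observed in the training data

    model = Sequential()
    # word lookup table; initialised with the pre-trained Word2Vec vectors
    model.add(Embedding(VOCAB_SIZE, 300, input_length=MAX_LEN))
    model.add(Dropout(0.2))
    # two stacked bidirectional GRU layers (forward and backward outputs concatenated)
    model.add(Bidirectional(GRU(300, return_sequences=True)))
    model.add(Bidirectional(GRU(300, return_sequences=True)))
    # per-token Softmax over all CCG supertags (equation 2-8)
    model.add(TimeDistributed(Dense(N_SUPERTAGS, activation='softmax')))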
2.4 Experiment Settings
In this section, we report the datasets and training parameters of our experiments; the achieved results are then discussed. We conduct experiments to evaluate our model by applying it to supertagging and multi-tagging for the CCG grammar.

2.4.1 Dataset
As described in Chapter 1, Section 1.8, we use the CCGBank corpus [39] for our experiments. Following the standard split, we trained our models on Sections 2-21 of the CCGBank, using Section 00 (1,913 sentences) for development. Our experiments test the utility of our proposed models on Section 23 of the CCGBank (2,407 sentences) as the test set.

2.4.2 Data Preprocessing
In our experiments, data preprocessing was first applied before passing the dataset through the lookup tables. We preprocessed all the datasets as follows:
• All words were lowercased,
• all sequences of digits were collapsed into a single digit '0',
• for words and numbers containing ′n′, we backed off to the substring after the delimiter.

2.4.3 Hyper-Parameters and Training
We implemented the neural network using version 1.2.2 of Keras [82], a Theano-based neural network library. Training and testing were done at the sentence level.

2.4.4 Word Embeddings Settings
We follow the recent work reported in [11], based on neural network architectures. Collobert et al. [11] applied neural network architectures and related deep learning algorithms to solve NLP problems from "scratch", where no traditional NLP methods are used; features are extracted automatically, avoiding hand-crafted feature engineering. Collobert et al. [11] automatically learn internal representations, or word embeddings, from vast amounts of mostly unlabeled training data while performing a wide range of NLP tasks such as chunking, POS tagging and Semantic Role Labeling (SRL).
For pre-trained word embeddings, we initialized our model with the publicly available pre-trained vectors created using word2vec, i.e., 300-dimensional vectors trained on Google News, named 'Word2vec' [83]. For CCG supertagging, we apply a BGRU neural network architecture with two backward and two forward layers. We tested the accuracy of our model on the development set with hidden dimension values in the set {100, 200, 256, 300, 400, 512, 600} and found that a hidden dimension of size 300 gives the best accuracy. For suffixes and capitalization, we follow Lewis and Steedman [23] and use embeddings with a fixed size of 5.

2.4.5 Learning Algorithm
We use the SGD optimizer, a gradient descent method, to train our models with a fixed learning rate of 0.01. We explored other, more sophisticated optimization algorithms such as Adam and AdaDelta [84] without any remarkable improvement over SGD. Finally, the outputs received from the GRU neural network are fed to the output layer with the Softmax activation function to output a CCG supertag category for each word in an input sentence.

2.4.6 Dropout
Over-fitting is very common in deep neural network training. In recent years, deep learning approaches have seen important success with the introduction of the new regularization method based on "dropout", originally proposed by Hinton et al. [85]. We applied dropout to the input layer with a fixed probability of 0.2, which was quite effective in regularizing our model and reducing over-fitting, giving significant improvements in accuracy.

2.5 Results and Analysis
In this section, we present the results of the evaluation of our proposed BGRU architecture for CCG supertagging on the CCGBank datasets. We also perform multi-tagging experiments; the results are discussed below.

2.5.1 Supertagging Results
We trained our models for 90 epochs and used the model parameters that give the highest accuracy on the development set. We tuned the hyper-parameters and then trained the models.
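The supertagging accuracies reported below are per-word scores: a word counts as correct when its single most likely predicted supertag equals the gold-standard lexical category. A small helper of the following form (a sketch for illustration, not the evaluation script actually used) makes the metric explicit.

    def one_best_accuracy(gold_sentences, predicted_sentences):
        """Percentage of tokens whose 1-best predicted supertag
        equals the gold-standard lexical category."""
        total = 0
        correct = 0
        for gold, pred in zip(gold_sentences, predicted_sentences):
            for g, p in zip(gold, pred):
                total += 1
                if g == p:
                    correct += 1
        return 100.0 * correct / total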
The final chosen parameters are reported in Table 2-1.

Table 2-1 The final chosen hyper-parameters.
Hyper-parameter    Value
Word embeddings    Google's Word2Vec
Hidden dimension   300
Dropout            0.2
Optimizer          SGD
Learning rate      0.01

CCG supertagging, like many sequence tagging problems, has long been dominated by machine learning methods. We compare our model's performance with the experimental results reported for Clark and Curran's model with gold and auto POS tags, which are obtained using ME models with a set of lexical features. Moreover, since NNs with word embeddings have been a popular approach, we also compare with the model proposed by Lewis and Steedman [23], and with the best results reported for CCG supertagging by Xu et al. [45]. Table 2-2 compares our results with those models on Section 00 of the CCGBank corpus (development set).

Table 2-2 Performance comparison with state-of-the-art methods on the development set.
Model            Accuracy
C&C (gold POS)   92.60
C&C (auto POS)   91.50
NN               91.10
RNN              93.07
Ours             93.47

The results in Table 2-2 indicate that the deep learning models (RNN and BGRU) produce better results than the machine learning based approaches. It can be seen that our BGRU achieves higher accuracy than the C&C model with gold POS tags, with an improvement of +0.9%. Moreover, our model gains +0.40% over the RNN model, which shows that the use of GRUs can bring better performance than simple recurrent networks and that the use of BGRUs is very useful for modeling and memorizing more information from both directions of an input entry.
The overall results of our experiments on the test set are shown in Table 2-3. The BGRU model improves the performance of CCG supertagging to a significant extent, bringing accuracy up from 91.57% to 93.87% compared to the feed-forward NN of Lewis and Steedman [23], and it also outperforms the RNN model proposed by Xu et al. [45] by a significant margin. This may be due to the higher quality of a network that can learn from past and future entries, which helps the model make more accurate predictions.

Table 2-3 Performance comparison with state-of-the-art methods on the test set.
Model            Section 23
C&C (gold POS)   93.32
C&C (auto POS)   92.02
NN               91.57
RNN              93.00
Ours             93.87

2.5.2 Multi-tagging Results
Supertaggers have been used effectively in a range of NLP tasks, such as Information Retrieval (IR) and parsing. There are a variety of ways to carry out the supertagging task. Initially, supertaggers were used to choose a single supertag, specifically the most likely supertag for a given word in a given context in the training data. By reducing the set of possible lexical categories assigned to each word in an input sequence, supertaggers contribute to dramatically improving the efficiency of many NLP tasks by providing a crucial source of information. However, in some cases, supertaggers will provide imperfect and incorrect supertags. For example, when parsing, if the supertagger assigns only a single supertag to each word, it may not lead to a valid parse structure, and its accuracy is too low to be effectively incorporated into a parser, as the parser has no other alternatives to consider, resulting in a degradation in accuracy.
A satisfying solution to this problem, which is also beneficial for accuracy improvement, is multi-tagging. Multi-tagging refers to the task of assigning more than one supertag to each word in the sentence. However, the immediate question that multi-tagging raises is: in what order should the tags be considered? To answer this question, the supertagging literature has proposed several different ways of performing the multi-tagging task. Chen et al. [86] address this question using a trigram-based supertagger to choose multiple tags, and then the Viterbi algorithm to determine the most likely sequence. After that, instead of associating each word with the most likely predicted supertag from the most likely path, each word was associated with the "n" supertags that had the highest prefix probabilities. By increasing the number of supplementary assigned supertags, the number of parsed sentences increases simultaneously, and the more correct the provided set of supertags, the higher the coverage will be, leading to an increase in accuracy. However, when the number of provided supertags is over four, parsing becomes unattainable due to the time constraints of parsing speed. Accuracy is decreased in two ways: by not providing enough categories at any level, leading to no spanning analysis; or by providing too many categories, causing an explosion in the chart. By multi-tagging we can make the supertagger more accurate, but at the cost of speed, as the parser must consider larger sets of possible categories.
Clark and Curran [14] also approached this question, but differently: the authors developed a multi-tagger based on the ME-based supertagger and defined levels as cutoffs for multi-tagging based on the probabilities from the model. The levels set by Clark and Curran [14] define cutoffs for multi-tagging based on the probabilities from the maximum entropy model. If the parser is unable to form a spanning analysis, the level is decreased and the supertagger is rerun. The exact values of these levels greatly influence parsing accuracy and speed, where each level refers to the ambiguity of the number of supertags assigned to each word. Rather than defining a fixed number of tags to be produced per word, the supertagger includes all tags whose probabilities are within a factor β of the highest probability category. Mathematically, if we have n observations, the supertags selected according to the β factor should satisfy the following equation:
    Y_i = { y | P(Y_i = y | S) > β }    (2-9)

where Y_i is the set of supertags assigned to the word x_i at time step i in a given sentence S. According to this equation, for each word in the sentence, the multi-tagger assigns all those categories whose probabilities are within the β factor of the highest probability category for that word.
We also evaluate our proposed model for multi-tagging, where our supertagger can assign more than one category to each word, namely those whose probabilities are within the β factors. The performance of our proposed model on multi-tagging is measured in terms of WORD accuracy, where we consider a word to be tagged correctly if the correct category is included in the set of assigned lexical categories, and SENT (sentence) accuracy, which is the percentage of sentences whose words are all supertagged correctly, using the default levels of the C&C parser [14] on the development set. The results of these experiments are presented in Table 2-4. It can be seen that our model performs much better than the previous models in terms of both WORD and SENT accuracy at all levels.

Table 2-4 Performance comparison of different models for multi-tagging accuracy on Section 00 at different levels.
Level    GRU            RNN            NN             C&C (auto POS)   C&C (gold POS)
         Word   SENT    Word   SENT    Word   SENT    Word   SENT      Word   SENT
0.075    97.22  67.22   97.33  66.07   96.83  61.27   96.34  60.27     97.34  67.43
0.030    98.08  74.90   98.12  74.39   97.81  70.83   97.05  65.50     97.92  72.87
0.010    98.71  81.87   98.71  81.70   98.54  79.25   97.63  70.52     98.37  77.73
0.005    99.01  85.04   99.01  84.79   98.84  83.38   97.86  72.24     98.52  79.25
0.001    99.42  90.92   99.41  90.54   99.29  89.07   98.25  80.24     99.17  87.19

2.6 Summary
A backward GRU is very powerful in capturing past information over a long time, memorizing previous context information; on the other hand, a forward GRU is also very efficient at memorizing future information over long periods. However, it is well known that a
第3章Backward-BLSTMmodelfortheCCGSupertaggingtask第3章Backward-BLSTMmodelfortheCCGSupertaggingtask3.1IntroductionAsdiscussedinthepreviouschapter,animportantbenefitofrecurrentnetworksistheirabilitytousecontextualinformationwhenmappingbetweeninputandoutputsequences.Unfortunately,traditionalRNNsfunctionbetterintheorythaninpracticebecauseRNNscanbuildtheircontextualinformationuponnomorethanthelastten(10)timesteps.Thatisbecausetheysufferfromproblemswithvanishingorexplodinggradientswherethecontextualinputinformationcanonlybeheldinanetwork’s"memory"foralimitedamountoftime.Sincethe1990s,toaddressthislimitation,researchershavedevelopedmanyalgo-rithmsandproposedmanyarchitectures,forinstance,GRUnetworksusedinthepreviouschapterbyChoetal.,[64].However,themostsuccessfulsolutioninapplicationsthathaveproventogivethebestresultsuptillnowandfavoredinthischapterisnamedLongShort-TermMemory(LSTM)networks.LSTMsarearedesignoftheRNNarchitecturearoundspecial"memorycell"units.LSTMbasedmodelshavebeenprovedtoperfectlyhandlewithgradientvanishingproblemofRNNs.Inthecontextofsequencemodeling,manyresearchershavesuccessfullyappliedsuchmechanismtolearnsequencesforlongtimespans.Inthiswork,thespecifictypeofneuralnetworkusedwasaBidirectionalLongShort-TermMemory(BLSTM)basedrecurrentnetwork.Wedesignasimpleandeffectivearchitecture.Moreover,wedemonstratethatbysimplycombiningabackwardLSTMandBLSTM,wecancapturelong-terminformationandwecanobtaincompetitiveperformancecomparedtothestate-of-the-artrecurrentnetworksfortheCCGsupertaggingtask.Thechapterisorganizedasfollows:Section3.1describessomebasicsofLSTMs.Next,Section3.2isdevotedtoourparticularapproachtoCCGsupertaggingusingBLSTMmodel.Inaddition,Section3.3describesthedifferentexperimentsconductedforthistask.Moreover,Section3.4presentstheexperimentalresultsforbothsupertaggingand-43- 哈尔滨工业大学工学博士学位论文multi-taggingexperiments,andfinallySection3.5providestheconclusion.3.1.1LongShortTermMemoryNetworksInthissection,wegiveadetaileddescriptionofLSTMnetworks.WealsodescribeBLSTMnetworkswhichhavegreatinfluenceontheCCGsupertaggingasmanysequencelabelingtasks.ThecyclicmechanismenablesRNNstorememberinputsatdifferenttimesteps.Theyare,therefore,averygoodchoiceforsequencelearning.However,becauseoftheirdifficulttrainingwheregradientdescentbasedalgorithmsgenerallyfailtoconvergeortaketoomuchtimeorbecauseoftheexploding/vanishinggradientproblem,whichimpliesthatthegradients,duringthetraining,eitherbecomeverylargeorverysmalltheirapplicationsinpracticewerequitelimitedtillthelate1990s.TherehavebeenmanyproposedapproachestodiminishthedrawbackswhentrainingRNNsincludingGRUnetworksintroducedinthepreviousChapter.Amongall,LSTMdiscoveredbyHochreiterandSchmidhuber[57]andlaterrefinedbyGers[87],appearstobeoneofthemostextensivelyadoptedsolutionstothevanishinggradientproblemandlearndependenciesrangingoverarbitrarilylongtimeintervalsthathavebeensuccessfullyadoptedandusedformanysequencemodelingtasks.图3-1FromRNNtoLSTM[87].LSTMnetworkshaveapowerfulandexpressivearchitecturethathasbecomethemostpopularvariantofRNNtohandlesequentialdataandhavebeensuccessfullyappliedto-44- 
a range of sequence tagging problems such as POS tagging [88][89], NER [90][91], sentiment analysis [92][93][94], and speech recognition [95].
Hochreiter and Schmidhuber [57] proposed to change the basic unit of the RNN, which is a simple neuron, into a computer-memory-like cell, called the "LSTM cell". LSTM networks are built in a specific way: they are the same as RNNs except that the hidden layers are replaced by memory blocks [87], which make a difference in their capability to learn long-term dependencies. Figure 3-1 provides a comparison between the RNN and LSTM architectures: while RNNs contain cyclic connections in their hidden states, LSTMs keep the recursive connection of the RNN but with memory cells. The memory blocks store the state over time and have been shown to be better at finding and exploiting long-range dependencies in the data. Hochreiter and Schmidhuber [57] introduced a mechanism similar in spirit to the one later used by Cho et al. [64], using gates to counter the limited memory of RNNs. A memory block contains one or more memory cells: the LSTM has the ability to add or to remove information from the memory cell, which is controlled and protected by gates. A memory block is composed mainly of three gates: the input gate, the forget gate and the output gate.

Figure 3-2 Long Short-Term Memory network architecture.

The architecture of an LSTM unit is shown in Figure 3-2 and is the architecture used in this thesis. The main components of the LSTM unit are:
• Input: the LSTM unit takes the current input vector at time step t, denoted by x_t, and the hidden state of the previous time step, denoted by h_{t−1}. The weighted sum of the input and hidden state is passed through an activation function:

    x_t = σ(W_x [h_{t−1}, x_t] + b_x)    (3-1)

• Input gate: provides the input flowing into the memory cell. The input gate decides which values will be updated and what information to store in the cell. The input gate reads x_t and h_{t−1}, computes the weighted sum and applies a sigmoid activation:

    i_t = σ(W_i [h_{t−1}, x_t] + b_i)    (3-2)

• Forget gate: the forget gate is the mechanism through which an LSTM learns to reset the memory contents when they become old and are no longer relevant. This may happen, for example, when the network starts processing a new sequence. To remember or throw away information from the cell state, the forget gate reads x_t and h_{t−1} as inputs and applies a sigmoid activation function to the summed weighted inputs:

    f_t = σ(W_f [h_{t−1}, x_t] + b_f)    (3-3)

• Memory cell: the current cell state C_t is computed by forgetting irrelevant information from the previous time step and accepting relevant information from the current input. The result f_t is multiplied by the cell state at the previous time step, i.e., C_{t−1}, which allows forgetting the memory contents that are no longer needed, and is summed with the multiplication of the input gate and the current candidate state:

    C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t        (3-4)
    C̃_t = tanh(W_C [h_{t−1}, x_t] + b_C)   (3-5)

• Output gate: the output gate decides what parts of the cell state to output from the memory cell. The output gate takes the weighted sum of x_t and h_{t−1} and applies a sigmoid activation to control what information flows out of the LSTM unit:

    O_t = σ(W_o [h_{t−1}, x_t] + b_o)    (3-6)

• Output: the output of the LSTM unit, h_t, is computed by passing the cell state C_t through a tanh and multiplying it by the output gate O_t:

    h_t = O_t ⊙ tanh(C_t)    (3-7)

The parameters of the LSTM model are the weight matrices W and the bias vectors b in equations (3-1)-(3-7).
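As with the GRU in Chapter 2, the gate equations above can be transcribed directly into NumPy. The sketch below is purely illustrative: it follows the concatenated-input convention [h_{t−1}, x_t] of equations (3-2)-(3-7) and omits the input activation of equation (3-1); it is not the Keras implementation used in our experiments.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def lstm_step(x_t, h_prev, c_prev, Wi, Wf, Wc, Wo, bi, bf, bc, bo):
        """One LSTM step following equations (3-2)-(3-7)."""
        hx = np.concatenate([h_prev, x_t])
        i = sigmoid(Wi @ hx + bi)          # input gate,           eq. (3-2)
        f = sigmoid(Wf @ hx + bf)          # forget gate,          eq. (3-3)
        c_tilde = np.tanh(Wc @ hx + bc)    # candidate cell state, eq. (3-5)
        c = f * c_prev + i * c_tilde       # new cell state,       eq. (3-4)
        o = sigmoid(Wo @ hx + bo)          # output gate,          eq. (3-6)
        h = o * np.tanh(c)                 # new hidden state,     eq. (3-7)
        return h, c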
3.2 The proposed Backward-BLSTM model for the CCG supertagging task
One shortcoming of simple LSTMs is that they are only able to make use of the previous information and memorize the context only from the past, without any information about the future. However, with a sophisticated approach where the whole context is accessible, there is no reason not to exploit the future context as well as the previous one. In CCG supertagging, we have access to both the preceding and the following context of an input at a given time step. Instead of an ordinary LSTM, a powerful solution whose effectiveness has achieved high accuracies in many sequence labeling tasks, such as POS tagging [96], NER [97] and speech recognition [95], is recurrent networks with BLSTM cells [98]. The bidirectional variant of the LSTM (BLSTM) relies on a simple idea: it uses the LSTM recurrent model in the backward and forward directions in time, one going left and one going right, to capture information from anywhere in the input sentence.
In our model, we use a BLSTM network based model. The fundamental objective is to gain access to two different LSTM layers, forward and backward respectively, to take advantage of the two orientations of an input at a given time step t. The outputs from each LSTM layer are then concatenated to form the final output. Contrary to some approaches, where a third network is used in place of the output layer, we use the simpler model. Our proposed model consists of three main modules to predict the final CCG output supertags: the Input Layer, the LSTM Neural Network, and the Output Layer.

3.2.1 Input Layer
In the input layer, our NN is inspired by the work of Collobert et al. [11], where feature vectors are computed by lookup tables, concatenated together and then fed to the network. The input layer consists of three main components: word embedding, suffix, and capitalization based features:
1. Word embeddings: our best model uses the pre-trained Google Word2Vec 300-dimensional embeddings trained on 100 billion words from Google News [83]. We also run our experiments on other published embeddings from Ling et al. [99], of 100 dimensions, trained on Reuters news data. In addition, as we hypothesized that the word embeddings used in the state of the art may perform better, we also used the publicly available Turian embeddings with 50 and 100 dimensions from Turian et al. [41]. Following Collobert et al. [11], all words were lowercased before passing through the lookup table, which converts them to their corresponding embeddings, and all numbers were replaced by the single digit '0'. Additionally, in the same way as Lewis and Steedman [23], we add two features for each word, namely capitalization and suffixes.
2. Capitalization feature: following Lewis and Steedman [23] and Xu et al. [45], the capitalization feature has only two values, indicating whether a given word is capitalized or not. This feature is calculated before the preprocessing of the data.
3. Suffix feature: we follow most of the state-of-the-art existing CCG supertaggers in using suffixes of size two.
We separately concatenate the representations of these features and then use them as the input to the network.

3.2.2 Neural Network
CCG supertagging is performed using BLSTM based models. In this architecture, the inputs encoded by the preceding process in the input layer are fed to a backward LSTM layer and then to a BLSTM layer, as follows:
1. Backward LSTM Layer: the extracted features of each word in a sentence are first concatenated in the input layer and then fed through a backward LSTM layer, which has a strong ability to memorize information over long distances. To compute the hidden state (h_t^B), the backward LSTM reads the input sequence from the end to the beginning at each time step, and the output of this layer is used as the input representation for the BLSTM.
2. BLSTM Layer: the input representations output by the first backward LSTM layer are then fed to a second backward LSTM, which computes the backward hidden state
(←h_t), and to a forward LSTM, which computes the forward hidden sequence (→h_t). This allows our model to process the data sequence and compute a representation for each input that depends jointly on information learned from the two orientations of the input (left and right) at a time step t. Finally, the outputs from each LSTM (backward and forward) at each time step are concatenated together [→h_t, ←h_t] and then fed through the output layer.

3.2.3 Output Layer
The output of the neural network at each time step t is fed through a Softmax layer to decode it into probabilities for each supertag and to make certain that the network outputs are all between zero and one and sum to one at each time step. Figure 3-3 illustrates the network architecture in detail.

Figure 3-3 Backward-BLSTM model for CCG supertagging.

3.3 Experiment Settings
The datasets and parameter values of our experiments are described in the following sections.

3.3.1 Experimental Data
We used different datasets to test the validity of our approach; mainly, we use in-domain and out-of-domain datasets. For the in-domain datasets, we used the CCGBank corpus [39] described in the first chapter, following the same split: Sections 2-21 as training, Section 00 as development set and Section 23 as in-domain test set. For the out-of-domain datasets, we use two datasets, namely Wikipedia (200 sentences) from Honnibal et al. [100] (available at https://sites.google.com/site/stephenclark609/resources), and the Bioinfer corpus (1,000 sentences) from Pyysalo et al. [101].

3.3.2 Data Preprocessing
The following preprocessing steps were applied to all our datasets:
• All words were converted to their lowercase form.
• All sequences of digits were converted into a single digit ′0′.
• For words and numbers containing ′n′, we backed off to the substring after the delimiter.

3.3.3 Implementation
The code for our experiments was written in Python 2.7.5. We implemented our Backward-BLSTM model using version 0.2.0 of Keras [82], a Theano-based NN library. Both training and testing were done at the sentence level.

3.3.4 Hyper-Parameters
As mentioned in the previous section, we performed experiments with different sets of publicly published word embeddings. Table 3-1 gives the performance of the different word embeddings in terms of 1-best accuracy. According to the results in Table 3-1, the models using Google's Word2Vec 300-dimensional embeddings obtain a significant improvement, showing that the choice of embeddings is crucial for improving performance on this task.

Table 3-1 Comparison of accuracy results on the development set using different word embeddings.
Word embeddings   Accuracy
Google-300        93.53
Turian-50         93.35
Turian-100        93.29
Ling-100          92.81

Abbreviations such as Google-300 refer to the Google Word2vec embeddings with a 300-dimensional embedding space.
We measured the accuracy on the development set for capitalization and suffix embedding
dimensions with different values of 5, 10, 16, 32, 64 and 128. The experimental results showed that a dimension size of five (5) achieved the highest accuracy. For the hidden dimension, we experimented with values ranging from 100 to 900, and a hidden dimension of size 400 showed the highest accuracy.

3.3.5 Learning Algorithm
Since a good optimization method yields better results, optimization is a central concern when dealing with machine learning problems. Training was done with the Adam optimizer with a fixed learning rate of 0.001. During training, we explored different types of optimization strategies such as SGD and AdaDelta [84] without any improvement over Adam. For the output layer, we used the Softmax activation function.

3.3.6 Dropout
We obtain significant improvements in our model's performance after using dropout; Table 3-2 compares the results with and without dropout for both the development and the test set, with all other parameters kept the same.

Table 3-2 1-best accuracy results with and without dropout on development and test data.
             Development Set   Test set
Dropout      94.09             94.25
No dropout   93.53             93.85

We observe an essential improvement in accuracy, which demonstrates that dropout brings a significant improvement in performance and is effective in reducing over-fitting [102]. We used a fixed dropout rate of 0.5.
Table 3-3 reports the chosen hyper-parameters for our best models. We tuned the hyper-parameters and then trained the models. We evaluate our models in terms of 1-best accuracy (the most likely predicted supertag). We trained the models for thirty (30) epochs; our best model was obtained at the 27th epoch. We used the model parameters with the highest accuracy on the development set. Figure 3-4 shows the 1-best accuracy of our proposed BLSTM model on Section 00 (development set) of the CCGBank.

Table 3-3 The final chosen hyper-parameters.
Hyper-parameter            Value
Word embeddings            Google's Word2Vec
Capitalization dimension   5
Suffix dimension           5
Dropout                    0.5
Number of epochs           30
Hidden dimension           400
Optimizer                  Adam
Learning rate              0.001

Figure 3-4 1-best accuracy of our proposed Backward-BLSTM model on the development set with and without dropout.
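Putting the architecture of Section 3.2 together with the settings chosen above, a compact Keras-style sketch of the Backward-BLSTM supertagger might look as follows. The suffix and capitalization embeddings of size 5 are omitted for brevity, the vocabulary size and sentence length are placeholder assumptions, a standard categorical cross-entropy loss is assumed, and a recent Keras API is used, which differs slightly from the Keras 0.2.0 release of our implementation.

    from keras.models import Sequential
    from keras.layers import Embedding, Dropout, LSTM, Bidirectional, TimeDistributed, Dense

    VOCAB_SIZE = 50000     # assumption: size of the word lookup table
    MAX_LEN = 100          # assumption: padded sentence length
    N_SUPERTAGS = 1286     # full label set observed in the training data

    model = Sequential()
    model.add(Embedding(VOCAB_SIZE, 300, input_length=MAX_LEN))  # Word2Vec-initialised lookup
    model.add(Dropout(0.5))
    # first layer: a backward LSTM that reads the sentence right-to-left
    # (note: with go_backwards=True Keras returns the outputs in reversed order,
    # which a faithful implementation would re-reverse before the next layer)
    model.add(LSTM(400, return_sequences=True, go_backwards=True))
    # second layer: a bidirectional LSTM over the backward layer's outputs
    model.add(Bidirectional(LSTM(400, return_sequences=True)))
    # per-token Softmax over the supertag inventory
    model.add(TimeDistributed(Dense(N_SUPERTAGS, activation='softmax')))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])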
哈尔滨工业大学工学博士学位论文intermsof1-bestaccuracyonSection23fromtheCCGBankcorpus.Ourproposedsupertaggersignificantlyoutperformsthestate-of-the-artsystems.Itisclearthatoursupertaggerisverycompetitive,despiteusingverysimplearchitectureandalargenumberofCCGsupertags.Moreover,wealsotestedourmodelontwoout-of-domaintestset.ItcanbeseenthatourBackward-BLSTMmodelyieldsbetterresultsandismuchbetterperformancethanthepreviousmodelswithalltestdata.TheonlyoneexceptionistheC&Csupertaggerwithgold-standardPOStags,inwhichweunderperformtheirresultsinBio-GENIA(weusedBio-GENIAgold-standardCCGlexicalcategorydatafromRimellandClark[103]asnogoldcategoriesareavailableintheBioinferdata).表3-51-bestaccuracyonthetestset.ModelSection23WikiGeniaC&C(goldPOS)93.3288.8091.85C&C(autoPOS)92.0288.8089.08NN91.5789.0088.16RNN93.0090.0088.27Ours94.2590.6288.55TomakeadirectcomparisonwiththeclosestworktoourspresentedbyLewisetal.,[58]andVaswanietal.,[59]usingLSTMarchitectures,weconductexperimentsusingthesamesetof425labels.TheresultsarereportedinTable3-6.表3-61-bestaccuracycomparison.ModelSection00Section23LabelSizeLewisetal.,201694.194.3425Ours94.2894.47425Vaswanietal.,201694.08–1286Vaswanietal.,2016+LM+Beam94.2494.51286Ours94.0994.251286-54- 第3章Backward-BLSTMmodelfortheCCGSupertaggingtaskFromTable3-6,wecanseethatourmodelisnomorethan0.01%lowerinaccuracytothemodelproposedbyLewisetal.,[58]onsection00usingthewholelabelset(1,286labels)andis(+0.12%)usingthesetof425labels.ThismodelusedadeeparchitectureofBLSTMwithasubsetof425lexicaltags.WhileourmodelachievesthesamelevelofaccuracyasVaswanietal.,[59]onsection00andisslightlyloweronsection23(-0.03%).Comparedwiththelatter,weshouldconsiderthatourmodelismuchsimplerintermofarchitectureandourhiddenstateisthirtypercentsmaller(400versus512).Vaswanietal.,[59]usedadifferentarchitecturewithdifferenttrainingprocedurebasedonBLSTM+LanguageModel+beamencoding.Themajordifferencesareresumedasfollow;ourmodelintroducedbackwardLSTMtocapturelong-rangecontextualinformationandeliminatetheneedforcontextwindowsandusedBLSTMtocaptureinformationfrombothpastandfuturedirections.Thesemakeoursupertaggermoreaccurateinrecoveringlong-distancedependenciesandmuchsimplerandcomparabletotherecentmodelsproposedbyLewisetal.,[58]andVaswanietal.,[59].3.4.2Multi-taggingResultsFollowingClarkandCurran[40]andCharniaketal.,[104],thesupertaggercanpoten-tiallyassignmorethanonesupertagtoeachwordwhoseprobabilitiesarewithinsomefactors.Forcategorieswhoseprobabilitiesarenotwithinfactor,theprobabilityofthe1-bestcategoryispruned.Tovalidateourapproach,wealsoconductmulti-taggingexperiments.Weevalu-atedourmulti-taggerusingthesamelevelsintroducedin[14].Forthemulti-taggingexperiments,wecalculatedperwordaccuracy,whereweconsiderthewordtobetaggedcorrectlyifthecorrectcategoryisincludedinthesetoftheassignedcategories.Wealsocalculatesentenceaccuracywhichisthepercentageofsentenceswhosewordsarealltaggedcorrectly.Wecompareourproposedmodelwithstate-of-the-artmethodsformulti-tagging.TheresultsarereportedinTable3-7.Inthiscase,itisobservedthatourmodelincreasestheperformanceofbothWORDandSENTaccuraciesonalllevels(0.075,0.030,0.010,0.005and0.001).-55- 
哈尔滨工业大学工学博士学位论文表3-7Performancecomparisonofdifferentmodelsformulti-taggingaccuracyonSection00fordifferentlevels.OursRNNNNC&C(autopos)C&C(goldpos)WordSENTWordSENTWordSENTWordSENTWordSENT0.07597.3668.2697.3366.0796.8361.2796.3460.2797.3467.430.03098.1575.9598.1274.3997.8170.8397.0565.5097.9272.870.01098.7182.0198.7181.7098.5479.2597.6370.5298.3777.730.00599.0585.4199.0184.7998.8483.3897.8672.2498.5279.250.00199.5391.2999.4190.5499.2989.0798.2580.2499.1787.193.5SummaryIntermoflearninglongdependencies,LSTMgivesgoodperformanceoverstandardRNNbasedmodels.Insomecases,theinformationontherightsideisveryimportant.BackwardLSTMisverypowerfulincapturingtheinformationforalongtime.However,theLSTMhiddenstatetakesinformationonlyfromthepast,knowingnothingaboutthefuture.Ontheotherhand,forwardLSTMisveryefficientonmemorizinginformationontheleftcontext,butinsomecases,itismoreimportanttoobservethepreviouscontextratherthanthefutureone.InCCGlabelingtask,wehaveaccesstobothleftandrightinformationcontext(previousandfuture).Inthischapter,wedemonstratedtheadvantagesofBLSTMthansimpleRNNsandGRUsfortheCCGSupertagging.Ourproposedmodeloutperformedpreviousresultsonsupertaggingandmulti-taggingcomparingtostate-of-the-artmodelsonvariousbenchmarkdatasets.AnanalyzeofourexperimentsresultsindicatetheneedofBLSTMtocaptureinformationinbothdirections.ThemainfindingsfromthedirectcomparisonofourBackward-BLSTMmodelagainstthestate-of-the-artexistingmodelsareasfollows:(1)ourBackward-BLSTMmodelreachesahigheraccuracyscore.(2)ItissignificantlybetterabletotrainthesupertaggeronthefullsetofCCGlexicalcategoriesobservedduringtraining.(3)Itoutperformsevenforsupertaggingandmulti-tagging.OurmainfindingssupportthehypothesisthattheLSTM-basedmodelsaremorepowerfulinmodelingsequentialdata.TheimprovementscanonlybeduetoBLSTMarchitectureadvantages.Overall,theresultswepresentinthischapterindicatesthatallofourresultsarecomparablewithstate-of-the-artresults.Ourresultsarepromisingand-56- 第3章Backward-BLSTMmodelfortheCCGSupertaggingtaskshowthatourmodelcancompetewith,andinmostcasesoutperform.AlthoughBLSTMperformsreasonablywellfortheCCGsupertaggingtask,itusessentencelevelrepresentationprocessingasequencewithoutanycorrelationsbetweenlabelsinneighborhoodswhichhavegreatinfluencesonpredictingthecurrentlabel.Usingagoodnetworkthatcanlearnsentencerepresentationwherewecangainfrombothpastandfutureinputfeaturesandcanusesentenceleveltaginformationmightbebeneficialforourtask.Inthenextchapter,weplantouseacombinationofmachinelearninganddeeplearningmodelswhichcanmakeuseofbothtagandsentencelevelsrepresentationsfortheCCGsupertaggingtask.-57- 
哈尔滨工业大学工学博士学位论文第4章BLSTM-CRFmodelfortheCCGSupertaggingtask4.1IntroductionMachinelearningmethodsweresuccessfullyappliedtotheCCGsupertaggingtaskincludingMEmodels[14]andNNwithCRFs[23].MachinelearningmodelstreattheCCGsupertaggingtaskasastructuredpredictionproblemandtrytojointlypredicttheentiresequenceoutputbutrequireextensivefeatureengineeringsuchaslexicalfeatures(POStags)toprovidegoodresults.Ontheotherhand,deeplearningmodelssuchasRNNsandBLSTMsusedifferentmethodstoautomaticallyextractfeaturesthatcontaininformationaboutthecurrentwordanditsneighboringcontextwhileonlyrequiringasequenceoftokensasinput.Insimplerecurrentnetworks,theycontaintheentiresentenceperformingprocessingofsentencewiththeoutputdependingonthepreviouscomputations.Bidirectionalrecurrentnetworkscontaintheentiresentenceandperformcomputationsfromtheprecedingandfollowingdirections.InthepreviousChapter,wehavedescribedLSTMbasedmodelsfortheCCGsupertaggingtask.LSTMsareconsideredasthebestmodelsinassigningCCGlexicalcategoriestoagivensentencebasedontheirabilitytoretaininformationforlonghistoricaltimedependencies,aswellastheirabilitytoworkwithbothpastandfutureinformationwhenBLSTMsareexploredforthetask.However,itiswellknownthatevenBLSTMshaveshowntobeextremelygoodatmemorizinginformationforalongdistancetheystillpredicteachwordoutput(label)inisolationwithoutanyregardstothepreviouslypredictedsupertagsandnotaspartofasequence.Machinelearningmodelsanddeepnetworkshavetheirowncapabilitiesandshort-comings.Insimplerterms,whiledeeplearningmodelsattempttobenefitfromrecognizingsamplesinthesurroundinginputfeatureswithoutrelyingonanyfeaturesengineeringandlearntopredicttheoutputsforthesequencebyrequiringonlyplaintextasinputwithoutanyinformationaboutthepreviouslypredictedsupertags,themachinelearningmodelslikeCRFevennecessitatemanyhand-craftedfeaturesbutstillbenefitfromtheknowledge-58- 
about adjacent label predictions (surrounding outputs).
Capturing dependencies between predictions, whether by modeling the dependencies between the input representations using deep learning models or by modeling the structural dependencies between output predictions using machine learning algorithms, is very important and beneficial for the CCG supertagging task. For this reason, in this chapter, we benefit from the two approaches. We introduce a structured neural network architecture for the CCG supertagging task. The method is based on the combination of machine learning and deep learning methods. Specifically, the approach assigns CCG lexical categories to each word in an input sentence in two steps. In the first step, we use BLSTM networks to operate on the input context; the model is able to memorize information from the preceding and following words over long spans and long sequences. Afterwards, in the second step, the model benefits from knowledge about neighboring label predictions, where a CRF layer is exploited to jointly predict the final supertags.
The organization of the chapter is as follows: Section 4.2 provides some basic definitions and notation for the LSTM and CRF models, together with the description of our particular approach to CCG supertagging using the LSTM-CRF combination. Section 4.3 describes our experimental setup for the task. Section 4.4 presents the experimental results, and Section 4.5 provides the conclusion.

4.2 Model Description
4.2.1 BLSTM Network
LSTMs are the best technique for the CCG supertagging task among the family of RNN techniques and the existing conventional machine learning algorithms, because they have a proven capability to store long-range contextual information and have been successfully applied to the CCG supertagging task [58][59]. LSTMs can resolve the vanishing gradient problems faced in training simple RNNs and are better than GRUs at learning over long time steps. LSTMs have the ability to use their memory blocks, which consist of three gates (input gate, forget gate, and output gate) together with a recurrent cell, as discussed in the previous chapter (Chapter 3), to make decisions on what information is allowed to be stored in the memory, read from it and saved to it.
One shortcoming of LSTMs is that they are only able to make use of the previous
第4章BLSTM-CRFmodelfortheCCGSupertaggingtaskpredictedsupertagsandnotaspartofasequence,comparedtomachinelearningmodelssuchasHMMsandCRFmodelsthatarepowerfulforstructuredpredictionproblemsastheycangainknowledgefromthesurroundinglabels.AninterestingapproachtosolvetheCCGsupertaggingproblemistobenefitfrombothdeeplearningandmachinelearningmethods.Thisisveryimportantbecausepre-dictionwillnotonlydependoninputrepresentationsbutalsodependsonthedependencebetweenoutputpredictions.Inthiswork,wealsobenefitfromtheadvantagesofamachinelearningalgorithmthatwewillcombinewithBLSTMarchitecturedescribedinFigure4-1.Firstly,belowwewilldescribetheCRFmodels.4.2.2ConditionalRandomFieldsTheCCGsupertaggingtaskis,givenasentenceofn-words,assignCCGlexicalcategories(supertags)toeachwordinthesentence.OneapproachtoCCGsupertaggingistoclassifyeachwordindependentlywhichisthecaseofdeeplearningmodels.Theproblemwiththisapproachisthatitassumesthatgiventheinput,alloftheCCGlabelsareindependentandoftenproducesunsatisfactoryresults.Infact,toachievebetterresults,wemusttakeintoaccountthatwearepredictingstructuredoutputsandmodelingtheproblemtoincludeourpreviousknowledge.IntheCCGsupertaggingtask,labelsofneighboringwordsaredependentanditisnecessarytohaveinformationaboutthesurroundingpreviouslypredictedsupertags.Predictingthecurrentsupertagsbytakingintoaccounttheadjacenttagscanbemadein2ways:first,bypredictingadistributionofsupertagsateachtimestep,thenusebeamsearchtofindtheoptimalsequence[59].Second,byrelaxingtheindependenceassumptionthatcanbedonewiththefocusonsentence-levelinsteadofindividualpositionwheretheadjacentoutputvaluesinfluenceeachotherandtakeadvantageofthesurroundinglabels,thusleadingtoConditionalrandomfields(CRF)asoneofthebestperformingstatisticalmodelsformanysequencetaggingtasksbyarrangingtheoutputvariablesinalinearchain.TheadvantageofCRFsoverHMMmodelsistheirconditionalnature,resultingintherelaxationoftheindependenceassumptionsrequiredbyHMMstoensuretractableinference.Additionally,CRFsavoidthelabelbiasproblem[105],aweaknessexhibitedbyMaximumEntropyMarkovModels[106](MEMMs)andotherconditionalmarkovmodels-61- 哈尔滨工业大学工学博士学位论文basedondirectedgraphicalmodels.CRFsoutperformbothMEMMsandHMMsonsomeofreal-worldsequencelabelingtasks[105][107][108].图4-2CRFGraph.CRF[105][109]isafamilyofstatisticalmodelsasprovensupervisedlearningmethodthathasbeenusedextensivelyformanyNLPapplicationsaswellasmanylabelingse-quentialdatatasks.CRFareprobabilisticgraphicalmodelsoftheconditionaldistributionp(y|x)trainedtomaximizeaconditionalprobabilityofstructuredoutputvariablesygivenobservationsx.Whenusedforsequencetaggingproblems,acommongraphstructureusedisalinearchainwithastatetransitionmatrixwherewecanefficientlyusepreviousandfutureoutputstopredictthecurrentoutput.WhenwemodeltheCCGSupertaggingproblem,themostcommongraphstructureisillustratedinfigure4-2.FortheCCGsupertagging,thelinearchainCRFisgivenaninputsequence:x=¹x1;x2;:::;xTº;(4-1)andanoutputstatesequence:y=¹y1;y2;:::;yTº;(4-2)alinear-chainCRFwithparametersWdefinesaconditionalprobabilityfortheoutputsequence(Eq.4-2)asfollows:1∏NP¹y˜jxº=expf¹y˜tº+¹y˜t;y˜t+1ºg;(4-3)Zt=1where¹y˜tºistheunarypotentialforthelabelatpositiont,¹y˜t;y˜t+1ºisthepair-wisepotentialbetweenthepositionstandt+1,andZisanormalizationfactor.-62- 
第4章BLSTM-CRFmodelfortheCCGSupertaggingtask4.2.3BLSTM-CRFproposedmodelfortheCCGSupertaggingtaskRecentworksonNERbyHuangetal.,[110]andothershavecombinedthebenefitsoflinearstatisticalmodelswithneuralnetworkstosolvemanysequencetaggingtasks.Inthisapproach,weintroduceastructuredneuralnetworkarchitecturefortheCCGsupertaggingtask.Specifically,theapproachassignsCCGlexicalcategoriestoeachwordinaninputsentenceintwosteps.Inthefirststep,itusesBLSTMnetworktooperateoninputcontextandtoconsidertheinputfeatures;themodelisabletomemorizeinformationforlong-rangedependenciesandfromleftandrightpositions.Afterward,themodelbenefitsfromtheknowledgeaboutneighboringlabelpredictionswhereaCRFlayerisexploitedtoobtainsentenceleveltaginformationandjointlypredictthefinalsupertags.Therefore,theoutputisanoptimaltagsequenceinsteadofmutuallyindependenttagswhichcomprisestwoaspectsforcouplinginputandoutputlevels.OurproposedmodelconsistsofthreemainoperationstopredictthefinalCCGoutputsupertags:InputLayer,BLSTMNeuralNetworkandtheCRFOutputLayer.1.InputLayer:followingCollobertetal.,[11],inputfeaturevectorsarecomputedbylook-uptables,concatenatedtogetherandthenfedtothenetwork.Theinputlayerconsistsof3lookuptablesoffeaturevectorsasinputfeaturesthatarefirstconcatenatedandthenfedintothenetwork,asdescribedbelow:Pretrainedwordembeddings:tocapturethesemanticandsyntacticsimilaritybetweenwordsandreducetherequirementforhandcraftedfeatures,wemakeuseofpre-trainedwordembeddingsasdistributedwordrepresentationswhichmapeachwordtoahighdimensionalvectorspace.Toobtainthefixedwordembeddingofeachwordweuseapre-trainedwordembeddingsmodel.Ourmodelusethepre-trainedGoogle’sWord2Vec300-dimensionalembeddingstrainedon100billionwordsfromGoogleNews[83].FollowingCollobertetal.,[11],allwordsarelower-casedbeforepassingthroughthelook-uptablestoconvertthemintotheircorrespondingembeddingsandalsoallnumbersarereplacedbyasingledigit’0’.Forwordsthatdonothaveanentryinthepre-trainedwordembeddings,the’UNKNOWN’entryfromthepre-trainedembeddingsisused.Twofeaturesthatcontaincharacter-levelinformation,namelycapitalizationandsuffixwasusedinourexperiments.Capitalizationfeature:thecapitalizationfeaturehasonlytwovaluesindicatingwhether-63- 哈尔滨工业大学工学博士学位论文thegivenwordiscapitalizedornot.Suffixfeature:followingthealmoststate-of-the-artexistingCCGsupertaggingmodels,weusesuffixesofsizetwo.2.BSLTMNeuralNetwork:inthesupertaggingfortheCCGgrammar,itisbeneficialtoemployasophisticatednetworksuchasBLSTM[98],whichcanberegardedasapileoftwoLSTMlayers.ThepreviousinputrepresentationsareextractedbyaforwardLSTMlayer,andthefutureinputrepresentationsarecapturedbyabackwardLSTMlayer.Inthisway,wecaneffectivelyutilizethepreviousandfuturefeatures;asdescribedinChapter3.OurneuralnetworkforCCGsupertaggingisconstructedofadeepBLSTMnetwork.The !BLSTMreadstheinputwhereaforwardLSTMcomputesthehiddensequence(ht)and 
readsinputfromthebeginningtotheendandabackwardLSTM(ht)usestheoppositedirection.Inordertocapturecomplexinteractionsbetweeninputwords,weusedtwolayersofBLSTM,whichisthesameasLewisetal.,[58]wheretheoutputofthefirstBLSTMlayerisusedastheinputrepresentationtothesecondBLSTMlayer.ThentheoutputsfromthesecondBLSTM(backwardandforward)areprovidedasinputtotheoutputlayer.图4-3Theneuralnetmechanism.3.OutputLayer:inourpreviousworks,theoutputsfromtheneuralnetworkateachtimesteparefedintoadenselayerwiththeSoftmaxfunctionaslinearactivationfunction,whoseoutputsizeequalsthenumberofsupertags.ThedifferenceinthisworkisthatwedonotusetheSoftmaxoutputbutratherutilizetheoutputofthedenselayerforanadditionalCRFlayerwhichcomputesthefinaloutputsbyjointlydecodingtheminto-64- 第4章BLSTM-CRFmodelfortheCCGSupertaggingtaskprobabilitiesforeachSupertagformingthebestlabelsequenceofthenetwork.TheCRFlayerensuresmodelingtheoutputprobabilityofthecurrentinputgivenasequenceofneighboringlabelsasillustratedinFigure4-3.ThearchitecturerepresentationofourneuralBSLTM-CRFcombinedmodelfortheCCGsequencelabelingtaskisshowninFigure4-4.图4-4BLSTM-CRFnetworkmodelfortheCCGsupertagging.4.3ExperimentSettingsNeuralnetworksaredifficulttoconfigure,andtherearealotofparametersthatmajorlyinfluencethelearningandtheperformanceofthenetwork,andneedtobewelltunedtofindtheoptimumvaluestoimprovetheaccuracyofthemodel.Inthissection,weprovidedetailsabouthyper-parameterstuningtotraintheneuralnetwork.4.3.1DatasetsWeevaluatedtheeffectivenessofourmodelontheCCGsupertaggingtaskonindomainandout-of-domaindatasets.Section00(1913sentences)oftheCCGBankcorpus[39]isusedasadevelopmentsettoselectourhyper-parametersandSections02–21fortraining.-65- 哈尔滨工业大学工学博士学位论文Supertaggingperformancesarereportedbasedontheaccuracyonsection23(2407sentences)asindomaintestdata,Wikipedia(200sentences)fromHonnibaletal.,[100]andBio-Geniacorpus(1000sentences)fromPyysaloetal.,[101]asout-of-domaindatasets.Similartoourpreviousworks,somestepswereperformedbeforethesupertaggermodelcanbebuiltsuchasallwordswerelowercased,andallsequencesofdigitswereturnedintoasingledigit′0′.Forallsymbols(wordsornumbers)containing′n′,webacked-offtothesubstringafterthedelimiter.4.3.2WordembeddingsAlldatasetsentenceswererepresentedasasequenceofone-hotvectorswhichwerebeingtransformedintoasequenceofwordembeddingsbytheembeddingweights.Theseembeddingweightswereinitializedwithpre-trainedwordrepresentationsandmorespecificallywiththepubliclyavailablepre-trainedvectorscreatedusingword2vec;weused300-dimensionalvectorstrainedonGoogleNews[83].4.3.3OptimizationAlgorithmParameterswereoptimizedusingAdamoptimizer[111]totrainourmodelwithaninitiallearningrateof0.001.WehaveexploredothermoresophisticatedoptimizationalgorithmssuchasSGDandAdeDelta[84]withoutanyimprovementoverAdam.4.3.4DropoutTrainingDeepneuralnetworksaredifficulttotrain,andover-fittingtodataisamajorchallenge.Themostcommonregularizationtechniquetopreventover-fittingisDropout[102].Duringtraining,weapplieddropouttotheinputlayeroffixedrateto0.3thatwasquiteeffectivetoregularizeourmodelandreduceover-fittinggivingsignificantimprovementsinaccuracy.4.3.5Hyper-ParametersTuningImplementationwasdoneinTheano[112]usingtheversion1.2.2oftheKerasdeeplearninglibrary[82]andallmodelsweretrainedonTeslaK40mGPU.Westartbyevaluatingtheperformanceofourmodelonthedevelopmentsetateveryepoch,andthebest-performingmodelwasthenusedforevaluationonthetestset.Thelargerthenetwork,themorepowerfulbutitisalsoeasiertooverfit.Intheexperiments,wetestedtheaccuracyofourmodelonthedevelopmentsetwiththehiddendimensionvaluesrangeinthesetof{10
0,200,300,400,600,700}andfoundthatthe-66- 第4章BLSTM-CRFmodelfortheCCGSupertaggingtask表4-1Thefinalhyper-parameterssettingsforourmodel.Hyper-parameterValueWordembeddingsWord2VecHiddendimension400OptimizerAdamDropout0.3Learningrate0.001hiddendimensionwithsize400showsthehighestaccuracy.Forsuffixandcapitalization,wefollowedthestate-of-the-artandusedafixedembeddingofsizeequaltofive(5).Wetunedthehyper-parametersthentrainedthemodels.Theresultsofourexper-imentsarereportedwiththebestmodel,whichisselectedbytheperformanceonthedevelopmentset.ThefinalchosenparametersarereportedinTable4-1.4.4ResultsandAnalysisInthissection,wereporttheevaluationoftheperformanceofourproposedBLSTM-CRFmodelforCCGsupertaggingforbothin-domainandout-of-domaindatasets.Wealsoperformmultitaggingexperiments,theresultsarediscussedbelow.4.4.1SupertaggingResults表4-2Performancecomparisonwithstate-of-the-artmethodsonthedevelopmentset.ModelAccuracyC&C(goldPOS)92.60C&C(autoPOS)91.50NN91.10RNN93.07BLSTM94.1BLSTM+LM+Beam94.24BLSTM+Attention94.31Ours94.37Table4-2providestheaccuracyoftheproposedBLSTM-CRFmodelforCCG-67- 哈尔滨工业大学工学博士学位论文supertaggingonthedevelopmentset.AsshowninTable4-2,wecompareourmodelwithbaselineexistingmodelsincludingC&CmodelwithbothgoldPOSandautoPOSproposedbyClarkandCurran[14],thefeed-forwardNNmodelbyLewisandSteedman[23]andtheRNNsupertaggerbyXuetal.,[45]whereourmodelsignificantlyoutperformintermofaccuracy(1-bestpredictedlexicalcategory).ItconcludesthattheuseofBLSTMcanbringbetterperformancethansimplerecurrentnetworksandtheuseofCRFcanmodelmorestructuredependence.Withtheemergenceofdeeplearning,therearelotsofworkonCCGsupertagging.Wealsomakeacomparisonofourmodelwithsomerecentworks.WecomparedourmodelwiththerecentlyproposedmodelsbasedonBLSTMarchitecturesincludingLewis’setal.,[58]modelbasedondeepBLSTMnetwork,Vaswani’setal.,[59]modelbasedonBLSTMandenhancedwithaLanguageModel(LM)andbeamsearchwhileXu[49]usedBLSTMwithattentionmodel.Ourmodelgainmoreaccuracy(orcloseto)thandeeplearningmodelsbasedonBLSTMarchitectureswhichshowthatmodelingoutputwithastructuredmodelasCRFisveryimportantforCCGsupertaggingasasequencelabelingtask.SocombiningBLSTMwithCRFishelpfulinthistask.TheresultsreportedinTable4-3presenttheevaluationofourmodelwithexistingmodelsonthetestset(Section23oftheCCGBank).Furthermore,wealsoevaluateourmodelontwoout-of-domaindatasetsnamelyWikipediaandBio-Geniacorpus.表4-3Performancecomparisonwithstate-of-the-artmethodsonthetestset.ModelSection23WikiGeniaC&C(goldPOS)93.3288.8091.85C&C(autoPOS)92.0288.8089.08NN91.5789.0088.16RNN93.0090.0088.27BLSTM94.30––BLSTM+LM+Beam94.5––BLSTM+Attention94.46––Ours94.4990.388.51-68- 
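As a compact reference, the configuration summarized in Table 4-1 and Section 4.2.3 can be sketched as follows. This is a hedged illustration written against the modern tf.keras API rather than the Keras 1.2.2/Theano setup used in these experiments; vocabulary sizes, sequence length, the tag inventory and the embedding matrix are placeholders, and the CRF output layer of Section 4.2.3 is not included, since core Keras does not ship one. The sketch stops at the per-position supertag scores that such a layer would consume.

```python
# Hedged sketch of the BLSTM front end behind Table 4-1 (not the original code).
# Placeholder sizes; the real model initializes the word table from Word2Vec.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, VOCAB, N_SUFFIX, N_TAGS, EMB_DIM, HIDDEN = 60, 50000, 500, 425, 300, 400
pretrained = np.random.normal(size=(VOCAB, EMB_DIM)).astype("float32")  # stand-in for Word2Vec

words = layers.Input(shape=(MAX_LEN,), dtype="int32", name="word_ids")
caps = layers.Input(shape=(MAX_LEN,), dtype="int32", name="capitalization")  # capitalized or not
sufs = layers.Input(shape=(MAX_LEN,), dtype="int32", name="suffix")          # 2-character suffix ids

w_emb = layers.Embedding(VOCAB, EMB_DIM,
                         embeddings_initializer=tf.keras.initializers.Constant(pretrained))(words)
c_emb = layers.Embedding(2, 5)(caps)          # fixed size-5 feature embeddings
s_emb = layers.Embedding(N_SUFFIX, 5)(sufs)

x = layers.Concatenate()([w_emb, c_emb, s_emb])
x = layers.Dropout(0.3)(x)                    # dropout on the input layer (Section 4.3.4)
x = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(x)   # 2-layer deep BLSTM
scores = layers.TimeDistributed(layers.Dense(N_TAGS))(x)  # per-position supertag scores

model = Model([words, caps, sufs], scores)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()
```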
第4章BLSTM-CRFmodelfortheCCGSupertaggingtaskTheresultsofthetestsetarecompetitive,evenwhencomparedtopreviousworkusingmanyfeatures,thenetworkachieves94.49%onSection23comparedto93.32%and90.3%onWikidatacomparedto88.80%byClarkandCurran[14]withgoldPOS.Insomecases,wearealsoabletobeat(orcloseto)thebestresultswhereweobtain94.49%onSection23comparedto94.46%byXu[49]andourmodelisnomorethan0.01%loweraccuracytothemodelproposedbyVaswanietal.,[59]astheirmodelisenhancedwithalanguagemodel.However,ClarkandCurran[14]reportaconsiderablyhigherresultof91.85%onBio-Geniadatasets,comparedtotheexperimentspresentedhere,theirmodelusedthegoldcategorieshoweverinourexperimentsnogoldcategoriesareavailableintheBio-Geniadata.4.4.2Multi-taggingResultsItisalsoimportanttocompareourmodelformulti-tagginginwhichweaimtoincreasethenumberoftheassignedlexicalcategoriestoeachword.Weperformmulti-taggingexperimentstomakeourBLSTM-CRFsupertaggermoreaccuratewherethesupertaggerisabletoassignmorethanonelexicalcategorytoeachwordwithinafactor.WeusedthelevelsdefinedbytheC&Cparser[14]todefinecut-offsformulti-taggingbasedontheprobabilitiesfromtheBLSTM-CRFmodel.表4-4Performancecomparisonofdifferentmodelsformulti-taggingaccuracyonSection00fordifferentlevels.OursRNNNNC&C(autopos)C&C(goldpos)WordSENTWordSENTWordSENTWordSENTWordSENT0.07597.3568.1297.3366.0796.8361.2796.3460.2797.3467.430.03098.1275.9298.1274.3997.8170.8397.0565.5097.9272.870.01098.7281.9598.7181.7098.5479.2597.6370.5298.3777.730.00599.0185.2399.0184.7998.8483.3897.8672.2498.5279.250.00199.4991.1999.4190.5499.2989.0798.2580.2499.1787.19Theperformanceformulti-taggingismeasuredforbothWORDaccuracywhereweconsiderthewordtobetaggedcorrectlyifthecorrectcategoryisincludedinthesetoftheassignedlexicalcategoriesandSENT(sentence)accuracywhichisthepercentageofsentenceswhosewordsarealltaggedcorrectly.Theresultsoftheseexperiments-69- 哈尔滨工业大学工学博士学位论文arepresentedinTable4-4whereTheWORDcolumngivesthewordaccuracies,andtheSENTcolumngivesthesentenceaccuracies.ItcanbeseenthatourmodelresultsimproveperformanceoneverylevelsthanthepreviouslyproposedmodelsforbothWORDandSENTaccuracy.4.5SummaryDevelopmentofdeeplearningmodelsforCCGsupertaggingtaskisapowerfulcomplementtoclassicalmachinelearningmodelsthatworkwellwithoutrequiringanylexicalorhand-craftedrepresentations.Whiledeepnetworksarepowerfulformodelinginputsequences,thesemodelsstillpredicttheoutputwithoutanyregardstothepreviouslypredictedlexicalcategories.Inthischapter,theproposedmethodemploysacombinationofbothdeeplearningandmachinelearningmethodstoperformrepresentationlearningjointlyoverbothinputsandoutputs.WehavedescribedacombinedBLSTMandCRFmodelsbasedapproachforautomaticCCGsupertagging.TheBLSTM-CRFcombinationbasedsupertaggerperformreasonablybettercom-paredtothemachinelearninganddeeplearningsupertaggers.AkeyaspectofourmodelisthatitmodelsoutputlabelsviaasimpleCRFarchitecture,andinputwordsviaBLSTMnetworkscapturingcomplexinteractionsbetweenwordsandmemorizinginformationforlonghistoricaltimefrombothpastandfutureinputdirections.Themodeldescribedhereissimpleandquiteeffectiveforsupertagging.ThebestperformanceisachievedfortheBLSTM-CRFmodelonin-domainandout-of-domaindatasetsshowingthatthecombinedmodelisefficientandpowerfultosupertaggingfortheCCGgrammar.-70- 
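To make the multi-tagging evaluation of Section 4.4.2 concrete, the sketch below applies a factor-based cut-off to per-word category distributions and computes the WORD and SENT accuracies of the kind reported in Table 4-4. It is an illustrative reading of the C&C-style levels, not the exact evaluation script used here: the toy probabilities are random, and the rule that a category is kept when its probability is within a factor beta of the best category is our assumption.

```python
# Sketch of factor-based multi-tagging and WORD/SENT accuracy (Section 4.4.2).
# Illustrative only; inputs are toy data.
import numpy as np

def multitag(probs, beta):
    """probs: (T, K) per-word category probabilities -> list of kept category-id sets."""
    return [set(np.where(p >= beta * p.max())[0]) for p in probs]

def word_sent_accuracy(sentences, gold, beta):
    """sentences: list of (T, K) probability arrays; gold: list of gold category-id sequences."""
    correct_words = total_words = correct_sents = 0
    for probs, tags in zip(sentences, gold):
        kept = multitag(probs, beta)
        hits = [t in s for t, s in zip(tags, kept)]
        correct_words += sum(hits)
        total_words += len(hits)
        correct_sents += all(hits)
    return correct_words / total_words, correct_sents / len(sentences)

# toy example using the beta levels listed in Table 4-4
rng = np.random.default_rng(1)
sents = [rng.dirichlet(np.ones(10), size=7) for _ in range(5)]
gold = [rng.integers(0, 10, size=7) for _ in range(5)]
for beta in (0.075, 0.030, 0.010, 0.005, 0.001):
    print(beta, word_sent_accuracy(sents, gold, beta))
```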
第5章Character-WordembeddingsfortheCCGSupertaggingtask第5章Character-WordembeddingsfortheCCGSupertaggingtask5.1IntroductionMachinelearninganddeeplearningmethodshaveallbeenprovedtobeeffectiveinsolvingtheCCGsupertaggingproblem.However,someexistingapproachesheavilyrelyonfeatureengineeringswhererecentworksarebasedonneuralnetworkarchitecturesthatareabletoachieveimprovedresults,whileonlyrequiringasequenceoftokensasinput[11].Theemergenceofdeepneuralnetworksaimatbuildingdeepandcomplexencoderstotransformasentenceintoencodedvectorsandhavereachedstate-of-the-artperformanceintheNLPfieldandrelyonlyonwordembeddingstocapturesimilaritybetweenwordsbyreplacingeachwordinalower-dimensionaldistributionandinitializetheweightsofembeddingslayerwithpre-trainedwordvectorssuchasTurian[41]andword2vec[83]embeddingswhichenablethemtolearnsimilaritybetweenwordswithoutrequiringanylexicalfeatures.However,theeffectivenessofwordembeddingsislimitedbyunseenandwordsofverylowfrequencyinthetrainingdatawhereembeddingsdonotexist.Inotherwords,themostobviousproblemofwordembeddingsbasedmodelsarerelatedwhendealingwithwordsthatdon’tappearinthepre-trainedwordembeddingvectors-ifasymbol(token)hasbeenseenrarely,ithasanembeddingsentry,however,itwillbeoflowquality,anotherimportantcaseiswhensymbolsdidn’tappearbefore,then,ithasnoentriestotheembeddingsandthemodelneedstoback-offtotheOutOfVocabulary(OOV)representation.Toaddressthisissue,weexploredeepneuralnetworkembeddingsbasedmodelsforhandlingrareandunseenwordsbycombiningCharacterandWordembeddings.WeimprovethatCharacter-basedmodelrevealssimilaritiesbetweenwordsandcanbeusedtomodelinfrequentandunknownwords.InthisChapter,weproposeaneuralsequencelabelingarchitectureforCCGsu-pertaggingwhereweapproachthechallengeofunseenandrarewords.WeusetheBLSTMneuralnetworkforword-levelrepresentation.Todealwiththedrawbacksof-71- 
哈尔滨工业大学工学博士学位论文thewordbasedmodel,aCharacterBLSTMrepresentationmodelisaddedtothewordmodel,thereby,thecombinedmodelcaninferrepresentationsforpreviouslyunseenandrarewords.Thechapterisorganizedasfollows.Section5.2introducesourneuralnetworkarchi-tectureusedforCCGsupertagging.Next,Section5.3givesdetailsaboutourexperimentsandSection5.4providesexperimentalresultswithacomparisonwithpreviousworksforbothsupertaggingandmulti-tagging.Finally,insection5.5weconcludethechapter.5.2Character-WordembeddingsproposedmodelfortheCCGSu-pertaggingtaskDespitetherelativelylargeamountofworkdoneonCCGsupertaggingproblem,therehasbeennoworkaddressingthedegradationoftheperformanceonout-of-domaindatasetswherethemainreasonisOOVeffects[113][114]asthemodelperformancesuffersbecauselexicalknowledgeisnotavailableinthepre-trainedwordembeddingsforthesewords.Thereby,toachievehigheraccuracyinCCGsupertagging,itisalsoimportanttohaveagoodmodeldealingwithunknownandrarewords.Inthissection,wediscussthemodelweemployedtopredictaCCGlexicalcategoryforeachwordinasequenceinput.Forourmodel,threestepsarerequiredtoassignCCGsupertagstoagivenstringorlistoftokens:•first,trainabasicword-levelneuralnetwork,•next,trainacharacter-levelrepresentationsneuralnetworkand,•finally,combinethetwoarchitecturesinordertopredictthefinaloutput.5.2.1Word-LevelNeuralNetworkGivenasequenceofwordsasinput,wefirstdescribethebasicword-levelneuralnetworktowhichtheinputlayerofinputvectorsisfed.Wordembeddingshavebeenprovedtobeusefulforvarioustasks,suchasPOSTagging[11],sentenceclassification[115],sentimentanalysis[116],sarcasmdetection[117]andCCGsupertagging[23].Themodelreceivesasinputasequenceofwords(W1;W2;:::;Wm),wheretokensaremappedtowordembeddingslayerinitializedwithpretrainedvectors,resultingina-72- 第5章Character-WordembeddingsfortheCCGSupertaggingtasksequenceofwordembeddings(eW1;eW2;:::;eWm).IthasbeenprovedthatBLSTMnetworksareverypowerfulfortheCCGsupertaggingtask[58][59]andourpreviousworksinChapters3and4.Tobettermemorizeinformation,theinputrepresentationsfromthepre-trainedwordembeddingsarethenfedintothewordlevelnetworkasapartialnetwork,whichconsistsoftwoLSTMRNNslayers—abackwardLSTMtobettermemorizeinformationfromthepastandaforwardfortheoppositedirectionperformingcomputationonbothprecedentandnextwordinputsasfollows: ! !ht=LSTM¹eWt;ht 1º;(5-1) ht=LSTM¹eWt;ht+1º:(5-2)Next,therespectiveLSTMrepresentations(backwardandforward)areconcatenatedforeachwordrepresentations(equation5-3)asdepictedinFigure5-1. ! 
ht=»ht;ht¼:(5-3)Ourbestmodelusesthepre-trainedGoogle’sWord2Vec300-dimensionalembed-dingsfromGooglenews[83].FollowingCollobertetal.,[11]allwordsarelowercasedbeforepassingthroughthelookuptablestoconvertthemintotheircorrespondingembeddingsandalsoallnumbersarereplacedbyasingledigit’0’.Forwordsthatdonothaveanentryinthepre-trainedwordembeddings,the’UNKNOWN’entryfromthepre-trainedembeddingsisused.FollowingLewisandSteedman[23]twosetsoffeaturesareusedinourexperimentsnamelycapitalizationthathasonlytwovaluesindicatingwhetheragivenwordiscapitalizedornotandsuffixesfeatureofsizetwo.图5-1Wordlevelneuralnetwork.-73- 哈尔滨工业大学工学博士学位论文Inourword-levelbasedmodel,theinputwordstotheBLSTMlayerateachtimesteparethesequenceofpre-trainedwordembeddingswherewordsthathavesimilarmeaningcanbemadetocorrespondtoclosevectorrepresentations.However,usingsuchembeddingsinaparticulardomainsuchasBio-GeniacorpusleadstotheOOVproblem:whereNoembeddingsfordomain-specificwords.Forexample,therearesomewordsfromtheBio-Geniadatasetthatarenotpresentinthepre-trainedvectorsreleasedbyGoogleandeveninthetrainingdataoftheCCGBankcorpus.Currentword-basedmodelsareweaktohandleOOVwords,theaimofthisworkistohandlethisweaknessoftheexistingmodels.ThischallengesustoapproachtheproblemofOOVbyaCharacter-levelneuralnetworkspecializedtodealwiththesewords.5.2.2Character-LevelNeuralNetworkSeveraltechniquesforreducingOOVeffectshavebeenintroducedintheliterature.Anadequatesolutionistooperateonindividualcharactersofeachtokenascharactersmayalsoplayanimportantroleinmodelingsemanticmeaningsofwords.Researchintocharacterembeddingsmodelsisstillinthefairlyearlylevelofdevelopment,andmodelsthatoperateexclusivelyoncharactersarenotyetbetterthanword-levelmodelsonmosttasks.WeproposetoaddresstherarewordsprobleminCCGsupertaggingtaskbytrainingcharacterembeddingsneuralnetworkbasedmodel;however,insteadoffullyreplacingwordembeddings,weareinterestedincombiningthetwoapproaches,therebyallowingthemodeltotakeadvantageofinformationfrombothinputrepresentations(wordsandcharacters).Inthecharacterlevelrepresentation,eachwordisdividedintoindividualchar-acters(C1;C2;:::;Cn)thataremappedtoalook-uptableofcharacterembeddings(eC1;eC2;:::;eCn)andthenfedintoBLSTMnetworktoperformcomputationsonbothpreviousandfutureinputsequenceasshownisFigure5-2.ThecharacterembeddingsaregeneratedbytakingthefinalhiddenstatesoftheBLSTMappliedtoembeddingsofcharactersforeachtoken.WethenusethelasthiddenvectorsfromeachoftheLSTMcomponentsandconcatenatethemtogetherasfollows:-74- 第5章Character-WordembeddingsfortheCCGSupertaggingtask ! !ht=LSTM¹eCt;ht 1º;(5-4) ht=LSTM¹eCt;ht+1º;(5-5) ! 
ht=»ht;ht¼:(5-6)图5-2Characterlevelneuralnetwork.5.2.3ConcatenationThegeneraloutlineofourapproachisshownintheFigure5-3.Theinputstotheword-levelnetworkarepre-trainedwordembeddingsrepresentations(seeSection.5.2.1),andindividualcharacterstothecharacter-levelnetworkdescribedinSection5.2.2.Now,wehavetwoalternativefeaturerepresentationsforeachword;oneistheembeddingslearnedonthewordlevel,andthesecondistherepresentationbuiltfromindividualcharactersonthet-thwordoftheinputtext.FollowingLampleetal.,[118],theapproachistoconcatenatethetworepresentationsanduseitasthenewrepresentationinordertogeneratetheprobabilitydistributionovertagsforeachwordinput,suchthatthemodelcanachievebetterperformance.TheoutputsfromeachrepresentationarefirstconcatenatedandusedasthenewinputrepresentationtoanotherBLSTMlayerasthefinalsequencelabelernetwork.Afterward,aSoftmax-75- 哈尔滨工业大学工学博士学位论文activationfunctionisusedtodecodetheoutputsfromBLSTMasprobabilitiesforeachlexicalcategory(CCGsupertags).图5-3Word-CharacterbasedembeddingsmodelfortheCCGsupertagging.5.3ExperimentssettingsInthissection,weprovidedetailsabouthyper-parameterstotraintheWord-Characterneuralnetwork.5.3.1DatasetsTobecomparablewiththeresultsreportedbypreviousworkonCCGsupertaggingmodels[14][23][45][58][59],wecarryoutthesimilardatasetformyexperiments:theCCGBankcorporawiththesameregulardivisionfortraining,developmentandtestsections.Sinceitisimportanttoprovetheeffectivenessofourapproach,wechosetotesttheperformanceofourmodelsonout-of-domaindatasetnamelyBio-Geniacorpus(1000sentences)fromPyysaloetal.,[101].Foreachdataset,thefollowingstepswerenecessarytopreparethedatafortheexperiments:-76- 第5章Character-WordembeddingsfortheCCGSupertaggingtask•Allwordscontaininguppercasewerelowercase.•Alldigitswereconvertedtoasingledigit:′0′.•Forwordsthatcontainthe′n′delimiter,weback-offtothestringbeforethedelimiter.Pre-trainedwordembeddingscapturedsemanticsbetweenwords.Thewordembed-dingswereinitializedwiththe300-dimensionalpubliclyavailablepre-trainedvectors,createdusingWord2Vec[83].Whileusingsuchpre-trainedwordembeddings,somewordsarenotpartoftheoriginaltextcorpusonwhichthewordembeddingswerepre-trainedwhicharecalledOOVwords.Sometimestheseunseenwordsmightbethoserarekeytermswhichareimportantforthesemanticsofthewholetext.Inourexperiments,suchwordswerereplacedbythegeneric’UNKNOWN’tokenforthepre-trainedwordembeddingsbutwerestillusedinthecharacter-levelcomponents.Withtheemergenceofdeeplearning,multiplesoftwarepackagesprovideimplemen-tationsofdeepnetworkmodels,TheimplementationofourneuralnetworksisconductedusingKeras[82]withTheanobackend.Kerasprovidesahigh-levelAPIforneuralnetworksenablingquickexperimentation.Weusetheversion2.0ofKerasonTeslaK40mGPU.Forevaluation,weadopttheofficialevaluationmetricfortheCCGsupertaggingtasktoevaluateourproposedmodelwhichis1-bestaccuracy(themostlikelypredictedsupertag).5.3.2Hyper-ParametersWetestedourneuralnetworkperformancewithvaryingparameters;thefinalchosenhyper-parameterswereselectedaccordingtotheperformanceonthedevelopmentsetgivingthebestaccuracy.TheLSTMhiddendimensionsweresetto256forbothwordandcharactercom-ponents.TheoptimizationalgorithmusedtotrainourmodelwastheAdamoptimizerwithafixedlearningrateof0.001.Performanceonthedevelopmentsetwasmeasuredateveryepoch,andthebest-performingmodelonthedevelopmentsetwasthenusedforevaluation.Anoftenencounteredproblemintrainingneuralnetworksisover-fitting.Ourmodelwasregularizedwithdropouttechniquewhichwasappliedtoeachlayeroftheinputembeddingswithafixedprobabilityof0.5.TheBLSTMneuralnetworkisusedthroughoutourmodel;weutilizeonelayerof-77
- 哈尔滨工业大学工学博士学位论文BLSTMtocomputecharacter-levelembeddingsandonelayerBLSTMtocomputewordlevelembeddingscombinedtogetherandthefedtoanotherBLSTM.Fortheoutputlayers,weusedtheSoftmaxactivationfunctionasthemostpopularactivationfunctionusedinsequencelabelingproblems,todecodeeachoutputateverytimestepintoprobabilitiesforeachsupertagandensurethatalltheoutputsrangefrom0to1andtheirsumis1.5.4ResultsandAnalysisInthissection,wewillcovertwosetsofexperimentsresultstoevaluatetheproposedapproachbasedonthecombinationofcharacterandwordlevelembeddingsonbothin-domainandout-of-domaindatasets,oneissupertaggingandthesecondismulti-tagging.5.4.1SupertaggingresultsWecomparedourWord-CharactercombinedBLSTMsupertaggeragainstmachinelearninganddeeplearningstate-of-the-artbasedmodelsincludingtheMEmodel[14],feedforwardNN[23],simpleRNN[45].WealsocomparedourresultswiththeBLSTMbasedproposedarchitectures:themodelproposedbyLewisetal.,[58]trainedwith2-layerdeepBLSTM,thearchitecturedevelopedbyVaswanietal.,[59]trainedonacombinedarchitectureofBLSTM,languagemodelandbeamsearchtogeneratethefinaloutputs.表5-1Accuracyresultsonthedevelopmentset.ModelAccuracyC&C(goldPOS)92.60C&C(autoPOS)91.50NN91.10RNN93.07BLSTM94.1BLSTM+LM+Beam94.24Ours94.35Table5-1lists1-bestaccuracyofthemodelspredictingthebestCCGlexicalcategoryonthedevelopmentset.Inthestate-of-the-artproposedmodels,allmethodsobtaingoodresultsonthistask,theBLSTMarchitecturesreachthehighestscores,withanaccuracyoutperformingfeedforwardNNandvanillaRNNsaswell,themainreasonisthatBLSTM-78- 第5章Character-WordembeddingsfortheCCGSupertaggingtasknetworksareverystronginmodelingsequentialdataandmemorizinginformationfrombothsidesofaninputforlongperiodsoftime.Beyondthis,ournetworkoutperformsallothernetworks,achievingstate-of-the-artperformancesdemonstratingthataddingmoreinformationwiththecharacter-levelasinputforcesthemodelandimprovetheresults.表5-2Accuracyresultsonthetestset.ModelSection23GeniaC&C(goldPOS)93.3291.85C&C(autoPOS)92.0289.08NN91.5788.16RNN93.0088.27BLSTM94.30–BLSTM+LM+Beam94.5–Ours94.4688.85Table5-2showsthefinalresultsoftheCCGBanktestdata.ToevaluatehowwelltheCharacter-levelcombinationwithWord-basedmodeldo,wealsotestourmodelontheBio-Geniacorpusasout-of-domaindatasets.AsreportedinTable5-2,allBLSTMbasedmodelsobtaingoodresultsonthistask.Itcanbeseenthattheaccuracyhasbeenimprovedsignificantlyinbothin-domainandout-of-domaindatasets.Someofourresultsonthetestsetmayseemveryclosetoothers,thisslightlackofgeneralizationonthetestsetmaysuggestthatmorefineparameteroptimizationsmayleadtoevenbetterresults.Vaswanietal.,[59]obtainthebestresultsonSection23(in-domaintestdata).Ourmodelusedasimplifiedarchitecturewith1-BLSTMlayerasthesequencetagger,whilethelatterusedacombinationofBLSTM,languagemodelandbeamsearchforoutputgen-erationmakingthemodelverystrong.DespitethatourmodelshowsgoodimprovementsonBio-Geniadata(+0.3%).TheonlyoneexceptionistheC&CsupertaggerasweusedBio-GENIAgold-standardCCGlexicalcategoriesdatafromRimellandClark[103]sincenogoldcategoriesareavailableintheBio-inferdataweunderperformtheirresults.TheabilityofWord-level,togetherwithCharacter-leveltoencodeinputrepresenta-tionsmakesourmodelaveryeffectivemodelfortheCCGsupertaggingasastructured-79- 
哈尔滨工业大学工学博士学位论文predictiontaskshowingthataddingcharacterinformationasinputforcesthemodeltohandleOOVwordsforbothin-domainandout-of-domaindata.5.4.2Multi-taggingResultsWeexaminetheeffectivenessoftheproposedarchitectureforotherexperiments,weconductedmulti-taggingexperimentswithdifferentlevelsasdescribedinChapter2.5.2.Bydoingso,wecanassignmorethanonelexicalcategorytoeachwordinaninputsentence.Wecomparedourresultsonsection00withthepreviouslyproposedmulti-taggersaslistedinTable5-3.Table5-3reportsexperimentresultsformulti-taggingwithSENTcolumnasthepercentageofsentenceswhosewordsarealltaggedcorrectlyandWORDcolumnastheaccuracyofwordstobetaggedcorrectlyifthecorrectcategoryisincludedinthesetoftheassignedcategories.Asshownintable5-3,mostofourresultsobtainedwiththecombinationofCharacterandWordembeddingsmodelarestate-of-the-artintermsofbothWORDandSENTaccuracyamongthedifferentlevels.Overall,theCharacterandWordembeddingsmodelalsodemonstrateitssuperiorityinmulti-taggingasinthesupertagging.表5-3Performancecomparisonofdifferentmodelsformulti-taggingaccuracyonSection00fordifferentlevels.OursRNNNNC&C(autopos)C&C(goldpos)WordSENTWordSENTWordSENTWordSENTWordSENT0.07597.3467.9997.3366.0796.8361.2796.3460.2797.3467.430.03098.1775.9998.1274.3997.8170.8397.0565.5097.9272.870.01098.7282.0698.7181.7098.5479.2597.6370.5298.3777.730.00599.0785.6199.0184.7998.8483.3897.8672.2498.5279.250.00199.4891.5799.4190.5499.2989.0798.2580.2499.1787.195.5SummaryInthischapter,weproposedanovelsequencelabelingframeworkforCCGsupertag-gingwithasecondaryobjective-overcomeOOVwordsinthetrainingandout-of-domaindatasets.OnebidirectionalLSTMistrainedforwordinputs,whereasanotheroneistrainedforindividualcharactersofeachword.Atthesametime,bothofthoseare-80- 第5章Character-WordembeddingsfortheCCGSupertaggingtaskcombined,inordertopredictthemostprobablelabelforeachword.ThemodelwehavedescribedisasimpleandeffectivebasedonCharactersandWordembeddingsapproachfortheCCGsupertaggingtask.Theobjectiveoflearningcharacterlevelembeddingsprovidesanadditionalsourceofinformationduringtrainingforunseenandinfrequentwordsinthetrainingdata.ThisadditionaltrainingobjectiveleadstomoreaccuratemodelonthesupertaggingfortheCCGgrammar.Ourmethodimprovesperformanceonin-domainandout-of-domaindatasetsonbothsupertaggingandmulti-taggingtasks.Theexperimentalresultsshowthatthemodelisefficientwhilestillachievingbetterperformancesthansomestate-of-the-artmethods.-81- 
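As a compact reference for the architecture in Figure 5-3, the following sketch wires the word-level and character-level BLSTMs together in modern tf.keras (the experiments above used Keras 2.0 with a Theano backend). All sizes and vocabularies are illustrative placeholders, the word table would in practice be initialized from the pre-trained Word2Vec vectors, and the capitalization and suffix features of Section 5.2.1 are omitted for brevity; only the overall wiring follows the model described in this chapter.

```python
# Hedged sketch of the Chapter 5 word + character model (not the original code).
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_WORDS, MAX_CHARS = 60, 20
WORD_VOCAB, CHAR_VOCAB, N_TAGS = 50000, 100, 425
W_DIM, C_DIM, HIDDEN = 300, 30, 256

word_in = layers.Input(shape=(MAX_WORDS,), dtype="int32")
char_in = layers.Input(shape=(MAX_WORDS, MAX_CHARS), dtype="int32")

# word-level component (Section 5.2.1); initialize from Word2Vec in practice
w = layers.Embedding(WORD_VOCAB, W_DIM)(word_in)
w = layers.Dropout(0.5)(w)
w = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(w)

# character-level component (Section 5.2.2): a BLSTM over the characters of each
# word, keeping only the final forward/backward states as that word's representation
c = layers.Embedding(CHAR_VOCAB, C_DIM)(char_in)          # (batch, words, chars, C_DIM)
c = layers.Dropout(0.5)(c)
c = layers.TimeDistributed(layers.Bidirectional(layers.LSTM(HIDDEN)))(c)

# concatenation and final sequence labeler with Softmax outputs (Section 5.2.3)
x = layers.Concatenate()([w, c])
x = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(x)
out = layers.TimeDistributed(layers.Dense(N_TAGS, activation="softmax"))(x)

model = Model([word_in, char_in], out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy")
model.summary()
```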
结论

In this thesis, we investigated and developed different techniques and approaches for supertagging applied to the CCG grammar. In particular, we focused on deep learning based methods and worked on the English CCGBank corpus for the CCG supertagging problem. The major contributions of this thesis are summarized below:

(1) A deep learning method for CCG supertagging is proposed. We proposed the use of GRU models, which can memorize and represent input sequences over long periods of time. The experimental results show that BGRUs are effective for supertagging the CCG grammar, while still achieving better performance than the current state-of-the-art methods. We also introduced the multi-tagging strategy for predicting CCG supertags, where our supertagger can select more than one CCG category per word; we measure both WORD and SENTENCE accuracy, and in both cases we obtain state-of-the-art results.

(2) Next, a new approach for CCG supertagging based on LSTM networks is presented. The proposed method is based on BLSTMs: a backward LSTM is introduced to combine the input lookup tables, then the current de facto standard for sequence labeling tasks, a BLSTM-based neural network, is applied, and a Softmax activation function predicts the final outputs. We tested the efficiency of the proposed method for both supertagging and multi-tagging. The experimental results on three different datasets show that the Backward-BLSTM technique is effective for the task and still achieves better performance than the current state-of-the-art methods.

(3) In Chapter 4, we proposed a simple and effective CCG supertagging method based on the combination of BLSTM and CRF models. Compared with state-of-the-art models, the combined model obtains state-of-the-art results. This is achieved by taking advantage of BLSTM models and strengthening the prediction layer with a CRF model. Evaluations on in-domain and out-of-domain datasets against the state of the art demonstrate the effectiveness of the proposed method.

(4) In Chapter 5, we proposed a novel sequence labeling framework for CCG supertagging with a secondary objective: overcoming OOV words in the training and out-of-domain datasets. One BLSTM is trained over word inputs, whereas another is trained over the individual characters of each word; the two are then combined in order to predict the most probable label for each word. Learning character-level embeddings provides an additional source of information during training for unseen and infrequent words in the training data, and this additional training objective leads to a more accurate model on the CCG supertagging task.

In summary, all the models described in this dissertation are simple and efficient for automatic CCG supertagging of English text, even in the presence of rare and unseen words. The models achieve much higher accuracy than the machine learning models and are state-of-the-art. Although our proposed models prove their effectiveness for the CCG supertagging task, a basic limitation of GRU and LSTM networks is that they rely on an internal memory with a gating mechanism that deletes and updates information over the input sequence. Predicting CCG supertags by making multiple computation steps over an input story may be very beneficial to our task, integrating previously learned information from multiple sentences as a global memory, which can be done with end-to-end memory networks via multiple hops over the memory. We leave this as future work.

Further work in this area could be done in several directions; some can be taken up as immediate goals, and others can be considered long-term goals. Regarding BLSTM-based CCG supertagging models, there are possible extensions that should be taken into consideration as immediate goals and that we think deserve study, such as multidimensional LSTMs. We also plan to explore other deep learning algorithms such as Convolutional Neural Networks (CNNs), which have proven to be very beneficial for composing word representations from characters and for encoding context information [119]. Moreover, applying reinforcement learning, which aims to automatically determine the ideal behavior within a specific context so as to maximize performance, together with self-training, would be very advantageous for building more accurate supertaggers. As one long-term goal, it would be beneficial and useful to apply our supertaggers to NLP tasks such as parsing, MT, and QA systems. It would also be interesting to integrate our supertaggers with existing parsers such as the C&C parser and to test them on several languages with different datasets.
哈尔滨工业大学工学博士学位论文参考文献[1]MarcusMP,MarcinkiewiczMA,SantoriniB.BuildingalargeannotatedcorpusofEnglish:ThePennTreebank[J].Computationallinguistics,1993,19(2):313–330.[2]NadeauD,SekineS.Asurveyofnamedentityrecognitionandclassification[J].LingvisticaeInvestigationes,2007,30(1):3–26.[3]PiskorskiJ,YangarberR.InformationExtraction:Past,PresentandFuture[J].Multi-source,MultilingualInformationExtractionandSummarization,2013:23–49.[4]BangaloreS.Complexityoflexicaldescriptionsanditsrelevancetopartialpars-ing[D].[S.l.]:UniversityofPennsylvania,1997:77–79.[5]SrinivasB."Almostparsing"techniqueforlanguagemodeling[C]//SpokenLan-guage,1996.ICSLP96.Proceedings.,FourthInternationalConferenceon:Vol2.1996:1173–1176.[6]ChandrasekarR,DoranC,SrinivasB.Motivationsandmethodsfortextsimplifica-tion[C]//Proceedingsofthe16thconferenceonComputationallinguistics-Volume2.1996:1041–1044.[7]BangaloreS,JoshiAK.Supertagging:Anapproachtoalmostparsing[J].Compu-tationallinguistics,1999,25(2):237–265.[8]MatsuzakiT,MiyaoY,TsujiiJ.ProbabilisticCFGwithlatentannotations[C]//Proceedingsofthe43rdAnnualMeetingoftheAssociationforComputationalLinguistics(ACL’05).2005:75–82.[9]ClarkS.Supertaggingforcombinatorycategorialgrammar[C]//ProceedingsoftheSixthInternationalWorkshoponTreeAdjoiningGrammarandRelatedFrameworks(TAG+6).2002:19–24.[10]SteedmanM,BaldridgeJ.Combinatorycategorialgrammar[J].Encyclopediaoflanguageandlinguistics,2006,2:610–622.[11]CollobertR,WestonJ,BottouL,etal.Naturallanguageprocessing(almost)fromscratch[J].JournalofMachineLearningResearch,2011,12(Aug):2493–2537.-84- 参考文献[12]DandapatS.Part-of-speechtaggingforBengali[D].[S.l.]:DepartmentofComputerScienceandEngineeringIndianInstituteofTechnology,KharagpurJanuary,2009:3–7.[13]JoshiAK,SrinivasB.Disambiguationofsuperpartsofspeech(orsupertags):Al-mostparsing[C]//Proceedingsofthe15thconferenceonComputationallinguistics-Volume1.1994:154–160.[14]ClarkS,CurranJR.Wide-coverageefficientstatisticalparsingwithCCGandlog-linearmodels[J].ComputationalLinguistics,2007,33(4):493–552.[15]AuliM.CCG-basedmodelsforstatisticalmachinetranslation[D].[S.l.]:Ph.D.Proposal,UniversityofEdinburgh,2009:11–15.[16]NadejdeM,ReddyS,SennrichR,etal.PredictingTargetLanguageCCGSupertagsImprovesNeuralMachineTranslation[C]//ProceedingsoftheSecondConferenceonMachineTranslation.2017:68–79.[17]ClarkS,SteedmanM,CurranJR.Object-extractionandquestion-parsingusingCCG[C]//Proceedingsofthe2004ConferenceonEmpiricalMethodsinNaturalLanguageProcessing.2004:111–118.[18]Bar-HillelY.Aquasi-arithmeticalnotationforsyntacticdescription[J].Language,1953,29(1):47–58.[19]SteedmanM.Categorialgrammar[J].TechnicalReports(CIS),1992:466.[20]AdesAE,SteedmanMJ.Ontheorderofwords[J].Linguisticsandphilosophy,1982,4(4):517–558.[21]NakornTN.CombinatoryCategorialGrammarParserinNaturalLanguageToolkit[J],2009:1–19.[22]ZhangY,ClarkS.Shift-reduceCCGparsing[C]//Proceedingsofthe49thAn-nualMeetingoftheAssociationforComputationalLinguistics:HumanLanguageTechnologies-Volume1.2011:683–692.[23]LewisM,SteedmanM.ImprovedCCGparsingwithsemi-supervisedsupertag-ging[J].TransactionsoftheAssociationforComputationalLinguistics,2014,2:327–338.[24]LewisM,HeL,ZettlemoyerL.Jointa*ccgparsingandsemanticrolelabelling[C]//Proceedingsofthe2015ConferenceonEmpiricalMethodsinNaturalLanguageProcessing.2015:1444–1454.-85- 
哈尔滨工业大学工学博士学位论文[25]ZettlemoyerLS,CollinsM.OnlineLearningofRelaxedCCGGrammarsforParsingtoLogicalForm[J].EMNLP-CoNLL2007,2007:678.[26]BaralC,DzifcakJ,SonTC.Usinganswersetprogrammingandlambdacalcu-lustocharacterizenaturallanguagesentenceswithnormativesandexceptions[C]//Proceedingsofthe23rdnationalconferenceonArtificialintelligence-Volume2.2008:818–823.[27]BirchA,OsborneM,KoehnP.CCGsupertagsinfactoredstatisticalmachinetrans-lation[C]//ProceedingsoftheSecondWorkshoponStatisticalMachineTransla-tion.2007:9–16.[28]SteedmanM.Thesyntacticprocess[M].[S.l.]:TheMITPress,2000:31–34.[29]BangaloreS,JoshiAK.Supertagging:UsingComplexLexicalDescriptionsinNaturalLanguageProcessing[M].[S.l.]:TheMITPress,2010:219–354.[30]AmbatiBR,DeoskarT,SteedmanM.UsingCCGcategoriestoimproveHindide-pendencyparsing[C]//Proceedingsofthe51stAnnualMeetingoftheAssociationforComputationalLinguistics(Volume2:ShortPapers):Vol2.2013:604–609.[31]JinlongZ,XipengQ.CHINESECCGPARSINGBASEDONA*SEARCHANDSUPERTAGGING[J].ComputerApplicationsandSoftware,2014,9:059.[32]ChenJ,BangaloreS,CollinsM,etal.Rerankingann-gramsupertagger[C]//ProceedingsoftheSixthInternationalWorkshoponTreeAdjoiningGrammarandRelatedFrameworks(TAG+6).2002:259–268.[33]SrinivasB.Performanceevaluationofsupertaggingforpartialparsing[C]//ProceedingsoftheFifthInternationalWorkshoponParsingTechnologies.1997:187–198.[34]ChenJ.Towardsefficientstatisticalparsingusinglexicalizedgrammaticalinfor-mation[D].[S.l.]:UniversityofDelaware,2001:7–54.[35]RATNAPARKHIA.MaximumEntropyModelforPart-Of-SpeechTagging[J].Proc.EmpiricalMethodforNaturalLanguageProcessings,1996:133–142.[36]BrantsT.TnT:astatisticalpart-of-speechtagger[C]//ProceedingsofthesixthconferenceonAppliednaturallanguageprocessing.2000:224–231.[37]RatnaparkhiA.Maximumentropymodelsfornaturallanguageambiguityresolu-tion[J].PhDthesis.UniversityofPennsylvania,1998:32–36.-86- 
参考文献[38]HockenmaierJ.Dataandmodelsforstatisticalparsingwithcombinatorycategorialgrammar[D].[S.l.]:UniversityofEdinburgh,2003:41–107.[39]HockenmaierJ,SteedmanM.CCGbank:acorpusofCCGderivationsanddepen-dencystructuresextractedfromthePennTreebank[J].ComputationalLinguistics,2007,33(3):355–396.[40]ClarkS,CurranJR.Theimportanceofsupertaggingforwide-coverageCCGparsing[C]//Proceedingsofthe20thinternationalconferenceonComputationalLinguistics:Vol282.2004.[41]TurianJ,RatinovL,BengioY.Wordrepresentations:asimpleandgeneralmethodforsemi-supervisedlearning[C]//Proceedingsofthe48thannualmeetingoftheassociationforcomputationallinguistics.2010:384–394.[42]CurranJR,ClarkS,VadasD.Multi-taggingforlexicalized-grammarparsing[C]//Proceedingsofthe21stInternationalConferenceonComputationalLinguisticsandthe44thannualmeetingoftheAssociationforComputationalLinguistics.2006:697–704.[43]RumelhartDE,HintonGE,WilliamsRJ.Learningrepresentationsbyback-propagatingerrors[J].nature,1986,323(6088):533.[44]ElmanJL.Findingstructureintime[J].Cognitivescience,1990,14(2):179–211.[45]XuW,AuliM,ClarkS.CCGsupertaggingwitharecurrentneuralnetwork[C]//Proceedingsofthe53rdAnnualMeetingoftheAssociationforComputationalLinguisticsandthe7thInternationalJointConferenceonNaturalLanguagePro-cessing(Volume2:ShortPapers):Vol2.2015:250–255.[46]SchusterM,PaliwalKK.Bidirectionalrecurrentneuralnetworks[J].IEEETrans-actionsonSignalProcessing,1997,45(11):2673–2681.[47]SchusterM.Onsupervisedlearningfromsequentialdatawithapplicationsforspeechrecognition[J].Daktarodisertacija,NaraInstituteofScienceandTechnol-ogy,1999:37–39.[48]BaldiP,BrunakS,FrasconiP,etal.Exploitingthepastandthefutureinproteinsecondarystructureprediction[J].Bioinformatics,1999,15(11):937–946.[49]XuW.LSTMshift-reduceCCGparsing[C]//Proceedingsofthe2016ConferenceonEmpiricalMethodsinNaturalLanguageProcessing.2016:1754–1764.-87- 
哈尔滨工业大学工学博士学位论文[50]BengioY,SimardP,FrasconiP.Learninglong-termdependencieswithgradientdescentisdifficult[J].IEEEtransactionsonneuralnetworks,1994,5(2):157–166.[51]WaibelA,HanazawaT,HintonG,etal.Phonemerecognitionusingtime-delayneuralnetworks[C]//Readingsinspeechrecognition.1990:393–404.[52]LinT,HorneBG,TinoP,etal.Learninglong-termdependenciesinNARXrecurrentneuralnetworks[J].IEEETransactionsonNeuralNetworks,1996,7(6):1329–1338.[53]ElHihiS,BengioY.Hierarchicalrecurrentneuralnetworksforlong-termdepen-dencies[C]//Advancesinneuralinformationprocessingsystems.1996:493–499.[54]JaegerH,LukoševičiusM,PopoviciD,etal.Optimizationandapplicationsofechostatenetworkswithleaky-integratorneurons[J].Neuralnetworks,2007,20(3):335–352.[55]MartensJ,SutskeverI.Learningrecurrentneuralnetworkswithhessian-freeopti-mization[C]//Proceedingsofthe28thInternationalConferenceonMachineLearn-ing(ICML-11).2011:1033–1040.[56]PascanuR,MikolovT,BengioY.Onthedifficultyoftrainingrecurrentneuralnetworks[C]//InternationalConferenceonMachineLearning.2013:1310–1318.[57]HochreiterS,SchmidhuberJ.Longshort-termmemory[J].Neuralcomputation,1997,9(8):1735–1780.[58]LewisM,LeeK,ZettlemoyerL.Lstmccgparsing[C]//Proceedingsofthe2016ConferenceoftheNorthAmericanChapteroftheAssociationforComputationalLinguistics:HumanLanguageTechnologies.2016:221–231.[59]VaswaniA,BiskY,SagaeK,etal.Supertaggingwithlstms[C]//Proceedingsofthe2016ConferenceoftheNorthAmericanChapteroftheAssociationforComputationalLinguistics:HumanLanguageTechnologies.2016:232–237.[60]ChenJ,ShankerVK.AutomatedextractionofTAGsfromthePennTreebank[G]//Newdevelopmentsinparsingtechnology.[S.l.]:Springer,2004:73–89.[61]XiaF,PalmerM,JoshiA.Auniformmethodofgrammarextractionanditsapplications[C]//Proceedingsofthe2000JointSIGDATconferenceonEmpiricalmethodsinnaturallanguageprocessingandverylargecorpora:heldinconjunctionwiththe38thAnnualMeetingoftheAssociationforComputationalLinguistics-Volume13.2000:53–62.-88- 
参考文献[62]BurkeM,LamO,CahillA,etal.Treebank-basedacquisitionofaChineselexical-functionalgrammar[C]//Proceedingsofthe18thPacificAsiaConferenceonLan-guage,InformationandComputation.2004:161–172.[63]MiyaoY,NinomiyaT,TsujiiJ.Corpus-orientedgrammardevelopmentforac-quiringahead-drivenphrasestructuregrammarfromthepenntreebank[C]//InternationalConferenceonNaturalLanguageProcessing.2004:684–693.[64]ChoK,vanMerrienboerB,GulcehreC,etal.LearningPhraseRepresentationsusingRNNEncoder–DecoderforStatisticalMachineTranslation[C]//Proceedingsofthe2014ConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP).2014:1724–1734.[65]HintonGE,OsinderoS,TehY-W.Afastlearningalgorithmfordeepbeliefnets[J].Neuralcomputation,2006,18(7):1527–1554.[66]BengioY,others.LearningdeeparchitecturesforAI[J].Foundationsandtrends®inMachineLearning,2009,2(1):1–127.[67]RanzatoM,SzummerM.Semi-supervisedlearningofcompactdocumentrepresen-tationswithdeepnetworks[C]//Proceedingsofthe25thinternationalconferenceonMachinelearning.2008:792–799.[68]CollobertR,WestonJ.Aunifiedarchitecturefornaturallanguageprocessing:Deepneuralnetworkswithmultitasklearning[C]//Proceedingsofthe25thinternationalconferenceonMachinelearning.2008:160–167.[69]MnihA,HintonGE.Ascalablehierarchicaldistributedlanguagemodel[C]//Advancesinneuralinformationprocessingsystems.2009:1081–1088.[70]WestonJ,RatleF,MobahiH,etal.Deeplearningviasemi-supervisedembed-ding[G]//NeuralNetworks:TricksoftheTrade.[S.l.]:Springer,2012:639–655.[71]ArelI,RoseDC,KarnowskiTP.Deepmachinelearning-anewfrontierinarti-ficialintelligenceresearch[researchfrontier][J].IEEEcomputationalintelligencemagazine,2010,5(4):13–18.[72]YuD,DengL.Deeplearninganditsapplicationstosignalandinformationprocess-ing[exploratorydsp][J].IEEESignalProcessingMagazine,2011,28(1):145–154.[73]HintonG,DengL,YuD,etal.Deepneuralnetworksforacousticmodelinginspeechrecognition:Thesharedviewsoffourresearchgroups[J].IEEESignalProcessingMagazine,2012,29(6):82–97.-89- 
哈尔滨工业大学工学博士学位论文[74]BengioY,CourvilleA,VincentP.Representationlearning:Areviewandnewperspectives[J].IEEEtransactionsonpatternanalysisandmachineintelligence,2013,35(8):1798–1828.[75]HammerB.Ontheapproximationcapabilityofrecurrentneuralnetworks[J].Neu-rocomputing,2000,31(1-4):107–123.[76]JordanM.Attractordynamicsandparallelisminaconnectionistsequentialma-chine[C]//EighthAnnualConferenceoftheCognitiveScienceSociety,1986.1986:513–546.[77]LangKJ,WaibelAH,HintonGE.Atime-delayneuralnetworkarchitectureforisolatedwordrecognition[J].Neuralnetworks,1990,3(1):23–43.[78]JaegerH.The“echostate”approachtoanalysingandtrainingrecurrentneuralnetworks-withanerratumnote[J].Bonn,Germany:GermanNationalResearchCenterforInformationTechnologyGMDTechnicalReport,2001,148(34):13.[79]HochreiterS.UntersuchungenzudynamischenneuronalenNetzen[J].Diploma,TechnischeUniversitätMünchen,1991,91:1.[80]HochreiterS,BengioY,FrasconiP,etal.Gradientflowinrecurrentnets:thedifficultyoflearninglong-termdependencies.(2001)[J].Citedon,2001:114.[81]BengioY,DucharmeR,VincentP,etal.Aneuralprobabilisticlanguagemodel[J].Journalofmachinelearningresearch,2003,3(Feb):1137–1155.[82]CholletF.Keras:Theano-baseddeeplearninglibrary[J].Code:https://github.com/fchollet.Documentation:http://keras.io,2015.[83]MikolovT,SutskeverI,ChenK,etal.Distributedrepresentationsofwordsandphrasesandtheircompositionality[C]//Advancesinneuralinformationprocessingsystems.2013:3111–3119.[84]ZeilerMD.ADADELTA:AnAdaptiveLearningRateMethod[J].CoRR,2012,abs/1212.5701.[85]HintonGE,SrivastavaN,KrizhevskyA,etal.Improvingneuralnetworksbypreventingco-adaptationoffeaturedetectors[J].arXivpreprintarXiv:1207.0580,2012.[86]ChenJ,BangaloreS,CollinsM,etal.Rerankingann-gramsupertagger[C]//ProceedingsoftheSixthInternationalWorkshoponTreeAdjoiningGrammarandRelatedFrameworks(TAG+6).2002:259–268.-90- 
参考文献[87]GersF.Longshort-termmemoryinrecurrentneuralnetworks[D].[S.l.]:Unpub-lishedPhDdissertation,EcolePolytechniqueFédéraledeLausanne,Lausanne,Switzerland,2001:15–20.[88]PetersM,AmmarW,BhagavatulaC,etal.Semi-supervisedsequencetaggingwithbidirectionallanguagemodels[C]//Proceedingsofthe55thAnnualMeetingoftheAssociationforComputationalLinguistics(Volume1:LongPapers):Vol1.2017:1756–1765.[89]PlankB,SøgaardA,GoldbergY.MultilingualPart-of-SpeechTaggingwithBidi-rectionalLongShort-TermMemoryModelsandAuxiliaryLoss[C]//Proceedingsofthe54thAnnualMeetingoftheAssociationforComputationalLinguistics(Vol-ume2:ShortPapers):Vol2.2016:412–418.[90]LimsopathamN,CollierN.BidirectionalLSTMforNamedEntityRecognitioninTwitterMessages[C]//Proceedingsofthe2ndWorkshoponNoisyUser-generatedText(WNUT).2016:145–152.[91]YanS,HardmeierC,NivreJ.MultilingualNamedEntityRecognitionusingHy-bridNeuralNetworks[C]//TheSixthSwedishLanguageTechnologyConference(SLTC).2016.[92]TangD,QinB,FengX,etal.EffectiveLSTMsforTarget-DependentSentimentClassification[C]//ProceedingsofCOLING2016,the26thInternationalConfer-enceonComputationalLinguistics:TechnicalPapers.2016:3298–3307.[93]WangY,HuangM,ZhaoL,etal.Attention-basedlstmforaspect-levelsentimentclassification[C]//Proceedingsofthe2016ConferenceonEmpiricalMethodsinNaturalLanguageProcessing.2016:606–615.[94]YangM,TuW,WangJ,etal.AttentionBasedLSTMforTargetDependentSenti-mentClassification.[C]//AAAI.2017:5013–5014.[95]GravesA,MohamedA-r,HintonG.Speechrecognitionwithdeeprecurrentneuralnetworks[C]//Acoustics,speechandsignalprocessing(icassp),2013ieeeinterna-tionalconferenceon.2013:6645–6649.[96]WangP,QianY,SoongFK,etal.Part-of-speechtaggingwithbidirectionallongshort-termmemoryrecurrentneuralnetwork[J].arXivpreprintarXiv:1510.06168,2015.-91- 哈尔滨工业大学工学博士学位论文[97]ChiuJP,NicholsE.NamedEntityRecognitionwithBidirectionalLSTM-CNNs[J].TransactionsoftheAssociationforComputationalLinguistics,2016,4:357–370.[98]GravesA,SchmidhuberJ.FramewisephonemeclassificationwithbidirectionalLSTMandotherneuralnetworkarchitectures[J].NeuralNetworks,2005,18(5-6):602–610.[99]LingW,DyerC,BlackAW,etal.Two/toosimpleadaptationsofword2vecforsyntaxproblems[C]//Proceedingsofthe2015ConferenceoftheNorthAmeri-canChapteroftheAssociationforComputationalLinguistics:HumanLanguageTechnologies.2015:1299–1304.[100]HonnibalM,NothmanJ,CurranJR.EvaluatingastatisticalCCGparseronWikipedia[C]//Proceedingsofthe2009WorkshoponThePeople’sWebMeetsNLP:CollaborativelyConstructedSemanticResources.2009:38–41.[101]PyysaloS,GinterF,HeimonenJ,etal.BioInfer:acorpusforinformationextractioninthebiomedicaldomain[J].BMCbioinformatics,2007,8(1):50.[102]PhamV,BlucheT,KermorvantC,etal.Dropoutimprovesrecurrentneuralnetworksforhandwritingrecognition[C]//FrontiersinHandwritingRecognition(ICFHR),201414thInternationalConferenceon.2014:285–290.[103]RimellL,ClarkS.Adaptingalexicalized-grammarparsertocontrastingdo-mains[C]//ProceedingsoftheConferenceonEmpiricalMethodsinNaturalLan-guageProcessing.2008:475–484.[104]CharniakE,CarrollG,AdcockJ,etal.Taggersforparsers[J].ArtificialIntelligence,1996,85(1-2):45–57.[105]LaffertyJ,McCallumA,PereiraFC.ConditionalRandomFields:ProbabilisticModelsforSegmentingandLabelingSequenceData[C]//Proceedingsofthe18thInternationalConferenceonMachineLearning:Vol951.2001:282–289.[106]McCallumA,FreitagD,PereiraFC.MaximumEntropyMarkovModelsforInformationExtractionandSegmentation.[C]//Icml:Vol17.2000:591–598.[107]PintoD,McCallumA,WeiX,etal.Tableextractionusingconditionalrandomfields[C]//Proceedingsofthe26thannualinternationalACMSIGIRconferenceonResearchandd
evelopmentininformaionretrieval.2003:235–242.-92- 参考文献[108]ShaF,PereiraF.Shallowparsingwithconditionalrandomfields[C]//Proceedingsofthe2003ConferenceoftheNorthAmericanChapteroftheAssociationforCom-putationalLinguisticsonHumanLanguageTechnology-Volume1.2003:134–141.[109]SuttonC,McCallumA.Anintroductiontoconditionalrandomfieldsforrelationallearning:Vol2[M].[S.l.]:Introductiontostatisticalrelationallearning.MITPress,2006:9–21.[110]HuangZ,XuW,YuK.BidirectionalLSTM-CRFmodelsforsequencetagging[J].arXivpreprintarXiv:1508.01991,2015.[111]KingmaDP,BaJ.Adam:Amethodforstochasticoptimization[J].arXivpreprintarXiv:1412.6980,2015.[112]BastienF,LamblinP,PascanuR,etal.Theano:newfeaturesandspeedimprove-ments[J].arXivpreprintarXiv:1211.5590,2012.[113]BlitzerJ,DredzeM,PereiraF.Biographies,bollywood,boom-boxesandblenders:Domainadaptationforsentimentclassification[C]//Proceedingsofthe45thannualmeetingoftheassociationofcomputationallinguistics.2007:440–447.[114]DauméIIIH,JagarlamudiJ.Domainadaptationformachinetranslationbyminingunseenwords[C]//Proceedingsofthe49thAnnualMeetingoftheAssociationforComputationalLinguistics:HumanLanguageTechnologies:shortpapers-Volume2.2011:407–412.[115]KimY.ConvolutionalNeuralNetworksforSentenceClassification[C]//Proceedingsofthe2014ConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP).2014:1746–1751.[116]TangD,WeiF,YangN,etal.Learningsentiment-specificwordembeddingfortwittersentimentclassification[C]//Proceedingsofthe52ndAnnualMeetingoftheAssociationforComputationalLinguistics(Volume1:LongPapers):Vol1.2014:1555–1565.[117]JoshiA,TripathiV,PatelK,etal.AreWordEmbedding-basedFeaturesUsefulforSarcasmDetection?[C]//Proceedingsofthe2016ConferenceonEmpiricalMethodsinNaturalLanguageProcessing.2016:1006–1011.-93- 哈尔滨工业大学工学博士学位论文[118]LampleG,BallesterosM,SubramanianS,etal.NeuralArchitecturesforNamedEntityRecognition[C]//ProceedingsofNAACL-HLT.2016:260–270.[119]YuX,FalenskaA,VuNT.AGeneral-PurposeTaggerwithConvolutionalNeuralNetworks[C]//ProceedingsoftheFirstWorkshoponSubwordandCharacterLevelModelsinNLP.2017:124–129.-94- 攻读博士学位期间发表的论文及其他成果攻读博士学位期间发表的论文及其他成果(一)发表的学术论文[1]RekiaKadari,YuZhang,WeinanZhang,TingLiu.CCGsu-pertaggingwithbidirectionallongshort-termmemorynetworks[J].NaturalLanguageEngineering,2018,24(1):77-90.[Online].Available:https://www.cambridge.org/core/journals/natural-language-engineering/article/ccg-supertagging-with-bidirectional-long-shortterm-memory-networks/8C06FF6F717744B29C9BD330CABACD16.(SCI,IF=1.065).[2]RekiaKadari,YuZhang,WeinanZhang,TingLiu.CCGSu-pertaggingviaBidirectionalLSTM-CRFNeuralArchitecture.[J].Neurocomputing,2018,283:31-37.[Online].Available:https://www.sciencedirect.com/science/article/pii/S0925231217319124.(SCI,IF=3.317).[3]RekiaKadari,YuZhang,WeinanZhang,TingLiu.GatedRecurrentUnitmodelforaSequenceTaggingproblem.[J].HighTechnologyLetters,2018.(EI-index,Accepted).-95- 
致谢致谢MyheartfeltgratitudegoestotheAlmightyGod,ALLAHforthewisdom,knowl-edge,abilityandthestrengthgivenmefromthebeginningofmystudiestothecompletionofthiswork.Secondly,IamgratefultomysupervisorProf.LiuTingforgivingmetheopportunitytojointheSCIRlaboratory.HisguidanceandmotivationinspiredmethroughtheentiredurationofmyPh.D.studies.IamindebtedtomyassociatesupervisorProf.ZhangYuforhispatience,advice,andsupervisionwithcontinuoussupportandhelpfuldiscussionsthroughoutallthework.Ihavegreatlybenefitedfromhisideasandrecommendations.IamforevergratefulSIR!Ioweaparticulardebttomyparentsfortheirpatience,support,andencouragementsduringthisresearchandallmylife.Myheartythanksalsogotoallmyfamilymemberswhoencouragedmeandprayedformethroughoutthetimeofmyresearch.IalsoextendmyappreciationtoProf.QinBing,Prof.Chewanxiang,Zhangwei-nanandZhaoYanyanfortheirguidanceandhelp.SpecialthanksgotoLiuYijia,GuoMaosheng,QingyuYin,WangXuxiang,JiangGuo,WangBinghao,QiLeandallmylab-matesfromtheSCIRlaboratory,especiallyQAgroup.Thankyouverymuch.TomyfriendLydia,thankyouforlistening,offeringmeadvice,andsupportingmethroughthisentireprocess.Ithankallwhoinonewayoranothercontributedinthecompletionofthisthesis.IwouldliketodedicatethisworktomymotherMrs.BelgacemKheirawhosedreamsformehaveresultedinthisachievementandwithoutherlove,support,andblessings;IwouldnothavebeenwhereIamtodayandwhatIamtoday.Youhavealwaysbeenpresentforme,youaremyBestfriend.Thankyouwithallmyheart.Thisoneisforyoumom!RekiaKADARI-97- 哈尔滨工业大学工学博士学位论文个人简历•Name:RekiaKadari•Nationality:Algerian•Languages:English&Arabic&French•DateofBirth:27-May-1990•Sex:Female•Maritalstatus:Single•PresentAddress:HarbinInstituteofTechnology,Harbin,Heilongjiang,150001•Telephone:15776462745•Email:rekia@ir.hit.edu.cnProfessionalqualifications:•2014-2018:(Ph.D.inSocialComputingandInformationRetrievallaboratory),GraduateStudent,SchoolofComputerScienceandTechnology,HarbinInstituteofTechnology,Harbin,China.•2011-2013:(M.Sc.inComputerScience),M.Sc.inComputerScienceFacultyofScienceandTechnology,June2013,UniversityDr.TaharMoulay,Saida,Algeria.•2008-2011:(B.Sc.inComputerScience),b.Sc.inComputerScienceFacultyofScienceandTechnology,June2011,UniversityDr.TaharMoulay,Saida,Algeria.Subjectstaught:1.NaturalLanguageProcessing2.ArtificialIntelligence3.MachineLearning4.DeepLearning5.Sequencelabeling6.CCGsupertagging-98-