Dueling Network Architectures for Deep Reinforcement Learning

Ziyu Wang ZIYU@GOOGLE.COM
Tom Schaul SCHAUL@GOOGLE.COM
Matteo Hessel MTTHSS@GOOGLE.COM
Hado van Hasselt HADO@GOOGLE.COM
Marc Lanctot LANCTOT@GOOGLE.COM
Nando de Freitas NANDODEFREITAS@GMAIL.COM
Google DeepMind, London, UK

arXiv:1511.06581v3 [cs.LG] 5 Apr 2016

Abstract

In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art on the Atari 2600 domain.

1. Introduction

In spite of this, most of the approaches for RL use standard neural networks, such as convolutional networks, MLPs, LSTMs and autoencoders. The focus in these recent advances has been on designing improved control and RL algorithms, or simply on incorporating existing neural network architectures into RL methods. Here, we take an alternative but complementary approach of focusing primarily on innovating a neural network architecture that is better suited for model-free RL. This approach has the benefit that the new network can be easily combined with existing and future algorithms for RL. That is, this paper advances a new network (Figure 1), but uses already published algorithms.

The proposed network architecture, which we name the dueling architecture, explicitly separates the representation of state values and (state-dependent) action advantages. The dueling architecture consists of two streams that represent the value and advantage functions, while sharing a common

Over the past years, deep learning has contributed to dramatic advances in scalability and performance of machin