SHANGHAI JIAO TONG UNIVERSITY

Project Title: Playing the Game of Flappy Bird with Deep Reinforcement Learning
Group Number: G-07
Group Members: Wang Wenqing 116032910080, Gao Xiaoning 116032910032, Qian Chen 116032910073

Contents
1 Introduction
2 Deep Q-learning Network
2.1 Q-learning
2.1.1 Reinforcement Learning Problem
2.1.2 Q-learning Formulation [6]
2.2 Deep Q-learning Network
2.3 Input Pre-processing
2.4 Experience Replay and Stability
2.5 DQN Architecture and Algorithm
3 Experiments
3.1 Parameters Settings
3.2 Results Analysis
4 Conclusion
5 References

Playing the Game of Flappy Bird with Deep Reinforcement Learning

Abstract
Letting machines play games is one of the popular topics in AI today. Playing games with game theory and search algorithms requires domain-specific knowledge, so these approaches scale poorly to new games. In this project, we use a convolutional neural network to represent the game environment and update its parameters with Q-learning, a reinforcement learning algorithm. We call the overall algorithm deep reinforcement learning, or Deep Q-learning Network (DQN). Moreover, we use only the raw images of the game Flappy Bird as the input to the DQN, which keeps the approach applicable to other games. After training with a few tricks, the DQN can greatly outperform human players.

1 Introduction
Flappy Bird has been a popular game around the world in recent years. The player's goal is to guide the bird on the screen through the gap formed by two pipes by tapping the screen. If the player taps the screen, the bird jumps up; if the player does nothing, the bird falls at a constant rate. The game is over when the bird crashes into a pipe or the ground, and the score increases by one each time the bird passes through a gap. Figure 1 shows three different states of the bird: (a) the normal flight state, (b) the crash state, and (c) the passing state.

Figure 1: (a) normal flight state (b) crash state (c) passing state

Our goal in this paper is to design an agent that plays Flappy Bird automatically using the same input a human player has, which means that we use only raw images and rewards to teach the agent how to play the game. Inspired by [1], we propose a deep reinforcement learning architecture to learn and pl