Mastering the Game of Go with Deep Neural Networks and Tree Search

David Silver¹*, Aja Huang¹*, Chris J. Maddison¹, Arthur Guez¹, Laurent Sifre¹, George van den Driessche¹, Julian Schrittwieser¹, Ioannis Antonoglou¹, Veda Panneershelvam¹, Marc Lanctot¹, Sander Dieleman¹, Dominik Grewe¹, John Nham², Nal Kalchbrenner¹, Ilya Sutskever², Timothy Lillicrap¹, Madeleine Leach¹, Koray Kavukcuoglu¹, Thore Graepel¹, Demis Hassabis¹.

¹ Google DeepMind, 5 New Street Square, London EC4A 3TW.
² Google, 1600 Amphitheatre Parkway, Mountain View CA 94043.
* These authors contributed equally to this work.

Correspondence should be addressed to either David Silver (davidsilver@google.com) or Demis Hassabis (demishassabis@google.com).

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence due to its enormous search space and the difficulty of evaluating board positions and moves. We introduce a new approach to computer Go that uses value networks to evaluate board positions and policy networks to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte-Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte-Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

All games of perfect information have an optimal value function, $v^*(s)$, which determines the outcome of the game, from every board position or state $s$, under perfect play by all players. These games may be solved by recursively computing the optimal value function in a search tree containing approximately $b^d$ possible sequences of moves, where $b$ is the game's breadth (number of legal moves per position) and $d$ is its depth (game length). In large games, such as chess ($b \approx 35$; $d \approx 80$)¹ and especially Go ($b \approx 250$; $d \approx 150$)¹,