ID: 41236615
Size: 2.93 MB
Pages: 326
Date: 2019-08-20
Reinforcement Learning: An Introduction
Second edition, in progress

Richard S. Sutton and Andrew G. Barto

A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England

In memory of A. Harry Klopf

Contents

Preface
Series Forward
Summary of Notation

I The Problem

1 Introduction
1.1 Reinforcement Learning
1.2 Examples
1.3 Elements of Reinforcement Learning
1.4 An Extended Example: Tic-Tac-Toe
1.5 Summary
1.6 History of Reinforcement Learning
1.7 Bibliographical Remarks

2 Bandit Problems
2.1 An n-Armed Bandit Problem
2.2 Action-Value Methods
2.3 Softmax Action Selection
2.4 Incremental Implementation
2.5 Tracking a Nonstationary Problem
2.6 Optimistic Initial Values
2.7 Associative Search (Contextual Bandits)
2.8 Conclusions
2.9 Bibliographical and Historical Remarks

3 The Reinforcement Learning Problem
3.1 The Agent–Environment Interface
3.2 Goals and Rewards
3.3 Returns
3.4 Unified Notation for Episodic and Continuing Tasks
3.5 The Markov Property
3.6 Markov Decision Processes
3.7 Value Functions
3.8 Optimal Value Functions
3.9 Optimality and Approximation
3.10 Summary
3.11 Bibliographical and Historical Remarks

II Tabular Action-Value Methods

4 Dynamic Programming
4.1 Policy Evaluation
4.2 Policy Improvement
4.3 Policy Iteration
4.4 Value Iteration
4.5 Asynchronous Dynamic Programming
4.6 Generalized Policy Iterat