Pages: 98
Date: 2019-08-01
Algorithms for Reinforcement Learning

Draft of the lecture published in the Synthesis Lectures on Artificial Intelligence and Machine Learning series by Morgan & Claypool Publishers

Csaba Szepesvári
June 9, 2009
Last update: August 18, 2010

Contents

1 Overview
2 Markov decision processes
  2.1 Preliminaries
  2.2 Markov Decision Processes
  2.3 Value functions
  2.4 Dynamic programming algorithms for solving MDPs
3 Value prediction problems
  3.1 Temporal difference learning in finite state spaces
    3.1.1 Tabular TD(0)
    3.1.2 Every-visit Monte-Carlo
    3.1.3 TD(λ): Unifying Monte-Carlo and TD(0)
  3.2 Algorithms for large state spaces
    3.2.1 TD(λ) with function approximation
    3.2.2 Gradient temporal difference learning
    3.2.3 Least-squares methods
    3.2.4 The choice of the function space
4 Control
  4.1 A catalog of learning problems
  4.2 Closed-loop interactive learning
    4.2.1 Online learning in bandits
    4.2.2 Active learning in bandits
    4.2.3 Active learning in Markov Decision Processes
    4.2.4 Online learning in Markov Decision Processes
  4.3 Direct methods
    4.3.1 Q-learning in finite MDPs
    4.3.2 Q-learning with function approximation
  4.4 Actor-critic methods
    4.4.1 Implementing a critic
    4.4.2 Implementing an actor
5 For further exploration
  5.1 Further reading
  5.2 Applications
  5.3 Software
  5.4 Acknowledgements
A The theory of discounted Markovian decision processes
  A.1 Contractions and Banach's fixed-point theo