A Bayesian sampling approach to exploration in reinforcement learning

John Asmuth†, Lihong Li†, Michael L. Littman†, Ali Nouri†, David Wingate‡
†Department of Computer Science, Rutgers University, Piscataway, NJ 08854
‡Computational Cognitive Science Group, Massachusetts Institute of Technology, Cambridge, MA 02143

Abstract

We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and how to combine the models. We show that our algorithm achieves near-optimal reward with high probability with a sample complexity that is low relative to the speed at which the posterior distribution converges during learning. We demonstrate that BOSS performs quite favorably compared to state-of-the-art reinforcement-learning approaches and illustrate its flexibility by pairing it with a non-parametric model that generalizes across states.
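As a rough sketch of the mechanism the abstract describes (not the authors' implementation), the snippet below draws K MDPs from a posterior, merges them into a single MDP whose action set is the union of the sampled models' actions, and acts greedily in the merged model. The `sample_model` callable, the solver settings, and the omission of the paper's resampling rule are all simplifying assumptions.

```python
import numpy as np

def solve_mdp(T, R, gamma=0.95, iters=500):
    """Plain value iteration. T: (S, A, S) transition tensor, R: (S, A) expected rewards."""
    S, A, _ = T.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)            # (S,)
        Q = R + gamma * (T @ V)      # (S, A, S) @ (S,) -> (S, A)
    return Q

def boss_style_action(sample_model, state, K=5, gamma=0.95):
    """One decision step in the spirit of BOSS (illustrative sketch only).

    sample_model -- hypothetical callable returning (T, R) for one MDP drawn
                    from the posterior; the posterior itself is left abstract.
    """
    samples = [sample_model() for _ in range(K)]
    A = samples[0][1].shape[1]

    # Merge the K samples into one MDP over the same states but K*A actions:
    # merged action j uses the dynamics and reward of sample j // A, base
    # action j % A. Acting greedily here is optimistic, since the agent may
    # follow whichever sampled model promises the most value.
    T_merged = np.concatenate([T for T, _ in samples], axis=1)   # (S, K*A, S)
    R_merged = np.concatenate([R for _, R in samples], axis=1)   # (S, K*A)

    Q = solve_mdp(T_merged, R_merged, gamma)
    return int(np.argmax(Q[state])) % A    # map back to a base action
```

Acting greedily in the merged MDP is optimistic with respect to the sampled set; the rule for when to redraw the samples as data accumulate, mentioned in the abstract, is left out of this sketch.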

1 INTRODUCTION

[...] Myopic (Wang et al., 2005) approaches make decisions to reduce uncertainty, but they do not explicitly consider how this reduced uncertainty will impact future reward. While myopic approaches can lay no claim to optimality in general, some include guarantees on their total regret or on the number of suboptimal decisions made during learning. An example of such an algorithm is RMAX (Brafman & Tennenholtz, 2002), which distinguishes "known" and "unknown" states based on how often they have been visited. It explores by acting to maximize reward under the assumption that unknown states deliver maximum reward.

Undirected (Thrun, 1992) approaches take exploratory actions, but without regard to what parts of their environment models remain uncertain. Classic approaches such as ε-greedy and Boltzmann exploration that choose random actions occasionally fall into this category. The guarantees possible for this class of algorithms are generally weaker: convergence to optimal behavior in the limit, for example. A sophisticated approach that falls into this category is Bayesian DP (Strens, 2000). It maintains a Bayesian posterior over models and periodically draws a sample from this distribution [...]
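For contrast with the model-uncertainty-driven methods above, here is a minimal sketch of the two undirected strategies named in the text, ε-greedy and Boltzmann (softmax) action selection over one row of a Q-table; the parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_row, epsilon=0.1):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def boltzmann(q_row, temperature=1.0):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    prefs = np.asarray(q_row, dtype=float) / temperature
    prefs -= prefs.max()                      # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_row), p=probs))
```

Neither rule consults the posterior over models, in line with the weaker, in-the-limit guarantees the text attributes to this class.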
