A Bayesian Sampling Approach to Exploration in Reinforcement Learning

John Asmuth†, Lihong Li†, Michael L. Littman†, Ali Nouri†, David Wingate‡

†Department of Computer Science, Rutgers University, Piscataway, NJ 08854
‡Computational Cognitive Science Group, Massachusetts Institute of Technology, Cambridge, MA 02143

Abstract

We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and how to combine the models. We show that our algorithm achieves near-optimal reward with high probability with a sample complexity that is low relative to the speed at which the posterior distribution converges during learning. We demonstrate that BOSS performs quite favorably compared to state-of-the-art reinforcement-learning approaches and illustrate its flexibility by pairing it with a non-parametric model that generalizes across states.

1 INTRODUCTION

Myopic (Wang et al., 2005) approaches make decisions to reduce uncertainty, but they do not explicitly consider how this reduced uncertainty will impact future reward. While myopic approaches can lay no claim to optimality in general, some include guarantees on their total regret or on the number of suboptimal decisions made during learning. An example of such an algorithm is RMAX (Brafman & Tennenholtz, 2002), which distinguishes “known” and “unknown” states based on how often they have been visited. It explores by acting to maximize reward under the assumption that unknown states deliver maximum reward.

Undirected (Thrun, 1992) approaches take exploratory actions, but without regard to what parts of their environment models remain uncertain. Classic approaches such as ε-greedy and Boltzmann exploration, which occasionally choose random actions, fall into this category. The guarantees possible for this class of algorithms are generally weaker: convergence to optimal behavior in the limit, for example. A sophisticated approach that falls into this category is Bayesian DP (Strens, 2000). It maintains a Bayesian posterior over models and periodically draws a sample from this distri
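
The two classic undirected strategies named in the introduction, ε-greedy and Boltzmann exploration, can be illustrated with a minimal Python sketch. This is not code from the paper: the function names and the parameters `q_values`, `epsilon`, and `temperature` are assumptions introduced here for illustration. Neither rule consults any estimate of model uncertainty, which is what makes them undirected in the sense described above.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values, temperature):
    """Softmax over action values; a higher temperature flattens the distribution."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_values, temperature, rng=random):
    """Sample an action index in proportion to its softmax probability."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Both functions take only the current action-value estimates as input. A directed method would additionally need visit counts or a posterior over models in order to decide where exploration is still worthwhile.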