§5 Markov Decision Processes
COMP7404 Computational intelligence and machine learning
Dr. Dirk Schnieders
Acknowledgements: Based on materials from CS188 @ Berkeley & Textbook Chapter 17
2013-2014 COMP7404

Sequential Decision Problem
★ In a sequential decision problem the agent's utility depends on a sequence of decisions
★ Sequential decision problems incorporate utilities, uncertainty and sensing
★ Optimal behavior balances the risks and rewards of acting in an uncertain environment

Example: Grid World
★ Maze-like problem
★ Agent lives in a grid
★ Walls block the agent's path
★ Unreliable actions
★ Each action achieves the intended effect 80% of the time
★ 10% of the time the action moves the agent at right angles to the intended direction
★ If the agent bumps into a wall it stays in the same square
★ The agent receives rewards for each action
★ Small "living" rewards (can be negative)
★ Big rewards come at the end (good or bad)
★ Goal: maximize sum of rewards
[Figure: grid with rows a, b, c and columns 1, 2, 3, 4; labels: special exit state, exit states]
[Figure: Example (Intend to go North): North with probability 0.8, West with 0.1, East with 0.1]

Grid World - Stochastic motion
[Figure: Deterministic action vs. Stochastic action (0.1, 0.8, 0.1)]

Grid World - Stochastic motion
[Figure: panels labelled reward = 0.0, reward = -0.1, reward = -0.2, reward = -0.3, exit, reward = 0.4, reward = -0.6, reward = -0.5, reward = -0.4]

Example: Grid World
★ Consider the fixed sequence [North, North, East, East, East]
★ Can we reach the goal state at a4 with this sequence?
★ What is the probability?
★ Probability of always taking the intended action is 0.8^5
★ There is also a chance of accidentally reaching the goal by going the other way around with probability 0.1^4 × 0.8
★ Overall chance of reaching (4,3) is 0.32776
[Figure: grid with rows a, b, c and columns 1, 2, 3, 4; North with probability 0.8, West with 0.1, East with 0.1]

Quiz
★ Which squares can be reached from c1 by the action sequence [East, East, East, North, North] and with what probabilities?
[Figure: grid with rows a, b, c and columns 1, 2, 3, 4]

Transition model
★ The transition model T(s, a, s') describes the outcome of each action in each state
★ The outcome is stochastic and we write P(s' | s, a) to denote the probability of reaching state s' if action a is done in state s
★ We assume that transitions are Markovian
★ I.e., the probability of reaching s' from s depends only on s and not on the history of earlier states
★ The Markov property is named after the Russian mathematician Andrey Markov
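
The transition model and the fixed-sequence example above can be sketched in Python. This is a minimal illustration, not the course's own code. The grid figure did not survive in the text, so the layout is an assumption: a 4x3 grid with a wall at (2,2) and exit states at (4,3) and (4,2), as in the textbook's standard example. States are written as (column, row) with row 1 at the bottom, so c1 is (1,1) and a4 is (4,3). Exit states are treated as absorbing. The names `transition` and `run_sequence` are hypothetical helpers introduced here.

```python
from collections import defaultdict

# ASSUMED layout (the slide figure is missing): 4 columns, 3 rows,
# one wall square and two exit states, as in the textbook's 4x3 grid.
COLS, ROWS = 4, 3
WALLS = {(2, 2)}
EXITS = {(4, 3), (4, 2)}

MOVES = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}
# The two directions at right angles to each intended direction.
SIDEWAYS = {
    "North": ("West", "East"),
    "South": ("East", "West"),
    "East": ("North", "South"),
    "West": ("South", "North"),
}


def move(state, direction):
    """Deterministic move; bumping into a wall or the border keeps the agent in place."""
    x, y = state
    dx, dy = MOVES[direction]
    nxt = (x + dx, y + dy)
    inside = 1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS
    return nxt if inside and nxt not in WALLS else state


def transition(state, action):
    """Return a dict {s': P(s' | s, a)} that depends only on s and a (Markov property)."""
    if state in EXITS:
        return {state: 1.0}  # assumption: exit states are absorbing
    probs = defaultdict(float)
    left, right = SIDEWAYS[action]
    for direction, p in ((action, 0.8), (left, 0.1), (right, 0.1)):
        probs[move(state, direction)] += p
    return dict(probs)


def run_sequence(start, actions):
    """Propagate the state distribution through a fixed action sequence."""
    dist = {start: 1.0}
    for action in actions:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for s2, q in transition(s, action).items():
                nxt[s2] += p * q
        dist = dict(nxt)
    return dist


dist = run_sequence((1, 1), ["North", "North", "East", "East", "East"])
print(round(dist[(4, 3)], 5))  # 0.32776 = 0.8^5 + 0.1^4 * 0.8
```

Under the assumed layout, the result matches the 0.32776 stated above. Calling `run_sequence((1, 1), ["East", "East", "East", "North", "North"])` gives the distribution asked for in the quiz.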