Uploaded: 2018-07-15
Source document: 《算法类外文资料翻译》 (Foreign-Language Literature Translation: Algorithms), shared by a member on 天天文库.
Q-Learning By Examples

In this tutorial, you will discover step by step how an agent learns through training without a teacher (unsupervised) in an unknown environment. You will learn about a reinforcement learning algorithm called Q-learning. Reinforcement learning algorithms have been widely used in many applications such as robotics, multi-agent systems, and games.

Instead of covering the theory of reinforcement learning, which you can read in many books and on other web sites (see Resources for more references), this tutorial introduces the concept through a simple but comprehensive numerical example. You may also download the Matlab code or the MS Excel spreadsheet for free.

Suppose we have 5 rooms in a building, connected by certain doors as shown in the accompanying figure. We name the rooms A to E. We can consider the outside of the building as one big room that surrounds the building, and name it F. Notice that two doors lead into the building from F: one through room B and one through room E.

We can represent the rooms by a graph, with each room as a vertex (or node) and each door as an edge (or link). Refer to my other tutorial on Graph if you are not sure what a graph is.

We want to set the target room. If we put an agent in any room, we want the agent to go outside the building. In other words, the goal room is node F. To set this kind of goal, we give a reward value to each door (i.e. each edge of the graph). The doors that lead immediately to the goal have an instant reward of 100 (they are drawn with red arrows in the diagram). Other doors, which have no direct connection to the target room, have zero reward. Because each door is two-way (from A the agent can go to E, and from E it can go back to A), we assign two arrows to each door of the previous graph, one in each direction. Each arrow carries an instant reward value. The graph then becomes a state diagram.

An additional loop with the highest reward (100) is given to the goal room (F back to F), so that if the agent arrives at the goal, it will remain there forever. This type of goal is called an absorbing goal, because once the agent reaches the goal state, it stays in the goal state.

Ladies and gentlemen, now is the time to introduce our superstar agent…. Imagine our agent as a dumb virtual robot that can learn through experience. The agent can pass from one room to another but has no knowledge of the environment. It does not know which sequence of doors it must pass through to go outside the building.
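
The reward scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tutorial's own Matlab or Excel code. The text confirms only three doors (A–E, B–F, and E–F); the figure is not reproduced here, so the remaining doors (B–D, C–D, D–E) are assumed for the sake of a complete example. The names `DOORS`, `build_rewards`, and `actions` are hypothetical.

```python
# Doors as undirected edges of the room graph.
# A-E, B-F and E-F are stated in the text; the other three doors are
# ASSUMED for illustration because the figure is not reproduced.
DOORS = [
    ("A", "E"), ("B", "F"), ("E", "F"),   # stated in the text
    ("B", "D"), ("C", "D"), ("D", "E"),   # assumed
]
GOAL = "F"
GOAL_REWARD = 100


def build_rewards(doors, goal, goal_reward=GOAL_REWARD):
    """Return {(from_room, to_room): instant_reward} for every arrow."""
    rewards = {}
    for a, b in doors:
        # Each door is two-way, so it contributes two arrows.
        for src, dst in ((a, b), (b, a)):
            # Arrows that lead directly into the goal earn the reward;
            # every other arrow earns zero.
            rewards[(src, dst)] = goal_reward if dst == goal else 0
    # Absorbing goal: a self-loop on the goal with the highest reward.
    rewards[(goal, goal)] = goal_reward
    return rewards


REWARDS = build_rewards(DOORS, GOAL)


def actions(room, rewards=REWARDS):
    """Rooms the agent can move to from `room` in one step."""
    return sorted(dst for (src, dst) in rewards if src == room)


if __name__ == "__main__":
    for room in "ABCDEF":
        print(room, {dst: REWARDS[(room, dst)] for dst in actions(room)})
```

Storing the arrows in a dictionary keyed by `(from_room, to_room)` keeps missing doors out of the structure entirely, so a pair that is absent from `REWARDS` simply means there is no door between those rooms.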