ID:9851871
Size: 146.50 KB
Pages: 10
Date: 2018-05-12
Translation of Foreign-Language Material on Algorithms, uploaded and shared by a member.
Q-Learning By Examples

In this tutorial, you will discover step by step how an agent learns through training without a teacher (unsupervised) in an unknown environment. You will learn about part of a reinforcement learning algorithm called Q-learning. Reinforcement learning algorithms have been widely used in many applications, such as robotics, multi-agent systems and games.

Rather than presenting the theory of reinforcement learning, which you can read in many books and on other websites (see Resources for more references), this tutorial introduces the concept through a simple but comprehensive numerical example. You may also download the Matlab code or MS Excel spreadsheet for free.

Suppose we have 5 rooms in a building, connected by doors as shown in the figure below. We name the rooms A to E. We can consider the outside of the building as one big room that covers the building, and name it F. Notice that two doors lead into the building from F: one through room B and one through room E.

We can represent the rooms by a graph, with each room as a vertex (or node) and each door as an edge (or link). Refer to my other tutorial on graphs if you are not sure what a graph is.

We want to set a target room. If we put an agent in any room, we want the agent to go outside the building. In other words, the goal room is node F. To set this kind of goal, we give a reward value to each door (i.e., each edge of the graph). The doors that lead immediately to the goal have an instant reward of 100 (see the diagram below; they have red arrows). Other doors, which have no direct connection to the target room, have zero reward. Because the doors are two-way (from A we can go to E, and from E we can go back to A), we draw two arrows for each door of the previous graph. Each arrow carries an instant reward value. The graph becomes a state diagram, as shown below.

An additional loop with the highest reward (100) is given to the goal room (F back to F), so that if the agent arrives at the goal, it will remain there forever. This type of goal is called an absorbing goal, because once the agent reaches the goal state, it stays in the goal state.

Ladies and gentlemen, now is the time to introduce our superstar agent… Imagine our agent as a dumb virtual robot that can learn through experience. The agent can pass from one room to another but has no knowledge of the environment. It does not know which sequence of doors it must pass through to go outside the building.
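The setup above can be sketched in Python as a minimal illustration under stated assumptions. It is not the tutorial's own Matlab or Excel code.

- **Doors.** Only the doors A-E, B-F and E-F are stated in this excerpt. The other doors (B-D, C-D and D-E) are taken from the tutorial's building figure, which is missing here, so treat them as an assumption.
- **Our own choices.** The following are not defined in this part of the tutorial:
  - the helper names build_rewards, q_learning and greedy_path (hypothetical);
  - using None to mean "no door";
  - the discount factor gamma = 0.8;
  - the update rule Q(s, t) = R(s, t) + gamma * max over u of Q(t, u). This is the standard deterministic Q-learning update with a learning rate of 1.
- **Actions.** Choosing an action means walking through the door into room t, so the next state is simply the chosen room.

```python
import random

# Rooms A-E, plus F: the outside of the building, treated as one big room.
ROOMS = ["A", "B", "C", "D", "E", "F"]
GOAL = "F"

# Two-way doors. A-E, B-F and E-F are stated in the text; B-D, C-D and D-E
# are ASSUMED from the tutorial's figure, which is missing from this excerpt.
DOORS = [("A", "E"), ("B", "D"), ("B", "F"), ("C", "D"), ("D", "E"), ("E", "F")]


def build_rewards(rooms, doors, goal):
    """Return R[s][t], the instant reward for moving from room s to room t.

    None marks "no door" (an assumed convention). Each two-way door becomes
    two arrows: an arrow into the goal is worth 100, any other arrow 0.
    The goal gets a self-loop worth 100, which makes it an absorbing goal.
    """
    reward = {s: {t: None for t in rooms} for s in rooms}
    for a, b in doors:
        for s, t in ((a, b), (b, a)):
            reward[s][t] = 100 if t == goal else 0
    reward[goal][goal] = 100
    return reward


def q_learning(reward, goal, gamma=0.8, episodes=500, seed=0):
    """Tabular Q-learning with purely random exploration (a sketch).

    ASSUMED update rule, the deterministic form with learning rate 1:
        Q[s][t] = R[s][t] + gamma * max over u of Q[t][u]
    Going through the door from s to t puts the agent in room t.
    """
    rng = random.Random(seed)
    rooms = list(reward)
    # Q entries exist only where there is a door (including the goal loop).
    q = {s: {t: 0.0 for t in rooms if reward[s][t] is not None} for s in rooms}
    for _ in range(episodes):
        state = rng.choice(rooms)  # drop the agent into a random room
        while state != goal:  # an episode ends when the agent reaches F
            nxt = rng.choice(list(q[state]))  # try any door from this room
            q[state][nxt] = reward[state][nxt] + gamma * max(q[nxt].values())
            state = nxt
    return q


def greedy_path(q, start, goal):
    """Follow the highest-valued door from start until the goal is reached."""
    path = [start]
    while path[-1] != goal:
        if len(path) > len(q):
            raise RuntimeError("no route found; try more training episodes")
        here = path[-1]
        path.append(max(q[here], key=q[here].get))
    return path


if __name__ == "__main__":
    R = build_rewards(ROOMS, DOORS, GOAL)
    print(R["E"])  # A and D are reachable with reward 0, F with reward 100
    Q = q_learning(R, GOAL)
    print(greedy_path(Q, "C", GOAL))  # learned route from room C to the outside
```

In this sketch, each episode ends as soon as the agent reaches F. The absorbing F-to-F loop is therefore recorded in R but never used for an update. The agent starts with no map: everything it learns comes from trying doors at random, as the "dumb virtual robot" above would.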