Bootstrapping from Game Tree Search

Joel Veness
University of NSW and NICTA
Sydney, NSW, Australia 2052
joelv@cse.unsw.edu.au

David Silver
University of Alberta
Edmonton, AB Canada T6G 2E8
silver@cs.ualberta.ca

William Uther
NICTA and the University of NSW
Sydney, NSW, Australia 2052
William.Uther@nicta.com.au

Alan Blair
University of NSW and NICTA
Sydney, NSW, Australia 2052
blair@cse.unsw.edu.au

Abstract

In this paper we introduce a new algorithm for updating the parameters of a heuristic evaluation function, by updating the heuristic towards the values computed by an alpha-beta search. Our algorithm differs from previous approaches to learning from search, such as Samuel’s checkers player and the TD-Leaf algorithm, in two key ways. First, we update all nodes in the search tree, rather than a single node. Second, we use the outcome of a deep search, instead of the outcome of a subsequent search, as the training signal for the evaluation function. We implemented our algorithm in a chess program Meep, using a linear heuristic function. After initialising its weight vector to small random values, Meep was able to learn high quality weights from self-play alone. When tested online against human opponents, Meep played at a master level, the best performance of any chess program with a heuristic learned entirely from self-play.

1 Introduction

The idea of search bootstrapping is to adjust the parameters of a heuristic evaluation function towards the value of a deep search. The motivation for this approach comes from the recursive nature of tree search: if the heuristic can be adjusted to match the value of a deep search of depth D, then a search of depth k with the new heuristic would be equivalent to a search of depth k + D with the old heuristic.

Deterministic, two-player games such as chess provide an ideal test-bed for search bootstrapping. The intricate tactics require a significant level of search to provide an accurate position evaluation; learning without search has produced little success in these domains. Much of the prior work in learning from search has been performed in chess or similar two-player games, allowing for clear comparisons with existing methods.

Samuel (1959) first introduced the idea of search bootstrapping in his seminal checkers player. In Samuel’s work the heuristic function was updated towards the value