Arthur Juliani
Deep Learning @ Unity3D & Cognitive Neuroscience PhD student.
Apr 19

Reinforcement Learning or Evolutionary Strategies? Nature has a solution: Both.

A bear learning to hunt for fish with its parent.

A few weeks ago OpenAI made a splash in the Deep Learning community with the release of their paper “Evolution Strategies as a Scalable Alternative to Reinforcement Learning.” The work contains impressive results suggesting that looking beyond Reinforcement Learning (RL) methods may be worthwhile when training complex neural networks. It sparked a debate around the importance of Reinforcement Learning, and perhaps its less-than-necessary status as the go-to technique for learning to solve tasks. What I want to argue here is that instead of being seen as two competing strategies, one of which is necessarily better than the other, they are ultimately complementary. Indeed, if we think a little bit forward to the goal of Artificial General Intelligence (AGI), and systems that can truly perform lifelong learning, reasoning, and planning, what we find is that a combined solution is almost certainly going to be necessary. And indeed, it is just this solution that nature arrived at for endowing mammals and other complex animal life with intelligence.

Evolutionary Strategies

The basic premise of the OpenAI paper was that instead of using Reinforcement Learning coupled with traditional gradient backpropagation, they successfully trained neural networks to perform difficult tasks using what they called Evolutionary Strategy (ES). This ES approach consists of maintaining a distribution over network weight values, and having a large number of agents act in parallel using parameters sampled from this distribution. Each agent acts in its own environment, and once it finishes a set number of episodes, or steps of an episode, its cumulative reward is returned to the algorithm as a fitness score. With this score, the parameter distribution can be moved toward that of the more successful agents, and away from that of the unsuccessful ones. By repeating this approach millions of times, with hundreds of agents, the weight distribution moves to a space that provides the agents with a good policy for solving the task at hand. Indeed, the most impressive result from the paper shows that