Resource description: "Learning Complex Neural Network Policies with Trajectory Optimization" (ICML 2014).
Learning Complex Neural Network Policies with Trajectory Optimization

Sergey Levine  SVLEVINE@CS.STANFORD.EDU
Computer Science Department, Stanford University, Stanford, CA 94305 USA

Vladlen Koltun  VLADLEN@ADOBE.COM
Adobe Research, San Francisco, CA 94103 USA

Abstract

Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.

1. Introduction

Direct policy search offers the promise of automatically [...] 2013). Such specialized policy classes are limited in the types of behaviors they can represent, and engineering new policy classes requires considerable effort.

In recent work, we introduced a new class of policy search algorithms that can learn much more complex policies by using model-based trajectory optimization to guide the policy search (Levine & Koltun, 2013a;b). By optimizing trajectories in tandem with the policy, guided policy search methods combine the flexibility of trajectory optimization with the generality of policy search. These methods can scale to highly complex policy classes and can be used to train general-purpose neural network controllers that do not require task-specific engineering. Furthermore, the training trajectories can be initialized with examples for learning from demonstration.

A key challenge in guided policy search is ensuring that the trajectories are useful for learning the policy, since not all trajectories can be realized by policies from a particular policy class. For example, a policy provided with partial observations cannot make decisions based on unobserved state variables. In this paper, we present a constrained guided p