478 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 11, NO. 4, AUGUST 2003

A Convergent Actor–Critic-Based FRL Algorithm with Application to Power Management of Wireless Transmitters

Hamid R. Berenji, Fellow, IEEE, and David Vengerov

Abstract—This paper provides the first convergence proof for fuzzy reinforcement learning (FRL) as well as experimental results supporting our analysis. We extend the work of Konda and Tsitsiklis, who presented a convergent actor–critic (AC) algorithm for a general parameterized actor. In our work we prove that a fuzzy rulebase actor satisfies the necessary conditions that guarantee the convergence of its parameters to a local optimum. Our fuzzy rulebase uses Takagi–Sugeno–Kang rules, Gaussian membership functions, and product inference. As an application domain, we chose a difficult task of power control in wireless transmitters, characterized by delayed rewards and a high degree of stochasticity. To the best of our knowledge, no reinforcement learning algorithms have been previously applied to this task. Our simulation results show that the ACFRL algorithm consistently converges in this domain to a locally optimal policy.

Index Terms—Actor–critic (AC), convergence, fuzzy reinforcement learning (FRL), power control.

I. INTRODUCTION

Reinforcement learning techniques provide powerful methodologies for learning through inte [...]

In Section II, we present a general discussion of actor–critic algorithms. The ACFRL algorithm is described in Section III and its convergence is proved in Section IV. Section V describes the application domain and presents our simulation results. Section VI presents related work and Section VII concludes the paper.

II. AC ALGORITHMS FOR RL

AC methods were among the first reinforcement learning algorithms to use temporal-difference learning. These methods were first studied in the context of a classical conditioning model in animal learning by Sutton and Barto [17]. Later, Barto et al. [3] successfully applied AC methods to the cart-pole balancing problem, where they defined for the first time the terms actor and critic.

In the simplest case of finite-state and action spaces, the following AC algorithm has been suggested by Sutton and Barto [18]. After choosing the action in the state and receiving the reward, the critic evaluates the new state and computes