资源描述:
《5542-recurrent-models-of-visual-attention.pdf》由会员上传分享,免费在线阅读,更多相关内容在学术论文-天天文库。
1、RecurrentModelsofVisualAttentionVolodymyrMnihNicolasHeessAlexGravesKorayKavukcuogluGoogleDeepMind{vmnih,heess,gravesa,korayk}@google.comAbstractApplyingconvolutionalneuralnetworkstolargeimagesiscomputationallyex-pensivebecausetheamountofcomputationscaleslinearlywithth
2、enumberofimagepixels.Wepresentanovelrecurrentneuralnetworkmodelthatisca-pableofextractinginformationfromanimageorvideobyadaptivelyselectingasequenceofregionsorlocationsandonlyprocessingtheselectedregionsathighresolution.Likeconvolutionalneuralnetworks,theproposedmodel
3、hasadegreeoftranslationinvariancebuilt-in,buttheamountofcomputationitper-formscanbecontrolledindependentlyoftheinputimagesize.Whilethemodelisnon-differentiable,itcanbetrainedusingreinforcementlearningmethodstolearntask-specificpolicies.Weevaluateourmodelonseveralimagec
4、lassificationtasks,whereitsignificantlyoutperformsaconvolutionalneuralnetworkbaselineonclutteredimages,andonadynamicvisualcontrolproblem,whereitlearnstotrackasimpleobjectwithoutanexplicittrainingsignalfordoingso.1IntroductionNeuralnetwork-basedarchitectureshaverecentlyh
5、adgreatsuccessinsignificantlyadvancingthestateoftheartonchallengingimageclassificationandobjectdetectiondatasets[8,12,19].Theirexcellentrecognitionaccuracy,however,comesatahighcomputationalcostbothattrainingandtestingtime.Thelargeconvolutionalneuralnetworkstypicallyused
6、currentlytakedaystotrainonmultipleGPUseventhoughtheinputimagesaredownsampledtoreducecomputation[12].InthecaseofobjectdetectionprocessingasingleimageattesttimecurrentlytakessecondswhenrunningonasingleGPU[8,19]astheseapproacheseffectivelyfollowtheclassicalslidingwindowp
7、aradigmfromthecomputervisionliteraturewhereaclassifier,trainedtodetectanobjectinatightlycroppedboundingbox,isappliedindependentlytothousandsofcandidatewindowsfromthetestimageatdifferentpositionsandscales.Althoughsomecomputationscanbeshared,themaincomputationalexpensefo
8、rthesemodelscomesfromconvolvingfiltermapswiththeentireinputimage,thereforetheircomputationalcomplexityisatleastlinearinthenum