Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell

Abstract—Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent are effective for tasks involving sequences, visual and otherwise. We describe a class of recurrent convolutional architectures which is end-to-end trainable and suitable for large-scale visual understanding tasks, and demonstrate the value of these models for activity recognition, image captioning, and video description. In contrast to previous models which assume a fixed visual representation or perform simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they learn compositional representations in space and time. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Differentiable recurrent models are appealing in that they can directly map variable-length inputs (e.g., videos) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent sequence models are directly connected to modern visual convolutional network models and can be jointly trained to learn temporal dynamics and convolutional perceptual representations. Our results show that such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined or optimized.

1 INTRODUCTION

[Figure: Input, Visual Features (CNN), Sequence Learning (LSTM), Output (y)]

Recognition and description of images and videos is a fundamental challenge of computer vision. Dramatic progress has been achieved by supervised convolutional neural network (CNN) models on image recognition tasks, and a number of extensions to process video have been recently proposed. Ideally, a video model should allow processing of variable length input sequences, and also provide for variable length outputs, including generation of full-length sentence descriptions that go beyond conventional one-versus-all prediction tasks. In this pap