Two-Stream Convolutional Networks for Action Recognition in Videos

Karen Simonyan, Andrew Zisserman
Visual Geometry Group, University of Oxford
{karen,az}@robots.ox.ac.uk

Abstract

We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework. Our contribution is three-fold. First, we propose a two-stream ConvNet architecture which incorporates spatial and temporal networks. Second, we demonstrate that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, we show that multi-task learning, applied to two different action classification datasets, can be used to increase the amount of training data and improve the performance on both. Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art. It also exceeds by a large margin previous attempts to use deep nets for video classification.

1 Introduction

Recognition of human actions in videos is a challenging task which has received a significant amount of attention in the research community [11, 14, 17, 26]. Compared to still image classification, the temporal component of videos provides an additional (and important) clue for recognition, as a number of actions can be reliably recognised based on the motion information. Additionally, video provides natural data augmentation (jittering) for single image (video frame) classification.

In this work, we aim at extending deep Convolutional Networks (ConvNets) [19], a state-of-the-art still image representation [15], to action recognition in video data. This task has recently been addressed in [14] by using stacked video frames as input to the network, but the results were significantly worse than those of the best hand-crafted shallow representations [20, 26]. We investigate a different architecture based on two separate recognition streams (spatial and temporal), which are then combined by late fusion. The spatial stream performs action recogni