Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Subhashini Venugopalan, UT Austin, Austin, TX, vsub@cs.utexas.edu
Huijuan Xu, UMass Lowell, Lowell, MA, hxu1@cs.uml.edu
Jeff Donahue, UC Berkeley, ICSI, Berkeley, CA, jdonahue@eecs.berkeley.edu
Marcus Rohrbach, UC Berkeley, ICSI, Berkeley, CA, rohrbach@eecs.berkeley.edu
Raymond Mooney, UT Austin, Austin, TX, mooney@cs.utexas.edu
Kate Saenko, UMass Lowell, Lowell, MA, saenko@cs.uml.edu

Abstract

Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a small vocabulary of possible words. By transferring knowledge from 1.2M+ images with category labels and 100,000+ images with captions, our method is able to create sentence descriptions of open-domain videos with large vocabularies. We compare our approach with recent work using language generation metrics, subject, verb, and object prediction accuracy, and a human evaluation.

Input video: [frames]
Our output: A cat is playing with a toy.
Humans: A Ferret and cat fighting with each other. / A cat and a ferret are playing. / A kitten is playing with a ferret. / A kitten and a ferret are playfully wrestling.
Figure 1: Our system takes a short video as input and outputs a natural language description of the main activity in the video.

1 Introduction

For most people, watching a brief video and d[...]

[...] very limited training data consisting of videos with associated descriptive sentences. Another serious obstacle has been the lack of rich models that can capture the joint dependencies of a sequence of frames and a corresponding sequence of words. Previous work has simplified the problem by detecting a fixed set of semantic roles, such as subject, verb, and object (Guadarrama et al., 2013; Thomason et al., 2014), as an intermediate representation. This fixed representation is problematic for large vocabularies and also leads to oversimplified rigid sentence templates which are unable to model the complex