Resource description: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (ECCV 2014)
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Technical report

Kaiming He (1), Xiangyu Zhang (2), Shaoqing Ren (3), Jian Sun (1)
(1) Microsoft Research, (2) Xi'an Jiaotong University, (3) University of Science and Technology of China
{kahe, jiansun}@microsoft.com, xyz.clx@stu.xjtu.edu.cn, sqren@mail.ustc.edu.cn

Abstract

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g. 224×224) input image. This requirement is "artificial" and may hurt the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. By removing the fixed-size limitation, we can improve all CNN-based image classification methods in general. Our SPP-net achieves state-of-the-art accuracy on the datasets of ImageNet 2012, Pascal VOC 2007, and Caltech101.

The power of SPP-net is more significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method computes convolutional features 30-170× faster than the recent leading method R-CNN (and ...

Figure 1. Top: cropping or warping to fit a fixed size. Middle: a conventional deep convolutional network structure. Bottom: our spatial pyramid pooling network structure.
(Figure labels. Conventional: image, crop / warp, conv layers, fc layers, output. SPP-net: image, conv layers, spatial pyramid pooling, fc layers, output.)

... testing of the CNNs: the prevalent CNNs require a fixed input image size (e.g., 224×224), which limits both the aspect ratio and the scale of the input image. When applied to images of arbitrary sizes, current methods mostly fit the input image to the fixed size, either via cropping [16, 33] or via warping [8, 12], as shown in Figure 1 (top). But the cropped region may not contain the entire object, while the warped content may result in unwanted geometric distortion. Recognition accuracy can be compromised due to the content loss or distortion. Besides