Research and Implementation of Small-File Storage for Massive Education Resources Based on Hadoop
ABSTRACT

Education resources are learning resources available on the network. They take many forms, such as text, video, and audio. Among them, text resources account for more than 80% of all learning resources. Text resources are numerous, and their files are generally KB-sized and rarely reach the MB level, so they are called education-resource small files. In the Internet age, the scale of online education resources keeps growing and the processing workload is huge, so traditional distributed file systems cannot meet the demand for processing massive numbers of education-resource small files.

Hadoop is an open-source distributed processing platform that provides a reliable, scalable, and efficient way to handle massive data. The Hadoop Distributed File System (HDFS) provides data storage and performs excellently on large-scale data. Unfortunately, HDFS is designed for large files, so it has shortcomings when handling massive numbers of small files. For instance, NameNode memory fills up quickly when massive numbers of small files are stored on HDFS, which may cause a memory bottleneck. Frequent access to small files requires jumping among several DataNodes, which slows access. Compared with large-file processing, small-file processing is too slow.

To solve the storage problem of massive education-resource small files on the Hadoop platform, this thesis proposes a storage optimization scheme for small files, which consists of the following four parts:

1) Classification of associated small files: the size of each file is checked before it is uploaded to the HDFS cluster. If it is a small file, it is classified with a classification algorithm, and the small files in each category are then associated with a hierarchical clustering algorithm, producing sets of associated small files.
2) Merging of small files: the classified, associated small files are merged into one large file, which is uploaded to the HDFS cluster. Merging greatly reduces the amount of small-file metadata held in NameNode memory.
3) Index setup: an index is built for the merged large files. When a small file is requested, it is located quickly through the index file, which improves small-file retrieval speed.
4) Metadata cache and associated small-file prefetching: after a file is first read, the file metadata a
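
The merge and index steps (parts 2 and 3 above) can be illustrated with a minimal sketch. This is not the thesis's implementation: it works on in-memory bytes rather than HDFS, and the names `merge_small_files`, `read_small_file`, and the `(offset, length)` index layout are hypothetical choices made only for illustration.

```python
import io
from typing import Dict, Iterable, Tuple

# Hypothetical index layout: file name -> (byte offset, byte length)
Index = Dict[str, Tuple[int, int]]


def merge_small_files(files: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one lives."""
    buffer = io.BytesIO()
    index: Index = {}
    for name, data in files:
        if name in index:
            raise ValueError(f"duplicate file name: {name}")
        # The current write position is the offset of this small file.
        index[name] = (buffer.tell(), len(data))
        buffer.write(data)
    return buffer.getvalue(), index


def read_small_file(merged: bytes, index: Index, name: str) -> bytes:
    """Retrieve one small file from the merged blob via the index."""
    offset, length = index[name]
    return merged[offset:offset + length]


# Usage: two small files become one merged object plus one index.
merged, index = merge_small_files([("a.txt", b"alpha"), ("b.txt", b"beta")])
assert read_small_file(merged, index, "b.txt") == b"beta"
```

In this sketch the storage layer would track one large object instead of one entry per small file, which is the effect the abstract attributes to merging; the index lookup stands in for the index file used during retrieval.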