《symphony an integrated multimedia file system》由会员上传分享,免费在线阅读,更多相关内容在教育资源-天天文库。
Symphony:AnIntegratedMultimediaFileSystemPrashantJ.Shenoy,PawanGoyal,SriramS.Rao,andHarrickM.VinDistributedMultimediaComputingLaboratoryDepartmentofComputerSciences,UniversityofTexasatAustinTaylorHall2.124,Austin,Texas78712-1188,USAE-mail:fshenoy,pawang,sriram,ving@cs.utexas.edu,Telephone:(512)471-9732,Fax:(512)471-8885URL:http://www.cs.utexas.edu/users/dmclABSTRACTAnintegratedmultimediafilesystemsupportsthestorageandretrievalofmultipledatatypes.Inthispaper,wefirstdiscussvariousdesignmethodologiesforbuildingintegratedfilesystemsandexaminetheirtradeoffs.Wearguethat,toefficientlysupportthestorageandretrievalofheterogeneousdatatypes,anintegratedfilesystemshouldenablethecoexistenceofmultipledatatypespecifictechniques.WethendescribethedesignofSymphony–anintegratedfilesystemthatachievesthisobjective.SomeofthenovelfeaturesofSymphonyinclude:aQoS-awarediskschedulingalgorithm;supportfordatatypespecificplacement,failurerecovery,andcachingpolicies;andsupportforassigningdatatypespecificstructuretofiles.WediscusstheprototypeimplementationofSymphony,andpresentresultsofourpreliminaryexperimentalevaluation.Keywords:Multimediafilesystems,multimediaoperatingsystems,integratedfilesystems1.INTRODUCTIONRecentadvancesincompression,storage,andcommunicationtechnologieshaveresultedintheproliferationofapplicationsthatinvolvestoringandretrievingmultipledatatypessuchastext,audio,video,imagery,etc.(collectivelyreferredtoasmultimedia).Realizingsuchnextgenerationapplicationsrequiresthedevelopmentoffilesystemsthatcanefficientlymanagethestorageandretrievalofmultipledatatypes(referredtoasintegratedmultimediafilesystems).Mostexistingfilesystemshavebeenoptimizedforasingledatatype.Forinstance,conventionalfilesystemshavebeenoptimizedforthestorageandretrievaloftextualdata.1Sincethesefilesystemsprovideonlyabest-effortservicetouserrequests,theyareill-suitedformeetingthereal-timeperformancerequirementsofaudioandvideodata(continuousmedia).Ontheotherhand,continuousmediaserversexploittheperiodicandsequentialnatureofcontinuousmediaaccessestoimproveserverthroughput,andemployresourcereservationalgorithmstoprovidereal-timeperformanceguarantees.2{6Mostoftheseservers,however,assumeapredominantlyread-onlyenvironmentwithsubstantiallylessstringentresponsetimerequirements,andhence,areunsuitablefortextualdata.Thus,existingfilesystemsareinadequateformanagingheterogeneousdatatypes.Thedesignandimplementationofafilesystemthataddressestheselimitationstypesconstitutesthesubjectmatterofthispaper.Thispapermakesthefollowingresearchcontributions.First,weproposetwodifferentarchitecturesfordesigningintegratedmultimediafilesystems,namelyphysicallyandlogicallyintegratedfilesystems.Weexaminethetradeoffsbetweenthesearchitecturesanddemonstratethatphysicallyintegratedfilesystemsyieldbetterutilizationoffilesystemresources.Next,wediscusstherequirementsimposedbyvariousdatatypesonphysicallyintegratedfilesystems.Specifically,wediscusstherequirementsimposedontheservicemodel,theretrievalarchitecture,andtechniquesforplacement,faulttolerance,caching,andmetadatamanagementemployedbythefilesystem.Weconcludefromtheserequirementsthat,toefficientlysupportheterogeneousdatatypes,anintegratedfilesystemshouldenablethecoexistenceofmultipledatatypespecifictechniques.Finally,wepresentthedesignandimplementationofSymphony–aphysicallyintegratedfilesystemthatmeetstheserequirements.Symphonyconsistsofanumberofnovelfeaturesincluding:(1)aQoS-awarediskschedulerthatefficientlysupportsbothreal-timeandnon-real-timerequests,(2)astoragemanagerthatsupportsmultipleblocksizesandallowscontrolovertheirplacement,therebyenablingdiverseplacementpoliciestobeimplementedinthefilesystem,(3)afault-tolerancelayerthatenablesdatatypespecificfailurerecovery,and(4)atwolevelmetadata(inode)structurethatenablesdatatypespecificstructuretobeassignedtofileswhilecontinuingtosupportthetraditionalbytestreaminterface.Inadditiontothesenovelfeatures,Symphonysynthesizesanumberofrecentinnovationsintothefilesystem.Thesefeaturesinclude:(1)resourcereservation(i.e.,admissioncontrol)algorithmstoprovideQoSguarantees,7(2)supportforclient-pullandserver-pusharchitectures,8(3)supportforfixed-sizeandvariable-sizeblocks,9and(4)supportfordatatypespecificcachingtechniques.10,11ThisresearchwassupportedinpartbyanIBMFacultyDevelopmentAward,Intel,theNationalScienceFoundation(ResearchInitiationAwardCCR-9409666andCAREERawardCCR-9624757),NASA,MitsubishiElectricResearchLaboratories(MERL),andSunMicrosystemsInc. ClientsClientsIntegrationlayerVideoTextIntegratedServerServerServerDisksDisksDisks(a)Logicallyintegratedfilesystem(b)PhysicallyintegratedfilesystemFigure1.Logicallyandphysicallyintegratedlesystems.Thelogicallyintegratedlesystempartitionsthesetofdisksamongcomponentservers;thephysicallyintegratedleserversharesalldisksamongalldatatypes.Integratingthesediversetechniquesintothefilesystemisakeychallenge,sinceitcancomplicatethefilesystemarchitecture.Symphonyaddressesthisbyemployingatwolayerarchitecture.ThelowerlayerofSymphonyimplementsasetofdatatypeindependentmechanisms(suchasdiskscheduling,buffermanagement,storagespacemanagement,etc.)thatprovidecorefilesystemfunctionality.Theupperlayerusesthesemechanismstoimplementmultipledatatypespecificpolicies.Itconsistsofasetofmodules,oneperdatatype,eachofwhichimplementspoliciesoptimizedforthatdatatype.Suchatwolayeredarchitectureallowsthefilesystemdesignertoeasilyaddmodulesfornewdatatypes,orchangepoliciesimplementedinexistingmoduleswithoutchangingtherestofthefilesystem.WehaveimplementedaprototypeofSymphonyinSolaris2.5.1andhaveevaluateditinanenvironmentconsistingofaclusterofworkstationsinterconnectedbyanATMnetwork.Wepresentandanalyzeourexperimentalresults.Therestofthepaperisorganizedasfollows.InSection2,weproposeandevaluatetwodifferentarchitecturesfordesigningintegratedfilesystems.TherequirementsimposedbyvariousdatatypesonanintegratedfilesystemareoutlinedinSection3.Sections4,5,and6describethedesignofSymphony.InSection7,weexperimentallyevaluatetheSymphonyprototype.Section8describesrelatedwork,andfinally,Section9summarizesourresults.2.DESIGNMETHODOLOGIESTherearetwomethodologiesfordesigningintegratedfilesystems:(1)Logicallyintegratedfilesystems,whichconsistofmultiplefileservers,eachoptimizedforaparticulardatatype,gluedtogetherbyanintegrationlayerthatprovidesauniformwayofaccessingfilesstoredondifferentfileservers;and(2)physicallyintegratedfilesystems,whichconsistofasinglefileserverthatstoresalldatatypes.Figure1illustratesthesearchitectures.Eacharchitecturehasitsadvantagesanddisadvantages.Sincetechniquesforbuildingfileserversoptimizedforasingledatatypearewellknown,1,6logicallyintegratedfilesystemsareeasytoimplement.Insuchfilesystems,thesetofdisksisstaticallypartitionedamongcomponentfileservers.Thiscausesrequestsaccessingdifferentcomponentserverstoaccessmutuallyexclusivesetofdisks,therebypreventinginterferencebetweenuserrequests(e.g.,servicingbest-efforttextrequestsdoesnotviolatedeadlinesofreal-timecontinuousmediarequests).However,staticpartitioningofresourceshasthefollowinglimitations:Staticpartitioningofstorageresourcesisgovernedbytheexpectedstoragespaceandbandwidthrequirementsofworkloads.Iftheobservedworkloaddeviatessignificantlyfromtheexpected,thenrepartitioningofdisksmaybenecessary.Repartitioningofdisksistediousandmayrequirethesystemtobetakenoffline.12Analternativetorepartitioningistoaddnewdiskstotheserver,whichcausesresourcesinunder-utilizedpartitionstobewasted.Thestoragespacerequirementsoffilesstoredonacomponentfileservercanbesignificantlydifferentfromtheirbandwidthrequirements.Insuchascenario,allocationofdiskstothecomponentserverwillbegovernedbythemaximumofthetwovalues.Thiscanleadtounder-utilizationofeitherstoragespaceordiskbandwidthontheserver.Incontrast,physicallyintegratedfilesystemsshareallresourcesamongalldatatypes,anddonotsufferfromtheabovelimitations.Insuchservers,staticpartitioningofresourcesisnotrequired;storagespaceandbandwidthareallocatedtodatatypesondemand.Suchanarchitecturehasseveraladvantages.First,byco-locatingasetoffileswithlargestoragespacebutsmallbandwidthrequirementswithanothersetoffileswithsmallstoragespacebutlargebandwidthrequirements,thisarchitectureyieldsbetterresourceutilization.Second,sinceresourcesareallocatedondemand,itcaneasilyaccommodatedynamicchangesinaccesspatterns.Finally,sinceallthestorageresourcesaresharedamongallthefiles,moreresourcesare Textserver(logical)=8disks,Physical=16disks400900logicallogical350physical,40videoclients800physical,50videoclientsphysical,60videoclients700physical,70videoclients300600250500200400150300Responsetime(msec)Responsetime(msec)100200501000020253035402030405060708090100NumberoftextclientsNumberoftextclients(a)SCAN(b)SymphonydiskschedulingalgorithmFigure2.Responsetimesinlogicallyandphysicallyintegratedlesystemsfora8KBrequest.UseoftheSCANdiskschedulingalgorithminthephysicallyintegratedserveryieldsresponsetimethataresubstantiallyhigherthanthatinthelogicallyintegratedserver,whereasuseoftheSymphonydiskschedulingalgorithmyieldscomparableorbetterperformance.availabletoserviceeachrequest,whichinturnimprovestheperformance.Adisadvantageofphysicallyintegratedfilesystemsisthatthefilesystemdesignbecomesmorecomplexduetotheneedforsupportingmultipledatatypes.Moreover,sincemultipledatatypessharethesamesetofresources,datatypescaninterferewitheachother.Forinstance,arrivalofalargenumberoftextrequestscancausedeadlinesofreal-timecontinuousmediarequeststobeviolated.Algorithmsthatmultiplexresourcesinaphysicallyintegratedfilesystemmustbedesignedcarefullytopreventsuchinterference.Toquantitativelycomparelogicallyandphysicallyintegratedfilesystems,wemeasuredtheresponsetimesyieldedbythetwoarchitecturesusingtrace-drivensimulations.Eachserverwasassumedtocontainsixteendisks.Inthelogicallyintegratedserver,eightdiskseachwereallocatedtothecontinuousmediaserverandthetextfileserver,whileinthephysicallyintegratedserver,alldisksweresharedbybothdatatypes.ThetextserverinthelogicallyintegratedcasewasassumedtousetheSCANdiskschedulingalgorithm.However,useofSCANinthephysicallyintegratedserveryieldspoorresponsetimesfortextrequests.Thisisbecause,SCANdoesnotdifferentiateamongrequestswithdifferentrequirementsandschedulestextandvideorequestsmerelybasedontheircylindernumbers.Asshownin2(a),evenatamoderatevideoload,thisinterleavingcausestheresponsetimeofthephysicallyintegratedservertobesubstantiallyhigherthanthatofthelogicallyintegratedserver.Theresponsetimeimprovessignificantlyifthephysicallyintegratedserverusesadiskschedulingalgorithmthatdelaysreal-timevideorequestsuntiltheirdeadlinesandusestheavailableslacktoscheduletextrequests.Bygivingprioritytotextrequestsovervideorequestswheneversufficientslackisavailable,suchaschedulerprovideslowresponsetimetotextrequestswithoutviolatingthedeadlinesofreal-timevideorequests(seeSection5foradetaileddescriptionofthealgorithm).Figure2(b)comparesthetwoarchitectureswhenthisdiskschedulingalgorithmisused.Thefigureshowsthattheresponsetimeofthephysicallyintegratedserveriscomparabletothatofthelogicallyintegratedserveratlighttextloads.Moreover,atmoderatetoheavytextloads,thephysicallyintegratedserversignificantlyoutperformsitscounterpart.Thisisbecause,atheavyloads,theresponsetimeisgovernedbythequeuingdelayatadisk(i.e.,thetimeforwhicharequestwaitsinthediskschedulingqueuebeforeitreceivesservice).Sincetextfilesareinterleavedacrossalldisks,thenumberofdisksservicingtextrequests,andhencethenumberofqueues,islargerthanthatinthelogicallyintegratedserver.Thisyieldsasmallernumberofrequestsperqueue,andhence,betterresponsetimestotextrequests.Atlightvideoloads,thedifferentintheresponsetimefortextrequestsobservedinthelogicallyandthephysicallyintegratedserverisevenlarger.Thus,bycarefullydesigningthediskschedulingalgorithm,aphysicallyintegratedservercanprovideidenticalorbetterperformanceascomparedtoalogicallyintegratedserver.Tosummarize,physicallyintegratedfileserversobviatetheneedforstaticpartitioningofresourcesinherentinthelogicallyintegratedservers.Theresultingdynamicsharingofresourcesleadstobetterperformanceandhigherutilizationofresources,albeitattheexpenseofincreasedfilesystemcomplexity.Duetotheadvantagesofphysicalintegration,inthispaper,wefocusonlyonphysicallyintegratedfilesystems.Nextwediscusstherequirementsimposedbyvariousdatatypesonphysicallyintegratedfilesystems.3.REQUIREMENTSFORAPHYSICALLYINTEGRATEDFILESYSTEMAphysicallyintegratedfilesystemsshouldachieveefficientutilizationoffilesystemresourceswhilemeetingtheperformancerequirementsofheterogeneousdatatypesandapplications.Inwhatfollows,wediscusstheeffectofthisobjectiveontheservicemodelandtheretrievalarchitecturesupportedbytheintegratedfilesystem,aswellasontechniquesforefficientplacement,faulttolerance,metadatamanagement,andcaching. ServiceModelMostconventionalfilesystemsprovideasingleclassofbest-effortservicetoallapplicationsregardlessoftheirrequirements.Asinglebest-effortserviceclassisinadequateformeetingthediverseQoSrequirementsofapplications(e.g.,real-timerequirementsofcontinuousmediaapplications).Toefficientlysupportapplicationswithdiverserequirements,anintegratedfilesystemshouldsupportmultipleserviceclasses,suchasperiodicreal-time,aperiodicreal-timeandbest-effort.Instantiatingmultipleserviceclasseswillrequirethefilesystemtoemploy:(1)adiskschedulingalgorithmthatisolatesserviceclassesfromeachotherandalignstheserviceprovidedwiththerequirementsoftheseclasses(e.g.,providelowresponsetimestobest-effortrequestsandmeetdeadlinesofreal-timecontinuousmediarequests),and(2)admissioncontrolalgorithmstoprovideperformanceguaranteestorequestswithintheperiodicandaperiodicreal-timeclasses.RetrievalArchitectureAnintegratedfilesystemcanserviceuserrequestsusingeithertheclient-pullortheserver-push(i.e.,streaming)architecture.Intheformercase,theserverretrievesdatafromdisksonlyinresponsetoexplicitreadrequestsfromclients,whileinthelattercase,theserverperiodicallyretrievesandtransmitsdatatoclientswithoutexplicitreadrequests.Typicallytextualapplicationsareservicedusingtheclient-pullarchitecture,whereas,duetotheirperiodicandsequentialnature,theserver-pusharchitectureisbettersuitedforcontinuousmediaapplications.Adaptingcontinuousmediaapplicationstotheclient-pullarchitectureisdifficult.8Also,theserver-pusharchitectureisinappropriateforsupportingaperiodictextualrequests.Hence,toefficientlysupportmultipledatatypes,anintegratedfilesystemwillneedtosupportboththeclient-pullandtheserver-pusharchitectures.PlacementTechniquesDuetothelargestoragespaceandbandwidthrequirementsofcontinuousmedia,integratedfilesystemswillusediskarraysforstoringdata.Toeffectivelyutilizethearraybandwidth,thefileservermustinterleave(i.e.,stripe)eachfileacrossdisksinthearray.Placementoffilesonsuchstripedarraysisgovernedbytwoparameters:(1)thestripeunitsizey,whichisthemaximumamountoflogicallycontiguousdatastoredonasingledisk,and(2)thestripingpolicy,whichmapsstripeunitstodisklocations.Bothparametersdependsignificantlyonthecharacteristicsofthedatatype.Forinstance,thelargebandwidthrequirementsofreal-timecontinuousmediayieldsastripeunitsizethatisanorderofmagnitudelargerthanthatfortext.13Useofasinglestripeunitsizeforalldatatypeseitherdegradesperformancefordatatypeswithlargebandwidthrequirements(e.g.,continuousmedia),orcausesinternalfragmentationinsmallfiles(e.g.,textualfiles).Similarly,wehaveshownthatapolicythatstripeseachfileacrossalldisksinthearrayissuitablefortextualdata,butyieldssub-optimalperformanceforvariablebit-ratecontinuousmedia.13Moreover,sincecontinuousmediafilescanhaveamulti-resolutionnature(e.g,MPEG-2encodedvideo),thestripingpolicyoptimizestheplacementofsuchfilesbystoringblocksbelongingtodifferentresolutionsadjacenttoeachotherondisk.14Suchcontiguousplacementofblockssubstantiallyreducesseekandrotationallatenciesincurredduringplayback.Nosuchoptimizationsarenecessaryfor“singleresolution”textualfiles.Sinceplacementtechniquesfordifferentdatatypesdiffersignificantly,toenhancesystemthroughput,anintegratedfilesystemshouldsupportmultipledatatypespecificstripeunitsizesandstripingpolicies.MetadatastructuresTypicallytextualfilesareaccessedasasequenceofbytes.Forcontinuousmediafiles,however,thelogicalunitofaccessisavideoframeoranaudiosample.Toaccessafileasasequenceoflogicalunits,anapplicationmusteithermaintainlogicalunitindices,ordynamicallygeneratethisinformationatrun-time.Developingsuchapplicationswouldbesimplifiedifthefilesystemweretosupportlogicalunitaccesstofiles.Suchsupportisalsorequiredtoimplementtheserver-pusharchitecturesincethefilesystemneedslogicalunitsizes(e.g.,framesizes)todeterminetheamountofdatathatmustbeaccessedonbehalfofclients.Consequently,anintegratedfilesystemmustmaintainmetadatastructuresthatallowbothlogicalunitandbytelevelaccesstofiles,therebyenablinganydatatypespecificstructuretobeassignedtofiles.FailureRecoveryTechniquesSincediskarraysarehighlysusceptibletodiskfailures,afileservermustemployfailurerecoverytechniquestoprovideunin-terruptedservicetoclientseveninthepresenceoffailures.15Diskfailurerecoveryinvolvestwotasks:on-linereconstruction,whichinvolvesrecoveringthedatastoredonthefaileddiskinresponsetoanexplicitrequestforthatdata;andrebuild,whichinvolvescreatinganexactreplicaofthefaileddiskonareplacementdisk.Conventionaltextualfilesystemsusemirroringorparity-basedtechniquesforexacton-linereconstructionofdatablocksstoredonthefaileddisk.15,16Somecontinuousmediaapplicationswithstringentqualityrequirementsalsorequiresuchexacton-linereconstructionoflostdata.However,formanycontinuousmediaapplications,approximateon-linereconstructionoflostdatamaybesufficient.Fortheseapplications,severalon-linefailurerecoverytechniquesthatexploitthespatialandtemporalredundanciespresentwithincontinuousmediadatatoreconstructareasonableapproximationofthedatastoredonthefaileddiskhavebeendeveloped.17UnlikemirroringandyAstripeunitisalsoreferredtoasamediablock.Weusethesetermsinterchangeablyinthispaper FileServerInterfaceStorageMetaDataDatatypeManagerManagerVideoAudioTextSpecificmodulemodulemoduleLayerFaultToleranceLayerResourceManagerDatatypeDiskBufferIndependentPRTQARTQBEQPRTQARTQBEQPRTQARTQBEQSubsystemSubsystemLayerScheduledQScheduledQScheduledQDisksServiceManagerServiceManagerServiceManager(a)TwolevelarchitectureofSymphony(b)DiskSubsystemArchitectureFigure3.ArchitectureofSymphonyparitybasedtechniques,thesetechniquesdonotrequireanyadditionaldatatobeaccessedforfailurerecovery,andtherebysignificantlyreducethefailurerecoveryoverheads.Hence,toenhancesystemutilization,anintegratedfilesystemmustsupportmultipledatatypespecificfailurerecoverytechniques.CachingTechniquesAcachingpolicysuchasLRUimprovesresponsetimesfortextrequests.ItiswellknownthatLRUperformspoorlyforsequentialdataaccesses,10andhence,isinappropriateforcontinuousmedia.Policiesthataretailoredforsequentialdataaccesses,suchasIntervalCaching,18aremoredesirableforcontinuousmedia,butareunsuitablefortextualdata.Sincenosinglepolicyissuitableforalldatatypes,anintegratedfilesystemwillneedtosupportmultipledatatypespecificcachingpolicies.Fromtheabovediscussion,weconcludethattoefficientlysupportdifferentdatatypes,anintegratedfilesystemshouldenablethecoexistenceofmultipledatatypespecifictechniques.Inwhatfollows,wedescribethedesignandimplementationofSymphony–aphysicallyintegratedfilesystemthatmeetstheserequirements.4.ARCHITECTUREOFSYMPHONYTheneedforsupportingmultipledatatypespecifictechniquesinafilesystemisakeychallenge.Symphonyaddressesthisbyemployingatwolayerarchitecture.ThelowerlayerofSymphony(datatypeindependentlayer)implementsasetofdatatypeindependentmechanismsthatprovidecorefilesystemfunctionality.Theupperlayer(datatypespecificlayer)consistsofasetofmodules,oneperdatatype,whichusethemechanismsprovidedbythelowerlayertoimplementdatatypespecifictechniquesforplacement,failurerecovery,caching,etc.Thelayeralsoexportsafileserverinterfacecontainingmethodsforreading,writing,andmanipulatingfiles.Figure3(a)depictsthisarchitecture.Suchatwolayerarchitectureprovidescleanseparationofthedatatypeindependentmechanismsfromthedatatypespecificpolicies.Thisallowsthefilesystemdesignertoeasilyaddmodulesfornewdatatypes,ormodifytechniquesimplementedinexistingmodules.Wenowdescribeeachoftheselayersindetail.5.THEDATATYPEINDEPENDENTLAYERThedatatypeindependentlayerofSymphonymultiplexesstoragespace,diskbandwidth,andbufferspaceamongdifferentdatatypes.Itconsistsofthreecomponents:thedisksubsystem,thebuffersubsystem,andtheresourcemanager(seeFigure3(a)).5.1.TheDiskSubsystemThedisksubsystemofSymphonyisresponsibleforefficientlymultiplexingstoragespaceanddiskbandwidthamongdifferentdatatypes.Itconsistsoffourcomponents(seeFigure3(b)):(1)aservicemanagerthatsupportsmechanismsforefficientschedulingofbest-effort,aperiodicreal-time,andperiodicreal-timerequests;(2)astoragemanagerthatsupportsmechanismsforallocationanddeallocationofblocksofdifferentsizes,aswellastechniquesforcontrollingtheirplacementonadiskarray;(3)afaulttolerancelayerthatenablesmultipledatatypespecificfailurerecoverytechniques;and(4)ametadatamanagerthatenablesdatatypespecificstructuretobeassignedtofiles.Thekeyfeaturesandthealgorithmsusedtoimplementthesecomponentsaredescribedbelow. 5.1.1.ServiceManagerThemainobjectiveofthiscomponentistosupportmultipleserviceclasses,andmeetthequalityofservicerequirementsofrequestswithineachclass.Theservicemanagersupportsthreeserviceclasses:best-effort,periodicreal-timeandaperiodicreal-time;andtwoservicearchitectures:client-pullandserver-push.Requestsinthebest-effortclassdonothaveanydeadlines,butdesirelowaverageresponsetimes.Periodicandaperiodicreal-timerequestshavedeadlinesthatmustbemetbytheservicemanager.Best-effortandaperiodicreal-timerequestscanarriveatarbitraryinstantsandareservicedusingtheclient-pullarchitecture.Periodicreal-timerequests,ontheotherhand,areservicedintermsofperiodicroundsusingtheserver-pusharchitecture.Requestsinthisclassarriveatthebeginningofeachroundandmustbeservicedbytheendoftheround(i.e.,allrequestshavetheendoftheroundastheirdeadline).Tomeettherequirementsofrequestsineachclass,theservicemanageremploysaQoS-awarediskschedulingalgorithm.19Thediskschedulingalgorithmexploitsthecharacteristicsofrequestswithineachclasswhileschedulingrequestsandalignstheserviceprovidedwithrequestneeds.Toillustrate,sincemeetingrequestdeadlinesismoreimportanttoreal-timeclientsthanresponsetimes,thediskschedulerdelaystheschedulingofreal-timerequestsuntiltheirdeadlinesandusestheavailableslacktoservicebest-effortrequests.Thus,bygivingprioritytobest-effortrequestswheneverpossible,thediskschedulingalgorithmprovideslowresponsetimetobest-effortrequestswithoutviolatingdeadlinesofreal-timerequests.Toisolateserviceclassesfromeachother,thediskschedulerassignsfractions,,and(++=1)totheperiodicreal-time,aperiodic123123real-time,andbest-effortserviceclasses,respectively,zandallocatesdiskbandwidthtoclassesinproportiontothesefractions.Thatis,thediskschedulerensuresthatatleastRdurationwithineachroundisspentinservicingrequestsofclassi,whereiRdenotesthedurationofaround.Ifallserviceclasseshavependingrequests,theneachclassisallocatedR.However,ifisomeserviceclasshasnooutstandingrequests,thentheunusedportionofitsallocationisredistributedtootherclasses.Toimplementthisdiskschedulingalgorithm,theservicemanagermaintainsfourqueuesperdisk:threependingqueues,oneforeachserviceclass,andascheduledqueue(seeFigure3(b)).Theoverallschedulingproceedsintermsofperiodicrounds.Newlyarrivingrequestsarequeuedupintheappropriatependingqueue.Pendingrequestsareeventuallymovedtothescheduledqueue,wheretheyareguaranteedtobeservicedbytheendofthecurrentround.Tomovearequestfromthependingqueuetothescheduledqueue,thediskschedulermust:(1)determinetheinsertpositioninthescheduledqueue,and(2)determineiftheserviceclasstowhichtherequestbelongshassufficientunusedallocationtoinsertthisrequest.Theinsertpositioninthescheduledqueueisdeterminedbasedonthefollowingprinciples:20Pendingperiodicandaperiodicreal-timerequestsareinsertedintothescheduledqueueinSCAN-EDForder.Thatis,real-timerequestsaremaintainedinthescheduledqueueinorderofincreasingdeadlines,andrequestswithsamedeadlinesaremaintainedinCSCANorder.Notethat,sinceallperiodicreal-timerequestshavethesamedeadline(endoftheround),theycanbeefficientlyservicedinCSCANorder.Abest-effortrequestisinsertedintothescheduledqueueatthefirstpositionwheretheavailableslackisatleastequaltothetimerequiredtoservicetherequest;asequenceofbest-effortrequestsisalwaysmaintainedinCSCANorder.Suchastrategyensuresthatbest-effortrequestsaregivenpriorityoverreal-timerequestswheneverpossible.Thus,atanyinstant,thescheduledqueueconsistsofasequenceofCSCANschedules,whereeachschedulecontainseitherreal-timerequestswiththesamedeadline,orbest-effortrequests.ApendingrequestisinsertedintothescheduledqueueonlyifitsserviceclasshasnotusedupitsallocateddurationofR.Topreciselyformulatethiscriteria,letUdenotethetimeusedupbyrequestsofclassiinthecurrentround,andletIiidenotethetimeforwhichthediskwasidle(i.e.,theschedulequeuewasempty)inthecurrentround.Todetermineifapendingrequestcanbeinsertedintothescheduledqueue,thediskschedulerfirstestimatestheservicetimeoftherequest(definedasthesumoftheseektime,rotationallatency,andtransfertime).LetusassumethattherequestistobeinsertedbetweenrequestsAandBinthescheduledqueue,andletdenotetheservicetimeoftherequest.Observethat,insertingtherequestbetweenAandBwillcausetheservicetimeofrequestBtochange(sincetheservicetimeofBwasbasedonseekandrotationallatencyfrom00AtoB,whichisnolongertrueaftertheinsertion).LetbethecurrentservicetimeofrequestB,andletbeitsservicetimeiftherequestisinserted.Then,thediskschedulerinsertsthependingrequestintothescheduledqueueonlyif:(1)newtheclasscontainingthependingrequesthassufficientunusedallocation,and(2)thetotalusedallocationdoesnotexceedtheroundduration.Thatis,U+(R,I)(1)iiand3X00U++(,)R,I(2)jnewj=1zThefractionsassociatedwithaserviceclassarespeciedatlesystemstartuptime.Theservicemanagercanalsomonitortheworkloadfromeachclassandadaptsthesefractionsaccordingly;techniquesfordoingsoarebeyondthescopeofthispaper. Afterinsertingtherequest,UisincrementedasU=U+.SincetheinsertioncausestheservicetimeofBtochange,theiii00usedallocationofitsclassisupdatedasU=U+(,).jjnewOnceaserviceclassusesupitsallocateddurationofR,thenoutstandingrequestsinitspendingqueueareservicedonlyiifsomeotherclassdoesnotuseupitsallocation.Thatis,ifthediskbecomesidle(thescheduledqueueisempty)andaserviceclassthathasusedupitsallocationhasoutstandingrequests,thenthediskschedulerinsertstheserequestsintothescheduledqueueoneatatime.RequestsareinsertedoneatatimesuchthatthediskschedulercanrevertbacktoservicingrequestsbasedonEquations(1)and(2)assoonasanewrequestbelongingtoaclasswithunusedallocationarrives.5.1.2.TheStorageManagerThemainobjectiveofthestoragemanageristoenablethecoexistenceofmultipleplacementpoliciesinthedatatypespecificlayer.Toachievethisobjective,thestoragemanagersupportsmultipleblocksizesandallowscontrolovertheirplacementonthediskarray.Toallocateblocksofdifferentsizes,thestoragemanagerrequirestheminimumblocksize(alsoreferredtoasthebaseblocksize)andthemaximumblocksizetobespecifiedatfilesystemcreationtime.Theseparametersdefinethesmallestandthelargestunitsofallocationsupportedbythestoragemanager.Thestoragemanagercanthenallocateanyblocksizewithinthisrange,providedthattherequestedblocksizeisamultipleofthebaseblocksize.Thestoragemanagerconstructseachrequestedblockbyallocatingasequenceofcontiguousbaseblocksondisk.Toallowcontrolovertheplacementofblocksonthearray,thestoragemanagerallowslocationhintstobespecifiedwitheachallocationrequest.Alocationhintconsistsofa(disknumber,disklocation)pairanddenotesthepreferredlocationforthatblock.Thestoragemanagerattemptstoallocateablockconformingtothespecifiedhint,butdoesnotguaranteeit.Ifunabletodoso,thestoragemanagerallocatesafreeblockthatisclosesttothespecifiedlocation.Ifthediskisfull,orifthestoragemanagerisunabletofindacontiguoussequenceofbaseblockstoconstructtherequestedblock,thentheallocationrequestfails.Theabilitytoallocateblocksofdifferentsizesandallowcontrolovertheirplacementhasthefollowingimplications:Byallowingalocationhinttobespecifiedwitheachallocationrequest,thestoragemanagerexposesthedetailsoftheunderlyingstoragemedium(i.e.,thepresenceofmultipledisks)totherestofthefilesystem.Thisisafundamentaldeparturefromconventionalfilesystemswhichusemechanismssuchaslogicalvolumestohidethepresenceofmultipledisksfromthefilesystem.Byprovidinganabstractionofasinglelargelogicaldisk,alogicalvolumemakesthefilesystemobliviousofthepresenceofmultipledisks.12Thisenablesfilesystemsbuiltforsinglediskstooperatewithoutanymodificationsonalogicalvolumecontainingmultipledisks.Thedisadvantage,however,isthatthefilesystemhasnocontrolovertheplacementofblocksondisks(sincetwoadjacentlogicalblockscouldbemappedbythevolumemanagertodifferentlocations,possiblyondifferentdisks).Incontrast,byexposingthepresenceofmultipledisks,thestoragemanagerallowsthedatatypespecificlayerprecisecontrolovertheplacementofblocks,albeitattheexpenseofhavingtoexplicitlymanagemultipledisks.Themechanismsprovidedbythestoragemanagerenableanyplacementpolicytobeimplementedinthedatatypespecificlayer.Forinstance,byappropriatelygeneratinglocationhints,aplacementpolicycanstripeafileacrossalldisksinthearray,oronlyasubsetofthedisks.Similarly,locationhintscanbeusedtoclusterblocksofafileondisks,therebyreducingseekandrotationallatencyoverheadsincurredinaccessingtheseblocks.Theplacementpolicycanalsotailortheblocksizeonaper-filebasis(dependingonthecharacteristicsofthedata)andmaximizediskthroughput.However,allowingalargenumberofblocksizestocoexistcanleadtofragmentation.Thestoragemanagerattemptstominimizetheeffectsoffragmentationbycoalescingadjacentfreeblockstoconstructlargerfreeblocks.21However,suchcoalescingdoesnotcompletelyeliminatefragmentationeffects,andhence,theflexibilityprovidedbythestoragemanagermustbeusedjudiciouslybyplacementpoliciesinthedatatypespecificlayer.Thiscanbeachievedbyrestrictingtheblocksizesusedbythesepoliciestoasmallsetofvalues.5.1.3.TheFaultToleranceLayerThemainobjectivesofthefaulttolerancelayeraretosupportdatatypespecificreconstructionofblocksintheeventofadiskfailure,andtorebuildfaileddisksontosparedisks.Toachievetheseobjectives,thefault-tolerancelayermaintainsparityinformationonthearray.Toenabledata-typespecificreconstruction,thefault-tolerancelayersupportstwomechanisms:(1)areliableread,inwhichparityinformationisusedtoreconstructblocksstoredonthefaileddisk,and(2)anunreliableread,inwhichparitybasedreconstructionisdisabled,therebyshiftingtheresponsibilityoffailurerecoverytotheclient.17Unlikereadrequests,paritycomputationcannotbedisabledwhilewritingblocks,sinceparityinformationisrequiredtorebuildfaileddisksontosparedisks. Disk0Disk1Disk2BaseblockParityblockDatablockofaparitygroupAnallocatedblockMaximum(containedcompletelyblocksizewithinadatablock)=Parity=DataFigure4.ParityplacementinthefaulttolerancelayerThekeychallengeindesigningthefault-tolerancelayeristoreconcilethepresenceofparityblockswithdatablocksofdifferentsizes.Thefault-tolerancelayerhidesthepresenceofparityblocksonthearraybyexportingasetoflogicaldisks,eachwithasmallercapacitythantheoriginaldisk.Thestoragemanagerthenconstructsablockbyallocatingasequenceofcontiguousbaseblocksonalogicaldisk.Tominimizeseekandrotationallatencyoverheads,werequirethatthissequencebestoredcontiguouslyonthephysicaldiskaswell.Sincethefault-tolerancelayeruniformlydistributesparityblocksacrossdisksinthearray(analogoustoRAID-516),theresultinginterleavingofparityanddatablockscancauseasequenceofcontiguousblocksonalogicaldisktobeseparatedbyintermediateparityblocks.Toavoidthisproblem,thefault-tolerancelayerusesaparityblocksizethatisequaltothemaximumblocksizethatcanbeallocatedbythestoragemanager(seeFigure4).Eachdatablockwithinaparitygroupnowcontainsasequenceofbaseblocks,allofwhicharecontiguousondisk.Byensuringthateachallocatedblockiscontainedwithinadatablockofaparitygroup,thestoragemanagercanensurethattheblockisstoredcontiguouslyondisk.5.1.4.TheMetaDataManagerThemetadatamanagerisresponsibleforallocatinganddeallocatingstructuresthatstoremetadatainformation,andallowsanydatatypespecificstructuretobeassignedtofiles.LikeintheUNIXfilesystem,22metadatastructuresareofafixedsizeandarestoredonareservedportionofthedisk.Eachmetadatastructurecontainsinformationsuchasthefileowner,filecreationtime,accessprotectioninformation,theblocksizeusedtostorethefile,thetypeofdatastoredinthefileandatwolevelindex.Leveloneoftheindexmapslogicalunits(e.g.,frames)tobyteoffsets,whereasleveltwomapsbyteoffsetstodiskblocklocations.Thisenablesafiletobeaccessedasasequenceoflogicalunits.Moreover,byusingonlythesecondleveloftheindex,bytelevelaccesscanalsobeprovidedtoclients.Thus,byappropriatelydefiningthelogicalunitofaccess,anydatatypespecificstructurecanbeassignedtoafile.Notethat,theinformationcontainedinmetadatastructuresisdatatypespecificinnature.Hence,themetadatamanagermerelyallocatesanddeallocatesmetadatastructures;theactualmetadataitselfiscreated,interpreted,andmaintainedbythedatatypespecificlayer.5.2.TheBufferSubsystemThemainobjectiveofthebuffersubsystemistoenablemultipledatatypespecificcachingpoliciestocoexist.Toachievethisobjective,thebuffersubsystempartitionsthecacheamongvariousdatatypesandallowseachcachingpolicytoindependentlymanageitspartition.Toenhanceutilization,thebuffersubsystemallowsthebufferspaceallocatedtocachepartitionstovarydynamicallydependingontheworkload.Toimplementsuchapolicy,thebuffersubsystemmaintainstwobufferpools:apoolofdeallocatedbuffers,andapoolofcachedbuffers.Thecachepoolisfurtherpartitionedamongvariouscachingpolicies.Bufferswithinacachepartitionaremaintainedbythecorrespondingcachingpolicyinalistorderedbyincreasingaccessprobabilities;thebufferthatisleastlikelytobeaccessedisstoredattheheadofthelist.Toillustrate,theLRUpolicymaintainstheleastrecentlyaccessedbufferattheheadofthelist,whiletheMRUpolicymaintainsthemostrecentlyaccessedbufferatthehead.Foreachbuffer,thecachingpolicyalsocomputesatimetoreaccess(TTR)metric,whichisanestimateofthenexttimeatwhichthebufferislikelytobeaccessed.23SincetheTTRvalueisinverselyproportionaltotheaccessprobability,thebufferattheheadofeachlisthasthemaximumTTRvalue,andTTRvaluesdecreasemonotonicallywithinalist.Onreceivingabufferallocationrequest,thebuffersubsystemfirstchecksiftherequestedblockiscached,andifso,returnstherequestedblock.Incaseofacachemiss,thebuffersubsystemallocatesabufferfromthepoolofdeallocatedbuffersandinsertsthisbufferintotheappropriatecachepartition.Thecachingpolicythatmanagesthepartitiondeterminesthepositionatwhichthebuffermustbeinsertedintheorderedlist. Wheneverthepoolofdeallocatedbuffersfallsbelowalowwatermark,buffersareevictedfromthecacheandreturnedtothedeallocatedpool.xThebuffersubsystemusesTTRvaluestodeterminewhichbuffersaretobeevictedfromthecache.Ateachstep,thebuffersubsystemcomparestheTTRvaluesofbuffersattheheadofeachlistandevictsthebufferwiththelargestTTRvalue.Ifthebufferisdirty,thenthedataiswrittenouttodiskbeforeeviction.Theprocessisrepeateduntilthenumberofbuffersinthedeallocatedpoolexceedsahighwatermark.5.3.TheResourceManagerThekeyobjectiveoftheresourcemanageristoreservesystemresources(i.e.,diskbandwidthandbufferspace)toprovideperformanceguaranteestoreal-timerequests.Toachievethisobjective,theresourcemanageruses:(1)aQoSnegotiationprotocolwhichallowsclientstospecifytheirresourcerequirements,and(2)admissioncontrolalgorithmsthatdeterminesifsufficientresourcesareavailabletomeettheQoSrequirementspecifiedbytheclient.TheQoSnegotiationprocessbetweentheclientandtheresourcemanagerusesatwophaseprotocol.Inthefirstphase,theclientspecifiesthedesiredQoSparameterstotheresourcemanager.TypicalQoSparametersspecifiedaretheamountofdataaccessedbyarequest,thedeadlineofarequestanddurationbetweensuccessiverequests.Dependingontheserviceclassoftheclient(i.e.,periodicreal-timeoraperiodicreal-time),theresourcemanagertheninvokestheappropriateadmissioncontrolalgorithmtodetermineifithassufficientdiskbandwidthandbufferspacetoservicetheclient.TheadmissioncontrolalgorithmreturnsasetofQoSparameters,whichindicatetheresourcesthatareavailabletoservicetheclientatthecurrenttime.Dependingontheavailableresources,theQoSparametersreturnedbytheadmissioncontrolalgorithmcanbelessthanorequaltotherequestedQoS.Theresourcemanagertentativelyreservestheseresourcesfortheclientandreturnstheresultsoftheadmissioncontrolalgorithmtotheclient.Inthesecondphase,dependingonwhethertheQoSparametersareacceptabletotheclient,iteitherconfirmsorrejectstheseparameters.Intheformercase,thetentativelyreservedresourcesarecommittedfortheclient.Inthelattercase,resourcesarefreedandthenegotiationprocessmustberestarted,eitherwithareducedQoSrequirement,oratalatertime.Iftheresourcemanagerdoesnotreceiveeitheraconfirmationorrejectionfromtheclientwithinaspecifiedtimeinterval,itreleasestheresourcesthatwerereservedfortheclient.Thispreventsmaliciousorcrashedclientsfromholdingupunusedresources.OnceQoSnegotiationiscomplete,theclientcanbeginreadingorwritingthefile.Thereservedresourcesarefreedwhentheclientclosesthefile.Dependinguponthenatureoftheguaranteesprovided,admissioncontrolalgorithmsproposedintheliteraturecanbeclassifiedaseitherdeterministicorstatistical.4,7,24Deterministicadmissioncontrolalgorithmsmakeworstcaseassumptionsaboutresourcesrequiredtoserviceaclientandprovidedeterministic(i.e.,hard)guaranteestoclients.Incontrast,statisticaladmissioncontrolalgorithmsuseprobabilisticestimatesaboutresourcerequirementsandprovideonlyprobabilisticguarantees.Thekeytradeoffbetweendeterministicandstatisticaladmissioncontrolalgorithmsisthatthelatterleadstobetterutilizationofresourcesthantheformerattheexpenseofweakerguarantees.Duetospaceconstraints,wedonotdiscussadmissioncontrolalgorithmsinthispaper.6.THEDATATYPESPECIFICLAYERThedatatypespecificlayerconsistsofasetofmodulesthatusethemechanismsprovidedbythedatatypeindependentlayertoimplementpoliciesoptimizedforaparticulardatatypes.Thelayeralsoexportsafileserverinterfacecontainingmethodsforreading,writing,andmanipulatingfiles.25Eachmoduleimplementsadatatypespecificversionofthesemethods,therebyenablingapplicationstocreateandmanipulatefilesofthatdatatype.Inwhatfollows,wedescribedatatypespecificmodulesfortwodatatypes,namelyvideoandtext.6.1.TheVideoModuleThevideomoduleimplementspoliciesforplacement,retrieval,metadatamanagement,andcachingofvideodata.Beforedescribingthesepolicies,letusfirstexaminethestructureofavideofile.Digitizationofvideoyieldsasequenceofframes.Sincethesizeofaframeisquitelarge(atypicalframeis300KBinsize),digitizedvideodataisusuallycompressedpriortostorage.Compressedvideodatacanbemulti-resolutioninnature,andhence,eachvideofilecancontainoneormoresub-streams.Forinstance,anMPEG-1encodedvideofilealwayscontainsasinglesub-stream,whileMPEG-2encodedfilescancontainmultiplesub-streams.26Dependingonthedesiredresolution,onlyasubsetofofthesesub-streamsneedtoberetrievedduringvideoplayback;allsub-streamsmustberetrievedforplaybackatthehighestresolution.Thevideomodulesupportsvideofilescompressedusingavarietyofcompressionalgorithms.Thisispossiblesincethevideomoduledoesnotmakeanycompression-specificassumptionsaboutthestructureofvideofiles.Eachfileisallowedtocontainanynumberofsub-streams,andeachframeisallowedbearbitrarilypartitionedamongthesesub-streams.Hence,axAnalternativeistoevictcachedbuersonlyondemand,therebyeliminatingtheneedforthedeallocatedpool.However,thiscanslowdownthebuerallocationroutine,sincethebuertobeevictedmaybedirtyandwouldrequireadiskwritebeforetheeviction.Maintainingasmallpoolofdeallocatedbuersenablesfastbuerallocationwithoutanysignicantreductioninthecachehitratio. sub-streamcancontainallthedatafromaparticularframe,afractionofthedata,ornodatafromthatframe.Suchafilestructureisgeneralandencompassesfilesproducedbymostcommonlyusedcompressionalgorithms.Assumingthisstructureforavideofile,wenowdescribeplacement,retrieval,metadatamanagementandcachingpoliciesforvideodata.6.1.1.PlacementPolicyPlacementofvideofilesondiskarraysisgovernedbytwoparameters:theblocksizeandthestripingpolicy.Thevideomodulesupportsbothfixed-sizeblocks(eachofwhichcontainsafixednumberofbytes)andvariable-sizeblocks(eachofwhichcontainsafixednumberofframes).Fixed-sizeblocksaremoresuitableforenvironmentswithfrequentwritesanddeletes,whereasvariable-sizeblocksincurlowerseekandrotationallatencyoverheadsandenableafileservertoemployloadbalancingpolicies.9Ineithercase,thespecificblocksizetobeusedcanbespecifiedbytheclientatfilecreationtime(adefaultvalueisusediftheblocksizeisunspecified).Thevideomodulethenusestheinterfacesexportedbythestoragemanagertoallocateblocksofthespecifiedsize.Whiletheblocksizeisknownaprioriforfixed-sizeblocks,itcanchangefromoneblocktoanotherforvariable-sizeblocks.Inthelattercase,thevideomoduledeterminesthetotalsizeofthenextfframeswithinasub-stream(assumingthateachvariable-sizeblockcontainsfframes),roundsitupwardstoamultipleofthebaseblocksize,andrequestsablockofthatsizefromthestoragemanager.Sincethesizeoffframesmaynotbeanexactmultipleofthebaseblocksize,topreventinternalfragmentation,thevideomodulestoressomedatafromthenextfframesintheunusedspaceinthecurrentvariable-sizeblock.Hence,accessingavariable-sizeblockcausesthisextradatatoberetrieved,whichisthencachedbythevideomoduletoservicefuturereadrequests.Toeffectivelyutilizethearraybandwidth,thevideomodulestripeseachsub-streamacrossdisksinthearray.Ifsub-streamsarestoredonthearrayintermsofvariable-sizeblocks,thenthemodulestripeseachsub-streamacrossallthedisksinthearray.Ontheotherhand,whensub-streamsarestoredonthearrayintermsoffixed-sizeblocks,thestripingpolicydependsonthearraysize.Forsmalldiskarrays,eachsub-streamisstripedacrossalldisksinthearray.However,suchapolicydegradesperformanceforlargediskarrays.Hence,largearrays(consistingoffewtensofdisksormore)arepartitionedintosub-arraysandeachfileisstripedacrossasinglesub-array.13Sincethestoragemanagerallowsthedisknumbertobespecifiedwiththehint,suchastripingpolicycanbeeasilyimplementedbygeneratingappropriatelocationhints.Thestripingpolicyalsooptimizestheplacementofmulti-resolutionvideofilesonthearray.Thisisachievedbystoringblocksofdifferentsub-streamsthatarelikelytobeaccessedtogetheradjacenttoeachotherondisk.14Suchcontiguousplacementsignificantlyreducesseekandrotationallatencyoverheadsincurredduringvideoplayback.Thesemulti-resolutionoptimizationscanbeeasilyimplementedbygeneratingappropriatelocationhints.6.1.2.RetrievalPolicyThevideomoduleusestheinterfaceprovidedbytheservicemanagertosupportbothperiodicreal-timeandaperiodicreal-timerequests.Periodicreal-timerequestsaresupportedinthesever-pushmode,whileaperiodicreal-timerequestsareservicedintheclient-pullmode.Sinceperiodicreal-timerequestsareservicedbythediskschedulerintermsofperiodicrounds,suchrequestsareissuedbythevideomoduleatthebeginningofeachround.Foreachperiodicreal-timeclient,thevideomodulegeneratesalistofblockstobereadorwrittenandinsertsthemintheperiodicreal-timequeueoftheservicemanageratthebeginningofaround.Incontrast,foraperiodicreal-timeclients,thevideomodulewaitsforanexplicitrequestfromtheclientbeforeretrievingdata.Suchrequestsarriveatarbitraryinstantsandareinsertedonarrivalintotheaperiodicreal-timequeueoftheservicemanager.6.1.3.MetadataManagementSymphonyallowsanydatatypespecificstructuretobeassignedtofiles.Sincethelogicalunitofaccessforvideoisaframe,thevideomoduleallowseachfiletobeaccessedasasequenceofframes.Eachfilecanalsobeaccessedasasequenceofbytestosupportapplicationsthatrequireabytestreaminterface.Toallowefficientrandomaccessatthebytelevelandtheframelevel,themodulemaintainsatwolevelindexstructure.Thefirstleveloftheindex,referredtoastheframemap,mapsframeoffsetstobyteoffsets,whilethesecondlevel,referredtoasthebytemap,mapsbyteoffsetstodiskblocklocations.Figure5illustratestheindexstructure.Whereasbothlevelsoftheindexareusedduringframe-levelaccess,onlythebytemapisusedduringbyte-levelaccess.Thetwolevelstructurepermitsefficientrandomaccessforfixed-sizeblocks.However,supportingrandomaccessforvariable-sizeblocksisnotstraightforward,sincevariableblocksizescomplicatethemappingfrombyteoffsetstoblocklocationsinthebytemap.Recallthat,eachvariable-sizeblockconsistsofasequenceofbaseblockswhichareoffixedsize.Hence,bymaintainingamappingfrombyteoffsetstoblocklocationsforbaseblocks(insteadofvariable-sizeblocks),itispossibletosupportefficientrandomaccessforvariable-sizeblocksaswell.Thetradeoffthoughistheincreasedstoragespacerequirementforthebytemap. FramemapBytemapFrame#forsub−stream2ByteoffsetBlocklocationFrame#−>ByteoffsetBlock#−>BlocklocationFigure5.Theindexstructureforthevideoinode.Assumingthatavideolecontainsnsub-streams,boththeframemapandthebytemapcontainasequenceofn-tuples.Eachtupleintheframemaprepresentsaframe,andththeieldofatupledenotesthelocation(i.e.,byteoset)ofthatframeinsub-streami.Eachtupleinthebytethmaprepresentsablock,andtheieldofatupledenotesthelocationoftheblockinsub-streami.6.1.4.CachingPolicySincevideoaccessesaresequential,cachingpoliciessuchasLRUareineffectiveforvideofiles.10Recently,theIntervalCachingpolicywasproposedforcachingvideoblocks.Thepolicycachestheintervalbetweentwoclientsaccessingthesamefile,therebyservingrequestsofthetrailingclientfromthecache.Givenafixedamountofbufferspace,thepolicymaximizesthenumberofcachedintervals(andhence,utilization)bycachingintervalsinincreasingorderofsizes.18ThevideomoduleusestheIntervalCachingpolicytocachevideoblocks.Blocksofafilethatareaccessedbyasingleclientarenevercached(i.e.,theyarereturnedtothedeallocatedpoolafteruse).Whenasecondclientstartsaccessingafile,thenthevideomodulebeginscachingblocksbeingaccessedbythefirstclientinitscachepartition.Thetrailingclientmustaccesstheinitialportionofthefilefromdisk(sincethoseblocksweren'tcached).Allsubsequentaccesses,however,areservicedfromthecache.6.2.TheTextModuleThepoliciesimplementedbythetextmoduleareverysimilartothoseemployedbyconventionalUNIXfilesystems.27,22Forinstance,thetextmodulesupportsonlybest-effortrequests,allofwhichareservicedintheclient-pullmode.Theplacementpolicyemployedbythetextmodulesupportsonlyfixed-sizeblocks,andstripessuccessiveblocksofafileontoconsecutivedisksinaround-robinmanner.Thetextmodulesupportsonlyabyte-levelaccesstoeachfile.Todoso,itmaintainsabytemapforeachtextfile,whichissimilartotheUNIXinode.Finally,likeinUNIX,thetextmoduleusesanLRUpolicytocachetextblocks.7.EXPERIMENTALEVALUATIONOFTHESYMPHONYPROTOTYPEWehaveimplementedaprototypeofSymphonybasedonthedesignproposedinthispaper.TheprototypeimplementationofSymphonyrunsasasinglemulti-threadedprocessinuserspaceandaccessesdisksasrawdevices.Wehaveusedtheprototypeto:(1)demonstratetheefficacyofthediskschedulingalgorithmemployedbySymphony;and(2)measuretheperformanceoftextandvideoclientsinthefault-freestateandinthepresenceofdiskfailures.Thetest-bedforourexperimentsconsistsofaclusterofSunworkstationsconnectedbyanATMnetwork.TheSymphonyprototyperunsonadual-processorUltraSparc(Model2700)thathas128MBofRAMandrunsSolaris2.5.1.Thestoragemediumusedfortheserverconsistsoffour2.1GBSeagateBarracudadisks(ModelST12450W)connectedtotheUltraSparcviaafastwideSCSIinterface.SymphonyapplicationprogramsrunonaclusteroffourSparc-20andSparc-5workstations,allofwhichrunSolaris2.5.Allmachinesareconnectedtoa155Mb/sForeATMswitch(ModelASX-200)usingATMadaptercards.Inwhatfollows,wedescribeourexperimentsandanalyzeourresults.7.1.PerformanceofTextandVideoClientsRecallfromSection5.1.1that,thediskschedulingalgorithmdelaystheservicingofreal-timerequestsuntiltheirdeadlinesandusestheavailableslacktoservicebest-effortrequests.Bygivingprioritytotextrequestswheneversufficientslackisavailable,thediskschedulerprovideslowresponsetimestotextrequests.Todemonstratethisbehavior,wecompiledtwoversionsoftheprototype,onewhichusedtheschedulingalgorithmdescribedinSection5.1.1andtheotherwhichusedtheCSCANdiskschedulingalgorithm.Inbothcases,wepopulatedtheserverwithalargenumberoftextandvideofiles.Eachtextfilewas64KB Numberofdisks=4(b)Numberofdisks=4100400Symphonydiskscheduler10clients90CSCANdiskscheduling15clients35020clients80300702506020050150Textresponsetime(ms)Busytimeperround(msec)401003050024681012146080100120140160180200220240260NumbervideoclientsBlocksize(KB)(a)(b)Figure6.(a)Responsetimesseenbytextrequests.Byexploitingthesemanticsofthedatatypes,theSymphonydiskschedulerisabletoprovidebetterresponsetimestotextrequeststhanCSCAN.(b)Performanceofvideoclientsinsizeandwasstripedusingablocksizeof4KB.Eachvideofilewas6.5MBinsizeandcontained1000MPEG-1compressedframes,stripedusingablocksizeof64KB.{Theplaybackratefortheeachvideofilewas30frames/sandtheaveragebitratewas1.5Mb/s.Weassignedfractionsof=0:6;=0:05,and=0:35totheperiodicreal-time,theaperiodicreal-time123andbest-effortclasses,respectively,inthediskscheduler.Thedurationofaroundwassetto1second.Forbothversionsoftheprototype,wevariedthenumberofvideoclientsandmeasuredtheresponsetimeseenbytextrequests.Eachvideoclientinourexperimentswasamodifiedversionofmpegplayandretrievedarandomlyselectedfileintheperiodicreal-timemode.Eachtextclientreadarandomlyselectedtextfileinsequentialorderusing8KBrequests.Figure6(a)plotstheaverageresponsetimefora8KBrequestobservedinthetwocases.ThefigureshowsthattheSymphonydiskschedulerprovidesbetterresponsetimestotextrequeststhanCSCAN.Thisisbecause,CSCANservicesrequestsintheorderoftheirdiskcylindernumbers.Sinceitinterleavestextandvideorequestsbasedonthisordering,theresponsetimesseenbytextrequestsincreaseswithincreaseinnumberofvideorequests.Incontrast,theSymphonydiskscheduleralwaysgivesprioritytotextrequestsregardlessofthenumberofvideorequests(providedsufficientslackisavailable).Moreover,unusedallocationoftheperiodicreal-timeclassisreassignedtothebest-effortclass.Thisresultsinbetterresponsetimestotextrequests,andtheresponsetimedegradesonlyslightlywithincreaseinnumberofvideoclients.Todemonstratethattheimprovementintheresponsetimefortextrequestsdidnotcomeattheexpenseofdeadlineviolationsforvideorequests,werepeatedtheaboveexperimentbyvaryingtheblocksizeusedforeachvideofile,andmeasuredtheservicetimeofdisks(i.e.,thedurationforwhichadiskwasbusyineachround).Figure6(b)depictstheservicetimeofadiskfordifferentnumberofvideoclientsanddifferentblocksizes.Itshowsthattheservicetimeofthediskiswithin600ms,whichisthedurationofeachroundallocatedtoperiodicreal-timerequests(since=0:6andtherounddurationis1s).Hence,the1diskschedulerisabletomeetthereal-timerequirementofvideoclients.7.2.PerformanceinthepresenceofdiskfailureThefault-tolerancelayerallowsmultiplefaulttolerancepoliciestocoexistwithinthefilesystem.Specifically,itallowsclientstoenableordisableparity-basedreconstruction(usingreliableorunreliablereads)forrecoveringdata.Sinceentireparitygroupmustberetrievedtoreconstructablockrequestedfromthefaileddisk,paritybasedreconstructionimposesalarge(100%)overheadontheserver15;nosuchoverheadisimposedwhenparity-basedreconstructionisdisabled.Todemonstratethisfact,weconfiguredtheprototypetoassumethatoneofthedisksinthearrayhadfailed.Wevariedthenumberofvideoclientsandmeasuredtheloadontheserver(intermsoftheservicetimeofadisk)withparity-basedreconstructionenabled.Next,werepeatedtheexperimentwithparity-basedreconstructiondisabled.Asexpected,theservicetimeofdiskwithparity-basedreconstructionenabledwastwiceofthatwithparity-basedreconstructiondisabled.Thus,disablingparity-basedreconstructionsignificantlyreducestheloadontheserver.Thetradeoffthoughisthatthisoptionrequiressophisticatedclientsthatcanexploitredundanciesinvideodatatoapproximatelyreconstructlostdata.Also,approximatereconstructioncausesadegradationinimagequality.However,studieshaveshownthatthedegradedqualityiswithinhumanperceptualtolerancesformanycommonlyusedcompressionalgorithms(e.g.,JPEGandMPEG).178.RELATEDWORKSeveraltechniquesforthestorageandretrievalofcontinuousmediathathavebeenproposedintheliterature.2{6,28Symphonybuildsuponthesetechniquesandintegratesthemintoageneralpurposefilesystem.Forinstance,theintervalcachingpolicy{Thedataforthevideoleswasobtainedbydigitizingseveraltelevisionsitcoms,newscastsandsportsevents. 4disks,Singlediskfailure,64KBblocksize180Paritybasedrecovery160Parityrecoverydisabled1401201008060Busytimeperround(msec)4020011.522.533.544.55NumberofclientsFigure7.Reliablereadsversusunreliablereads.Thegureshowsthatdisablingparity-basedreconstructionsignicantlylowerstheloadontheserver.employedbythevideomodulewasproposedbyDanet.al.18Theefficacyofusingvariable-sizeblocksforstoringvideofileshasbeendemonstratedbyseveralstudies.3,29Techniquesforapproximatereconstructionofimageandvideodata(e.g.,JPEGandMPEG)havebeenproposedbyVinet.al.17andDanskinet.al.30AdmissioncontrolalgorithmshavebeendiscussedbyRanganet.al.4andVinet.al.7Severalrecentandongoingresearcheffortshavefocussedonbuildingintegratedfilesystems.Forinstance,Fellini31andCMFS2arefilesystemsthatcanhandlebothreal-timecontinuousmediadataandnon-realtimetextualdata.Botharesinglediskfilesystemsanddonotemploymulti-diskoptimizationssuchasstriping.Similarly,MMFSisasinglediskfilesystemthataddscontinuousmediasupporttoaFreeBSD-basedfilesystem.32TheTigerSharkfilesystemfromIBMandXFSfromSGIareresultsofcommercialeffortstobuildintegratedfilesystems.33,12ThesefilesystemscomeclosesttoSymphonyintermsofthefeaturesoffered.Forinstance,thesefilesystemssupportvariable-sizeblocks(referredtoasextents),guaranteedrateI/O,etc.However,theydonotsupportfeaturessuchasaQoS-awarediskscheduler,datatypespecificplacement,failurerecovery,andcachingpolicies.Moreover,theystaticallypartitionthestoragespaceavailableonthediskarrayusinglogicalvolumes.Logicalvolumescaneitherspanamutuallyexclusivesetofdisks,orsharethesamesetofdisksbystaticallypartitioningthespaceoneachdisk.Intheformercase,bothstoragespaceanddiskbandwidthgetstaticallypartitionedamonglogicalvolumes,whileinthelattercaseonlythestoragespaceisstaticallypartitionedandthediskbandwidthisdynamicallysharedbylogicalvolumes.Incontrast,Symphonyisaphysicallyintegratedfilesystem,inwhichbothstoragespaceanddiskbandwidtharedynamicallysharedamongdatatypes.Finally,thelogicaldiskabstraction34providesaninterfacethatallowsmultiplefilesystemstocoexistonasinglestoragedevice.LogicaldisksprovidefunctionalitiessimilartothoseprovidedbythedatatypeindependentlayerofSymphony,suchasmultipleblocksizes,locationhints,etc.However,akeydifferencebetweenlogicaldisksandSymphonyisthattheformerdoesnotdifferentiatebetweenrequesttypes,andconsequentlyprovidesonlyabest-effortservicemodel.Incontrast,Symphonyemploysmultipleserviceclassesthatenableittoefficientlysupportreal-timecontinuousmediarequestsaswellasnon-realtimerequests.9.CONCLUDINGREMARKSInthispaper,wediscussedvariousarchitecturalconsiderationsindesigninganintegratedmultimediafilesystemandtheirtradeoffs.Wearguedthat,toefficientlysupportstorageandretrievalofheterogeneousdatatypes,integratedfilesystemsshouldenabletheco-existenceofmultipledatatypespecificpolicies.WethenpresentedthedesignofSymphony–anintegratedmultimediafilesystemthatachievesthisobjective.ThearchitectureofSymphonyconsistsoftwolayers.Thelowerlayerprovidesasetofdatatypeindependentmechanismsthatimplementcorefilesystemfunctionality.Theupperlayerusesthesemechanismstoimplementdatatypespecificpolicies,andexportsafileserverinterfacecontainingmethodsforreading,writing,andmanipulatingfiles.SomeofthenovelfeaturesofSymphonyinclude:supportformultipleserviceclasses;aQoS-awarediskschedulingalgorithm;andsupportfordatatypespecificplacement,failurerecovery,metadatamanagement,andcachingtechniques.Aspartoffuturework,weplantoaddsupportfornewdatatypessuchasaudioandmulti-resolutionimages.Wealsoplantoaddsupportforeditoperationsandimplementfeaturessuchascopy-on-writethatallowfastcopyingoflargemultimediafiles.Weexpectthatourtwo-layerarchitecturewillsimplifythetaskofaddingthesefeaturestoSymphony. ACKNOWLEDGMENTSRenuTewaricontributedtothedesignandimplementationofthebuffersubsystem.WethankMikeDahlinandRenuTewarifortheircommentsonearlierdraftsofthepaper.REFERENCES1.M.K.McKusick,W.N.Joy,S.J.Leffler,andR.S.Fabry,“AfastfilesystemforUNIX,”ACMTransactionsonComputerSystems2(3),pp.181–197,August1984.2.D.Anderson,Y.Osawa,andR.Govindan,“Afilesystemforcontinuousmedia,”ACMTransactionsonComputerSystems10,pp.311–337,Nov.1992.3.P.W.Jardetzky,NetworkFileServerDesignforContinuousMedia.PhDthesis,UniversityofCambridge,August1992.4.P.V.RanganandH.Vin,“Designingfilesystemsfordigitalvideoandaudio,”inProceedingsofthe13thSymposiumonOperatingSystemsPrinciples(SOSP'91),OperatingSystemsReview,Vol.25,No.5,pp.81–94,Oct.1991.5.F.Tobagi,J.Pang,R.Baird,andM.Gang,“StreamingRAID:Adiskstoragesystemforvideoandaudiofiles,”inProceedingsofACMMultimedia'93,Anaheim,CA,pp.393–400,August1993.6.M.Vernick,C.Venkatramini,andT.Chiueh,“AdventuresinbuildingtheStonyBrookvideoserver,”inProceedingsofACMMultime-dia'96,1996.7.H.M.Vin,P.Goyal,A.Goyal,andA.Goyal,“Astatisticaladmissioncontrolalgorithmformultimediaservers,”inProceedingsoftheACMMultimedia'94,SanFrancisco,pp.33–40,October1994.8.S.S.Rao,H.M.Vin,andA.Tarafdar,“Comparativeevaluationofserver-pushandclient-pullarchitecturesformultimediaservers,”inProceedingsofNOSSDAV'96,pp.45–48,April1996.9.H.Vin,S.Rao,andP.Goyal,“Optimizingtheplacementofmultimediaobjectsondiskarrays,”inProceedingsoftheSecondIEEEInternationalConferenceonMultimediaComputingandSystems,Washington,D.C.,pp.158–165,May1995.10.P.Cao,ApplicationControlledFileCachingandPrefetching.PhDthesis,PrincetonUniversity,1996.11.R.H.Pattersonandet.al.,“Informedprefetchingandcaching,”inProceedingsofthe15thACMSymposiumonOperatingSystemsPrinciples,December1995.12.M.HoltonandR.Das,“XFS:Anextgenerationjournalled64-bitfilesystemwithguaranteedratei/o,”tech.rep.,SiliconGraphics,Inc,Availableonlineashttp://www.sgi.com/Technology/xfs-whitepaper.html.13.P.J.ShenoyandH.M.Vin,“Efficientstripingtechniquesformultimediafileservers,”inProceedingsofNOSSDAV'97,St.Loius,MO,pp.25–36,May1997.14.P.J.ShenoyandH.M.Vin,“Efficientsupportforscanoperationsinvideoservers,”Tech.Rep.TR96-35,DepartmentofComputerSciences,Univ.ofTexasatAustin,December1996.15.P.M.Chen,E.K.Lee,G.A.Gibson,R.H.Katz,andD.A.Patterson,“Raid:High-performance,reliablesecondarystorage,”ACMComputingSurveys,pp.145–185,June1994.16.D.Patterson,G.Gibson,andR.Katz,“Acaseforredundantarrayofinexpensivedisks(RAID),”ACMSIGMOD'88,pp.109–116,June1988.17.H.M.Vin,P.J.Shenoy,andS.Rao,“Efficientfailurerecoveryinmulti-diskmultimediaservers,”inProceedingsofthe25thInternationalSymposiumonFaultTolerantComputingSystems,Pasadena,CA,pp.12–21,June1995.18.A.DanandD.Sitaram,“Ageneralizedintervalcachingpolicyformixedinteractiveandlongvideoworkloads,”inProceedingsofMultimediaComputingandNetworking(MMCN)Conference,pp.344–351,1996.19.P.J.ShenoyandH.M.Vin,“Cello:Adiskschedulingframeworkfornextgenerationoperatingsystems,”tech.rep.,DepartmentofComputerSciences,Univ.ofTexasatAustin,1997.http://www.cs.utexas.edu/users/dmcl.20.A.N.ReddyandJ.Wyllie,“DiskschedulinginmultimediaI/Osystem,”inProceedingsofACMMultimedia'93,Anaheim,CA,pp.225–234,August1993.21.D.E.Knuth,TheArtofComputerProgrammingVolume1:FundamentalAlgorithms,AddisonWesley,1973.22.S.J.Leffler,M.K.McKusick,M.J.Karels,andJ.S.Quartermann,TheDesignandImplementationofthe4.3BSDUnixOperatingSystem,AddisonWesley,1989.23.R.Tewari,H.M.Vin,A.Dan,andD.Sitaram,“Cachinginbandwidthandspaceconstrainedhierarchicalhyper-mediaservers,”Tech.Rep.TR96-30,DepartmentofComputerSciences,Univ.ofTexasatAustin,December1996.24.H.M.Vin,A.Goyal,andP.Goyal,“Algorithmsfordesigninglarge-scalemultimediaservers,”ComputerCommunications18,pp.192–203,March1995.25.P.J.Shenoy,P.Goyal,S.Rao,andH.M.Vin,“DesignandimplementationofSymphony:Anintegratedmultimediafilesystem,”Tech.Rep.TR97-09,Dept.ofComputerSciences,Univ,ofTexasatAustin,March1997.26.InternationalOrganisationforStandardisation,InformationTechnology-GenericCodingofMovingPicturesandAssociatedAudioSystems:Systems,VideoandAudio,InternationalStandard(MPEG2),ISO/IEC13818,,November1994.27.M.J.Bach,TheDesignoftheUnixOperatingSystem,PrenticeHall,1986. 28.M.Buddhikot,G.Parulkar,andJ.Cox,“Designofalargescalemultimediastorageserver,”JournalofComputerNetworksandISDNSystems,pp.504—524,Dec1994.29.E.ChangandA.Zakhor,“Scalablevideoplacementonparalleldiskarrays,”inProceedingsofIS&T/SPIEInternationalSymposiumonElectronicImaging:ScienceandTechnology,SanJose,February1994.30.J.M.Danskin,G.M.Davies,andX.Song,“Fastlossyinternetimagetransmission,”inProceedingsoftheThirdACMConferenceonMultimedia,SanFrancisco,California,pp.321–332,November1995.31.C.Martin,P.S.Narayan,B.Ozden,R.Rastogi,andA.Silberschatz,“TheFellinimultimediastorageserver,”MultimediaInformationStorageandManagement,EditorS.M.Chung,KluwerAcademicPublishers,1996.32.T.Niranjan,T.Chiueh,andG.Schloss,“Implementationandevaluationofamultimediafilesystem,”inProceedingsofICMCS'97,Ottawa,Canada,1997.33.R.HaskinandF.Schmuck,“TheTigerSharkfilesystem,”ProceedingsofCOMPCON,Spring1996.34.W.Jonge,M.F.Kaashoek,andW.C.Hsieh,“Thelogicaldisk:Anewapproachtoimprovingfilesystems,”inProceedingsoftheFourteenthSymposiumonOperatingSystemsPrinciples,1993.
此文档下载收益归作者所有