Data Mining Techniques for Effective and Scalable Traffic Analysis

M. Baldi, E. Baralis, F. Risso
Dipartimento di Automatica e Informatica - Politecnico di Torino
Corso Duca degli Abruzzi, 24, 10129 Torino, Italy
{mario.baldi,elena.baralis,fulvio.risso}@polito.it

Abstract
This paper describes a novel approach to traffic analysis in high-speed networks based on data mining techniques. Here, data mining techniques are applied as a means to effectively process the significant amount of captured data. The paper provides a first evaluation of the proposed approach in terms of its ability to extract relevant information and its computational requirements. This evaluation is based on experiments run on a prototype implementation of the proposed approach.

Keywords
Traffic Analysis, Network Monitoring, Data Mining

1. Introduction
One of the most critical issues in keeping a network under control is capturing and analyzing its traffic. The complexity of these tasks increases as networks become faster and faster. The major problems stem from the CPU power needed to process captured network traffic and from the storage requirements of historical data.

Often, traffic capturing and analysis goes through the steps depicted in Figure 1, all of which are critical when operating at high data rates. Some limited processing (e.g. associating each packet with its corresponding flow) is carried out in real time, during the capture session itself. The results can then be stored on disk and processed further with off-line tools, which do not suffer from the limitations of real-time processing.

Ad-hoc solutions based on advanced hardware (e.g. the network interface cards provided by Endace [16]) and the use of SMP workstations or even clusters can mitigate the problems related to on-line monitoring and analysis (the first steps in Figure 1). However, no straightforward solution exists to mitigate the critical issues of the subsequent steps. For instance, a 10 Gbps pipe carries more than 100 TBytes in the course of a day, which is a tremendous amount of data to store for subsequent processing. This results in two problems: on the one hand, the infrastructure needed to store such an amount of data is sophisticated and costly; on the other hand, locating relevant information within the saved data is computationally intensive and time consuming.
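The "more than 100 TBytes per day" figure for a 10 Gbps pipe can be checked with a back-of-the-envelope calculation. This sketch assumes that the link is saturated in one direction for a full 24 hours, that 1 Gbit = 10^9 bits, and that 1 TB = 10^12 bytes:

```latex
\[
10 \times 10^{9}\,\tfrac{\text{bit}}{\text{s}} \times 86\,400\,\tfrac{\text{s}}{\text{day}}
  = 8.64 \times 10^{14}\,\tfrac{\text{bit}}{\text{day}}
  = \frac{8.64 \times 10^{14}}{8}\,\tfrac{\text{byte}}{\text{day}}
  = 1.08 \times 10^{14}\,\tfrac{\text{byte}}{\text{day}}
  \approx 108\,\tfrac{\text{TB}}{\text{day}}
\]
```

With binary units (1 TiB = 2^40 bytes), the same volume is roughly 98 TiB. The "more than 100 TBytes" statement therefore holds in decimal units. A full-duplex link saturated in both directions would carry up to twice this amount.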
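The limited real-time processing mentioned in the introduction, associating each packet with its corresponding flow, can be illustrated with a minimal Python sketch. It assumes that a flow is identified by the unidirectional 5-tuple (source/destination address, source/destination port, IP protocol). The `Packet` record, `flow_key` and `aggregate_flows` are hypothetical names chosen for illustration, not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


# Hypothetical minimal packet record: only the header fields needed for
# flow association, plus the packet length in bytes.
@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int   # IP protocol number, e.g. 6 = TCP, 17 = UDP
    length: int  # packet size in bytes


FlowKey = Tuple[str, str, int, int, int]


def flow_key(pkt: Packet) -> FlowKey:
    """Unidirectional 5-tuple identifying the flow a packet belongs to."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)


def aggregate_flows(packets: Iterable[Packet]) -> Dict[FlowKey, Tuple[int, int]]:
    """Single pass over the packets, counting packets and bytes per flow.

    Each packet costs one hash-table update, which is the kind of light
    work that can keep up with the capture in real time.
    """
    counters = defaultdict(lambda: [0, 0])  # flow key -> [packets, bytes]
    for pkt in packets:
        entry = counters[flow_key(pkt)]
        entry[0] += 1
        entry[1] += pkt.length
    return {key: (pkts, nbytes) for key, (pkts, nbytes) in counters.items()}


if __name__ == "__main__":
    trace = [
        Packet("10.0.0.1", "10.0.0.2", 40000, 80, 6, 1500),
        Packet("10.0.0.1", "10.0.0.2", 40000, 80, 6, 40),
        Packet("10.0.0.3", "10.0.0.2", 53000, 53, 17, 80),
    ]
    for key, (pkts, nbytes) in aggregate_flows(trace).items():
        print(key, pkts, nbytes)
```

The per-flow summaries, rather than the raw packets, are what would then be stored for off-line analysis. To merge both directions of a conversation into one flow, `flow_key` could order the two (address, port) endpoints canonically before building the tuple.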