A Trend Discovery System for Dynamic Web Content Mining

A. Méndez-Torreblanca (1,2), M. Montes-y-Gómez (1) and A. López-López (1)

(1) Instituto Nacional de Astrofísica, Óptica y Electrónica - INAOE. Luís Enrique Erro No 1, Tonantzintla, Puebla, 72840. México
amendez@cseg.inaoep.mx, {mmontesg, allopez}@inaoep.mx
(2) Instituto Tecnológico de Puebla - ITP, Av. Tecnológico No 420, Puebla, Pue., 72000. México.

Abstract. The rapid expansion of the web is causing the constant growth of information, leading to several problems such as an increased difficulty of extracting potentially useful knowledge. Web content mining confronts this problem gathering explicit information from different websites for its access and knowledge discovery. Its current methods focus on analyzing static websites and cannot deal with constantly changing websites, such as news sites. In this paper, we propose a method for mining online news sites. This method applies dynamic schemes for exploring these websites and extracting news reports, and uses domain independent statistical analysis for trend analysis. The overall method is an application of web mining that goes beyond straightforward news analysis, trying to understand current society interests and to measure the social importance of ongoing events.

Keywords: Web content mining, dynamic crawler, trend discovery, statistical analysis.

1 Introduction

The web is a medium for accessing a great variety of information stored in different parts of the world. The rapid expansion of the web is causing the constant growth of this information, leading to several problems: an increased difficulty of finding relevant information, extracting potentially useful knowledge and learning about consumers or individual users [1]. Web mining is an emerging research area focused on resolving these problems. Basically, web mining is concerned with “the use of data mining techniques to automatically discover and extract information from World Wide Web documents and services” [2]. It is categorized in three areas of interest: web usage mining, web structure mining and web content mining [1]. Web usage mining finds access patterns from websites. Web structure mining provides structural information about web documents and sites, and web content mining finds useful information from data in the web. Web content