Data Mining: Introduction
Lecture Notes for Chapter 1
Introduction to Data Mining
by Tan, Steinbach, Kumar

© Tan, Steinbach, Kumar   Introduction to Data Mining   4/18/2004

Why Mine Data? Commercial Viewpoint

Lots of data is being collected and warehoused
  – Web data, e-commerce
  – purchases at department/grocery stores
  – Bank/Credit Card transactions
Computers have become cheaper and more powerful
Competitive Pressure is Strong
  – Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint

Data collected and stored at enormous speeds (GB/hour)
  – remote sensors on a satellite
  – telescopes scanning the skies
  – microarrays generating gene expression data
  – scientific simulations generating terabytes of data
Traditional techniques infeasible for raw data
Data mining may help scientists
  – in classifying and segmenting data
  – in Hypothesis Formation

Mining Large Data Sets - Motivation

There is often information “hidden” in the data that is not readily evident
Human analysts may take weeks to discover useful information
Much of the data is never analyzed at all

[Figure: The Data Gap. Total new disk (TB) since 1995 compared with the number of analysts, 1995 to 1999; vertical axis from 0 to 4,000,000.]
From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”

What is Data Mining?

Many Definitions
  – Non-trivial extraction of implicit, previously unknown and potentially useful information from data
  – Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

What is (not) Data Mining?

What is not Data Mining?
  – Look up phone number in phone directory
  – Query a Web search engine for information about “Amazon”

What is Data Mining?
  – Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)
  – Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)
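
The contrast between looking something up and mining can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a method from the lecture notes: the directory entry, the four "Amazon" snippets, the stopword list, the similarity threshold, and the function names (bag_of_words, cosine, group_by_context) are all hypothetical. It groups search-result snippets by the words that surround the query term, using bag-of-words cosine similarity and a simple greedy pass.

```python
import math
import re
from collections import Counter

# Not data mining: a direct lookup of a known key (hypothetical directory entry).
PHONE_DIRECTORY = {"Pat Example": "555-0100"}
number = PHONE_DIRECTORY["Pat Example"]

# Hypothetical search-result snippets for the query "Amazon" (made up for illustration).
SNIPPETS = [
    "Amazon rainforest covers much of the Amazon river basin in Brazil",
    "Amazon.com online store offers free shipping on books",
    "Deforestation threatens the rainforest along the Amazon river",
    "Order books from the Amazon.com online store with fast shipping",
]

# The query term and a few common words carry no context, so they are ignored.
STOPWORDS = {"amazon", "the", "of", "in", "on", "from", "with", "much", "along"}


def bag_of_words(text):
    """Count the lowercase context words of a snippet, skipping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_context(snippets, threshold=0.2):
    """Greedy grouping: put each snippet into the first group whose first
    member is similar enough, otherwise start a new group."""
    vectors = [bag_of_words(s) for s in snippets]
    groups = []  # each group is a list of snippet indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    for group in group_by_context(SNIPPETS):
        print([SNIPPETS[i] for i in group])
```

On these four snippets the rainforest results end up in one group and the Amazon.com results in another, even though no category was given in advance. A real system would use a proper clustering algorithm and far more data; the greedy pass is chosen here only to keep the sketch short.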