Cluster Analysis R的用法 ("Cluster Analysis: Usage in R"). Shared by a member on Tiantian Wenku (天天文库); listed date 2019-07-06.
Cluster Analysis

The TwoStep Cluster Analysis procedure is an exploratory tool designed to reveal natural groupings (or clusters) within a data set that would otherwise not be apparent. The algorithm employed by this procedure has several desirable features that differentiate it from traditional clustering techniques:

· The ability to create clusters based on both categorical and continuous variables.
· Automatic selection of the number of clusters.
· The ability to analyze large data files efficiently.

To handle both categorical and continuous variables, the TwoStep Cluster Analysis procedure uses a likelihood distance measure that assumes the variables in the cluster model are independent. Further, each continuous variable is assumed to have a normal (Gaussian) distribution, and each categorical variable is assumed to have a multinomial distribution. Empirical internal testing indicates that the procedure is fairly robust to violations of both the independence assumption and the distributional assumptions, but you should still check how well these assumptions are met.

The two steps of the TwoStep Cluster Analysis procedure's algorithm can be summarized as follows:

Step 1. The procedure begins by constructing a Cluster Features (CF) Tree. It places the first case at the root of the tree in a leaf node that contains variable information about that case. Each successive case is then either added to an existing node or used to form a new node. The choice depends on the case's similarity to existing nodes, with the distance measure serving as the similarity criterion. A node that contains multiple cases holds a summary of the variable information for those cases. Thus, the CF tree provides a capsule summary of the data file.

Step 2. The leaf nodes of the CF tree are then grouped using an agglomerative clustering algorithm, which can produce a range of solutions. To determine which number of clusters is "best", each of these cluster solutions is compared using Schwarz's Bayesian Criterion (BIC) or the Akaike Information Criterion (AIC) as the clustering criterion.

Car manufacturers need to be able to appraise the current market to determine the likely competition for their vehicles. If cars can be grouped according to the available data, this task can be largely automated using cluster analysis. S…
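The likelihood distance described above can be written out directly from its stated assumptions: independent variables, Gaussian continuous variables and multinomial categorical variables. The form below is a sketch, and the notation is introduced here rather than taken from the source:

· N_v is the number of cases in cluster v.
· K^A and K^B are the numbers of continuous and categorical variables.
· \hat\sigma^2_{vk} is the within-cluster variance of continuous variable k.
· N_{vkl} is the number of cases in v that take category l of categorical variable k, which has L_k categories.
· \hat\sigma^2_k is the variance of variable k over the whole file. It is an assumed regularizer that keeps single-case clusters finite.

The distance between clusters A and B is then the log-likelihood lost by merging them:

```latex
d(A,B) = \xi_A + \xi_B - \xi_{A \cup B},
\qquad
\xi_v = -N_v \left( \sum_{k=1}^{K^A} \tfrac{1}{2} \log\!\left(\hat\sigma_k^2 + \hat\sigma_{vk}^2\right)
        + \sum_{k=1}^{K^B} \hat E_{vk} \right),
\qquad
\hat E_{vk} = -\sum_{l=1}^{L_k} \frac{N_{vkl}}{N_v} \log \frac{N_{vkl}}{N_v}
```

By concavity of the logarithm and of entropy, d(A, B) >= 0. It takes small values when A and B look alike, which is what lets it act as the similarity criterion in both steps.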
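Step 1 can be illustrated with a deliberately flattened Python sketch. It keeps only the leaf entries of a CF tree, as a plain list, so it omits the tree hierarchy, branching limits, node splitting and rebuilding that a real CF tree needs. The names `CF`, `distance`, `total_variances` and `pre_cluster`, and the `threshold` parameter, are hypothetical; they are not an SPSS or R API.

Each entry stores sufficient statistics: the case count, the sums and sums of squares of the continuous variables, and the category counts. This is what lets a node summarize many cases without keeping them. Distances use the Gaussian-plus-multinomial log-likelihood loss. The sketch makes two further assumptions: the whole-file variance of each continuous variable is added inside the logarithm, and every continuous variable actually varies.

```python
import math
from collections import Counter


class CF:
    """Hypothetical cluster-feature entry holding sufficient statistics only."""

    def __init__(self, n, ls, ss, counts):
        # n: case count; ls/ss: linear and squared sums per continuous variable;
        # counts: one Counter of category frequencies per categorical variable.
        self.n, self.ls, self.ss, self.counts = n, ls, ss, counts

    @classmethod
    def from_case(cls, cont, cat):
        return cls(1, list(cont), [x * x for x in cont], [Counter([c]) for c in cat])

    def merged(self, other):
        # Summaries combine by addition, so no raw cases are needed.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  [a + b for a, b in zip(self.ss, other.ss)],
                  [a + b for a, b in zip(self.counts, other.counts)])

    def variances(self):
        # Population variance from the sums; clamp tiny negative rounding error.
        return [max(sq / self.n - (lin / self.n) ** 2, 0.0)
                for lin, sq in zip(self.ls, self.ss)]

    def xi(self, total_var):
        # Gaussian term per continuous variable, entropy term per categorical one.
        cont = sum(0.5 * math.log(tv + v) for tv, v in zip(total_var, self.variances()))
        cat = sum(-sum((c / self.n) * math.log(c / self.n) for c in cnt.values())
                  for cnt in self.counts)
        return -self.n * (cont + cat)


def distance(a, b, total_var):
    """Log-likelihood lost by merging a and b (never negative, up to rounding)."""
    return a.xi(total_var) + b.xi(total_var) - a.merged(b).xi(total_var)


def total_variances(cases):
    # Whole-file variance of each continuous variable (assumes a prior pass).
    n = len(cases)
    out = []
    for k in range(len(cases[0][0])):
        vals = [cont[k] for cont, _ in cases]
        m = sum(vals) / n
        out.append(sum((v - m) ** 2 for v in vals) / n)
    return out


def pre_cluster(cases, threshold):
    """Flat stand-in for Step 1: absorb each case into the nearest leaf entry
    if its distance is within `threshold`, otherwise open a new leaf entry."""
    total_var = total_variances(cases)
    leaves = []
    for cont, cat in cases:
        entry = CF.from_case(cont, cat)
        if leaves:
            dists = [distance(leaf, entry, total_var) for leaf in leaves]
            best = min(range(len(leaves)), key=dists.__getitem__)
            if dists[best] <= threshold:
                leaves[best] = leaves[best].merged(entry)
                continue
        leaves.append(entry)
    return leaves


# Each case is (continuous values, categorical values).
cases = [((1.0,), ("x",)), ((1.1,), ("x",)), ((5.0,), ("y",)),
         ((5.2,), ("y",)), ((1.05,), ("x",))]
leaves = pre_cluster(cases, threshold=0.1)
print([leaf.n for leaf in leaves])  # [3, 2]
```

A larger `threshold` yields fewer, coarser leaf entries, and a smaller one yields more. Choosing it is left open here.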
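Step 2 and the choice of the number of clusters can be sketched the same way, here for continuous variables only to keep the example short. The helper `agglomerate` is hypothetical and starts from single cases as stand-ins for CF leaf entries. It repeatedly merges the pair of clusters whose union loses the least log-likelihood, scores every intermediate solution and keeps the lowest-scoring one.

With J clusters, K^A continuous variables and N cases, the sketch assumes m_J = 2 J K^A free parameters (a mean and a variance per variable per cluster). It then uses BIC = -2 Σ ξ_j + m_J log N and AIC = -2 Σ ξ_j + 2 m_J. Taking the plain minimum is an assumption of this sketch, not a claim about the procedure's exact selection rule.

```python
import math
from itertools import combinations


def _var(values):
    # Population (maximum-likelihood) variance.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def xi(cluster, total_var):
    """Gaussian log-likelihood term of one cluster (continuous variables only)."""
    return -len(cluster) * sum(
        0.5 * math.log(tv + _var([row[k] for row in cluster]))
        for k, tv in enumerate(total_var))


def score(clusters, total_var, n_total, criterion="bic"):
    """BIC or AIC of a cluster solution; lower is better."""
    m_j = len(clusters) * 2 * len(total_var)  # a mean and a variance per variable per cluster
    penalty = m_j * math.log(n_total) if criterion == "bic" else 2 * m_j
    return -2 * sum(xi(c, total_var) for c in clusters) + penalty


def agglomerate(rows, criterion="bic"):
    """Greedy agglomerative clustering scored at every step; returns the
    number of clusters and the clusters of the lowest-scoring solution."""
    n = len(rows)
    total_var = [_var([r[k] for r in rows]) for k in range(len(rows[0]))]
    clusters = [[r] for r in rows]  # singletons stand in for CF leaf entries
    best_value, best = score(clusters, total_var, n, criterion), list(clusters)

    def merge_loss(pair):
        # Log-likelihood lost by merging the two clusters indexed by `pair`.
        a, b = clusters[pair[0]], clusters[pair[1]]
        return xi(a, total_var) + xi(b, total_var) - xi(a + b, total_var)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2), key=merge_loss)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        value = score(clusters, total_var, n, criterion)
        if value < best_value:
            best_value, best = value, list(clusters)
    return len(best), best


rows = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
k, clusters = agglomerate(rows)
print(k)  # 2
```

The pairwise search over all cluster pairs is quadratic in the number of clusters per merge. That is tolerable only because Step 1 has already reduced a large file to a modest number of leaf entries.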