Lecture 7: Cluster Analysis: Cluster Validity and Advanced Topics
Intelligent Data Engineering, 2010

Cluster Validity
For supervised classification we have a variety of measures to evaluate how good our model is
- Accuracy, precision, recall
For cluster analysis, the analogous question is how to evaluate the “goodness” of the resulting clusters.
But “clusters are in the eye of the beholder”! Then why do we want to evaluate them?

Clusters Found in Random Data
[Figure: four scatter plots on x and y axes from 0 to 1, with panels Random Points, DBSCAN, K-means, and Complete Link]

Different Aspects of Cluster Validation
- Determining the clustering tendency of a set of data, i.e., distinguishing whether non-random structure actually exists in the data.
- Comparing the results of a cluster analysis to externally known results, e.g., to externally given class labels.
- Evaluating how well the results of a cluster analysis fit the data without reference to external information.
  - Use only the data
- Comparing the results of two different sets of cluster analyses to determine which is better.
  - same data + same algorithm + different parameters
  - same data + different algorithm
  - different data

Measures of Cluster Validity
Numerical measures that are applied to judge various aspects of cluster validity are classified into the following three types.
- External Index: Used to measure the extent to which cluster labels match externally supplied class labels.
  - E.g., entropy.
- Internal Index: Used to measure the goodness of a clustering structure without respect to external information.
  - E.g., Sum of Squared Error (SSE)
- Relative Index: Used to compare two different clusterings or clusters.

Measuring Cluster Validity Via Correlation
Two matrices
- Similarity Matrix
- Ideal Similarity Matrix: one row and one column for each data point. An entry is 1 if the associated pair of points belong to the same cluster. An entry is 0 if the associated pair of points belong to different clusters.
Compute the correlation between the two matrices
- Since the matrices are symmetric, only the correlation between n(n-1)/2 entries needs to be calculated.
High correlation indicates that points that belong to the same cluster are close to each other.
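
The correlation measure can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code. It assumes Euclidean distance, turns distance into similarity with 1 - d / max(d) (the slides do not fix a formula), and uses Pearson correlation. The function names `pearson` and `cluster_correlation` and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant,
    # e.g. when every point carries the same cluster label.
    return cov / math.sqrt(var_x * var_y)


def cluster_correlation(points, labels):
    """Correlate a distance-based similarity with the ideal similarity.

    Both matrices are symmetric, so only the n(n-1)/2 pairs above the
    diagonal are compared.
    """
    pairs = list(combinations(range(len(points)), 2))
    dists = [math.dist(points[i], points[j]) for i, j in pairs]
    max_d = max(dists)
    # Assumption: similarity = 1 - d / max(d); other mappings are possible.
    similarity = [1.0 - d / max_d for d in dists]
    # Ideal similarity: 1 if the pair shares a cluster, 0 otherwise.
    ideal = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]
    return pearson(similarity, ideal)


# Toy data (hypothetical): two tight groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two groups
bad_labels = [0, 1, 0, 1, 0, 1]   # cuts across the groups

good_corr = cluster_correlation(points, good_labels)
bad_corr = cluster_correlation(points, bad_labels)
print(good_corr, bad_corr)
```

With labels that follow the two groups, the correlation should be close to 1. With labels that cut across the groups it should be much lower. If raw distances were used in place of similarities, the sign would flip, so a strongly negative value would indicate a good clustering.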
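
The two example indices named under Measures of Cluster Validity, SSE (internal) and entropy (external), can be sketched in the same way. This is an illustrative sketch under stated assumptions. SSE is taken as the sum of squared Euclidean distances from each point to its cluster centroid, and entropy as the cluster-size-weighted average of the class-label entropy within each cluster, in bits. The function names `sse` and `clustering_entropy` and the toy data are hypothetical.

```python
import math
from collections import Counter


def sse(points, labels):
    """Internal index: squared distance of every point to its cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        # Centroid = coordinate-wise mean of the cluster's points.
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        for p in members:
            total += sum((a - b) ** 2 for a, b in zip(p, centroid))
    return total


def clustering_entropy(cluster_labels, class_labels):
    """External index: size-weighted mean entropy (bits) of classes per cluster."""
    n = len(cluster_labels)
    by_cluster = {}
    for c, k in zip(cluster_labels, class_labels):
        by_cluster.setdefault(c, []).append(k)
    total = 0.0
    for members in by_cluster.values():
        size = len(members)
        counts = Counter(members)
        # H = sum p * log2(1/p) over the classes present in this cluster.
        h = sum((m / size) * math.log2(size / m) for m in counts.values())
        total += (size / n) * h
    return total


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
classes = ["a", "a", "b", "b"]   # externally supplied class labels
tight = [0, 0, 1, 1]             # clustering that follows the pairs
mixed = [0, 1, 0, 1]             # clustering that splits each pair

sse_tight = sse(points, tight)
sse_mixed = sse(points, mixed)
entropy_tight = clustering_entropy(tight, classes)
entropy_mixed = clustering_entropy(mixed, classes)
print(sse_tight, sse_mixed, entropy_tight, entropy_mixed)
```

Lower is better for both measures. SSE needs only the data and the cluster assignment, while entropy also needs the external class labels.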