Data Mining Lecture 7 Cluster Analysis Cluster Validity an.pdf


Size: 879.59 KB

Pages: 29

Date: 2020-03-27


Lecture 7: Cluster Analysis: Cluster Validity and Advanced Topics
Intelligent Data Engineering, 2010

Cluster Validity
For supervised classification we have a variety of measures to evaluate how good our model is
- Accuracy, precision, recall
For cluster analysis, the analogous question is how to evaluate the "goodness" of the resulting clusters?
But "clusters are in the eye of the beholder"!
Then why do we want to evaluate them?

Clusters Found in Random Data
[Figure: four scatter plots of the same points, x and y axes from 0 to 1; panels: Random Points, DBSCAN, K-means, Complete Link]

Different Aspects of Cluster Validation
Determining the clustering tendency of a set of data, i.e., distinguishing whether non-random structure actually exists in the data.
Comparing the results of a cluster analysis to externally known results, e.g., to externally given class labels.
Evaluating how well the results of a cluster analysis fit the data without reference to external information.
- Use only the data
Comparing the results of two different sets of cluster analyses to determine which is better.
- same data + same algorithm + different parameters
- same data + different algorithm
- different data

Measures of Cluster Validity
Numerical measures that are applied to judge various aspects of cluster validity are classified into the following three types.
External Index: Used to measure the extent to which cluster labels match externally supplied class labels.
- E.g., Entropy
Internal Index: Used to measure the goodness of a clustering structure without respect to external information.
- E.g., Sum of Squared Error (SSE)
Relative Index: Used to compare two different clusterings or clusters.

Measuring Cluster Validity Via Correlation
Two matrices
- Similarity Matrix
- Ideal Similarity Matrix: One row and one column for each data point. An entry is 1 if the associated pair of points belongs to the same cluster. An entry is 0 if the associated pair of points belongs to different clusters.
Compute the correlation between the two matrices
- Since the matrices are symmetric, only the correlation between n(n-1)/2 entries needs to be calculated.
High correlation indicates that points that belong to the same cluster are clo…
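
The correlation measure described on this slide can be sketched in a few lines of Python. This is a minimal illustration and not code from the lecture: the function names (`pairwise_upper`, `pearson`, `validity_correlation`) and the toy data are hypothetical, and the conversion from Euclidean distance to similarity (min-max scaling, then subtracting from 1) is an assumption, since the slide does not say how the similarity matrix is built. Only the n(n-1)/2 entries above the diagonal are used, as the slide suggests.

```python
import math
from itertools import combinations


def pairwise_upper(points, labels):
    """Return (similarities, ideal) for the n(n-1)/2 pairs with i < j."""
    pairs = list(combinations(range(len(points)), 2))
    dists = [math.dist(points[i], points[j]) for i, j in pairs]
    d_min, d_max = min(dists), max(dists)
    span = (d_max - d_min) or 1.0  # avoid division by zero if all distances are equal
    # Assumed conversion: scale distances to [0, 1] and flip them into similarities.
    sims = [1.0 - (d - d_min) / span for d in dists]
    # Ideal similarity: 1 if both points share a cluster label, else 0.
    ideal = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]
    return sims, ideal


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def validity_correlation(points, labels):
    """Correlate the actual similarity entries with the ideal ones."""
    sims, ideal = pairwise_upper(points, labels)
    return pearson(sims, ideal)


# Toy data (hypothetical): two tight groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

r_good = validity_correlation(points, good_labels)
r_bad = validity_correlation(points, bad_labels)
print(f"good labelling: {r_good:.3f}, bad labelling: {r_bad:.3f}")
```

On this toy data the labelling that follows the two visible groups should score a much higher correlation than the mixed labelling. Had the sketch correlated raw distances instead of similarities, the sign would flip, so a strongly negative value would then indicate a good clustering.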
