Silhouette Clustering Metric Analysis¶
Description¶
The Silhouette is a method for interpretation and validation of consistency within data clusters. This technique provides a succinct graphical representation of how well each object has been classified. The silhouette value measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation). Values range from \(-1\) to \(+1\), where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. The metric is specialized for measuring cluster quality when clusters are convex-shaped and may not perform well if data clusters have irregular shapes or varying sizes.
A clustering with an average silhouette width greater than \(0.7\) is considered "strong", greater than \(0.5\) "reasonable" and greater than \(0.25\) "weak".
Formulas¶
For a data point \(i\) in cluster \(C_I\), we define:
Intra-cluster distance (cohesion)¶
\(a(i)\) represents the average distance between \(i\) and all other points in the same cluster.
where :
-
\(C_i\) is the number of points in the cluster
-
\(d(i,j)\) is the distance between points \(i\) and \(j\)
Inter-cluster distance (separation)¶
\(b(i)\) represents the minimum average distance from \(i\) to all points in any other cluster.
Silhouette Value¶
The silhouette value \(s(i)\) is defined by :
Sources¶
Applying Deep Learning algorithm to perform lung cells annotation, A. Collin