Kolmogorov–Smirnov Statistic¶
Description¶
The Kolmogorov–Smirnov (KS) statistic measures the maximal difference between two empirical cumulative distribution functions (CDFs). In single-cell analysis, it is often used as a two-sample test to quantify separation between two distributions (e.g. inter- vs intra-type cell distance distributions). Intuitively, \(D\) lies between 0 and 1 : a value near 0 implies the distributions largely overlap, while larger values indicate greater divergence. Pachter et al. use the two-sample KS statistic to quantify how well clusters or cell types separate in embedded vs ambient spaces (high \(D\) means less overlap of the two distance distributions).

« Illustration of the Kolmogorov–Smirnov statistic. The red line is a model CDF, the blue line is an empirical CDF, and the black arrow is the KS statistic. » Wikipedia
Formulas¶
Two-sample KS :¶
For two independent samples of sizes \(n\) and \(m\), with empirical CDFs \(F_{1,n}(x)\) and \(F_{2,m}(x)\), the KS statistic is
i.e., the maximum vertical distance between the two CDFs .
One-sample KS :¶
Given a sample CDF \(F_n(x)\) and a reference CDF \(F(x)\), the one-sample KS statistic is
measuring the largest difference between the sample CDF and the reference distribution .
Interpretation :¶
In both cases, \(\sup_x\) denotes the supremum (maximum) over all real \(x\).
The KS statistic ranges from 0 (identical distributions) up to 1 (completely non-overlapping), with higher \(D\) indicating stronger separation of the two distributions.