Average Jaccard Distance
Description
The Average Jaccard Distance (AJD) is a metric introduced to quantify the distortion caused by dimensionality reduction in single-cell RNA-seq (scRNA-seq) data. It measures how well local neighborhoods of cells are preserved after the data are projected into a lower-dimensional space. For each cell, the set of its \(k\) nearest neighbors is computed both in the original (high-dimensional) space and in the reduced space. The Jaccard distance is then calculated between these two sets, and the AJD is defined as the mean of these distances over all cells. A low AJD indicates that local neighborhoods are well preserved (low distortion), while a high AJD indicates that the dimensionality reduction has substantially changed the neighborhood structure.
Formulas
Jaccard Distance
Given two sets \(A\) and \(B\), the Jaccard distance is defined as:
\[ d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|} \]
where \(|A \cap B|\) is the number of elements in the intersection and \(|A \cup B|\) is the number of elements in the union.
- If \(A = B\), then \(d_J = 0\) (no distortion).
- If \(A \cap B = \emptyset\), then \(d_J = 1\) (complete distortion).
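The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `jaccard_distance` is hypothetical, and returning 0 when both sets are empty is an assumption, since the text does not cover that case.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical (distance 0).
        return 0.0
    return 1.0 - len(a & b) / len(union)


print(jaccard_distance({1, 2, 3}, {1, 2, 3}))  # 0.0, identical sets
print(jaccard_distance({1, 2}, {3, 4}))        # 1.0, disjoint sets
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5, two shared out of four total
```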
Average Jaccard Distance (AJD)
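Let \(N\) be the total number of cells in the dataset. For each cell \(i\), let \(A_i\) and \(B_i\) be the sets of its \(k\) nearest neighbors in the original and reduced space, respectively. The Average Jaccard Distance is given by:
\[ \mathrm{AJD} = \frac{1}{N} \sum_{i=1}^{N} d_J(A_i, B_i) = \frac{1}{N} \sum_{i=1}^{N} \left( 1 - \frac{|A_i \cap B_i|}{|A_i \cup B_i|} \right) \]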
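The whole computation can be sketched as follows, using only the Python standard library. Several choices here are assumptions not stated in the text: neighbors are found by brute-force Euclidean distance, a cell is excluded from its own neighbor set, ties are broken by cell index, and the function names `knn_indices` and `average_jaccard_distance` are hypothetical. For real scRNA-seq datasets a dedicated nearest-neighbor library would likely be preferable, since this version compares every pair of cells.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Assumptions: Euclidean distance, the point itself is excluded,
    and ties are broken by index.
    """
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def average_jaccard_distance(original, reduced, k):
    """Mean Jaccard distance between k-NN sets in the two spaces."""
    if len(original) != len(reduced):
        raise ValueError("both representations must contain the same cells")
    if not 1 <= k < len(original):
        raise ValueError("k must satisfy 1 <= k < number of cells")
    nn_orig = knn_indices(original, k)
    nn_red = knn_indices(reduced, k)
    total = 0.0
    for a, b in zip(nn_orig, nn_red):
        total += 1.0 - len(a & b) / len(a | b)
    return total / len(original)


# Toy usage: four "cells" on a line; the reduced layout swaps cells 1 and 2.
original = [(0.0,), (1.0,), (10.0,), (11.0,)]
reduced = [(0.0,), (10.0,), (1.0,), (11.0,)]
print(average_jaccard_distance(original, original, k=1))  # 0.0
print(average_jaccard_distance(original, reduced, k=1))   # 1.0
```

In the toy data, swapping the two middle cells changes every cell's single nearest neighbor, so every per-cell distance is 1 and the average is 1.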
Value Range and Interpretation
Since \(0 \leq d_J(A, B) \leq 1\), it follows that \(0 \leq \mathrm{AJD} \leq 1\).
- AJD \(= 0\) indicates that local neighborhoods are perfectly preserved (ideal embedding).
- AJD \(= 1\) indicates that all local neighborhoods have changed entirely (maximum distortion).
Intermediate values represent partial preservation of neighborhood structures.
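As a worked illustration of an intermediate value, assume that both neighbor sets of a cell contain exactly \(k\) elements and write \(m = |A_i \cap B_i|\) for the number of shared neighbors (the symbol \(m\) is introduced here for convenience). The union then has \(2k - m\) elements, so:
\[ d_J(A_i, B_i) = 1 - \frac{m}{2k - m} = \frac{2(k - m)}{2k - m} \]
For example, with \(k = 10\) and \(m = 5\), \(d_J = 1 - 5/15 = 2/3 \approx 0.67\). Under this assumption, a cell that keeps half of its neighbors contributes a distance noticeably above 0.5, so AJD values should not be read as the plain fraction of neighbors lost.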