co-KNN AUC & co-KNN size¶
Description¶
The co-KNN AUC and co-KNN size metrics assess how well local neighborhood structures are preserved after dimensionality reduction.
They are inspired by co-ranking theory and extend traditional nearest-neighbor preservation metrics by comparing local structures between the original and embedded spaces.
-
co-KNN size measures the average number of shared neighbors between the high-dimensional and reduced representations. It captures local stability, reflecting to what extent individual neighborhoods are preserved.
-
co-KNN AUC evaluates global fidelity of neighborhood preservation by computing the area under the curve (AUC) of a co-ranking–based function across different neighborhood sizes. This function, \(QNN(k)\), quantifies how many of the top-\(k\) neighbors in the original space are also in the top-\(k\) in the reduced space, averaged over all data points.
Note: While these simplified metrics follow the intuition behind co-ranking matrices (see Zhang et al., 2021), they may differ slightly in formulation from the original definitions (e.g., QNX or QNN(k) curves).
Formulas¶
co-KNN size¶
This metric estimates how many of the \(k\) nearest neighbors are preserved between the high-dimensional space and the reduced space, averaged over all data points:
Where:
- \(N\): total number of data points
- To normalize this value into the range \([0, 1]\), we divide by \(k\)
- \(N_k^{\text{orig}}(i)\): the set of \(k\) nearest neighbors of point \(i\) in the original space
- \(N_k^{\text{embed}}(i)\): the set of \(k\) nearest neighbors of point \(i\) in the reduced space
Range: \([0, 1]\) — higher values indicate better local preservation.
co-KNN AUC¶
This metric is based on the co-ranking matrix \(Q\), which compares the ranks of neighbor relationships in the original and embedded spaces.
Let:
- \(Q(i,j)\) be the number of points ranked \(\leq i\) in the original space and \(\leq j\) in the reduced space.
- \(QNN(k) = \frac{1}{N k} \sum_{i=1}^{k} \sum_{j=1}^{k} Q(i,j)\), normalized to fall in \([0,1]\)
Then:
- This corresponds to the area under the \(QNN(k)\) curve, which summarizes neighbor preservation quality across different neighborhood sizes.
- A value close to 1 indicates excellent global fidelity of neighborhood preservation.
Sources¶
Wikipedia - k-nearest neighbors