Highly Variable Genes (HVG) overlap
Description
The Highly Variable Gene (HVG) overlap metric evaluates how well batch integration preserves biologically informative gene variability in single-cell RNA-seq data. It compares the sets of highly variable genes (HVGs) identified before and after integration. The score ranges from 0 to 1; a score close to 1 indicates that the genes identified as highly variable before integration are still identified as such afterwards, which suggests that the integration method retains key biological signal.
Formulas
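Let:

- \(X\) be the set of HVGs before integration
- \(Y\) be the set of HVGs after integration
- \(\left| X \cap Y \right|\) be the number of genes common to both sets
- \(\min(\left| X \right|, \left| Y \right|)\) be the size of the smaller set

The HVG overlap is defined as:

\[
\text{HVG overlap} = \frac{\left| X \cap Y \right|}{\min(\left| X \right|, \left| Y \right|)}
\]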
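Because the count of shared genes is divided by the size of the smaller set, the score is not penalized when the integration process changes the number of HVGs: it reaches 1 whenever the smaller set is fully contained in the larger one. The overall HVG score is computed as the mean of the per-batch HVG overlap scores.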
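As an illustration, the ratio of shared genes to the size of the smaller set, and the per-batch averaging, can be sketched in plain Python. This is a minimal sketch, not the benchmark's implementation: the function names `hvg_overlap` and `mean_hvg_overlap`, the input layout (a dictionary mapping each batch to its before/after HVG lists), the gene labels, and the choice to return 0 for an empty HVG set are all assumptions made for the example. How the HVGs themselves are selected is outside its scope.

```python
from statistics import mean


def hvg_overlap(hvg_before, hvg_after):
    """Return |X ∩ Y| / min(|X|, |Y|) for two collections of gene names."""
    x, y = set(hvg_before), set(hvg_after)
    smaller = min(len(x), len(y))
    if smaller == 0:
        # Assumption: an empty HVG set is treated as zero overlap
        # rather than raising a division-by-zero error.
        return 0.0
    return len(x & y) / smaller


def mean_hvg_overlap(per_batch_hvgs):
    """Average the overlap over batches.

    per_batch_hvgs maps a batch label to a (hvgs_before, hvgs_after) pair.
    """
    return mean(hvg_overlap(before, after)
                for before, after in per_batch_hvgs.values())


# Hypothetical gene sets for two batches.
example = {
    # After-set is fully contained in the before-set: 3 / min(4, 3) = 1.0
    "batch_1": (["A", "B", "C", "D"], ["A", "B", "C"]),
    # Two shared genes out of four: 2 / min(4, 4) = 0.5
    "batch_2": (["A", "B", "C", "D"], ["A", "B", "E", "F"]),
}

score = mean_hvg_overlap(example)
print(score)  # 0.75
```

In the first batch the integration reduced the number of HVGs from four to three, yet the score is still 1.0 because every remaining HVG was already in the original set. This is the behaviour that dividing by the smaller set is meant to give.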
Sources
Open Problems in Single-Cell Analysis