Highly Variable Genes (HVG) overlap

Description

The Highly Variable Gene (HVG) Overlap metric evaluates how well batch integration preserves biologically informative gene variability in single-cell RNA-seq data. It compares the sets of highly variable genes (HVGs) identified before and after integration. A high overlap score indicates that the genes that were most variable before integration remain the most variable afterwards, which suggests the integration method retains key biological signal.

Formulas

Let:

  • \(X\) be the set of HVGs before integration

  • \(Y\) be the set of HVGs after integration

  • \(\left | X \cap Y \right |\) the number of genes common to both sets

  • \(\min(\left |X \right |,\left |Y \right |)\) the size of the smaller set

The HVG Overlap is defined as:

\[ \operatorname{overlap}(X,Y)=\frac{\left | X \cap Y \right |}{\min(\left |X \right |,\left |Y \right |)} \]

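This formulation emphasizes the preservation of informative genes, even when the integration process alters the number of HVGs: because the denominator is the size of the smaller set, the score lies between 0 and 1 and equals 1 whenever one set is fully contained in the other. The overall HVG score is computed as the mean of the per-batch HVG overlap scores.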
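The definition above can be illustrated with a minimal Python sketch. It is not the scIB implementation: the function names `hvg_overlap` and `hvg_score`, the gene labels, and the choice to raise an error on an empty HVG set are all assumptions made for this example, and the HVG sets are taken as already computed for each batch.

```python
from statistics import mean


def hvg_overlap(hvg_before, hvg_after):
    """Overlap of two HVG sets, normalized by the size of the smaller set."""
    x, y = set(hvg_before), set(hvg_after)
    smaller = min(len(x), len(y))
    if smaller == 0:
        # Assumption for this sketch: an empty HVG set is treated as an error.
        raise ValueError("Both HVG sets must be non-empty.")
    return len(x & y) / smaller


def hvg_score(per_batch_hvgs):
    """Mean of per-batch overlaps.

    per_batch_hvgs maps a batch label to a (hvgs_before, hvgs_after) pair.
    """
    return mean(
        hvg_overlap(before, after) for before, after in per_batch_hvgs.values()
    )


# Hypothetical gene labels, for illustration only.
batches = {
    # 3 shared genes, both sets of size 4 -> 3 / 4 = 0.75
    "batch_1": (["A", "B", "C", "D"], ["B", "C", "D", "E"]),
    # 2 shared genes, smaller set of size 2 -> 2 / 2 = 1.0
    "batch_2": (["A", "B", "C", "D"], ["A", "B"]),
}

score = hvg_score(batches)  # (0.75 + 1.0) / 2 = 0.875
print(score)
```

In the second batch the post-integration set is a subset of the pre-integration set, so the overlap is 1 even though the number of HVGs has halved, which is the behaviour the normalization by the smaller set is meant to give.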

Sources

Luecken, M.D., Büttner, M., Chaichoompu, K. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022)

Open Problems in Single-Cell Analysis

Code

SCIB