Gini coefficient
Description
The Gini coefficient is a statistical metric used to measure inequality within a distribution. Originally used in economics to measure income inequality, it has been adapted in bioinformatics to assess how uniformly a gene is expressed across multiple samples. A low Gini value indicates stable expression across all samples, while a high Gini value indicates that expression varies strongly between samples, making the gene less suitable for normalization.
Formulas
The Gini coefficient for a gene \(g\) across \(K\) samples is defined as:

\[
G_g = \frac{K+1}{K} - \frac{2 \sum_{i=1}^{K} (K+1-i)\, x_{g,i}}{K \sum_{i=1}^{K} x_{g,i}}
\]
where:
- \(K\) is the number of samples
- \(x_{g,i}\) is the expression value of gene \(g\) in sample \(i\)
- the values \(x_{g,i}\) must be sorted in ascending order (\(x_{g,1} \leq x_{g,2} \leq \dots \leq x_{g,K}\))
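The formula captures cumulative expression weighted by rank.
The first term \(\frac{K+1}{K}\) is the value that the second term reaches under perfect equality, which is also its maximum. The second term is the rank-weighted expression, normalized by the total expression \(\sum x_{g,i}\). Their difference therefore measures how far the gene deviates from that ideal: it is 0 when all samples have the same expression and grows as expression concentrates in fewer samples.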
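
The computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes the standard rank-weighted form of the Gini coefficient (the second term being twice the rank-weighted sum divided by \(K\) times the total expression), non-negative expression values, and a hypothetical function name `gini_coefficient` that does not come from any particular library.

```python
import math


def gini_coefficient(expression):
    """Gini coefficient of one gene's expression values across K samples.

    Assumes non-negative values. Returns NaN when the coefficient is
    undefined (no samples, or total expression equal to zero).
    """
    # The formula requires the values in ascending order.
    x = sorted(float(v) for v in expression)
    k = len(x)
    total = sum(x)
    if k == 0 or total == 0:
        return math.nan

    # Rank-weighted sum: the smallest value gets weight K, the largest weight 1.
    weighted = sum((k + 1 - i) * value for i, value in enumerate(x, start=1))

    # First term minus the normalized rank-weighted term.
    return (k + 1) / k - 2.0 * weighted / (k * total)


# Identical expression in every sample: perfectly stable gene.
stable = gini_coefficient([5.0, 5.0, 5.0, 5.0])      # 0.0

# All expression concentrated in a single sample out of four.
specific = gini_coefficient([0.0, 0.0, 0.0, 10.0])   # 0.75
```

With four samples, the second case gives \(\frac{K-1}{K} = 0.75\), the largest value this form of the coefficient can take for \(K = 4\). Sorting inside the function means callers do not have to pre-sort the samples.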