Coefficient of Determination \(R^2\)

Description

The coefficient of determination, denoted \(R^2\), is a statistical measure of the goodness of fit of a regression model. It represents the proportion of the variance in the dependent variable that is predictable from the independent variables. A value of \(R^2 = 1\) indicates a perfect fit, while \(R^2 = 0\) means the model explains none of the variability, which is what a model that always predicts the mean of the observed values achieves.
Although \(R^2\) usually lies between 0 and 1, it can be negative when the model fits the data worse than a horizontal line at the mean of the observed values. In bioinformatics, \(R^2\) is often used to evaluate model performance in tasks such as spatial transcriptomics or gene expression prediction.

Formulas

The standard formula for \(R^2\) is:

\[ R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} \]

where:

  • \(SS_{\text{res}} = \sum (y_i - \hat{y}_i)^2\) is the residual sum of squares (prediction errors),

  • \(SS_{\text{tot}} = \sum (y_i - \bar{y})^2\) is the total sum of squares (proportional to the variance of the observed values),

  • \(y_i\) are the observed values,

  • \(\hat{y}_i\) are the predicted values,

  • \(\bar{y}\) is the mean of the observed values.
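The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `r_squared` and the sample data are hypothetical, and raising an error when \(SS_{\text{tot}} = 0\) is an assumption made here (libraries may handle that case differently).

```python
def r_squared(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")

    mean_y = sum(y_true) / len(y_true)

    # Residual sum of squares: squared prediction errors.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the observed mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)

    # Assumption of this sketch: treat constant observations as undefined.
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")

    return 1.0 - ss_res / ss_tot


# Hypothetical data chosen so the arithmetic is easy to check by hand.
observed = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(observed, observed)                 # predictions equal observations
mean_only = r_squared(observed, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
worse = r_squared(observed, [4.0, 3.0, 2.0, 1.0])       # reversed predictions

print(perfect, mean_only, worse)
```

With these values \(SS_{\text{tot}} = 5\). The three calls give \(SS_{\text{res}}\) of 0, 5 and 20 respectively, which illustrates the perfect-fit case, the mean-prediction baseline, and a negative \(R^2\) for a model that does worse than the mean.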

Sources

Wikipedia

Newcastle University

Miles, J. (2005). Encyclopedia of Statistics in Behavioral Science. Wiley.

OpenProblems

Code

[scikit-learn `r2_score`](https://scikit-learn.org/0.17/modules/generated/sklearn.metrics.r2_score.html)