Coefficient of Determination \(R^2\)¶
Description¶
The coefficient of determination, denoted as \(R^2\), is a statistical measure used to assess the goodness of fit of a regression model.
It represents the proportion of the variance in the dependent variable that is predictable from the independent variables.
A value of \(R^2 = 1\) indicates a perfect fit, while \(R^2 = 0\) means the model explains none of the variability and does no better than always predicting the mean.
Although \(R^2\) usually lies between 0 and 1, it can be negative when the model fits the data worse than a horizontal line at the mean of the observed values.
In bioinformatics, \(R^2\) is often used to evaluate model performance in tasks like spatial transcriptomics or gene expression prediction.
Formulas¶
The standard formula for \(R^2\) is:
\[
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
\]
where:
- \(SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2\) is the residual sum of squares (the squared prediction errors),
- \(SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2\) is the total sum of squares (proportional to the variance of the observed values),
- \(y_i\) are the observed values,
- \(\hat{y}_i\) are the predicted values,
- \(\bar{y}\) is the mean of the observed values.
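The definitions above can be sketched in a few lines of plain Python, computing \(1 - SS_{\text{res}} / SS_{\text{tot}}\) directly. This is a minimal illustration, not a reference implementation: the function name `r_squared` and the sample values are assumptions made for the example, and the choice to raise an error when all observed values are equal (so that \(SS_{\text{tot}} = 0\)) is one possible convention among several.

```python
def r_squared(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    y_mean = sum(y_true) / len(y_true)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the mean.
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    if ss_tot == 0:
        # Assumed convention for this sketch: treat R^2 as undefined here.
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Illustrative values only (not taken from any dataset).
observed = [3.0, -0.5, 2.0, 7.0]
mean_value = sum(observed) / len(observed)

good_fit = r_squared(observed, [2.5, 0.0, 2.0, 8.0])          # close predictions
mean_fit = r_squared(observed, [mean_value] * len(observed))  # always predict the mean
bad_fit = r_squared([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])         # worse than the mean

print(round(good_fit, 4), mean_fit, bad_fit)  # prints 0.9486 0.0 -3.0
```

The three calls cover the cases described earlier: a close fit gives a value near 1, predicting the mean gives exactly 0, and predictions worse than the mean give a negative value.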
Sources¶
Miles, J. (2005). Encyclopedia of Statistics in Behavioral Science. Wiley.
Code¶
[scikit-learn: `r2_score`](https://scikit-learn.org/0.17/modules/generated/sklearn.metrics.r2_score.html)
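As a short usage sketch, `sklearn.metrics.r2_score` takes the observed values first and the predicted values second. The sample values below are illustrative assumptions, and the snippet assumes scikit-learn is installed.

```python
from sklearn.metrics import r2_score

# Illustrative values only: observed targets and model predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

score = r2_score(y_true, y_pred)

# Predictions worse than the mean of the observed values give a negative score.
negative_score = r2_score([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])

print(round(score, 4), round(negative_score, 4))
```

The argument order matters: swapping `y_true` and `y_pred` changes \(SS_{\text{tot}}\) and therefore the result.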