what is coefficient of determination
The coefficient of determination is a statistic (usually written as $R^2$) that tells you what proportion of the variation in a dependent variable can be explained by the independent variable(s) in a regression model.
What is the coefficient of determination?
- It is denoted by $R^2$ (read "R-squared").
- It measures how well your regression model's predictions match the actual data (a "goodness-of-fit" measure).
- Conceptually:
  - $R^2 = 0$: the model explains none of the variability in $Y$.
  - $R^2 = 1$: the model perfectly explains all variability in $Y$.
In words: $R^2$ answers "how much of what's going on in $Y$ can be accounted for by the predictors in my model?"
Intuition with a quick example
Imagine you regress exam scores (Y) on hours studied (X).
- Suppose you get $R^2 = 0.72$.
- Interpretation: 72% of the variation in exam scores across students is explained by differences in hours studied; the remaining 28% is due to other factors or random noise.
This is why many practitioners say "higher $R^2$ usually means a better fit," though context always matters.
Basic formula (conceptual)
For a regression model, one common definition is:
$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$
- $SS_{\text{res}}$: residual sum of squares (unexplained variation).
- $SS_{\text{tot}}$: total sum of squares (total variation in $Y$ around its mean).
So $SS_{\text{res}}/SS_{\text{tot}}$ is the fraction not explained, and 1 minus that fraction is the explained fraction.
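To make the arithmetic concrete, here is a worked instance with made-up numbers, chosen only so the result matches the 0.72 from the exam-score example. Assume a total sum of squares of 500 and a residual sum of squares of 140 (both hypothetical):

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{140}{500} = 1 - 0.28 = 0.72$$

The 0.28 is the unexplained share, and the 0.72 is the share of variation the model accounts for.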
In simple linear regression, $R^2$ is also the square of Pearson's correlation coefficient $r$, so $R^2 = r^2$.
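As a rough check of both the formula and the $R^2 = r^2$ identity, here is a minimal Python sketch using only the standard library. The data values and the helper names (`fit_line`, `r_squared`, `pearson_r`) are invented for illustration and do not come from any particular library; the fit is an ordinary least-squares line with an intercept.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least-squares line: y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total
    return 1.0 - ss_res / ss_tot

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and exam scores (y).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 70.0, 68.0, 80.0]

slope, intercept = fit_line(hours, scores)
predicted = [intercept + slope * h for h in hours]

r2 = r_squared(scores, predicted)
r = pearson_r(hours, scores)

print(f"R^2 from sums of squares: {r2:.4f}")
print(f"Pearson r squared:        {r ** 2:.4f}")
```

The two printed values should agree up to floating-point rounding. Passing the actual scores as the predictions gives an $R^2$ of 1, and predicting the mean score for every student gives 0, which matches the two conceptual endpoints.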
How to talk about it in plain English
When writing or discussing results, you might phrase it like:
"The model has an $R^2$ of 0.60, meaning 60% of the variability in the outcome is explained by the predictors included in the regression."
That's the essence behind the phrase "coefficient of determination": it quantifies how much the model determines (explains) about the dependent variable.
TL;DR:
The coefficient of determination ($R^2$) is a number, between 0 and 1 for an
ordinary least-squares fit with an intercept, that tells you what fraction of
the variance in your dependent variable is explained by your regression model;
higher values indicate a closer fit to the data.