What is the coefficient of determination?
The coefficient of determination is a statistic (usually written as $R^2$) that tells you what proportion of the variation in a dependent variable can be explained by the independent variable(s) in a regression model.
What is the coefficient of determination?
- It is denoted by $R^2$ (read “R-squared”).
- It measures how well your regression model’s predictions match the actual data (a “goodness‑of‑fit” measure).
- Conceptually:
  - $R^2 = 0$: the model explains none of the variability in $Y$ (it predicts no better than the mean of $Y$).
  - $R^2 = 1$: the model perfectly explains all variability in $Y$.
In words: $R^2$ answers “how much of what’s going on in $Y$ can be accounted for by the predictors in my model?”
Intuition with a quick example
Imagine you regress exam scores (Y) on hours studied (X).
- Suppose you get $R^2 = 0.72$.
- Interpretation: 72% of the variation in exam scores across students is explained by the model’s linear relationship with hours studied; the remaining 28% is due to other factors or random noise.
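This is why many practitioners say “higher $R^2$ usually means a better fit,” though context matters: what counts as a good value varies by field, and on the data used to fit a least-squares model, adding more predictors never lowers $R^2$, even when the new predictors are pure noise.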
Basic formula (conceptual)
For a regression model, one common definition is:
$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$
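- $SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2$: residual sum of squares (unexplained variation), where $\hat{y}_i$ is the model’s prediction for observation $i$.
- $SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$: total sum of squares (total variation in $Y$ around its mean $\bar{y}$).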
So $SS_{\text{res}}/SS_{\text{tot}}$ is the fraction of variation left unexplained, and one minus that fraction is the explained fraction.
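To make the formula concrete, here is a small Python sketch of $R^2 = 1 - SS_{\text{res}}/SS_{\text{tot}}$ that takes observed values and a model’s predictions. The function name `r_squared` is our own choice, not from any library. The usage lines check the reference points: perfect predictions give 1, and always predicting the mean of $Y$ gives 0.

```python
from statistics import fmean

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Raises ZeroDivisionError if all observed values are equal (SS_tot = 0).
    """
    mean_obs = fmean(observed)
    # Residual sum of squares: variation the model leaves unexplained.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # Total sum of squares: variation of Y around its mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

y = [1, 2, 3]
print(r_squared(y, [1, 2, 3]))  # perfect predictions -> 1.0
print(r_squared(y, [2, 2, 2]))  # always predict the mean (2) -> 0.0
print(r_squared(y, [3, 2, 1]))  # worse than predicting the mean -> -3.0
```

The last line shows that the formula itself is not bounded below by 0: a model that predicts worse than the mean yields a negative value. The 0-to-1 range holds for a least-squares fit with an intercept, evaluated on the data it was fit to.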
In simple linear regression (one predictor plus an intercept, fit by least squares), $R^2$ is also the square of Pearson’s correlation coefficient $r$ between $X$ and $Y$, so $R^2 = r^2$.
How to talk about it in plain English
When writing or discussing results, you might phrase it like:
“The model has an $R^2$ of 0.60, meaning 60% of the variability in the outcome is explained by the predictors included in the regression.”
That’s the essence behind the phrase “coefficient of determination”: it quantifies how much the model determines (explains) about the dependent variable.
TL;DR:
The coefficient of determination ($R^2$) is the fraction of the variance in
your dependent variable that is explained by your regression model. For a
least-squares fit with an intercept it lies between 0 and 1, and higher values
indicate a closer fit to the data.