
what is coefficient of determination

The coefficient of determination is a statistic (usually written as $R^2$) that tells you what proportion of the variation in a dependent variable can be explained by the independent variable(s) in a regression model.

What is the coefficient of determination?

  • It is denoted by $R^2$ (read “R-squared”).
  • It measures how well your regression model’s predictions match the actual data (a “goodness‑of‑fit” measure).
  • Conceptually:
    • $R^2 = 0$: the model explains none of the variability in $Y$.
    • $R^2 = 1$: the model perfectly explains all variability in $Y$.

In words: $R^2$ answers “how much of what’s going on in $Y$ can be accounted for by the predictors in my model?”

Intuition with a quick example

Imagine you regress exam scores (Y) on hours studied (X).

  • Suppose you get $R^2 = 0.72$.
  • Interpretation: 72% of the variation in exam scores across students is explained by differences in hours studied; the remaining 28% is due to other factors or random noise.

This is why many practitioners say “higher $R^2$ usually means a better fit,” though context always matters.

Basic formula (conceptual)

For a regression model, one common definition is:

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$

  • $SS_{\text{res}}$: residual sum of squares (unexplained variation).
  • $SS_{\text{tot}}$: total sum of squares (total variation in $Y$ around its mean).

So $SS_{\text{res}}/SS_{\text{tot}}$ is the fraction not explained, and 1 minus that fraction is the explained fraction.
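As a rough sketch of the definition above, the two sums of squares can be computed directly in plain Python. The function name `r_squared` and the sample numbers are made up for illustration; they do not come from any particular library or dataset.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the predictions fail to capture.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true has no variation")
    return 1 - ss_res / ss_tot


# Hypothetical values, invented for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
perfect = [2.0, 4.0, 6.0, 8.0]    # predictions that match every observation
mean_only = [5.0, 5.0, 5.0, 5.0]  # always predicting the mean of observed

print(r_squared(observed, perfect))    # 1.0
print(r_squared(observed, mean_only))  # 0.0
```

The two calls reproduce the conceptual extremes: perfect predictions leave no residual variation, while always predicting the mean leaves all of it.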

In simple linear regression, $R^2$ is also the square of Pearson’s correlation coefficient $r$, so $R^2 = r^2$.
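One way to check this relationship numerically is to fit a least-squares line by hand and compare the two quantities. The sketch below assumes a model with an intercept; the helper names (`fit_line`, `pearson_r`, `r_squared_of_fit`) and the hours/scores data are hypothetical, chosen only to mirror the exam example.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def r_squared_of_fit(x, y):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line through (x, y)."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical hours studied and exam scores, invented for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 60.0, 61.0, 70.0, 77.0]

r2 = r_squared_of_fit(hours, scores)
r = pearson_r(hours, scores)
print(math.isclose(r2, r ** 2))  # True
```

The equality is specific to a single predictor with an intercept; with several predictors there is no single $r$ to square.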

How to talk about it in plain English

When writing or discussing results, you might phrase it like:

“The model has an $R^2$ of 0.60, meaning 60% of the variability in the outcome is explained by the predictors included in the regression.”

That’s the essence behind the phrase “coefficient of determination”: it quantifies how much the model determines (explains) about the dependent variable.

TL;DR:
The coefficient of determination ($R^2$) is a number, typically between 0 and 1, that tells you what fraction of the variance in your dependent variable is explained by your regression model; higher values usually indicate a better fit.
