How to find the correlation coefficient
The correlation coefficient (usually Pearson’s $r$) measures how strongly two numeric variables move together, on a scale from $-1$ to $1$.
What the correlation coefficient is
- $r=1$: perfect positive linear relationship (as $x$ increases, $y$ increases exactly).
- $r=-1$: perfect negative linear relationship (as $x$ increases, $y$ decreases exactly).
- $r=0$: no linear relationship (points look like a cloud, not a line).
In many real datasets, you’ll see values like $r=0.2$ (weak), $0.5$ (moderate), $0.8$ (strong), depending on context.
Formula for Pearson’s r
For paired data $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$, the usual computational formula is:
$$r=\frac{n\sum xy-(\sum x)(\sum y)}{\sqrt{\bigl(n\sum x^2-(\sum x)^2\bigr)\bigl(n\sum y^2-(\sum y)^2\bigr)}}$$
This looks scary at first, but it’s mostly about organizing sums.
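One way to see that the formula is just bookkeeping is to write it as a short function. The sketch below is only an illustration: the name `pearson_r` and the error handling are choices made for this example, not part of any library.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the computational (sums) formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    a = n * sum_x2 - sum_x ** 2   # first factor under the square root
    b = n * sum_y2 - sum_y ** 2   # second factor under the square root
    if a == 0 or b == 0:
        # A constant variable has no spread, so r is undefined.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / math.sqrt(a * b)


# Made-up data lying exactly on the line y = 2x.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

The guard for a zero denominator matters in practice: if every $x$ (or every $y$) is the same, the formula divides by zero.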
Step‑by‑step: how to calculate r by hand
Imagine you have $n$ pairs of values, like height and weight for several people.
1. Make a table
Create a table with columns:
- $x$
- $y$
- $xy$
- $x^2$
- $y^2$
For each row (each person or observation):
- Fill in $x$ and $y$.
- Compute $xy$.
- Compute $x^2$.
- Compute $y^2$.
2. Compute the sums
At the bottom of each column, compute:
- $\sum x$
- $\sum y$
- $\sum xy$
- $\sum x^2$
- $\sum y^2$
You also need $n$, the number of pairs.
3. Plug into the formula
Use:
$$r=\frac{n\sum xy-(\sum x)(\sum y)}{\sqrt{\bigl(n\sum x^2-(\sum x)^2\bigr)\bigl(n\sum y^2-(\sum y)^2\bigr)}}$$
- First compute the numerator: $n\sum xy-(\sum x)(\sum y)$.
- Then compute each denominator part:
  - $A=n\sum x^2-(\sum x)^2$
  - $B=n\sum y^2-(\sum y)^2$
- Multiply them: $A\times B$.
- Take the square root: $\sqrt{A\times B}$.
- Finally, divide the numerator by that square root to get $r$.
For example, one worked dataset with six observations gives $r\approx 0.53$, interpreted as a moderate positive correlation.
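The three steps can also be traced in code. The sketch below uses a small made-up dataset of five pairs (it is not the six-observation dataset mentioned above), chosen so the arithmetic is easy to check by hand.

```python
import math

# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Step 1: build the table, one row per pair: (x, y, xy, x^2, y^2).
rows = [(xi, yi, xi * yi, xi ** 2, yi ** 2) for xi, yi in zip(x, y)]

# Step 2: sum each column, and count the pairs.
n = len(rows)
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))

# Step 3: plug the sums into the formula.
numerator = n * sum_xy - sum_x * sum_y   # 5*66 - 15*20 = 30
A = n * sum_x2 - sum_x ** 2              # 5*55 - 15^2  = 50
B = n * sum_y2 - sum_y ** 2              # 5*86 - 20^2  = 30
r = numerator / math.sqrt(A * B)         # 30 / sqrt(1500)

for row in rows:
    print(*row, sep="\t")
print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r =", round(r, 4))  # about 0.7746
```

Working with whole-number sums until the final division keeps rounding error out of the intermediate steps, which is also good practice when doing the calculation by hand.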
Quick way: using Excel (or similar)
If you have your data in two columns in Excel (say $x$ in A2:A10 and $y$ in B2:B10), you can compute $r$ directly:
- Put data for variable 1 in column A (without the header row).
- Put data for variable 2 in column B.
- In an empty cell, type:
  =CORREL(A2:A10, B2:B10)
  or
  =PEARSON(A2:A10, B2:B10)
  (modern Excel treats them similarly).
- Press Enter; the result is the correlation coefficient between those two variables.
Many calculators and stats packages (SPSS, Minitab, etc.) have built‑in correlation functions and menus as well.
How to interpret your r value
Once you have $r$, think about:
- Sign:
  - $r>0$ → as $x$ increases, $y$ tends to increase.
  - $r<0$ → as $x$ increases, $y$ tends to decrease.
- Magnitude (rough guide, not a strict rule):
  - $|r|<0.3$: weak linear relationship.
  - $0.3\le |r|<0.7$: moderate.
  - $|r|\ge 0.7$: strong.
Always remember: correlation does not prove causation, even if $|r|$ is large.
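The sign and magnitude guide can be written as a small helper. This is only a sketch: `describe_r` is a name invented for this example, and the 0.3 and 0.7 cutoffs are the rough thresholds from the guide above, not a universal standard.

```python
def describe_r(r):
    """Label r using the rough sign and magnitude guide."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear relationship"

    direction = "positive" if r > 0 else "negative"

    size = abs(r)
    if size < 0.3:
        strength = "weak"
    elif size < 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction}"


for value in (0.2, 0.53, -0.82):
    print(value, "->", describe_r(value))
```

A label like this is a convenience for reporting, not a substitute for looking at a scatter plot of the data.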
Tiny story example
Imagine a small café owner tracking daily ad spend (in dollars) and number of
customers over 10 days. After typing both lists into Excel and using
=CORREL, they get $r=0.82$. This strong positive correlation
suggests that days with higher ad spending tend to have more customers—but it
doesn’t guarantee that ads are the only cause (weather, holidays, or nearby
events could also matter). Still, it gives the owner a data‑backed hint
that ads are worth testing more systematically.
TL;DR:
To find the correlation coefficient $r$: organize your paired data in a table,
compute $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$, $\sum y^2$,
plug into the Pearson formula above, and interpret the resulting number
between $-1$ and $1$.