How to find variance
Variance measures how spread out a set of numbers is around the mean.
What variance means
- Variance is the average of the squared distances of each data point from the mean.
- A small variance means data points are tightly clustered; a large variance means they are more spread out.
- The square root of the variance is the standard deviation.
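To make the "tight versus spread out" idea concrete, here is a minimal sketch using Python's standard `statistics` module. The two lists, `tight` and `spread`, are made-up illustrative data (not from any real dataset); both have the same mean of 10, so only their spread differs.

```python
import math
import statistics

# Hypothetical data: both lists have a mean of 10.
tight = [9, 10, 11]     # values close to the mean
spread = [0, 10, 20]    # values far from the mean

# Population variance of each list (divides by n).
var_tight = statistics.pvariance(tight)
var_spread = statistics.pvariance(spread)

print(var_tight < var_spread)  # the tightly clustered list has the smaller variance

# The standard deviation is the square root of the variance.
sd_spread = statistics.pstdev(spread)
print(math.isclose(sd_spread, math.sqrt(var_spread)))
```

Both lines print `True`: the clustered list has the smaller variance, and the standard deviation matches the square root of the variance.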
Two main formulas
For a dataset with values $x_1, x_2, \dots, x_n$:
- Population variance (when you have all data from the entire group):
$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{n}$, where $\mu$ is the population mean.
- Sample variance (when your data is just a sample):
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean.
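Dividing by $n-1$ instead of $n$ for a sample (Bessel's correction) offsets the tendency of deviations measured from the sample mean to underestimate the true spread, and it is the standard convention in statistics.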
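The two formulas translate almost directly into code. The sketch below is one plain-Python way to write them; the function names `population_variance` and `sample_variance` are chosen here for illustration and are not part of any library.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [4, 7, 9, 10]
print(population_variance(data))  # 5.25
print(sample_variance(data))      # 7.0
```

The only difference between the two functions is the divisor, which is why the sample variance is always the larger of the two for the same data.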
Step‑by‑step: how to find variance
Here’s the general process you use almost every time:
- Find the mean
  - Add all values and divide by how many values you have.
- Find deviations from the mean
  - For each value, calculate $x_i - \text{mean}$.
- Square each deviation
  - Compute $(x_i - \text{mean})^2$ for every value.
- Add the squared deviations (sum of squares)
  - Add all those squared values together.
- Divide to get the variance
  - If it’s a population: divide by $n$.
  - If it’s a sample: divide by $n-1$.
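The five steps above can be sketched as a small function that keeps every intermediate result, which is handy for checking hand calculations. The name `variance_steps`, its `sample` flag, and the input `[2, 4, 6]` are illustrative assumptions, not taken from the article.

```python
def variance_steps(values, sample=True):
    """Return each intermediate result of the variance calculation."""
    n = len(values)
    mean = sum(values) / n                     # step 1: find the mean
    deviations = [x - mean for x in values]    # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]     # step 3: square each deviation
    sum_of_squares = sum(squared)              # step 4: add the squared deviations
    divisor = n - 1 if sample else n           # step 5: choose n - 1 or n
    return {
        "mean": mean,
        "deviations": deviations,
        "squared": squared,
        "sum_of_squares": sum_of_squares,
        "variance": sum_of_squares / divisor,
    }


# Hypothetical input, treated as a sample.
result = variance_steps([2, 4, 6], sample=True)
for name, value in result.items():
    print(name, value)
```

Passing `sample=False` switches the divisor to $n$ and leaves every earlier step unchanged.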
Concrete example (sample variance)
Suppose your data is: 4, 7, 9, 10 (treat this as a sample).
- Mean: $\bar{x} = (4+7+9+10)/4 = 30/4 = 7.5$.
- Deviations from mean:
  - 4 − 7.5 = −3.5
  - 7 − 7.5 = −0.5
  - 9 − 7.5 = 1.5
  - 10 − 7.5 = 2.5
- Squared deviations:
  - $(-3.5)^2 = 12.25$
  - $(-0.5)^2 = 0.25$
  - $(1.5)^2 = 2.25$
  - $(2.5)^2 = 6.25$
- Sum of squares: $12.25 + 0.25 + 2.25 + 6.25 = 21$.
- Sample variance:
  - There are 4 values, so $n - 1 = 3$.
  - $s^2 = 21/3 = 7$.
So the variance of this sample is 7.
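As a cross-check on the hand calculation, Python's standard `statistics` module provides `variance` (sample, divides by $n-1$) and `pvariance` (population, divides by $n$). This is just one convenient way to verify the arithmetic, not the only one.

```python
import statistics

data = [4, 7, 9, 10]

sample_var = statistics.variance(data)        # divides by n - 1
population_var = statistics.pvariance(data)   # divides by n

print(sample_var)      # matches the worked example: 7
print(population_var)  # 21 / 4 = 5.25
```

Treating the same four numbers as a full population gives 5.25 instead of 7, because the sum of squares (21) is divided by 4 rather than 3.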
Population vs sample: which do I use?
- Use population variance if your data includes every member you care about (e.g., the exact heights of all 30 students in one class).
- Use sample variance if your data is just a subset used to estimate a bigger group (e.g., a survey of 50 people drawn from a whole city).
In most practical statistics problems, you are working with samples and should use the sample variance formula with $n-1$.
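How much the choice matters depends on the size of the dataset. For the same data, the sample variance equals the population-formula result multiplied by $n/(n-1)$, so the gap shrinks as $n$ grows. The helper name `correction_factor` below is hypothetical and used only for this sketch.

```python
import math
import statistics


def correction_factor(n):
    """Ratio of the sample variance to the population-formula variance."""
    if n < 2:
        raise ValueError("need at least two values")
    return n / (n - 1)


# The factor approaches 1 as the dataset gets larger.
for n in (2, 4, 30, 50, 1000):
    print(n, round(correction_factor(n), 4))

# Check the relationship on the article's example data.
data = [4, 7, 9, 10]
scaled = statistics.pvariance(data) * correction_factor(len(data))
print(math.isclose(scaled, statistics.variance(data)))
```

With only a handful of values the two formulas differ noticeably; with hundreds of values they give nearly the same number.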
HTML table: quick reference of steps
<table>
<thead>
<tr>
<th>Step</th>
<th>What you do</th>
<th>Population formula</th>
<th>Sample formula</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Compute the mean of all values.</td>
<td>μ = (Σxᵢ) / n</td>
<td>x̄ = (Σxᵢ) / n</td>
</tr>
<tr>
<td>2</td>
<td>Subtract the mean from each value (xᵢ − mean).</td>
<td colspan="2">Find deviations for each xᵢ.</td>
</tr>
<tr>
<td>3</td>
<td>Square each deviation (xᵢ − mean)².</td>
<td colspan="2">Compute squared deviations.</td>
</tr>
<tr>
<td>4</td>
<td>Sum all squared deviations.</td>
<td colspan="2">Σ(xᵢ − mean)²</td>
</tr>
<tr>
<td>5</td>
<td>Divide the sum of squares.</td>
<td>σ² = Σ(xᵢ − μ)² / n</td>
<td>s² = Σ(xᵢ − x̄)² / (n − 1)</td>
</tr>
</tbody>
</table>
Tiny storytelling angle
Imagine you are a coach checking how consistent your player’s scores are over several games. Variance tells you whether they’re steady around their average performance or wildly up and down from game to game.
Quick TL;DR
- Find the mean.
- Subtract the mean from each value and square the result.
- Add all squared results.
- Divide by $n$ for a population or by $n-1$ for a sample.