Variance, Standard Deviation

Sarowar Ahmed
2 min read · Jun 8, 2024

Ever wondered how we measure the “spread” or “variability” in a set of data? Let’s break down two fundamental concepts that do just that!

Variance: Imagine you and your friends measure the length of fish you catch on a fishing trip. Some fish are big, some small, but most are around the same size. Variance tells us how much the fish lengths differ from the average length. It’s like asking, “On average, how much do our fish lengths vary from the ‘typical’ fish length we catch?” To find it, we calculate the difference between each fish’s length and the average length, square those differences (to get rid of negative numbers), and then find the average of those squared differences.
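The recipe above can be sketched in a few lines of Python. This is only an illustration: the function name `population_variance` and the fish lengths are made up for this example, and the function divides by the number of values, exactly as described in the paragraph.

```python
def population_variance(values):
    """Average of the squared differences from the mean."""
    mean = sum(values) / len(values)
    # Squaring makes every difference non-negative
    squared_diffs = [(v - mean) ** 2 for v in values]
    return sum(squared_diffs) / len(squared_diffs)


# Hypothetical fish lengths in cm (invented for illustration)
fish_lengths = [30, 32, 35, 33, 30]
print(population_variance(fish_lengths))  # result is in squared centimeters
```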

Standard Deviation: Variance measures the spread of our fish lengths in squared units, and standard deviation is simply the square root of that variance. Why take the square root? It brings our measure of spread back to the same units as our original data, making it easier to understand. So, if we’re measuring fish in centimeters, standard deviation gives us a spread in centimeters too, not squared centimeters!

Example to Differentiate Both:
Imagine you have recorded the heights (in cm) of five sunflowers in your garden: 150, 155, 160, 165, and 170.

Mean (Average) Height:
μ=(150+155+160+165+170)/5=160 cm

Variance Calculation:
σ²=((150−160)²+(155−160)²+(160−160)²+(165−160)²+(170−160)²)/5

σ²=((−10)²+(−5)²+0²+5²+10²)/5=(100+25+0+25+100)/5=50 cm²

Standard Deviation Calculation:
σ=√50≈7.07 cm
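If you want to double-check the arithmetic, here is a small sketch using Python's standard `statistics` module. The variable names are my own. Note that `pvariance` and `pstdev` are the population versions, which divide by the number of values just like the calculation above.

```python
import statistics

# Sunflower heights in cm, from the example above
heights = [150, 155, 160, 165, 170]

mean = statistics.mean(heights)           # average height
variance = statistics.pvariance(heights)  # population variance (divides by N)
std_dev = statistics.pstdev(heights)      # square root of the population variance

print(f"mean = {mean} cm")
print(f"variance = {variance} cm^2")
print(f"standard deviation = {std_dev:.2f} cm")
```

The module also offers `statistics.variance` and `statistics.stdev`, which divide by N − 1 instead and so would give slightly larger numbers for the same data.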

❇ Understanding the Difference:

Variance (50 cm²) tells us that the average squared deviation of the sunflower heights from the mean height is 50 square centimeters. It’s a bit abstract because we’re dealing in square units, making it hard to visualize the variability of the heights directly.

Standard Deviation (approximately 7.07 cm), on the other hand, is much more intuitive. It tells us that the heights of the sunflowers typically deviate by about 7 cm from the mean height. (Strictly speaking, it is the square root of the average squared deviation, not the plain average of the deviations, which would be 6 cm here.) This gives us a clearer, more relatable picture of the spread of the sunflower heights, directly in the units of the original data.

Why They Matter: Both of these statistics help us understand the diversity or uniformity in our data. Whether you’re analyzing heights in a classroom or temperatures during a month, knowing how spread out your data is tells you how much individual values can differ from the average. High variance or standard deviation means your data points are spread out widely; low variance or standard deviation means they’re clustered closely around the mean.
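To see the difference between "spread out" and "clustered", here is a rough sketch with two made-up sets of temperatures (in °C). Both have the same mean, so the average alone can't tell them apart, but the standard deviation can. The numbers and variable names are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical temperatures in °C (invented for illustration); both average 20
clustered = [19, 20, 20, 21, 20]
spread_out = [10, 30, 20, 5, 35]

for name, data in [("clustered", clustered), ("spread out", spread_out)]:
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```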

If you enjoyed this article, feel free to follow me for more insights and updates.
