Statistics & Distributions

Statistics transforms raw data into meaningful insights. In this lesson we review measures of center, introduce the concept of spread through standard deviation, and explore the most important distribution in all of statistics: the normal distribution.

Measures of Center (Review)

Measure	Definition	Best Used When
Mean	Sum of all values divided by the count	Data is symmetric, no extreme outliers
Median	Middle value when data is ordered	Data is skewed or has outliers
Mode	Most frequently occurring value	Categorical data or identifying peaks
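
To see how the three measures react to an outlier, here is a minimal Python sketch using the standard library's statistics module. The salary figures (in thousands) are made up for illustration and do not come from this lesson.

```python
import statistics

# Hypothetical salaries in thousands; 200 is a deliberate outlier.
salaries = [30, 32, 35, 35, 40, 200]

mean_salary = statistics.mean(salaries)      # sum of all values divided by the count
median_salary = statistics.median(salaries)  # middle of the ordered data
mode_salary = statistics.mode(salaries)      # most frequently occurring value

print(mean_salary, median_salary, mode_salary)
```

The mean comes out at 62, well above the median and mode of 35, because the single outlier pulls it upward. This is why the median is the safer summary for skewed data.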

Standard Deviation

The standard deviation (denoted σ for a population, s for a sample) measures how spread out data values are from the mean. A small standard deviation means data clusters tightly around the mean; a large one means data is widely scattered.

To calculate standard deviation:

Find the mean (μ or x̄).
Subtract the mean from each value to get deviations.
Square each deviation.
Find the average of the squared deviations (the variance). For a population, divide by N; for a sample, divide by n - 1 instead.
Take the square root of the variance.

\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}

Worked Example 1 -- Standard Deviation

Find the population standard deviation of: 4, 8, 6, 5, 7.

Mean = (4 + 8 + 6 + 5 + 7)/5 = 30/5 = 6.
Deviations: -2, 2, 0, -1, 1. Squared: 4, 4, 0, 1, 1.
Variance = (4 + 4 + 0 + 1 + 1)/5 = 10/5 = 2.
σ = √2 ≈ 1.414.
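
The five steps can be sketched in Python and checked against this example. The function name population_std is an illustrative choice, not something defined in this lesson. The cross-check uses statistics.pstdev (population, divides by N) and statistics.stdev (sample, divides by n - 1) from the standard library.

```python
import math
import statistics

def population_std(values):
    """Population standard deviation, following the five steps literally."""
    n = len(values)
    mean = sum(values) / n                   # step 1: find the mean
    deviations = [x - mean for x in values]  # step 2: subtract the mean
    squared = [d ** 2 for d in deviations]   # step 3: square each deviation
    variance = sum(squared) / n              # step 4: average the squares (divide by N)
    return math.sqrt(variance)               # step 5: take the square root

data = [4, 8, 6, 5, 7]
sigma = population_std(data)
print(round(sigma, 3))  # 1.414

# Cross-check against the standard library.
library_sigma = statistics.pstdev(data)  # population version
sample_s = statistics.stdev(data)        # sample version, slightly larger
```

For the same five values the sample standard deviation is about 1.581, because its variance is 10/4 rather than 10/5.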

The Normal Distribution

The normal distribution (bell curve) is symmetric, centered at the mean μ, with spread determined by σ. It arises naturally when many small, independent random effects combine -- heights, test scores, measurement errors, and countless other phenomena follow approximate normal distributions.

The Empirical Rule (68-95-99.7 Rule)

For normally distributed data, approximately:

68% of data falls within 1σ of the mean.
95% of data falls within 2σ of the mean.
99.7% of data falls within 3σ of the mean.

This means only about 5% of data lies more than 2 standard deviations from the mean, and values beyond 3σ are extremely rare (about 0.3%).
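
The percentages in the rule are rounded. As a rough check, the sketch below computes the exact shares with NormalDist from Python's statistics module. The helper name share_within is an assumption made for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))
```

The shares come out at roughly 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7.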

Worked Example 2 -- Empirical Rule

Test scores are normally distributed with mean 72 and standard deviation 8. What range contains 95% of scores?

95% falls within 2σ: 72 - 2(8) to 72 + 2(8).
Range: 56 to 88.
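
The same calculation can be written as a small helper. This is a sketch: the name interval_within is hypothetical, and the exact share is computed with NormalDist only to show how close the 95% figure is.

```python
from statistics import NormalDist

def interval_within(mu, sigma, k):
    """Return the (low, high) range lying within k standard deviations of mu."""
    return mu - k * sigma, mu + k * sigma

low, high = interval_within(72, 8, 2)
print(low, high)  # 56 88

# Exact share of a normal distribution with mean 72 and sigma 8 inside that range.
scores = NormalDist(mu=72, sigma=8)
exact_share = scores.cdf(high) - scores.cdf(low)
```

Here exact_share is about 0.9545, which agrees with the 95% given by the empirical rule.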

Z-Scores

A z-score tells you how many standard deviations a value is from the mean:

z = (x - μ) / σ

A z-score of 0 means the value equals the mean. Positive z-scores are above the mean; negative z-scores are below.

Worked Example 3 -- Z-Scores

In a class with mean score 80 and standard deviation 5, a student scores 92. What is her z-score?

z = (92 - 80)/5 = 12/5 = 2.4.
Interpretation: her score is 2.4 standard deviations above the mean -- an exceptional result.
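
A minimal Python version of the formula, applied to this example. The function name z_score and the check for a positive σ are choices made for this sketch.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations by which x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

z = z_score(92, 80, 5)
print(z)  # 2.4
```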

Comparing Across Different Scales

Z-scores let you compare values from different distributions. A z-score of 1.5 on a math test and a z-score of 2.0 on a reading test tell you the student performed relatively better on the reading test, even though the raw scores and scales are completely different.
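
To make the comparison concrete, the sketch below uses hypothetical raw scores, means, and standard deviations, chosen so that the z-scores match the 1.5 and 2.0 mentioned above. None of these raw numbers come from the lesson.

```python
def z_score(x, mu, sigma):
    """Standardize x against a distribution with mean mu and standard deviation sigma."""
    return (x - mu) / sigma

# Hypothetical values: the math test is scored out of 100, the reading test out of 800.
math_z = z_score(85, 70, 10)       # (85 - 70) / 10
reading_z = z_score(620, 500, 60)  # (620 - 500) / 60

better = "reading" if reading_z > math_z else "math"
print(math_z, reading_z, better)
```

The raw scores 85 and 620 cannot be compared directly, but the z-scores 1.5 and 2.0 can, and they favor the reading test.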

Common Mistake

Applying the empirical rule to data that is not approximately normal. If the data is heavily skewed, bimodal, or has a very different shape, the 68-95-99.7 percentages will not apply. Always check that the distribution is roughly bell-shaped first.
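
As an illustration of how far off the rule can be, the sketch below uses an exponential distribution with rate 1 as a stand-in for heavily skewed data. That choice is an assumption of this example, not part of the lesson. For this distribution the mean and standard deviation both equal 1, and the cumulative distribution function is 1 - e^(-x).

```python
import math

def exponential_share_within(k):
    """Share of an exponential distribution (rate 1) within k standard deviations of its mean."""
    low = max(0.0, 1 - k)  # the distribution has no values below 0
    high = 1 + k
    cdf = lambda x: 1 - math.exp(-x)
    return cdf(high) - cdf(low)

# Print k, the empirical-rule percentage, and the actual percentage for skewed data.
for k, rule in zip((1, 2, 3), (68, 95, 99.7)):
    print(k, rule, round(exponential_share_within(k) * 100, 1))
```

About 86% of this skewed distribution lies within 1σ of the mean, not 68%, so the empirical rule would badly misstate the spread.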

Practice Problems

Find the mean, median, and mode of: 3, 5, 7, 5, 9, 5, 11.
Solution: Mean = 45/7 ≈ 6.43. Ordered: 3, 5, 5, 5, 7, 9, 11. Median = 5. Mode = 5.

A data set has mean 50 and standard deviation 4. Find the z-score for x = 42.
Solution: z = (42 - 50)/4 = -8/4 = -2. The value is 2 standard deviations below the mean.

Heights of adult males are normally distributed with mean 70 inches and standard deviation 3 inches. What percentage of males are between 64 and 76 inches tall?
Solution: 64 = 70 - 2(3) and 76 = 70 + 2(3). This is within 2 standard deviations, so approximately 95%.

Which is more unusual: a test score of 88 when the mean is 75 and σ = 5, or a test score of 92 when the mean is 82 and σ = 4?
Solution: First: z = (88 - 75)/5 = 2.6. Second: z = (92 - 82)/4 = 2.5. The first score (z = 2.6) is slightly more unusual.

Using the empirical rule, approximately what percentage of data in a normal distribution falls above the value μ + σ?
Solution: 68% is within ±1σ, so 32% is outside. By symmetry, 16% is above μ + σ.

Summary

Mean, median, and mode measure center; standard deviation measures spread.
The normal distribution is symmetric and bell-shaped, defined by μ and σ.
Empirical rule: about 68% within 1σ, 95% within 2σ, 99.7% within 3σ.
Z-scores standardize values: z = (x - μ)/σ.