Measures of central tendency are statistical tools used to summarize a dataset by identifying a "typical" or "central" value. The three main ones are the mean, the median and the mode. Here is a detailed explanation of each, with its definition, calculation, uses and limits:
- Average (or arithmetic mean)
- Definition : This is the sum of all values in a set divided by the number of values. It represents the "equilibrium point" of the data.
- Formula : $\text{Mean} = \frac{\sum x_i}{n}$
where $x_i$ is each value and $n$ is the total number of values.
- Example : For grades 12, 15, 18, 10: $\text{Mean} = \frac{12 + 15 + 18 + 10}{4} = \frac{55}{4} = 13.75$
- Utility :
- Gives a quick overview.
- Widely used for continuous or discrete quantitative data (e.g. wages, temperatures).
- Limits :
- Sensitive to extreme values (outliers). Example: If we add a score of 0 to the previous set, the average falls to 11.
- Not very relevant for ordinal qualitative data (e.g., "good/average/bad").
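The mean calculation and its sensitivity to an extreme value can be checked with a short Python sketch. It uses the standard library's `statistics.mean`; the variable names `grades` and `grades_with_zero` are illustrative only.

```python
from statistics import mean

# Grades from the example above.
grades = [12, 15, 18, 10]
print(mean(grades))            # sum 55 divided by 4 values

# Adding one extreme value (a score of 0) pulls the mean down noticeably.
grades_with_zero = grades + [0]
print(mean(grades_with_zero))  # sum 55 divided by 5 values
```

The first call gives 13.75 and the second gives 11, matching the figures in the text: a single outlier shifts the result by almost three points.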
- Median
- Definition : This is the value that lies in the middle of a dataset sorted in ascending or descending order. It divides the data into two equal halves (50% below, 50% above).
- Calculation :
- Sort the values.
- If the number of values ($n$) is odd, the median is the central value.
- If $n$ is even, the median is the average of the two central values.
- Example :
- Suppose a running champion completes a typical 200-meter training run in the following times: 26.1 seconds, 25.6 seconds, 25.7 seconds, 25.2 seconds, 25.0 seconds, 27.8 seconds and 24.1 seconds. How is the median time calculated?
| Rank | Time (in seconds) |
| 1 | 24.1 |
| 2 | 25.0 |
| 3 | 25.2 |
| 4 | 25.6 |
| 5 | 25.7 |
| 6 | 26.1 |
| 7 | 27.8 |
- There are $n = 7$ values, an odd number. The median is therefore the value at rank $\frac{n+1}{2} = \frac{7+1}{2} = 4$.
The median time is 25.6 seconds.
- Utility :
- Resistant to extreme values, therefore ideal for asymmetric data (e.g. incomes skewed by a few millionaires).
- Better represents the central tendency when the distribution is asymmetric.
- Also works with ordinal data (e.g., rankings).
- Limits :
- Requires sorting data, which can be lengthy for large sets.
- Less informative than the mean about the overall distribution.
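The sort-then-pick-the-middle procedure for the median can be sketched in Python as follows. The helper `median_by_rank` is a hypothetical name written for illustration; the standard library's `statistics.median` is shown alongside it as a cross-check.

```python
from statistics import median

def median_by_rank(values):
    """Median via the manual procedure: sort, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the central value, at rank (n + 1) / 2 when counting from 1.
        return ordered[mid]
    # Even n: the average of the two central values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# 200-meter times from the example above, in seconds.
times = [26.1, 25.6, 25.7, 25.2, 25.0, 27.8, 24.1]
print(median_by_rank(times))  # 25.6
print(median(times))          # 25.6
```

With seven values the function returns the fourth value of the sorted list, 25.6 seconds, as in the table.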
- Mode
- Definition : It is the value (or values) that appears most often in a dataset. It corresponds to the maximum frequency.
- Calculation : Identify the most frequent value.
- Example :
- Data: 10, 12, 12, 15, 18 → Mode = 12 (appears 2 times).
- Data: 5, 5, 8, 8, 10 → Mode = 5 and 8 (bimodal).
- Data: 3, 4, 5, 6 → No mode (all appear once).
- Special cases:
- Unimodal: A single mode (e.g., 18 is the only mode)
- Bimodal: Two modes (e.g., if 14 and 18 appear the same number of times)
- Multimodal: Multiple modes
- Utility :
- Works for qualitative and quantitative data
- Useful for finding the most frequent value (dominant value).
- Limits :
- May not exist (no repeated value) or may not be unique (multiple modes).
- Does not give any information about the distribution or centrality of other values.
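A minimal Python sketch of the mode calculation, following the convention used above that a dataset with no repeated value has no mode. The helper name `modes` is hypothetical. Note that the standard library's `statistics.multimode` follows a different convention: when no value repeats, it returns every value rather than an empty list.

```python
from collections import Counter

def modes(values):
    """Return the list of most frequent values, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Every value appears exactly once: no mode under this convention.
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([10, 12, 12, 15, 18]))      # [12]      (unimodal)
print(modes([5, 5, 8, 8, 10]))          # [5, 8]    (bimodal)
print(modes([3, 4, 5, 6]))              # []        (no mode)
print(modes(["good", "bad", "good"]))   # ['good']  (qualitative data)
```

Counting occurrences rather than sorting means the same function works for categories as well as numbers.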
Comparison and interpretation
| Measure | When to use it? | Advantage | Disadvantage |
| Average | Quantitative data without extremes | Summarizes all values | Sensitive to outliers |
| Median | Data with outliers or an asymmetric distribution | Robust to extremes | Ignores the magnitude of extreme values |
| Mode | Qualitative data or data with a marked frequency peak | Simple and intuitive | Uninformative alone |
Concrete example
Let’s imagine the salaries of a small group: 2000, 2100, 2200, 2300, 50000 €.
- Average : (2000 + 2100 + 2200 + 2300 + 50000) / 5 = 11,720 € → Influenced by the 50,000 € salary, unrepresentative.
- Median : Sort : 2000, 2100, 2200, 2300, 50000 → Median = 2200 € → Closer to reality for the majority.
- Mode : No mode (no repetition).
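The salary example can be reproduced with Python's standard `statistics` module. The second list, `salaries_no_outlier`, is a hypothetical variant added for illustration: it replaces the 50,000 € salary with 2,400 € to show how far the mean moves while the median stays put.

```python
from statistics import mean, median

salaries = [2000, 2100, 2200, 2300, 50000]
print(mean(salaries))    # 11720, pulled up by the single 50000
print(median(salaries))  # 2200, the middle of the sorted values

# Hypothetical variant without the extreme salary, for comparison.
salaries_no_outlier = [2000, 2100, 2200, 2300, 2400]
print(mean(salaries_no_outlier))    # 2200
print(median(salaries_no_outlier))  # 2200
```

Without the outlier the two measures agree at 2200 €; with it, only the median still describes what most of the group earns.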
In summary
- The mean gives a global idea but can be misleading with outliers.
- The median offers a robust view of the center, ideal for unbalanced data.
- The mode highlights the most common values, perfect for categories.

These three measures complement each other, and the choice among them depends on the type of data and the purpose of the analysis.