Measures of central tendency are statistical tools that summarize a dataset by identifying a "typical" or "central" value. The three main ones are the mean, the median and the mode. Each is explained below, with its definition, calculation, uses and limits:

  1. Average (or arithmetic mean)
  • Definition : The sum of all values in a set divided by the number of values. It represents the "balance point" of the data.
  • Formula : $\text{Mean} = \frac{\sum x_i}{n}$, where $x_i$ is each value and $n$ is the total number of values.
  • Example : For grades 12, 15, 18, 10: $\text{Mean} = \frac{12 + 15 + 18 + 10}{4} = \frac{55}{4} = 13.75$
  • Utility :
    • Gives a quick overview.
    • Widely used for continuous or discrete quantitative data (e.g. wages, temperatures).
  • Limits :
    • Sensitive to extreme values (outliers). Example: If we add a score of 0 to the previous set, the average falls to 11.
    • Not meaningful for qualitative data, even ordinal scales (e.g., "good/average/bad"), because the categories have no numeric distance between them.
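The formula and the outlier effect described above can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` and the variable `grades` are illustrative choices, not part of the original text.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


grades = [12, 15, 18, 10]

# Mean of the four grades from the example: 55 / 4.
print(arithmetic_mean(grades))        # 13.75

# Adding a single extreme score of 0 pulls the mean down: 55 / 5.
print(arithmetic_mean(grades + [0]))  # 11.0
```

One extra value out of five is enough to move the result by almost three points, which is the sensitivity to outliers noted in the limits.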
  2. Median
  • Definition : The value that lies in the middle of a dataset sorted in ascending or descending order. It divides the data into two equal halves (50% below, 50% above).
  • Calculation :
    1. Sort the values.
    2. If the number of values (n) is odd, the median is the central value.
    3. If n is even, the median is the average of the two central values.
  • Example : Suppose a running champion records the following times over seven typical 200-meter training runs: 26.1 seconds, 25.6 seconds, 25.7 seconds, 25.2 seconds, 25.0 seconds, 27.8 seconds and 24.1 seconds. How is the median time calculated? First, sort the times:
    Rank   Time (in seconds)
    1      24.1
    2      25.0
    3      25.2
    4      25.6
    5      25.7
    6      26.1
    7      27.8
  • There are n = 7 values, an odd number. The median is therefore the value at rank (n + 1) / 2 = (7 + 1) / 2 = 4, so the median time is 25.6 seconds.

  • Utility :
    • Resistant to extreme values, therefore well suited to asymmetric data (e.g. incomes skewed by a few millionaires).
    • Represents the central tendency better than the average when the distribution is asymmetric.
    • Also works with ordinal data (e.g., rankings).
  • Limits :
    • Requires sorting the data, which can be slow for very large sets.
    • Uses only the middle of the ordered data, so it says less than the average about the overall distribution.
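The three calculation steps for the median (sort, then take the central value or the average of the two central values) can be sketched in Python as follows. The function name `median_of` is a hypothetical helper chosen for this illustration.

```python
def median_of(values):
    """Return the middle value of the data once sorted."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)           # step 1: sort the values
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                     # step 2: odd n, take the central value
        return ordered[middle]
    # step 3: even n, average the two central values
    return (ordered[middle - 1] + ordered[middle]) / 2


times = [26.1, 25.6, 25.7, 25.2, 25.0, 27.8, 24.1]
print(median_of(times))             # 25.6

# Even number of values: the two central grades are 12 and 15.
print(median_of([12, 15, 18, 10]))  # 13.5
```

Python lists are indexed from 0, so rank 4 in the running example corresponds to index 3, which is what `n // 2` gives for n = 7.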
  3. Mode
  • Definition : The value (or values) that appears most often in a dataset. It corresponds to the highest frequency.
  • Calculation : Identify the most frequent value.
  • Example :
    • Data: 10, 12, 12, 15, 18. Mode = 12 (appears 2 times).
    • Data: 5, 5, 8, 8, 10. Mode = 5 and 8 (bimodal).
    • Data: 3, 4, 5, 6. No mode (all values appear once).
  • Special cases:
    • Unimodal: A single mode (e.g. 18 is the only mode).
    • Bimodal: Two modes (e.g. 14 and 18 appear the same number of times).
    • Multimodal: Several modes.
  • Utility :
    • Works for both qualitative and quantitative data.
    • Useful for finding the most frequent value (dominant value).
  • Limits :
    • May not exist (no repeated value) or may not be unique (several modes).
    • Does not give any information about the distribution or centrality of other values.
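Finding the mode amounts to counting how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the function name `modes` is hypothetical, and it follows the convention used in the examples above, where a dataset with no repeated value has no mode (other conventions exist, for instance treating every value as a mode).

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Convention from the text: no repetition means no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([10, 12, 12, 15, 18]))  # [12]
print(modes([5, 5, 8, 8, 10]))      # [5, 8]
print(modes([3, 4, 5, 6]))          # []

# The same function works for qualitative data.
print(modes(["good", "bad", "good", "average"]))  # ['good']
```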

 

Comparison and interpretation

Measure | When to use it?                             | Advantage             | Disadvantage
Average | Quantitative data without extremes          | Summarizes all values | Sensitive to outliers
Median  | Asymmetric data or data with outliers       | Robust to extremes    | Ignores the size of extreme values
Mode    | Qualitative data or a marked frequency peak | Simple and intuitive  | Uninformative alone

Concrete example

Let’s imagine the salaries of a small group: 2000, 2100, 2200, 2300, 50000 €.

  • Average : (2000 + 2100 + 2200 + 2300 + 50000) / 5 = 11,720 €. Pulled up by the 50,000 € salary, so unrepresentative.
  • Median : Sorted values: 2000, 2100, 2200, 2300, 50000. Median = 2200 €. Closer to reality for the majority.
  • Mode : No mode (no repetition).
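The salary example can be reproduced with Python's standard `statistics` module. This is a sketch under one assumption: the "no mode" check is written by hand with `collections.Counter` to match the convention used in this text, and the variable names are illustrative.

```python
from collections import Counter
from statistics import mean, median

salaries = [2000, 2100, 2200, 2300, 50000]  # in euros, from the example above

average_salary = mean(salaries)
median_salary = median(salaries)

# Convention from the text: there is no mode when no value repeats.
has_mode = max(Counter(salaries).values()) > 1

# How many salaries actually fall below the average?
below_average = sum(1 for s in salaries if s < average_salary)

print(average_salary)  # 11720
print(median_salary)   # 2200
print(has_mode)        # False
print(below_average)   # 4
```

Four of the five salaries are below the average, which is why the median describes this group better.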

In summary

  • The mean gives a global idea but can be misleading with outliers.
  • The median offers a robust view of the center, ideal for unbalanced data.
  • The mode highlights the most common values, perfect for categories.

These three measures complement each other, and the right choice depends on the type of data and the purpose of the analysis.