Quartiles and the box plot (also called a box-and-whisker plot) are key tools in descriptive statistics for analyzing the spread and distribution of data. They make it possible to visualize the distribution of a dataset and to identify possible extreme values (outliers).
1. Quartiles
Quartiles divide a sorted dataset into four parts, each containing about 25% of the observations. They describe how the data is distributed in more detail than the median alone.
Definitions:
- Q1 (first quartile): The value below which 25% of the data lies (the lower edge of the middle half).
- Q2 (second quartile): The median, which splits the data into two equal halves (50%).
- Q3 (third quartile): The value below which 75% of the data lies (the upper edge of the middle half).
- Interquartile range (IQR): The difference between Q3 and Q1 (IQR = Q3 - Q1), which measures the dispersion of the central 50% of the data.
Example
Let’s take this dataset:
4, 7, 8, 10, 12, 15, 18, 21, 22
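Calculating the quartiles
- Sort the data: it is already sorted here: 4, 7, 8, 10, 12, 15, 18, 21, 22 (n = 9 values)
- Q1: position = 1/4 × (9 + 1) = 2.5, so Q1 lies halfway between the 2nd and 3rd values (7 and 8): Q1 = (7 + 8)/2 = 7.5
- Q2: position = 1/2 × (9 + 1) = 5, so Q2 is the 5th value: Q2 = 12 (the median)
- Q3: position = 3/4 × (9 + 1) = 7.5, so Q3 lies halfway between the 7th and 8th values (18 and 21): Q3 = (18 + 21)/2 = 19.5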
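The hand calculation above can be sketched in Python. This is a minimal illustration that assumes the position rule p × (n + 1) with linear interpolation between neighbouring values; the function name `quartile` is a hypothetical helper, and other software may use a different quartile convention and return slightly different values.

```python
def quartile(sorted_values, p):
    """Value at position p * (n + 1), counted from 1, with linear interpolation."""
    n = len(sorted_values)
    # Keep the position inside 1..n so small samples never index out of range.
    position = min(max(p * (n + 1), 1), n)
    lower_rank = int(position)            # rank of the value just below the position
    fraction = position - lower_rank      # how far to move toward the next value
    upper_rank = min(lower_rank + 1, n)
    low = sorted_values[lower_rank - 1]   # ranks are 1-based, list indexes 0-based
    high = sorted_values[upper_rank - 1]
    return low + fraction * (high - low)


data = sorted([4, 7, 8, 10, 12, 15, 18, 21, 22])
q1 = quartile(data, 0.25)
q2 = quartile(data, 0.50)
q3 = quartile(data, 0.75)
print(q1, q2, q3)  # 7.5 12.0 19.5
```

For this dataset the standard library call `statistics.quantiles(data, n=4)` gives the same three values, because its default "exclusive" method uses the same positions.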
2. The Interquartile Range (IQR)
The IQR (interquartile range) is the difference between the third quartile and the first quartile:
IQR = Q3 - Q1
Interpretation:
- Low IQR: the data is concentrated around the median
- High IQR: the data is more spread out
Example (cont.):
IQR = 19.5 - 7.5 = 12
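To make the interpretation concrete, here is a small sketch that computes the IQR with the standard library and compares the example dataset with a second, more concentrated one. The `tight` dataset is invented purely for illustration (it has the same median, 12), and `iqr` is a hypothetical helper name.

```python
import statistics


def iqr(values):
    """Interquartile range, using the default 'exclusive' quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = [4, 7, 8, 10, 12, 15, 18, 21, 22]
tight = [10, 11, 11, 12, 12, 12, 13, 13, 14]  # hypothetical, same median as data

iqr_data = iqr(data)
iqr_tight = iqr(tight)
print(iqr_data)   # 12.0
print(iqr_tight)  # 2.0
```

Both datasets share the same median, but the smaller IQR of the second one shows that its central values sit much closer together.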
3. The Box-and-Whisker Plot (Box Plot)
The box plot is a graphical representation that summarizes the distribution of data by displaying quartiles and extreme values.
Elements of a box plot:
- Central box: spans Q1 to Q3 and contains the central 50% of the data
- Whiskers: lines extending from the box to the most extreme values that are not outliers
- Outliers: points lying beyond the bounds below
How are the whiskers defined?
- Lower bound: Q1 - 1.5 × IQR
- Upper bound: Q3 + 1.5 × IQR
➡ Any value outside of these bounds is considered an outlier.
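The bounds can be checked with a short sketch. It assumes the same quartile convention as the worked example (the default "exclusive" method of `statistics.quantiles`); the helper names and the extra value 60 are hypothetical, added only to show an outlier being flagged.

```python
import statistics


def outlier_bounds(values, k=1.5):
    """Return (lower, upper) bounds: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def find_outliers(values, k=1.5):
    """List the values that fall outside the bounds."""
    lower, upper = outlier_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


data = [4, 7, 8, 10, 12, 15, 18, 21, 22]
bounds = outlier_bounds(data)
print(bounds)               # (-10.5, 37.5)
print(find_outliers(data))  # []

# Hypothetical extra observation, far above the rest of the data.
with_extreme = data + [60]
print(find_outliers(with_extreme))  # [60]
```

The example dataset has no outliers: every value lies between -10.5 and 37.5. Note that adding a value changes the quartiles, so the bounds are recomputed for the extended dataset.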
4. Box Plot Example
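As a rough illustration, the box plot of the example dataset can be drawn in plain text. This is only a sketch: the function `text_boxplot` and its symbols are invented here ("|" for whisker ends, "[" and "]" for Q1 and Q3, "M" for the median, "o" for outliers), and it assumes the same quartile convention as the worked example. In practice a plotting library such as matplotlib would normally be used, and its quartile values may differ slightly because it follows a different convention.

```python
import statistics


def text_boxplot(values, width=37):
    """Draw a one-line text box plot scaled between the min and max value."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [v for v in data if lower <= v <= upper]
    lo, hi = data[0], data[-1]
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal

    def col(v):
        # Map a value to a column index, clamped to the drawing width.
        c = round((v - lo) / span * (width - 1))
        return min(max(c, 0), width - 1)

    line = [" "] * width
    for c in range(col(inside[0]), col(inside[-1]) + 1):
        line[c] = "-"                      # whisker line
    for c in range(col(q1), col(q3) + 1):
        line[c] = "="                      # central box
    line[col(inside[0])] = "|"
    line[col(inside[-1])] = "|"
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(q2)] = "M"
    for v in data:
        if v < lower or v > upper:
            line[col(v)] = "o"             # outlier marker
    return "".join(line)


data = [4, 7, 8, 10, 12, 15, 18, 21, 22]
plot = text_boxplot(data)
print(plot)
# |------[========M==============]----|
```

With the default width of 37, each unit on the data axis takes two columns. The median sits left of the centre of the box, which hints that the upper half of the central data is more spread out than the lower half.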
5. Why Use Quartiles and Box Plots?
Quartiles and box plots are very useful when you want to address the following aspects of data analysis:
✅ Compare multiple distributions
✅ Visualize spread and outliers
✅ Identify asymmetry (skewness) in the data