What is a median value?
The median of an array of numbers is the value of the middle item in the array, assuming the array is sorted. If the array has an even number of items, the median represents the average of the two middle values in the array.
This value is very popular one tries to understand "in which half my value is?". For example, I got a D (or 80) in my last test at school, am I in the top 50% of my class students or not?
As a practical example, let's look into the process of retrieving the median value from the following array of school test grades: [55, 80, 95, 100, 99, 70, 60].
- First, we'll sort the array: [55, 80, 95, 100, 99, 70, 60] ===> [55, 60, 70, 80, 95, 99, 100].
- The array contains 7 items, which isn't an even number, so therefore the median is the (7 / 2 + 1) item, which is the 4th item => 80.
Calculating the median value of a column in MySQL
Unfortunately, MySQL doesn't yet offer a built-in function to calculate the median value of a column. Therefore, we'll have to build a query our own.
Assume we would like to retrieve the median value from the column `grades`. Let's look into the algorithm we're going to use to build the query:
- Retrieve a sorted list of the values from column `grades` and attach an index to each row.
- If the amount of values in the column is odd, find the value of the item in the middle of the sorted list.
- If the amount of values in the column is even, find the values of the two items in the middle of the sorted list.
- Calculate the average of the value(s) retrieved in (3) and (4) above.
- Return the average value as the median value.
So how such a query will look like?
SET @rowindex := -1; SELECT AVG(g.grade) FROM (SELECT @rowindex:[email protected] + 1 AS rowindex, grades.grade AS grade FROM grades ORDER BY grades.grade) AS g WHERE g.rowindex IN (FLOOR(@rowindex / 2) , CEIL(@rowindex / 2));
Let's do some explaining:
- Let's start with the internal subquery - the subselect assigns @rowindex as an incremental index for each grade that is selected, and sorts the grades.
- Once we have the sorted list of grades, the outer query will fetch the middle items in the array. If the array contains an odd number of items, both values will be the single middle value.
- Then, the SELECT clause of the outer query returns the average of those two values as the median value.
As you can see, it's not as simple as running MEDIAN(column_name), but it's doable. Let's cross our fingers that MySQL will decide to add this functionality to the built-in product, which will simplify any query that fetches the median value.