Ask Dr. Math - Questions and Answers from our Archives

Defining Quartiles

Date: 07/20/2002 at 16:37:30
From: Tom
Subject: quartiles

Dear Dr. Math:

We have a project for statistics class where we have to collect a 
set of data, then find the mean, median, mode, range, upper 
quartile, lower quartile, interquartile range, and standard 
deviation. We also have to plot the data in a stem-and-leaf plot, 
dot plot, histogram, and box-and-whisker plot.

I decided to collect data on the heights of the players on our 
soccer team, and got the following data:

{70", 71", 71", 71", 72", 73", 74", 74", 74", 74", 75", 75", 77", 
77", 77", 82"}

I didn't have any problems until I was checking my work using my 
calculator and a computer. All the values agreed with my hand 
calculations except the upper quartile, lower quartile, and 
interquartile range.

When I calculated the quartiles and IQR following the textbook, I 
got 77" (UQ), 71" (LQ) and 6" (IQR). But when I plugged the values 
into my calculator (a TI-83), it gave the upper quartile as 76", the 
lower quartile as 71.5", and the IQR as 4.5". I then tried making an 
Excel spreadsheet and it gave the upper quartile as 75.5", the lower 
quartile as 71.75", and the IQR as 3.75". Then I went to the 
computer lab at school and tried using Minitab. That program gave 
the upper quartile as 76.5", the lower quartile as 71.25", and the 
IQR as 5.25".  If they just disagreed with my calculations, I'd 
figure that I made a mistake, or there's some sort of rounding going 
on, since we're told to take the nearest data point and these 
programs obviously don't. But they don't even agree with each other. 
They can't all be right! What's going on?

Please help clear up this mystery.

Thanks,
Tom


Date: 07/20/2002 at 16:45:49
From: Doctor Twe
Subject: Re: quartiles

Hi Tom! Thanks for writing to Dr. Math!

Quartiles are simple in concept but can be complicated in execution.

The concept of quartiles is that you arrange the data in ascending 
order and divide it into four roughly equal parts. The lower 
quartile is the part containing the lowest data values, the lower 
middle quartile is the part containing the next-lowest data values, 
the upper middle quartile is the part containing the next-highest 
data values, and the upper quartile is the part containing the 
highest data values.

Here's where it starts to get confusing. The terms 'quartile', 'upper
quartile' and 'lower quartile' each have two meanings. One definition
refers to the subset of all data values in each of those parts. For
example, if I say "my score was in the upper quartile on that math
test", I mean that my score was one of the values in the upper
quartile subset (i.e. the top 25% of all scores on that test). 

But the terms can also refer to cut-off values between the subsets.
The 'upper quartile' (sometimes labeled Q3 or UQ) can refer to a
cut-off value between the upper quartile subset and the upper middle
quartile subset. Similarly, the 'lower quartile' (sometimes labeled Q1
or LQ) can refer to a cut-off value between the lower quartile subset
and the lower middle quartile subset. 

The term 'quartiles' is sometimes used to collectively refer 
to these values plus the median (which is the cut-off value between 
the upper middle quartile subset and the lower middle quartile 
subset). John Tukey, the statistician who invented the box-and-
whisker plot, referred to these cut-off values as 'hinges' to avoid 
confusion. Unfortunately, not everyone followed his lead on that.

It gets worse. Statisticians don't agree on whether the quartile 
values ('hinges') should be points from the data set itself, or 
whether they can fall between the points (as the median can when 
there are an even number of data points). Furthermore, if the 
quartile value is not required to be a point in the data set itself, 
most data sets don't have a unique set of values {Q1, Q2, Q3} that 
divides the data into four "roughly equal" portions. The SAS 
statistical software package, for example, allows you to choose from 
among five different methods for calculating the quartile values. 
How then do we choose the "best" value for the quartiles?

The answer to that question depends in part on the statisticians' 
objective in finding quartile values. Tukey wanted a method that was 
simple to use, "without the aid of calculating machinery." Others 
seek to minimize the bias in selecting the quartile values. Still 
others want methods that can be extended to other quantiles (for 
example, quintiles or percentiles). Thus, different methods have 
been developed for calculating the quartile values.

Tukey's method for finding the quartile values is to find the median 
of the data set, then find the median of the upper and lower halves 
of the data set. If there are an odd number of values in the data 
set, include the median value in both halves when finding the 
quartile values. For example, if we have the data set:

   {1, 4, 9, 16, 25, 36, 49, 64, 81}

we first find the median value, which is 25. Since there are an odd 
number of values in the data set (9), we include the median in both 
halves. To find the quartile values, we must find the medians of:

   {1, 4, 9, 16, 25}  and  {25, 36, 49, 64, 81}

Since each of these subsets has an odd number of elements (5), we 
use the middle value. Thus the lower quartile value is 9 and the 
upper quartile value is 49.
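
Here is a minimal Python sketch of Tukey's method as described above. 
The function name tukey_hinges is just an illustrative choice, and 
the sketch assumes the data set is non-empty. The second call uses 
Tom's height data.

```python
from statistics import median

def tukey_hinges(data):
    """Lower and upper quartile values by Tukey's method.

    When n is odd, the median is included in both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2        # size of each half; counts the median when n is odd
    lower = xs[:half]          # lowest `half` values
    upper = xs[n - half:]      # highest `half` values
    return median(lower), median(upper)

print(tukey_hinges([1, 4, 9, 16, 25, 36, 49, 64, 81]))   # (9, 49)

heights = [70, 71, 71, 71, 72, 73, 74, 74, 74, 74, 75, 75, 77, 77, 77, 82]
print(tukey_hinges(heights))                              # (71.5, 76.0)
```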

The TI-83 uses a method described by Moore and McCabe (sometimes 
referred to as "M-and-M") to find quartile values. Their method is 
similar to Tukey's, but you *don't* include the median in either 
half when finding the quartile values. Using M-and-M on the data set 
above:

   {1, 4, 9, 16, 25, 36, 49, 64, 81}

we first find that the median value is 25. This time we'll exclude 
the median from each half. To find the quartile values, we must find 
the medians of:

   {1, 4, 9, 16}  and  {36, 49, 64, 81}

Since each of these data sets has an even number of elements (4), we 
average the middle two values. Thus the lower quartile value is 
(4+9)/2 = 6.5 and the upper quartile value is (49+64)/2 = 56.5.
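
The M-and-M method can be sketched the same way. The only change from 
Tukey's method is that the median is left out of both halves when n 
is odd. The name mm_quartiles is illustrative, and the sketch assumes 
at least two data points. Applied to Tom's heights, it reproduces the 
values his TI-83 reported.

```python
from statistics import median

def mm_quartiles(data):
    """Lower and upper quartile values by the Moore and McCabe method.

    When n is odd, the median is excluded from both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2              # size of each half; skips the median when n is odd
    lower = xs[:half]
    upper = xs[n - half:]
    return median(lower), median(upper)

print(mm_quartiles([1, 4, 9, 16, 25, 36, 49, 64, 81]))   # (6.5, 56.5)

heights = [70, 71, 71, 71, 72, 73, 74, 74, 74, 74, 75, 75, 77, 77, 77, 82]
print(mm_quartiles(heights))                              # (71.5, 76.0)
```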

With each of the above methods, the quartile values are always 
either one of the data points, or exactly half way between two data 
points.

Those methods involve only simple arithmetic and are easily 
extendable to octiles (eighths), hexadeciles (sixteenths), etc. They 
are not, however, extendable to quintiles (fifths) or percentiles 
(hundredths), etc. Furthermore, they tend to have a high bias. (That 
is, the quartile values calculated on subsets of the data set tend 
to vary more, and are not good predictors of the quartile values of 
the entire data set.)

Mendenhall and Sincich, in their text _Statistics for Engineering 
and the Sciences_, define a different method of finding quartile 
values. To apply their method on a data set with n elements, first 
calculate:

   L = (1/4)(n+1)

and round to the nearest integer. If L falls halfway between two 
integers, round up. The Lth element is the lower quartile value. 
Next calculate:

   U = (3/4)(n+1)

and round to the nearest integer. If U falls halfway between two 
integers, round down. The Uth element is the upper quartile value. 
So for our example data set:

   {1, 4, 9, 16, 25, 36, 49, 64, 81}

n = 9, so

   L = (1/4)(9+1) = 2.5

which becomes 3 after rounding up. The lower quartile value is the 
3rd data point, 9. Similarly:

   U = (3/4)(9+1) = 7.5

which becomes 7 after rounding down. The upper quartile value is the 
7th data point, 49.

Using this method, the upper and lower quartile values are always 
two of the data points.
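
A rough Python sketch of the Mendenhall and Sincich rule follows. The 
name ms_quartiles is illustrative. Exact fractions are used so that 
the "halfway" cases are detected reliably: L is rounded half up and U 
is rounded half down, as described above. Applied to Tom's heights, 
it gives the 71" and 77" he found by hand.

```python
import math
from fractions import Fraction

def ms_quartiles(data):
    """Lower and upper quartile values by the Mendenhall and Sincich rule."""
    xs = sorted(data)
    n = len(xs)
    half = Fraction(1, 2)
    L = Fraction(n + 1, 4)           # position of the lower quartile
    U = Fraction(3 * (n + 1), 4)     # position of the upper quartile
    l = math.floor(L + half)         # nearest integer, ties rounded up
    u = math.ceil(U - half)          # nearest integer, ties rounded down
    return xs[l - 1], xs[u - 1]      # positions are 1-based

print(ms_quartiles([1, 4, 9, 16, 25, 36, 49, 64, 81]))   # (9, 49)

heights = [70, 71, 71, 71, 72, 73, 74, 74, 74, 74, 75, 75, 77, 77, 77, 82]
print(ms_quartiles(heights))                              # (71, 77)
```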

Minitab uses the same values of L and U as Mendenhall and Sincich, 
except it doesn't round them. Instead, it uses linear interpolation 
between the two closest data points. For our example above, instead 
of rounding L to 
3, Minitab would let L = 2.5 and find the value half way between the 
2nd and 3rd data points. In our example, that would be (4+9)/2 = 
6.5. Similarly, the upper quartile value would be half way between 
the 7th and 8th data points, which would be (49+64)/2 = 56.5. If L 
were 2.25, Minitab would find the value one fourth of the way 
between the 2nd and 3rd data points and if L were 2.75, Minitab 
would find the value three fourths of the way between the 2nd and 
3rd data points.
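
The interpolation step can be sketched in Python as below. This 
follows the description above rather than Minitab's actual code, the 
names are illustrative, and the sketch assumes at least three data 
points so that both positions fall inside the data set. Applied to 
Tom's heights, it reproduces the values he got from Minitab.

```python
def interpolated_quartiles(data):
    """Quartile values at positions (n+1)/4 and 3(n+1)/4, with interpolation."""
    xs = sorted(data)
    n = len(xs)

    def value_at(pos):
        # pos is a 1-based position that may fall between two data points
        k = int(pos)                 # whole part: the kth data point
        frac = pos - k               # how far to move toward the (k+1)th point
        if frac == 0:
            return xs[k - 1]
        return xs[k - 1] + frac * (xs[k] - xs[k - 1])

    return value_at((n + 1) / 4), value_at(3 * (n + 1) / 4)

print(interpolated_quartiles([1, 4, 9, 16, 25, 36, 49, 64, 81]))   # (6.5, 56.5)

heights = [70, 71, 71, 71, 72, 73, 74, 74, 74, 74, 75, 75, 77, 77, 77, 82]
print(interpolated_quartiles(heights))                              # (71.25, 76.5)
```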

Excel uses a method described by Freund and Perles, which almost no 
one else uses. To apply this method on a data set with n elements, 
Excel first calculates L = (1/4)(n+3). The Lth element is the lower 
quartile value. If L is not an integer, Excel uses linear 
interpolation. Next it calculates U = (1/4)(3n+1). The Uth element 
is the upper quartile value. If U is not an integer, Excel again 
uses linear interpolation. So for our example data set:

   {1, 4, 9, 16, 25, 36, 49, 64, 81}

n = 9, so

   L = (1/4)(9+3) = 3

The lower quartile value is the 3rd data point, 9.

   U = (1/4)(3*9+1) = 7

The upper quartile value is the 7th data point, 49.
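
Here is a short Python sketch of the Freund and Perles positions with 
linear interpolation, following the description above. The function 
name is illustrative and this is not Excel's own code. Applied to 
Tom's heights, it reproduces the values from his spreadsheet.

```python
def freund_perles_quartiles(data):
    """Quartile values at positions (n+3)/4 and (3n+1)/4, with interpolation."""
    xs = sorted(data)
    n = len(xs)

    def value_at(pos):
        # pos is a 1-based position that may fall between two data points
        k = int(pos)                 # whole part: the kth data point
        frac = pos - k               # how far to move toward the (k+1)th point
        if frac == 0:
            return xs[k - 1]
        return xs[k - 1] + frac * (xs[k] - xs[k - 1])

    return value_at((n + 3) / 4), value_at((3 * n + 1) / 4)

print(freund_perles_quartiles([1, 4, 9, 16, 25, 36, 49, 64, 81]))   # (9, 49)

heights = [70, 71, 71, 71, 72, 73, 74, 74, 74, 74, 75, 75, 77, 77, 77, 82]
print(freund_perles_quartiles(heights))                              # (71.75, 75.5)
```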

As we can see, these methods sometimes (but not always) produce the 
same results. To further illustrate, consider the following data 
sets:

     A = {1, 2, 3, 4, 5, 6, 7, 8}
     B = {1, 2, 3, 4, 5, 6, 7, 8, 9}
     C = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
     D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

Here are the upper and lower quartile values, as calculated by each 
method described above:

              Tukey  M&M  M&S  Mini  Excel
              -----  ---  ---  ----  -----
   Set A LQ:   2.5   2.5   2   2.25   2.75
         UQ:   6.5   6.5   7   6.75   6.25

   Set B LQ:   3.0   2.5   3   2.50   3.00
         UQ:   7.0   7.5   7   7.50   7.00

   Set C LQ:   3.0   3.0   3   2.75   3.25
         UQ:   8.0   8.0   8   8.25   7.75

   Set D LQ:   3.5   3.0   3   3.00   3.50
         UQ:   8.5   9.0   9   9.00   8.50


For more information on how and why different software packages 
calculate the quartile values, check out:

   Ancillary Notes on Quartiles 
   http://wwwmaths.murdoch.edu.au/units/c503a/unitnotes/boxhisto/quartilesmore.html

   Ticky-Tacky Boxes
   http://exploringdata.cqu.edu.au/ticktack.htm 

   Quartiles: How to calculate them?
   (This is a document in Microsoft Word format)
   http://www-wl.itss.nerc.ac.uk/products/sas/doc/quartiles.doc 


I hope this helps! If you have any more questions, write back!

- Doctor TWE, The Math Forum
  http://mathforum.org/dr.math/ 


Date: 07/20/2002 at 16:51:50
From: Tom
Subject: Thank you (quartiles)

Thanks, Dr. Math!  That really helps clear things up!
Tom

Associated Topics:
College Statistics
High School Statistics


Ask Dr. Math(TM)
© 1994-2015 The Math Forum
http://mathforum.org/dr.math/