Defining Quartiles

Date: 07/20/2002 at 16:37:30
From: Tom
Subject: quartiles

Dear Dr. Math:

We have a project for statistics class where we have to collect a set of data, then find the mean, median, mode, range, upper quartile, lower quartile, interquartile range, and standard deviation. We also have to plot the data in a stem-and-leaf plot, dot plot, histogram, and box-and-whisker plot. I decided to collect data on the heights of the players on our soccer team, and got the following data:

   {70", 71", 71", 71", 72", 73", 74", 74", 74", 74", 75", 75", 77", 77", 77", 82"}

I didn't have any problems until I was checking my work using my calculator and a computer. All the values agreed with my hand calculations except the upper quartile, lower quartile, and interquartile range. When I calculated the quartiles and IQR following the textbook, I got 77" (UQ), 71" (LQ) and 6" (IQR). But when I plugged the values into my calculator (a TI-83), it gave the upper quartile as 76", the lower quartile as 71.5", and the IQR as 4.5". I then tried making an Excel spreadsheet and it gave the upper quartile as 75.5", the lower quartile as 71.75", and the IQR as 3.75". Then I went to the computer lab at school and tried using Minitab. That program gave the upper quartile as 76.5", the lower quartile as 71.25", and the IQR as 5.25".

If they just disagreed with my calculations, I'd figure that I made a mistake, or there's some sort of rounding going on, since we're told to take the nearest data point and these programs obviously don't. But they don't even agree with each other. They can't all be right! What's going on? Please help clear up this mystery.

Thanks,
Tom


Date: 07/20/2002 at 16:45:49
From: Doctor Twe
Subject: Re: quartiles

Hi Tom!

Thanks for writing to Dr. Math!

Quartiles are simple in concept but can be complicated in execution. The concept of quartiles is that you arrange the data in ascending order and divide it into four roughly equal parts.
The upper quartile is the part containing the highest data values, the upper middle quartile is the part containing the next-highest data values, the lower quartile is the part containing the lowest data values, while the lower middle quartile is the part containing the next-lowest data values.

Here's where it starts to get confusing. The terms 'quartile', 'upper quartile' and 'lower quartile' each have two meanings. One definition refers to the subset of all data values in each of those parts. For example, if I say "my score was in the upper quartile on that math test", I mean that my score was one of the values in the upper quartile subset (i.e. the top 25% of all scores on that test).

But the terms can also refer to cut-off values between the subsets. The 'upper quartile' (sometimes labeled Q3 or UQ) can refer to a cut-off value between the upper quartile subset and the upper middle quartile subset. Similarly, the 'lower quartile' (sometimes labeled Q1 or LQ) can refer to a cut-off value between the lower quartile subset and the lower middle quartile subset.

The term 'quartiles' is sometimes used to collectively refer to these values plus the median (which is the cut-off value between the upper middle quartile subset and the lower middle quartile subset). John Tukey, the statistician who invented the box-and-whisker plot, referred to these cut-off values as 'hinges' to avoid confusion. Unfortunately, not everyone followed his lead on that.

It gets worse. Statisticians don't agree on whether the quartile values ('hinges') should be points from the data set itself, or whether they can fall between the points (as the median can when there are an even number of data points). Furthermore, if the quartile value is not required to be a point in the data set itself, most data sets don't have a unique set of values {Q1, Q2, Q3} that divides the data into four "roughly equal" portions.
The SAS statistical software package, for example, allows you to choose from among five different methods for calculating the quartile values.

How then do we choose the "best" value for the quartiles? The answer to that question depends in part on the statisticians' objective in finding quartile values. Tukey wanted a method that was simple to use, "without the aid of calculating machinery." Others seek to minimize the bias in selecting the quartile values. Still others want methods that can be extended to other quantiles (for example, quintiles or percentiles). Thus, different methods have been developed for calculating the quartile values.

Tukey's method for finding the quartile values is to find the median of the data set, then find the median of the upper and lower halves of the data set. If there are an odd number of values in the data set, include the median value in both halves when finding the quartile values. For example, if we have the data set:

   {1, 4, 9, 16, 25, 36, 49, 64, 81}

we first find the median value, which is 25. Since there are an odd number of values in the data set (9), we include the median in both halves. To find the quartile values, we must find the medians of:

   {1, 4, 9, 16, 25} and {25, 36, 49, 64, 81}

Since each of these subsets has an odd number of elements (5), we use the middle value. Thus the lower quartile value is 9 and the upper quartile value is 49.

The TI-83 uses a method described by Moore and McCabe (sometimes referred to as "M-and-M") to find quartile values. Their method is similar to Tukey's, but you *don't* include the median in either half when finding the quartile values. Using M-and-M on the data set above:

   {1, 4, 9, 16, 25, 36, 49, 64, 81}

we first find that the median value is 25. This time we'll exclude the median from each half.
To find the quartile values, we must find the medians of:

   {1, 4, 9, 16} and {36, 49, 64, 81}

Since each of these data sets has an even number of elements (4), we average the middle two values. Thus the lower quartile value is (4+9)/2 = 6.5 and the upper quartile value is (49+64)/2 = 56.5.

With each of the above methods, the quartile values are always either one of the data points, or exactly half way between two data points. Those methods involve only simple arithmetic and are easily extendable to octiles (eighths), hexadeciles (sixteenths), etc. They are not, however, extendable to quintiles (fifths) or percentiles (hundredths), etc. Furthermore, they tend to have a high bias. (That is, the quartile values calculated on subsets of the data set tend to vary more, and are not good predictors of the quartile values of the entire data set.)

Mendenhall and Sincich, in their text _Statistics for Engineering and the Sciences_, define a different method of finding quartile values. To apply their method on a data set with n elements, first calculate:

   L = (1/4)(n+1)

and round to the nearest integer. If L falls halfway between two integers, round up. The Lth element is the lower quartile value. Next calculate:

   U = (3/4)(n+1)

and round to the nearest integer. If U falls halfway between two integers, round down. The Uth element is the upper quartile value. So for our example data set:

   {1, 4, 9, 16, 25, 36, 49, 64, 81}

n = 9, so

   L = (1/4)(9+1) = 2.5

which becomes 3 after rounding up. The lower quartile value is the 3rd data point, 9. Similarly:

   U = (3/4)(9+1) = 7.5

which becomes 7 after rounding down. The upper quartile value is the 7th data point, 49. Using this method, the upper and lower quartile values are always two of the data points.

Minitab uses the same method, except it doesn't round the values of L and U. Instead, it uses linear interpolation between the two closest data points.
For our example above, instead of rounding L to 3, Minitab would let L = 2.5 and find the value half way between the 2nd and 3rd data points. In our example, that would be (4+9)/2 = 6.5. Similarly, the upper quartile value would be half way between the 7th and 8th data points, which would be (49+64)/2 = 56.5. If L were 2.25, Minitab would find the value one fourth of the way between the 2nd and 3rd data points and if L were 2.75, Minitab would find the value three fourths of the way between the 2nd and 3rd data points.

Excel uses a method described by Freund and Perles, which almost no one else uses. To apply this method on a data set with n elements, Excel first calculates

   L = (1/4)(n+3)

The Lth element is the lower quartile value. If L is not an integer, Excel uses linear interpolation. Next it calculates

   U = (1/4)(3n+1)

The Uth element is the upper quartile value. If U is not an integer, Excel again uses linear interpolation. So for our example data set:

   {1, 4, 9, 16, 25, 36, 49, 64, 81}

n = 9, so

   L = (1/4)(9+3) = 3

The lower quartile value is the 3rd data point, 9.

   U = (1/4)(3*9+1) = 7

The upper quartile value is the 7th data point, 49.

As we can see, these methods sometimes (but not always) produce the same results.
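
For anyone who wants to experiment, the five procedures described above can be sketched in Python roughly as follows. This is only an illustration of the rules as stated in this answer, not the actual code inside the TI-83, Minitab, or Excel. The function names (tukey, moore_mccabe, mendenhall_sincich, minitab, excel) and the helper value_at are made up for this sketch, and the interpolating methods are assumed to be given at least three data points.

```python
import math
from statistics import median


def tukey(data):
    """Medians of the two halves; the overall median joins both halves when n is odd."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2  # size of each half, counting the median when n is odd
    return median(xs[:half]), median(xs[n - half:])


def moore_mccabe(data):
    """Medians of the two halves; the overall median is left out of both when n is odd."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    return median(xs[:half]), median(xs[n - half:])


def mendenhall_sincich(data):
    """Round L = (n+1)/4 (halves up) and U = 3(n+1)/4 (halves down), then pick those points."""
    xs = sorted(data)
    n = len(xs)
    l = math.floor((n + 1) / 4 + 0.5)     # nearest integer, ties rounded up
    u = math.ceil(3 * (n + 1) / 4 - 0.5)  # nearest integer, ties rounded down
    return xs[l - 1], xs[u - 1]


def value_at(xs, pos):
    """Value at 1-based position pos, interpolating linearly between neighbours.

    Hypothetical helper; assumes 1 <= pos <= len(xs).
    """
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return xs[lo - 1]
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


def minitab(data):
    """Same positions as Mendenhall and Sincich, but interpolated instead of rounded."""
    xs = sorted(data)
    n = len(xs)
    return value_at(xs, (n + 1) / 4), value_at(xs, 3 * (n + 1) / 4)


def excel(data):
    """Freund and Perles positions: L = (n+3)/4 and U = (3n+1)/4, interpolated."""
    xs = sorted(data)
    n = len(xs)
    return value_at(xs, (n + 3) / 4), value_at(xs, (3 * n + 1) / 4)


# Tom's soccer-team heights, in inches
heights = [70, 71, 71, 71, 72, 73, 74, 74, 74, 74, 75, 75, 77, 77, 77, 82]

for method in (tukey, moore_mccabe, mendenhall_sincich, minitab, excel):
    lq, uq = method(heights)
    print(f"{method.__name__:20s} LQ={lq}  UQ={uq}  IQR={uq - lq}")
```

Run on the soccer-team heights, this sketch gives 71 and 77 for the textbook (Mendenhall and Sincich) rule, 71.5 and 76 for the Moore and McCabe rule, 71.75 and 75.5 for the Excel rule, and 71.25 and 76.5 for the Minitab rule, which matches what each tool reported.
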
To further illustrate, consider the following data sets:

   A = {1, 2, 3, 4, 5, 6, 7, 8}
   B = {1, 2, 3, 4, 5, 6, 7, 8, 9}
   C = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
   D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

Here are the upper and lower quartile values, as calculated by each method described above (M&M = Moore and McCabe, M&S = Mendenhall and Sincich, Mini = Minitab):

                Tukey    M&M    M&S    Mini    Excel
                -----    ---    ---    ----    -----
   Set A  LQ:    2.5     2.5     2     2.25    2.75
          UQ:    6.5     6.5     7     6.75    6.25

   Set B  LQ:    3.0     2.5     3     2.50    3.00
          UQ:    7.0     7.5     7     7.50    7.00

   Set C  LQ:    3.0     3.0     3     2.75    3.25
          UQ:    8.0     8.0     8     8.25    7.75

   Set D  LQ:    3.5     3.0     3     3.00    3.50
          UQ:    8.5     9.0     9     9.00    8.50

For more information on how and why different software packages calculate the quartile values, check out:

   Ancillary Notes on Quartiles
   http://wwwmaths.murdoch.edu.au/units/c503a/unitnotes/boxhisto/quartilesmore.html

   Ticky-Tacky Boxes
   http://exploringdata.cqu.edu.au/ticktack.htm

   Quartiles: How to calculate them?
   (This is a document in Microsoft Word format)
   http://www-wl.itss.nerc.ac.uk/products/sas/doc/quartiles.doc

I hope this helps! If you have any more questions, write back!

- Doctor TWE, The Math Forum
  http://mathforum.org/dr.math/


Date: 07/20/2002 at 16:51:50
From: Tom
Subject: Thank you (quartiles)

Thanks, Dr. Math! That really helps clear things up!

Tom
Ask Dr. Math™
© 1994-2015 The Math Forum
http://mathforum.org/dr.math/