Session Guide
Sampling to Study Drug Use




The purpose of this session is to expose participants to the range of methods available for sampling. Participants will be taught the underlying principles of sampling and will undertake practical sampling exercises. Participants should understand that each sampling method has different strengths and weaknesses, and can be used in different circumstances.

Studying drug use in health facilities may require a range of sampling techniques. Participants should also learn how to teach field staff to sample accurately under varying field situations.


By the end of the session participants will be able to:

1. Describe the principles on which sampling is based.

2. Identify and describe different sampling methods.

3. Select a sample of health facilities from a list of such facilities using at least two different methods.

4. Compare the results of the methods and discuss when each method might be used in the field.

5. Instruct others how to sample when studying drug use.


1. Designing and Conducting Health Systems Research Projects, Vol. 2, Part 1, IDRC/WHO, 1991, p. 196-220. Available from IDRC, PO Box 8500, Ottawa, Ontario, Canada K1G 3H9.

2. Vaughan, J. P., and Morrow, R. H., Manual of Epidemiology for District Health Management, WHO Geneva, 1989, p. 76-78.

3. Beaglehole, R., Bonita, R., Kjellström, T., Basic Epidemiology, WHO Geneva 1993, p. 46.

4. Lwanga, S. K., and Lemeshow, S., Sample Size Determination in Health Studies, WHO Geneva, 1991.

5. EPI INFO Version 6 Manual, p. 135-136, plus use of the program.




Sampling is a process by which we study a small part of a population to make judgments about that population. We sample as a part of our daily lives.

For example, if we want to know if Fruit Seller A is better than Fruit Seller B in a market, we would go to Seller A and examine a number of his/her fruits for quality and price and then repeat the process for Seller B. It would not be fair or accurate to base judgment on a single fruit from each seller and it would be impractical to check every fruit in the stall. The questions are: How do we select which fruit to examine, and how many to examine? Obviously, if we only examine the fruit at the front of the stall we may get an incorrect answer. So we need a sampling method that lets us judge all the fruit by examining only a sample of it.

Whenever we want to learn about health in the community or practices in the health system, we need to draw samples since it would be impractical to collect data on every person or event. In drug use surveys we need to draw samples to select facilities to survey, prescriptions to study, or patients to observe.

If we wanted to know about prescribing in primary care facilities in a country we would probably get the wrong impression of the real situation if we only surveyed the five health centers closest to the central office of the Ministry of Health because these would likely be better than the average. If we wanted to examine prescribing in a health center we would be misled if we surveyed the first 20 cases attending on a Monday morning. There may be an excess of men with hangovers from weekend drinking or workers wanting sick notes to excuse them from going to work. So, to get a representative sample we would need to ensure that all facilities or patients can be included in the survey.





Sampling involves the selection of a number of study units from a defined study population.


A study unit may be a person, a health facility, a prescription, or another such unit.

The study population, sometimes called the reference population, is the entire collection of all possible study units. Again, this population may be people, health facilities, prescriptions or other such units.

A representative sample has all the important characteristics of the population from which it is drawn.


A sampling frame is a list of all of the available units in the study population. If a complete listing is available, the sampling frame is identical to the study population. The method of sampling depends on whether a sampling frame is available. If a sampling frame exists, or if it can be created, probability sampling is used. If none is available, probability sampling cannot be used.

There are two broad types of sampling methods. These are:

• Non-probability sampling

• Probability sampling

It is always better to use probability sampling if this is possible. However, in some situations, non-probability sampling is the only possible method. A sample drawn using non-probability methods is likely to be less representative than a probability sample, and so the study results are less valid. When non-probability sampling is used in a survey, this fact should be stated in any report.


If a sampling frame is not available or it cannot be created, a non-probability sampling method will need to be used. There are two common methods. These are convenience sampling and quota sampling.

Convenience Sampling is a method by which, for convenience sake, the study units that happen to be available at the time of data collection are selected in the sample. This is the least representative sampling method.



Quota sampling is a method by which different categories of sample units are included to ensure that the sample contains units from all these categories. For example, a quota sample of patients from a health center might include 10 patients with ARI, 10 with diarrhea, and 10 with malaria.

Both of these methods may be used in drug use studies. When measuring prescribing and dispensing times or assessing patient understanding, observing or interviewing a convenience sample of patients may be the only practical method. A quota sample may be used for males and females to ensure that both genders are observed or interviewed. This may be important since men are often treated with more respect and have higher literacy rates. Also, men may be given priority over women or children and receive more thorough care.

Non-probability samples are not necessarily representative of the reference population. However, we often need to use these methods when we have inadequate records or sampling frames, or when a time constraint forces us to use them.


If a sampling frame (a list of the population units) exists then probability sampling may be used.

Probability sampling involves RANDOM selection procedures to ensure that each sample unit is chosen on the basis of CHANCE.

Whenever possible, use probability sampling to obtain results which are less biased. There are a number of different methods.

1. Simple Random Sampling


This is the simplest form of probability sampling. A lottery is an example of a random sample. The simple random sampling procedure is as follows:

a) Make a numbered list of all units in the reference population from which you will select the sample (for example, a list of all the health centers in the country).

b) Decide on the size of the sample (for the WHO Drug Use Indicators method this would be a minimum of 20 facilities).

c) Choose the facilities to include by a lottery method. (For example the numbers of all the facilities can be placed in a box and drawn, a random number table can be used, or random numbers can be generated using a spreadsheet or calculator.)

This is the method used in Worksheet 1.1.
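Where a computer is available, the lottery in step (c) can be sketched as below. This is only an illustration: the function name `simple_random_sample`, the list size of 46, and the fixed seed are assumptions chosen for the example, not part of the worksheet.

```python
import random

def simple_random_sample(n_units, sample_size, rng=None):
    """Draw sample_size distinct unit numbers from 1..n_units by lottery."""
    rng = rng or random.Random()
    # random.sample draws without replacement, like pulling numbered slips from a box
    return sorted(rng.sample(range(1, n_units + 1), sample_size))

# Illustrative call: 20 facilities from a numbered list of 46 (seed fixed only for repeatability)
chosen = simple_random_sample(n_units=46, sample_size=20, rng=random.Random(1))
print(chosen)
```

In real fieldwork the seed would be left out so that the draw is not predictable.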

2. Systematic Sampling

In systematic sampling, sample units are selected from a numbered list of all units in the study population by using a regular interval, starting from a random starting point.

To calculate the sampling interval, divide the size of the list by the desired sample size. For example, if we want to select 20 health centers from a list of 46 in our sampling frame, our sampling interval would be 46/20 = 2.3.

The first facility chosen in this case can be either 1, 2 or 3, which are all the possible sampling units within the first sampling interval. This is selected by (1) choosing a random number between 0 and 1 (with at least 3 digits after the decimal point), then (2) multiplying this random number by the sampling interval, and (3) rounding this result upward to get the number of the first facility. For example, if the random number chosen is 0.183, the first unit for the sample is 0.183 x 2.3 = 0.421 which rounds upward to 1, so the first facility on the list is chosen for the sample.

Later facilities are selected by adding the sampling interval to the previous result. If the first result was 0.421, then the next facilities selected would be:

Facility 1

0.421 + 2.3 = 2.721 so Facility 3 (Remember: always round upward)

2.721 + 2.3 = 5.021 so Facility 6

5.021 + 2.3 = 7.321 so Facility 8

and so forth.

If the first result had been 1.749, then the first facility would be Facility 2, and the next facilities selected would be:

Facility 2

1.749 + 2.3 = 4.049 so Facility 5

4.049 + 2.3 = 6.349 so Facility 7

6.349 + 2.3 = 8.649 so Facility 9

and so forth.
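The two worked examples above can be reproduced with a short sketch. The function name `systematic_sample` and its parameters are assumptions made for illustration. Rounding upward is done here by dropping the decimals and adding 1, which keeps every result between 1 and the size of the list.

```python
import math

def systematic_sample(n_units, sample_size, random_fraction):
    """Return 1-based unit numbers chosen at a regular (possibly fractional) interval."""
    interval = n_units / sample_size       # e.g. 46 / 20 = 2.3
    start = random_fraction * interval     # e.g. 0.183 * 2.3 = 0.4209
    # Drop the decimals and add 1: this rounds any non-whole value upward
    return [math.floor(start + k * interval) + 1 for k in range(sample_size)]

facilities = systematic_sample(46, 20, 0.183)
print(facilities[:4])   # [1, 3, 6, 8]
```

A random fraction of about 0.760 gives a first result of about 1.749 and so reproduces the second example (Facilities 2, 5, 7, 9).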

The method just described gives every unit an equal chance of being selected. This method is used in Worksheet 1.2. It can also be used, with minor modification, to select units in a way that allows for how large they are.

Sometimes it is desirable for clinics serving larger populations to have a greater chance of being included in a sample. This method is called sampling with probability proportional to size. It is used in Worksheet 1.3.

Systematic sampling is also useful when sampling prescriptions from a patient register. If a register contains 100 pages each with 25 lines of prescriptions and you need to select 30 prescriptions, the sampling interval would be:

(100 x 25) ÷ 30 = 83.3


Thus every 83rd prescription would be sampled. An alternative method described below called multistage sampling could also be used to select a sample from a patient register.


3. Stratified Sampling

Stratified sampling is used when the reference population contains clearly different sub-populations which should be considered separately.

For example, this might be the case in a study which included urban and rural facilities, facilities with or without doctors, or male and female patients.

When stratified sampling is used, the sample frame (the list of the overall population) is sorted into two or more groups. These different strata (groups) may then be sampled either randomly or systematically.


In our fruit sellers example, we might want to check the quality and price of each of the varieties of fruit sold rather than a single fruit.


The WHO manual (p. 59-60) recommends the use of stratified systematic sampling methods for selecting facilities. For example, the sampling frame might include the following list of facilities:

[Table not reproduced: list of facilities by facility number]

This could then be grouped and sorted into 2 strata as follows:

[Table not reproduced: the same facilities sorted into an urban list and a rural list]

and a sample would be selected separately from both the urban list and the rural list.

This is the method used in Worksheets 1.4, 1.5, and 1.6.
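A minimal sketch of stratified sampling follows, assuming the sample is divided between strata in proportion to their size and that facilities are drawn at random within each stratum. The function names, the largest-remainder rounding rule, and the frame of 21 urban and 25 rural facilities are all assumptions chosen for the example.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their size."""
    total = sum(stratum_sizes.values())
    exact = {s: sample_size * n / total for s, n in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = sample_size - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts
    by_fraction = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_fraction[:leftover]:
        alloc[s] += 1
    return alloc

def stratified_sample(strata, sample_size, rng=None):
    """strata maps a stratum name to its list of facility numbers."""
    rng = rng or random.Random()
    sizes = {s: len(units) for s, units in strata.items()}
    alloc = proportional_allocation(sizes, sample_size)
    return {s: sorted(rng.sample(units, alloc[s])) for s, units in strata.items()}

# Hypothetical frame: facilities 1-21 are urban, 22-46 are rural
strata = {"urban": list(range(1, 22)), "rural": list(range(22, 47))}
sample = stratified_sample(strata, 20, random.Random(7))
print({s: len(v) for s, v in sample.items()})   # {'urban': 9, 'rural': 11}
```

Systematic sampling could be substituted for the random draw within each stratum.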

4. Cluster Sampling

In a cluster sample, a group of sample units is selected together, rather than each unit being selected separately. This method may not be as representative as single unit sampling, but for logistic reasons may be necessary. The sampling procedure recommended by the WHO Expanded Programme on Immunization (EPI), which selects 30 groups of 7 children, is a common cluster sampling method.

The main advantage of cluster sampling is that the method is easy to use and often logistically simpler to organize. For example, when choosing houses in a community, it is easier to walk between neighboring houses than all over the community. The disadvantage is that the samples selected may be less representative especially when the number of clusters selected is low. Thus, when the cluster method is used, try to increase the sample size (that is, by increasing the number of clusters sampled). As a rough guide, double the sample size if cluster sampling is used.

In drug use studies, cluster sampling may be used for selecting facilities when distances are great. For example, a cluster of two facilities could be selected by randomly selecting a single facility and then selecting the closest facility to the one selected. This method might allow two facilities to be surveyed in a day when travel time between facilities is significant. This method is used in Worksheet 1.7.

5. Multistage Sampling

In multistage sampling, the methods described above can be combined. For example, we might wish to select 32 health facilities in a country containing 56 districts, each of which contains a number of health facilities. From the 56 districts, 16 districts would first be selected. In each district two health facilities would then be randomly selected. This would be two stage random sampling.
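The two-stage example can be sketched as follows. The district and facility names are invented for illustration, as is the assumption that every district has five facilities. Only the numbers 56, 16, and 2 come from the example above.

```python
import random

def two_stage_sample(districts, n_districts, per_district, rng=None):
    """districts maps a district name to its list of facility identifiers."""
    rng = rng or random.Random()
    # Stage 1: choose districts at random
    stage_one = rng.sample(sorted(districts), n_districts)
    # Stage 2: choose facilities at random within each selected district
    return {d: rng.sample(districts[d], per_district) for d in stage_one}

# Hypothetical frame: 56 districts, each with 5 facilities
frame = {f"D{i:02d}": [f"D{i:02d}-F{j}" for j in range(1, 6)] for i in range(1, 57)}
selected = two_stage_sample(frame, n_districts=16, per_district=2, rng=random.Random(3))
print(sum(len(v) for v in selected.values()))   # 32
```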

In the example above of selecting 30 prescriptions from a patient ledger of 100 pages containing 25 prescriptions per page, we can use systematic sampling of the pages and random selection of one prescription per page.

Thus we would calculate the sampling interval for the pages:

100 ÷ 30 = 3.3

Randomly select the starting page from 1-4 as described above, then add 3.3 repeatedly to select the page numbers. On each page, randomly select a number from 1-25 using a calculator or a random number table to determine which individual prescription is chosen on that page.
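The ledger example can be sketched as below. The function name `sample_register` and the fixed seed are assumptions made for illustration. Pages are chosen systematically and one line is chosen at random on each selected page.

```python
import math
import random

def sample_register(n_pages, lines_per_page, sample_size, rng=None):
    """Return (page, line) pairs: systematic pages, one random line per page."""
    rng = rng or random.Random()
    interval = n_pages / sample_size      # 100 / 30 = 3.33...
    start = rng.random() * interval       # random start inside the first interval
    picks = []
    for k in range(sample_size):
        page = math.floor(start + k * interval) + 1   # round upward to a page number
        line = rng.randint(1, lines_per_page)         # random prescription on that page
        picks.append((page, line))
    return picks

picks = sample_register(100, 25, 30, random.Random(11))
print(picks[:3])
```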


The method of deciding the desirable sample size is described in the HSR manual, p. 205-209 and Table 7.1, and in the WHO Manual of Epidemiology for District Health Management, p. 78. The EPI-INFO computer package contains a statistical calculator which calculates minimum sample sizes based on different assumptions.

The appropriate sample size depends on:

• Expected variation of the data. The more variation, the larger the sample required.

• The expected rate of the variable. For example, a smaller sample will be required to obtain the same degree of relative accuracy if the rate of antibiotic prescribing is 50% than if the rate is 15%.

• The degree of accuracy required. Because the entire population is not studied, every sample has some degree of uncertainty. The larger the sample, the less the uncertainty. This uncertainty is often expressed as a 95% confidence interval (CI). For example, if we find a rate of 50% in a sample, how sure can we be that this is the true rate in the reference population?

For a sample size of 50, the range of certainty will be 36% - 64% (95% CI)

For a sample size of 100, the range will be 40% - 60%

For a sample size of 200, the range will be 43% - 57%

For a sample size of 500, the range will be 45% - 55%

For a sample size of 1000, the range will be 47% - 53%

Increasing the size of the sample increases the certainty, but after a certain point the value of the increase in certainty is not in proportion to the increase in effort and resources to collect the data.
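Ranges like those listed above can be approximated with the usual normal approximation for a proportion. This is a sketch under that assumption. The source does not state which formula was used, so the printed values may differ from the list by a percentage point because of rounding.

```python
import math

Z_95 = 1.96  # standard normal value for a 95% confidence interval

def ci_95(rate, n):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    half_width = Z_95 * math.sqrt(rate * (1 - rate) / n)
    return rate - half_width, rate + half_width

for n in (50, 100, 200, 500, 1000):
    low, high = ci_95(0.5, n)
    print(f"n={n}: {low:.1%} - {high:.1%}")
```

The output shows the diminishing return described above: multiplying the sample size by four only halves the width of the interval.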

The appropriate sample size is usually a compromise between what is STATISTICALLY DESIRABLE and what is FEASIBLE.

In general, a minimum sample size is 30.

The WHO Manual on "How to Investigate Drug Use in Health Facilities" reflects the experiences of a number of surveys carried out in Africa, Asia, and Latin America.

These surveys have shown that there is less variation in practice within health facilities than between facilities. Thus to obtain reliable estimates, it is better to increase the number of facilities included in the sample rather than emphasizing the number of prescriptions surveyed or patients observed in each survey.

However, the main difficulty in doing these surveys is usually the logistic or transport problem of getting to the health facilities.

The compromise which has been reached for a simple cross-sectional survey would be 30 prescriptions from 20 facilities.

When individual facilities are being studied or compared, a minimum of 100 prescriptions should be collected. If the event being studied is very unusual, e.g. injection use in Bangladesh, or very frequent, e.g. generic use in Zimbabwe, this number may need to be increased to obtain sufficiently precise estimates.

When the indicators are used for supervision a method called Lot Quality Assurance Sampling (LQAS) may be used. This is described in Annex 4 of the WHO Manual, p. 77-81.


The principles underlying sampling should be understood by all people involved in drug use surveys. The field workers, enumerators, and supervisors should understand that any facility or any prescription or any patient attending could be included in the survey. Every effort should be made to avoid bias (systematic error) in selecting sampled units for study.


Sample Selection


This is a small group or individual activity that will lead you through a process of deciding on a sample for a prescribing study. There are different approaches to sample selection. Each group will be asked to implement at least two different strategies. The groups will then come together and compare the results of the different sampling methods.

Summary information about the province to be studied is given to you on the bottom of the Results Worksheet 1.1. It provides details on catchment population, whether the facility is urban or rural, and the staffing level at each facility. For your information, a summary sheet is also provided that lists the number of urban and rural facilities, the number staffed by physicians and paramedics, and the true average values of two indicators (# of drugs and % antibiotics) overall and in each subgroup.

Refer to Annex 1 of the Manual, How to Investigate Drug Use in Health Facilities, p. 59.

Your group will be assigned one method of drawing a sample of 20 health facilities, either

1. simple random sampling (Worksheet 1.1)

2. systematic sampling (Worksheet 1.2)

3. systematic sampling with probability proportional to size (Worksheet 1.3)

4. stratified sampling by location (Worksheet 1.4)

5. stratified sampling by staffing (Worksheet 1.5)

6. stratified sampling by location and staffing (Worksheet 1.6)

7. cluster sampling (Worksheet 1.7)


A table of random numbers has been provided to draw the samples. When you have completed your assigned method, you should choose one other method, and draw another sample.

For each method, compute the values of two descriptive variables (% urban facilities, % facilities staffed by physicians) as well as the values of the two indicators. Compare your two sets of results and be prepared to discuss your experience in drawing the sample and the differences in results. Enter your results into the Results Worksheet 1.8 and compare your results with the true values. Be prepared to discuss which method would be best in drawing an actual sample and why.
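The two descriptive variables can be computed from a drawn sample as in the sketch below. The record layout (dictionary keys `location` and `staffing`) and the four-facility sample are hypothetical and serve only to show the arithmetic.

```python
def summarize(sample):
    """sample is a list of dicts with hypothetical keys 'location' and 'staffing'."""
    n = len(sample)
    pct_urban = 100 * sum(f["location"] == "urban" for f in sample) / n
    pct_physician = 100 * sum(f["staffing"] == "physician" for f in sample) / n
    return round(pct_urban, 1), round(pct_physician, 1)

# Hypothetical sample of 4 facilities
sample = [
    {"location": "urban", "staffing": "physician"},
    {"location": "rural", "staffing": "physician"},
    {"location": "rural", "staffing": "paramedic"},
    {"location": "rural", "staffing": "physician"},
]
print(summarize(sample))   # (25.0, 75.0)
```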


CHOICE #1: A simple random sample (Worksheet 1.1)


Tear a sheet of paper into 52 pieces. Number them 1 to 52, crunch each up into a little ball, and place in a paper bag. Shake the bag well. Randomly remove 20 numbers. These 20 facilities are a simple random sample of the whole 52. Fill in the Results Worksheet to show the results of the sampling.


CHOICE #2: Systematic random sampling (Worksheet 1.2)

Use the method described in the Indicators Manual on p. 60, Step 3:

a. Calculate your sampling interval. This will be 52 ÷ 20 = _____.

b. Choose a random number between 0 and 1 from the random number table = _____.

c. Multiply the result in (a) by the random number selected in (b) = _____.

d. Round up by adding 1 to this result and dropping any numbers after the decimal point. This is the number of your first facility = _____.

e. Add the sampling interval calculated in (a) to the result in (c) = _____.

f. Round the number up. The result is the number of your second facility = _____.

g. Add the number you calculated in (e) to the sampling interval calculated in (a) and use the result to choose another facility (remembering to round up). Continue with this procedure until you have selected 20 facilities [i.e., (d), (f), (g), plus 17 more].

h. Fill in the Results Worksheet to show the results of the sampling.


CHOICE #3: Systematic sampling with probability proportional to size (Worksheet 1.3)

Order the list of facilities to be sampled in inverse order of their size, as in Worksheet 1.3, and calculate the cumulative population.

a. Calculate the sampling interval by dividing the total cumulative population by the number of facilities in the sample. This will be 1,721,549 ÷ 20 = _____.

b. Choose a random number between 0 and 1 from the random number table = _____.

c. Multiply the result in (a) by the random number selected in (b) = _____.

d. Select for the sample the health facility whose cumulative total is the lowest one greater than the result in (c).

e. Add the sampling interval calculated in (a) to the result in (c) = _____.

f. Select as the next facility the one whose cumulative total is the lowest one greater than the new result in (e). It is possible for the same facility to be selected more than once by this method. In this case, the sample size in that facility would be increased accordingly.

g. Continue with this procedure until you have chosen 20 facilities.

h. Fill in the Results Worksheet to show the results of the sampling.
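The procedure above can be sketched in a few lines. The function name `pps_systematic` and the catchment populations are assumptions made for illustration and are not taken from Worksheet 1.3. Facilities are reported by their position in the ordered list, counting from 0.

```python
from bisect import bisect_right
from itertools import accumulate

def pps_systematic(populations, sample_size, random_fraction):
    """Return 0-based positions of facilities chosen with probability proportional to size."""
    cumulative = list(accumulate(populations))
    interval = cumulative[-1] / sample_size
    start = random_fraction * interval
    chosen = []
    for k in range(sample_size):
        target = start + k * interval
        # Position of the first facility whose cumulative total is greater than the target
        chosen.append(bisect_right(cumulative, target))
    return chosen

# Hypothetical catchment populations, largest first
populations = [90_000, 40_000, 30_000, 20_000, 20_000]
print(pps_systematic(populations, 4, 0.5))   # [0, 0, 1, 3]
```

The largest facility appears twice in this illustrative output, which is the situation where the sample size in that facility would be increased.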



CHOICES #4-6: Stratified sampling (Worksheets 1.4, 1.5, 1.6)

If there is enough time, you can use Worksheets 1.3 to 1.6 to sample facilities systematically based on population (Worksheet 1.3), or stratified by location (Worksheet 1.4), by staffing (Worksheet 1.5), or by both staffing and location (Worksheet 1.6). You can use either simple random sampling (Choice 1) or systematic sampling (Choice 2) to select facilities from within each stratum. You can divide the 20 facilities to be chosen equally between the two strata (10 facilities from each), or according to their actual proportion in the entire population.

a. Calculate the sampling interval with the class. This is 1,721,538 ÷ 20 = 86,076.9. Enter this number into the memory of your calculator.

b. A random number table has been generated and printed at the end of the worksheet. Select a number from this table. Multiply the random number by the sampling interval. This is your starting number.

c. Look at the cumulative population column. Select the first facility with a cumulative population figure greater than but closest to your starting number. This is your first facility.

d. Take the cumulative population figure above (from your first facility) and add your sampling interval 86,076.9. Look for the cumulative population figure greater than but closest to this number. This will be your second facility.

Note: Hitting the "=" button again on most calculators adds the last number you entered again. Try this shortcut on your calculator.

e. Continue this sampling system until all 20 facilities are collected.

CHOICE #7: Cluster Sampling (Worksheet 1.7)

Use Worksheet 1.7, which is the same as Worksheet 1.1, and Figure 1.1.

a. Randomly select 10 facilities using a simple random sample.

b. For each facility selected, choose the closest facility (Figure 1.1) to also be included, forming a cluster. If one of these "clustered" facilities is chosen in a later random selection, pick an alternative facility.

For example, if you randomly select facility 11, choose 12. If you choose number 52, choose facility 1.
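The cluster selection can be sketched as below. It assumes, following the example just given, that the closest facility is simply the next number on the list, with 52 wrapping round to 1. On the real map in Figure 1.1 the closest facility may be a different one. The function name and seed are also illustrative assumptions.

```python
import random

def cluster_sample(n_facilities, n_clusters, rng=None):
    """Return (selected, neighbor) pairs with no facility used twice."""
    rng = rng or random.Random()
    used = set()
    clusters = []
    while len(clusters) < n_clusters:
        seed = rng.randint(1, n_facilities)
        neighbor = seed % n_facilities + 1   # assumed "closest" facility: 11 -> 12, 52 -> 1
        if seed in used or neighbor in used:
            continue                         # already clustered, so pick an alternative
        used.update((seed, neighbor))
        clusters.append((seed, neighbor))
    return clusters

clusters = cluster_sample(52, 10, random.Random(5))
print(clusters)
```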


Results of Different Sampling Methods

Results Worksheet

Sampling Method | Number Urban | Number Rural | % Urban | Number Physician | Number Paramedic | % Physician
Random Proportional to Size | | | | | |
Stratified By Staffing and Location | | | | | |
Figures of | 21 | 25 | 45.7% | 35 | 11 | 76.1%

[Worksheets 1.1 to 1.7: images not reproduced]


Please send comments about this site to Richard Laing; we encourage any and all feedback. Copyright information: All training materials are in the public domain and may be copied, adapted, used and reproduced with or without acknowledgement. We would appreciate being informed and being provided with copies of adapted materials that are used. Any translations of these materials should be sent to us so that we can place them on the web for others to use.