Session Guide
Sampling to Study Drug Use
PURPOSE AND CONTENT 
The purpose of this session is to expose participants to the range of methods available for sampling. Participants will be taught the underlying principles of sampling and will undertake practical sampling exercises. Participants should understand that each sampling method has different strengths and weaknesses and can be used in different circumstances. Studying drug use in health facilities may require a range of sampling techniques. Participants should also learn how to teach field staff to sample accurately under varying field situations. By the end of the session participants will be able to:
Sampling is a process by which we study a small part of a population to make judgments about the whole population. We sample as part of our daily lives. For example, if we want to know whether Fruit Seller A is better than Fruit Seller B in a market, we would go to Seller A and examine a number of his or her fruits for quality and price, and then repeat the process for Seller B. It would not be fair or accurate to base our judgment on a single fruit from each seller, and it would be impractical to check every fruit in the stall. The questions are: how do we select which fruit to examine, and how many should we examine? Obviously, if we only examine the fruit at the front of the stall we may get an incorrect answer. So we need a sampling method that lets us judge all of the fruit by examining only a sample of it.
Whenever we want to learn about health in the community or practices in the health system, we need to draw samples, since it would be impractical to collect data on every person or event. In drug use surveys we need to draw samples to select facilities to survey, prescriptions to study, or patients to observe. If we wanted to know about prescribing in primary care facilities in a country, we would probably get the wrong impression of the real situation if we only surveyed the five health centers closest to the central office of the Ministry of Health, because these would likely be better than average. If we wanted to examine prescribing in a health center, we would be misled if we surveyed only the first 20 cases attending on a Monday morning: there may be an excess of men with hangovers from weekend drinking, or of workers wanting sick notes to excuse them from work. So, to get a representative sample, we need to ensure that every facility or patient has a chance of being included in the survey.
A study unit may be a person, a health facility, a prescription, or another such unit. The study population, sometimes called the reference population, is the entire collection of all possible study units. Again, these units may be people, health facilities, prescriptions, or other such units.
A sampling frame is a list of all of the available units in the study population. If a complete listing is available, the sampling frame is identical to the study population.
There are two broad categories of sampling methods:
• Nonprobability sampling
• Probability sampling
The type of sampling depends on whether a sampling frame is available. If a sampling frame exists, or if one can be created, probability sampling is used. If none is available, probability sampling cannot be used. It is always better to use probability sampling if this is possible. However, in some situations nonprobability sampling is the only possible method. A sample drawn using nonprobability methods is likely to be less representative than a probability sample, and so the study results are less valid. When nonprobability sampling is used in a survey, this fact should be stated in any report.
NONPROBABILITY SAMPLING METHODS
If a sampling frame is not available or cannot be created, a nonprobability sampling method will need to be used. There are two common methods: convenience sampling and quota sampling.
Convenience sampling is a method by which, for convenience's sake, the study units that happen to be available at the time of data collection are selected for the sample. This is the least representative sampling method.
Quota sampling is a method by which different categories of sample units are included to ensure that the sample contains units from all of these categories. For example, a quota sample of patients from a health center might include 10 patients with ARI, 10 with diarrhea, and 10 with malaria. Both of these methods may be used in drug use studies. When measuring prescribing and dispensing times or assessing patient understanding, observing or interviewing a convenience sample of patients may be the only practical method. A quota sample of males and females may be used to ensure that both genders are observed or interviewed. This may be important because men are often treated with more respect and have higher literacy rates; men may also be given priority over women or children and receive more thorough care. Nonprobability samples are not necessarily representative of the reference population. However, we often need to use these methods when records or sampling frames are inadequate, or when time constraints force us to use them. If a sampling frame (a list of the population units) exists, then probability sampling may be used.
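Quota sampling can be sketched in a few lines. This is a minimal Python illustration, assuming patients can be listed in the order they present at the facility; the function name `quota_sample` and the category labels are hypothetical. The selection remains nonprobability sampling: whoever arrives first fills the quota.

```python
from collections import Counter

def quota_sample(arrivals, quotas):
    """Select patients in the order they present until every quota is filled.

    arrivals: iterable of (patient_id, category) pairs in order of arrival.
    quotas: dict mapping category -> number of patients wanted,
            e.g. {"male": 15, "female": 15}.
    """
    counts = Counter()
    selected = []
    for patient_id, category in arrivals:
        # Take the patient only if their category still has room in its quota.
        if counts[category] < quotas.get(category, 0):
            selected.append(patient_id)
            counts[category] += 1
        if all(counts[c] >= q for c, q in quotas.items()):
            break  # all quotas filled: stop recruiting
    return selected

arrivals = [(1, "male"), (2, "male"), (3, "female"), (4, "male"), (5, "female")]
print(quota_sample(arrivals, {"male": 2, "female": 2}))  # [1, 2, 3, 5]
```

Patient 4 is skipped only because the male quota was already full when he arrived, which is exactly why a quota sample guarantees category coverage but not representativeness.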
Whenever possible, use probability sampling to obtain less biased results. There are a number of different methods.
1. Simple Random Sampling
This is the simplest form of probability sampling. A lottery is an example of a random sample. The simple random sampling procedure is as follows:
2. Systematic Sampling
In systematic sampling, sample units are selected from a numbered list of all units in the study population at a regular interval, starting from a randomly chosen starting point. To calculate the sampling interval, divide the size of the list by the desired sample size. For example, if we want to select 20 health centers from a list of 46 in our sampling frame, our sampling interval would be 46/20 = 2.3. The first facility chosen in this case can be 1, 2 or 3, which are all the possible sampling units within the first sampling interval. It is selected by (1) choosing a random number between 0 and 1 (with at least 3 digits after the decimal point), then (2) multiplying this random number by the sampling interval, and (3) rounding the result upward to get the number of the first facility. For example, if the random number chosen is 0.183, the first result is 0.183 x 2.3 = 0.421, which rounds upward to 1, so the first facility on the list is chosen for the sample. Later facilities are selected by adding the sampling interval to the previous result. If the first result was 0.421, then the facilities selected would be:
0.421, so Facility 1
0.421 + 2.3 = 2.721, so Facility 3 (Remember: always round upward)
2.721 + 2.3 = 5.021, so Facility 6
5.021 + 2.3 = 7.321, so Facility 8
and so forth.
If the first result had been 1.749, then the facilities selected would be:
1.749, so Facility 2
1.749 + 2.3 = 4.049, so Facility 5
4.049 + 2.3 = 6.349, so Facility 7
6.349 + 2.3 = 8.649, so Facility 9
and so forth.
The method just described gives every unit an equal chance of being selected. This method is used in Worksheet 1.2. With minor modification, it can also be used to select units in a way that allows for their size. Sometimes it is desirable for clinics serving larger populations to have a greater chance of being included in a sample.
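The systematic procedure above can be sketched in Python. This is an illustrative sketch, not a prescribed tool: the function `systematic_sample` and its `random_fraction` parameter are hypothetical names, and in the field the random number would come from a random number table or calculator.

```python
import math
import random

def systematic_sample(list_size, sample_size, random_fraction=None):
    """Return the numbers (1..list_size) of the units chosen by systematic sampling.

    The sampling interval is list_size / sample_size (e.g. 46 / 20 = 2.3).
    Each selection point is (random_fraction + k) * interval, rounded upward,
    which equals adding the interval to the previous result each time.
    """
    if random_fraction is None:
        # 1 - random.random() lies in (0, 1], so the first point is never 0.
        random_fraction = 1 - random.random()
    # Multiplying before dividing keeps the last point from drifting past list_size.
    return [math.ceil((random_fraction + k) * list_size / sample_size)
            for k in range(sample_size)]

print(systematic_sample(46, 20, 0.183)[:4])       # [1, 3, 6, 8]
print(systematic_sample(46, 20, 1.749 / 2.3)[:4])  # [2, 5, 7, 9]
```

Passing 0.183 reproduces the first worked example; passing 1.749 / 2.3 gives a first result of 1.749 and reproduces the second. Because the interval is larger than 1, no facility can be chosen twice.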
This method is called sampling with probability proportional to size, and it is used in Worksheet 1.3. Systematic sampling is also useful when sampling prescriptions from a patient register. If a register contains 100 pages, each with 25 lines of prescriptions, and you need to select 30 prescriptions, the sampling interval would be:
(100 x 25)/30 = 2500/30 = 83.3
Thus every 83rd prescription would be sampled. An alternative method, multistage sampling (described below), could also be used to select a sample from a patient register.
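The register example can be sketched the same way, converting each sampled prescription number into a page and line. This is a minimal sketch assuming prescriptions are numbered consecutively from the first line of page 1; `sample_register` and its constants are hypothetical names.

```python
import math
import random

PAGES = 100           # pages in the register (from the example)
LINES_PER_PAGE = 25   # prescriptions per page
SAMPLE_SIZE = 30      # prescriptions wanted

def sample_register(random_fraction=None):
    """Systematically sample prescriptions and report them as (page, line) pairs."""
    total = PAGES * LINES_PER_PAGE              # 2500 prescriptions
    if random_fraction is None:
        random_fraction = 1 - random.random()   # in (0, 1]
    picks = []
    for k in range(SAMPLE_SIZE):
        # Prescription number counted from the start of the register (1..2500),
        # using an interval of 2500 / 30 = 83.3 and always rounding upward.
        number = math.ceil((random_fraction + k) * total / SAMPLE_SIZE)
        page, line = divmod(number - 1, LINES_PER_PAGE)
        picks.append((page + 1, line + 1))
    return picks

print(sample_register(0.25)[:3])  # [(1, 21), (5, 5), (8, 13)]
```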
3. Stratified Sampling
Stratified sampling is used when the reference population contains clearly different subpopulations that should be considered separately. For example, this might be the case in a study that included urban and rural facilities, facilities with or without doctors, or male and female patients. When stratified sampling is used, the sampling frame (the list of the overall population) is sorted into two or more groups. These different strata (groups) may then be sampled either randomly or systematically.
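Stratification can be sketched as "sort the frame into groups, then sample each group separately". This minimal Python sketch uses simple random sampling within each stratum; the function `stratified_sample`, its parameters, and the facility list are hypothetical, and a systematic draw could be used within each stratum instead.

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, rng=random):
    """Draw a separate simple random sample within each stratum.

    frame: list of units in the sampling frame (e.g. facility records).
    stratum_of: function returning a unit's stratum label (e.g. "urban").
    n_per_stratum: dict mapping each stratum label -> number of units to draw.
    """
    strata = {}
    for unit in frame:
        # Sort the frame into groups, one list per stratum.
        strata.setdefault(stratum_of(unit), []).append(unit)
    # Sample each stratum separately so every group is represented.
    return {label: rng.sample(units, n_per_stratum[label])
            for label, units in strata.items()}

# Hypothetical frame: facilities 1-12 are urban, 13-40 are rural.
facilities = [(i, "urban" if i <= 12 else "rural") for i in range(1, 41)]
sample = stratified_sample(facilities, lambda f: f[1], {"urban": 5, "rural": 5},
                           rng=random.Random(1))
print(sample["urban"])
```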
In our fruit sellers example, we might want to check the quality and price of each of the varieties of fruit sold, rather than of only one kind of fruit.
The WHO manual (pp. 59-60) recommends the use of stratified systematic sampling methods for selecting facilities. For example, the sampling frame might include the following list of facilities:
This could then be grouped and sorted into 2 strata as follows:
and a sample would be selected separately from both the urban list and the rural list. This is the method used in Worksheets 1.4, 1.5, and 1.6.
4. Cluster Sampling
In a cluster sample, a group of sample units is selected together, rather than each unit being selected separately. This method may not be as representative as single-unit sampling, but it may be necessary for logistic reasons. The WHO-recommended EPI sampling procedure of selecting 30 clusters of 7 children is a common cluster sampling method. The main advantage of cluster sampling is that it is easy to use and often logistically simpler to organize. For example, when choosing houses in a community, it is easier to walk between neighboring houses than all over the community. The disadvantage is that the samples selected may be less representative, especially when the number of clusters selected is low. Thus, when the cluster method is used, try to increase the sample size, that is, increase the number of clusters sampled. As a rough guide, double the sample size if cluster sampling is used. In drug use studies, cluster sampling may be used for selecting facilities when distances are great. For example, a cluster of two facilities could be selected by randomly selecting a single facility and then selecting the facility closest to it. This might allow two facilities to be surveyed in a day when travel time between facilities is significant. This method is used in Worksheet 1.7.
5. Multistage Sampling
In multistage sampling, the methods described above can be combined. For example, we might wish to select 32 health facilities in a country containing 56 districts, each of which contains a number of health facilities. First, 16 districts would be selected from the 56. Then, in each selected district, two health facilities would be randomly selected. This is two-stage random sampling.
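The two-stage district example can be sketched as two successive random draws. This is a minimal Python illustration assuming a complete list of facilities is available for each district; the function `two_stage_sample` and the country data (56 districts with 6 facilities each) are hypothetical.

```python
import random

def two_stage_sample(facilities_by_district, n_districts, n_per_district, rng=random):
    """Stage 1: random sample of districts. Stage 2: random facilities within each."""
    districts = rng.sample(sorted(facilities_by_district), n_districts)
    return {d: rng.sample(facilities_by_district[d], n_per_district)
            for d in districts}

# Hypothetical country: 56 districts, each listing 6 facilities.
country = {d: [f"D{d}-F{k}" for k in range(1, 7)] for d in range(1, 57)}
chosen = two_stage_sample(country, n_districts=16, n_per_district=2,
                          rng=random.Random(7))
print(sum(len(v) for v in chosen.values()))  # 32
```

Only the 16 selected districts need facility lists for the second stage, which is one practical reason for sampling in stages.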
In the example above of selecting 30 prescriptions from a patient ledger of 100 pages containing 25 prescriptions per page, we can use systematic sampling of the pages and random selection of the prescription on each page. Thus we would calculate the sampling interval for the pages:
100/30 = 3.3
Randomly select the starting page from 1-4 as described above, then add 3.3 repeatedly to select the page numbers. On each page, randomly select a number from 1-25, using a calculator or a random number table, to decide which individual prescription will actually be chosen on that page.
SAMPLE SIZE
The method of deciding the desirable sample size is described in the HSR manual, pp. 205-209 and Table 7.1, and in the WHO Manual of Epidemiology for District Health Management on p. 78. The EPI-INFO computer package contains a statistical calculator that calculates minimum sample sizes based on different assumptions. The appropriate sample size depends on:
Increasing the size of the sample increases the certainty of the results, but beyond a certain point the gain in certainty is not in proportion to the extra effort and resources needed to collect the data. The appropriate sample size is usually a compromise between what is STATISTICALLY DESIRABLE and what is FEASIBLE.
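The diminishing return from larger samples can be seen with the usual normal-approximation formulas for estimating a proportion (for example, % of prescriptions with an antibiotic) from a simple random sample. This sketch is illustrative only: it is not the procedure in the HSR manual or EPI-INFO, the function names are hypothetical, and the `design_effect` parameter simply applies the rough "double the sample size" guide for cluster sampling.

```python
import math

Z_95 = 1.96  # standard normal value for 95% confidence

def margin_of_error(p, n):
    """Approximate 95% half-width for a proportion p estimated from n units."""
    return Z_95 * math.sqrt(p * (1 - p) / n)

def required_sample_size(p, margin, design_effect=1.0):
    """Smallest n whose approximate 95% half-width is at most `margin`.

    design_effect=2.0 applies the rough 'double it for cluster sampling' guide.
    """
    return math.ceil(design_effect * Z_95 ** 2 * p * (1 - p) / margin ** 2)

for n in (50, 100, 200, 400, 800):
    print(n, round(margin_of_error(0.5, n), 3))

print(required_sample_size(0.5, 0.1))       # 97
print(required_sample_size(0.5, 0.1, 2.0))  # 193
```

Because the margin of error shrinks with the square root of n, each fourfold increase in sample size only halves it, which is why effort grows much faster than certainty.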
The WHO manual "How to Investigate Drug Use in Health Facilities" reflects the experience of a number of surveys carried out in Africa, Asia, and Latin America. These surveys have shown that there is less variation in practice within health facilities than between facilities. Thus, to obtain reliable estimates, it is better to increase the number of facilities included in the sample than to increase the number of prescriptions surveyed or patients observed in each facility. However, the main difficulty in doing these surveys is usually the logistic or transport problem of getting to the health facilities. The compromise that has been reached for a simple cross-sectional survey is 30 prescriptions from each of 20 facilities. When individual facilities are being studied or compared, a minimum of 100 prescriptions per facility should be collected. If the event being studied is very unusual (e.g. injection use in Bangladesh) or very frequent (e.g. generic use in Zimbabwe), this number may need to be increased to obtain sufficiently precise estimates. When the indicators are used for supervision, a method called Lot Quality Assurance Sampling (LQAS) may be used. This is described in Annex 4 of the WHO Manual, pp. 77-81.
The principles underlying sampling should be understood by everyone involved in drug use surveys. Field workers, enumerators, and supervisors should understand that any facility, any prescription, or any patient attending could be included in the survey. Every effort should be made to avoid bias (systematic error) in selecting units for study.
Sample Selection
RATIONALE
This is a small-group or individual activity that will lead you through the process of deciding on a sample for a prescribing study. There are different approaches to sample selection. Each group will be asked to implement at least two different strategies. The groups will then come together and compare the results of the different sampling methods.
Summary information about the province to be studied is given at the bottom of Results Worksheet 1.1. It provides details on the catchment population, whether each facility is urban or rural, and the staffing level at each facility. For your information, a summary sheet is also provided that lists the number of urban and rural facilities, the number staffed by physicians and by paramedics, and the true average values of two indicators (# of drugs and % antibiotics), overall and in each subgroup. Refer to Annex 1 of the Manual How to Investigate Drug Use in Health Facilities, p. 59. Your group will be assigned one of the following methods of drawing a sample of 20 health facilities:
1. simple random sampling (Worksheet 1.1)
2. systematic sampling (Worksheet 1.2)
3. systematic sampling with probability proportional to size (Worksheet 1.3)
4. stratified sampling by location (Worksheet 1.4)
5. stratified sampling by staffing (Worksheet 1.5)
6. stratified sampling by location and staffing (Worksheet 1.6)
7. cluster sampling (Worksheet 1.7)
A table of random numbers has been provided for drawing the samples. When you have completed your assigned method, choose one other method and draw another sample. For each method, compute the values of two descriptive variables (% urban facilities, % facilities staffed by physicians) as well as the values of the two indicators. Compare your two sets of results and be prepared to discuss your experience in drawing the samples and the differences in results. Enter your results in Results Worksheet 1.8 and compare them with the true values. Be prepared to discuss which method would be best for drawing an actual sample, and why.
INSTRUCTIONS
CHOICE #1: A simple random sample (Worksheet 1.1)
Tear a sheet of paper into 52 pieces. Number them 1 to 52, crumple each into a little ball, and place them in a paper bag. Shake the bag well. Randomly remove 20 numbers. These 20 facilities are a simple random sample of all 52. Fill in the Results Worksheet to show the results of the sampling.
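The paper-bag lottery is a draw without replacement, which a computer can imitate. A minimal Python sketch; `lottery_sample` and its `rng` parameter are hypothetical names, and the standard `random.sample` function does the drawing.

```python
import random

def lottery_sample(population_size, sample_size, rng=random):
    """Computer version of drawing numbered paper balls from a bag."""
    # sample() draws without replacement, like removing balls from the bag,
    # so no facility number can be drawn twice.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

print(lottery_sample(52, 20))
```

Passing a seeded generator such as `random.Random(5)` makes the draw reproducible, which can help when results are checked later.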
CHOICE #2: Systematic random sampling (Worksheet 1.2)
Use the method described in the Indicators Manual on p. 60, Step 3:
CHOICE #3: Systematic sampling proportional to size (Worksheet 1.3)
Order the list of facilities to be sampled in inverse order of their size, as in Worksheet 1.3, and calculate the cumulative population.
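Systematic sampling with probability proportional to size can be sketched with a running (cumulative) population total: selection points are spaced evenly along the cumulative total, and each point falls within exactly one facility's share. This is a minimal sketch; `pps_systematic` and its parameters are hypothetical, and the code follows whatever order the list is given in, so the ordering step above is done beforehand.

```python
import math
import random
from itertools import accumulate

def pps_systematic(populations, sample_size, random_fraction=None):
    """Systematic sampling with probability proportional to size (PPS).

    populations: catchment population of each facility, in list order.
    Returns 0-based indices of the selected facilities; a facility whose
    population exceeds the sampling interval can be selected more than once.
    """
    cumulative = list(accumulate(populations))  # running total of population
    total = cumulative[-1]
    if random_fraction is None:
        random_fraction = 1 - random.random()   # in (0, 1]
    selected = []
    j = 0
    for k in range(sample_size):
        # Selection point along the cumulative population, interval = total / n.
        point = math.ceil((random_fraction + k) * total / sample_size)
        # Move down the list until the cumulative population reaches the point.
        while cumulative[j] < point:
            j += 1
        selected.append(j)
    return selected

print(pps_systematic([100, 300, 200, 400], 2, 0.5))  # [1, 3]
```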
CHOICES #4 and #5: Stratified sampling (Worksheets 1.4, 1.5)
If there is enough time, you can use Worksheets 1.3 to 1.6 to systematically sample facilities based on population (Worksheet 1.3), stratified by location (Worksheet 1.4), stratified by staffing (Worksheet 1.5), or stratified by both staffing and location (Worksheet 1.6). You can use either simple random sampling (Choice #1) or systematic sampling (Choice #2) to select facilities from within each stratum. You can divide the 20 facilities to be chosen equally between the two strata (10 facilities from each), or according to their actual proportions in the entire population.
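The two ways of dividing the 20 facilities between strata can be computed as follows. This sketch is an assumption-laden illustration: `allocate` is a hypothetical function, the stratum counts in the example are made up (the real counts are on the summary sheet), and rounding proportional shares with the largest-remainder method is one choice among several for making the shares add up to exactly 20.

```python
def allocate(stratum_sizes, total_sample, proportional=True):
    """Split total_sample among strata, equally or in proportion to stratum size.

    Proportional shares are rounded with the largest-remainder method so the
    allocations add up exactly to total_sample.
    """
    labels = list(stratum_sizes)
    if not proportional:
        base, extra = divmod(total_sample, len(labels))
        return {name: base + (1 if i < extra else 0) for i, name in enumerate(labels)}
    population = sum(stratum_sizes.values())
    exact = {name: total_sample * stratum_sizes[name] / population for name in labels}
    alloc = {name: int(exact[name]) for name in labels}
    shortfall = total_sample - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    for name in sorted(labels, key=lambda s: exact[s] - alloc[s], reverse=True)[:shortfall]:
        alloc[name] += 1
    return alloc

# Hypothetical counts: 18 urban and 34 rural facilities out of 52.
print(allocate({"urban": 18, "rural": 34}, 20))                      # {'urban': 7, 'rural': 13}
print(allocate({"urban": 18, "rural": 34}, 20, proportional=False))  # {'urban': 10, 'rural': 10}
```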
CHOICE #6: Cluster sampling (Worksheet 1.7)
Use Worksheet 1.7, which is the same as Worksheet 1.1, and Figure 1.1.
a. Randomly select 10 facilities using a simple random sample.
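The two-facility cluster procedure described under Cluster Sampling (a randomly selected facility plus the facility closest to it) can be sketched as below. This is a minimal sketch under stated assumptions: facility positions are hypothetical (x, y) map coordinates, distance is straight-line distance via `math.dist` (Python 3.8+), and a facility already in a cluster is not reused as a partner, which is a design choice the text does not specify.

```python
import math
import random

def nearest_neighbour_clusters(locations, n_clusters, rng=random):
    """Form clusters of two: a randomly chosen facility plus the closest other one.

    locations: dict facility_id -> (x, y) position, e.g. map coordinates in km.
    Facilities already used in a cluster are not chosen again (an assumption).
    """
    seeds = rng.sample(sorted(locations), n_clusters)
    used = set(seeds)
    clusters = []
    for seed in seeds:
        candidates = [f for f in locations if f not in used]
        # Straight-line distance stands in for travel time here.
        partner = min(candidates, key=lambda f: math.dist(locations[seed], locations[f]))
        used.add(partner)
        clusters.append((seed, partner))
    return clusters

# Hypothetical facilities along a road (positions in km).
sites = {"A": (0, 0), "B": (1, 0), "C": (10, 0), "D": (11, 0)}
print(nearest_neighbour_clusters(sites, 2, rng=random.Random(3)))
```

With 10 random seeds, this yields the 20 facilities needed for the exercise, provided the list contains at least 20 facilities.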
ACTIVITY ONE – Results of Different Sampling Methods

Please send comments about this site to Richard Laing; we encourage any and all feedback. Copyright information: All training materials are in the public domain and may be copied, adapted, used and reproduced with or without acknowledgement. We would appreciate being informed and being provided with copies of adapted materials that are used. Any translations of these materials should be sent to us so that we can place them on the web for others to use. PRDU CD-ROM Training Program