(2004; 98 pages)

### 5.4 Probability sampling methods for quantitative studies

In quantitative studies we aim to measure variables and to generalize findings obtained from a representative sample to the total population. In such studies, we will be confronted with the following questions:

• which group of people (study population) do we want to draw a sample from?

• how many people do we need in our sample?

• how will these people be selected? Is there an administrative list (sampling frame) of the units of the population involved?

The study population has to be clearly defined, for example, according to age, sex and residence. Apart from people, a study population may consist of villages, institutions, records, etc. Each study population consists of study units. The way one defines the study population and the study unit depends on the problem to be investigated.

If researchers want to draw conclusions that are valid for the whole study population, they should take care to draw a sample in such a way that it is representative of that population.

A representative sample is one that has all the important characteristics of the population from which it is drawn.

If it is intended to interview 100 mothers to obtain a complete picture of drug use practices in District X, these mothers would need to be selected from a representative sample of villages. It would be unwise to select them from only one or two villages, as this might give a distorted or biased picture. It would also be unwise to interview only mothers who attend the under-fives clinic, as those who do not attend this clinic may use drugs differently.

An important issue influencing the choice of the most appropriate sampling method is whether a sampling frame is available, that is, a listing of all the units that compose the study population. If a sampling frame does exist or can be compiled, probability sampling methods can be used. With these methods, each study unit has an equal or at least a known probability of being selected in the sample.

Five probability sampling methods are discussed below:

- Simple random sampling

- Systematic sampling

- Stratified sampling

- Cluster sampling

- Multi-stage sampling.

*Simple random sampling*

This is the simplest form of probability sampling. To select a simple random sample you need to:

• make a numbered list of all the units in the population from which you want to draw a sample or use an already existing one (sampling frame)

• decide on the size of the sample (this will be discussed in section 5.6)

• select the required number of sampling units, using a ‘lottery’ method or a table of random numbers.
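The three steps above can be sketched in a few lines of Python, with a pseudo-random number generator standing in for the 'lottery' or the table of random numbers. This is only an illustration: the function name `simple_random_sample`, the frame of 500 numbered households and the sample size of 50 are assumptions made for the example, not figures from this manual.

```python
import random


def simple_random_sample(frame, sample_size, seed=None):
    """Draw `sample_size` units from the sampling frame without replacement.

    Every unit in `frame` has the same probability of being selected.
    """
    if sample_size > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(frame, sample_size)


# Hypothetical sampling frame: households numbered 1 to 500.
frame = list(range(1, 501))

# Hypothetical sample size of 50 (how to choose it is a separate question).
sample = simple_random_sample(frame, 50, seed=1)
print(sorted(sample))
```

Recording the seed alongside the sampling frame lets another team member reproduce exactly the same selection later.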

Simple random sampling can be used for the weekly illness recall method and when selecting facilities for simulated client visits (see Chapter 3).

*Systematic sampling*

In systematic sampling, individuals or households are chosen at regular intervals from the sampling frame. For this method we randomly select a number to tell us where to start selecting individuals from the list.

For example, a systematic sample is to be selected from 1,200 students at a school. The sample size selected is 100. The sampling fraction is 100/1,200, or 1 in 12. The sampling interval is therefore 12. The number of the first student to be included in the sample is chosen randomly, for example, by blindly picking one out of 12 pieces of paper, numbered 1 to 12. If number 6 is picked, then every twelfth student will be included in the sample, starting with student number 6, until 100 students are selected. The numbers selected would be 6, 18, 30, 42, etc.
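The school example can be reproduced with a short Python sketch. The function name `systematic_sample` and its parameters are hypothetical, and the sketch assumes that the population size is an exact multiple of the sample size, as it is here (1,200 and 100).

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the list numbers selected by systematic sampling.

    Units are assumed to be numbered 1..population_size on the sampling frame.
    """
    interval = population_size // sample_size  # sampling interval, 12 in the example
    if start is None:
        # Random start between 1 and the interval, like drawing a slip of paper.
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]


# The worked example: 1,200 students, a sample of 100, and slip number 6 drawn.
selected = systematic_sample(1200, 100, start=6)
print(selected[:4])   # [6, 18, 30, 42]
print(selected[-1])   # 1194
```

The last student selected is number 1194, so the sample of 100 fits within the list of 1,200 without wrapping around.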

Systematic sampling is usually less time-consuming and easier to perform than simple random sampling. However, there is a risk of bias, as the sampling interval may coincide with a systematic variation in the sampling frame. For instance, if we want to select a random sample of days on which to count clinic attendance, systematic sampling with a sampling interval of 7 days would be inappropriate, as all study days would fall on the same day of the week, which might, for example, be a market day.
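The clinic-attendance problem is easy to check in code. In the sketch below, the starting date and the number of study days are arbitrary assumptions. It lists which days of the week are covered when every seventh day is sampled, compared with every fifth day.

```python
from datetime import date, timedelta


def sampled_weekdays(first_day, interval, n_days):
    """Return the set of weekdays (0 = Monday ... 6 = Sunday) hit by the sample."""
    return {
        (first_day + timedelta(days=i * interval)).weekday()
        for i in range(n_days)
    }


first_day = date(2004, 1, 5)  # hypothetical start date; it falls on a Monday

print(sampled_weekdays(first_day, 7, 10))        # {0}: every study day is a Monday
print(len(sampled_weekdays(first_day, 5, 10)))   # 7: all weekdays are covered
```

An interval that shares no common factor with the length of the cycle (here 7 days) spreads the study days over the whole week.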

*Stratified sampling*

The simple random sampling method described above does not ensure that individuals with certain characteristics will be included in the sample in the right proportion. If it is important that the sample includes representative groups of study units with specific characteristics (for example, residents from urban and rural areas, or different age groups), then the sampling frame must be divided into groups, or strata, according to these characteristics. Random or systematic samples of a predetermined size will then have to be obtained from each group (stratum). This is called stratified sampling.

Stratified sampling is only possible when we know what proportion of the study population belongs to each group we are interested in. An advantage of stratified sampling is that it is possible to take a relatively large sample from a small group in the study population. This makes it possible to get a sample that is big enough to enable researchers to draw valid conclusions about a relatively small group without having to collect an unnecessarily large (and hence expensive) sample of the other, larger groups. However, in doing so, unequal sampling fractions are used and it is important to correct for this when generalizing our findings to the whole study population.

A survey is conducted on self-medication practices in a district comprising 20,000 households, of which 20% are urban and 80% rural. It is suspected that in urban areas self-medication is less common due to the vicinity of health centres. A decision is made to include 100 urban households (out of 4,000, which gives a 1 in 40 sample) and 200 rural households (out of 16,000, which gives a 1 in 80 sample). This allows for a good comparison between urban and rural self-medication practices. Because we know the sampling fraction for both strata, the rates for self-medication for all the district households can be calculated.
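The correction for unequal sampling fractions can be illustrated as follows. The household totals and sample sizes come from the example above. The numbers of sampled households found to self-medicate (30 urban, 100 rural) are invented purely for illustration, and the function names are hypothetical.

```python
# Stratum sizes and sample sizes are from the example above.
# The "self_medicating" counts are hypothetical survey results.
strata = {
    "urban": {"households": 4000, "sampled": 100, "self_medicating": 30},
    "rural": {"households": 16000, "sampled": 200, "self_medicating": 100},
}


def weighted_rate(strata):
    """District-wide rate, weighting each stratum by its share of households."""
    total_households = sum(s["households"] for s in strata.values())
    estimated_cases = sum(
        s["households"] * s["self_medicating"] / s["sampled"]
        for s in strata.values()
    )
    return estimated_cases / total_households


def unweighted_rate(strata):
    """Naive rate that ignores the unequal sampling fractions."""
    cases = sum(s["self_medicating"] for s in strata.values())
    sampled = sum(s["sampled"] for s in strata.values())
    return cases / sampled


print(round(weighted_rate(strata), 3))    # 0.46
print(round(unweighted_rate(strata), 3))  # 0.433
```

With these invented counts, the naive rate (43.3%) understates the district rate (46%), because urban households are over-represented in the sample: 1 in 40 was sampled, against 1 in 80 of rural households.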

*Cluster sampling*

It may be difficult or impossible to take a simple random sample of the units of the study population, either because a complete sampling frame does not exist or because of other logistical difficulties (e.g., visiting people scattered over a large area may be too time-consuming). However, when a list of groupings of study units is available (for example, villages or schools) or can be easily compiled, a number of these groupings can be randomly selected. The selection of groups of study units (clusters) instead of the selection of study units individually is called cluster sampling.

Clusters are often geographic units (for example, districts, villages) or organizational units (e.g., clinics, training groups). In a study of the knowledge, attitudes and practices related to family planning in a region’s rural communities, a list is made of all the villages. Using this list, a random sample of villages is chosen and a defined number of adults in the selected villages are interviewed.
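A minimal sketch of the village example follows, assuming a hypothetical region of 20 villages with 40 adults each, from which 5 villages and 10 adults per village are drawn. For convenience the sketch holds a list of adults for every village. In the field, such lists are needed only for the villages that are actually selected.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly select clusters, then a fixed number of units within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sampling frame of clusters
    sample = {}
    for name in chosen:
        units = clusters[name]
        # Take every unit if the cluster is smaller than the number required.
        sample[name] = rng.sample(units, min(n_per_cluster, len(units)))
    return sample


# Hypothetical region: 20 villages, each with 40 adults.
villages = {
    f"village_{v:02d}": [f"v{v:02d}_adult_{a:02d}" for a in range(1, 41)]
    for v in range(1, 21)
}

sample = cluster_sample(villages, n_clusters=5, n_per_cluster=10, seed=3)
for village, adults in sample.items():
    print(village, len(adults))
```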

*Multi-stage sampling*

A multi-stage sampling procedure is carried out in phases and usually involves more than one sampling method. In very large and diverse populations sampling may be done in two or more stages. This is often the case in community-based studies, in which the people to be interviewed are from different villages, and the villages have to be chosen from different areas.

In a study of a district’s treatment of acute respiratory infections, 150 households are to be visited for interviews with family members, as well as for observations on medicines kept in the homes. The district is composed of six wards and each ward has between six and nine villages. The following four-stage sampling procedure could be performed:

1. Select three wards out of the six by simple random sampling.

2. For each ward, select five villages by simple random sampling (15 villages in total).

3. For each village select 10 households. Because simply choosing households in the centre of the village would produce a biased sample, the following systematic sampling procedure is proposed:

- go to the centre of the village

- choose a direction in a random way: spin a bottle on the ground and choose the direction the bottleneck indicates

- walk in the chosen direction and select every third or every fifth household (depending on the size of the village) until you have the 10 you need. If you reach the boundary of the village and you still do not have 10 households, return to the centre of the village, walk in the opposite direction and continue to select your sample in the same way until you have 10. If there is nobody in a chosen household, take the next nearest one.

4. Decide beforehand whom to interview (for example, the head of the household, if present, or the oldest adult who lives there and who is available).
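The first three stages of this procedure can be simulated as below. The district layout is randomly generated and entirely hypothetical (six wards, six to nine villages per ward, 40 to 120 households per village). The walk from the village centre cannot be reproduced in code, so the household stage is replaced here by systematic selection from a numbered household list, which is a simplification rather than the field method described above.

```python
import random

rng = random.Random(42)  # fixed seed only so that the sketch is repeatable

# Hypothetical district: 6 wards, 6-9 villages per ward, 40-120 households each.
district = {
    f"ward_{w}": {
        f"w{w}_village_{v}": list(range(1, rng.randint(40, 120) + 1))
        for v in range(1, rng.randint(6, 9) + 1)
    }
    for w in range(1, 7)
}


def select_households(households, n, rng):
    """Stand-in for the walk: systematic selection from a household list."""
    interval = max(1, len(households) // n)
    start = rng.randrange(interval)  # random starting position on the list
    return [households[start + i * interval] for i in range(n)]


sample = []
for ward in rng.sample(sorted(district), 3):                  # stage 1: wards
    for village in rng.sample(sorted(district[ward]), 5):     # stage 2: villages
        households = district[ward][village]
        for household in select_households(households, 10, rng):  # stage 3
            sample.append((ward, village, household))

print(len(sample))  # 150
```

Three wards, five villages per ward and ten households per village give the 150 households required.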

*Strengths and weaknesses of cluster and multi-stage sampling*

The **strengths** of cluster and multi-stage sampling are that:

• a sampling frame of individual units is not required for the whole population. Initially a sampling frame of clusters is sufficient. Only within the clusters that are finally selected do we need to list and sample the individual units

• the sample is easier to select than a simple random sample of similar size because the individual units in the sample are physically together in groups, instead of scattered all over the study population.

The **weakness** of cluster and multi-stage sampling is that:

• compared to simple random sampling, there is a larger probability that the final sample will not be representative of the total study population. The likelihood of the sample not being representative depends mainly on the number of clusters selected in the first stage. The larger the number of clusters, the greater the likelihood that the sample will be representative. If you use cluster sampling, you should increase your sample size by about 50%.
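As a rough illustration of the 50% rule of thumb, the sketch below inflates a sample size that was calculated for simple random sampling. The function name and the starting figure of 100 are assumptions for the example. The appropriate increase for a given study may differ.

```python
import math


def inflate_for_clustering(srs_sample_size, inflation=0.5):
    """Increase a simple-random-sampling sample size for a cluster design.

    `inflation=0.5` corresponds to the 'about 50%' rule of thumb.
    """
    return math.ceil(srs_sample_size * (1 + inflation))


# Hypothetical: 100 units would be needed under simple random sampling.
print(inflate_for_clustering(100))  # 150
```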