Prepared by Peter Kellner for ESOMAR (www.esomar.org)
- What is an opinion poll?
- What makes a survey “scientific”?
- How does a poll choose a sample that is truly representative?
- Do polling companies do anything else to achieve representative samples?
- Are other kinds of surveys bound to be wrong?
- But surely a phone-in or write-in poll in which, say, one million people take part is likely to be more accurate than an opinion poll sample of 1,000?
- How can you possibly tell what millions of people think by asking just 1,000 or 2,000 respondents?
- But isn’t there some risk of sampling error in a poll of 1,000 or 2,000 people?
- You say those calculations apply to “a random poll with a 100% response rate”. Surely that’s pie in the sky?
- So isn’t the real margin of error much larger?
- Doesn’t this mean that polls can’t really be trusted at all?
- I have seen polls conducted by different, well-regarded, companies on the same issue produce very different results. How come?
- When someone sends me a poll, how can I tell whether to take it seriously or not?
- 1. What is an opinion poll?
- An opinion poll is a scientific survey designed to measure the views of a specific group — for example a country’s electors (for most political polls) or parents or trade union members.
- 2. What makes a survey “scientific”?
- The two main characteristics of scientific surveys are a) that respondents are chosen by the research company, and b) that sufficient information is collected about respondents to ensure that the data in the published results match the profile of the group being surveyed. For example, if the population being sampled contains 52% who are women and 30% who are over 55, then a typical opinion poll will ensure that its published data contain these proportions of women and older respondents.
- 3. How does a poll choose a sample that is truly representative?
- There are two main methods. The first is “random” sampling, the second “quota” sampling. With random sampling, a polling company either uses a list of randomly-drawn telephone numbers or email addresses (for telephone or some Internet polls); or visits randomly-drawn addresses or names from a list such as an electoral register (for some face-to-face surveys). The polling company then contacts people on those telephone numbers or at those addresses, and asks them to take part in the survey.
- “Quota” sampling involves setting quotas — for example, age and gender — and seeking out different people in each location who, together, match those characteristics. Quota polls are often used in face-to-face surveys. In addition, some Internet polls employ quota samples to select representative samples from a database of people who have already provided such information about themselves.
- 4. Do polling companies do anything else to achieve representative samples?
- Usually they do. While well-conducted random and quota samples provide a broad approximation to the public, there are all kinds of reasons why they might contain slightly too many of some groups and slightly too few of others. What normally happens is that polling companies ask respondents not only about their views but about themselves. This information is then used to compare the sample with, for example, census statistics. The raw numbers from the poll are then adjusted slightly, up or down, to match the profile of the population being surveyed. If, for example, a poll finds when its survey work is complete that it has 100 members of a particular demographic group, but should have 110 of them (in a poll of, say, 1,000 or 2,000), then it will “weight” the answers of that group so that each of those 100 respondents counts as 1.1 people. This way, the published percentages should reflect the population as a whole.
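The weighting step can be sketched in a few lines of Python. This is only a minimal illustration, not any polling company's actual procedure: the group names, the answer counts and the `cell_weights` helper are hypothetical, chosen so that one group has 100 respondents where the population profile calls for 110 in a poll of 1,000.

```python
def cell_weights(achieved, target):
    """Return one weight per group so that weighted counts match the target profile.

    achieved: dict mapping group -> number of respondents actually interviewed
    target:   dict mapping group -> number the population profile calls for
    """
    return {group: target[group] / achieved[group] for group in achieved}


# Hypothetical poll of 1,000: "group A" has 100 respondents but should have 110,
# so everyone else is slightly over-represented (900 instead of 890).
achieved = {"group A": 100, "everyone else": 900}
target = {"group A": 110, "everyone else": 890}

# Hypothetical answers: how many respondents in each group said "yes".
said_yes = {"group A": 60, "everyone else": 360}

weights = cell_weights(achieved, target)

# Unweighted share of "yes" answers.
raw_share = sum(said_yes.values()) / sum(achieved.values())

# Weighted share: each respondent counts as `weights[group]` people.
weighted_yes = sum(said_yes[g] * weights[g] for g in achieved)
weighted_total = sum(achieved[g] * weights[g] for g in achieved)
weighted_share = weighted_yes / weighted_total

for group, w in weights.items():
    print(f"{group}: each respondent counts as {w:.3f} people")
print(f"raw 'yes' share: {raw_share:.1%}, weighted 'yes' share: {weighted_share:.1%}")
```

With these made-up figures the adjustment is small, which matches the description above of raw numbers being adjusted "slightly, up or down".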
- 5. Are other kinds of surveys bound to be wrong?
- No. Just as a stopped clock tells the right time twice a day, unscientific surveys will occasionally produce the right percentages. But they are far more likely to be badly wrong. The most common forms of unscientific surveys are phone-in polls conducted by television programmes and self-selecting surveys conducted over the Internet. These contain two defects. First, their samples are self-selecting. Such polls tend to attract people who feel passionately about the subject of the poll, rather than a representative sample. Second, such polls seldom collect the kind of extra information (such as gender and age) that would allow some judgement to be made about the nature of the sample.
- 6. But surely a phone-in or write-in poll in which, say, one million people take part is likely to be more accurate than an opinion poll sample of 1,000?
- Not so. A biased sample is a biased sample, however large it is. One celebrated example of this was the US Presidential Election in 1936. A magazine, Literary Digest, sent out 10 million post cards asking people how they would vote, received almost 2.3 million back and said that Alfred Landon was leading Franklin Roosevelt by 57-43 per cent. The Digest did not gather information that would allow it to judge the quality of its sample and correct, or “weight”, groups that were under- or over-represented. A young pollster called George Gallup employed a much smaller sample (though, at 50,000, it was much larger than those normally used today), but because he ensured that it was representative, he correctly showed Roosevelt on course to win by a landslide. In the event, Roosevelt won 60% and Landon just 37%. The Literary Digest closed down soon afterwards.
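A small simulation can illustrate why size does not cure bias. The sketch below is not a reconstruction of the 1936 polls: the true share of opinion, the two response rates and the function names are all hypothetical assumptions, picked only to show a self-selecting poll with tens of thousands of replies landing further from the truth than a random poll of 1,000.

```python
import random

random.seed(1936)  # fixed seed so the sketch gives the same result on every run

TRUE_SHARE_A = 0.40     # hypothetical: 40% of the population hold view A
RESPONSE_RATE_A = 0.20  # hypothetical: people with view A reply twice as often...
RESPONSE_RATE_B = 0.10  # ...as people with view B when the poll is self-selecting


def holds_view_a():
    """Draw one person at random from the population."""
    return random.random() < TRUE_SHARE_A


def self_selecting_poll(people_asked):
    """Everyone is asked, but each person decides for themselves whether to reply."""
    replies_a = replies_b = 0
    for _ in range(people_asked):
        if holds_view_a():
            if random.random() < RESPONSE_RATE_A:
                replies_a += 1
        elif random.random() < RESPONSE_RATE_B:
            replies_b += 1
    total_replies = replies_a + replies_b
    return replies_a / total_replies, total_replies


def random_poll(sample_size):
    """A random sample in which everyone selected takes part."""
    return sum(holds_view_a() for _ in range(sample_size)) / sample_size


big_share, big_n = self_selecting_poll(500_000)
small_share = random_poll(1_000)

print(f"true share holding view A:        {TRUE_SHARE_A:.1%}")
print(f"self-selecting poll ({big_n} replies): {big_share:.1%}")
print(f"random poll (1000 respondents):   {small_share:.1%}")
```

Under these assumptions the self-selecting poll reports a clear majority for view A even though most of the population hold view B, while the much smaller random poll stays close to the true figure.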
- 7. How can you possibly tell what millions of people think by asking just 1,000 or 2,000 respondents?
- In much the same way that a chef can judge a large vat of soup by tasting just one spoonful. Provided that the soup has been well stirred, so that the spoonful is properly “representative”, one spoonful is sufficient. Polls operate on the same principle: achieving representative samples is broadly akin to stirring the soup. A non-scientific survey is like an unstirred vat of soup. A chef could drink a large amount from the top of the vat, and still obtain a misleading view if some of the ingredients have sunk to the bottom. Just as the trick in checking soup is to stir well, rather than to drink lots, so the essence of a scientific poll is to secure a representative sample, rather than a vast one.
- 8. But isn’t there some risk of sampling error in a poll of 1,000 or 2,000 people?
- Yes. Statistical theory allows us to estimate this. Imagine a country that divides exactly equally on some issue: 50% hold one view while the other 50% think the opposite. Statistical theory tells us that a random poll of 1,000 people, with a 100% response rate, will be accurate to within 3% 19 times out of 20. In other words, it will record at least 47%, and no more than 53%, for each view. But there is a one in 20 chance that the poll will fall outside this range.
- With a sample of 2,000, the poll will be within 2% 19 times out of 20.
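These figures follow from the standard formula for the sampling error of a proportion in a simple random sample. The sketch below assumes the usual normal approximation, with 1.96 as the multiplier for "19 times out of 20"; the function name `margin_of_error` is just an illustrative choice.

```python
import math

Z_95 = 1.96  # normal-distribution multiplier for 95% confidence ("19 times out of 20")


def margin_of_error(sample_size, share=0.5, z=Z_95):
    """Margin of error for a proportion from a simple random sample.

    share is the true proportion holding the view (0.5 for an evenly divided country).
    """
    return z * math.sqrt(share * (1 - share) / sample_size)


for n in (1_000, 2_000):
    print(f"sample of {n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

This gives about 3.1 points for a sample of 1,000 and about 2.2 points for 2,000, which the answer above rounds to 3% and 2%. Note that doubling the sample does not halve the error, because the error shrinks with the square root of the sample size.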
- 9. You say those calculations apply to “a random poll with a 100% response rate”. Surely that’s pie in the sky?
- Fair point. Many polls are non-random, and response rates are often very much lower — well below 50% in many countries for polls conducted over just a few days.
- 10. So isn’t the real margin of error much larger?
- Possibly, but possibly not. Here are two examples, at opposite extremes of this issue. Return to our example of an equally divided country. Suppose everyone who holds view A lives in the northern half of the country, while everyone who holds view B lives in the southern half. In that case, if pollsters ensure that half of each survey is conducted in the north, and half in the south, then their polls should be exactly accurate. Structuring polls in this kind of way is called “stratification”. Properly done, stratification can help to increase a poll’s accuracy.
- Now make a different assumption about our mythical, equally divided country. Suppose people who hold view A are far more likely to express that view to strangers — such as survey researchers — than people who hold view B. Unless the polling company is aware of this bias, and knows how big it is, it could well produce results showing that view A is far more popular than view B. This is an example of a systematic error.
- To measure the “true” margin of error, we would need to take account of random sampling error, and the effects of stratification, and possible systematic errors. The trouble is that it is hard, and arguably impossible, to be sure of the true impact of stratification and systematic errors. (If the impact of all systematic errors were known, a competent survey company would adjust its results to compensate for them.)
- 11. Doesn’t this mean that polls can’t really be trusted at all?
- No. Polls may not be perfect, but they are the best, or least bad, way of measuring what the public thinks. In most countries where poll results can be compared with actual results (such as elections), well-designed polls are usually accurate to within 3%, even if they occasionally stray outside that margin of error. Moreover, much of the time, polls provide a good guide to the state of opinion, even allowing for a larger margin of error. If a well-designed, representative survey finds that the public divides 70-30% on an issue, then a margin of error of even 10% cannot alter the fact that one view is expressed far more widely than the other. However, it is true that in a closely-fought election, a polling lead (in a sample of 1,000-2,000) of less than 5% for one candidate or party over another cannot be regarded as a certain indicator of who is ahead at the time the survey was taken, let alone a guarantee of who will win in the days, weeks or months ahead.
- 12. I have seen polls conducted by different, well-regarded, companies on the same issue produce very different results. How come?
- There are a number of possible reasons, quite separate from issues to do with sampling error.
- The polls might have been conducted at different times, even if they are published at the same time. If the views of many people are fluid, and liable to change in response to events, then it might be that both polls were broadly right, and that the public mood shifted between the earlier and the later survey.
- The polls might have asked different questions. Wording matters, especially on subjects where many people do not have strong views. It is always worth checking the exact wording when polls appear to differ.
- There might be an “order effect”. One poll might ask a particular question “cold”, at the beginning of a survey; another poll might ask the same question “warm”, after a series of other questions on the same topic. Differences sometimes arise between the two sets of results, again when many people do not have strong views, and some people may give different answers depending on whether they are asked a question out of the blue or after being invited to consider some aspects of the issue first.
- They might have been conducted using different methods. Results can be subject to “mode effects”: that is, some people might, consciously or subconsciously, give different answers depending on whether they are asked questions in person by an interviewer, or impersonally in self-completion surveys sent by post or email/Internet. There is some evidence that anonymous self-completion surveys secure greater candour on some sensitive issues than face-to-face or telephone surveys. So if two reputable companies, asking the same question at the same time, produce different figures, and one conducts its surveys by telephone and the other by the Internet, “mode effects” might be at work.
- 13. When someone sends me a poll, how can I tell whether to take it seriously or not?
- Check the following:
- Who conducted the poll?
Was it a reputable, independent polling company? If not, then regard its findings with caution. If you are not sure, then one test is its willingness to answer the questions below. Reputable polling firms will provide you with the information you need to evaluate the survey.
- Who paid for the poll and why was it done?
If it was conducted for a respected media outlet, or for independent researchers, there is a good chance it was conducted impartially. If it was conducted for a partisan client, such as a company, pressure group or political party, it might still be a good survey (although readers/listeners/viewers should be told who the client was). The validity of the poll depends on whether it was conducted by a reputable company, whether it asked impartial questions, and whether full information about the questions asked and results obtained is provided. If such information is provided, then the quality of the survey stands or falls according to its intrinsic merits. If such information is not provided, then the poll should be treated with caution. In either event, watch out for loaded questions and selective findings, designed to bolster the view of the client, rather than report public opinion fully and objectively.
- How many people were interviewed for the survey?
The more people, the better — although a small-sample scientific survey is ALWAYS better than a large-sample self-selecting survey. Note, however, that the total sample size is not always the only relevant number. For example, voting intention surveys often show figures excluding “don’t knows”, respondents considered unlikely to vote, and those who refuse to disclose their preference. By excluding these groups, the voting-intention sample size may be significantly lower than the total sample, and the risk of sampling error therefore greater.
Likewise, be careful when comparing sub-groups — for example men and women. The sampling error for each sub-group could be significantly higher than for the sample as a whole. If the total sample is 500, and made up of equal numbers of men and women, the margin of error for each gender (counting only random errors and disregarding any systematic errors) is around 6%.
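The effect of a shrinking base can be checked with the same sampling-error formula used for whole samples. In the sketch below, the 30% exclusion rate for the voting-intention base is a hypothetical figure, and the calculation counts random sampling error only, as in the text.

```python
import math


def margin_of_error(base, share=0.5, z=1.96):
    """95% margin of error for a proportion, counting random sampling error only."""
    return z * math.sqrt(share * (1 - share) / base)


total_sample = 500
bases = {
    "whole sample": total_sample,
    "men only": total_sample // 2,
    "women only": total_sample // 2,
    # Hypothetical: 30% are don't-knows, unlikely voters or refusals.
    "voting intention": round(total_sample * 0.70),
}

for label, base in bases.items():
    print(f"{label:<18} base {base:>4}  +/- {margin_of_error(base) * 100:.1f} points")
```

For a sub-group of 250 this gives about 6.2 points, in line with the "around 6%" quoted above, compared with about 4.4 points for the full sample of 500.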
- How were those people chosen?
If the poll purports to be of the public as a whole (or a significant group of the public), has the polling company employed one of the methods outlined in points 2, 3 and 4 above? If the poll was self-selecting (such as readers of a newspaper or magazine, or television viewers writing, telephoning, emailing or texting in), then it should NEVER be presented as a representative survey. If the poll was conducted in certain locations but not others, for example, cities but not rural areas, then this information should be made clear in any report.
- When was the poll done?
Events have a dramatic impact on poll results. Your interpretation of a poll should depend on when it was conducted relative to key events. Even the freshest poll results can be overtaken by events. Poll results that are several weeks or months old may be perfectly valid, for example if they concern underlying cultural attitudes or behaviour rather than topical events, but the date when the poll was conducted (as distinct from published) should always be disclosed.
- How were the interviews conducted?
There are four main methods: in person, by telephone, online or by mail. Each method has its strengths and weaknesses. Telephone surveys do not reach those who do not have telephones. Email surveys reach only those people with Internet access. All methods depend on the availability and voluntary co-operation of the respondents approached; response rates can vary widely. In all cases, reputable companies have developed statistical techniques to address these issues and convert their raw data into representative results (see points 3 and 4 above).
- Was it a “push poll”?
The best way to guard against “push polls” is to find out who conducted the survey. Reputable companies have nothing to do with “push polls”, a phenomenon that has grown in recent years in a number of countries.
The purpose of “push polls” is to spread rumours and even outright lies about opponents. These efforts are not polls, but political manipulation trying to hide behind the smokescreen of a public opinion survey. In a “push poll”, a large number of people are called by telephone and asked to participate in a purported survey. The survey “questions” are really thinly-veiled accusations against an opponent or repetitions of rumours about a candidate’s personal or professional behaviour. The focus here is on making certain the respondent hears and understands the accusation in the question, not on gathering the respondent’s opinions. “Push polls” have no connection with genuine opinion surveys.
- Was it a valid exit poll?
This question applies only at elections. Exit polls, properly conducted, are an excellent source of information about voters in a given election. They are the only opportunity to survey actual voters and only voters. They are generally conducted immediately after people have voted, and are therefore able (in theory) to report actual behaviour. Pre-election surveys, even those conducted the day before the vote, cannot entirely avoid the danger that some people may change their mind, about whether to vote or which party/candidate to support, at the very last minute.
However, exit polls are still prone to three distinct sources of error, apart from pure random error:
First, supporters of one candidate/party may be more willing to disclose their vote than supporters of another. This phenomenon, “differential non-response”, is especially hard to judge accurately in exit polls.
Second, some people may genuinely have thought they voted for a particular candidate/party, but may inadvertently have voted for someone else, or spoiled their ballot paper or (when using voting machines) not have completed the process properly. (This may explain the statistically slight but politically crucial difference between the exit poll in Florida in the US 2000 Presidential election, which indicated a narrow victory for Al Gore, and the declared result of a wafer-thin victory for George Bush.)
Third, exit polls may not have been conducted at an absolutely representative group of polling stations. Even if the total sample is very large (say, 5,000 or more), it may suffer from an effect known as “clustering”. If, say, 50 polling stations are selected, and 100 voters questioned at each, the figures could be wrong if the overall political balance of those 50 polling districts is even slightly askew.
Reputable polling companies go to considerable lengths to avoid this problem. Other companies may conduct exit polls in a minimal number of voting locations using interviewers who do not have experience or specialist training in this method of polling.
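One common way to put a rough number on the clustering effect described under the third source of error is Kish's design-effect approximation, which is not part of the original text and is offered here only as an illustration. It assumes equal-sized clusters and requires a value for the intraclass correlation (how alike voters at the same polling station are); the value of 0.05 used below is purely hypothetical.

```python
import math


def design_effect(cluster_size, rho):
    """Kish's approximation: how much clustering inflates sampling variance.

    cluster_size: number of voters interviewed per polling station
    rho:          intraclass correlation (0 = stations are all alike on average)
    """
    return 1 + (cluster_size - 1) * rho


def margin_of_error(sample_size, share=0.5, z=1.96):
    """95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(share * (1 - share) / sample_size)


stations = 50
voters_per_station = 100
rho = 0.05  # hypothetical value, chosen only for illustration

n = stations * voters_per_station
deff = design_effect(voters_per_station, rho)
effective_n = n / deff  # size of the simple random sample with the same precision

print(f"nominal sample: {n}, design effect: {deff:.2f}, effective sample: {effective_n:.0f}")
print(f"margin of error if treated as a random sample: +/- {margin_of_error(n) * 100:.1f} points")
print(f"margin of error allowing for clustering:       +/- {margin_of_error(effective_n) * 100:.1f} points")
```

Under these assumptions a nominal sample of 5,000 behaves like a random sample of well under 1,000, which is why the choice of polling stations matters as much as the number of voters questioned.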