Tuesday, July 22, 2008

The Cellphone Problem, Revisited

Let me comment at a bit more length on the so-called "cellphone problem" -- the fact that many voters are unreachable by pollsters whose samples consist only of landline numbers. This may help explain today's Rasmussen poll of Ohio, which showed John McCain with a fairly large lead.

The basic issue with cellphone-only households is that their incidence is not distributed evenly throughout the population. Minorities are more likely to be cellphone-only than whites, and men are more likely to be cellphone-only than women. But the most important differences are in terms of the age of the voter.

The table below shows data compiled by the Centers for Disease Control on the share of cellphone-only adults in each age cohort. Actually, it is not just cellphone-only adults -- the CDC also tracks another category, which I call "cellphone-mostly" adults. These are people who have a landline but also have a mobile phone, and who use the mobile phone to receive most or all of their calls. I personally know a lot of people who fall into this category: they may use their landlines only to make local calls, only to connect to the Internet, or only as a backup in case their cellphone service is down, and they may have the service only because it came bundled with their cable or wireless package. If their friends and family are in the habit of calling them on their cellphones, they may be very suspicious of calls coming into their landlines -- assuming that they are likely to be from telemarketers -- and not make a practice of answering them.

Table 1. Cellphone-Only and Cellphone-Mostly Adults by Age Cohort


As you can see, fully half of all adults under the age of 30 fall into the cellphone-only or cellphone-mostly buckets, and the number is growing every day. About a third of adults aged 30-44 are cellphone-only or cellphone-mostly, and then the numbers trail off once adults pass the midpoint of their lives.

Obviously, if polling firms did not weight by age, this would be an utter disaster for any election in which preferences vary significantly by age. Suppose, for example, that the following represented the true age distribution and candidate preferences of the likely voter population in Big Industrial State:

Age     % of LV   Obama %   McCain %
18-24        10        69         31
25-29        10        60         40
30-44        30        50         50
45-64        35        46         54
65+          15        40         60
------------------------------------
TOTAL       100        50         50
These numbers have been 'rigged' so that Obama and McCain each receive exactly 50 percent of the vote. Suppose, however, that we exclude cellphone-only and cellphone-mostly voters from our sample in proportion to their incidence in the CDC data. What you'd wind up with instead is the following:
Age     % of LV   Obama %   McCain %
18-24         7        69         31
25-29         6        60         40
30-44        28        50         50
45-64        39        46         54
65+          20        40         60
------------------------------------
TOTAL       100      48.5       51.5
What ought to have been a tie instead turns into a 3-point lead for John McCain. (Keep in mind that the numbers in this example are hypothetical -- but the real ones probably look something like this.)
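Each TOTAL row is just a weighted average of the age-group results: every group's support is multiplied by its share of the sample and the products are summed. The short Python sketch below recomputes both toplines from the figures as printed. With the rounded shares in the second table, Obama's topline comes out to about 48.4 rather than 48.5; the small gap is presumably due to rounding in the shares, though that is a guess. McCain's share is simply 100 minus Obama's, since the example has no undecideds.

```python
# Hypothetical figures from the two tables above, in percent:
# age group -> (share of likely voters, Obama support)
FULL_ELECTORATE = {
    "18-24": (10, 69),
    "25-29": (10, 60),
    "30-44": (30, 50),
    "45-64": (35, 46),
    "65+": (15, 40),
}
LANDLINE_ONLY_SAMPLE = {  # cellphone-only/mostly voters excluded
    "18-24": (7, 69),
    "25-29": (6, 60),
    "30-44": (28, 50),
    "45-64": (39, 46),
    "65+": (20, 40),
}


def topline(groups):
    """Obama's overall share: each group's support weighted by its share of the sample."""
    total_share = sum(share for share, _ in groups.values())
    return sum(share * obama for share, obama in groups.values()) / total_share


print(round(topline(FULL_ELECTORATE), 2))       # 50.0
print(round(topline(LANDLINE_ONLY_SAMPLE), 2))  # 48.37
```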

Pollsters can get around this problem by weighting groups that are likely to be cellphone-only -- in particular, younger voters -- more heavily. This is what nearly all smart pollsters do, and it is considerably better than not weighting at all. However, it creates a couple of additional problems.
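Weighting of this kind is usually called post-stratification: every respondent in a group gets a weight equal to the group's share of the target population divided by its share of the sample. Below is a minimal sketch, assuming the pollster knows the true age mix (here, the first hypothetical table) and that the reachable voters in each age group vote just like the unreachable ones. Under those assumptions, reweighting the landline-only sample recovers the 50-50 tie.

```python
# Hypothetical figures from the two tables above (all in percent):
# target = share of the true likely-voter electorate,
# sample = share of a landline-only sample, obama = Obama support.
GROUPS = {
    #         target, sample, obama
    "18-24": (10, 7, 69),
    "25-29": (10, 6, 60),
    "30-44": (30, 28, 50),
    "45-64": (35, 39, 46),
    "65+": (15, 20, 40),
}


def post_stratification_weights(groups):
    """Per-group weight = target share / share actually observed in the sample."""
    return {age: target / sample for age, (target, sample, _) in groups.items()}


def weighted_topline(groups, weights):
    """Obama support (%) after scaling each group's sample share by its weight."""
    num = sum(sample * weights[age] * obama for age, (_, sample, obama) in groups.items())
    den = sum(sample * weights[age] for age, (_, sample, _) in groups.items())
    return num / den


weights = post_stratification_weights(GROUPS)
unweighted = {age: 1.0 for age in GROUPS}
print({age: round(w, 2) for age, w in weights.items()})
# {'18-24': 1.43, '25-29': 1.67, '30-44': 1.07, '45-64': 0.9, '65+': 0.75}
print(round(weighted_topline(GROUPS, unweighted), 2))  # 48.37
print(round(weighted_topline(GROUPS, weights), 2))     # 50.0
```

The second assumption, that reachable and unreachable voters of the same age vote alike, is the one that tends to fail in practice.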

The first and more commonly discussed problem is that cellphone-only voters may differ from their landline counterparts even once we control for age and other variables like race and gender. Urban voters, for instance, are about 50 percent more likely than rural voters to be cellphone-only, and while some pollsters weight by geography, others do not. Thus, you may wind up with a biased sample.
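Weighting by age cannot fix this kind of bias. It restores each age group to its correct size, but it still estimates the group's preferences only from the members who could be reached. Below is a minimal sketch of the leftover error, using made-up numbers purely for illustration: suppose 18-29s are 20 percent of likely voters, half of them are reachable on landlines, and Obama wins 60 percent of the reachable ones but 70 percent of the cellphone-only and cellphone-mostly ones. These support figures are hypothetical, not taken from any poll.

```python
def residual_bias(group_share, reach_rate, support_reachable, support_unreachable):
    """Topline error (points) that remains after weighting a group to its correct share.

    Weighting fixes the group's size, but its support is still estimated only
    from reachable members, so the error is group_share * (estimate - truth).
    """
    true_support = reach_rate * support_reachable + (1 - reach_rate) * support_unreachable
    return group_share * (support_reachable - true_support)


# Hypothetical: 18-29s are 20% of LVs, half reachable by landline,
# 60% Obama among the reachable vs. 70% among cellphone-only/mostly voters.
print(residual_bias(0.20, 0.5, 60, 70))  # -1.0 (Obama understated by a point)
```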

But even if the sample were unbiased -- say the pollster is smart enough to figure out how to balance all the weights properly -- what you're still doing, in effect, is magnifying the importance of sampling error. Suppose that a pollster wants to sample 500 likely voters in a state. Roughly speaking, about 20 percent of these -- 100 of them -- are likely to fall into the 18-29 age range. But about half of those voters can't be reached because they are cellphone-only or cellphone-mostly. So the effective sample size for this subgroup is 50 voters, which carries a margin of error of +/- 14 points. Sometimes the luck of the draw will come through for you and you'll wind up with a pretty good sample, but other times you'll be pretty far off.
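The +/- 14 figure matches the usual 95 percent margin-of-error formula, 1.96 * sqrt(p(1 - p)/n) with p = 0.5. The sketch below assumes simple random sampling, which weighted phone polls only approximate. Note that the figure applies to one candidate's share; in a two-way race, the margin between the candidates is uncertain by roughly twice as much.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for one candidate's share.

    Assumes a simple random sample of size n; p = 0.5 gives the widest interval.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


# Full sample, the 18-29 group if everyone were reachable, and the reachable half.
for n in (500, 100, 50):
    print(n, round(margin_of_error(n), 1))
# 500 4.4
# 100 9.8
# 50 13.9
```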

If you are not fortunate enough to wind up with a good sample, you compound the problem, because you have to weight the young voters you do reach more heavily to make up for the ones you can't reach because they depend on cellphones.
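One standard way to quantify the cost of uneven weights is Kish's approximate effective sample size, (sum of weights)^2 / (sum of squared weights). The sketch below applies it to a hypothetical sample built from the example above: 500 landline interviews, only 50 of them with 18-29s, who must be weighted back up to 20 percent of the electorate. In this mild case the penalty is modest, about 10 percent of the sample; the more lopsided the weights, the larger it gets.

```python
import math


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in points, for a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample following the example above: 500 landline interviews,
# only 50 of them aged 18-29, although 18-29s are 20% of likely voters.
N, N_YOUNG = 500, 50
w_young = 0.20 / (N_YOUNG / N)        # 2.0: each young respondent counts double
w_older = 0.80 / ((N - N_YOUNG) / N)  # about 0.89
weights = [w_young] * N_YOUNG + [w_older] * (N - N_YOUNG)

n_eff = kish_effective_n(weights)
print(round(n_eff))                      # 450
print(round(margin_of_error(N), 1))      # 4.4 (nominal)
print(round(margin_of_error(n_eff), 1))  # 4.6 (after weighting)
```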

So, where such information is available, you should get in the habit of checking the cross-tabs for groups that are known to have problems with non-response bias -- which, because of the cellphone-only problem, means younger voters. If the pollster was unlucky and wound up with an unrepresentative sample of such voters, it may skew the overall results, since those responses are weighted more heavily.
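A bad draw among young voters matters because the weighted topline is a sum of each group's share times its support, so an error of e points in one group's estimate moves the topline by that group's share times e, however few interviews the group actually supplied. Below is a back-of-the-envelope sketch, assuming 18-29s are 20 percent of likely voters and that their estimate misses by the full 14-point margin of error from the example above; the doubling for the margin assumes a two-way race with no undecideds.

```python
def topline_shift(group_share, subgroup_error):
    """Points by which a weighted topline moves when one group's estimate is off.

    The weighted topline is sum(share_g * support_g), so an error of e points in
    one group's support moves the topline by share_g * e, regardless of how many
    interviews that group contributed.
    """
    return group_share * subgroup_error


# Hypothetical: 18-29s are 20% of likely voters and their estimate misses by 14 points.
shift = topline_shift(0.20, 14)
print(round(shift, 1))      # 2.8 points on one candidate's share
print(round(2 * shift, 1))  # 5.6 points on the margin between the two candidates
```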

Is this an issue with the Rasmussen poll in Ohio? Actually, it may be. The poll has McCain leading 50-39 among voters aged 18-29, and 67-33 among voters aged 30-39. Obama leads 55-36 among voters in their 40s, and then McCain leads by single-digit margins among voters aged 50 and up. Such an age distribution is inconsistent with most other polling that we have seen in this election.

This does not mean that Rasmussen screwed up. The problem is not unique to Rasmussen; it is common to all pollsters that don't include a cellphone supplement, which means all pollsters except Gallup and Selzer. These pollsters are doing everything they can to work around a vexing problem -- about half the young voters they might want to sample can't be reached, so they are stuck with small samples of such voters. But it does mean that if there is greater error in their samples of young voters, there will be greater error in their polls as a whole.

100 comments

JC said...

Very tangentially to the election, I wonder:

Pollsters don't poll cell phones because cell phone users get charged for incoming as well as outgoing calls. What would have to change for cell phone service providers to offer free incoming calls? How did we get to a situation where they don't?

Becky Sharp said...

Nate, if there were a cellphone problem, wouldn't it lead to *all* Rasmussen polls being skewed toward McCain -- not just Ohio today?

Jeff said...

Wow, I didn't realize how large a percentage the cell-phone only/mostly crowd was. I guess we won't really be able to test how big their impact is until election day.

Mark said...

Sterling analysis. I think there's a lot of polling error like this, which will render an election as close as this one a statistical tie right up until Election Night, barring a major sea change.

Nate said...

Becky,

The problem I'm addressing here is not necessarily that the poll is biased in one or the other direction. If the pollster is smart enough about how they weight their responses -- and Scott Rasmussen is smart -- they may be able to work around the bias problems. But in the process of conducting this weighting, the pollster increases the margin of error above and beyond its theoretical value.

KellyO101 said...

As a Kerry staffer, let me caution everyone on this. We heard this constantly in '04: that there were hidden constituencies out there for the Democratic candidate, and that a tie in the polls would go to us. There is a lot of truth in the post, but I am equally sure that even in today's political climate, a tie goes to the Repub; they are simply willing to do more to wrest it away. The shamelessness factor is hard to quantify, but it is orders of magnitude more powerful than any polling irregularity.

Phillip said...

This explains a lot for me. Bravo.

scdono said...

Nate:

I appreciate your analysis and your attempt to be unbiased.

You might as well dismiss all of the polls that don't favor Obama because of cell phone users. However, you clearly seem to be in denial about a recent trend toward McCain. If the poll were favorable to Obama, would you look for a reason to dismiss or diminish its value?

Anonymous said...

I'm curious: if cell phones are such a problem for Rasmussen, how do they end up in the top 3 of the pollster ratings?

Higglytown said...

Very interesting analysis. So the Ohio result with McCain +10 may be random noise generated by the cell-phone-only crowd not receiving the calls.

Couldn't this reasoning apply equally to the PPP poll showing Obama +8? SUSA at Obama +2 could also be off for the same reason. That is why we take a snapshot of all the polls. There is the Dayton problem, though: young business professionals by the thousands, and the same with Toledo -- a very cell-phone-only, very Republican crowd. I wonder how cell phone only use breaks down among young professionals with middle incomes, who are more likely to be Republican, versus younger inner-city or low-income voters, and whether each of these types of cell phone users in each age bracket is likely to vote.

I know I went cell phone only 4 years ago and haven't received a poll call since (PTL and TYJ), and I am not a McCain backer.

I just think lumping cell phone only users together by age may ignore which camp they are in. A lot of young Republicans are cell phone only: fiscally conservative types who no longer want to spend money on a landline.

Anonymous said...

Interesting analysis. Just one comment, McCain leading 50-39 among voters aged 18-29, and 67-33 among voters aged 30-39 seems a little bit odd.
I don't think Nate is making an excuse for Rasmussen; he is just offering a plausible explanation for what might be going on with the OH numbers.

Anonymous said...

That Rasmussen Ohio poll is really screwy. No way does McCain have a double-digit lead among 18-29 year olds.

One thing to keep in mind about polls is that even if a good, solid pollster does everything right, 1 out of every 20 polls they do will still be completely, totally, utterly wrong. As in outside-the-margin-of-error wrong. That's what the 95% confidence interval means.

Rasmussen does so many polls that they're bound to put out a few that are just plain wrong, and my bet is that we've just seen one.

Landstander said...

I am a great example of what you are talking about. From 23 to 26 I had a landline. For the first few years I used it sparingly and rarely answered it. Then I simply kept it unplugged for a couple of years (I only used the line for DSL).

Now I am 27, and for the last year I have been cell phone only. Interestingly, I have been polled twice in the last month on my cell phone. Both, however, were local/state-oriented polls. I wonder what stops presidential pollsters from getting my number, since these other pollsters clearly have no problem reaching me.

And yes, I am an urban liberal.

Higglytown said...

I meant I am a McCain backer -- or rather, I am not an Obama backer. Sorry.

tibor75 said...

So, the last few days Rasmussen has:
Michigan: Obama +8
Ohio: McCain +10
Colorado: Obama +7

Uh, yeah, sure.....