In a past article, I showed, using the figure of 1 in 150 for the rate of autism and the US population, that around 26,000 autistics will be born this year (you can debate whether they are born autistic or not, but around 26,000 will be autistic - either at birth or within a few years, depending on your political views), but 600,000 people will die this year of heart disease. Of course there is nothing amazing about these numbers - they are in line with numbers from other countries around the world. So, in essence, more than 20 times as many people will die of heart disease this year as there are "newly created autistics." Yet autism continues to be considered "extremely common" and the "biggest health epidemic the nation has ever seen." No wonder health care policy is such a mess today, if 600,000 deaths a year are ignored for the sake of 26,000 newly minted autistics (I'll add that most of these autistics will not fit the stereotype of a low-functioning autistic - most will speak, be able to use the toilet, etc.).
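The arithmetic behind that comparison can be checked in a few lines. This is a minimal Python sketch. Its only inputs are the figures quoted above. The births-per-year figure is back-calculated from them and is not taken from any official source.

```python
# Figures quoted in the paragraph above.
RATE_DENOMINATOR = 150     # autism rate of 1 in 150
NEW_AUTISTICS = 26_000     # approximate new autistics per year
HEART_DEATHS = 600_000     # approximate heart-disease deaths per year

# Births per year implied by the two figures above (back-calculated, not quoted).
implied_births = NEW_AUTISTICS * RATE_DENOMINATOR

# Heart-disease deaths for every newly created autistic.
ratio = HEART_DEATHS / NEW_AUTISTICS

print(f"implied births per year: {implied_births:,}")   # 3,900,000
print(f"deaths per new autistic: {ratio:.1f}")          # 23.1
```

The ratio of about 23 is where "more than 20 times" comes from.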
Nonetheless, many people have taken a particular interest in the idea that a polluted environment causes autism. Thus, when there is a "hot spot", especially in a place like New Jersey, which, fairly or not, is known for high pollution, it is assumed that this demonstrates a "connection."
However, before you can decide that where people live has anything to do with autism, you need to do a few things. The very first is to eliminate other variables. Perhaps different areas of the country diagnose autism slightly differently, resulting in different autism rates. Perhaps culture is relevant to the manifestation of autism (the canonical example is eye contact, which is useless for diagnostic purposes in some cultures). The average wealth of an area can also have a key effect, and must be accounted for in the study. Controlling for these variables is beyond the scope of this article, but it is essential, because each of them can affect how often autism is diagnosed.
The other thing you must do - and in fact what all quantitative science aims to do - is to determine whether two things are connected, or, as scientists say, correlated. When studying rates of something (like autism) in a population categorized by geography, you are trying to show that geography is correlated with the rate of autism. To the scientist, this really means that autism is not randomly distributed geographically. If geography has no connection to autism, so that autism is randomly distributed, then you are just as likely to find autistics in Paris as in Denver, or even Brick Township. (This assumption is called the "null hypothesis." Good quantitative science starts from the null hypothesis and tries to reject it. Only when the data turn out to be very unlikely under the null hypothesis is the real hypothesis considered as perhaps being accurate.)
That brings us to Brick Township, New Jersey, USA. In 2001, a study was published in Pediatrics which was interpreted to mean that there was an epidemic of autism in Brick Township. This study is often cited by those who want to show that pollution causes autism. After all, if there were an extremely high rate of autism in a small geographic area, it might be worth looking into, to determine whether something is causing it. However, looking back at that study, we find that 6.7 out of every 1000 children were autistic - in other words, 1 in 149. Since the rate of autism commonly cited (by organizations promoting the idea of environmental autism) is 1 in 150, Brick Township doesn't exactly look different from anywhere else in the US. In fact, what is striking is how accurate that 1 in 150 number may be.
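Converting between the two ways of stating a rate makes the comparison concrete. This is a small Python sketch, and `one_in_n` is just an illustrative helper name. It turns the study's "per 1,000" figure into a "1 in N" figure, and it expresses the commonly cited 1 in 150 per 1,000 for a like-for-like comparison.

```python
def one_in_n(per_thousand: float) -> float:
    """Return N such that a prevalence of `per_thousand` per 1,000 equals 1 in N."""
    return 1000 / per_thousand

brick = one_in_n(6.7)        # Brick Township figure quoted above
national = 1000 / 150        # the commonly cited 1 in 150, expressed per 1,000

print(f"Brick Township: 1 in {brick:.0f}")        # 1 in 149
print(f"1 in 150 is {national:.2f} per 1,000")    # 6.67 per 1,000
```

Put side by side, 6.7 per 1,000 and 6.67 per 1,000 differ only in the second decimal place.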
So, dismissing Brick Township's "extremely high" rate of autism (which is the same as the rest of the US's), we are left mostly with anecdotal reports such as "There are 3 autistic kids in my neighborhood alone" as evidence that there are "high" rates of autism in some areas. So how would we determine whether an area's high rate of autism reflects a real correlation with geography, rather than chance?
The first thing you have to do is understand how a random distribution works, so that you can tell whether autism is randomly distributed geographically. If you don't know what a random distribution looks like, you can't tell whether or not your results are simply a product of randomness.
To do that, I constructed a short Perl program that calculated the number of autistics in 20,000 groups of 300 “children”. Each child had a 1 in 150 chance of being “diagnosed” by my program. Each group of 300 was chosen to represent a medium-sized neighborhood, which might have around 300 kids in it. I simulated 20,000 of these neighborhoods to find out how many would have “more than expected” numbers of autistics, if autistics are randomly distributed. My program created a large datafile which consists of the “group number” (starts at zero and increments to 19,999), the number of “autistics” in that group, and the percentage of autistics in that group’s 300 members.
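The author's program was written in Perl and is not shown here. The following is a minimal Python sketch of the same kind of simulation, not the author's code. It assumes each child is "diagnosed" independently with probability 1/150, and it adds a seeded random number generator so that runs are repeatable. Each row mirrors the datafile columns described above: group number, number of "autistics", and percentage of the group's 300 members. Writing the rows to a file is left out.

```python
import random

GROUPS = 20_000        # number of simulated neighborhoods
GROUP_SIZE = 300       # children per neighborhood
P_AUTISM = 1 / 150     # chance that any one child is "diagnosed"

def simulate(groups=GROUPS, size=GROUP_SIZE, p=P_AUTISM, seed=None):
    """Return one (group_number, autistic_count, percent) row per group,
    where each child is independently "diagnosed" with probability p."""
    rng = random.Random(seed)
    rows = []
    for group in range(groups):                 # group numbers 0 .. groups - 1
        count = sum(1 for _ in range(size) if rng.random() < p)
        rows.append((group, count, 100 * count / size))
    return rows

rows = simulate(seed=1)
total = sum(count for _, count, _ in rows)
print("total 'autistics':", total)
print(f"measured rate: 1 in {GROUPS * GROUP_SIZE / total:.1f}")
```

With 6,000,000 simulated children at 1 in 150, the expected total is 40,000. Any single run will land near that number, but not exactly on it.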
The raw results were that a total of 40,053 “autistics” were diagnosed by my program. Since each child had a 1 in 150 chance of diagnosis, we would expect a number very close to 40,000 - and we are right on target. Our actual (measured) rate of autism was 1 in 149.8. The random distribution could be graphed as:
As you can see, this closely resembles a bell curve, somewhat skewed to the right, peaking at 1 or 2 autistics out of a group of 300. Now, suppose I told someone that, in real life, 6 neighborhoods in the US out of 20,000 I studied had an autism rate of 1 in 33, nearly 5 times the US average. Most people would agree that we should study them. But if a scientist found only 6 such groups out of 20,000, he would conclude that this matches a random distribution and that studying these 6 groups would be a waste of time, since a random distribution naturally produces a few "hotspots". Unless the number of hotspots is higher than a random distribution can explain, they are not worth investigating. The random distribution sufficiently explains those groups, and there is no reason to attribute them to pollution or other environmental factors. If, on the other hand, 100 groups had a rate of 1 in 33, that would be worth investigating scientifically. The random distribution could only explain somewhere around 6 of them, and pollution, environmental factors, or other variables might be responsible for the higher rate of autism.
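The "hotspot" reasoning can also be made exact rather than simulated. Assume, as in the simulation, that each of 300 children independently has a 1 in 150 chance. The count per neighborhood then follows a binomial distribution, and the expected number of the 20,000 neighborhoods at each count is 20,000 times the binomial probability. This is a minimal Python sketch (the names are illustrative). It computes expectations and does not reproduce the author's Perl output. Nine autistics out of 300 is 3%, or roughly 1 in 33.

```python
from math import comb

N_GROUPS = 20_000      # neighborhoods
SIZE = 300             # children per neighborhood
P = 1 / 150            # chance per child, assumed independent

def pmf(k, n=SIZE, p=P):
    """Binomial probability that exactly k of n children are autistic."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Expected number of the 20,000 neighborhoods with exactly k autistics.
expected = {k: N_GROUPS * pmf(k) for k in range(0, 13)}
for k, groups in expected.items():
    print(f"{k:2d} autistics: {groups:8.1f} groups")

# Expected number of neighborhoods at 9 or more autistics (about 1 in 33 or worse).
hot = sum(N_GROUPS * pmf(k) for k in range(9, SIZE + 1))
print(f"expected groups with 9 or more autistics: {hot:.1f}")
```

A single simulated run, like the one described above, will scatter around these expected values. Finding a handful of 1-in-33 neighborhoods is therefore exactly what chance predicts.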
Scientists know this, but unfortunately many lay people do not. They hear of a high rate of autism (1 in 33!!!) and immediately say, "Hey, there has to be a cause!" Scientists, however, look at it and say, "1 in 33, okay, but is there a correlation? In other words, could this be explained by a random distribution or not?" Sadly, this is counter-intuitive to many people, and people are often more willing to trust their "instinct" than the scientific facts. Living in a neighborhood with a 1 in 33 rate of autism doesn't necessarily mean that anything but chance is at work.
A couple of other interesting facts from my fictional autism data: one third of autistic children are in neighborhoods with a rate of autism at least twice the 1 in 150 rate, and only 14% are in neighborhoods below the 1 in 150 rate. This probably explains why anecdotal accounts of "lots of autistic people in my neighborhood" are so common. More autistic people live in neighborhoods with more autistic people (how is that for a truism), and fewer autistics live in neighborhoods with fewer autistic people! This is the nature of the random distribution. In fact, we know that one third of autistic children will live in neighborhoods (assuming all neighborhoods have exactly 300 kids) with at least twice the normal rate of autism, even if chance alone explains the distribution of autistic people.
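These shares can also be computed exactly under the same assumptions: exactly 300 children per neighborhood, each independently autistic with probability 1/150. A neighborhood with k autistics holds k autistic children, so its weight is k times the binomial probability of k. Twice the 1 in 150 rate is 4 out of 300. "Below the rate" means fewer than 2, which in practice is exactly 1, since a neighborhood with none holds no autistic children to count. This Python sketch uses illustrative names of its own.

```python
from math import comb

SIZE, P = 300, 1 / 150
MEAN = SIZE * P                      # expected autistics per neighborhood: 2

def pmf(k):
    """Binomial probability of exactly k autistics in a neighborhood of SIZE."""
    return comb(SIZE, k) * P**k * (1 - P) ** (SIZE - k)

def share_of_autistics(ks):
    """Fraction of all autistic children living in neighborhoods whose count is in ks.
    A neighborhood with k autistics contributes k children, hence the k * pmf(k) weight."""
    return sum(k * pmf(k) for k in ks) / MEAN

# At least twice the 1-in-150 rate: 4 or more autistics out of 300.
at_least_double = share_of_autistics(range(4, SIZE + 1))
# Below the 1-in-150 rate: fewer than 2 autistics out of 300.
below_average = share_of_autistics(range(0, 2))

print(f"in neighborhoods at >= 2x the rate: {at_least_double:.1%}")
print(f"in neighborhoods below the rate:    {below_average:.1%}")
```

The exact values come out close to the one-third and 14% figures taken from the simulated data.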
Now of course we know that autism isn't completely random, and that there is a strong, but not complete, genetic basis to autism. But we must be very careful with poorly designed studies and anecdotes when we look for that "other" causal factor of autism. Statistics, once again, are essential. If the person doing the research can't explain why the results couldn't be explained by a random distribution, that person has no business publishing them, as he lacks the basic information needed to draw any conclusion whatsoever. (I'll note that people who can do statistics, in the scientific world at least, probably also know how to use a statistical software package called SAS and, even more importantly, how to interpret its results.) Statistics are the key to understanding science, and most science - in particular epidemiology - absolutely requires statistics. The statistics are at least as important as going out and counting people. There's an old saying - "There are lies, damned lies, and statistics." Perhaps a better one would be "There is faulty research, damned faulty research, and research based on gut feeling."