Why any ol' diet will work (if your BMI is high enough): A case study in regression toward the mean
Rebecca Goldin, PhD, February 10, 2010
Understanding regression toward the mean
Imagine a city where, from year to year, the overall height of the population is stable. There are no sudden growth spurts or frantic attempts by parents to give their children growth hormones, or movements of very short people into or very tall people out of the city limits. Now consider the following statistical conundrum: if you measure the tall people in the city, they will tend to have children who are shorter and, similarly, the short people will tend to have children who are taller. Other strange things are happening too. Though the city's school test scores are not improving overall, the children with the worst test scores last year appear to be doing better this year.
What you are witnessing is called "regression toward the mean," and it means that whenever you measure something more than once in a population, the extremes on the first measurement tend to move toward the average on the next. Consider height: Suppose the range of adult heights (the minimum to maximum heights) is 4 feet to 7 feet, and stays the same over several generations. If we look at just those who are 7 feet tall, we will find that they have children who are shorter, since if these children grew to be taller than 7 feet, the height range of the population would change. If the range stays the same, the 7-foot people cannot have children who are taller than themselves!
You might ask: but where do we get the new 7-foot-tall people if the tallest people have shorter kids? First, a maximum-height individual might have a child of the same height. Second, invariably, some tall people who are not quite at the maximum will have taller children - for example, someone 6 feet 6 inches could have a child who grows to be 7 feet tall. These kids will become the next generation of “tallest.”
“Height of children” is an example of a measurement that has some random fluctuation that has nothing to do with genetics at all. Regression toward the mean describes the behavior of such randomness. Unfortunately, it can lead to some bias in observational studies, and understanding its effects is an important defense against false conclusions. Just as we may be misled about what is happening to our society’s height if we only observe the heights of the children of very tall people (thinking that people are shrinking), we may be misled about social progress and medical success if we do not recognize the phenomenon.
How bias sneaks in: an experiment in regressing
We can set up an experiment that illustrates the phenomenon without having to wait generations to see the result. Take a random group of people and ask them each to flip a coin 100 times and record the result. We would expect to find that many people got close to 50 heads and 50 tails. But we would also find that a few people got many more heads than tails and vice versa.
Suppose we did the experiment again with just those who got the outlying results. Let’s call someone a “head-flipper” if he or she got at least 60 heads, and include only head-flippers in our second experiment. We would still expect these people to have, on average, 50 heads and 50 tails in the second set of flips. The head-flippers would generally not be head-flippers any longer. As these experiments are independent of each other, we do not expect head-flippers to be particularly likely to flip heads the second time they participate in the experiment.
Now imagine I have noted the first set of results and decided that what the world needed was a medication that improved one's ability to flip tails (imagine I'm crazy). Of course, I am only interested in what this medicine does to head-flippers, since they are the ones with the very serious problem of over-flipping heads. What will I find if I give all the head-flippers the medicine and run my test again? Amazingly, most of them will no longer flip at least 60 heads - I will have proven success! Of course, the effect of the medication is illusory, as this purported increase in tail-flipping is only in comparison to the earlier results of people who were chosen because they had flipped a lot of heads. We have simply observed regression toward the mean.
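The head-flipper experiment is easy to try on a computer. The following is a minimal sketch in Python, not part of the original study design: the function names, the number of participants (20,000) and the random seed are illustrative assumptions. The "medicine" does nothing at all; the second round is simply another 100 fair flips for each head-flipper.

```python
import random


def flip_heads(rng, n_flips=100):
    """Count the heads in n_flips flips of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


def head_flipper_experiment(n_people=20000, threshold=60, seed=0):
    """Run the two-round coin experiment (all parameters are illustrative)."""
    rng = random.Random(seed)

    # Round one: everybody flips 100 times.
    first = [flip_heads(rng) for _ in range(n_people)]

    # Keep only the "head-flippers": at least `threshold` heads.
    selected = [heads for heads in first if heads >= threshold]

    # Round two: the head-flippers take the useless "medicine" and flip again.
    second = [flip_heads(rng) for _ in selected]

    still_head_flippers = sum(heads >= threshold for heads in second)
    return {
        "n_head_flippers": len(selected),
        "mean_first": sum(selected) / len(selected),
        "mean_second": sum(second) / len(second),
        "share_still_head_flippers": still_head_flippers / len(second),
    }


result = head_flipper_experiment()
for name, value in result.items():
    print(name, round(value, 3))
```

In a run like this, the head-flippers average more than 60 heads in the first round by construction, and close to 50 in the second. Only a small share of them qualify as head-flippers again, even though nothing about them has changed.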
Regression toward the mean is a fundamental notion in statistics, and important to account for in experiments. It can create bias in poorly done observational studies. If the score on the coin flips were replaced by blood pressure measurements, for example, we should expect that people with the highest blood pressure readings are also those whose pressure will tend to go down at the next measurement, whether they are given a medication or not. This is why it is so important to compare the medicine with a placebo, so that random fluctuations are not confused with true benefit.
But when the test results are correlated, does regression toward the mean still occur?
The skeptical reader will note that blood pressure is very different from flipping a coin. In fact, blood pressure readings are highly correlated with one another. In the absence of medicinal or lifestyle intervention, a person with high blood pressure now will almost certainly have high blood pressure in a year.
However, regression toward the mean is still occurring, just in a less spectacular way. Consider the following example of blood pressure readings, adapted from Dicing with Death by Stephen Senn.
Each dot represents a person with two measurements: a first blood pressure measurement (along the vertical axis), and a second blood pressure measurement (along the horizontal axis). Dots above the red line indicate that the first measurement was high. In general, it is clear that people with elevated blood pressure on first reading will also have elevated levels on second reading, as we would expect. The average measurement is also the same in both the first and second measurement. However, those with the highest readings the first time are not necessarily those with the highest readings the second time.
In particular, if scientists study only those people with blood pressure above a critical level in the first measurement (represented by the red line), they will find that many of them have a level of blood pressure below the critical level on the second measurement (represented by being left of the vertical blue line).
Pictured above, there are ten people whose blood pressure is above critical on the first reading. Of these, six people improved on second reading, and are below critical level, left of the blue line! If we only look at people above the red line, we would think that serious progress had been made.
The trick is that by considering only those people who had high blood pressure the first time around, we did not register the people who were below critical on the first reading and measured above critical on the second reading. The situation becomes much clearer when we look at the whole picture.
By looking at the whole graph, we see there were five people below the red line, but to the right of the blue line: their blood pressure was below critical on the first reading, but above critical on the second. These people, whose condition “worsened” from one reading to the next, will compensate for those who “improved,” so that we see an average of no change.
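The same bookkeeping can be sketched with simulated readings. The model below is an assumption made for illustration and is not Senn's data: each person has a stable "usual" pressure, and each reading adds independent random fluctuation to it, so the two readings are strongly correlated but not identical. The numbers (a usual level around 130, a critical level of 150) are hypothetical.

```python
import random


def simulate_readings(n_people=50000, critical=150.0, seed=1):
    """Two correlated readings per person (all numbers are hypothetical)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_people):
        usual = rng.gauss(130.0, 15.0)           # stable level for this person
        first = usual + rng.gauss(0.0, 10.0)     # reading 1 = usual + fluctuation
        second = usual + rng.gauss(0.0, 10.0)    # reading 2 = usual + fluctuation
        pairs.append((first, second))

    high_first = [(a, b) for a, b in pairs if a > critical]
    # Above critical first, at or below critical second: looks like progress.
    improved = sum(1 for a, b in high_first if b <= critical)
    # At or below critical first, above critical second: the group we overlook.
    worsened = sum(1 for a, b in pairs if a <= critical and b > critical)

    return {
        "mean_first_all": sum(a for a, _ in pairs) / n_people,
        "mean_second_all": sum(b for _, b in pairs) / n_people,
        "n_high_first": len(high_first),
        "improved": improved,
        "worsened": worsened,
        "mean_first_high": sum(a for a, _ in high_first) / len(high_first),
        "mean_second_high": sum(b for _, b in high_first) / len(high_first),
    }


readings = simulate_readings()
for name, value in readings.items():
    print(name, round(value, 1))
```

Under these assumptions, the group selected for a high first reading has a clearly lower average on the second reading, while the population average barely moves. The number of people who "improved" is roughly matched by the number who "worsened".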
The point of a clinical trial is to get rid of the (random) bias introduced by regression toward the mean. A randomized clinical trial would compare groups of people with high blood pressure readings, by giving one group the medicine and the other not. It would then observe whether there is more benefit for the group that takes the medicine. Alternatively, one could introduce a regimen for everyone without regard to blood pressure levels (such as an exercise routine) and see whether the average changes.
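To see why the comparison group matters, here is a rough sketch of such a trial on simulated data. Everything in it is a stated assumption: the same hypothetical measurement model as before (a stable level plus random fluctuation), a made-up critical level of 150, a coin toss to assign each enrolled person to medicine or placebo, and a `true_effect` parameter that we control.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_trial(true_effect=0.0, n_people=100000, critical=150.0, seed=2):
    """Enroll people with a high screening reading, then randomize them."""
    rng = random.Random(seed)
    treated_changes, placebo_changes = [], []
    for _ in range(n_people):
        usual = rng.gauss(130.0, 15.0)            # hypothetical stable level
        screening = usual + rng.gauss(0.0, 10.0)
        if screening <= critical:
            continue                              # not enrolled in the trial
        gets_medicine = rng.random() < 0.5        # random assignment
        follow_up = usual + rng.gauss(0.0, 10.0)
        if gets_medicine:
            follow_up += true_effect              # the only real difference
            treated_changes.append(follow_up - screening)
        else:
            placebo_changes.append(follow_up - screening)
    return {
        "treated_change": mean(treated_changes),
        "placebo_change": mean(placebo_changes),
        "difference": mean(treated_changes) - mean(placebo_changes),
    }


no_effect = simulate_trial(true_effect=0.0)    # a useless medicine
real_effect = simulate_trial(true_effect=-5.0)  # a medicine that lowers readings by 5
print(no_effect)
print(real_effect)
```

In both runs the placebo group's readings fall on their own, which is regression toward the mean. A useless medicine shows about the same fall as the placebo, so the difference between the groups is near zero. Only the difference between the groups, not the raw drop, reflects what the medicine does.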
Weight Loss and Regression toward the Mean
By the same principle, the heaviest people are in a good position not to be heaviest at the next weigh-in. Suppose that in February 2010 we observe the weights of a population whose overall average weight and distribution of weights do not change. At the end of the year, the people with the highest BMI are likely to have decreased their weight, while those with the lowest are likely to have increased it. These changes, which are attributed to "regression toward the mean," should be understood as changes due to random fluctuations of the weight measurement -- either because our weights fluctuate somewhat without any intervention on our part, or because the scales we use have some random error. If the population as a whole does not increase its average or change its weight distribution, then people below the average will tend to put on weight, while those who have a very high BMI will be likely to go down in weight. [Keep in mind that the average BMI may well be over 25, classifying the average person as overweight - regression toward the mean doesn't do anything for the average!]
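How large this shift is depends on how much of the spread in measured BMI is random fluctuation. The sketch below compares two hypothetical fluctuation sizes in a population with no diets and no trend. The population average of 27, the spread of 5, and the choice of the top and bottom 10 percent are all illustrative assumptions, not measured values.

```python
import random


def bmi_changes(noise_sd, n_people=50000, seed=3):
    """Average BMI change for the heaviest and lightest 10% (hypothetical model)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        stable = rng.gauss(27.0, 5.0)                  # person's stable BMI
        first = stable + rng.gauss(0.0, noise_sd)      # weigh-in at the start
        second = stable + rng.gauss(0.0, noise_sd)     # weigh-in a year later
        people.append((first, second))

    people.sort(key=lambda pair: pair[0])              # order by first weigh-in
    tenth = n_people // 10
    lightest, heaviest = people[:tenth], people[-tenth:]

    def mean_change(group):
        return sum(second - first for first, second in group) / len(group)

    return {
        "heaviest_change": mean_change(heaviest),
        "lightest_change": mean_change(lightest),
        "overall_change": mean_change(people),
    }


large_fluctuation = bmi_changes(noise_sd=1.5)
small_fluctuation = bmi_changes(noise_sd=0.3)
print(large_fluctuation)
print(small_fluctuation)
```

Under these assumptions the heaviest group drifts down and the lightest drifts up while the overall average stays put. The drift shrinks sharply when the random fluctuation is small.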
Now throw "going on a diet to lose weight" into the picture, something that typically only overweight people do, and this effect should be exaggerated. On average, attempting to lose weight has a positive impact on losing weight.
Of course there are many aspects of changing BMI levels that are more complicated than we have presented here. In particular, American BMI is not standing still; it is increasing. And factors such as aging can increase the BMI among people in a fixed group. However, regression to the mean is a powerful force that can skew the results.
In the context of claims made by diets (from Atkins to pills that target the cocaine- and amphetamine-regulated transcript, or CART), it is important to note that some success is to be expected because of regression to the mean. What diets do not advertise is that people with the highest BMI are (on average) losing weight even without a diet - though they may well hover at a very high BMI. For full disclosure: the random variance of weight measurements may in fact be very small.
For people with higher-than-average BMI trying to shed some pounds, regression to the mean is an encouraging thought -- but of course, it's only reduced calorie consumption and exercise that will shed pounds instead of ounces.