STATS ARTICLES 2010
Science minus women equals biology?
Rebecca Goldin, Ph.D., June 15, 2010
New evidence fails to prove that gender disparity in the sciences is biological
In the June 7 edition of the New York Times, reporter/columnist John Tierney drew attention to “new evidence supporting Dr. [Lawrence] Summers’s controversial hypothesis about differences in the sexes’ aptitude for math and science.” The piece generated hundreds of comments, reflecting the controversial nature of Tierney’s view. The article nonetheless raises the important question: is there (new) evidence that biology drives the differences in performance between the sexes in math and science?
STATS has waded into this issue before, and previously documented that Dr. Summers, a former President of Harvard University, was at least partly misunderstood in the conflagration that followed his remarks: he didn’t say that boys are better than girls at math on average, or that there are no exceptional women in math and science; instead, he said that the variance is greater in boys and men than in girls and women. The average scores on standardized tests may well be the same between boys and girls, and yet the standard deviation may be much larger for boys. This would have an impact on the tail end of the curve -- the top 0.1 percent -- which may be populated more by boys than girls.
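The variance argument can be illustrated with a short calculation. The sketch below is not drawn from Summers's remarks or from any of the studies discussed here; it assumes that scores in both groups are normally distributed with the same mean and that the groups are the same size, and the 10 percent difference in standard deviation is a purely hypothetical figure chosen for illustration.

```python
from statistics import NormalDist


def tail_ratio(sd_ratio: float, cutoff_sd: float) -> float:
    """Ratio of boys to girls above a cutoff when the means are equal.

    sd_ratio  -- boys' standard deviation divided by girls' (hypothetical)
    cutoff_sd -- cutoff, in girls' standard deviations above the shared mean
    Assumes normal distributions and equally sized groups.
    """
    girls = NormalDist(mu=0.0, sigma=1.0)
    boys = NormalDist(mu=0.0, sigma=sd_ratio)
    girls_tail = 1.0 - girls.cdf(cutoff_sd)  # share of girls above the cutoff
    boys_tail = 1.0 - boys.cdf(cutoff_sd)    # share of boys above the cutoff
    return boys_tail / girls_tail


if __name__ == "__main__":
    # Hypothetical: boys' standard deviation is 10 percent larger than girls'.
    for cutoff in (1.0, 2.0, 3.0):
        print(cutoff, round(tail_ratio(1.1, cutoff), 2))
```

Under these assumptions the ratio is exactly 1 when the standard deviations match and grows as the cutoff moves further into the tail, which is why a modest difference in spread can coexist with identical averages and still produce lopsided counts among the very top scorers. The calculation says nothing about where a difference in spread would come from.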
But does the new evidence Tierney cites really support the notion of a biological basis for differences in male and female performance? It consists of a new study appearing in the journal Intelligence, which finds that top performance on standardized math tests is dominated by boys, in this case, talented seventh graders who take the SAT. It is fair to ask whether such evidence is actually biological. For example, was an actual mechanism identified? Was a controlled experiment or observational study conducted, in which social mechanisms were held constant or rendered irrelevant? These are the kinds of evidence that are traditionally viewed as necessary to confirm the existence of a biological agent acting on human behavior. Conversely, as we discuss below, there are severe limitations on any effort to assign a biological cause solely on the basis of differences in standardized test scores.
The new research in question explores data, collected over a 30-year period, on young adolescents who display mathematical talent on standardized tests. It finds that the ratio of boys to girls among the top-scoring students went from 13 to 1 in the early 1980s to just under 4 to 1 in the early 1990s, where it has since hovered for almost 20 years. But the study provides no additional evidence to justify an assignment of biological cause, except for the fact that there hasn’t been a significant improvement in the past 20 years. It is by no means clear that, after 12 to 13 years of exposure to social forces, these adolescents’ scores would necessarily be expressing something innate rather than something cultivated through social influences.
Does Scoring at the Top of the Top Matter?
Tierney also points to “other research” showing that top scores on the SAT predict academic success, even within the group of top performers, based on a 2005 paper published in the Journal of Educational Psychology. He summarizes this research as showing that, “Even when you consider only members of an elite group like the top percentile of the seventh graders on the SAT math test, someone at the 99.9 level is more likely than someone at the 99.1 level to get a doctorate in science or to win tenure at a top university.” This finding is used to support the argument that SAT scores are a valuable surrogate marker for understanding biological differences between boys and girls. In fact, the findings are more equivocal on this point.
This paper asks whether kids who score in the top quarter of the top 1 percent on the SAT at age 13 fare better (measured in terms of PhDs, income, patents, and tenure at a top-50 U.S. university) than those who score in the bottom quarter of the top 1 percent. The paper actually shows that this is the case for men, but not necessarily for women. Further, the paper highlights the difficulties of such a study: the outcome (academic success, for example) is not independent of the measurement (the SAT scores). In other words, the result (which does not exist for women in any case) may reflect the societal value of the SATs rather than the underlying intelligence reflected in the SATs.
First, to the biological question: the fact that these test scores -- even the small distinction of being in the top .1 percent compared to the top .9 percent -- also correlate with academic success and accomplishment may reflect the way society values test scores. If society (in the form of parents, educators, employers etc.) believes that this difference in score has meaning, then the difference will have an impact on which colleges and graduate schools the kids attend, which postdoctoral opportunities they have, and which jobs they eventually land. Any ultimate differential in success may not reflect an innate difference that independently leads to academic success as much as the fact that the score itself leads to such success. In other words, we cannot use SAT as a surrogate for what leads to academic success if access to academic success (such as college admittance) is based in part on SAT scores.
While the findings do suggest that the SAT has some predictive ability in this category, they also illustrate the limits of such a study. Among girls identified in the 1972-1974 talent search, there were no PhDs subsequently earned in math or science (out of 151 women) among those in the bottom quarter of this top group, and there were two (out of 130 women) in the top quartile. One can interpret this to mean that there were huge advances for women who scored at the top end of the top group -- or one might dismiss the numbers as being too small to say much of anything. By the time the second cohort of kids entered the picture, the situation had changed a little, but the benefit of being in the top of the top group went away for women. In the 1976-1979 cohort of identified “talented” girls, there were two out of fifty who got PhDs in math or science in the bottom quartile, and the same number in the top quartile.
With two PhDs obtained in each group, what can actually be said about the predictive power of the SAT at age 13 to determine the ability of women to get tenure at a top-50 university? Obviously, this is a small pool of subjects, and the question arises as to how many of these women even wanted an academic job. The outcome was that one of the women in the top quartile of the top one percent did ‘make it’ to a top-50 tenured position, while the other three in the top 1 percent did not. (The article does not report on how many tried to enter academia.) In the earlier cohort, neither of the two women in the top quartile received tenure at a top-ranked university at the time of the survey.
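To see how little the first cohort's PhD counts can support, one can run a standard small-sample test on the figures quoted above (0 of 151 women in the bottom quartile versus 2 of 130 in the top quartile). The sketch below uses a two-sided Fisher exact test; that choice of test is ours, not the paper's, and the function name is only illustrative. The second cohort is left out because the size of its top-quartile group is not given here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x successes in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every possible table that is no more likely than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# 1972-1974 cohort, math or science PhDs among women:
# bottom quartile 0 of 151, top quartile 2 of 130.
p_value = fisher_exact_two_sided(0, 151, 2, 128)
print(round(p_value, 3))
```

The p-value comes out at roughly 0.21, so a gap of this size would not be unusual if the quartile made no difference at all. That is consistent with dismissing the numbers as too small to say much of anything.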
For men, these numbers were very different: the overall number identified as mathematically talented on the basis of SAT scores was higher, as was the number of PhDs in math or science, and more PhDs were conferred on those in the top quartile than on those in the bottom quartile. Reflecting the time-frame for obtaining tenure at a top-50 university, there were relatively few men -- 17 in all -- who had received tenure at a top-ranked university. Few as they were, however, their number was still 17 times that of the women.
It should also be mentioned that this survey of the career trajectory of talented young men and women had some built-in bias that reduced the number of “successful” outcomes, possibly more for women than for men. The survey was based on the accomplishments of these individuals by age 33. Yet most academics who gain tenure at a top university do not obtain it by age 33. A PhD in science typically takes five to seven years of post-undergraduate education, bringing a person to his or her late 20s before beginning a postdoctoral fellowship. Three years as a postdoc is standard fare for consideration by a top-ranked university for hire, and it takes typically another six years for tenure to be conferred. Thus, most successful applicants do not attain this status until they are at least in their mid-thirties. Of course, any time off due to having children would delay that age further.
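The timeline arithmetic can be made explicit. The sketch below adds up the durations given above; the starting age of 22 at the end of undergraduate study is our assumption, not a figure from the survey, and the function name is illustrative.

```python
def tenure_age_range(
    bachelor_age: int = 22,                 # assumed age at end of undergraduate study
    phd_years: tuple[int, int] = (5, 7),    # typical length of a science PhD
    postdoc_years: int = 3,                 # standard postdoctoral stint
    tenure_track_years: int = 6,            # typical time from hire to tenure
) -> tuple[int, int]:
    """Return the (earliest, latest) typical age at which tenure is conferred."""
    fixed = bachelor_age + postdoc_years + tenure_track_years
    return fixed + phd_years[0], fixed + phd_years[1]


SURVEY_AGE = 33  # age at which the survey recorded accomplishments

earliest, latest = tenure_age_range()
print(earliest, latest, earliest > SURVEY_AGE)
```

With these inputs the typical range is 36 to 38, several years past the survey's cutoff of 33, before allowing for any career breaks.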
To summarize, the claim that the top scores on SATs are predictive of success has little basis for women. The success that it does predict (in men) may reflect social norms (such as using standardized scores to allow people opportunities that lead to other opportunities). The study itself has nothing to say about biology per se.
The Robustness of Test Scores: Does It Suggest Biology?
As Tierney notes, the strongest argument against the idea that high-end performance on the SAT is itself biologically founded is the fact that the ratio of men to women in the “tail” of the statistical curve has changed over the years. It was 13 boys to each girl in 1983 but dropped to 4 to 1 in 1991, where it has since hovered for two decades. Tierney concludes, “Since [then], however, the math gender gap hasn’t narrowed, despite the continuing programs to encourage girls.” Actually, this is not quite true. There is evidence on other tests, referenced below, that the ratio of top performers is not constant. (And, of course, there is more room for improvement when there is more disparity.)
If a difference is not robust over many years and many different cultural settings, then it is typically dismissed from consideration for having a biological basis. There are many instances in which this trend is bucked, which means either that social forces can counteract the purported biological strength of boys in math, or that the differences are less biologically driven than claimed. In reference to data from the Minnesota state assessments, and from the Program for International Student Assessment (PISA) exam, professors Janet S. Hyde and Janet E. Mertz noted:
“At the 99th percentile, the M:F ratio was 2.06, again close to theoretical prediction. However, the M:F ratio was only 0.91 for Asian-Americans, that is, more girls than boys scored above the 99th percentile. Analysis of data from 15-year-old students participating in the 2003 Program for International Student Assessment (PISA) likewise indicated that as many, if not more girls than boys scored above the 99th percentile in Iceland, Thailand, and the United Kingdom.”
Without a consistent difference resistant to change over time and over cultural differences, biological arguments are weak at best. Other studies examining very young children -- not discussed in the New York Times piece -- found no differences between boys and girls in numerical representations and reasoning, geometrical reasoning, the construction of natural numbers, and map reading, even in the top performing groups of kids. While age and “male hormonal” exposure might lead to pronounced (biological) differences in mathematical ability, there is little evidence of gender-based biological differences among children young enough to be relatively free of social conditioning. This does not dismiss the possibility of biological differences leading to ability, but it does point to the dearth of hard evidence to support this conclusion.
We have to be wary about what data can say, and what it cannot. There are differences among many groups in performance on the SATs, including between wealthy and poor kids, kids whose parents went to college versus those whose parents did not, and African American and Hispanic kids versus white and Asian kids. Such differences have also persisted over decades of attempts to close the gap, and many studies document the dearth of kids in the less privileged groups who scored in the top one percent on the math portion of the SAT; in the 2005 SATs, for example, whites were nine times as likely to score in this top group compared to blacks (based on a percentage of each population). Yet, it would be fallacious to argue based on the test scores that the differences are biological.
Research that speaks more directly to the possible biological differences has not found consistent differences. For example, there are natural, consistent stages at which babies and young children develop mathematical notions of counting, one-to-one correspondence, volume and quantity, abstract notions of “one more,” and representations of large numbers. There is no evidence of the development favoring one sex or the other. According to Elizabeth Spelke of Harvard University, an expert in the biological basis of mathematical reasoning in humans, these developments are founded in human biology, and they apply equally to boys and girls.
The Social Arguments
In addition to the flaws of making biological conclusions based on SAT data, there is widespread evidence that social factors do have an impact on performance in mathematics. A good example of this is that parents of young children are more likely to say their sons, as opposed to their daughters, are talented at science or good at math. These parental perceptions have an impact on how parents, caretakers, and other family members react to a young child.
Another social phenomenon consists of “stereotype threat,” which occurs when people perform worse after their gender or race is called to their attention. For example, girls perform worse when they are asked to state their gender before rather than after taking a test. A 2009 article published in Psychological Science found that, for both women and African Americans, test scores on standardized tests such as the SAT consistently go up with reduced psychological threat (while those for white males do not).
There remains substantial evidence that women are actually discriminated against in real terms. A 1999 MIT investigation of its own faculty found “differences in salary, space, awards, resources, and response to outside offers between men and women faculty with women receiving less despite professional accomplishments equal to those of their male colleagues. An important finding was that this pattern repeats itself in successive generations of women faculty. The Committee found that, as of 1994, the percent of women faculty in the School of Science (8%) had not changed significantly for at least 10 and probably 20 years.” More recently and more systematically, the National Academy of Sciences finished a report in 2006 documenting systemic bias that hurt women’s chances of success in a scientific career. The culprit is not, for the most part, explicit sexist actions, but unconscious bias and arbitrary and subjective evaluative processes. The good news consists of pockets of gender parity, but gender inequality is not only a reflection of self-perception.
In addition, the rapid intellectual pace of work in highly mathematical scientific fields may weed out women who take a break for mothering, a thesis supported by Ceci and Williams in their book, The Mathematics of Sex: How Biology and Society Conspire to Limit Talented Women and Girls. All these social factors might affect how competitive a career someone chooses and how they perform in tests, which in turn affects their chances to be admitted to elite colleges and graduate schools.
With regard to academic freedom, Tierney is right to suggest that this subject should be open to discussion and decided on the basis of evidence rather than emotion. I have argued that the evidence he cites does not warrant the conclusion that he reaches. But we cannot address the question of why gender inequality exists while ignoring the documented evidence of past and present discrimination against women in the sciences, and the equally well-documented evidence that societal conditions and self-fulfilling expectations add to the difficulties faced by the talented girls and young women who strive for these careers. Along these lines, Elizabeth Spelke attributes the dearth of women in science to “biased perceptions by fellow scientists, unequal opportunity, biased perceptions earlier in life, and gender gap in facilities leading women to view science as a man’s world.”