How bad is American health care?
Nirit Weiss MD, MBA, December 2, 2008
The Commonwealth Fund’s National Scorecard has been hailed in the media for diagnosing America’s health care woes and for offering a roadmap to recovery. But now that reform is a political priority, it’s time to ask, does the Scorecard do what it claims it does, or is it in need of urgent reform itself?

In 2005, the Commonwealth Fund established a commission to help, as it put it, “move the U.S. toward a higher performing health care system that achieves better access, improved quality, and greater efficiency.” The commission’s first task was to develop a way of measuring health care performance – to set benchmarks that could be evaluated from year-to-year and thus provide a way of tracking improvement or decline. The commission said its scorecard was the most comprehensive attempt to rate health care across so many areas; and so, when the first results were published in 2006, the conclusions – that American health care was falling far short of what might be achieved, and what other countries were achieving – received widespread coverage in the media.

On July 17, 2008, the Commonwealth Fund commission published its second Scorecard report, Why Not the Best? Results from the National Scorecard on U.S. Health System Performance, 2008. Its findings, that U.S. health care performance had not improved since 2006 and that access to health care had significantly declined, were again reported widely in the media, and in a way that lent credence to the Scorecard’s conclusions (see sidebar). As the New York Times noted, “The findings are likely to provide supporting evidence for the political notion that the nation’s health care system needs to be fixed.”

"This is a real wake-up call," Paul Ginsburg, president of the Center for Studying Health System Change, a nonpartisan research group in Washington, told Dow Jones MarketWatch. "It's really telling us that because our delivery system is so fragmented [and] disorganized with the wrong payment incentives that our country is really suffering from that."

Summary of 2008 Scorecard findings
The 2008 Scorecard’s conclusions can be summarized as follows:

  • The U.S. falls short in performance in 37 indicators across five dimensions of health system performance.
  • National and collaborative efforts to measure and report performance have led to improved health care delivery.
  • The U.S. health-care system provides a poor return-on-investment given its high costs.
  • Eliminating inefficiencies would drastically improve quality of health care, at a small fraction of the current cost.
  • Improved primary care delivery would lead to better outcomes and lower costs.
  • Many countries already provide significantly higher quality of care at significantly lower cost.
  • A universal, single-payer insurance system would provide higher quality care at reduced cost.

However, the 2008 Scorecard must be interpreted with caution. In attempting to diagnose the ills of America’s healthcare system, the Scorecard suffers from serious flaws that challenge the validity of its conclusions – flaws that were, essentially, ignored by the authors of the study and completely missed by the media coverage. These flaws fall into three categories:

  1. The methodology by which the data were collected and the studies were designed to address specific questions.
  2. Arbitrary definitions and metrics used to define the concept of “quality” in health care.
  3. Sweeping, broad conclusions that are unsubstantiated by the findings of the study.

1. Flawed methodology
The 2008 Scorecard reviewed and synthesized numerous individual studies in order to compare U.S. and international performance on 37 “indicators” across five “dimensions” of health system performance. U.S. performance was compared with a “benchmark” performance for each indicator, which was based upon scores achieved in any of the following: the top 10 percent of U.S. states, or regions, or hospitals, or health plans, or other providers, or the international community. In a number of calculations, “benchmark” performance was simply based upon “logical policy goals, such as 100 percent of the population to be adequately insured.”

Unfortunately, the Scorecard is based upon multiple disparate studies, using various methodologies, non-uniform definitions of “benchmark,” and arbitrary assumptions as to what “logical policy goals” are, and what “adequately insured” actually means. The Scorecard attempts to draw meaningful conclusions based on a summation of individual studies with varying sample sizes, varying performance comparisons, and varying data collection techniques.

In peer-reviewed, scientific literature, it is invalid to lump together the results of multiple studies, using multiple methodologies, in the same charts, graphs, and conclusions, without assigning relative weight to the results of the studies. Adding even more to the confusion, many of the reported data are not directly referenced to published studies, so it is impossible to trace and evaluate the sources of the information. A substantial number of the individual analyses were merely described as “conducted by the authors,” limiting the reader’s ability to evaluate the quality and validity of the studies.

Perhaps more troubling are those data which can be traced back to their source studies, and turn out to be based on patient self-reporting. For example, in its section on “quality,” the Scorecard quantified mistakes made in health care delivery according to patient interpretation and self-reporting! Patients were asked how often they felt mistakes were made in their medical care, such as errors in laboratory testing, or medication errors.

The problem with this approach is clear: identifying mistakes in health care requires a great deal of medical training, insight, and experience. Even among highly trained specialists, what constitutes a medical error is often hotly debated, as multiple studies have made clear, including the Institute of Medicine’s own 1999 report To Err Is Human. Patient perception of medical error may or may not prove to be a valid metric for quantifying patient satisfaction with health care; however, it simply is not a measure of true medical error, and would never be accepted as such for publication in a peer-reviewed scientific or medical journal. Surely any nationally distributed report striving to influence policy should be held to the same standard.

In addition to medical error, the Scorecard findings on activity limitations due to health problems, “dissatisfaction” with the health care system, ease of access to after-hours care, and unnecessary repetition of medical tests, were also quantified based on patient self-reporting. Again, this is a highly problematic metric, because of the questionable ability of a non-medically-trained patient to determine what is appropriate medical testing, and because the scoring of these self-reported measures depends on the relationship between the reality of health care delivery and the expectations of the American healthcare consumer.

Americans’ expectations of their healthcare system, and of the state of medicine as it exists in 2008, differ from the expectations of those in other nations, and are often not achievable at present even with the highest quality of delivery. These discrepancies bias any data drawn from studies based on patient self-reporting, with the potential for U.S. patients to report lower satisfaction rates despite higher quality of care.

2. Arbitrary metrics used to define “quality”
The entire premise of the Scorecard is that “quality” in the U.S. healthcare system can be quantified by assessing mean scores with respect to 37 “indicators” across five “dimensions” of health system performance. The flawed methodology used to obtain these scores has been noted; however, even with significant methodological improvements, the results of these studies could only be interpreted and applied in limited ways.

The 37 “indicators” across five “dimensions” are modeled after those used in studies of industry, and focus on health care delivery systems performance, which is indeed one component of value and return-on-investment.

But it is not the largest determinant of what most Americans would define as “quality.” Assessing the quality of delivery of goods by studying uniformity, for example, is appropriate when evaluating the transformation of undifferentiated inputs into uniform outputs, each machined to be identical to the other.

In other words, the conclusions of the study are dependent on the authors’ assumption that all patients with a given diagnosis, say diabetes, are otherwise identical, and should have no difference in outcomes. This input/output calculation disregards the fact that all inputs, such as patients with diabetes, have other comorbidities, and cannot be expected to have the same outcome or outputs.

In the practice of medicine, one deals with people and disease processes that are unique on a case-by-case basis, and outcomes are not expected to be identical between patients. Patient outcomes depend in large part upon the population of individual patients being treated, and there is tremendous variability in the degree of responsibility patients are willing to assume or are capable of bearing.

Another problem with the metrics under evaluation is that they are taken to be uniformly applicable throughout the entire health care system. Clearly, primary care and the treatment of related chronic diseases affect the largest number of U.S. patients, consume most of our health care resources, and should be optimized; however, there are a large number of highly specialized, costly, life-altering treatments and other subspecialty healthcare interventions which should not simply be ignored. Americans expect access to the latest surgical techniques, the most effective medications, and cutting-edge technology, without government restriction or attempts at rationing. This represents a huge cultural difference between patients in the United States and other nations. While uniformity of health care delivery has some value in being studied, additional metrics need to be designed and validated for use in quantifying these aspects of the U.S. healthcare system.

3. Unsubstantiated conclusions
The authors of the Scorecard present their results in raw form, and then apply these results to reach broad conclusions which are several steps removed from the data actually being presented; indeed, the data presented often do not directly support or relate to some of the Scorecard’s conclusions. Because there are so many such examples, it is reasonable to summarize them. In general, the authors conclude that universal health insurance in the U.S. would improve quality at substantial cost savings, that increased primary care and preventative care always improve outcomes at reduced cost, and that “preventable” hospitalizations and outcomes are preventable entirely by policy changes in health care delivery, in the absence of a drastic culture shift in this country. These conclusions may have some validity, but they are not directly supported by the data presented. In fact, these are some of the issues being actively investigated in the medical and scientific literature at this time.

The authors of the 2008 Scorecard emphasize the point that what receives attention, and what gets measured and reported, gets improved. That is likely true, as hospitals and physicians scramble to meet compliance requirements with ever-changing guidelines. Numeric targets are set, and vast resources are dedicated to reaching “benchmark” goals. The result is that the follow-up measurements of these metrics likely do gradually improve. However, achieving the quantified, targeted scores does not necessarily mean that quality has improved.

This places the greatest burden upon those who study the U.S. healthcare system to define and select metrics which truly reflect the values shared by the American public, and which have the greatest impact on “quality” of care. Indeed, many physicians perceive that the quality of care they deliver is decreasing as a result of recently imposed regulations, some of which have been structured in response to reports of questionable validity. As a result, physicians are reluctantly leaving the practice of medicine in record numbers, leading to regional shortages and decreased patient access to high-quality medical care. Given the potentially far-reaching implications of publications such as the 2008 Scorecard on policy-making and financing of healthcare, the authors and sponsors of these studies must be held to the same high standards as are physicians and scientists when reporting results of their investigations. This responsibility must be shared by the media, who control the dissemination of information, and must present the results of such studies in as objective and informative a manner as their audience deserves.

Nirit Weiss is a neurosurgeon practicing in New York City. She also has a Master of Business Administration and specializes in healthcare economics.

How the media covered the 2008 Scorecard
Trevor Butterworth

The Commonwealth Fund's National Scorecard for 2008 entered the realm of public debate through stenography rather than reporting. Journalists failed to pose even the most rudimentary questions about the methodological grounding for the report's sweeping conclusions, suggesting that when it comes to American health care, reporters and editors suffer from a bad case of conventional wisdom (our system is terrible) compounded by confirmation bias (an expert study confirms what they assume to be true, therefore the report is accurate).

As Reed Abelson reported in the New York Times, under the headline “While the U.S. Spends Heavily on Health Care, a Study Faults the Quality”:

"American medical care may be the most expensive in the world, but that does not mean it is worth every penny. A study to be released Thursday highlights the stark contrast between what the United States spends on its health system and the quality of care it delivers, especially when compared with many other industrialized nations.

The report, the second national scorecard from this influential health policy research group, shows that the United States spends more than twice as much on each person for health care as most other industrialized countries. But it has fallen to last place among those countries in preventing deaths through use of timely and effective medical care...

...The findings are likely to provide supporting evidence for the political notion that the nation’s health care system needs to be fixed."

The Times also quoted Helen Darling, president of the National Business Group on Health, which represents big employers that provide medical benefits to their workers, as saying the report

“documents that it’s been as bad as we have been thinking it is... It proves once again if you have quantitative information and metrics and make people pay attention, they change.”

The International Herald Tribune headlined Abelson's article “Deep flaws found in U.S. health care.”

Both the Washington Post (“U.S. Health Care Still Ill, Survey Finds”) and Forbes (same headline) ran an article by HealthDay News, which simply reported the Scorecard's conclusions and cited Karen Davis, the Commonwealth Fund's president, and Cathy Schoen, the Commonwealth Fund's senior vice president for research and evaluation, to explain the significance:

“The United States also lags behind other countries in health-care results, Schoen said. ‘Even where the U.S. average improved, other countries have improved much more rapidly,’ she said. ‘As a result, we are falling further behind the leaders.’”

No other expert opinion was offered in either article, but Post readers were invited to visit the Commonwealth Fund's website “To learn more about health care in the United States.”

Similarly, Medical News Today provided readers with only a summary of the Scorecard's conclusions.

FOX Business Channel along with Reuters were the only sources of any criticism of the report's findings, but only in the form of a rewritten press release from the Hudson Institute, in which Betsy McCaughey, Ph.D., criticized the Commonwealth Fund for conclusions that were "political, not scientific." She continued:

"Astoundingly, the Commonwealth report gives the U.S. poor marks for 'capacity to innovate and improve to achieve excellence.' But the report's definition of innovation has nothing to do with new cures and new treatments.  It is defined as emphasizing primary care."


