Policy makers seek to transform healthcare delivery at home and abroad by shifting payment from volume-based to quality-based methods. This study explores the design elements common to 13 pay-for-performance programs in 9 countries.
Policy makers at home and abroad have embraced pay-for-performance (P4P) as a way to improve the quality of patient care. There were over 250 P4P programs in the United States in 2007, some of which we have reviewed in prior posts. Yet because program designs are so heterogeneous, comparing individual programs is difficult.
This study sought to extend the body of information about P4P programs to an international scale. In it, 13 P4P programs from 9 countries (UK, Israel, Australia, Germany, Taiwan, New Zealand, Canada, Netherlands, and Argentina) were analyzed for program design characteristics and overall effectiveness in achieving outcomes.
Literature for this synthesis came not only from PubMed (for peer-reviewed articles) but also from internet sources such as Google and Google Scholar. Included documents were written in English, Dutch, or German and described program features such as incentivized processes, types and numbers of measures, methods to mitigate provider risk, monitoring and feedback efforts, financial incentives, and payment frequency.
Because empirical data were scarce, descriptive information on program design served as the basis for the author's observations. Of the 13 P4P programs studied, 7 were regional and 6 were national. Eight of the 13 were run by public purchasers.
Typical U.S. P4P programs report on 10 to 25 measures, whereas most international programs required reporting on far more; programs in the Netherlands and the UK required 80 and 134 separate measures, respectively.
Most programs had some way of mitigating risk to providers, either through risk adjustment or by using composite scores.
Every program except Canada's Primary Care Renewal Model, which measures only individual clinicians, used group-level measures. These may reflect hospital-level performance but may also cover physician groups. Most of the programs reviewed were voluntary; however, some in Israel, Australia, and Argentina were mandatory.
Incentives varied across programs, but many took the form of bonus payments added on to traditional fees. Bonuses ranged from 4 percent to 30 percent. In the U.S., P4P payments averaged 7 percent for physicians and 2.5 percent for hospitals. Payments were usually made annually, though some programs paid semi-annually or quarterly, and Taiwan's National Health Insurance P4P program paid monthly.
Although the author notes that negative incentives tend to drive behavior more strongly than positive incentives of the same size, only two programs adopted financial penalties, i.e., taking from the lowest performers to reward the best performers.
International pay-for-performance programs shared many design features with U.S. programs. These include a focus on clinical quality, targeting physician groups in primary care, engaging providers in measure development, and paying on an annual basis. International programs tended to focus less on relative performance than U.S. programs do.
Five important lessons arose from the observations made in this study. First, program design is critical to avoid undesired behavior such as teaching to the test and risk selection. Second, P4P must rely on "timely, reliable, and comprehensive performance data." Third, incentives must be aligned across the continuum of care. Fourth, P4P programs require more formal evaluation. Lastly, provider input is essential to the process.
The author of this paper undertook an enormous task: cataloguing and classifying the spectrum of P4P programs across international health systems. Given that the United States alone had over 250 P4P programs just five years ago, an undertaking on this scale seems daunting.
The results described above demonstrate a lack of rigorous, objective analysis and evaluation on the part of P4P program planners. As such, the five lessons provided by the author reflect nothing more than theory. Evidence on optimal P4P design remains elusive.
Nonetheless, based on provider reactions to America's most widespread P4P effort, the CMS Hospital Core Measures, I agree with the author's assumptions and assessments.
Program design is fundamental to preventing risk selection and "teaching to the test." Collected data must not only be accurate but also be seen as valid by providers in order to secure their buy-in. Without provider acceptance, quality measures may infuriate rather than motivate.
The P4P movement represents an ever-changing landscape where theory and clinical practice will continue to collide.
Cedric Dark, MD, MPH