A Primer on Alternative Study Designs for Evidence-based Practice: Harnessing Natural Variation for Effectiveness Research
This Primer is based on presentations during a conference titled: Alternative Study Designs for Evidence-based Practice: Harnessing Natural Variation for Effectiveness Research
Principal Investigator: Peter I. Buerhaus, PhD, RN, FAAN
Team Members: Susan Horn, PhD; Brenda Cornett; Jennifer Smith; Roberta James, MStat
Organization: Vanderbilt University School of Nursing
Inclusive Dates of Project: October 20-21, 2005
Federal Project Officer: Milford Henderson
Agency Sponsors:
Dept of Health & Human Services
Agency for Healthcare Research & Quality (AHRQ)
National Center for Medical Rehabilitation Research (NCMRR)
National Institutes of Health
Interagency Committee on Disability Research (ICDR)
National Institute of Child Health & Human Development (NICHD)
Pharmaceutical Research and Manufacturers of America (PhRMA)
Vanderbilt University School of Nursing
Institute for Clinical Outcomes Research, Salt Lake City, Utah
Award #: 1 R13 HS015954-01
Abstract of Conference
Purpose: To discuss and refine study designs that are alternatives to randomized controlled trials (RCTs) for effectiveness research and clinical decision-making; and to expand infrastructure for conducting clinical research within the healthcare delivery system by increasing knowledge about rigorous alternative designs among researchers and policymakers.
Scope: Alternative study designs to determine comparative effectiveness of treatments for clinical decision-making using analyses of existing administrative databases, MDS data from long-term care settings, registries, etc.; quasi-experimental designs, before-after or interrupted time series designs, longitudinal designs and cross-sectional designs; and Clinical Practice Improvement (CPI) study designs.
Methods: Conference format over 1.5 days held in Washington, D.C., October 20-21, 2005. Combination of plenary presentations by invited experts, workgroup discussions, and workgroup reports of key issues and recommendations. Participants (n = 96) included health services researchers, behavioral researchers, study design experts, clinicians, institutions, foundations, voluntary associations, health plans, journal editors, and policymakers in Federal, State, and local governments.
The following is a list of conference speakers and workgroup leaders with their organizational affiliations:
| NAME | SPEAKER / WORKGROUP LEADER | TOPIC | ORGANIZATIONAL AFFILIATION |
|---|---|---|---|
| Peter Buerhaus, PhD, RN | Speaker | Welcome and Introduction | Vanderbilt University School of Nursing |
| Susan D. Horn, PhD | Speaker | Clinical Practice Improvement (CPI) study design | Institute for Clinical Outcomes Research |
| Carolyn Clancy, MD | Speaker | Greeting and funder perspective | Director, Agency for Healthcare Research and Quality |
| Steven Tingus, MS, C.Phil | Speaker | Greeting and funder perspective | Director, National Institute on Disability & Rehabilitation Research |
| Michael Weinrich, MD | Speaker | Greeting and funder perspective | Director, National Center for Medical Rehabilitation Research |
| Gerben DeJong, PhD | Speaker, Moderator & Workgroup Leader | Introduction; Moderator of CPI workgroup | National Rehabilitation Hospital |
| Kelly Cronin, MPH | Speaker | Greeting and payer perspective | Centers for Medicare & Medicaid Services |
| Scott Gottlieb, MD | Speaker | Greeting and FDA perspective | Food & Drug Administration |
| Andrew Kramer, MD | Speaker | Administrative Databases | University of Colorado |
| Sharon-Lise Normand, PhD | Speaker | Quasi-experimental designs | Harvard Medical School |
| David Helms, PhD | Speaker | Health services research perspective | AcademyHealth |
| Robert Rhodes, MD | Speaker & Workgroup Leader | Surgery Board perspective | American Board of Surgery |
| Marcel Dijkers, PhD | Workgroup Leader | Moderator of Administrative Databases workgroup | Mount Sinai School of Medicine, New York |
| Ruth Brannon, MSPH, MA | Workgroup Leader | Moderator of Quasi-Experimental Designs workgroup | National Institute on Disability & Rehabilitation Research |
| Arthur Hartz, MD, PhD | Workgroup Leader | Moderator of Quasi-Experimental Designs workgroup | University of Iowa |
| Nancy Bergstrom, PhD, RN | Workgroup Leader | Moderator of CPI workgroup | University of Texas at Houston |
A Primer on Alternative Study Designs for Evidence-based Practice: Harnessing Natural Variation for Effectiveness Research
In a recent John Eisenberg Lecture, Don Berwick called for a broader health services research agenda and the development and application of new research methods to support this agenda. “The challenge is to discover what we need to know that we do not now know in order to create much more effective systems of care.”1 He argues, “Health services research has not yet been sufficiently helpful in meeting the challenge of improving care in part because it has over-constrained both its methods and its favorite topics. The cost of insisting on formal, classical, summative, evaluative experimental designs [randomized controlled trials (RCTs)] in an uncertain, poorly understood, nonlinear, system is, unfortunately, to maintain the status quo….Health services research should become more effectively part of the solution. To do that will require that we enrich our portfolio of methods and broaden our agenda of inquiry. The scientific methods that we need to enhance and dignify in academic settings will combine formal classical methods with some pragmatic, immediate, and in many ways more informative forms of learning and investigation.” 1
We need alternative study designs that produce pragmatic, practice-based evidence that is useful and acceptable for practice and policy purposes. A confluence of factors is driving the need for better evidence to improve clinical practice: the demand for better knowledge about effective practice, together with concern for costs and cost-effectiveness (value), quality, patient safety, and equity. Better evidence will come from research that is clinically and practically applicable and generalizable. The traditional emphasis on internally valid research ignores the requisites of sound generalization, external validity, effectiveness, and utility in practice. Evidence-based medicine is “the integration of best research evidence with clinical expertise and patient values.” 1 In order to improve patient outcomes and population health, we must move beyond generalizations based on belief and use new methodologies that are appropriate to clinical practice improvement.
Performing the “best research” requires identifying the best research methodology to answer a given question. Evidence does not always flow from the laboratory to clinical practice; it can also be discovered in the study of clinical practice. This Primer addresses the issue of “best research evidence” and how to integrate it with clinical practice and values.
As researchers, we have failed to embrace adequately the clinical experience that exists among front-line clinicians. There is a need to overcome gaps in dissemination that prevent translation of knowledge gained in research to practicing clinicians. Many research methodologies involve clinicians only at the periphery. Research designs that incorporate clinicians’ practical knowledge throughout every step of the process may increase our ability to transform research findings into practice.
Perfect evidence is an illusion―a useful motivating belief, but not the criterion for practice decisions. What we need is good, relevant, reliable information on what most probably is effective, safe, and worthwhile. An insistence on perfect evidence has led to an absence of good evidence that can be used to guide actual practice and policy.
There have been many methodological advances in experimentation, research design, and statistical modeling in the last 30 years. It is time for the health services community to make better use of these improved designs. Sophisticated alternative research designs have been developed that are as rigorous as RCTs and as able to demonstrate causality. Other study designs are weaker, but in certain situations can provide the best possible evidence given practical and ethical considerations. These powerful research designs often go unused due to a lack of understanding and agreement on what constitutes a strong alternative research design and on the circumstances and problems for which each is best suited.
The purpose of this Primer is to provide an overview of new developments in alternative, pragmatic, practice-based evidence research designs, including non-experimental research methods, and to elucidate those methods that are relevant to various tasks as well as limitations of some conventional approaches, including many RCTs. Specifically, we will 1) describe continuing developments in strong, convincing, quasi-experimental research designs; 2) describe improvements in correlational research designs that allow stronger causal inference; and 3) distinguish these designs from poorly controlled observational comparisons (which can be the best research option in some circumstances). The information presented here is intended for researchers, hospital and physician practice groups, grant reviewers, policy makers, funding organizations, and journal editors.
This Primer discusses four types of research designs, which are not mutually exclusive. The first type is the randomized controlled trial, the gold standard of medical research. The second focuses on study designs that use administrative databases. These designs take advantage of large, existing administrative databases such as the Medicare Provider Analysis and Review (MEDPAR), the Healthcare Cost and Utilization Project (HCUP), the Minimum Data Set (MDS) for nursing homes, and many others. These databases have been used to examine specific treatment methodologies, health services, and provider characteristics. At the conference, Dr. Andrew Kramer from the University of Colorado drew upon his years of research experience to address these designs.
The third type of research that we address is quasi-experimental designs, a broad category of research designs that includes before-and-after designs, longitudinal designs, interrupted time series, and systematic treatment designs. At the conference, Dr. Sharon-Lise Normand from Harvard University led this discussion.
Our fourth research design is practice-based evidence for clinical practice improvement (PBE-CPI, or PBE for short in the rest of this Primer), which has been championed by Dr. Susan D. Horn. PBE examines three sets of factors and the interactions among them. The first is patient factors. Patient factors, such as case mix classifications and severity of illness measures, can be used to control for differences in populations. The second is process factors: treatments, interventions, and medications, as well as the entire process of care, including the management and payment strategies in place. PBE examines combinations of patient and process factors in order to identify their association with outcomes, which are the third set of factors. Outcomes include clinical outcomes, such as health status of the patient, as well as measures like cost, length of stay, and number of encounters. Utilization indicators can function as independent or dependent variables. PBE brings a new level of rigor to this body of research designs. At the conference, Dr. Horn described PBE study designs.
Some areas of health care research have adopted alternative research designs more quickly than others, even though RCTs cannot always accurately reflect real world situations. Service systems are complex and adaptive; they do not provide a solitary intervention, such as a pill or a new device. Examination of an intervention in the real world requires assessing the entire system within which the intervention is delivered.
The health services research field has made limited use of a comprehensive or trans-disciplinary approach, which brings the best of all disciplines together in an active, participatory way. Another aspect that is often overlooked in designing research studies is the clinical experience of front-line clinicians. Much can be gained from harvesting their valuable experience and making it an integral part of research, not only in the research design, but also in the research process itself. Involving front-line clinicians throughout the entire process facilitates clinical buy-in and knowledge transfer.
One limitation of many research designs is that the clinicians who participate in the study to some degree are not involved at every stage. Thus, by the time the study is completed, the clinicians may not believe the final result, which slows the implementation of research findings. We need to explore how to foster clinical buy-in that transforms the treatment community into effective advocates for research findings. There is a great gap between science and actual practice. This is due partially to our failure to engage our clinician colleagues in the entire research process, to encourage them to become advocates for the findings, and then to implement those findings.
Education is necessary but not sufficient for practice change to be implemented and sustained. Education is not merely a matter of knowledge transfer or educating clinicians through continuing education programs, but a matter of engaging clinicians throughout the entire research process.
The field of health services has been slow to adopt standardized documentation. We continue to use different kinds of instruments to evaluate patients. For example, in post-acute care there are many patient assessment instruments; lack of crosswalks between them inhibits comparative work.
The healthcare system and the policymakers and decision makers who drive it are finally coming to terms with chronic illness versus acute illness. Methods such as RCTs work for assessing new drug products or for assessing new interventions in care. For example, they can determine whether new intensive care units reduce mortality in people who have had an acute MI. Yet RCTs are unable to identify which interventions help people with disabilities lead more productive, independent lives.
To be useful, research has to be timely, convincing to providers, valid in practice (not only in controlled settings), and practical to implement. Until now, interventions often have been under-defined. We need to define specifically all the steps of the intervention and document them adequately. Many interventions have been recommended on the basis of reviews of sets of studies, but these reviews have not adequately specified the intervention, making it difficult to recommend implementation for practice.
In response to these challenges, Don Berwick has said, "We now have embedded in healthcare an extraordinarily powerful belief system, and a set of behaviors around clinical evaluation of science. This has taken us a long way from clinical practice guided by anecdote. Among other consequences, this revolution in applied methods placed the randomized controlled trial at the top rung of design as the best way to learn. But this commitment to sound evaluative science has also created a problem, namely that the journey we need to take now in seeking better systems of care will not yield to those methods alone. To crack the problem of health systems improvement, we are going to have to be interested as colleagues in science in other methods for learning, as we were previously engaged in the new classical methods. The formal methods of summative evaluation simply are not relevant when the hypotheses are many and vague, when alternative needs have evolved over time, when local knowledge is relevant and contains perhaps more transferable wisdom than bias, and when the confounders are not defects that spoil our learning but are themselves interesting and comprise the seeds of further progress. And when the effects sought are large enough, we ought not to have a hard time detecting the signal within the noise."1
RCTs are considered to be the gold standard for establishing the efficacy of drugs and other well-defined treatments and interventions.3 RCTs offer the most broadly applicable simple research design. However, the simplicity that makes this research design so appealing can lead to oversimplification of interventions and their effectiveness in real world applications. Finding ways to circumvent the limitations and shortcomings of the RCT, while maintaining a high level of internal validity, has led to some recent developments in alternative study designs. For this reason, it is important to understand both the strengths and weaknesses of the RCT.
RCTs began in the field of agriculture, where a few easily measured and controlled interventions and resulting outcomes could be investigated in hothouses. This type of research design is the most effective way to determine the efficacy of medications and well-defined interventions, as it controls for natural variation and singles out a small number of interventions for careful examination. RCTs establish causality by controlling for confounders and by allowing for close examination of dose-response relationships.
It is customary in designing RCTs to develop a data collection tool that must be completed for every study patient. Variables are defined precisely and providers and patients are paid to collect the data. Also, careful monitoring of data reliability is performed. RCTs are often very expensive.
RCTs are concerned with efficacy, i.e., with the question of whether a treatment works under ideal conditions. Efficacy is simplest to determine when using a homogeneous research population, but the requirement for homogeneity that allows the researcher to determine the impact of the intervention also limits the ability to generalize the outcomes to the general population. Thus, these studies have strong internal validity, but weak external validity.
In contrast, effectiveness research, such as PBE, is concerned with the question of whether a treatment works under usual conditions of care. Effectiveness studies seek to identify the natural variation in the population and determine how interventions affect different subgroups of patients. Heterogeneity of the population is seen as a strength in these studies, and a means of gaining a clearer understanding of the intervention. These studies attempt to examine interventions within the wider healthcare system, where care is actually delivered. Thus, internal validity is weakened, while external validity is strengthened. Since PBE studies are not randomized, outcomes may be influenced by treatment selection. However, statistical methods such as matching, propensity scoring, and covariate adjustment can be used to reduce selection bias.
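To make the idea of propensity scoring concrete, the following is a minimal sketch, not a method taken from the conference presentations. It assumes a single measured confounder (a hypothetical severity stratum), estimates the propensity score as the fraction of patients treated within each stratum, and reweights patients by the inverse of that probability. The records, variable names, and the built-in one-day treatment benefit are all invented for illustration.

```python
from collections import defaultdict

# Synthetic records: (severity stratum, treated flag, length of stay in days).
# Hypothetical data: sicker patients are treated more often and stay longer,
# and treatment shortens the stay by one day within each stratum.
records = (
    [("mild", 1, 4.0)] * 2 + [("mild", 0, 5.0)] * 8
    + [("severe", 1, 9.0)] * 8 + [("severe", 0, 10.0)] * 2
)


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def propensity_by_stratum(rows):
    """Propensity score: share of patients treated within each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for stratum, t, _ in rows:
        counts[stratum][0] += t
        counts[stratum][1] += 1
    return {s: n_treated / n for s, (n_treated, n) in counts.items()}


def ipw_difference(rows):
    """Inverse-probability-weighted difference in mean outcome."""
    p = propensity_by_stratum(rows)
    sums = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # arm -> [sum of w*y, sum of w]
    for stratum, t, y in rows:
        # Treated patients get weight 1/p, untreated patients 1/(1 - p).
        w = 1.0 / p[stratum] if t == 1 else 1.0 / (1.0 - p[stratum])
        sums[t][0] += w * y
        sums[t][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


naive = naive_difference(records)
adjusted = ipw_difference(records)
print(f"naive difference:    {naive:+.2f} days")
print(f"weighted difference: {adjusted:+.2f} days")
```

In this constructed example the naive comparison suggests that treated patients stay about two days longer, because they are sicker to begin with, while the weighted comparison recovers the one-day reduction built into the data. The adjustment only works here because severity is the sole confounder and is fully measured; unmeasured confounders would remain uncorrected.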
There are methods for adapting RCTs to maximize their clinical relevance. RCTs should be designed to study the treatment or intervention as it would be delivered in the clinical practice setting, using outcome measures that reflect the values of the persons involved and of society (such as cost-effectiveness). In addition, studies should be conducted on a representative sample of patients in order to improve the ability to generalize results to the wider public. While some RCTs can be adapted according to these principles, in other cases this is impossible and contextual issues (how the intervention would work within the multilayered healthcare system) remain unresolved. Also, when it takes a long time to conduct a clinical trial, one can be left in the end with a greater understanding of the efficacy of methods that are no longer in use.
There are many threats to the validity of research inference, and selection bias―the primary validity threat that the RCT controls for―is only one. Optimal research designs must consider all major threats to validity of inference, and clinically applicable research should be designed to address all issues of generalizability and utility in practice―not just selection bias. The emphasis on RCTs as the gold standard of research has led to an oversimplification of the definition of high-quality research. It is as if selection bias were the only important threat to validity of research. Past evidence-based literature syntheses have been oriented around the RCT, sometimes to the extent of ignoring weaknesses of the RCT, and without sensitivity to the fact that other designs can, in certain circumstances, provide better, or indeed, the only evidence.
Most reports concur that RCTs are needed to establish the efficacy of treatments. While that recommendation is clearly justifiable, the reality is that RCTs pose substantial ethical and design challenges for many clinical practice questions and may produce results with limited generalizability. In clinical practice, due to the wide variability in patient types and severity, the complex, dynamic, and interactive nature of treatments, and the increasing difficulty of controlling for confounding treatment factors as more treatments are introduced, the care environment is not very conducive to establishing the controlled experimental conditions necessary to conduct randomized trials. In recent years, the need for new research methodologies to supply necessary missing pieces of information to clinicians and health policy decision makers has become increasingly apparent, as RCTs alone have failed to fill existing knowledge gaps.1,3 In summary, RCTs are a very important study design methodology, but we need to consider alternative designs depending on the research questions asked.
The value of administrative databases has been well established for purposes of health planning, public health surveillance, examination of geographic variation, and the investigation of health disparities by socioeconomic status, race, and ethnicity. However, the value of administrative databases is far less well established for looking at practice-based evidence. First, we describe administrative data. Second, we discuss how they can be used to generate practice-based evidence. Third, we present their strengths and limitations. Finally, we talk about what we can do to improve these data to provide practice-based evidence.
Dr. Kramer defined administrative data as preexisting data collected for federal/state requirements, or any surveys or databases for a sample of patients or providers collected for general purposes. Registries fall into the category of administrative data, as do Medicare cost and utilization information, the nursing home MDS used for payment and quality purposes, and OASIS for home health care payment and quality. Administrative data on the non-institutionalized population include the National Ambulatory Medical Care Survey (NAMCS) and the National Health Interview Survey (NHIS).
How can administrative data be used to generate practice-based evidence? Typically, administrative data are used in observational studies. Sometimes they can be used in quasi-experimental studies to supplement primary data collection in order to reduce respondent burden. This is the case when information available in secondary data sources is wanted for the same sample of patients for which primary data are being collected, but one does not want to collect it from respondents directly. Examples of outcome or effectiveness measures that can be found in administrative data are mortality, hospitalization, and discharge disposition. Mortality is a reliable endpoint in most administrative data sources, but it is not 100 percent reliable. Data sources don't always agree, but one can usually triangulate and cross-check mortality against Social Security and other files, and verify mortality endpoints if multiple sources are used.
Hospitalization, and particularly diagnosis-specific hospitalization, is another useful endpoint. For example, Ambulatory Care Sensitive Conditions are used to look at quality of ambulatory care for conditions such as diabetes, COPD, and CHF. Hospitalization for these conditions is thought to be potentially avoidable, so rates of hospitalization are used as indicators of the quality of ambulatory care for these conditions.
A problem in most administrative databases is that data are collected at pre-specified times, so analyses are limited to those times. Surgical and medical complications based on ICD-9 codes during hospitalization are an example of data whose usefulness is affected by pre-specified collection times: these complications are recorded after discharge, but when they occurred during the hospitalization is not specified.
For practice-based evidence purposes, our administrative data analysis should be hypothesis driven. This means that we must define an effect variable separately from the other covariates that are being adjusted for. One should not put all the variables into a model and say “This is what we found.” One must be very clear about hypotheses up front. Examples of effect variables that we can examine using administrative data are surgical procedures, new surgical technologies, specific services, treatment settings, and specific types of treatments that have their own codes. We can also study frequency of different kinds of services, and examine availability of services in an area, which might be a good proxy for the extent to which services are used. We can look at payer issues, e.g., managed care versus fee for service, and study which is more effective. We can look at facility characteristics, such as the volume of services or teaching hospitals versus non-teaching hospitals. And we can look at individual provider characteristics such as training levels, staffing levels in facilities, and physician specialty.
For example, consider open appendectomy versus laparoscopic appendectomy.4 There is controversy about the indications for each surgery type. A study used data on 20 percent of all U.S. hospital discharges, which contained 43,000 appendectomy patients. Of this group, 17 percent were laparoscopic and 82 percent were open. Length of stay, complications, and mortality for appendectomy were examined, with an array of covariates, including perforation and abscess. The study found decreased length of stay, some decreased complications, and an increased rate of direct discharge for laparoscopic appendectomy patients. With stratification, some of the complication differences went away. Nevertheless, this is an example of what can be done with administrative data from acute care, and such an analysis has benefits over a single-site randomized trial.
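How a crude difference can shrink or vanish under stratification can be illustrated with a small sketch. The counts below are hypothetical and are not taken from the appendectomy study; they are constructed so that open surgery is used far more often in perforated cases, which carry a higher complication risk. The sketch compares the crude odds ratio with a Mantel-Haenszel odds ratio pooled across strata, one common way (assumed here, not stated in the source) to summarize a stratified comparison.

```python
# Hypothetical counts per stratum (NOT data from the appendectomy study):
# (laparoscopic complications, laparoscopic patients,
#  open complications, open patients)
strata = {
    "no perforation": (10, 1000, 20, 2000),
    "perforation": (10, 100, 200, 2000),
}


def crude_odds_ratio(tables):
    """Odds ratio after collapsing all strata into one 2x2 table."""
    a = sum(lap_cases for lap_cases, _, _, _ in tables)
    b = sum(lap_n - lap_cases for lap_cases, lap_n, _, _ in tables)
    c = sum(open_cases for _, _, open_cases, _ in tables)
    d = sum(open_n - open_cases for _, _, open_cases, open_n in tables)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(tables):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over the strata."""
    numerator = 0.0
    denominator = 0.0
    for lap_cases, lap_n, open_cases, open_n in tables:
        a, b = lap_cases, lap_n - lap_cases      # laparoscopic: cases, non-cases
        c, d = open_cases, open_n - open_cases   # open: cases, non-cases
        n = lap_n + open_n
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


tables = list(strata.values())
crude = crude_odds_ratio(tables)
pooled = mantel_haenszel_odds_ratio(tables)

for name, (lap_cases, lap_n, open_cases, open_n) in strata.items():
    print(f"{name}: laparoscopic {lap_cases / lap_n:.1%}, "
          f"open {open_cases / open_n:.1%}")
print(f"crude odds ratio:           {crude:.2f}")
print(f"Mantel-Haenszel odds ratio: {pooled:.2f}")
```

With these invented counts the crude odds ratio is well below 1, which would suggest fewer complications after laparoscopic surgery, yet the complication rates are identical within each stratum and the pooled odds ratio is 1. The apparent advantage comes entirely from the different case mix of the two groups.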