A Primer on

 

Alternative Study Designs for Evidence-based Practice:

Harnessing Natural Variation for Effectiveness Research

By

Susan D. Horn, PhD

 

August 22, 2007

 


This Primer is based on presentations during a conference titled: 

 

Alternative Study Designs for Evidence-based Practice:  Harnessing Natural Variation for Effectiveness Research

 

Principal Investigator:  Peter I. Buerhaus, PhD, RN, FAAN

Team Members:

            Susan Horn, PhD

            Brenda Cornett

            Jennifer Smith

            Roberta James, MStat

 

Organization:   Vanderbilt University School of Nursing

Inclusive Dates of Project:  October 20 – 21, 2005

Federal Project Officer:  Milford Henderson

Agency Sponsors:

            Dept of Health & Human Services

Agency for Healthcare Research & Quality (AHRQ)

National Center for Medical Rehabilitation Research (NCMRR)

            National Institutes of Health

Interagency Committee on Disability Research (ICDR)

National Institute of Child Health & Human Development (NICHD)

Pharmaceutical Research and Manufacturers of America (PhRMA)

            Vanderbilt University School of Nursing

            Institute for Clinical Outcomes Research, Salt Lake City, Utah

 

Award #:  1 R13 HS015954-01

 

Abstract of Conference

Purpose:  To discuss and refine study designs that serve as alternatives to randomized controlled trials (RCTs) for effectiveness research and clinical decision-making; and to expand the infrastructure for conducting clinical research within the healthcare delivery system by increasing knowledge about rigorous alternative designs among researchers and policymakers.

 

Scope:  Alternative study designs to determine comparative effectiveness of treatments for clinical decision-making using analyses of existing administrative databases, MDS data from long-term care settings, registries, etc.; quasi-experimental designs, before-after or interrupted time series designs, longitudinal designs and cross-sectional designs; and Clinical Practice Improvement (CPI) study designs.

 

Methods:  Conference format over 1.5 days held in Washington, D.C., October 20-21, 2005.  Combination of plenary presentations by invited experts, workgroup discussions, and workgroup reports of key issues and recommendations.  Participants (n =96) included health services researchers, behavioral researchers, study design experts, clinicians, institutions, foundations, voluntary associations, health plans, journal editors, and policymakers in Federal, State, and local governments. 

 

Results:  An Alternative Study Designs Primer was prepared for dissemination.

 

 


The following is a list of conference speakers and workgroup leaders with their organizational affiliations:

 

 

 

NAME | SPEAKER / WORKGROUP LEADER | TOPIC | ORGANIZATIONAL AFFILIATION

Peter Buerhaus, PhD, RN | Speaker | Welcome and Introduction | Vanderbilt University School of Nursing

Susan D. Horn, PhD | Speaker | Clinical Practice Improvement (CPI) study design | Institute for Clinical Outcomes Research

Carolyn Clancy, MD | Speaker | Greeting and funder perspective | Director, Agency for Healthcare Research and Quality

Steven Tingus, MS, C.Phil | Speaker | Greeting and funder perspective | Director, National Institute on Disability & Rehabilitation Research

Michael Weinrich, MD | Speaker | Greeting and funder perspective | Director, National Center for Medical Rehabilitation Research

Gerben DeJong, PhD | Speaker, Moderator & Workgroup Leader | Introduction; Moderator of CPI workgroup | National Rehabilitation Hospital

Kelly Cronin, MPH | Speaker | Greeting and payer perspective | Centers for Medicare & Medicaid Services

Scott Gottlieb, MD | Speaker | Greeting and FDA perspective | Food & Drug Administration

Andrew Kramer, MD | Speaker | Administrative Databases | University of Colorado

Sharon-Lise Normand, PhD | Speaker | Quasi-experimental designs | Harvard Medical School

David Helms, PhD | Speaker | Health services research perspective | AcademyHealth

Robert Rhodes, MD | Speaker & Workgroup Leader | Surgery Board perspective | American Board of Surgery

Marcel Dijkers, PhD | Workgroup Leader | Moderator of Administrative Databases workgroup | Mount Sinai School of Medicine, New York

Ruth Brannon, MSPH, MA | Workgroup Leader | Moderator of Quasi-Experimental Designs workgroup | National Institute on Disability & Rehabilitation Research

Arthur Hartz, MD, PhD | Workgroup Leader | Moderator of Quasi-Experimental Designs workgroup | University of Iowa

Nancy Bergstrom, PhD, RN | Workgroup Leader | Moderator of CPI workgroup | University of Texas at Houston

 

 

 

 

 

 



Chapter 1 - Introduction

In a recent John Eisenberg Lecture, Don Berwick called for a broader health services research agenda and the development and application of new research methods to support this agenda.  “The challenge is to discover what we need to know that we do not now know in order to create much more effective systems of care.”1  He argues, “Health services research has not yet been sufficiently helpful in meeting the challenge of improving care in part because it has over-constrained both its methods and its favorite topics.  The cost of insisting on formal, classical, summative, evaluative experimental designs [randomized controlled trials (RCTs)] in an uncertain, poorly understood, nonlinear, system is, unfortunately, to maintain the status quo….Health services research should become more effectively part of the solution.  To do that will require that we enrich our portfolio of methods and broaden our agenda of inquiry.  The scientific methods that we need to enhance and dignify in academic settings will combine formal classical methods with some pragmatic, immediate, and in many ways more informative forms of learning and investigation.” 1 

We need alternative study designs that produce pragmatic, practice-based evidence that is useful and acceptable for practice and policy purposes.  A confluence of factors is driving the need for better evidence to improve clinical practice: the demand for better knowledge of effective practice, together with concern for costs and cost-effectiveness (value), quality, patient safety, and equity.  Better evidence will come from research that is clinically and practically applicable and generalizable.  The traditional emphasis on internally valid research ignores the requisites of sound generalization, external validity, effectiveness, and utility in practice.  Evidence-based medicine is “the integration of best research evidence with clinical expertise and patient values.” 1  In order to improve patient outcomes and population health we must move beyond generalizations based on belief and use new methodologies that are appropriate to clinical practice improvement.

Performing the “best research” requires identifying the best research methodology to answer a given question.  Evidence does not always flow from the laboratory to clinical practice; it can also be discovered in the study of clinical practice.  This Primer addresses the issue of “best research evidence” and how to integrate it with clinical practice and values.

The time for solutions in health care is now.  Many years after the March 2001 IOM report “Crossing the Quality Chasm,” significant gaps in quality of care remain.2  The United States has the most expensive healthcare system in the world, yet there is ample evidence that increased spending does not always lead to better outcomes; indeed, it has been shown to be associated with worse outcomes in some areas.  The American healthcare system is complex, with micro and macro factors influencing patient outcomes.  Health services research must capture the complexity of this system in order to obtain results that are generalizable to patients and clinicians who function within the confines and realities of the system.  Improving outcomes means changing multilevel systems at the level of patients, facilities, organizations, health systems, and health policy.  We benefit little from understanding the potential impact of a single intervention when it is the larger context within which the intervention resides that often determines whether the intervention is successful or not.

As researchers, we have failed to embrace adequately the clinical experience that exists among front-line clinicians.  There is a need to overcome gaps in dissemination that prevent translation of knowledge gained in research to practicing clinicians.  Many research methodologies involve clinicians only in the periphery.  Research designs that incorporate clinicians’ practical knowledge throughout every step of the process may increase our ability to transform research findings into practice.

Perfect evidence is an illusion―a useful motivating belief, but not the criterion for practice decisions.  What we need is good, relevant, reliable information on what most probably is effective, safe, and worthwhile.  An insistence on perfect evidence has led to an absence of good evidence that can be used to guide actual practice and policy.

There have been many methodological advances in experimentation, research design, and statistical modeling in the last 30 years.  It is time for the health services community to make better use of these improved designs.  Sophisticated alternative research designs have been developed that are as rigorous as RCTs and as able to demonstrate causality.  Other study designs are weaker, but in certain situations can provide the best possible evidence given practical and ethical considerations.  These powerful research designs often go unused due to a lack of understanding and agreement on what constitutes strong alternative research designs and the circumstances and problems for which they are best suited.

Purpose of the Primer

The purpose of this Primer is to provide an overview of new developments in alternative, pragmatic, practice-based evidence research designs, including non-experimental research methods, and to elucidate those methods that are relevant to various tasks as well as limitations of some conventional approaches, including many RCTs.  Specifically, we will 1) describe continuing developments in strong, convincing, quasi-experimental research designs; 2) describe improvements in correlational research designs that allow stronger causal inference; and 3) distinguish these designs from poorly controlled observational comparisons (which can be the best research option in some circumstances).  The information presented here is intended for researchers, hospital and physician practice groups, grant reviewers, policy makers, funding organizations, and journal editors.

This Primer is derived from presentations at the “Alternative Study Designs for Evidence-Based Practice Conference” that took place on October 20-21, 2005, in Washington, DC, with funding from the Agency for Healthcare Research and Quality (AHRQ), the National Institute on Disability and Rehabilitation Research (NIDRR), the Pharmaceutical Research and Manufacturers of America (PhRMA), the National Center for Medical Rehabilitation Research (NCMRR), Vanderbilt University, and the Institute for Clinical Outcomes Research.  The Conference brought together policy makers, academics, researchers, and clinicians to discuss and further the use of alternative research designs.  The overall goal of the Conference was to develop cost-effective, pragmatic methodologies that allow research to identify interventions that are associated with improved outcomes for specific patients in the actual practice of care.

This Primer discusses four types of research designs, which are not mutually exclusive.  The first type is the randomized controlled trial, the gold standard of medical research.  The second focuses on study designs that use administrative databases.  At the conference, Dr. Andrew Kramer from the University of Colorado drew upon his years of research experience to address these issues.  Administrative database study designs take advantage of large existing administrative databases such as the Medicare Provider Analysis and Review (MEDPAR), the Healthcare Cost and Utilization Project (HCUP), the Minimum Data Set (MDS) for nursing homes, and many others.  These databases have been used to examine specific treatment methodologies, health services, and provider characteristics.

The third type of research that we address is quasi-experimental designs, a broad category of research designs that includes before and after designs, longitudinal designs, interrupted time series, and systematic treatment designs.  At the Conference, Dr. Sharon-Lise Normand from Harvard University led this discussion.

Our fourth research design is practice-based evidence for clinical practice improvement (PBE-CPI, or PBE for short in the rest of this Primer), which has been championed by Dr. Susan D. Horn.  PBE examines three sets of factors and the interactions among them.  The first is patient factors.  Patient factors, such as case mix classifications and severity of illness measures, can be used to control for differences in populations.  The second is process factors: treatments, interventions, and medications, as well as the entire process of care, including the management and payment strategies in place.  PBE examines combinations of patient and process factors in order to identify their association with outcomes, which are the third set of factors.  Outcomes include clinical outcomes, such as health status of the patient, as well as measures like cost, length of stay, and number of encounters.  Utilization indicators can function as independent or dependent variables.  PBE brings a new level of rigor to this group of research designs.  At the conference, Dr. Horn described PBE study designs.

Some areas of health care research have adopted alternative research designs more quickly than others, even though RCTs cannot always accurately reflect real world situations.  Service systems are complex and adaptive; they do not provide a solitary intervention, such as a pill or a new device.  Examination of an intervention in the real world requires assessing the entire system within which the intervention is delivered. 

The health services research field has made limited use of a comprehensive or trans-disciplinary approach, which brings the best of all disciplines together in an active, participatory way.  Another aspect that is often overlooked in designing research studies is the clinical experience of front-line clinicians.  Much can be gained from harvesting their valuable experience and making it an integral part of research, not only in the research design, but also in the research process itself.  Involving front-line clinicians throughout the entire process facilitates clinical buy-in and knowledge transfer. 

One limitation of many research designs is that the clinicians who participate in the study to one degree or another are not involved at every stage.  Thus, by the time the study is completed, the clinicians may not believe the final result, which slows the implementation of research findings.  We need to explore how to foster clinical buy-in that transforms the treatment community into effective advocates for research findings.  There is a great gap between science and actual practice.  This is due partially to our failure to engage our clinician colleagues in the entire research process, to encourage them to become advocates for the findings, and then to implement those findings.

Education is necessary but not sufficient for practice change to be implemented and sustained.  Education is not merely a matter of knowledge transfer or educating clinicians through continuing education programs, but a matter of engaging clinicians throughout the entire research process.

The field of health services has been slow to adopt standardized documentation.  We continue to use different kinds of instruments to evaluate patients.  For example, in post-acute care there are many patient assessment instruments; lack of crosswalks between them inhibits comparative work.   

The healthcare system and the policymakers and decision makers who drive it are finally coming to terms with chronic illness versus acute illness.  Methods such as RCTs work for assessing new drug products or for assessing new interventions in care.  For example, they can determine whether new intensive care units reduce mortality in people who have had an acute MI.  Yet RCTs are unable to identify which interventions help people with disabilities lead more productive, independent lives.

To be useful, research has to be timely, convincing to providers, valid in practice (not only in controlled settings), and practical to implement.  Until now, interventions often have been under-defined.  We need to define specifically all the steps of the intervention and document them adequately.  Many interventions have been recommended on the basis of reviews of sets of studies, but these reviews have not adequately specified the intervention, making it difficult to recommend implementation for practice. 

In response to these challenges, Don Berwick has said, "We now have embedded in healthcare an extraordinarily powerful belief system, and a set of behaviors around clinical evaluation of science.  This has taken us a long way from clinical practice guided by anecdote.  Among other consequences, this revolution in applied methods placed the randomized controlled trial at the top rung of design as the best way to learn.  But this commitment to sound evaluative science has also created a problem, namely that the journey we need to take now in seeking better systems of care will not yield to those methods alone.  To crack the problem of health systems improvement, we are going to have to be interested as colleagues in science in other methods for learning, as we were previously engaged in the new classical methods.  The formal methods of summative evaluation simply are not relevant when the hypotheses are many and vague, when alternative needs have evolved over time, when local knowledge is relevant and contains perhaps more transferable wisdom than bias, and when the confounders are not defects that spoil our learning but are themselves interesting and comprise the seeds of further progress.  And when the effects sought are large enough, we ought not to have a hard time detecting the signal within the noise."1

 

Chapter 2 - Randomized Controlled Trials: Strengths and Limitations

RCTs are considered to be the gold standard for establishing the efficacy of drugs and other well-defined treatments and interventions.3  RCTs offer the most broadly applicable simple research design.  However, the simplicity that makes this research design so appealing can lead to oversimplification of interventions and their effectiveness in real world applications.  Finding ways to circumvent the limitations and shortcomings of the RCT, while maintaining a high level of internal validity, has led to some recent developments in alternative study designs. For this reason, it is important to understand both the strengths and weaknesses of the RCT.

RCTs began in the field of agriculture, where a few easily measured and controlled interventions and resulting outcomes could be investigated in hothouses.  This type of research design is the most effective way to determine the efficacy of medications and well-defined interventions, as it controls for natural variation and singles out a small number of interventions for careful examination.  RCTs support causal conclusions because randomization balances confounders across treatment groups and because the design allows close examination of dose-response relationships.

It is customary in designing RCTs to develop a data collection tool that must be completed for every study patient.  Variables are defined precisely and providers and patients are paid to collect the data.  Also, careful monitoring of data reliability is performed.  RCTs are often very expensive.

RCTs are concerned with efficacy, i.e., with the question of whether a treatment works under ideal conditions.  Efficacy is simplest to determine when using a homogeneous research population, but the requirement for homogeneity in the population that allows the researcher to determine the impact of the intervention also limits the ability to generalize the outcomes to the general population.  Thus, these studies have strong internal validity, but weak external validity.

In contrast, effectiveness research, such as PBE, is concerned with the question of whether a treatment works under usual conditions of care.  Effectiveness studies seek to identify the natural variation in the population and determine how interventions affect different subgroups of patients.  Heterogeneity of the population is seen as a strength in these studies, and a means of gaining a clearer understanding of the intervention.  These studies attempt to examine interventions within the wider healthcare system, where care is actually delivered.  Thus, internal validity is weakened, while external validity is strengthened.  Since PBE studies are not randomized, outcomes may be influenced by treatment selection.  However, statistical methods such as matching, propensity scoring, and covariate adjustment can be used to reduce selection bias arising from measured patient characteristics.
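
The simplest form of covariate adjustment, stratification, can be illustrated with a short sketch.  The data below are entirely hypothetical and were constructed so that sicker patients are more likely to be treated; the field names (treated, severity, outcome) and the outcome scale (days of stay) are assumptions made for illustration, not taken from any study described in this Primer.  Within each severity stratum the treated patients stay one day less, yet the unadjusted comparison suggests the treatment adds two days.

```python
from collections import defaultdict


def naive_difference(records):
    """Mean outcome of treated minus mean outcome of untreated, ignoring covariates."""
    treated = [r["outcome"] for r in records if r["treated"]]
    control = [r["outcome"] for r in records if not r["treated"]]
    return sum(treated) / len(treated) - sum(control) / len(control)


def stratified_difference(records, covariate):
    """Average of within-stratum differences, weighted by stratum size."""
    strata = defaultdict(list)
    for r in records:
        strata[r[covariate]].append(r)

    weighted_sum = 0.0
    total = 0
    for group in strata.values():
        treated = [r["outcome"] for r in group if r["treated"]]
        control = [r["outcome"] for r in group if not r["treated"]]
        if not treated or not control:
            continue  # no comparison is possible within this stratum
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += diff * len(group)
        total += len(group)
    return weighted_sum / total


def make_patients(n, severity, treated, outcome):
    """Build n identical hypothetical patient records."""
    return [{"severity": severity, "treated": treated, "outcome": outcome}
            for _ in range(n)]


# Hypothetical data: severe patients are treated more often and stay longer.
patients = (
    make_patients(2, "mild", True, 4)
    + make_patients(8, "mild", False, 5)
    + make_patients(8, "severe", True, 9)
    + make_patients(2, "severe", False, 10)
)

print("Unadjusted difference (days):", naive_difference(patients))
print("Severity-adjusted difference (days):", stratified_difference(patients, "severity"))
```

Stratification only removes bias from the covariates that were measured and used to form strata; matching and propensity scoring share that limitation.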

There are methods for adapting RCTs to maximize their clinical relevance.  RCTs should be designed to study the treatment or intervention as it would be delivered in the clinical practice setting, using outcome measures that reflect the values of persons involved and society (such as cost-effectiveness).  In addition, studies should be conducted on a representative sample of patients in order to improve ability to generalize results to the wider public.  While some RCTs can be adapted according to these principles, in other cases this is impossible and contextual issues (how the intervention would work within the multilayered healthcare system) remain unresolved.  Also when it takes a long time to conduct a clinical trial, one can be left in the end with a greater understanding of the efficacy of methods that are no longer in use.

There are many threats to the validity of research inference, and selection bias―the primary validity threat that the RCT controls for―is only one.  Optimal research designs must consider all major threats to validity of inference, and clinically applicable research should be designed to address all issues of generalizability and utility in practice―not just selection bias.  The emphasis on RCTs as the gold standard of research has led to an oversimplification of the definition of high-quality research.  It is as if selection bias were the only important threat to validity of research.  Past evidence-based literature syntheses have been oriented around the RCT, sometimes to the extent of ignoring weaknesses of the RCT, and without sensitivity to the fact that other designs can, in certain circumstances, provide better, or indeed, the only evidence. 

Most reports concur that RCTs are needed to establish the efficacy of treatments.  While that recommendation is clearly justifiable, the reality is that RCTs pose substantial ethical and design challenges for many clinical practice questions and may produce results with limited generalizability.  In clinical practice, due to the wide variability in patient types and severity, the complex dynamic and interactive nature of treatments, and the increasing difficulty controlling for confounding treatment factors as more treatments are introduced, the care environment is not very conducive to establishing the controlled experimental conditions necessary to conduct randomized trials.  In recent years, the need for new research methodologies to supply necessary missing pieces of information to clinicians and health policy decision makers has become increasingly apparent, as RCTs alone have failed to fill existing knowledge gaps.1,3   In summary, RCTs are a very important study design methodology, but we need to consider alternative designs depending on the research questions asked.

Chapter 3 - Study Designs Using Administrative Databases

The value of administrative databases has been well established for purposes of health planning, public health surveillance, examination of geographic variation, and the investigation of health disparities across socioeconomic, racial, and ethnic groups.  However, the value of administrative databases is far less well established for generating practice-based evidence.  First, we describe administrative data.  Second, we discuss how they can be used to generate practice-based evidence.  Third, we present their strengths and limitations.  Finally, we discuss what can be done to improve these data to provide practice-based evidence.

Dr. Kramer defined administrative data as preexisting data collected for federal/state requirements, or any surveys or databases for a sample of patients or providers collected for general purposes.  Registries fall into the category of administrative data, as do Medicare cost and utilization information, the nursing home MDS used for payment and quality purposes, and the Outcome and Assessment Information Set (OASIS) used for home health care payment and quality.  Administrative data on the non-institutionalized population include the National Ambulatory Medical Care Survey (NAMCS) and the National Health Interview Survey (NHIS).

How can administrative data be used to generate practice-based evidence?  Typically administrative data are used in observational studies.  Sometimes they can be used in quasi-experimental studies to supplement primary data collection in order to reduce respondent burden.  This is the case when information available in secondary data sources covers the same sample of patients for whom primary data are being collected, so that it need not be collected from respondents directly.  Examples of outcome or effectiveness measures that can be found in administrative data are mortality, hospitalization, and discharge disposition.  Mortality is a reliable endpoint in most administrative data sources, but it is not 100 percent reliable.  Data sources do not always agree, but one can usually triangulate by cross-checking mortality against Social Security and other files, and so verify mortality endpoints when multiple sources are used.
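
Cross-checking a mortality endpoint across sources can be sketched as follows.  This is a minimal illustration under stated assumptions: the source names (claims, mds, ssa_death_file), the patient identifiers, and the rule that a death is treated as verified only when at least two sources agree are all hypothetical choices, not the procedure used in any study cited here.

```python
def reconcile_mortality(sources):
    """Classify each patient's death status by agreement across data sources.

    sources maps a source name to a dict of patient_id -> bool
    (True if that source records a death).
    """
    patient_ids = set()
    for records in sources.values():
        patient_ids.update(records)

    status = {}
    for pid in sorted(patient_ids):
        flags = [records[pid] for records in sources.values() if pid in records]
        if len(flags) < 2:
            status[pid] = "unverified"  # only one source covers this patient
        elif all(flags):
            status[pid] = "died"
        elif not any(flags):
            status[pid] = "alive"
        else:
            status[pid] = "conflict"    # sources disagree; needs manual review
    return status


# Hypothetical death indicators from three sources.
sources = {
    "claims": {"A": True, "B": False, "C": True, "D": False},
    "mds": {"A": True, "B": False, "C": False},
    "ssa_death_file": {"A": True, "C": True},
}

status = reconcile_mortality(sources)
for pid, label in status.items():
    print(pid, label)
```

Flagging disagreements rather than resolving them automatically keeps the decision about which source to trust with the analyst.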

Hospitalization, and particularly diagnosis-specific hospitalization, is another useful endpoint.  For example, Ambulatory Care Sensitive Conditions are used to look at quality of ambulatory care for conditions such as diabetes, COPD, and CHF.  Hospitalization for these conditions is considered potentially avoidable, so rates of hospitalization are used as indicators of the quality of ambulatory care for these conditions.

A problem in most administrative databases is that data are collected only at pre-specified times, so analyses are limited to those times.  Surgical and medical complications identified from ICD-9 codes are an example of data whose usefulness is affected by collection times: the codes are recorded after discharge, and they do not specify when during the hospitalization the complication occurred.

For practice-based evidence purposes our administrative data analysis should be hypothesis driven.  This means that we must define an effect variable separately from other covariates that are being adjusted for.  One should not put all the variables into a model and say “This is what we found.”  One must be very clear about hypotheses up front.  Examples of effect variables that we can examine using administrative data are surgical procedures, new surgical technologies, specific services, treatment settings, and specific types of treatments that are coded with various codes.  We can also study frequency of different kinds of services, and examine availability of services in an area, which might be a good proxy for the extent to which services are used.  We can look at payer issues, e.g., managed care versus fee for service, and study which setting is more effective.  We can look at facility characteristics, such as the volume of services or teaching hospitals versus non-teaching hospitals.  And we can look at individual provider characteristics such as training levels, staffing levels in facilities, and physician specialty.

For example, consider open appendectomy versus laparoscopic appendectomy.4  There is controversy about the indications for each surgery type.  One study used data covering about 20 percent of all U.S. hospital discharges, which included 43,000 appendectomy patients.  Of this group, 17 percent had laparoscopic and 82 percent had open procedures.  Length of stay, complications, and mortality were examined, along with an array of covariates, including perforation and abscess.  The study found decreased length of stay, fewer of some complications, and an increased rate of direct discharge for laparoscopic appendectomy patients.  With stratification, some of the complication differences went away.  Nevertheless, this is an example of what can be done with administrative data from acute care, and the approach has benefits over a single-site randomized trial.

A second example deals with indwelling catheter use in hip fracture patients and looks at the issue of expanded use of indwelling catheters.5  This study used Medicare claims data and nursing home MDS data to look at the presence of catheters at the time of hospital discharge and some hospital characteristics.  The study included 111,000 hip fracture patients discharged to nursing homes, 32 percent of whom were discharged with catheters.  The outcomes studied were rehospitalization for UTI and/or sepsis, return to community, and mortality, with a whole array of covariates.  In particular, the covariates included variables from the MDS such as function and cognition, which could be justifying indications for catheter use and are therefore the main concern in this comparison.  One certainly has to eliminate obstruction and retention as indications for use.  Even after multivariable analyses, there was increased rehospitalization for UTI and sepsis, higher mortality, and decreased community residence for 30 days for patients with extended indwelling catheter use.

What are the strengths and limitations of using administrative databases for practice-based evidence studies?  The major strengths of administrative data include large sample size of subjects and providers, lower cost than primary data studies, and less time required for these studies since the data already exist.  One can address policy questions and questions about past practices.  There is no respondent burden and no need for consent.  However, one of the greatest limitations is the silo limitation.  We typically collect administrative data in silos such as hospital inpatient data separate from nursing home data.  The richer elements in these databases are often within silos.  And although one may link them, there are still some incompatibilities in timing, scale, and frequency.  There is also the issue of unmeasured confounders; administrative data often do not fit the specific needs of a research question.  They don't have all the controls that we want.  In summary, administrative databases can be useful to answer some questions, but they must be used wisely; we must realize their strengths and limitations for practice-based evidence.
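
Linking records across silos, and the timing incompatibility that linkage runs into, can be sketched as follows.  Everything here is an assumption made for illustration: the field names (patient_id, discharged_on, assessed_on, adl_score), the sample records, and the 14-day linkage window are hypothetical and do not describe the actual layout of Medicare claims or the MDS.

```python
from datetime import date


def link_discharge_to_assessment(discharges, assessments, max_gap_days=14):
    """Pair each hospital discharge with the earliest nursing home assessment
    for the same patient that falls within max_gap_days after discharge."""
    by_patient = {}
    for a in assessments:
        by_patient.setdefault(a["patient_id"], []).append(a)

    links = []
    for d in discharges:
        candidates = [
            a for a in by_patient.get(d["patient_id"], [])
            if 0 <= (a["assessed_on"] - d["discharged_on"]).days <= max_gap_days
        ]
        # None means the silos could not be linked for this discharge.
        match = min(candidates, key=lambda a: a["assessed_on"], default=None)
        links.append({"discharge": d, "assessment": match})
    return links


# Hypothetical records from two separately collected data sources.
discharges = [
    {"patient_id": "P1", "discharged_on": date(2005, 3, 1)},
    {"patient_id": "P2", "discharged_on": date(2005, 3, 10)},
    {"patient_id": "P3", "discharged_on": date(2005, 4, 1)},
]
assessments = [
    {"patient_id": "P1", "assessed_on": date(2005, 3, 5), "adl_score": 12},
    {"patient_id": "P1", "assessed_on": date(2005, 3, 12), "adl_score": 10},
    {"patient_id": "P2", "assessed_on": date(2005, 5, 1), "adl_score": 7},
]

links = link_discharge_to_assessment(discharges, assessments)
for link in links:
    matched = link["assessment"]
    print(link["discharge"]["patient_id"],
          matched["assessed_on"] if matched else "no assessment in window")
```

The choice of window is itself an analytic decision: a wider window links more patients but compares measurements taken at less comparable times.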

 

Chapter 4 - Quasi-Experimental Designs

Quasi-experiments, as defined in Campbell and Stanley, are experiments in which subjects are not randomly assigned to condition or treatment variables.6  For an intervention to be considered the cause of an effect, the timing has to be right: the cause must precede the effect, and the cause must co-vary with the effect.  In addition, one has to rule out alternative explanations for the causal relationship.  This is especially important in observational studies, where we do not have randomization.  Although one may have an observational database and want to use it, one must adjust for covariates, because treated responses may differ from control responses in ways that are caused not by the treatment but by confounders that would otherwise be left out of the analysis.

Common quasi-experimental designs include before/after designs, longitudinal designs, regression discontinuity/quantified assignment designs, multiple interrupted time series with a stable baseline and follow-up series, etc.  Quasi-experimental designs are empirical investigations in which the objective is to understand causal effects.  The questions they address are: What are treatment effects?  What are intervention effects?  What are policy effects?

The simplest longitudinal designs are before/after designs: one has n subjects observed at two times, before and after an intervention, and there is no control group.  All the subjects receive the control at the early time and the treatment at the later time.  This may be a study in which one implements a new treatment: nobody has it at the beginning and everybody has it after the intervention.  The question is “What is the effect of the treatment?”  The strength of a pre/post design is that one has some information about the counterfactual, because one sees what patients look like prior to the introduction of the new treatment or policy initiative.  (Counterfactual theories of causation are ones where the meaning of a singular causal claim of the form "Event c caused event e" can be explained in terms of counterfactual conditionals of the form "If c had not occurred, e would not have occurred.")  However, there are many weaknesses, including the fact that the treatment is completely confounded with the post time period and there could be selection maturation (a selection-maturation threat results from differential rates of normal growth between pre-test and post-test for the groups).

What about a repeated interrupted series design?  This is simply an expanded before/after design.  Here we have multiple “before” observations and multiple “after” observations, but still no control group.  We have n subjects and observe their outcomes.  We still have the issue that all n subjects receive the control at earlier times and the treatment at all the later times.  But rather than one time point, there may be a panel of observations, for example 10 monthly observations.  What is the strength of this design?  Why not just stick to pre/post?  The strength is in multiple patient measurements, which provide information about current trends, and this helps to reduce sample size.  But one still has selection maturation effects, and again, there is no blinding and no control group.

Clearly, we need a control group.  The control group should be in both the pre and post test time periods.  That is, some of the subjects receive the control and some receive the treatment at the same time and we have the untreated response in both groups at the baseline measurement.  What is the strength of this design?  Now treatment is not confounded completely with time. 
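With a control group observed in both periods, one common way to separate treatment from time is a difference-in-differences contrast: the change in the treated group minus the change in the control group.  The following is a minimal sketch in Python.  The function name and the numbers are hypothetical, and the estimate is only meaningful under the assumption that both groups would have followed the same trend without the treatment.

```python
from statistics import mean

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treated group minus change in the control group."""
    change_treated = mean(treat_post) - mean(treat_pre)
    change_control = mean(ctrl_post) - mean(ctrl_pre)
    return change_treated - change_control

# Hypothetical outcome scores for three subjects per group and period.
did = diff_in_diff(
    treat_pre=[10, 12, 11], treat_post=[16, 18, 17],
    ctrl_pre=[10, 11, 12], ctrl_post=[12, 13, 14],
)
```

Here the control group's change over time is subtracted out, so a secular trend shared by both groups does not masquerade as a treatment effect.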

If possible one wants to have multiple observations pre and multiple observations post.  In some sense that is the ideal world.  Regardless, one still has to estimate the causal effect.  This is not the same as association.  We want to say x causes y. 

Regression is the most common method of analysis, but regressing an outcome on a number of covariates yields associations; it does not establish causation.  One can still run regressions, but one must be careful about the timing of the treatment or policy.  Even if one has information about all the confounders, one should not just run regression analyses.  Regression is familiar to most people and simple to interpret, and it is easy for someone who has collected a long list of covariates to say that the log of survival equals the treatment plus the confounders.  But regression requires (a) a parametric model, (b) extrapolating across different regions of the covariates, and (c) imposing certain functional relationships, such as saying that the relationship is linear or log linear.

The major problem, even when one has all the covariates, is not knowing how comparable the treatment and control groups are.  In the observational world, after adjusting for as many confounders as possible, it is very difficult to see whether the treated and control groups are similar.  One can look at each covariate in the treated arm and in the control arm and look at differences.  However, since there are many covariates, it is really hard to see whether or not the groups are the same. 

Moreover, if the variances of the confounders differ between the treatment and control groups, then the bias is increased.  The group of people who receive treatment in an observational study may be more homogeneous than the group of people who do not receive the treatment, because people in the latter group may not get treatment for many reasons.   
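A covariate-by-covariate comparison of this kind is often summarized with a standardized mean difference and a variance ratio.  The sketch below shows one way to compute both for a single covariate in Python.  The function name and the ages are hypothetical, and the pooled standard deviation used here is one common convention among several.

```python
from math import sqrt
from statistics import mean, variance

def balance(treated, control):
    """Return (standardized mean difference, variance ratio) for one covariate."""
    var_t, var_c = variance(treated), variance(control)
    # Pool the two sample variances with equal weight.
    pooled_sd = sqrt((var_t + var_c) / 2)
    smd = (mean(treated) - mean(control)) / pooled_sd
    return smd, var_t / var_c

# Hypothetical ages: the treated group is more homogeneous than the control group.
age_treated = [70, 72, 74, 76, 78]
age_control = [60, 66, 72, 78, 84]
smd, var_ratio = balance(age_treated, age_control)
```

A variance ratio far from 1, as in this made-up example, signals the kind of difference in spread between groups described above, even when the means are fairly close.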

There are several different strategies to estimate treatment effects.  One can do some exact matching or stratification, which is simple to interpret and there are standards of how to do it.  However, there can be too few observations to adjust for many possible confounders, and so the original database needs to be large.  Even then the values of the observed confounders for the treatment group may not overlap with those for the control group, i.e., the patients might not be comparable exactly.

Regression is good, but it is not the only analysis method to use.  Matching or stratification is good, but it can be problematic when using large observational databases because one has to use some of the confounders to match.  Another way is to produce a propensity score or some metric to summarize the difference between who receives the treatment and who does not.7  A propensity score is simple to interpret and standard software can be used.  One gains from looking at the comparability of the treatment and control groups based on a number that summarizes the information of all observed confounders.  An example of a propensity score is the Comprehensive Severity Index that is discussed in Chapter 5.
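To make the idea concrete, the sketch below estimates a propensity score as the share of patients treated within each covariate cell and then uses inverse-probability weights to compare outcomes.  This is an illustration under stated assumptions, not the method of any study cited here: propensity scores are more commonly estimated with logistic regression in standard software, the cell-frequency estimator is substituted to keep the example self-contained, and all names and data are hypothetical.

```python
from collections import defaultdict

def propensity_by_cell(records):
    """Estimate P(treated | covariate cell) as the share treated in each cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_treated, n_total]
    for cell, treated, _outcome in records:
        counts[cell][0] += treated
        counts[cell][1] += 1
    return {cell: n_t / n for cell, (n_t, n) in counts.items()}

def ipw_effect(records):
    """Inverse-probability-weighted difference in mean outcome (treated minus control)."""
    ps = propensity_by_cell(records)
    num_t = den_t = num_c = den_c = 0.0
    for cell, treated, outcome in records:
        p = ps[cell]
        if p in (0.0, 1.0):
            continue  # no overlap in this cell, so it cannot inform the comparison
        if treated:
            weight = 1 / p
            num_t += weight * outcome
            den_t += weight
        else:
            weight = 1 / (1 - p)
            num_c += weight * outcome
            den_c += weight
    return num_t / den_t - num_c / den_c

# Hypothetical records: (severity cell, treated 0/1, outcome).
# Severe patients are treated more often and have lower outcomes.
records = (
    [("mild", 1, 12)] + [("mild", 0, 10)] * 3
    + [("severe", 1, 6)] * 3 + [("severe", 0, 4)]
)
naive = (12 + 6 * 3) / 4 - (10 * 3 + 4) / 4  # unadjusted difference in means
adjusted = ipw_effect(records)
```

In this constructed example the unadjusted comparison has the wrong sign because treatment is concentrated among severe patients, while the weighted comparison recovers the within-cell difference.  Cells with no overlap are skipped, which mirrors the comparability concern raised above.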

An RCT is simple.  One looks at the difference in the outcomes between two groups: treatment and control.  Where people get nervous about observational data is that, by definition, one cannot use such simple methods.  One has to do a lot of work to show that the treatment and control groups are comparable based on the observables.  What about the unobservables?  People worry about them also.  If there is no lack of overlap between those who get the treatment and those who do not, and if the treated and control groups look comparable, then one can be more confident in the robustness of the effects.

A well-designed quasi-experiment can provide valid inferences.  However, the investigator must work harder in the analyses to demonstrate that the treatment effects are causal and in particular must describe other possible causal explanations and why they either support or do not support the original findings.

 

CHAPTER 5:  PRACTICE-BASED EVIDENCE FOR CLINICAL PRACTICE IMPROVEMENT

PBE methodology is a novel and complementary practice-based evidence (versus evidence-based practice) approach to study effectiveness of clinical interventions.3,8,9  The PBE method involves statistical analyses of large databases that incorporate extensive details on patient characteristics including severity of illness and co-morbidities, standardized documentation of treatment details, and periodic outcome assessments.  This methodology has been used successfully to uncover important clinical associations between care and outcomes in multiple conditions, including a recent study that revealed several very specific and clinically-relevant insights from inside the ‘black box’ of stroke rehabilitation.10

PBE is useful to study a wide range of treatment options and practices in diverse populations and to determine how these factors interact to affect outcomes.  PBE is a rigorous observational method that is embedded within ‘real-world’ multidisciplinary clinical care and offers many advantages over a tightly constrained clinical trial.  It does not alter or suspend the treatment regimen to evaluate the efficacy of a particular intervention.  Instead, it collects detailed information on actual care practices and thereby captures the breadth and depth of patients, treatment regimens, and their interactions within the multidisciplinary setting.  The hypotheses and study design are developed specifically to answer questions faced on a daily basis by clinicians such as: “Does this treatment work as well as it is purported to work?  For whom does the treatment work best?”

The PBE methodology has the advantage of compiling data on a large number of patients―numbers that would not be available (or affordable) in an RCT with rigid inclusion and exclusion criteria.  The PBE approach controls statistically for patient differences by taking into account important patient covariates such as severity of illness and functional status, thus giving it an advantage over traditional smaller scale observational studies. Accepting a priori that potential confounding variables should be identified and measured, rather than eliminated, allows for a richer study.  This inclusiveness also allows for greater external validity (generalizability) of findings.   

Perhaps the best way to understand PBE is through some illustrative examples of the types of information that have been uncovered in applications using this approach.  One such example is the recently published report on stroke rehabilitation.10-12  A prospective observational cohort study was conducted on 1,291 patients post-stroke treated at seven inpatient rehabilitation facilities (six in the US and one in New Zealand).  In stroke rehabilitation, treatments are customized to meet individual patient needs with little guidance from, or adherence to, established practice parameters or standardized treatment protocols.  Consequently, considerable variation in treatment approaches is seen from one patient to another and from one rehabilitation center to another.

Three types of data were collected for the stroke study: 1) patient characteristics that are used to formulate a Comprehensive Severity Index (CSI®), a validated and unique component of the PBE approach that provides an age- and disease-specific measure of physiologic and psychosocial complexity based on multiple clinical signs, symptoms, and physical findings; 2) process variables that detail what is being done in treatment; and 3) outcome variables such as severity of illness and functional status (e.g., Functional Independence Measure (FIM) scores).

One of the unique features of PBE is its attention to the details of the process of care, looking inside the ‘black box’ of treatment.  Relevant details for some interventions (e.g., surgical procedures and medications) can be found in the medical record.  However, for interventions such as physical, occupational, and speech therapy, details of the clinical activities performed in any given session typically are not documented sufficiently in a patient chart.  One of the most impressive parts of the stroke study was the development of ‘point-of-care’ documentation using a form designed by the participating therapists that successfully captured what was being done in each therapy session.  Results indicated that, controlling for patient differences, certain activities and interventions were associated with better outcomes: more time spent in higher level rehabilitation activities (such as gait training, upper extremity control, and problem solving), use of new psychiatric medications, and enteral feeding.  Initiating gait training very early in the rehabilitation process was associated with better outcomes, even for low functioning patients.  Equally important was the finding that many commonly and frequently used treatments or activities were not associated with positive outcomes.  These findings could have important and immediate implications for stroke rehabilitation practices; however, while the inherent scientific value of the data obtained is widely acknowledged, some urge caution in direct application of these findings to clinical settings.13-15

The stroke study provides details on the development of the rigorous methodology implemented there and shows that it is both possible and feasible to obtain this type of detailed information in a complex setting such as stroke rehabilitation.12  The stroke study engenders confidence that use of PBE methodology will be similarly successful in capturing the nuances of multidisciplinary clinical care for patients with other conditions.  Pertinent findings from other PBE studies are as follows:

  1. Patients post abdominal surgery had shorter lengths of stay if fed early and sufficiently.  Also, use of patient controlled analgesia pumps, which is very common, was associated with a higher rate of wound infection and therefore poorer outcomes.16-17
  2. Children admitted with bronchiolitis who were 33-35 weeks gestational age had poor outcomes and subsequently it was recommended that prophylaxis be extended to this ‘older’ group of infants, as is standard practice for infants of lower gestational age.18
  3. Self-monitoring of blood glucose levels in patients with diabetes improves outcomes only if results are discussed with providers.19
  4. Use of disposable briefs dramatically decreased the incidence of pressure ulcers in long term care residents.20
  5. Several cost containment strategies (e.g., failure to use newer more expensive asthma drugs) were associated with higher health care resource allocation due to decreased treatment effectiveness.21

The major impetus for pursuing PBE methodology is the challenge many have faced in trying to design RCTs to evaluate efficacy of certain treatments, as well as uncertainty about how to proceed with data from a series of controlled studies indicating that several treatments are available for a specific indication, each of which has some degree of efficacy or effectiveness.  Unless studies comparing each viable treatment in each patient sub-group are done, clinicians are still at a loss to decide what works best for whom.  RCTs are considered the gold standard for efficacy, and this reputation is warranted because these studies are designed specifically to demonstrate that the measured effect can be attributed directly to the treatment.  However, RCTs are not without limitations.  The sterile and somewhat artificial treatment environment of an RCT and rigid inclusion-exclusion criteria greatly limit the generalizability of results.

Although research funders have long favored the RCT, they may be beginning to recognize that RCTs are not the most appropriate designs for many questions, particularly in the complex world of health care research.  PBE methodology uses various types of regression analyses and large numbers of patients, which allow the examination of multiple factors.  What is unique about the PBE method is the strong focus on patient severity, specifically the formulation of the Comprehensive Severity Index, which is based on many years of research.  In summary, PBE offers an alternative to traditional multi-center randomized clinical trials and it may be appropriate particularly to evaluate multidisciplinary care.  It imposes a structure and rigor on the establishment of a multi-center database that yields high quality and clinically pertinent data.  Recent published literature finds that significant effects in RCTs and observational studies are very similar.22-24

This chapter provides an overview of the methods used in the practice-based evidence clinical practice improvement approach.  A PBE study is an observational cohort study that collects both prospective and retrospective data without interrupting the natural treatment environment.  PBE studies examine what actually happens in the care process and overcome shortcomings commonly attributed to observational studies by the ways they account for patient covariates and severity of illness.

Although PBE studies resemble other observational studies that take into account patient demographic and setting characteristics that may affect outcomes and determine generalizability, PBE moves beyond traditional observational approaches to create comprehensive, complex databases that include detailed patient-specific descriptions, severity-of-illness measures, and characterizations of treatments for large samples of patients. 

Methods

Steps in a PBE Study

The purpose of a PBE study is to determine the relative contribution of specific interventions and therapies to patient outcomes taking into account patient differences and other contributing factors.  PBE methodology captures in-depth, comprehensive information about patient characteristics (including clinical signs, symptoms, and physical findings), processes of care, and outcomes needed to ascertain the contribution of individual processes to outcomes.  There are seven phases or steps in a full PBE study. 

1.  Create a multi-site, multidisciplinary Project Clinical Team whose tasks are to (a) identify outcomes of interest, (b) identify individual components of the care process, (c) create a common intervention vocabulary and dictionary, (d) identify key patient characteristics and risk factors, (e) propose hypotheses for testing, and (f) participate in analyses.  The multidisciplinary Project Clinical Team (referred to as the Team henceforth) builds on theoretical understanding, research evidence to date, existing guidelines, and clinical experience about factors that may influence outcomes.  PBE studies entail extensive front-line clinical staff participation in all phases of study design, data collection, and analyses. 

2.  Use the Comprehensive Severity Index to control for differences in patient severity of illness, including comorbidities that might otherwise affect outcomes.  CSI is an age- and disease-specific measure of physiologic and psychosocial complexity comprised of over 2,200 signs, symptoms, and physical findings.25-28 

3.  Implement an intensive data collection protocol that captures data on patient characteristics, care processes, and outcomes drawn from medical records and study-specific, point-of-care data collection instruments.  Data collectors are tested for inter-rater reliability.

4.  Create a study database suitable for statistical analyses.

5.  Successively test hypotheses based on questions that motivated the study originally, previous studies, existing guidelines, and, above all, hypotheses proposed by the Team using bivariate and multivariable analyses including multiple regression, analysis of variance and covariance, logistic regression, hierarchical models, Cox proportional hazards regression, and other methods consistent with measurement properties of key variables.

6.  Validate study findings through an implementation phase that tests the predictive validity of the findings.  In this phase, findings from the first 5 steps are implemented and evaluated to determine whether the new or modified interventions are associated with better outcomes as predicted.

7.  Incorporate validated study findings into standard practice of care and practice guidelines.  After the validation of specific PBE findings, the findings are ready to be incorporated into care protocols.

The PBE approach uses detailed data on interventions that allow researchers to penetrate to the most meaningful level of resolution regarding the effects of the types of care rendered.  Thus, the PBE approach can answer study questions and hypotheses initially at a basic level but also allows researchers to drill down into the data with the help of additional insights offered by Team participants. 

Project Clinical Team

The Team provides expert advice to ensure clinical meaningfulness and to help create clear and compelling hypotheses, useful study variables, and appropriate analyses.  It usually contains a core group including the medical director or director of nursing (DON) from each participating site.  This core clinical Team develops and implements patient selection criteria, provides expert advice for data collection instrument development, obtains IRB approvals at their respective affiliated organizations, oversees the data collection process, and participates in analyses.  Over time and depending on project activities/needs, the Team expands to include representatives of each discipline in the clinical area treating each patient.  People from these disciplines from each study site provide expert advice specific to their fields of expertise.  Team members participate in weekly or biweekly conference calls over much of the PBE project.  Frequent team meetings via conference calls contribute to overall collaboration and investment in the study’s processes and findings.

Study Facilities

Study sites are selected based on their willingness to participate and geographic location.  Usually there are no specific criteria for selection; thus, study sites are not a probability sample of sites in the US.  Facilities can be for-profit or not-for-profit, free-standing, or part of an organization of facilities.  Facility level differences are controlled for using statistical analyses. 

Patient Selection Criteria

Each site contributes detailed data for a specified number of consecutive patients or for a specified time period using general criteria.  Facility size and rate of condition specific patient admissions determine the duration of the enrollment period.  Some sites enroll patients faster than others.  No eligible patients are excluded.  Patients from the study sites constitute a convenience sample. 

Each participating site obtains IRB approval for the study and enrolls consecutively admitted patients that meet a set of inclusion criteria specified by the Team.  Inclusion criteria usually include:

1.  Diagnosis.  List of diagnoses and/or procedures by code.

2.  Age.  Age limits exist for some studies, e.g., studies about adults may exclude children below specified ages.

3.  Reason for admission.  Reason for admission criteria may be established.  For example, the Post-Stroke Rehabilitation Outcomes Project (PSROP) used the first rehabilitation admission following current stroke, with the principal reason for admission being stroke.  The patient may have had previous strokes and previous rehabilitation admissions for previous stroke(s), but this is the first admission for the current stroke.  Current stroke must have occurred within one year of the rehabilitation admission.

4.  Transfer-out limitations.  Some studies create study inclusion criteria for patients with interrupted stays.  For example the PSROP Project Clinical Team decided that if a patient were transferred to another setting of care, e.g., acute hospital, and returned to the inpatient rehabilitation facility within 30 days, the patient remained a study patient.  If a patient transferred to another setting of care and returned to the facility after 30 days, participation in the study ended on the day of transfer.
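Rules such as the PSROP timing criteria above are easier to apply consistently across sites when written down explicitly.  The following Python sketch is one hypothetical encoding, not the project's actual software.  The function names are invented, "within 30 days" and "within one year" are read as 30 and 365 days inclusive, and a patient who never returns is assumed to end participation on the day of transfer.

```python
from datetime import date

RETURN_WINDOW_DAYS = 30   # assumption: "within 30 days" means <= 30 days
STROKE_WINDOW_DAYS = 365  # assumption: "within one year" means <= 365 days

def stroke_within_window(stroke_date, admission_date):
    """True if the current stroke occurred no more than one year before admission."""
    days = (admission_date - stroke_date).days
    return 0 <= days <= STROKE_WINDOW_DAYS

def study_end_date(transfer_date, return_date, discharge_date):
    """Date on which study participation ends for a patient with an interrupted stay."""
    if return_date is not None and (return_date - transfer_date).days <= RETURN_WINDOW_DAYS:
        return discharge_date  # returned in time: patient remains a study patient
    return transfer_date       # late or no return: participation ends at transfer

# Hypothetical patient: transferred out on March 1, back on March 15.
end = study_end_date(date(2005, 3, 1), date(2005, 3, 15), date(2005, 4, 10))
```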

There are no exclusion criteria that might otherwise limit the generalizability of findings.  Because PBE studies usually do not entail a new or experimental intervention for which patient consent is needed, there are no refusals or study dropouts and therefore, no need to compare study participants with study dropouts or need to account for patient selection effects that might otherwise occur.  Some PBE studies, however, do require patient informed consent, particularly if they wish to conduct patient or family interviews.  In these cases, comparisons between patients giving consent and refusing to give consent can be performed.

Sample size and power calculations 

Sample size can be determined using recommendations such as those of Cohen for modeling the magnitude of effect size.29  In some research (e.g., studies conducted in applied settings or new areas of inquiry), effect sizes may be small because the phenomena under study are not under good experimental or measurement control.  The smaller the effect size, the larger the sample required (other parameters being equal) to detect significant differences.  Cohen recommends that power calculations be performed assuming small, moderate, and large effect sizes based on the proportion of variance accounted for in the dependent variable.

Using these concepts and tables provided in Cohen, a sample of 1,800 subjects will have at least 80% power (with a Type I error rate of .05, 2-tailed test) to detect small effects (effect size of 0.15) of the predictor variables on outcomes.  The sample allows detection of differences in mean values of continuous outcomes that are 0.15 standard deviation units, and differences in discrete outcomes of 4% to 8%.  For regression analyses, independent variables that predict about 2% of the variance in outcomes can be detected.  When analyzing subgroups of patients, if, for example, 300 subjects are expected, then detection of medium sized effects (effect size of 0.35) with at least 80% power (with a Type I error rate of .05, 2-tailed test) is possible.  Models for these sub-analyses are sensitive to differences in mean outcomes that are 0.35 standard deviations, or differences of between 10% and 17% in rates of an outcome.
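The figures above can be roughly cross-checked with a normal approximation.  The sketch below is not Cohen's tabled method.  It assumes the effect size is a standardized difference in means between two equal-sized groups and uses a two-sided z-test approximation.  The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_groups(n_total, effect_size, alpha=0.05):
    """Approximate power of a two-sided test comparing two equal-sized groups."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Noncentrality: d * sqrt(n_per_group / 2), with n_per_group = n_total / 2.
    ncp = effect_size * sqrt(n_total / 4)
    # Probability of rejecting in either tail under the alternative.
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)

power_full = power_two_groups(1800, 0.15)     # full sample, small effect
power_subgroup = power_two_groups(300, 0.35)  # subgroup, medium effect
```

Under these assumptions both values come out above 0.80, in line with the statement of at least 80% power.  Unequal group sizes or adjustment for many covariates would change the result.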

Data Collection

Usually three types of study data are collected in PBE studies: (1) patient characteristics (e.g., admission severity of illness and functional status measures), (2) process variables (e.g., treatments and interventions), and (3) outcome variables (e.g., discharge functional status, discharge severity of illness, and discharge destination).  These data are obtained from multiple sources, either at the point of care or from post-discharge chart review at the site.

Point-of-care data

An important component of PBE is its attention to the details of the process of care that the patient actually receives; it addresses interventions and patient management strategies.  PBE relies on information contained in patient medical records, which trained data collectors abstract following patient discharge.  The Team identifies those study variables that can be obtained from existing documentation at their respective sites.  However, they often believe that existing patient records do not adequately document specific activities and interventions provided by certain clinician specialists, e.g., physical, occupational, and speech language therapists, etc., in stroke rehabilitation, because much patient documentation is oriented to the needs of payment or reimbursement systems.  The Team recommends how to get all members of the treatment team to describe accurately what they do.  Thus, the concept of point-of-care intervention documentation can be incorporated into the study design. 

Point-of-care intervention documentation development

Discipline-specific specialty teams with representation from each participating study site conceptualize and then create discipline-specific point-of-care intervention documentation forms to record activities/interventions used with study patients.  This iterative process, which can include face-to-face meetings and telephone conference calls, can take several months depending on the level of detail desired and the extent that practice differs by site.  Clinicians sometimes find that definitions of common terms differ from site to site and practitioner to practitioner.  Thus, part of the effort requires agreement on definitions of terms by participating therapists.

Clinicians from study sites create an intervention documentation form that includes a taxonomy of activities used in each clinical area.  This work incorporates practices and definitions in existing frameworks, and the level of intervention intensity clinicians think is needed to capture a complete and accurate picture of the contribution made by that discipline to care (beyond what is already contained in traditional medical record documentation).  In addition to developing the content of its documentation form, each discipline decides upon the frequency with which its form should be completed.  The taxonomy provides a format into which clinicians document actual interventions performed with patients; the documentation forms do not suggest treatment strategies or changes to routine practice.

Intervention documentation forms are standardized for all sites.  Because development efforts include representatives from each participating site, the forms contain interventions that may be specific to one or more sites but are not used by all.  These ‘unique’ interventions are included on each site’s form even though most places do not use them.  Therapists are trained to record only what was done in the actual care process at each site for each patient.  As an example, see Appendix A for point-of-care documentation form used by physical therapists in the PSROP.

Point-of-care intervention documentation training/reliability 

During a pilot test period following development of each documentation form, practicing clinicians who worked on form development use their draft forms during patient treatment sessions and solicit input from clinician colleagues.  Discipline-specific weekly teleconferences provide the forum for clinicians to discuss pilot findings and agree to add, edit, or delete items from the form.  Each discipline’s documentation form is finalized following this pilot test period.

Site clinicians are trained to use intervention documentation forms via discipline-specific train-the-trainer sessions attended by a lead clinician in each specialty from each site.  The Team facilitates this training for each clinical specialty using a training manual that includes paper and electronic copies of the intervention documentation forms, instructions for completing the forms, and definitions for all terms used on the forms.  Written case studies are included; several case studies are used to demonstrate how to complete each form based on a patient scenario.  Additional case studies are used to evaluate trainees’ understanding of instructions by providing examples of how to use the form for different patient scenarios.

Following the training session, each clinical leader conducts on-site training sessions for their co-workers.  It is possible to have the same training team visit each study site to conduct training for point-of-care documentation for all clinicians.  With sufficient funding, such standard training is preferable.  Teleconferences for each group are held throughout the few months following training to provide clinicians the opportunity to discuss implementation issues and ask questions of their peers in other participating institutions. 

Each site incorporates auditing of intervention documentation form use into routine site practices.  Typically, a second therapist (usually the lead therapist) observes a patient session and completes a separate intervention documentation form based on what is observed.  The therapist providing the session completes a form as per protocol and the two are compared.  The lead therapist reviews and discusses differences in documentation with the practicing therapist.
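The comparison of the two forms can be quantified in several ways.  The sketch below shows two common agreement measures, simple percent agreement and Cohen's kappa, applied to hypothetical checkbox data.  The source does not say which statistic the sites used, so this is an illustration only, and the function names and data are invented.

```python
def percent_agreement(form_a, form_b):
    """Share of form items on which the two raters recorded the same value."""
    matches = sum(a == b for a, b in zip(form_a, form_b))
    return matches / len(form_a)

def cohens_kappa(form_a, form_b):
    """Agreement corrected for chance (undefined if chance agreement equals 1)."""
    n = len(form_a)
    p_obs = percent_agreement(form_a, form_b)
    categories = set(form_a) | set(form_b)
    # Chance agreement from each rater's marginal category frequencies.
    p_exp = sum((form_a.count(c) / n) * (form_b.count(c) / n) for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical session: 1 = activity recorded, 0 = not recorded, for ten form items.
treating_therapist = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
lead_therapist = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
agreement = percent_agreement(treating_therapist, lead_therapist)
kappa = cohens_kappa(treating_therapist, lead_therapist)
```

Kappa is lower than raw agreement here because most items are unchecked on both forms, so some agreement would be expected by chance alone.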

Point-of-care intervention documentation form use 

Intervention documentation forms are completed as decided for each therapy session, surgery, and/or nursing day for each study patient.  Completed documentation forms are entered into the project database.  Methods to do this include optical character recognition interpretation, key entry, digital entry using handheld devices or laptops, etc.

Point-of-care intervention documentation validity 

Face validity is built into the intervention documentation forms, since they are developed and used by site clinicians as described above.  Clinicians agree with the content of their respective forms by discussing findings from the pilot test and then agreeing to add, edit, or delete items from the form (content validity).

Showing significant effects of interventions on outcomes assesses predictive validity.  For example, the amount of variation explained in discharge FIM scores controlling for patient characteristics (including admission FIM, severity of illness, and demographic factors) was 40% for moderate strokes and 45% for severe strokes.  When total time per day spent on physical therapy (PT), occupational therapy (OT), and speech language pathology (SLP) was added, there was no increase in variation explained for discharge FIM, consistent with previous findings by Bode, Heinemann, et al.30  However, when time per day spent in specific PT, OT, and SLP activities was added, the amount of variation explained increased to 52% for moderate strokes and 73% for severe strokes, adding 12% to 23% explanation of variation, respectively, in discharge FIM.10

Post-Discharge Chart Review 

To create a study database, a method is needed to enter data from post-discharge medical chart review.  One mechanism used in previous PBE studies is the Comprehensive Severity of Illness (CSI®) Software System that allows for both the input of severity of illness data and the creation of auxiliary data modules (ADMs), which are sets of study-specific data elements that are collected in addition to patient severity information.  The Team identifies and defines all patient, process, and outcome variables to include in the study ADM.  Using laptop computers, data collectors at each participating site enter chart review data into the CSI Software System. 

CSI: disease-specific severity of illness data (signs and symptoms) 

The signature component of the CSI Software System is the disease-specific severity system, hereafter referred to as CSI®.  CSI is an objective method to define severity of illness based on individual signs and symptoms of a patient’s diseases.  Between 1980 and 1992, Dr. Susan Horn, in conjunction with expert clinician panels originally at The Johns Hopkins Hospital, developed explicit severity criteria for each ICD-9-CM diagnosis code or group of similar diagnosis codes.  In order to keep severity criteria up-to-date with medical practice, the criteria are reviewed and updated via clinician panel discussions with each application of CSI.  CSI defines severity of illness as the physiologic and psychosocial complexity presented to medical personnel due to the extent and interactions of a patient’s disease(s).9,25-28,31 

Inputs to the CSI include over 2,200 disease-specific and age-specific severity criteria including physical findings, historical factors, physiologic parameters, and laboratory and radiology results at specified levels of abnormality found in a patient's chart.  Treatments provided do not contribute to severity of illness.  For example, intubation is not a severity criterion; severity criteria include patient signs, symptoms, and physical findings that led to a clinical decision to intubate (e.g., respiratory acidosis, absent breath sounds, cyanosis, etc.).

Disease-specific criteria sets are determined by ICD-9-CM codes assigned routinely by trained facility medical records coding personnel.  CSI data collection is performed via retrospective chart review after patient discharge, and thus, all diagnoses assigned by the facility diagnosis coder appear on a front or summary sheet in the patient’s chart.  The CSI data collector enters the list of diagnosis codes into the CSI Software System, which then displays disease-specific criteria to a trained data collector, who abstracts the signs and symptoms that address the criteria from the patient’s medical record for specified time periods.  It is important to note that the existence of a diagnosis does not indicate the extent or severity of the disease.  CSI substantiates the diagnosis and allows for stratification based on documented patient signs and symptoms. 

As an example, the pneumonia criteria set involves the neurological, cardiovascular, and respiratory systems, vital signs, and laboratory and radiology values.   The presence of a pneumonia ICD-9 code (486, for example) prompts for questions from the pneumonia criteria set, as listed in Appendix B.  Each criterion is followed by response choices for the data collector to select; possible responses are presented in decreasing order of severity.  Responses for the pneumonia dyspnea question, for example, include dyspnea at rest, dyspnea on exertion, and other breathing difficulties.  The data collector selects the appropriate response based on information found in the patient chart; data collectors are trained to select the most severe response (by order of presentation).  A disease-specific criteria set exists for each group of similar ICD-9-CM codes; CSI contains over 5,500 criteria sets for specific diagnoses in five health care settings (acute care, rehabilitation, ambulatory, long term care, and hospice) with details similar to the pneumonia criteria set in Appendix B.

Each CSI criterion can be ‘answered’ separately for various time periods.  For example: ‘admission’ to the hospital or center (e.g., first 24 hours), ‘discharge’ from the hospital or center (e.g., last 24 hours), and ‘maximum’ (maximum CSI covers the full hospital or center stay, including ‘admission’ and ‘discharge’ periods).  The ‘maximum’ score reflects the most abnormal signs and symptoms regardless of when they occur during the stay.

CSI severity scores reflect the interactions of various health conditions and diseases, as derived from variables in the disease-specific criteria sets.  The CSI severity calculation engine assigns a ‘severity weight’ to each criterion response, which then contributes to a severity rating for each diagnosis for each review period.  To compute the overall severity score for a patient, the severity scores for all diagnoses are combined using disease-specific weighting rules that reflect the interaction of the diagnoses.  The overall patient severity level is scored on a continuous scale with non-negative integer values that are not subject to any preset maximum limit.  The more abnormal the signs and symptoms, the higher the score, which indicates that the patient is more severely ill.  For example, a patient with pneumonia and congestive heart failure (CHF) probably would have a higher severity score than a patient with pneumonia alone.  The congestive heart failure diagnosis does not indicate higher severity, but the signs and symptoms that determine acuteness of the disease contribute to the overall severity of illness of the patient.  If the CHF is controlled and the patient exhibits no abnormal symptoms of the disease, the diagnosis will not contribute to the overall severity score.  If, however, the patient exhibits symptoms of CHF such as shortness of breath, abnormal breath sounds, high pulse, low blood pressure, respiratory acidosis, etc., these symptoms will contribute to the overall CSI score.   Thus, to produce the overall CSI score, CSI logic takes into account the interactions of diseases that are present, their severity levels, and the clinical relationships of the diseases. 

Often a patient is ‘the sickest’ on admission and thus, the ‘admission’ and ‘maximum’ CSI scores will be the same.  However, when iatrogenic conditions develop, the ‘maximum’ CSI score becomes larger (more severe) than the ‘admission’ score; this is referred to as ‘increase in severity’.  ‘Discharge’ CSI scores typically are the lowest because patients have improved and stabilized throughout the stay. 

Advantages of the CSI approach to measure severity of illness include: disease specificity, with each criteria set based on a concise, carefully chosen set of relevant physiologic characteristics and physical findings of the particular disease rather than on a standard set of physiologic factors applied to all diseases; comprehensive scope, with over 5,500 disease-specific severity criteria sets representing all diseases for which there is an ICD-9-CM code; independence from treatments; and the ability to measure severity during specified time windows in the care process.  CSI has been validated extensively in many inpatient, ambulatory, rehabilitation, and long-term care settings since 1982.9,25-28,31-32

Patient, process, and outcome data.  PBE methodology promotes collection of study-specific patient (in addition to severity of illness), process, and outcome data elements, identified and defined by the study Team.  These elements comprise the auxiliary data module (ADM) of the study.  ADMs typically contain over 200 variables, most with date and time fields so that they can be associated with other variables in time sequence, and many have numerous data entries.  For example, data related to vital signs, weight, pain, etc., can be collected for each day of the stay, so these single variables may have as many entries as the length of stay.  The ADMs contain an extensive table of selection choices for each variable; however, data collectors are trained to add to the selection table if a response is not present.  Examples of ADMs used in PBE studies are presented in several publications.12,32,33 

Patient variables in ADMs may include age, gender, race, payer source, psychosocial risk factors, and any other patient variables that the Team thinks can affect the outcomes.

Process variables in ADMs also are defined by the Team and may include therapy intensity and specific activities/interventions from point-of-care documentation forms, oxygen use, medications during care, details of surgical interventions, nutritional interventions (e.g., diet type, tube feeding types and amounts), etc.

Outcome variables in ADMs may include discharge functioning scores, death, discharge destination (home, community, institution), and complications such as deep vein thrombosis, electrolyte imbalance, anemia, urinary tract infection, pneumonia, falls, elevated white blood cell count, etc. 

Chart review training/reliability 

Reliable data collection is essential in PBE studies.  To accomplish this, each site medical records abstractor completes a 3- or 4-day training session during which efficient and accurate collection of chart-review data is explained and practiced.  Following the training session, each data collector undergoes a rigorous manual reliability testing process to ensure complete and accurate data collection that goes beyond internal data editing features of CSI (e.g., features that prohibit entry of non-sensible values).  Reliability monitoring is conducted at several points throughout a PBE study to ensure that data abstraction accuracy is maintained throughout.  An agreement rate of 95% at the criteria level between each data collector and the Project training-team reliability person is required for each reliability test.
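As a rough illustration of the 95% criterion, criterion-level agreement between a data collector and the reliability reviewer could be tallied as sketched below.  This is only a sketch under stated assumptions: the function names and the representation of each abstraction as a dictionary keyed by criterion name are hypothetical and are not part of the CSI Software System.

```python
def agreement_rate(collector, reference):
    """Share of criteria on which the two abstractions record the same response.

    Both arguments are dicts mapping a criterion name to the selected response
    (a hypothetical layout chosen for this sketch).  A criterion answered by
    only one of the two abstractors counts as a disagreement.
    """
    criteria = set(collector) | set(reference)
    if not criteria:
        raise ValueError("no criteria to compare")
    matches = sum(1 for c in criteria if collector.get(c) == reference.get(c))
    return matches / len(criteria)


def passes_reliability(collector, reference, threshold=0.95):
    """True when criterion-level agreement meets the required rate."""
    return agreement_rate(collector, reference) >= threshold
```

With 20 criteria reviewed, one disagreement still meets the 95% requirement, whereas two disagreements (90% agreement) would not.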

Database Management

For each PBE study a comprehensive database is created that contains all point-of-care and chart review patient data.  Patients and facilities are identified by Study ID number only and are not identified directly or through linked identifiers; thus the data comply with HIPAA limited dataset requirements.  The entire database can be exported to a statistical software package, such as SAS (SAS Institute, Cary, North Carolina), for analysis.

Data Analyses

The study investigators and the trans-disciplinary Team members direct PBE analyses.  These researchers and clinicians have the fundamental knowledge and experience treating patients in the study area to know when associations are clear or whether additional explanatory variables are needed.  Clinical strengths of the Team combined with analytic experience result in clinically meaningful, statistically sound data analyses. 

Management of Missing Data and Outliers

When data are missing, adjustments are made depending on the variable and its intended use in analyses.  Sometimes values are categorized simply as “unknown” (and included in analysis as a dummy variable representing the missing category); sometimes patients with missing data are deleted from analyses; and sometimes continuous variables with missing data are collapsed into categorical data and placed with cases with missing information into a category using corroborating data.  For example, if a patient’s Body Mass Index is missing, but other weight- and height-related information exists (e.g., an order for a bariatric wheelchair), the patient may be categorized broadly as overweight or obese.  When missing data are material, one also can examine whether the patients with missing data in question are substantially different from the rest of the study group, and adjust accordingly.  Ranges for some variables are set to exclude unrealistic values and obvious outliers from the analysis.  Values beyond set ranges are considered improbable and not used in analysis. 
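The Body Mass Index example above can be sketched in a few lines.  The sketch is illustrative only: the plausible range of 10 to 80, the cut point of 25 for "overweight or obese", the category labels, and the function names are assumptions made for this example, not values taken from any PBE study.

```python
def bmi_category(bmi, bariatric_wheelchair_ordered=False, valid_range=(10.0, 80.0)):
    """Return a broad BMI category, using corroborating data when BMI is unusable."""
    low, high = valid_range
    if bmi is not None and not (low <= bmi <= high):
        bmi = None  # improbable value: treat as missing rather than analyze it
    if bmi is None:
        # Corroborating information places the patient in a broad category;
        # without it, the patient goes into an explicit "unknown" category.
        return "overweight_or_obese" if bariatric_wheelchair_ordered else "unknown"
    return "overweight_or_obese" if bmi >= 25.0 else "not_overweight"


def dummy_code(category, levels=("not_overweight", "overweight_or_obese", "unknown")):
    """One 0/1 indicator per level, so 'unknown' can enter a model as its own dummy."""
    return {level: int(category == level) for level in levels}
```

Collapsing the continuous variable in this way keeps patients with missing values in the analysis, at the cost of a coarser measure.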

Preliminary Data Analyses 

Typically, the first phase of analysis uses descriptive statistics to examine frequencies of categorical patient, treatment, and outcome measures, and average, median, quartiles, and amount of variation (standard deviation and range) for continuous measures.  Bivariate analyses are conducted to test the relationship between each candidate predictor and other predictors and outcomes.  For discrete variables, contingency tables are created and chi-square tests, Fisher’s Exact tests, or Wilcoxon tests or Kendall’s tau (for ordered categories) are used to determine significance of bivariate associations.  Also categorical analysis of variance can be used to determine the proportion of variation in outcome explained by each predictor.  For continuous variables with normal distributions, Pearson correlation, 2-sample t-tests, or analysis of variance can be used.  For continuous variables with non-normal distributions, non-parametric tests are used including Spearman correlation, Wilcoxon rank sum tests, or Kruskal-Wallis tests.  Usually a two-sided p value <0.05 is considered statistically significant. 
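For a sense of what one such bivariate test involves, the sketch below computes the Pearson chi-square statistic and its two-sided p value for a 2 x 2 contingency table (for example, a treatment given or not given versus an outcome achieved or not achieved).  It is a minimal sketch using only the Python standard library; the function name is hypothetical, no continuity correction is applied, and in practice a statistical package such as SAS would be used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail of the chi-square distribution
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

When expected cell counts are small, Fisher's Exact test is the usual substitute, as noted above.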

Analyses of Primary Outcomes 

Analyses in PBE studies include application of correlational research methods.  These are most valuable to improve clinical practice, to elucidate the circumstances/context affecting quality of care, or implementation of a known treatment process (e.g., treatment of pressure ulcers).  Correlational research designs also provide invaluable hypotheses/probable findings that would never arise from RCTs/lab research.  In general, in circumstances we define, they can provide evidence of highly probable effectiveness (level 2)–which is the usual threshold for clinical decisions.  

The most common multivariable analysis methods used in PBE studies are hierarchical and least squares regression for continuous outcomes and logistic or Cox proportional hazard regression for dichotomous outcomes.  These types of regression analyses are used to identify patient and treatment variables that are associated with better outcomes.  Hence, these regressions include patient characteristics, such as severity of illness, age, gender, race, education level, and location and severity of injury, and individual treatments and combinations of treatments.  In all multivariable analyses, a p-value of <0.05/m (Bonferroni correction: m is the number of independent variables in the model) is considered significant. 
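The Bonferroni threshold described above amounts to a simple calculation, sketched here.  The function name and the predictor names in the usage note are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each predictor whose p value is below alpha / m.

    p_values maps predictor name -> p value; m is the number of independent
    variables in the model, taken here as the number of entries.
    """
    m = len(p_values)
    if m == 0:
        return {}
    threshold = alpha / m
    return {name: p < threshold for name, p in p_values.items()}
```

For a model with three independent variables the threshold is 0.05/3, or about 0.0167, so a predictor with p = 0.02 would not be counted as significant even though it falls below 0.05.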

Using suggestions from the trans-disciplinary Team, potential predictors are allowed to enter the models.  Those that are not statistically significant are deleted sequentially from the full model.  Excluded variables can be reintroduced at various stages of model development as decided by the Team but final models usually include only statistically significant variables.  Two-way and higher order interactions can be included and tested along with non-linear transformations of variables suggested by the Team.  Regression analyses allow examination of the extent to which various process/treatment steps and facility variables are associated with outcomes, controlling for severity of illness and other patient factors. 

Analyses within subgroups of patients can clarify associations found in regression analyses using larger samples of patients.  For example, one might perform analyses within case mix groups (CMGs) to control for differences in initial injury severity or within diagnosis related groups (DRGs) to control for type of surgery and comorbidities.  A sample of 300 or more patients in a subgroup would allow up to 30 predictors in models without being over specified.  Using a 10:1 cases:variables ratio helps to avoid spurious correlations.  Because there can be many variables to use as possible predictors, variables can be grouped (e.g., patient variables as a group) and significant variables from each group can be included in final models. 

When performing patient-level regression analyses, patient characteristics are allowed to enter in order to determine the amount of variation in outcomes due to differences in patients.  Next, treatment variables are added to determine the amount of variation in outcomes due to differences in treatments delivered while controlling for patient differences.  In the next step, interaction and non-linear variables are added.  Only later are facility variables included, because if facility variables are significant, they do not tell us what to do to improve care.  We cannot send all patients to one facility.  PBE analyses first examine variation due to patient and treatment factors and their interactions, which give information about which treatments are better and for whom.  After including patient and treatment variables, including facility variables determines if there is any additional variation explained by facility variables that has not been captured already with the significant patient and treatment variables included in the models.  Often regression analyses are repeated using hierarchical models to determine if there are significant “among-site” components of variance and if any significant patient or treatment variables are lost in hierarchical models.
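The order of entry described above (patient characteristics first, then treatments) can be illustrated with a small least squares sketch that reports the cumulative R² after each block and the increment each block adds.  This is a sketch only, written with the Python standard library; the function names are hypothetical and the eight data points are invented for illustration, not drawn from any study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of a least squares fit of y on an intercept plus the given predictor columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def blockwise_r_squared(blocks, y):
    """Enter blocks in order; return (block name, cumulative R^2, increment) per block."""
    entered, previous, report = [], 0.0, []
    for name, columns in blocks:
        entered.extend(columns)
        current = r_squared(entered, y)
        report.append((name, current, current - previous))
        previous = current
    return report


# Invented illustrative data: admission score, therapy minutes per day, discharge score.
admission = [40, 55, 60, 45, 70, 50, 65, 58]
minutes = [30, 20, 45, 60, 25, 50, 35, 40]
discharge = [52, 58, 71, 66, 75, 64, 73, 69]
report = blockwise_r_squared(
    [("patient", [admission]), ("treatment", [minutes])], discharge
)
```

The increment reported for the treatment block is the additional variation explained after patient differences have been controlled; facility variables would be entered as a later block in the same way.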

Hierarchical analyses address the fact that patients are treated within facilities, which may affect the independence of observations.  Alternatively facility descriptive variables or facility dummy variables may be included in regressions.  Site effects, which could be influential as determined by hierarchical models, may already be accounted for in the detailed patient and treatment predictors.  Researchers rightfully worry that patient observations may be correlated within a setting or that treatments may be correlated within a setting; independence of observations is the basic issue.  Hierarchical analyses are conducted to be sure that significant variables remain significant and in the same direction for both ordinary and hierarchical regression.34

A PBE study collects comprehensive detailed data on all factors that may influence outcomes for a specific group of patients.  The goal is to capture variables at the patient level that may differ across sites.  As a result, very detailed patient-level data about severity of illness, levels of impairment, and many other patient factors are collected, as are details about all interventions, including date and time, defined by the Team.  Hence, any differences in patients and treatments among the participating sites are likely to be captured in the detailed patient and intervention data used in PBE study analyses.  Using this level of detail helps to make observations within facilities less correlated in regression analyses.

Past analyses of PBE databases routinely have included both hierarchical and non-hierarchical multivariable analyses to predict outcomes, but we have not found differences in the significant factors identified by the two approaches.  The absence of a difference may be due to the detailed manner in which PBE data account for patient differences, including physiologic severity of illness information, and treatment differences at the level of detail of each treatment performed, with both time and date recorded. 

Regression coefficients and odds ratios on the independent variables are used to quantify the magnitudes and directions of effects of each predictor variable on outcomes.  Before analysis is started, pairwise correlations explore associations between independent variables (collinearity), and one of each pair of highly correlated independent variables (r > 0.75) is deleted.
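A pairwise screen of this kind might look like the following sketch.  Two choices here are assumptions made for the example and are not specified in the text: the screen compares the absolute value of r with the cutoff, and it keeps whichever variable of a correlated pair is listed first, whereas in a PBE study the Team would decide which member of a pair to retain.  The function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(variables, cutoff=0.75):
    """Return variable names in listed order, skipping any variable whose absolute
    correlation with an already kept variable exceeds the cutoff."""
    kept = []
    for name in variables:
        if all(abs(pearson(variables[name], variables[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept
```

For instance, if age in years and age in months were both offered as predictors, the screen would keep the first and drop the second.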

For logistic regressions, discrimination can be assessed using the area under the receiver operating characteristic curve (c) to evaluate how well the model distinguishes patients who did not achieve a specified outcome from patients who did achieve the specified outcome.  Values of c that are closer to 1 indicate better discrimination.35  In addition, the Hosmer-Lemeshow goodness-of-fit test can be used to evaluate the degree of correspondence between patients’ estimated probabilities of developing the specified outcome and the actual development of the specified outcome over groups spanning the entire range of probabilities (calibration).  Hosmer-Lemeshow p values that are closer to 1 indicate better fit.  R² can be used to evaluate the proportion of variation in continuous outcomes that is explained by the model.  R² values closer to 1 indicate better models.
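The c statistic has a direct interpretation that a short sketch makes concrete: among all pairs consisting of one patient who achieved the outcome and one who did not, it is the fraction in which the model assigned the higher estimated probability to the patient who achieved it, with ties counted as one half.  The function name below is hypothetical, and the pairwise loop is written for clarity rather than speed.

```python
def c_statistic(probabilities, outcomes):
    """Area under the ROC curve computed as pairwise concordance.

    probabilities: model-estimated probability of the outcome for each patient.
    outcomes: 1 if the patient achieved the outcome, 0 otherwise.
    """
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one patient with and one without the outcome")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0   # concordant pair
            elif pe == pn:
                score += 0.5   # tied pair counts one half
    return score / (len(events) * len(non_events))
```

A value of 0.5 corresponds to discrimination no better than chance, and a value of 1 to perfect separation of the two groups.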

Artifactual relationships are always possible in regression analyses.  However, in PBE methodology, analyses are not performed by including all possible variables and seeing what is significant.  Instead, the trans-disciplinary Team leads clinical analyses using theory, research evidence to date, existing guidelines, and real-world clinical experience.  Although many findings are not surprising, some significant findings may be surprising and these findings lead to more detailed analyses.  Various types of sensitivity analyses can be performed by including additional possible confounders, examining subsets of variables and patients with specific characteristics, and looking at multiple different slices of the data in order to determine if the surprising findings persist.  After exhausting all suggestions from clinicians as to what might explain the surprising associations, and if the findings persist, then providers and researchers feel more confident that the relationship is not an artifact.  Of course, significant patient characteristics or interventions are found only if some patients have them.  And clinicians who use the surprising significant interventions can speak to their effectiveness from personal experience.

In the PBE methodology not all possible associations can be articulated at the onset.  The PBE process depends on the ability to define identified outcome measures and control for possible covariates in order to identify best treatments.  While the investigation is governed by the proposed study’s broad hypotheses, PBE is also a discovery process based on post hoc analyses suggested by clinical professionals with fundamental knowledge of patient and treatment issues.  Data collection questions and analyses are processed regularly with the Team.  All analyses are discussed until the Team is satisfied that study questions have been addressed fully and findings are based on the most valid interpretation of the data. 

Certainty of conclusions (causality) from PBE analyses may be less rigorous than that of good RCTs, but much better than that often available for guiding clinical decisions.  Or conclusions can be very strong, if one takes into account the fact that the inference is based on both the joint probability of pre-existing knowledge and the correlational results.

PBE is an innovative approach to understand the impact of specific interventions on outcomes in routine clinical care.  PBE uses both existing research findings and practicing clinicians’ expertise to define the elements and analyze the data to capture the complexity of the care process.  Preliminary findings from previous PBE studies show quite clearly that PBE methodology can succeed in opening routine practice to scientific inquiry.

Due to the central role played by the Project Team in all aspects of PBE, this approach can be characterized as a form of “participatory action research”―a bottom-up approach that values the participation of those actually engaged in the care-providing process and garners their participation in implementing study findings.  PBE encourages new findings, even those that challenge conventional wisdom and long-standing practice.

Using a severity system, such as the Comprehensive Severity Index, enables going beyond controlling only for study disease severity: it allows control for many complex comorbidities common to patients (particularly elderly patients), reflecting more accurately the realities of clinical practice.  The strength of CSI’s mechanism for compensating or adjusting for differences among patients allows for a more powerful assessment of the effectiveness of therapeutic interventions.  CSI uses specific, disease-oriented questions to produce a highly sensitive measure of severity that cannot be produced by using diagnosis and/or procedure codes alone or a limited, fixed set of physiologic criteria no matter what the underlying diagnoses may be.  Diagnosis codes indicate existence of disease; they do not indicate extent or severity of disease. 

Limitations

PBE methodology relies on the expertise of participating facility clinicians to guide the development of high-level study hypotheses and identify critical data elements to study.  As such, these clinicians are aware of study data elements as they provide care and complete point-of-care intervention documentation forms or perform routine documentation practices.  This could be construed as introducing treatment or observational bias.  However, the number of clinicians who participate in the development of study instruments is a very small subset of all clinicians who care for patients in study facilities.  Intervention documentation forms and project hypotheses are designed to capture descriptions of actual practice, not alter practice patterns.  In addition, the novelty of attention to specific study questions would wane over the course of an extended patient enrollment period. 

As much as supplemental point-of-care intervention documentation forms provide an unprecedented level of detail about interventions, they also have limitations.  Add-on documentation to traditional site practices increases the documentation burden of front-line staff and allotted documentation time may not be sufficient to ensure complete documentation of both.  Intervention documentation form training usually is conducted via a train-the-trainer approach using a lead clinician in each discipline in each study site.  Thus, the training of the majority of clinicians is dependent on the expertise and time availability of the site trainers.  It is possible to have the same training team visit each study site to conduct training for point-of-care documentation for all clinicians.  With sufficient funding, such standard training is preferable.  Usually monitoring of documentation accuracy is an obligation of each study site.  If it is not done well, inaccurate data are likely to be noisy and would bias against finding significant treatment effects.

A physiologic severity indexing system, such as CSI, is limited by data availability.  Credentialed coding personnel at each facility assign ICD-9-CM codes as part of standard operating procedures; it is these codes that usually determine reimbursement.  A smaller number of ICD-9-CM codes may result in lower severity of illness scores when using a system that is built upon ICD-9-CM coding.  If laboratory tests are not ordered, findings are not clearly reported, or complications are not documented, the severity or incidence rate for the related conditions will be less.  The incidence and type of test ordering and availability of information may not be uniform across sites and could account for a portion of the site variability in CSI scores. 

One great concern about observational studies is that the relation between an intervention and an outcome may be confounded by other variables.  Usually confounders are controlled for through study design or statistical analyses.  Regression is a powerful tool to control for confounders, but many independent variables may be required.  With many independent variables another concern is over-specification (i.e., when the regression model has too many independent variables relative to the size of the study group).  Having a Team that raises many possible confounding variables, and being careful with statistical methods, help to overcome these limitations.  Despite these limitations, having micro-level data provides the ability to focus on the individual patient level to explore reasons for findings and discover many important associations between treatments and outcomes.

In summary, PBE methodology creates a comprehensive database to assess the importance of such patient variables as gender, race, severity of illness, baseline level of functioning, and various therapy interventions on patient-centered outcomes.  The data describe the duration, intensity, and components of treatment regimens.  PBE studies allow discovery of treatment practices that are associated with better outcomes for patients with various levels of illness or impairment.  These include findings about surgical approaches, medications, physical therapy, occupational therapy, speech and language therapy, timing of treatments, nutritional support, etc., that are implementable in routine practice and have been found to be associated with better outcomes as predicted by PBE models.

 

CONCLUSIONS

We need more evidence―evidence that is reliable, strong, and generalizable to real world scenarios.  Simplistic beliefs in perfect rigor provided by large RCTs stand in the way of improving evidence for most clinical problems.   In addition to RCTs, strong alternative designs should be employed more often, and there are circumstances where correlational designs provide the best evidence given practical constraints or the nature of the questions at issue.  Alternative research designs can provide reliable evidence (level 1 or near level 1 where there are no plausible alternatives) for major and new treatments and complex system-level interventions.  There is also a need for research methodologies that provide reliable, good (level 2) evidence using correlational modeling with very good covariance matching analyses.  In certain circumstances, these studies provide the best information.  Weaker (level 3) studies also may provide needed useful information in other circumstances.  It is time to incorporate sophisticated research design considerations into evidence grading methods by distinguishing circumstances in which alternative study designs are strong or optimal from circumstances where RCTs are designed to provide the best evidence.

It is incumbent upon investigators, if using something other than an RCT, to demonstrate why it is a better approach and that it yields good statistically robust answers.  The goal of the Conference and this Primer was to expand the study design toolbox and help investigators decide the most appropriate design to use.  Clinicians and patients have to make decisions every day whether they have information or not.  We need to figure out the most expeditious ways of providing the best available information even if it is not definitive, when it is needed at the point of care in a way that is understandable.  We have to do this knowing that evidence is dynamic, and we should expect it to change and revisit it on a regular basis.

Increased use of sophisticated research designs and statistical methods can greatly increase the speed with which reliable information is obtained to improve knowledge of the effectiveness of interventions in clinical practice.  Hundreds of millions of dollars are spent nationwide to incorporate better physical and biological tools into research, e.g., MRI, genomic, and proteomic technology, but research designs and statistical tools are also critical to making a difference in practice.  The improved research designs and evidence evaluation sketched in this Primer can speed the progress of translational research and knowledge of what works best in actual practice, as well as discover new factors that are associated with improved outcomes in practice.  The quality, effectiveness, and value of health care in practice can achieve stunning gains.  The tools exist; they simply need to be used, because experimental designs such as RCTs are rarely feasible for evaluating complex interventions in the real world.

Acknowledgment.  It is a pleasure to thank the people who made this Conference a success.  Dr. Peter Buerhaus and his executive assistant, Brenda Cornett-Compton, created the proposal and skillfully handled many of the meeting details.  Dr. Gerben DeJong was a skillful moderator as well as a Workgroup Leader and on the Conference Planning Committee. Mark Johnston, PhD, helped create the outline for the Primer.  Linsey BenAmi, MPH, and Randy Smout, MS helped with editing the Primer.  The planning committee designed the conference agenda to make it as understandable as possible, and the presenters and workgroup leaders made an exceptional effort to implement the agenda: Carolyn Clancy, MD [Speaker], Steven Tingus, MS, C.Phil [Speaker], Kelly Cronin, MPH [Speaker], Scott Gottlieb, MD [Speaker], Andrew Kramer MD [Speaker], Sharon-Lise Normand, PhD [Speaker], Arlene Ash, PhD [Conf Plan Com], Alan B. Cohen, ScD [Conf Plan Com], John Corrigan, PhD [Conf Plan Com], Marcel Dijkers, PhD [Workgroup Leader, Conf Plan Com], Alan M. Jette, PhD, PT [Conf Plan Com], Arthur Hartz, MD, PhD [Workgroup Leader, Conf Plan Com], David Helms  [Speaker, Conf Plan Com], John Melvin, MD  [Conf Plan Com], Robert Rhodes, MD, FACS [Speaker, Conf Plan Com], Mary Stuart, ScD [Conf Plan Com], Ruth Brannon [Workgroup Leader, Conf Plan Com], Michael Weinrich [Speaker, Conf Plan Com], Nancy Bergstrom, PhD, RN [Workgroup Leader].


References for Primer

 

1.      Berwick DM, The John Eisenberg Lecture: Health Services Research as a Citizen in Improvement. Health Services Research 40:2 (April 2005):317-336.

2.      Institute of Medicine (U.S.), Committee on Quality of Health Care in America.  Crossing the Quality Chasm: a new health system for the 21st century.  March 2001. 

3.      Horn SD, Gassaway J. Practice-Based Evidence Study Design for Comparative Effectiveness Research. Medical Care 2007;45:10 (October Supplement 2).

4.      Guller U, Hervey S, Purves H, Muhlbaier LH, Peterson ED, Eubanks S et al. Laparoscopic versus open appendectomy: outcomes comparison based on a large administrative database. Ann Surg 2004; 239(1):43-52.

5.      Wald H, Epstein A, Kramer A. Extended use of indwelling urinary catheters in postoperative hip fracture patients. Med Care 2005; 43(10):1009-1017.

6.      Campbell DT, Stanley JC. Experimental and Quasi-experimental Designs for Research. Chicago: Rand McNally, 1966.

7.      D’Agostino RB Jr.  Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group.  Stat Med 1998 Oct 15;17(19):2265-81.

8.      Horn SD, DeJong G, Ryser D, Veazie P, Teraoka J. Another Look at Observational Studies in Rehabilitation Research:  Going Beyond the Holy Grail of the Randomized Controlled Trial. Arch Phys Med Rehabil 2005;86(12 Supplement 2):S8-S15.

9.      Horn SD, Editor. Clinical Practice Improvement Methodology: Implementation and Evaluation.  Faulkner & Gray, New York, New York, 1997.

10.  Horn SD, DeJong G, Smout R, Gassaway J, James R, Conroy B. Stroke Rehabilitation Patients, Practice, and Outcomes: Is Earlier and More Aggressive Therapy Better? Arch Phys Med Rehabil 2005;86(12 Supplement 2):S101-S114.

11.  DeJong G, Horn SD, Conroy B, Nichols D, Healton E. Opening the Black Box of Poststroke Rehabilitation: Stroke Rehabilitation Patients, Processes, and Outcomes. Arch Phys Med Rehabil 2005;86(12 Supplement 2):S1-S7.

12.  Gassaway J, Horn SD, DeJong G, Smout R, Clark C, James R. Applying the Clinical Practice Improvement Approach to Stroke Rehabilitation:  Methods Used and Baseline Results. Arch Phys Med Rehabil 2005;86(12 Supplement 2):S16-S33.

13.  Jette AM. The Post-Stroke Rehabilitation Outcomes Project. Arch Phys Med Rehabil 2005;86(12 Suppl 2):S124-5.

14.  Ottenbacher KJ. The Post-Stroke Rehabilitation Outcomes Project. Arch Phys Med Rehabil 2005;86(12 Suppl 2):S121-3.

15.  DeJong G, Horn SD, Smout RJ, Gassaway J, James R.  The Post-stroke Rehabilitation Outcomes Project revisited. Arch Phys Med Rehabil 2006;87(April,4):595-597.

16.  Neumayer LA, Smout RJ, Horn HGS, Horn SD.  Early and Sufficient Feeding Reduces Length of Stay and Charges in Surgical Patients.  Journal of Surgical Research 2001;95(1):73-77.

17.  Horn SD, Wright HL, Couperus JJ, Rhodes RS, Smout RJ, Roberts KA, Linares AP.  Association Between Patient Controlled Analgesia Pump Use and Post-Operative Surgical Site Infection in Intestinal Surgery Patients.  Surgical Infections 2002;3(2):109-118.  Abstracted in Year Book of Surgery, 2003.

18.  Horn SD, Smout RJ. Effect of prematurity on respiratory syncytial virus hospital resource use and outcomes.  J Pediatrics 2003;143 (5 Suppl): S133-141.

19.  Blonde L, Ginsberg BH, Horn SD, et al.  Frequency of Blood Glucose Monitoring in Relation to Glycemic Control in Patients with Type 2 Diabetes.  Diabetes Care 25:1 (January 2002):245-246.

20.  Horn SD, Bender SA, Ferguson ML, Smout RJ, Bergstrom N, Taler G, Cook AS, Sharkey SS, Voss AC. The National Pressure Ulcer Long-term Care Study (NPULS): Pressure ulcer development in long-term care residents. J. American Geriatrics Society 2004 March;52(3):359-367.

21.  Horn SD, Sharkey PD, Tracy DM, Horn CE, James B, Goodwin F.  Intended and Unintended Consequences of HMO Cost-Containment Strategies: Results from the Managed Care Outcomes Project.  The American Journal of Managed Care 1996;2(3):253-264.

22.  Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med 2000;342:1878-86.

23.  Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med 2000;342:1887-92.

24.  Ioannidis JP, Haidich AB, Pappa M, et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA 2001;286:821-30.

25.  Averill RF, McGuire TE, Manning BE, Fowler DA, Horn SD, Dickson PS, et al.  A study of the relationship between severity of illness and hospital cost in New Jersey hospitals.  Health Services Research 27(5): 587-617, 1992.

26.  Horn SD, Torres A Jr, Willson D, Dean JM, Gassaway J, Smout R.  Development of a Pediatric Age- and Disease-Specific Severity Measure.  J Pediatr 141:4 (2002): 496-503.

27.  Horn SD, Sharkey PD, Buckle JM, Backofen JE, Averill RF, Horn RA.  The relationship between severity of illness and hospital length of stay and mortality.  Medical Care 29:305-317, 1991.

28.  Willson DF, Horn SD, Smout RJ, Gassaway J, Torres A.  Severity Assessment in Children Hospitalized with Bronchiolitis Using the Pediatric Component of the Comprehensive Severity Index (CSI®).  Pediatric Critical Care Medicine 1(2): 127-132, 2000.

29.  Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences, Second edition, Lawrence Erlbaum Associates, Inc, Publishers; Hillsdale, NJ

30.  Bode R, Heinemann A, Semik P, Mallinson T.  Patterns of Therapy Activities Across Length of Stay and Impairment Levels: Peering Inside the “Black Box” of Inpatient Stroke Rehabilitation.  Arch Phys Med Rehabil 2004;85:1901-1908.

31.  Horn SD, Sharkey PD, Gassaway J.  Managed Care Outcomes Project: Study Design, Baseline Patient Characteristics, and Outcome Measures.  The American Journal of Managed Care 1996;2(3):237-247.

32.  Horn SD, Bender SA, Bergstrom N, Cook AS, Ferguson ML, Rimmasch HL, Sharkey SS, Smout RJ, Taler G, Voss AC. Description of the National Pressure Ulcer Long-Term Care Study (NPULS).   J. American Geriatrics Society 2002;50:1816-1825.

33.  Connor SR, Horn SD, Smout RJ, Gassaway JV. The National Hospice Outcomes Project (NHOP): Development and Implementation of a Multi-Site Hospice Outcomes Study. J Pain Symptom Manage 2005 March;29(3):286-296.

34.  Raudenbush SW, Bryk AS. 2002. Hierarchical Linear Models: Applications and Data Analysis Methods Second edition. Sage Publications. Thousand Oaks, CA.

35.  Hosmer DW, Lemeshow S. 1989. Applied Logistic Regression. John Wiley and Sons, New York, NY.

 


Appendix A. Definition of terms available upon request (pdf file download)

Appendix B. CSI Criteria Set for Pneumonia (pdf file download)