Special Feature
Skills For Reading Negative Clinical Trials
By Gordon D. Rubenfeld, MD, MSc
Careful readers of the critical care medical literature must, unfortunately, learn an important skill: the interpretation of the negative randomized controlled trial (RCT). Ibuprofen, steroids, pneumonia prophylaxis, and gastric ulcer prophylaxis have each taken a turn in the intensive care batter's box and struck out at improving mortality. There are many reasons why an effective treatment can look ineffective in a randomized trial. Sometimes investigators enroll too few patients, so the results do not reach statistical significance. Researchers may enroll patients too late in their illness to respond to the treatment. Of course, the study hypothesis can be wrong, and the treatment may simply not work. In this essay, I'd like to provide readers with three new reading skills to add to their toolbox for reading negative clinical trials: getting in touch with your "inner bias," understanding the measurements of effect, and resisting the lure of Table 1.
Getting in touch with your inner bias
If nothing else, modern literary criticism has taught us that no reader is objective. This is true whether you're reading Moby Dick or JAMA. We all bring a set of clinical biases, anecdotes, experiences, and preconceptions to a journal article. Ideally, we should read the entire article blinded to its major results and only then decide whether the study is believable. What frequently happens instead is that we peek at the results and modify our level of criticism based on our level of agreement. We worry that a placebo-group mortality of X% means the patients were too sick or too well, or that the care they received in the study hospitals was better or worse than they would have received elsewhere, and we let those concerns strengthen or undermine the study's conclusions.
Although the people who designed the study almost certainly tried to identify ideal candidates and treatment for a positive study, the temptation to play Monday morning quarterback after a negative study is overwhelming.
There is no real cure for your inner bias. The best approach is to develop a strict routine for evaluating articles that will allow you to approach them consistently.
Measurements of effect: Ratios and differences
Somewhere in every RCT is an estimate of the treatment effect. One way to present the treatment effect is the percent reduction in mortality; headlines say, "New treatment for heart attack reduces mortality by 25%." Another common way is the relative risk: the ratio of mortality in those treated to mortality in those not treated. For example, if the placebo mortality is 40% and the treated group mortality is 37%, the relative risk for treatment is 37/40 = 0.92, meaning treatment is associated with 92% of the mortality risk of no treatment. Finally, and rarely presented, is the risk difference: simply the difference between the mortality rates in the two groups, here 40% - 37% = 3%.
What is interesting about these different measures is how dependent they are on the baseline mortality. The Table shows actual data from two different studies: thrombolytic therapy for myocardial infarction and a recently published study of ibuprofen for sepsis. Let's assume the results from both studies were statistically significant (the ibuprofen study's were not), and you are faced with deciding whether the treatment effects are clinically significant. Thrombolytic therapy is associated with a 25% reduction in mortality, compared to 7.5% for ibuprofen, and a relative risk of 0.75 compared with 0.92. It sure looks as if the effect of thrombolytics in myocardial infarction is much larger than that of ibuprofen in sepsis. Is it? If you treat 100 myocardial infarction patients with thrombolytic therapy, on average you will save three lives. If you treat 100 sepsis patients with ibuprofen (again, remember these results were not statistically significant, so this example is only for illustration), you will also save three lives. The number of lives saved is the same, but all of the other measures of "effect" are different. This is simply a result of the fact that at the lower mortality rates seen in myocardial infarction, the same number of lives saved has a bigger effect on the ratio measures of mortality. So, remember to compare the treatment effect estimate from an RCT with some ruler. One useful rule of thumb is that many cardiology trials show a risk difference of about 3-5 lives saved per 100 treated. Evidence-based medicine articles invert this and talk about the "number needed to treat" to save a single life. In this example, thrombolytic therapy for myocardial infarction and ibuprofen for sepsis have identical numbers needed to treat: 1 ÷ 0.03, or about 33.
Table
Data from two clinical trials
|                                | Thrombolytic therapy in myocardial infarction | Ibuprofen in sepsis           |
| Mortality, placebo             | 12%                                           | 40%                           |
| Mortality, treatment           | 9%                                            | 37%                           |
| Percent reduction in mortality | [(12 - 9) ÷ 12] × 100 = 25%                   | [(40 - 37) ÷ 40] × 100 = 7.5% |
| Relative risk                  | 9/12 = 0.75                                   | 37/40 = 0.92                  |
| Risk difference                | 12% - 9% = 3%                                 | 40% - 37% = 3%                |
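To make the arithmetic in the Table explicit, here is a minimal Python sketch (my own, not part of the original article) that computes each effect measure from a pair of mortality rates; the function name and rounding choices are illustrative assumptions.

```python
def effect_measures(placebo_mortality, treated_mortality):
    """Compute common ways of expressing a treatment effect on mortality.

    Rates are proportions (e.g., 0.12 for 12%).
    """
    relative_risk = treated_mortality / placebo_mortality
    percent_reduction = (1 - relative_risk) * 100             # relative risk reduction
    risk_difference = placebo_mortality - treated_mortality   # absolute risk reduction
    number_needed_to_treat = 1 / risk_difference               # patients treated per life saved
    return {
        "relative risk": round(relative_risk, 2),
        "percent reduction in mortality": round(percent_reduction, 1),
        "risk difference": round(risk_difference, 2),
        "number needed to treat": round(number_needed_to_treat),
    }

# Thrombolytic therapy in myocardial infarction: 12% vs. 9% mortality
print(effect_measures(0.12, 0.09))
# relative risk ~0.75, reduction ~25%, risk difference ~0.03, NNT ~33

# Ibuprofen in sepsis: 40% vs. 37% mortality (difference not statistically significant)
print(effect_measures(0.40, 0.37))
# relative risk ~0.92, reduction ~7.5%, risk difference ~0.03, NNT ~33
```

The ratio measures differ sharply between the two trials, but the risk difference and the number needed to treat are the same, which is exactly the point made above.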
Resisting the lure of Table 1
You are reviewing the results of another negative RCT of yet another monoclonal antibody to treat septic shock. The placebo group had a mortality of 30% and the treatment group had a mortality of 25% (P = 0.3), but you wonder whether the difference in mortality was obscured because the treatment group had sicker patients. To answer your question, you want to see the information frequently presented in Table 1 of RCTs: the comparison of clinical features of the treated and control groups. Before you do, consider the following: all differences in Table 1 variables must be the result of chance. Why is this important? It is a common misconception that the purpose of randomization is to create identical study groups. Randomization usually does lead to roughly similar groups, but not always. Just as a well-shuffled deck of cards sometimes deals a royal flush, randomized groups of patients are sometimes very different. Proper randomization is a function of the process of randomization and has not "failed" because the groups turn out to be (randomly) different.
Luckily, we have tools to tell us just what effect random differences in Table 1 variables might have on the study findings. If the treatment has no effect (and the study was properly blinded, along with a few other caveats that have nothing to do with randomization), any observed difference in mortality and in the Table 1 variables must be the result of chance. If the drug has no effect, what is the chance that the observed mortality reduction arose from chance differences between the groups? Of course, this is the P value reported for the RCT: 0.3, or 30%. Without looking at the breakdown in Table 1 or anything else, you can conclude that an ineffective drug plus random differences between the two groups would be a very reasonable explanation of the observed data.
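As a rough illustration of that reasoning, the simulation below (my own sketch; the sample size of 200 patients per arm and the 27.5% underlying mortality are assumptions, since the hypothetical trial does not state them) repeatedly "runs" a trial of a completely ineffective drug and counts how often chance alone produces a mortality gap at least as large as the observed 5 percentage points.

```python
import random

def simulate_null_trials(n_per_arm=200, true_mortality=0.275,
                         observed_gap=0.05, n_trials=20_000, seed=1):
    """Estimate how often an ineffective drug 'wins' by a given margin purely by chance."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_trials):
        placebo_deaths = sum(rng.random() < true_mortality for _ in range(n_per_arm))
        treated_deaths = sum(rng.random() < true_mortality for _ in range(n_per_arm))
        gap = (placebo_deaths - treated_deaths) / n_per_arm
        if abs(gap) >= observed_gap:   # two-sided: a gap this large in either direction
            at_least_as_extreme += 1
    return at_least_as_extreme / n_trials

# With roughly 200 patients per arm, a 30% vs. 25% mortality "difference" arises
# by chance alone in very roughly a quarter of null trials -- the same order of
# magnitude as the reported P value of 0.3.
print(simulate_null_trials())
```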
But these statistical arguments aren't very compelling, and you insist on looking at Table 1. There, you see that the treatment group had patients who were, on average, 30 years older than the placebo group. Now what? If the chance of obtaining this study's results given an ineffective drug and random differences between the groups was 30%, what is it now that you know that random differences actually occurred?
Here's the problem-and it's a problem with statistics, not with RCTs or study design. There is no secret provision that treatment and control groups must be similar for the P value that compares the mortality in the two groups to be valid. What we want to know, and we scrutinize Table 1 to try to find out, is whether this trial is one of the false-negative studies created by chance. We look at differences in Table 1 and worry that the unfavorable randomization might provide some evidence of a problem.
The question is how to quantify or express this feeling of unease. Different readers with different biases will derive very different conclusions from the imbalances in Table 1. Those who strongly believe in the efficacy of the treatment are likely to conclude that the differences in Table 1 invalidate the trial, and those who think the drug is ineffective will dismiss the imbalances in Table 1. Regardless of Table 1, anyone's best quantitative guess of the probability of obtaining the study results if the treatment were ineffective and outcome differences were because of random imbalances in Table 1, is still the P value. In this example, that means that the trial is negative.
To summarize, while differences in Table 1 certainly make us anxious that random imbalances are a more likely explanation for the data than the P value suggests, there is no accepted, unbiased method of incorporating this anxiety into an updated estimate of the probability that the results are or are not due to chance.
Options for dealing with Table 1 imbalances
• Do nothing and take differences in Table 1 with a large grain of salt. Realizing that Table 1 can be used to justify a range of subjective conclusions about the data, some authors have advocated presenting the data for Table 1 without a treatment/control breakdown. This lets readers know who the patients in the study were without tempting them to arbitrarily "adjust" the results. Since most people are unlikely to believe a single RCT, you can wait for the next "toss" of the randomization. If the next trial is consistent, it is unlikely that both results are due to random imbalances.
• Stratified randomization. This technique is like statistical insurance. By randomizing patients separately within strata defined by age, severity of illness, or other Table 1 variables, it ensures that the treatment and control groups will be closely balanced on the stratified variables (a minimal sketch of one common approach follows this list).
• Adjusted analysis. An adjusted analysis should be performed even if there do not appear to be imbalances in Table 1. These are powerful techniques that can adjust for even extreme Table 1 imbalances and may improve the study's power. Methods such as multivariate logistic regression or Cox survival analysis present a treatment effect that is unencumbered by differences in the Table 1 variables. It is interesting to note, however, that even in trials with grossly "unlucky" imbalances in randomization, adjustment rarely changes the results by much.
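The sketch below (mine, not the article's) shows one common way stratified randomization is implemented in practice: permuted blocks within strata. The age and severity cutoffs, block size, and variable names are arbitrary choices for the example.

```python
import random

def stratified_block_randomization(patients, block_size=4, seed=7):
    """Assign treatment/control within each stratum using permuted blocks.

    `patients` is a list of dicts carrying the stratification variables.
    Within every stratum, each completed block contains equal numbers of
    treatment and control assignments, so the arms stay closely balanced
    on the stratified variables.
    """
    rng = random.Random(seed)
    pending = {}   # stratum -> remaining assignments in the current block
    allocated = []
    for patient in patients:
        stratum = (patient["age"] >= 65, patient["apache"] >= 20)  # example strata
        if not pending.get(stratum):                # start a fresh shuffled block
            block = ["treatment", "control"] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        allocated.append((patient["id"], stratum, pending[stratum].pop()))
    return allocated

patients = [{"id": i, "age": random.randint(30, 90), "apache": random.randint(5, 35)}
            for i in range(12)]
for pid, stratum, arm in stratified_block_randomization(patients):
    print(pid, stratum, arm)
```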
Problems with adjusted analyses
Discordant results. It is reassuring when adjusted and unadjusted analyses of the data yield similar results, but it isn't clear what to do when the adjusted analysis is "positive" and the unadjusted is "negative" or vice versa.
Dredging for covariates. It is possible to pick and choose which variables go into the adjusted analysis. By selecting specific variables and not others, unscrupulous authors can turn a negative study into a positive adjusted analysis. To avoid this, the variables for the adjusted analysis should be specified in advance, without any peeking at their effect on the treatment estimate.
Simplicity. Presenting the results of a simple analysis of an RCT in a 2 × 2 table has a certain blunt simplicity. Compare "In a multivariate logistic regression model controlling for age, APACHE score, presence of ARDS, and four other prognostic variables, the adjusted odds ratio for the treatment effect was 0.62 (95% CI 0.3-0.9). Given the mortality rate in this study, the adjusted odds ratio is likely to overestimate the actual risk reduction." to "The mortality in the placebo group was 40%, the mortality in the treated group was 25%, and the relative risk was 0.62 (P = 0.04)."
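For readers curious what such an adjusted analysis looks like in code, here is a minimal sketch (my own, using simulated data and statsmodels' logistic regression, not the analysis from any trial discussed above). It simulates an ineffective drug whose treated arm happens, by chance, to be older, then fits an unadjusted model and a model adjusted for age so the two treatment-effect estimates can be compared.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400

# Simulated trial: the drug does nothing, but the treated arm is older by chance.
treatment = rng.integers(0, 2, n)
age = rng.normal(60 + 10 * treatment, 12, n)     # chance imbalance in age
logit_death = -4 + 0.05 * age                     # mortality depends on age only
death = rng.random(n) < 1 / (1 + np.exp(-logit_death))
df = pd.DataFrame({"death": death.astype(int), "treatment": treatment, "age": age})

# Unadjusted analysis: the treatment estimate is contaminated by the age imbalance.
unadjusted = smf.logit("death ~ treatment", data=df).fit(disp=False)
# Adjusted analysis: the age imbalance is accounted for by the covariate.
adjusted = smf.logit("death ~ treatment + age", data=df).fit(disp=False)

print("unadjusted odds ratio:", np.exp(unadjusted.params["treatment"]).round(2))
print("adjusted odds ratio:  ", np.exp(adjusted.params["treatment"]).round(2))
```

In this contrived example the adjusted odds ratio moves back toward 1.0, which is what adjustment is supposed to accomplish; as noted above, in real trials the adjusted and unadjusted estimates usually differ little.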
Conclusion
Reading and understanding negative RCTs is an important task for critical care clinicians. There are many possible reasons why an effective drug might appear ineffective in an RCT. The well-prepared reader acknowledges his or her own clinical biases when critiquing a paper, understands that the size of the treatment effect depends on how it is presented, and approaches Table 1 with a grain of salt.
Recommended sources for further reading:
Begg CB. Suspended judgment. Significance tests of covariate imbalance in clinical trials. Control Clin Trials 1990;11:223-225.
Senn SJ. Covariate imbalance and random allocation in clinical trials. Stat Med 1989;8:467-475.
Enas GG, et al. Baseline comparability in clinical trials: Prevention of "poststudy anxiety." Drug Information Journal 1990;24:541-548.