Turn 'So what?' into 'Oh, really!'
By Patrice Spath, ART
Consultant in Health Care Quality and
Resource Management
Forest Grove, OR
[Editor's note: This is the second article in a two-part series on helping quality professionals answer the "So what" question when they present performance data to administration and providers. In the January issue, Spath pointed out the flaws with some performance measures and showed how to put teeth into them. In this issue, she explains how to turn standard comparative data into high-powered information by using inferential tests. Providers and managers who receive this kind of information from their quality department will be better equipped to make changes that will influence the organization's ability to compete in the health care marketplace.]
Answering the "So What" question may require comparative data and statistical evaluation techniques. Ideally, the performance measurement program should include a comparative component. What are the outcomes achieved by individual caregivers in comparison to their peers? How is the organization as a whole doing in relation to other similar organizations?
Comparative data can be illustrated in a table such as Figure 1 (see p. 35). To minimize the impact of patient severity on this length of stay data, only patients considered to be low risk for operative mortality are included in the sample. The Society of Thoracic Surgeons' National Cardiac Surgery Database Risk Stratification system was used to calculate patients' severity score.
Statistical tests of significance, called inferential tests, can add even more information to the comparative length of stay data in the first three columns of Figure 1. Inferential tests are useful when comparing outcomes of a particular patient population to reference values and subgroups in the population. They can also be used to correlate treatment factors to outcomes in the target population.
Using the results of inferential tests, caregivers can determine whether the differences observed among providers are likely due to chance alone. Was it merely by chance that the patients of physician #200 stayed in the hospital longer than similar patients treated by other physicians? Even if there is no true relationship between this outlier physician's treatment practices and a longer length of stay, an apparent association can still show up simply because of the particular cases included in the study.
Inferential tests are often reported in terms of a level of significance -- the probability (P) that a difference as large as the one observed would occur by chance alone, given the size of the study population. This is usually stated as P<0.10, P<0.05 or P<0.01. The P value measures surprise: the smaller the P value, the more surprising the outlier result would be if physician #200's patients were truly similar to those of the other physicians.
When the P value is between 0.05 and 0.01, the result is usually called "statistically significant"; when it is less than 0.01, the result is often called "highly statistically significant." Even when results are statistically significant, caregivers should ask if the results are also clinically significant -- strong enough to justify doing something different. Clinical significance is a subjective judgement based on peer discussions.
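To illustrate how a test statistic translates into these significance categories, the minimal sketch below converts a z statistic into a two-tailed P value using the standard normal distribution and then applies the thresholds just described. The function names are illustrative, and the normal reference distribution is an assumption made for the example only.

```python
import math

def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed P value for a z statistic under a standard normal curve."""
    # Standard normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

def describe_significance(p: float) -> str:
    """Label a P value using the thresholds discussed in the article."""
    if p < 0.01:
        return "highly statistically significant"
    if p < 0.05:
        return "statistically significant"
    return "not statistically significant"

# Example: the z-score of +2.10 calculated later in the article
p = two_tailed_p_from_z(2.10)
print(f"P = {p:.3f} -> {describe_significance(p)}")  # P = 0.036 -> statistically significant
```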
Inferential tests vary widely in complexity and difficulty. Some can be performed easily using basic arithmetic and algebra, while others require complicated data entry procedures in sophisticated computer programs. Inferential tests also vary widely in their power. Researchers can use results of certain tests to reach more definitive conclusions about the hypothesis. For example, tests comparing continuous measurement values for two groups are considered more powerful than those comparing frequency of two groups for a nominal measurement variable. Patients' exact temperatures at 24 hours postoperative are considered continuous measurement data, whereas checking for the presence or absence of an elevated temperature would be nominal measurement data.
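As a simple illustration of this distinction, the sketch below records hypothetical exact temperatures (continuous data) and derives from them a yes/no elevated-temperature flag (nominal data). The 38.0 degree Celsius cutoff is an assumed fever threshold used only for the example.

```python
# Exact temperatures are continuous data; the yes/no "elevated" flag
# derived from them is nominal data. The 38.0 degree C cutoff is an
# assumed threshold chosen only for this illustration.
temps_24h_postop = [36.8, 37.2, 38.4, 36.9, 38.1]  # hypothetical readings in Celsius

FEVER_THRESHOLD_C = 38.0
elevated_flags = [temp >= FEVER_THRESHOLD_C for temp in temps_24h_postop]

print(elevated_flags)  # [False, False, True, False, True]
```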
The z-score is a commonly used inferential test that measures the difference between a data item (for example, the length of stay for physician #200's patients) and the mean of the overall data set (the mean length of stay for all patients). The z-score, also called a z transformation, standard score, or critical ratio, expresses the deviation from the mean in standard deviation units. Converting an actual value to a z-score tells clinicians how many standard deviations that value lies above or below the mean for the entire data set.
The arithmetic formula that can be used for calculating the z-scores for the lengths of stay of CABG patients in Figure 1 is:
z = (x - x̄) / S
The letter S refers to the standard deviation of the sample population, which in this example is .5423165; x̄ is the mean, and x is the individual value being converted. A slightly different formula would be used for calculating z-scores on nonsample data.
To compute a z-score, calculate the mean for the group, calculate the standard deviation, subtract the mean from the designated value, and divide the result by the standard deviation. The z-score for a length of stay of 6.2 days is +2.10:
z = (6.2 - 5.06) / .5423165 = 1.14 / .5423165 = +2.10
This z-score demonstrates that the length of stay for physician #200 is 2.10 standard deviations above the group average for all physicians. The length of stay for physician #100, 4.5 days, has a z-score of -1.03, meaning this length of stay is 1.03 standard deviations below the average length of stay for the physicians under study.
A z-score greater than +1.96 or less than -1.96 is generally considered statistically significant, since it corresponds to a two-tailed P value of less than 0.05. The z-scores for each physician's length of stay can be seen in Figure 1. These scores help practitioners answer the "So what" question.
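The calculation described above can be sketched in a few lines of code. The mean (5.06 days) and standard deviation (.5423165) are those given in the example; the only physician values shown are the two cited in the text, and the function name and data structure are illustrative.

```python
# Minimal sketch of the z-score calculation described above.
# The mean and standard deviation are taken from the article's CABG example;
# the physician values below are the two cited in the text.
MEAN_LOS = 5.06        # mean length of stay for all low-risk CABG patients (days)
SD_LOS = 0.5423165     # standard deviation of the sample

def z_score(value: float, mean: float = MEAN_LOS, sd: float = SD_LOS) -> float:
    """z = (x - x-bar) / S: deviation from the mean in standard deviation units."""
    return (value - mean) / sd

physician_los = {"#100": 4.5, "#200": 6.2}  # length of stay by physician (days)

for physician, los in physician_los.items():
    z = z_score(los)
    flag = "outlier (|z| > 1.96)" if abs(z) > 1.96 else "within expected range"
    print(f"Physician {physician}: LOS {los} days, z = {z:+.2f} -> {flag}")
# Physician #100: LOS 4.5 days, z = -1.03 -> within expected range
# Physician #200: LOS 6.2 days, z = +2.10 -> outlier (|z| > 1.96)
```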
The best measures of performance are those that provide accurate and meaningful information and answer the "So what" question. Organizations cannot help but achieve success if administration and key medical staff leaders play a dynamic role in defining the measures, and quality management professionals do their best to design data collection strategies and data displays that answer the "So what" question.
[Editor's note: For more information on the Society of Thoracic Surgeons' National Cardiac Surgery Database Risk Stratification software systems, contact the Society at 401 N. Michigan Ave., Chicago, IL 60611-4267; or call (800) 767-0766 or (312) 644-6610.] *