Limits of longitudinal decline for the interpretation of annual changes in FEV1 in individuals
- 1Division of Respiratory Disease Studies, National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention, Morgantown, WV, USA
- 2Constella Group, Morgantown, WV, USA
- 3Division of Occupational and Environmental Medicine, Department of Family Medicine, University of California Los Angeles, Los Angeles, CA, USA
- 4Phoenix Fire Department, Phoenix, AZ, USA
- 5Section of Pulmonary, Critical Care, and Environmental Medicine, Department of Medicine, Tulane Medical School, New Orleans, LA, USA
- Dr E Hnizdo, Division of Respiratory Disease Studies, National Institute for Occupational Safety and Health, 1095 Willowdale Road, Morgantown, WV 26505, USA
- Accepted 20 April 2007
- Published Online First 3 May 2007
Objective: Spirometry-based screening programmes often conduct annual assessment of longitudinal changes in forced expiratory volume in 1 second (FEV1) to identify individuals with excessive rates of decline. Both the American Thoracic Society (ATS) and the American College of Occupational and Environmental Medicine (ACOEM) recommend a reference limit value of ⩾15% for excessive annual decline. Neither the ATS nor the ACOEM adjusts this limit for the precision of the existing spirometry data. The authors propose an improved method of defining the reference limit of longitudinal annual FEV1 decline (LLD) based on the precision of the spirometry data.
Method: The authors used data from four monitoring programmes and measured their data precision using a pair-wise within-person variation statistic. They then derived programme- and gender-specific absolute and relative LLD values and validated these against the 95th percentiles for observed yearly changes in FEV1.
Results: The relative limit for annual decline was more practical than the absolute limit as it adjusted for gender differences in the magnitude of FEV1. The programme-specific relative limit values were in good agreement with 95th percentiles for year-to-year FEV1 changes and ranged from 6.6% to 15.8%. For individuals with COPD and bronchial hyperreactivity the 95th percentiles for year-to-year changes were about 15% and higher.
Conclusions: The relative longitudinal limit for annual FEV1 decline based upon precision of measurements is valid and can be generalised to different gender and population groups. A relative limit of approximately 10% appears appropriate for good quality workplace monitoring programmes, whereas a limit of about 15% appears appropriate for clinical evaluation of individuals with an obstructive airway disease. Computer software based on the method described is available from the corresponding author.
The interpretation of an individual’s longitudinal spirometry data in workplace spirometry monitoring programmes should include assessment of both the level of lung function and rate of lung function decline. The level of lung function is usually interpreted against an expected value for an asymptomatic non-smoker of the same age, height, race/ethnicity, and gender. An abnormal lung function level is usually identified using the lower limit of normal (LLN), which approximates the one-sided 95% confidence limit for the expected value and identifies approximately 5% of healthy never-smokers as abnormal.
Interpretation of the rate of lung function decline is less standardised. There are sophisticated statistical methods designed for the analysis of historically collected longitudinal data comparing mean rates of decline among groups in research studies. However, there has been less effort in developing and validating two important practical applications: (1) evaluating an individual’s rate of decline prospectively for early identification of those with excessive lung function decline; and (2) monitoring the quality (that is, data precision) of workplace spirometry monitoring programmes. Currently, there are two recommended methods for prospective evaluation of the rate of decline in an individual. The American Thoracic Society (ATS) recommends a reference limit of annual longitudinal decline for forced expiratory volume in 1 second (FEV1) of 15% as a clinically important decline.1 Alternatively, the American College of Occupational and Environmental Medicine (ACOEM) has proposed a longitudinal reference limit based on a 15% decline in FEV1 for working populations.2 Neither of these recommended longitudinal limits takes into account the precision (that is, the within-person variability) of the existing spirometry data, which prevents their use for evaluating the reliability of the predicted decline and for evaluating and enhancing the quality of the collected data.
Measurement errors can have a substantial effect on the amount of the within-person FEV1 variability observed in longitudinal spirometry data and on the uncertainty associated with the estimated rate of FEV1 decline.3–6 For example, as a result of differences in standardised testing procedures and/or varying adherence to those procedures over time, there can be substantial systematic differences in FEV1 data precision among monitoring programmes, even if they conform to ATS or European Respiratory Society (ERS) standards.7 8 As most monitoring programmes conduct annual or less frequent testing, the number of measurements is usually insufficient to obtain a reliable estimate of an individual’s within-person variation over a typical follow-up period. On the other hand, monitoring overall longitudinal data precision, using a pair-wise estimate of within-person variation estimated on a yearly basis on a group of workers, enables establishing overall data precision and detecting temporal variability in data precision in a spirometry testing programme.8 Furthermore, the pair-wise estimate of within-person variation allows for the development of a limit of longitudinal decline (LLD) that will provide a relatively simple and practical method for quality control of an individual’s longitudinal data, and for identification of excessive declines in FEV1.9
This paper further investigates the LLD method for facilitating interpretation of annual longitudinal changes in FEV1. In particular, we further evaluate the statistical validity of the absolute limit method,9 and propose and evaluate a relative LLD method. These methods have not been previously evaluated in the literature, and establishing their statistical validity across programmes with varying data precision and in groups with different demographic and respiratory health characteristics is needed. We address the following questions:
1. How valid is the ATS limit of 15% for the assessment of annual declines in workplace male and female populations, and in patients with COPD or asthma?
2. Is an absolute or per cent change criterion better for the identification of excessive declines?
3. What relative limit reference value would be appropriate for workplace monitoring programmes where testing adheres to ATS/ERS spirometry standards?
MATERIALS AND METHODS
Monitoring programmes studied
We provide examples of within-person variability in FEV1 from three distinct sources: (1) two workplace monitoring programmes conducted in manufacturing plants, which we have previously studied for longitudinal data precision (Programme 1 and Programme 2);8 (2) a spirometry monitoring programme done on more than 1600 fire-fighters (Programme 3); (3) a longitudinal epidemiological study of 5887 cigarette smokers with early COPD10 (Programme 4).
All the programmes were conducted in the US on predominantly white males and females. The programmes used equipment and applied procedures and computational methods consistent with ATS spirometric criteria.12 Workplace testing was administered by personnel who had successfully completed a National Institute for Occupational Safety and Health (NIOSH)-approved course in spirometric testing.13 The two manufacturing plants and Programme 4 used the same make of dry-rolling seal spirometer throughout their follow-up (Infodyne Systems 8L and Spirotech S500, respectively). Programme 3 used a dry-rolling seal spirometer (Spirotech S400) up to July 2001, and then switched to a flow-based spirometer (Renaissance II). The longitudinal spirometry data for all programmes included the largest back-extrapolated FEV1 and FVC from the best three curves, and the ratio FEV1/FVC computed from the largest values.
For Programmes 1 and 2, central quality assurance of the spirometric tests was done by one of the authors (HWG) in both flow-volume and volume-time format.13 14 The spirometric data and responses to a standardised questionnaire on respiratory symptoms and disease were automatically computerised. Asthma was defined as a positive response to the question: “Do you have asthma?” Programme 3 was conducted by trained technicians12 and no special central quality control was done; only demographic and spirometry data were available. Programme 4 was a randomised clinical trial designed to determine the effect of a smoking cessation intervention and bronchodilator use on the rate of decline in FEV1 in cigarette smokers with early COPD.10 15 Spirometry measurements were obtained over six annual visits in several centres using stringent standardised methods of testing.10 The presence of bronchial hyper-reactivity (BHR) was established by a methacholine challenge conducted at the onset of the study. BHR was defined by the slope of the methacholine dose-FEV1 relation (Methacholine Challenge Slope, SMCT) as (difference between baseline FEV1 and final FEV1)/(highest dose of methacholine used).15 Individuals were categorised into quartiles for SMCT; those in the lowest quartile were categorised as having BHR.
The present study was approved by the NIOSH Human Subject Review Board.
We first evaluated the temporal changes in data precision over all years of follow-up for each monitoring programme, using the pair-wise measure of within-person variation sp (see Appendix). Next, we calculated programme- and gender-specific average pair-wise estimates of within-person variation s̄p and pair-wise estimates of relative within-person variation s̄r (Appendix). Using these estimates of within-person variation, we then calculated programme- and gender-specific absolute and relative limits for an annual decline, and evaluated their agreement with the 95th percentiles for year-to-year changes in FEV1. We also estimated these limits for individuals with asthma and COPD.
Limits of longitudinal decline
The absolute and relative limits of annual longitudinal decline for FEV1 are defined as follows:
1. Absolute limit of longitudinal decline (LLDa) (ml) is the approximate one-sided 95% confidence limit for longitudinal decline:9
$$\mathrm{LLD_a} = t\,\bigl(b + 1.645 \times \mathrm{SE}(b)\bigr) \qquad (1)$$
where b is the referent slope, for which we used the value of 30 ml/year,16 and t = 1 represents one year of follow-up. The standard error of the slope b is given by a formula derived by Schlesselman:17
$$\mathrm{SE}(b) = \sqrt{\frac{12\,(P-1)\,\sigma_w^{2}}{t^{2}\,P\,(P+1)}} \qquad (2)$$
Here, P = 2 is the number of repeated measurements done during the follow-up time t of one year, and σw is the within-person standard deviation. Note that, in comparison to the duration of follow-up t, P has only a small effect on SE(b) and on LLDa.9 By substituting the programme- and gender-specific values of s̄p for the within-person variation σw in equation (2), we predicted the programme- and gender-specific LLDa.
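2. Relative limit of longitudinal decline (LLDr) standardises for the size of FEV1, and is defined as
$$\mathrm{LLD_r} = t\,\bigl(b/\overline{\mathrm{FEV1}}_{b} + 1.645 \times \mathrm{SE_r}(b)\bigr)$$
where b is standardised by the programme- and gender-specific mean baseline FEV1 (written here as $\overline{\mathrm{FEV1}}_{b}$), and SEr(b) is the approximate standard error of $b/\overline{\mathrm{FEV1}}_{b}$, calculated by substituting the programme- and gender-specific values of s̄r for the within-person variation σw in equation (2).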
For an individual, the reference limit for an annual FEV1 decline may be calculated in terms of the level of FEV1. Individuals whose FEV1 values fall below this reference limit should raise concern. The limit can be calculated in terms of the individual’s baseline (previous year’s FEV1) FEV1b value and LLDa, and LLDr as follows:
FEV1 = FEV1b− LLDa or FEV1 = FEV1b− FEV1b×LLDr.
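The calculation of the two limits and of an individual's reference FEV1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' software: the function names are hypothetical, the standard error uses the commonly cited form of Schlesselman's formula, SE(b) = √[12(P − 1)σw²/(t²P(P + 1))], and the relative limit is expressed in per cent with s̄r supplied in per cent.

```python
import math

Z_95_ONE_SIDED = 1.645  # one-sided 95% normal quantile used in the text


def se_slope(sigma_w, t=1.0, p=2):
    """Standard error of a slope fitted to p equally spaced measurements
    over t years (assumed form of Schlesselman's formula)."""
    return math.sqrt(12.0 * (p - 1) * sigma_w ** 2 / (t ** 2 * p * (p + 1)))


def lld_absolute(s_p, b=30.0, t=1.0, p=2):
    """Absolute limit LLDa in ml: t * (b + 1.645 * SE(b)),
    with the pair-wise estimate s_p (ml) substituted for sigma_w."""
    return t * (b + Z_95_ONE_SIDED * se_slope(s_p, t, p))


def lld_relative(s_r, mean_baseline_fev1, b=30.0, t=1.0, p=2):
    """Relative limit LLDr in per cent. s_r is the relative within-person
    variation in per cent; mean_baseline_fev1 is in ml."""
    standardised_slope = 100.0 * b / mean_baseline_fev1
    return t * (standardised_slope + Z_95_ONE_SIDED * se_slope(s_r, t, p))


def reference_fev1_absolute(fev1_baseline, llda):
    """FEV1 level below which an annual decline exceeds LLDa."""
    return fev1_baseline - llda


def reference_fev1_relative(fev1_baseline, lldr_percent):
    """FEV1 level below which an annual decline exceeds LLDr."""
    return fev1_baseline - fev1_baseline * lldr_percent / 100.0
```

With P = 2 and t = 1 the assumed standard error reduces to √2 × σw, so both limits grow linearly with the within-person variation.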
Agreement between the limits of longitudinal decline and 95th percentiles
We evaluated the agreement between programme- and gender-specific LLDa values and the 95th percentiles for all absolute year-to-year changes (ΔFEV1 = FEV1b − FEV1), and similarly we evaluated agreement between the LLDr values and the 95th percentiles for all relative year-to-year changes (%ΔFEV1 = ΔFEV1/[(FEV1b + FEV1)/2] × 100). To examine the relation between the two sets of measures, we first plotted the limits against the 95th percentiles and then tested the differences using the Bland-Altman method.18
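The year-to-year changes and their 95th percentile can be computed as in the following sketch. The function names are hypothetical, and the percentile uses the default interpolation of Python's `statistics.quantiles`, which is an assumption: the text does not state which percentile estimator was used.

```python
import statistics


def absolute_change(fev1_baseline, fev1_followup):
    """Absolute year-to-year change (positive values indicate a decline)."""
    return fev1_baseline - fev1_followup


def relative_change(fev1_baseline, fev1_followup):
    """Year-to-year change as a percentage of the mean of the two values."""
    mean_level = (fev1_baseline + fev1_followup) / 2.0
    return 100.0 * (fev1_baseline - fev1_followup) / mean_level


def percentile_95(values):
    """95th percentile: with n=20, quantiles() returns 19 cut points
    and the last one is the 95% point."""
    return statistics.quantiles(values, n=20)[-1]
```

Dividing by the mean of the two measurements, rather than by the baseline alone, makes the relative change symmetric in the two values, up to the sign.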
Agreement between LLD values and 95th percentiles in individuals with respiratory disease
We also calculated the appropriate LLDr values for one year of follow-up for individuals reporting asthma (using data from Programme 2), and for individuals with early COPD, according to the severity of BHR (using data from Programme 4 where BHR was measured).
RESULTS
Table 1 provides programme- and gender-specific baseline demographic data and duration of follow-up for individuals who had at least one follow-up period within 14 months. Because Programme 3 had a marked change in data precision when a new spirometer started to be used (fig 1), we conducted the evaluations for two periods: years 1990–9 (Programme 3A), and years 2000–4 (Programme 3B). The three workplace programmes (Programmes 1 to 3) had similar demographic characteristics, but individuals in Programme 4, who had early COPD, were older and had lower mean FEV1.
Figure 1 shows the time-related differences in data precision among the programmes, as measured by the yearly values of the pair-wise within-person variation sp. For Programme 4 males and females are plotted separately; in the other programmes the effect of females on the overall sp values was small and these were combined with males.
Table 2 shows the number of year-to-year intervals on which the calculation of the limits of annual decline and the 95th percentiles were based, and the mean duration of the intervals. For the absolute and relative limits, the table shows the programme- and gender-specific pair-wise within-person standard deviations s̄p and s̄r, the calculated limits LLDa and LLDr, and the 95th percentiles for ΔFEV1 and %ΔFEV1. Figure 2 shows the predicted LLDa values for s̄p ranging from 70 ml to 270 ml (line), and the observed programme- and gender-specific 95th percentiles (points) plotted against s̄p. Notably, males have consistently higher within-person variability s̄p and higher 95th percentiles by almost 100 ml, and males and females from Programme 4 have the second lowest 95th percentiles and estimated LLDa values (table 2). The mean difference between LLDa and the 95th percentile for ΔFEV1 was 15.6 (SD 21.7) ml/year. The LLDa method overestimated the 95th percentiles by about 16 ml/year, and all the individual differences were within the limits of agreement of −26.9 to 58.1 ml/year, calculated as 15.6 ± 1.96 × SD.18 The average difference of 15.6 ml/year is acceptable in view of the current recommended limit for annual decline of 15% (about 630 ml for an FEV1 of 4.2 l, the mean value for males from Programme 1).
There was also good agreement between the predicted LLDr and the 95th percentiles. Figure 3 shows the predicted LLDr for within-person variation s̄r ranging from 2% to 7% (line) and the programme- and gender-specific 95th percentiles for %ΔFEV1 (points) plotted against s̄r. The mean difference between LLDr and the 95th percentile for %ΔFEV1 was 0.45% (SD 0.80%) (fig 4). The LLDr method overestimated the 95th percentiles by about 0.5% on average and all the differences ranged within the agreement limits of −1.1% and 2.0%. This magnitude of agreement is acceptable.
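The limits of agreement quoted for the absolute and relative limits follow from the reported mean differences and standard deviations as mean ± 1.96 × SD. A short sketch of that arithmetic, with hypothetical function names:

```python
import statistics


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Bland-Altman limits of agreement from a mean difference and its SD."""
    half_width = z * sd_diff
    return mean_diff - half_width, mean_diff + half_width


def bland_altman(predicted_limits, observed_percentiles):
    """Mean difference, SD of the differences and limits of agreement
    for paired predicted and observed values."""
    diffs = [p - o for p, o in zip(predicted_limits, observed_percentiles)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    return mean_diff, sd_diff, limits_of_agreement(mean_diff, sd_diff)


# Reported summary statistics: absolute limits (ml/year) and relative limits (%)
abs_lower, abs_upper = limits_of_agreement(15.6, 21.7)
rel_lower, rel_upper = limits_of_agreement(0.45, 0.80)
```

Rounded to one decimal place, these reproduce the stated ranges of −26.9 to 58.1 ml/year and −1.1% to 2.0%.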
The practical advantage of LLDr over LLDa is that the adjustment for FEV1 size had effectively adjusted for the gender differences seen in figure 2. Also, the adjustment for the smaller FEV1 size for those with early COPD from Programme 4 resulted in the LLDr values being higher than those for the workplace monitoring programmes. Since Programme 4 had excellent quality control, measurement error is an unlikely reason for the higher relative variability in FEV1, unlike in Programme 3B where measurement error was likely the main reason for the increased variability.
Effect of asthma and BHR on year-to-year variability in FEV1
Table 3 gives the relative limit statistics for individuals who reported asthma (from Programme 2), and for individuals with early COPD by quartiles of BHR (from Programme 4). Males and females were combined for this analysis since their results for the relative within-person variation did not practically differ. For individuals with asthma, the estimated LLDr was 12.8%. For individuals with early COPD, the LLDr was about 18% for those with greatest severity of BHR (BHRq1), and around 10% for individuals with least BHR (BHRq4). Thus in Programme 4, BHR associated with early COPD appears to be the main determinant of the increased within-person variability. In the quartile with least severe BHR, the within-person variability was comparable to that in Programme 3A.
DISCUSSION
A major issue in the interpretation of longitudinal spirometry data in worker monitoring programmes is the healthcare providers’ uncertainty about spirometry data quality. Monitoring of longitudinal data precision over time using the pair-wise estimate of within-person variation helps to improve: (1) the quality of data collected by the programme and (2) the precision with which individual workers with excessive rate of decline are identified.
Identifying temporal changes in FEV1 variation can lead to timely corrective actions on a programme’s level. For example, in Programme 3, a change of a spirometer in 2001 was associated with a substantial increase in data variability (fig 1) due to systematic procedural and equipment errors. The increased data variability was identified only in 2004, when the group within-person variability monitoring began. Assessment of spirometry quality identified major problems with the testing instrument itself (incorrect sensors functioning as a result of moisture accumulation) and incorrect use of the spirometer due to lack of technician training in using the instrument. Earlier recognition of this increased variation could have facilitated a more timely intervention.
Our study shows that using limits of annual decline that reflect within-person variability in the FEV1 measurements facilitates improved precision of interpretation of annual declines in FEV1 in an individual, especially during the early stages of follow-up. The longitudinal limit for an annual decline predicts the 95th percentile cut-off point for observed annual changes and thereby identifies 5% of individuals with excessive declines. This approach facilitates quality control on an individual basis, as it helps to identify individuals for whom spirometry quality control and/or respiratory conditions may need further investigation, or those who should have more frequent testing. Figure 2 provides guidance for selecting absolute annual declines that should be considered excessive depending on the existing data within-person variation. (The absolute values are mostly restricted to Caucasian populations.) For example, in Programme 3A, an annual decline greater than 380 ml/year and 320 ml/year can be considered excessive in males and females, respectively. However, after 1999 only respective declines greater than 560 ml/year and 490 ml/year can be considered excessive in Programme 3B (table 2). This example demonstrates the impact of data precision on what decline can be detected as excessive. More precise early warning is especially important in workplace situations where occupational exposure has been shown to be associated with rapid excessive loss of lung function that can lead to disabling respiratory disease, as, for example, in a study of popcorn workers.19
Comparing the absolute and relative limits, our study shows, however, that the relative limit LLDr has more general validity and is more practical, as it adjusts for FEV1 size. Table 3 and figure 3 demonstrate that the relative limit LLDr effectively adjusts for gender differences, which turned out to be mainly due to FEV1 size; male and female groups thus could use the same longitudinal limit. The LLDr also adjusts for differences in mean FEV1 size among different population groups as indicated by increased relative within-person variability in Programme 4, which was found to be mainly due to increased BHR.
The study also shows that it is possible for workplace monitoring programmes conducted by trained technicians using ATS/ERS standards7 to achieve good data precision corresponding to LLDr of about 10%. This corresponds to average relative within-person variation s̄r of about 4.0% (fig 3, programme P3A). This degree of precision allows identification of a “true” rapid decliner of 90 ml/year after about five years of follow-up.9 A recent study of spirometry monitoring data from a chemical plant also reports the 95th percentile for annual decline to be 10.4% for males and 10.6% for females.20 For the cases with early COPD who did not have BHR, the limit of annual decline was also around 10% (table 3). A previous report on the Lung Health Study (Programme 4) reported 95% of differences between FEV1 measurements taken 21 days apart to be within 240 ml for females and within 320 ml for males.21 We estimated 95% of the annual declines for Programme 4 to be within 280 ml for females and 370 ml for males (table 2). Similarly, a study of a cohort of 389 blue-collar male workers with good quality spirometry data reported a yearly decline in FEV1 of 8% or 330 ml for healthy workers.22 Our study provides a simple general statistical framework through which these published results could be interpreted.
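As a rough check on the correspondence between s̄r of about 4.0% and LLDr of about 10%, the relative limit can be worked through by hand. The calculation assumes the usual form of Schlesselman's formula, which for P = 2 and t = 1 gives a standard error of √2 × σw, the referent slope of 30 ml/year, and, purely for illustration, a mean baseline FEV1 of 4.2 l (the value quoted for males in Programme 1):

$$\mathrm{LLD_r} \approx \frac{30}{4200} \times 100 + 1.645 \times \sqrt{2} \times 4.0 \approx 0.7 + 9.3 = 10.0\%$$

Under these assumptions nearly all of the limit comes from the within-person variation term, with the referent slope contributing less than 1%.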
However, unlike our study, a study investigating short-term (less than 3 months) changes in FEV1 in patients with COPD recommended using an absolute limit of 225 ml, irrespective of the baseline level of FEV1.23 The authors noted that the absolute difference in FEV1 between two spirometry sessions did not vary with the baseline FEV1. We found that FEV1 variability and the absolute longitudinal limit LLDa were related to gender, data precision, and to the lower mean FEV1 size in older individuals with early COPD, and that the relative limit adjusts for these differences. The authors23 suggested as an alternative approach to accept a per cent difference greater than 10% with an absolute change of at least 150 ml as a short-term limit of decline. This result is in agreement with findings from our study. Nevertheless, we show that this limit may not be appropriate for individuals with BHR or asthma, or in monitoring programmes with poor spirometry quality (table 4).
Our study shows that clinical conditions such as airway hyper-responsiveness and asthma increase FEV1 variability. Another study found that week-to-week variability in per cent change was greater in adult patients with asthma and COPD than in normal adults, and recommended that a significant change was 12% in normal individuals and 23% in adults with obstructive disease.24 These results support using the ATS criterion for annual FEV1 decline of 15% for individuals with asthma or COPD with increased BHR in clinical practice. However, in workplace monitoring programmes, where the number of individuals with asthma and COPD is usually small, those individuals should have the same stringent LLDr criterion as that applied to all the workers in the programme.
A limitation of the proposed method is that in situations where occupational exposure is associated with increased excessive decline in lung function and/or increased within-person variability, the derived limit may increase as a result of these two effects. As the excessive decline is unlikely to affect a large proportion of the workforce, its impact on within-person variation is likely to be relatively small in comparison to the effect of measurement errors (Appendix). However, excessive within-person variability can only be identified if it is monitored, and the causes of increased within-person variability investigated. This study provides data on acceptable within-person variability values, and the proposed method will be more sensitive than the 15% limit only if the relative pair-wise within-person variation s̄r is less than 6.5% (fig 3). Some systematic error resulting from aging may have inflated our estimates of the within-person standard deviation, but in a preliminary analysis we found the effect from this factor can be ignored for practical purposes. Although the number of measurements repeated on each individual within 14 months varied by programme, this was unlikely to significantly affect the precision of the estimated pair-wise within-person variance since the yearly values shown in figure 1 were consistent.
In conclusion, we show the following. (1) The limits of longitudinal decline (LLDa and LLDr) method is valid and practical for evaluating annual changes in FEV1 for efficient detection of persons with excessive decline or for detection of measurement errors. (2) Assessment of longitudinal data precision using the pair-wise estimate of within-person variation, sp or sr, can help to identify extraneous sources of variability in FEV1 soon after they arise in a group, and to implement interventions. (3) The LLDr for annual decline should be about 10% or less for good quality workplace monitoring programmes; the ATS recommended 15% criterion appears excessive. (4) For individuals with airways disease associated with BHR (that is, asthma or COPD), the ATS-recommended 15% criterion for annual decline appears appropriate for clinical practice. Use of the longitudinal limit methods described here makes it possible (1) to take data quality into consideration when interpreting the longitudinal data, and (2) to identify longitudinal data that are of substandard quality so that intervention at an individual and programme level can be taken to improve data quality. Based on the described method, we have developed computer software for the analysis and interpretation of individual subjects’ longitudinal spirometry data; this software is available on request from the first author.
The authors would like to express appreciation to the Phoenix Fire Department Health Center for their cooperation on the study and to Drs Lee Petsonk, Michael Attfield and Robert Castellan for their helpful comments.
DATA PRECISION OF THE MONITORING PROGRAMMES
The pair-wise estimate of within-person standard deviation was calculated as
$$s_p = \sqrt{\frac{\sum_{i=1}^{n} \left(\mathrm{FEV1}_{1i} - \mathrm{FEV1}_{2i}\right)^{2}}{2n}}$$
for each year of follow-up on the n individuals who had two consecutive measurements FEV1_{1i} and FEV1_{2i} done within about 12 months (maximum range 14 months). The date of the first test determined the assigned year of follow-up, and only one comparison per person per year was used. To avoid a systematic effect of age on FEV1 change, the interval should be relatively short, but still long enough to reflect most potential sources of within-person variability. We used the interval of 12 months for practical reasons. For practical purposes we also neglected the effect that the average decline within the 12 months has on the sp estimate. Theoretically, sp will overestimate the “true” within-person variation by the average rate of decline b in a group by approximately √(b²/2). Thus, for average slopes of 30 ml/year (healthy never-smokers), 60 ml/year (current smokers) and 90 ml/year (an extreme situation), sp will be inflated by about 21, 42 and 64 ml, respectively. The differences from the 30 ml/year slope are 21 and 43 ml for the 60 and 90 ml/year average slopes, respectively. In most workplace monitoring situations, the fraction of workers with a “true” slope greater than 90 ml/year is usually relatively small. We also excluded extreme year-to-year outliers with |ΔFEV1| > 1.7 l, which constituted less than 0.1% of the ΔFEV1 values overall; the few extreme outliers can cause large deviations in sp values and misrepresent the average within-person variation.
The programme- and gender-specific average within-person variation s̄p was then calculated by averaging the sp values over all years of follow-up. These average s̄p values agreed well with the within-person standard deviation estimated by the mixed-effects model.
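The yearly pair-wise estimate, the outlier exclusion and the averaging over years can be sketched as follows. The sketch assumes the standard pair-wise form sp = √[Σ(difference)²/(2n)], a simple unweighted mean of the yearly values for s̄p, and a hypothetical record layout of (year, first FEV1, second FEV1) in ml; none of these details is confirmed by the text beyond the description above.

```python
import math
from collections import defaultdict


def pairwise_sd(pairs):
    """Pair-wise within-person SD from (first, second) FEV1 pairs in ml."""
    pairs = list(pairs)
    sum_sq = sum((first - second) ** 2 for first, second in pairs)
    return math.sqrt(sum_sq / (2 * len(pairs)))


def yearly_pairwise_sd(records, max_abs_change=1700.0):
    """Yearly s_p values from (year, first, second) records, dropping
    pairs whose absolute change exceeds max_abs_change (1.7 l)."""
    by_year = defaultdict(list)
    for year, first, second in records:
        if abs(first - second) <= max_abs_change:
            by_year[year].append((first, second))
    return {year: pairwise_sd(p) for year, p in sorted(by_year.items())}


def average_sd(yearly_values):
    """Unweighted mean of the yearly s_p values (assumed averaging rule)."""
    return sum(yearly_values.values()) / len(yearly_values)


def inflation_from_mean_decline(b):
    """Approximate inflation of s_p caused by an average decline b (ml/year)."""
    return math.sqrt(b ** 2 / 2.0)
```

For average slopes of 30, 60 and 90 ml/year, `inflation_from_mean_decline` returns about 21, 42 and 64 ml, matching the figures given above.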
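The pair-wise estimate of the relative within-person standard deviation, which adjusts for the individual’s FEV1 size, was defined as21
$$s_r = 100 \times \sqrt{\frac{1}{2n}\sum_{i=1}^{n} \left(\frac{\mathrm{FEV1}_{1i} - \mathrm{FEV1}_{2i}}{\left(\mathrm{FEV1}_{1i} + \mathrm{FEV1}_{2i}\right)/2}\right)^{2}}$$
and was calculated together with its programme- and gender-specific averages, s̄r, using the same method as that for the calculation of s̄p.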
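A relative version of the pair-wise estimate can be sketched in the same way. The sketch assumes that each difference is standardised by the mean of the two measurements, consistent with the definition of %ΔFEV1 in the methods, and that the result is expressed in per cent; the function name is hypothetical.

```python
import math


def pairwise_relative_sd(pairs):
    """Relative pair-wise within-person SD (per cent) from (first, second)
    FEV1 pairs; each difference is divided by the mean of the pair."""
    pairs = list(pairs)
    sum_sq = 0.0
    for first, second in pairs:
        mean_level = (first + second) / 2.0
        sum_sq += ((first - second) / mean_level) ** 2
    return 100.0 * math.sqrt(sum_sq / (2 * len(pairs)))
```

Because each difference is scaled by the individual's own FEV1 level, the result does not depend on the units of measurement.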
Competing interests: None declared.
Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the National Institute for Occupational Safety and Health.
- ACOEM: American College of Occupational and Environmental Medicine
- ATS: American Thoracic Society
- BHR: bronchial hyper-reactivity
- COPD: chronic obstructive pulmonary disease
- ERS: European Respiratory Society
- FEV1: forced expiratory volume in 1 second
- LLD: limit of longitudinal decline
- LLN: lower limit of normal
- NIOSH: National Institute for Occupational Safety and Health