
## Abstract

**OBJECTIVES** To estimate lung function prediction equations and to identify appropriate normal reference values for the population of about 250 000 South African gold miners.

**METHODS** Data from a lung function screening programme conducted at a large South African gold mine from 1994 to 1998 were used to estimate the lung function prediction equations. The most reliable period of lung function testing was identified in a previous study of a temporal pattern in reliability, and lung function tests from this period were used. Miners with a history of pulmonary tuberculosis or with radiological abnormalities were excluded from the study. The prediction equations were estimated cross sectionally on 15 772 black and 2752 white miners, and published reference equations that fitted most closely the observed data were identified.

**RESULTS** The estimated prediction equations for forced vital capacity (FVC) are as follows: for black men, FVC (l)=−2.901−0.025×age+4.655×height; and for white men, FVC (l)=−4.407−0.036×age+5.940×height. For forced expiratory volume in one second (FEV_{1}) these equations are: for black men, FEV_{1} (l)=−1.654−0.030×age+3.665×height; and for white men, FEV_{1} (l)=−2.341−0.038×age+4.314×height. Units are years for age and metres for height. Knudson's and the European Community for Coal and Steel (ECCS) reference values provided the closest fit to the data on lung function of white miners, but the lower limits of normal from the ECCS equations were the closest to the observed one sided lower 95% confidence intervals (95% CIs). For black miners, the reference equations that fitted best were derived by Louw *et al* on asymptomatic black South African men unexposed to occupational dust. There were significant differences between the two groups of miners in the estimated height adjusted mean lung function values for a 40 year old 1.7 m tall man (220 ml (5.2%) for FVC and 110 ml (3.2%) for FEV_{1}); white men had higher FVC and FEV_{1}, but lower FEV_{1}/FVC ratio. The ECCS reference values scaled by a conversion factor of 0.93 for the FVC and 0.95 for the FEV_{1} provided close fits to the data for black miners, but the rate of decline with age was higher than that in the observed data. None of the linear equations provided a good fit for the 20–29 and more than 55 years old age categories.

**CONCLUSION** The ECCS and Knudson equations provided the best fit to the data for white miners, whereas the equations by Louw *et al* estimated on asymptomatic black South African bank workers provided the best fit to the data for black miners. The ECCS reference values scaled by a factor of 0.93 for FVC and by 0.95 for FEV_{1} provided close fits, but the rate of decline with age was higher than that in the data for black miners.

- silica dust
- miners
- pulmonary function reference equations


The availability of appropriate normal reference values is fundamental to a workplace lung function monitoring programme. Good reference values allow for early detection of accelerated loss of pulmonary function, for assessment of fitness to work in certain job categories and to wear respirators, and for the diagnosis of a compensable disease, among other purposes.1

The South African gold mining industry currently employs about 250 000 miners. Because of exposure to dust with a high concentration of crystalline silica, the miners are at risk of developing respiratory diseases such as silicosis, chronic obstructive lung disease, and pulmonary tuberculosis. Also, there is an extensive population of ex-miners who may be eligible for compensation due to existing lung diseases. Legislation for screening of pulmonary function in all miners has been introduced in South Africa only recently.1 However, there is uncertainty as to the most appropriate reference values for assessing the miners. The European Community for Coal and Steel (ECCS) reference values2 have been suggested for general use,3 but their suitability, particularly for black miners, has never been evaluated. The goal of this study was to identify reference equations appropriate for the large population of gold miners who are predominantly black, but with a significant white subpopulation. The approach adopted was to estimate prediction curves for lung function measurements obtained on a healthy subset of current black and white gold miners and to evaluate how well published reference equations4 5 compare with the estimated prediction curves.

## Material and methods

### STUDY POPULATION

Miners from a large South African gold mining company who had spirometry performed routinely at entry into the industry (initial examination), periodically at 3 year intervals (periodic examination), and on leaving the company (exit examination) comprised the study population. The lung function monitoring programme started in May 1994 and served a population of 71 515 miners in 1994, which decreased to 43 359 miners by 1998.6

### SPIROMETRY MEASUREMENTS

Maximal forced expiratory manoeuvres were recorded with a Hans Rudolph pneumotachograph (Flowscan, Electromedical Systems). The system software required and validated calibration with a 3 litre syringe and allowed keyboard entry of barometric pressure and ambient temperature for the correction to body temperature, pressure, and saturation (BTPS) conditions. Calibration was done 3–4 times a day. During testing, flow versus volume tracings were displayed. A minimum of three acceptable and reproducible forced expiratory manoeuvres were obtained according to the standards recommended by the American Thoracic Society (ATS).7 All testing was done by nursing personnel with a college diploma specialisation in spirometry testing and trained in the techniques of performing spirometry to ATS standards. Height was measured to the nearest centimetre (without shoes). Data recorded for each test included the date of test, date of birth, height, weight, forced vital capacity (FVC), forced expiratory volume in one second (FEV_{1}), and forced expiratory flow at 25%–75% of forced vital capacity (FEF_{25-75%}).

### RELIABILITY OF THE LUNG FUNCTION DATA

During the period May 1994 to March 1998, 113 120 spirograms were recorded in a computerised database. In a previous study6 we examined the reliability of the spirograms over time, and in the present study we used data from the most reliable period, from January 1995 to August 1996, for which the average coefficient of reliability, G, was 0.93 (the size of random error of measurement was 7%). In total, 45 053 tests were done during the reliable period.6 Of these, 36 777 (31 108 on black, and 3270 on white miners) were on miners for whom a complete history of pulmonary tuberculosis and chest radiology was available. The remaining tests were from contract workers with incomplete information. From the 31 108 tests on black miners, we excluded 3905 subjects with a history of pulmonary tuberculosis and 1485 subjects with radiological changes (pulmonary tuberculosis, pneumoconiosis, pneumonia, cardiomegaly, pleural changes, etc; there was an overlap between cases). (All cases of pulmonary tuberculosis were computerised from 1979 to 1998. The yearly radiological screening was done on miniature 100×100 mm chest radiographs, and subjects with abnormalities were recorded cross sectionally from 1994 to 1998.) Of the remaining 26 024 tests, we excluded 71 miners whose age was less than 20 years or more than 63 years, and 255 miners (0.98%) and 84 subjects whose lung function or height measurements were outside the 99.98% confidence interval, respectively. Of the remaining 25 614 tests, 1374 were initial, 15 772 periodic, 8267 exit, and 201 were done for other reasons. Of the 3270 tests done on white miners, we excluded two tests because of a history of pulmonary tuberculosis, 20 tests because of radiological changes, and 39 (1.2%) tests because lung function or height measurements were outside the 99.98% CI. Of the remaining 3248 tests, 158 were initial, 2752 periodic, 174 exit, and 59 were done for other reasons.

In a preliminary analysis, the initial examinations showed a slightly decreased predicted curve for FEV_{1}. For a 40 year old and 1.7 m tall black man, the decrease was 50 ml. As the number of periodic examinations was large, we used these exclusively as they would provide the most representative sample of current miners. Measurements from both the initial and exit examinations may be potentially lower because of a learning effect and a selection process, respectively. Finally, the prediction equations were estimated on periodic lung function tests from 15 772 black and 2752 white miners. Because periodic tests are done at 3 yearly intervals, each miner had one test only.

### STATISTICAL ANALYSIS

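The prediction equations were estimated separately for black and white miners from the linear regression model8

$$Y = b_{0} + b_{1}\times\mathrm{age} + b_{2}\times\mathrm{height} + b_{3}\times\mathrm{weight} + b_{4}\times\mathrm{age}\times\mathrm{height} + b_{5}\times\mathrm{age}^{2} \qquad (1)$$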

where Y is the predicted lung function value for a miner of a given age and height, and b_{i}, i=0,1, . . .,5 are the regression coefficients. Only the variables that contributed significantly to the *R*^{2} statistic were retained in the final model. The equations of the final model as well as the published reference equations4 5 were then used to predict lung function values of 1.7 m tall men of different ages. The resulting curves were compared with the mean observed lung function values standardised to the average height of black miners of 1.7 m. To allow the comparison, the lung function measurements were standardised to the height of 1.7 m as lung function measurement×1.7^{2}/ht^{2}. The standardisation for height allowed for comparison of lung function measurements from populations with different mean height (fig 5). Selection of the best fitting reference equations was done by a visual inspection with this method. As most of the published curves did not fit well, a visual inspection was sufficient to identify the best fitting curves.
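The height standardisation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the example measurement are hypothetical and do not come from the study data.

```python
def standardise_to_height(value_l: float, height_m: float,
                          ref_height_m: float = 1.7) -> float:
    """Scale a lung function measurement (litres) to a reference height.

    Implements measurement x ref_height^2 / height^2, with the
    reference height defaulting to the 1.7 m used in the text.
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return value_l * ref_height_m ** 2 / height_m ** 2


# Hypothetical example: an FVC of 4.50 l measured in a man 1.80 m tall.
fvc_standardised = standardise_to_height(4.50, 1.80)
print(f"{fvc_standardised:.3f} l")
```

A measurement taken at exactly 1.7 m is returned unchanged, and values from taller men are scaled down.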

In the next step, we compared the percentage of miners that fell below the lower limit of normal (LL) for the different reference equations. The LL refers to the one sided lower 95% CI calculated as LL=P_{(age,1.7)}−1.645×SEE, where P_{(age,1.7)} is the age and height specific predicted value, and SEE is the standard error of the estimate of the regression line.4 9 For a homogeneous population and a normally distributed variable, it is expected that 5% of the subjects will have lung function values below the predicted LL. For a reference equation to be applicable to a specific healthy population, the percentage below the reference LL should also be around 5%, otherwise too many or too few subjects will be categorised as abnormal.
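As a rough sketch of the LL calculation, the following Python snippet applies LL = P − 1.645×SEE to a single predicted value. The function names are illustrative, and the predicted value and SEE in the example are hypothetical numbers, not values from the tables.

```python
Z_ONE_SIDED_95 = 1.645  # standard normal deviate for a one sided 95% limit


def lower_limit(predicted: float, see: float) -> float:
    """Lower limit of normal: predicted value minus 1.645 x SEE."""
    return predicted - Z_ONE_SIDED_95 * see


def is_below_ll(observed: float, predicted: float, see: float) -> bool:
    """True when an observed value falls below the lower limit of normal."""
    return observed < lower_limit(predicted, see)


# Hypothetical example: predicted FVC of 4.00 l and an SEE of 0.50 l.
ll = lower_limit(4.00, 0.50)
print(f"LL = {ll:.4f} l")
print(is_below_ll(3.10, 4.00, 0.50))
```

Because the SEE is a single value for the whole regression line, the LL in this sketch runs parallel to the predicted curve.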

In the data on miners there were known systematic effects that could have increased the percentage of miners that fell below the LL. These were age below 30 years or above 55 years (see the observed data points in figs 2 and 3) and changes in lung function due to smoking and exposure to dust. These changes (for example, emphysema and pneumoconiosis) usually start to become evident pathologically at around 35–40 years of age. Thus, to minimise the systematic effects of age and the adverse effects of environmental exposures, we used a cross section of 30–35 year olds to compare the reference equations for the percentage of miners that are below the LL. The number of miners in the 30–35 age category was large (4503 black and 620 white miners) and the frequency distribution was close to normal. The percentage that were below the LL was derived from the standardised normal deviate, *z*, calculated as the difference between the observed mean and the LL of normal derived for each equation, divided by the observed SD, and the corresponding cumulative probability distribution table values subtracted from one.10
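The calculation of the percentage below the LL can be illustrated as follows. This sketch replaces the lookup in a cumulative probability table with the normal distribution function computed from `math.erf`; the function names and the example mean, SD, and LL are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percent_below_ll(observed_mean: float, observed_sd: float, ll: float) -> float:
    """Expected percentage of a normally distributed group below a given LL.

    z is the distance of the LL below the observed mean in SD units;
    the percentage is 100 x (1 - cumulative probability at z).
    """
    if observed_sd <= 0:
        raise ValueError("SD must be positive")
    z = (observed_mean - ll) / observed_sd
    return 100.0 * (1.0 - normal_cdf(z))


# Hypothetical example: observed mean 4.20 l, SD 0.55 l, reference LL 3.60 l.
print(f"{percent_below_ll(4.20, 0.55, 3.60):.1f}% below the LL")
```

When the reference LL sits exactly 1.645 SD below the observed mean, the function returns about 5%, which is the target stated in the text; an LL set too high returns a larger percentage.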

The percentage predicted values calculated as the ratio between observed and predicted lung function, which corresponded with a one sided 95% CI and the 5th percentile, were also determined for 10 year age categories. The 5th percentile of a distribution of lung function value is the lung function value such that 5% of subjects have lower values and 95% of the subjects have higher values. The 5th percentile is estimated by a non-parametric method from order statistics.
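A minimal sketch of the percentage predicted and of a non-parametric 5th percentile is given below. The text does not state which order statistic convention was used, so the rule here (the smallest ordered value with at least 5% of observations at or below it) is an assumption, as are the function names and the example data.

```python
def percent_predicted(observed: float, predicted: float) -> float:
    """Observed lung function as a percentage of the predicted value."""
    if predicted <= 0:
        raise ValueError("predicted value must be positive")
    return 100.0 * observed / predicted


def fifth_percentile(values) -> float:
    """Non-parametric 5th percentile taken directly from the order statistics.

    Assumed convention: return the k-th smallest value, where
    k = ceil(0.05 x n), computed with integer arithmetic.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no values supplied")
    k = max(1, (5 * n + 99) // 100)  # ceil(5n / 100)
    return ordered[k - 1]


# Hypothetical example: percentage predicted values 1, 2, ..., 100.
print(fifth_percentile(range(1, 101)))
print(f"{percent_predicted(3.2, 4.0):.1f}%")
```

Other conventions, such as interpolating between neighbouring order statistics, give slightly different values in small samples but converge in groups as large as those studied here.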

## Results

Figure 1 shows the age distribution of the miners in the study. Table 1 gives the mean lung function and the estimated prediction equations (regression coefficients (SEs)). Because most of the subjects were 30–45 years of age, the contributions of the age^{2} and height×age interaction to the variation explained by the model were small (the additional *R*^{2} of 0.0002 and 0.0004, respectively, for FEV_{1}). Thus, the final prediction equations included age and height terms only.
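To make the final age and height equations concrete, the sketch below evaluates them for a 40 year old, 1.7 m tall man, using the coefficients reported in the abstract. The class and dictionary names are illustrative, and 0.030 l per year is assumed for the FEV_{1} age coefficient in black men, in line with the magnitude of the other age coefficients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearEquation:
    intercept: float    # litres
    age_coef: float     # litres per year of age
    height_coef: float  # litres per metre of height

    def predict(self, age_years: float, height_m: float) -> float:
        """Predicted lung function (litres) for a given age and height."""
        return (self.intercept
                + self.age_coef * age_years
                + self.height_coef * height_m)


# Coefficients as reported in the abstract (age in years, height in metres).
EQUATIONS = {
    ("black", "FVC"): LinearEquation(-2.901, -0.025, 4.655),
    ("white", "FVC"): LinearEquation(-4.407, -0.036, 5.940),
    # Age coefficient assumed to be 0.030 l/year (see the note above).
    ("black", "FEV1"): LinearEquation(-1.654, -0.030, 3.665),
    ("white", "FEV1"): LinearEquation(-2.341, -0.038, 4.314),
}

for (group, measure), equation in EQUATIONS.items():
    value = equation.predict(age_years=40, height_m=1.7)
    print(f"{group:5s} {measure:4s} {value:.2f} l")
```

Because the published coefficients are rounded, differences between the groups computed this way need not reproduce the reported 220 ml and 110 ml exactly.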

For the black miners, figure 2 shows the observed height standardised lung function means for FVC, FEV_{1}, and FEV_{1}%, and the two sided 95% CI, plotted against age. The predicted curve and the best fitting reference curves, and their respective LLs are shown for each lung function test. The reference curve of Louw *et al*5 (measurements done by a vitalograph) provided the best fit to the data, whereas the ECCS curves were too high. The ECCS reference values multiplied by a conversion factor (CF) of 0.93 for FVC and 0.95 for FEV_{1} were close to the predicted curves. For the white miners, figure 3 shows the observed mean values and the reference equations of the ECCS and Knudson *et al*,9 which provided the best fit to the data.

Figure 2 indicates that the percentage of black miners that falls below the LL of the ECCS curve is large for younger age groups, but decreases for older age groups. However, a somewhat different trend is found for white miners (figure 3), where the observed means start to decline after 40 years of age more rapidly than the ECCS and Knudson's predictions. To determine the percentage of subjects that were below the LLs, we used the cross section of 30–35 year old miners, because for these miners the observed means corresponded closely with the reference curves (figs 2 and 3) and this subgroup is least likely to be affected by systematic effects already discussed. The percentages that were below the LLs for the different reference curves for the 30–35 year olds are shown in figure 4. Figure 4 shows the frequency distribution for the height standardised FVC, FEV_{1}, and FEV_{1}%. Superimposed over each observed frequency distribution is a normal curve derived from the observed mean (SDs). The normal curves derived from the means predicted by the equations from the ECCS and Louw *et al* and from their respective SEE values are also shown in figure 4. (The SEE for the curve of Louw *et al* was available only for models in which the ratio of sitting over standing height was fitted. As the SEE was similar to the models which used standing height only, we used this SEE value.) The percentages below the LLs are shown in table 2. Table 2 presents, for the 30–35 year olds, the descriptive statistics for the observed lung function and the LLs, and the percentage of subjects below the LLs, for each equation. Louw *et al*5 did not provide a reference equation for the FEV_{1}%.

The first part of table 3 presents descriptive statistics for the percentage predicted lung function values. The second part of table 3 presents the percentage predicted values that were derived from our prediction equation, the equations of Louw *et al*, and ECCS equations, and that corresponded with the 5th percentile and with the one sided lower 95% CI for 10 year age categories, for black miners only.

## Discussion

The objective of the study was to identify reference equations appropriate for the large population of South African gold miners who are predominantly black, but with a notable white subpopulation. The question also arises whether a common reference equation can be used for all the miners irrespective of race, bearing in mind that the reference curves could serve various purposes—such as assessment of fitness to work in certain job categories, diagnosis of compensable disease or an early detection of accelerated loss of lung function.

We estimated prediction equations from the data obtained in a lung function screening programme and compared these with published reference values. There were large differences in the observed mean lung function values between white and black miners. These were 860 ml (17.3%) for FVC, 630 ml (14.9%) for FEV_{1}, −180 ml (−6.8%) for FEF_{25-75%}, and −2.38% (−2.9%) for FEV_{1}% (table 1). However, after adjustment for age and height, the differences for a 40 year old 1.7 m tall man decreased substantially to 220 ml (5.2%) for FVC, 110 ml (3.2%) for FEV_{1}, −160 ml (−4.3%) for FEF_{25-75%}, and −1.6% (−2.0%) for FEV_{1}%. The results show that height is responsible for a large percentage of the observed difference; however, the remaining differences after adjustment for age and height were still significant.

Of the reference equations published by ATS,4 5 the ECCS and Knudson equations corresponded most closely with the prediction curves for white miners (fig 3). However, the observed data showed a steeper decline from about 35 years of age. This may be due to the effect of smoking and exposure to dust previously documented in white gold miners.11 The observed FEV_{1}% values were higher, however, than the ECCS values. The percentages of miners that were below the LLs of the ECCS and Knudson equations were similar and varied around 5%. For the 30–35 year olds, the percentage of subjects below the LL for the ECCS equation was 5.3% for FVC and 6.6% for FEV_{1}, whereas for Knudson's equation it was 3.0% and 4.6%, respectively (table 4).

For black miners, the reference equations that fitted best were those derived for black South African men by Louw *et al*5 with a vitalograph (fig 2). The ECCS reference equations are commonly used in South Africa for all subjects, either with an ethnic conversion factor of 0.88 for black men (as recommended by ATS4), or without a conversion factor. The data from the present study show that the ECCS reference values are much higher than the observed lung function values, the difference being almost 500 ml at 20 years of age for FVC (figure 2). The percentages below the LLs for equations of Louw and the ECCS, shown in figure 3 and table 2, are also substantially different. For the Louw equation, the percentage rejected was 6.6% for FVC and 3.8% for FEV_{1}, whereas for the ECCS equation the corresponding values were 14.9% and 12.5%. The ECCS reference values scaled by a factor of 0.93 for FVC and 0.95 for FEV_{1} provided the closest fit to the observed data and the predicted curves (fig 1). The scaling down by 8% for FVC and 5% for FEV_{1} was higher than the differences found for the age and height adjusted lung function means, which were 5.2% for FVC and 3.2% for FEV_{1}. This reflects the fact that white miners had lower observed values than the ECCS curves because of a steeper decline with age.

The differences in the regression coefficient for age in table 1 show a steeper decline with age in white miners, especially for FVC. The predicted curves for FVC and FEV_{1}, and the LLs were parallel with the reference curves of Louw *et al* derived from asymptomatic, healthy non-smokers (fig 2), but the white workers had a steeper slope than the ECCS curve. One of the reasons for these differences may be that fewer black miners smoke, and if they do so, then they smoke fewer cigarettes (five cigarettes a day) than white miners (over 20 cigarettes a day). It is interesting to note that for FVC, the curve of Louw *et al* as well as the predicted curve came closer to the ECCS curve (fig 2) with increasing age. This reflects the steeper slope with age for the ECCS curves. The reason for the steeper slope for the ECCS curve is not clear.

The results also show that black miners have higher FEV_{1}%, but slightly higher decline in FEV_{1}%, compared with the ECCS curve (fig 2). For 30–35 year old black miners, the observed mean FEV_{1}% was 85.5%, the observed one sided 95% CI was 75.5%, and the LL derived from all data was 74.7% (table 2). By comparison, the predicted FEV_{1}% mean from the ECCS equation was 80.4% and the LL derived from the ECCS equation was 71.7%. For 30–35 year old white miners, the mean observed FEV_{1}% was 82.5% and the observed one sided lower 95% CI was 73.0% (table 2).

When compared with other published studies, the prediction equations for black gold miners agree most closely with those found in other studies of South African black men occupationally exposed to dust (table 4).12 13 For FVC, the predicted values for a 38 year old and 1.7 m tall male gold miner were similar to those found for textile workers,12 bank workers,5 and other industrial workers.12-14 This result suggests that the high ratio found in this study may be due to a real effect. However, the data in table 4 also suggest that the groups exposed to dust may have lower FEV_{1} values than the unexposed subjects, and that reference values estimated on healthy South African black men not exposed to dust should preferably be used for the miners.

By comparison with our study and other studies of black workers, a most recent study of black university workers from Johannesburg, South Africa,13 found similar FEV_{1} values, but the FVC values were systematically higher than those predicted by our and other studies; the estimated increase in FVC was substantial and ranged from 120 ml at 20 years to 200 ml at 60 years of age. The explanation provided for the higher FVC was the changing socioeconomic status of black workers in South Africa. The socioeconomic factors associated with FVC in the study were job category and education. The socioeconomic factors were not associated with FEV_{1}. Smoking was not found to be a risk factor for FEV_{1} or FVC. The effect of socioeconomic factors is also apparent from a secular trend found in FVC measured in different studies of urban black workers done from the 1970s to the 1990s, which ranged from 3.72–3.98 to 4.24–4.42, respectively.13 In our study, we found a strong relation between height and age for black and white miners (fig 5); although some of it was due to aging, there was an apparent cohort effect. Changing socioeconomic status, which includes nutrition from birth that results in increased height, stronger build, and lower respiratory morbidity, must play an important part in the respiratory variables of the black South African population15 whose socioeconomic status is generally improving with urbanisation and education. To take cognisance of that trend, estimation of reference values should be repeated every decade or so in accordance with the recommendations made by the ATS.

Based on these results and given the current circumstances, it seems that the available options are to use the ECCS equations on all miners, with a scaling factor for the black miners, or to use the ECCS equations on white miners and the equations of Louw *et al* on black miners. The curves of Louw *et al* provided an excellent fit to the data, whereas the ECCS curves were too steep with age. From an epidemiological view, the use of the ECCS reference equations without any scaling for all the miners is not recommended. Although the difference of 8% for FVC represents only about 300 ml, which seems to be insignificant in terms of a man's variability in lung function, the difference becomes important if applied systematically to many miners for selections such as job placement and compensation. Future developments in reference values should account for the curvature in the observed data with age (figs 2 and 3) and the changing socioeconomic status of the black population.

These statistical considerations assume that the distribution of lung function measurements is normal. The skewness statistics in table 2 indicate that the distribution for FVC has a tail to the right, the distribution for FEV_{1} is normal, and the distribution for FEV_{1}% has a tail to the left. This pattern is similar for black and white miners. Because of the skewness and variability in the distribution of the percentage predicted, the 5th percentile and 80% predicted were suggested as preferable criteria for the LL,9 and are often used in clinical practice.

The use of 80% predicted as a criterion for abnormal is not recommended for adults by the ATS.4 Although some studies have shown that 80% of predicted is close to the fifth percentile,9 other shortcomings inherent in the use of a fixed value of percentage predicted are that shorter, older subjects are more readily classified as abnormal, whereas taller, younger adult subjects are more likely to be erroneously classified as normal.4 In South Africa, it has been suggested that the grading of lung function impairment for compensation purposes should be based on percentage of predicted derived from the ECCS reference values.3 The percentage predicted categories of impairment for FEV_{1} and FVC were as follows: normal ⩾80%, 79%–65% for mildly impaired, 64%–51% for moderately impaired, and <51% for severely impaired. Here, we examined the percentage predicted values corresponding with the 5th percentile and the lower 95% CIs, for different reference equations, according to age categories.
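The grading scheme quoted above can be written as a simple lookup. This is an illustrative sketch only: the function name is hypothetical, and because the published categories are stated in whole percentages, the handling of fractional values (for example, 79.5% graded as mildly impaired) is an assumption.

```python
def impairment_grade(percent_predicted: float) -> str:
    """Grade FEV1 or FVC impairment from the percentage of predicted.

    Cut points follow the categories quoted in the text:
    normal >= 80%, mild 79-65%, moderate 64-51%, severe < 51%.
    """
    if percent_predicted < 0:
        raise ValueError("percentage predicted cannot be negative")
    if percent_predicted >= 80:
        return "normal"
    if percent_predicted >= 65:
        return "mildly impaired"
    if percent_predicted >= 51:
        return "moderately impaired"
    return "severely impaired"


# Hypothetical percentage predicted values.
for value in (92, 79, 64, 50):
    print(value, impairment_grade(value))
```

A fixed cut point of this kind takes no account of age or height, which is the shortcoming noted by the ATS.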

For FVC and FEV_{1}, the lower 95% CIs (or the LLs) for the percentage predicted corresponded with 80% for values derived from the prediction equation and the equation of Louw *et al*, but were lower for the ECCS equation (table 3). The corresponding percentage predicted based on the prediction equation started to decrease below 80% at 50 years of age for FVC and at 40 years of age for FEV_{1}. For FEV_{1}%, the corresponding percentage predicted started at around 89% for the youngest age category, and decreased to around 83% for the oldest age category. There was no important difference between the LL calculated as the 5th percentile or the 95% CI. The percentage predicted values corresponding to the LL depend on how well the reference equation fits the actual data, but even for the best fitting linear curve, the values were age related. Because subjects older than 50 years of age have a steeper decline in lung function than the predicted values based on a linear regression curve, the 80% criterion might not be appropriate for the older subjects.

Figure 5 shows strong relations between height and age, which is likely to be due to a cohort effect. Correlation between predictor variables (age and height) violates the linear regression assumption of independently distributed explanatory variables and could lead to biased estimates for age and height, in particular an overestimated effect of age. In such a case, the variable for age may not be suitable for predicting longitudinal loss of lung function with years of age.

In summary, of the reviewed reference equations,5 the ECCS and Knudson equations best fitted the data for white miners. Although the two reference curves were similar with respect to the percentage of miners below the LL, the LLs derived from the ECCS equations were slightly better in terms of rejecting 5% of the subjects. For black miners, the reference equation of Louw *et al* estimated on healthy South African men fitted best, whereas the ECCS reference equations overestimated the percentage of subjects that were below the LL by 10% for FVC and 8% for FEV_{1}. The ECCS reference values scaled by a factor of 0.93 for FVC and 0.95 for FEV_{1} provided a reasonably good fit to the data from black miners; however, the slope for FVC was too steep. The ECCS equations, with a conversion factor of 0.88 for the black miners, are available on most spirometers sold in South Africa. However, the results of the present study show that the correction factor of 0.88 is too low and also that the ECCS curve for FVC is too steep to fit well.

Because of the strong correlation between height and age, the estimated regression coefficients for age may not be suitable for estimating the longitudinal loss due to years of age. The steep trend between height and age indicates that there is a strong cohort effect, and this supports the ATS recommendations that estimation of reference values should be repeated every decade or so.

## Acknowledgments

We thank the mines for letting us use the lung function screening data and Ms Tanusha Singh for her computing support. The study received partial support from the Safety in Mining Research Advisory Committee (SIMRAC).