Article Text

Download PDFPDF

Exposure–response relationship between lung cancer and polycyclic aromatic hydrocarbons (PAHs)
  1. B G Armstrong1,
  2. G Gibbs2,3
  1. 1
    London School of Hygiene and Tropical Medicine, London, UK
  2. 2
    Safety Health Environmental International Consultants, Devon, Alberta, Canada
  3. 3
    Department of Medicine, University of Alberta, Edmonton, Alberta, Canada
  1. Correspondence to Ben G Armstrong, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK; ben.armstrong{at}


Objectives: To estimate the exposure–response function associating polycyclic aromatic hydrocarbon (PAH) exposure and lung cancer, with consideration of smoking.

Methods: Mortality, occupational exposure and smoking histories were ascertained for a cohort of 16 431 persons (15 703 men and 728 women) who had worked in one of four aluminium smelters in Quebec from 1950 to 1999. A variety of exposure–response functions were fitted to the cohort data using generalised relative risk models.

Results: In 677 lung cancer cases there was a clear trend of increasing risk with increasing cumulative exposure to PAH measured as benzo(a)pyrene (BaP). A linear model predicted a relative risk of 1.35 (95% CI 1.22 to 1.51) at 100 μg/m−3 BaP years, but there was a significant departure from linearity in the direction of decreasing slope with increasing exposures. Among the models tried, the best fitting were a two-knot cubic spline and a power curve (RR = (1+bx)p), the latter predicting a relative risk of 2.68 at 100 μg/m−3 BaP years. Additive models and multiplicative models for combining risks from occupational PAH and smoking fitted almost equally well, with a slight advantage to the additive.

Conclusion: Despite the large cohort with long follow-up, the shape of the exposure–response function and the mode of combination of risks due to occupational PAH and smoking remains uncertain. If a linear exposure–response function is assumed, the estimated slope is broadly in line with the estimate from a previous follow-up of the same cohort, and somewhat higher than the average found in a recent meta-analysis of lung cancer studies.

Statistics from

Airborne polycyclic aromatic hydrocarbons (PAHs), which are emitted when organic matter is burned, are ubiquitous in the occupational and general environment. It has long been known that several PAHs can produce cancers in experimental animals, and epidemiological studies of exposed workers, especially in coke ovens and aluminium smelters, have shown clear excesses of lung cancer, and highly suggestive excesses of bladder cancer.1 2 3 4 5 6 7 8 9 10 11 Although the existence of a cancer risk is beyond reasonable doubt, there is considerable uncertainty as to the exposure–response relationship, and hence as to the risks posed at today’s levels in the workplace and general environment. Information on this relationship is clearly important to inform the setting of occupational and environmental standards.

What this paper adds

  • Polycyclic aromatic hydrocarbons (PAHs) cause lung cancer, but quantification of the relationship is uncertain.

  • Data from a large cohort of aluminium smelter workers showed clear evidence of increasing lung cancer risk with increasing cumulative exposure to PAHs, allowing for the effects of smoking.

  • A linear model predicted a relative risk of 1.35 (95% CI 1.22 to 1.51) at 100 μg/m−3 BaP years, although there was evidence for departure from linearity in the direction of decreasing slope with increasing exposures.

Estimating exposure–response relationships by extrapolation from animal studies is possible,12 13 but the limitation of this approach, in particular as regards species differences, makes sole reliance on it problematic. Previous estimates of exposure–response (PAH measured as benzo(a)pyrene, BaP) from epidemiological data include some using data from a large cohort of coke-oven workers in the USA followed since the 1960s,14 15 which has also been used to estimate risk per unit residential exposure.16 17 A meta-analysis of 39 studies extracted or estimated linear exposure–response slopes from each study. This found broad agreement in slopes in coke ovens and aluminium smelters, but heterogeneity otherwise. Only four of these studies had adjusted for smoking.

Several studies of mortality and cancer incidence have been published of substantially overlapping cohorts of Quebec aluminium smelter workers.1 11 18 19 20 21 These have included an investigation of exposure–response relationships for lung1 and bladder cancer.21 The mortality cohort studied by Gibbs and Horowitz18 has recently been followed up for a further decade and extended, new exposure estimates made22 and basic results published.23 24 25 Here we report estimates of exposure–response relationships for lung cancer using data from this updated cohort of Quebec aluminium workers.


Details of methods for identification and follow-up of the cohort, and for estimation of individual exposure and smoking histories have been published.22 23 25 Here we give a summary.

The cohort comprised 16 431 persons: 5977 men working at one very large and two smaller aluminium smelters in Quebec, Canada on 1 January 1950 (1951 for one small plant), and 9726 men and 728 women who started at any time and had completed 1 year’s employment before the end of 1989. Mortality was followed up until the end of 1999. A job-exposure matrix was constructed comprising estimates of exposure to BaP for each job, allowing for changes over time.22 This was combined with work histories to give estimates of cumulative exposure to BaP in μg/m3 for each year of employment. The mortality and exposure history data were then analysed in the first place using a subject-years approach drawing on age-time specific reference rates from Quebec province, using the OCMAP computer program.26

Smoking data were compiled from medical records and previous research. While considerable data were available, they were not considered adequate to reliably construct full lifetime smoking histories for all cohort members. Workers were classified as ever smokers, never smokers and unknown smokers. Ever smokers were further classified by number of cigarettes they reported smoking per day.

For the analyses reported here, we worked with a data matrix comprising lung cancer deaths, person-years and expected deaths classified by grouped cumulative exposure to BaP (lower bin boundaries: 0, 0.0000001, 20, 40, 80, 160 and 320 μg/m3 years), calendar time and age (5-year intervals), time from first exposure (boundaries: 0, 10, 20, 30, 40, 50, 60 years), decade of starting employment (1940s, 1950s, 1960s, 1970s and 1980s) and cigarette smoking history (boundaries 0, 1, 21, 41, 61 cigarettes/day and unknown). This was output from the OCMAP program. We excluded deaths (n = 1) and person-years occurring in the first 10 years from first employment, on the grounds that no risk from exposure would be likely in that time.

We analysed the data using relative risk Poisson regression models. Exploratory analyses (details not shown) compared results controlling for confounding by age and calendar time effects by offsetting the log of expected deaths with those using wholly internal adjustment using age and calendar time model terms. These showed no evidence that using expected deaths distorted comparison of risk within the cohort, so for simplicity and improved precision we concentrate on the models offsetting expected deaths. We concentrate on clarifying the shape of the association, and the manner by which risks due to cigarette smoking and occupational PAH exposure combine. Primary models used are described formally in the text box, and informally in the text below.

Box 1 Primary models used

Poisson model for data row with d observed and e expected deaths, cumulative exposure x and smoking s:

  • d∼Poisson(μ)

  • μ = e.RR(x,s)

Models of the occupational BaP–cancer relationship (initially ignoring smoking):

  • Grouped: RR(x) = RRi; i = 2−7, RR1 = 1

  • Exponential: RR(x) = exp(bex)

  • Linear: RR = 1+b1x

  • Natural cubic spline:

    • One interior knot at median

    • One interior knot by maximum likelihood

    • Two interior knots at tertiles

    • Two interior knots by maximum likelihood

  • Power: RR(x) = (1+bpx)p

Models for the combined effect of BaP and smoking (groups s):

  • Multiplicative: RR(x,s) = RR(x).RR(s)

  • Additive excess: RR(x,s) = 1+(RR(x)−1)+(RR(s)−1)


  • RR(x) = grouped, exponential, linear or two-knot (maximum likelihood) spline

  • RR(s) = 1 for never-smokers (RR1) and RRs for s = 2−6.

where x is BaP in μg/m3 years (mid-point of group interval) and s is smoking (1 = never, 2 = 1−19, 3 = 20−39, 4 = 40−59, 5 = 60− cigarettes/day, 6 = unknown).

The Poisson model is standard for exploring the dependence of cancer or other disease rates on putative determinants. Beyond the specification of this model, the top part of box 1 lists models used to characterise the shape of the BaP–cancer relationship. The grouped (categorical) model is the usual characterisation of relative risks according to exposure groups (“step functions”). The exponential and linear models are the two most commonly used “off the self” mathematical shapes. Natural cubic splines are more general, allowing a wide variety of shapes to be represented.27 They comprise cubic polynomials between pre-set “knots”, but are constrained to meet and have equal slopes at the knots, and to be linear beyond the range of the data. In general, setting more knots allows more complex curves, but placing of the knots has implications for where the greatest curvature can be. Following standard practice, we have always included “exterior” knots at the minimum and maximum values. To increase the range of shapes considered, we fit splines with one and with two interior knots each with two knot placing methods: (a) at equal quantiles and (b) where they maximised the likelihood. Finally, the power curve was considered as a parsimonious alternative way of representing a curve with slope greatest at zero exposure and declining (although always positive) as exposure increases. Our broad strategy was thus that advocated by Steenland27: we explored shapes with a minimum of assumptions using spline curves, but sought also more parsimonious models that fitted the data adequately.

To model the combined effect of occupational BaP exposure and smoking (bottom of box 1), we used the two standard forms: multiplicative and additive. A multiplicative model assumes that relative risk due to the two factors multiply, for example a smoker with RR(s) = 4 also exposed occupationally with RR(x) = 2 would have a combined risk relative to a non-smoker unexposed occupationally of RR = 2×4 = 8. In an additive model, the two excess relative risks (RR−1) are assumed to add to give the overall excess relative risk, so that the subject above would have a combined relative risk of 1+(2−1)+(4−1) = 5.

Because we wished to make minimum assumptions about the shape of the smoking–cancer association, all models included smoking as a grouped variable (step function).

To fit these models we used the Stata statistical package28 and EPICURE.29 The non-standard power curve was fitted by enumerating likelihood for a grid of parameter values and thus finding the maximum likelihood and likelihood profile confidence intervals. Similarly, for natural cubic splines the knots maximising likelihood were obtained by computing likelihood over a grid (5 μg/m3 year resolution).

We compared models by comparing deviances (−2×log likelihood) and Akaike’s information criterion (AIC), a measure that incorporates deviance along with a complexity penalty equal to twice the number of model parameters (AIC = deviance+2p). Low deviances and low AIC indicate better-fitting models. We also report dispersion (Pearson χ2 statistic over the residual degrees of freedom), which takes the value one or less in a model in which variation of deaths is no more than expected by chance. Dispersion over one is thus referred to as over-dispersion.


Preliminary grouped analyses

There were 677 lung cancer deaths in the cohort. Standardised mortality ratio (SMRs) by cumulative exposure to BaP are shown in table 1 and fig 1A. The SMRs show an upward trend with increasing exposure, but there are exceptions to this. The third group (20–39 μg/m3 years) is out of line with its neighbours. There is little increase over the highest three exposure groups. The small zero-exposure group has a significantly low SMR.

Figure 1

Graphs of lung cancer risk by grouped occupation PAH exposure and by smoking. BaP, benzo(a)pyrene; PAH, polycyclic aromatic hydrocarbon; RR, relative risk; SMR, standardised mortality ratio.

Table 1

Lung cancer cases and risks by exposure groups

Internal Poisson regression analyses showed the same pattern (fig 1B) and also a clear (although not monotonic) pattern of higher lung cancer rate with higher reported number of cigarettes (fig 1C). However, the correlation between number of cigarettes smoked (including non-smokers as zero) with cumulative occupational exposure to BaP was small (r = 0.02). Consequently, RRs by BaP adjusted for smoking (in the standard multiplicative model) were little different to unadjusted RRs (fig 1D).

Mathematical models

Models ignoring smoking

We investigated the shape of the exposure–response association by comparing goodness-of-fit as measured by model deviance in several different models. Results are shown in table 2 and illustrated in fig 2. Of the linear models, the linear fitted better than the exponential, but neither fitted well either by comparison to the “groups” model allowing each group to have a distinct RR, or with reference to its dispersion. One-knot natural cubic splines at either the median (10 μg/m3) or fit by maximum likelihood (5) fit little better. Two-knot splines, in particular that with knots fit by maximum likelihood (25 and 35) improved fit, although over-dispersion remained substantial. The power curve fitted slightly less well than the best two-knot spline, even allowing for its fewer parameters (AIC).

Figure 2

Fitted mathematical BaP–lung cancer exposure–response curves, super-imposed on un-smoothed risk estimates for each group. The left-hand y axes show the SMR scale, and the right hand y axes show the relative risk scale, which by definition is equal to one at zero exposure. Horizontal lines are drawn at SMR = 1 and RR = 1. The three lines depict the estimated curve with 95% CI. BaP, benzo(a)pyrene; RR, relative risk; SMR, standardised mortality ratio.

Table 2

Comparisons of models for BaP versus lung cancer (ignoring smoking)

The graphs in fig 2 show the fitted curves alongside the un-smoothed mortality estimates by grouped exposure. The figures show that the better-fitting functions have slopes which decrease as exposures increase (ie, supra-linear shape), but also locate the difficulty in finding a well-fitting smooth curve in the up-down-up pattern over groups 2–5.

Models incorporating smoking

Confounding was not an important concern (see above). However, we wished to clarify the joint effect of the two risk factors. Did they combine additively, multiplicatively or otherwise? Results are shown in table 3. Because the data are less aggregated here than for the analyses ignoring smoking, the deviances and AIC values cannot be directly compared to those in table 2, but it is reasonable to compare patterns of these values (for example the relative performance of exponential versus linear models).

Table 3

Comparisons of models for BaP and smoking versus lung cancer

The most robust comparison of the multiplicative and additive combination of risks is obtained by focusing on the groups exposure rows of table 3 (which make no assumption as to shape of exposure–response). Here the additive and multiplicative models differ only very slightly in deviance (hence AIC), with the additive model favoured, but not by enough to exclude chance as a cause. This pattern is apparent for the various smooth exposure–response models also.

The comparison of fit between mathematical forms of exposure–response (with either multiplicative or additive combination of risks) was much the same as for the analyses ignoring smoking. The values of the dispersion parameter were much lower than for the analyses ignoring smoking, and close to one for better fitting models. However, lower dispersion is expected in sparser data, for which natural Poisson variation inevitably comprises a bigger part of total variation.

The small size of differences in the parameters of the mathematical exposure–response functions between the multiplicative smoking-occupational BaP models and the models ignoring smoking confirm the lack of important confounding. The substantial differences between these and the parameters of the same exposure–response forms in the additive models reflect their different meaning in this context. Here the baseline risk from which they model departure is that in never-smokers. Because this baseline is low, the absolute values of the parameters in the additive models are higher.

Latency and time period effects

In this cohort, exposure peaked in the 1940s–1970s, and the information (cancers) from which we can infer exposure–response was overwhelmingly in years-at-risk from the 1980s and 1990s. This is illustrated in table 4, which shows expected lung cancers (effectively the information available) by year and time from first exposure. Although the follow-up period is considerably greater than in most epidemiology studies (ie, 50 years), this concentration of information in a narrow range of calendar time and time from exposure still limits the extent to which it is possible to investigate effects of these factors on exposure–response.

Table 4

Expected lung cancer deaths by calendar year and time from first exposure

We carried out a limited investigation of these factors, estimating slopes of a linear relative risk model for various strata of data according to time from first exposure and year. Results are shown in table 5. The linear model was chosen for simplicity despite its poor fit in unstratified analyses, to give a broad idea of the potential information in each stratum for further investigations. Either by calendar year or time from first exposure, all but the sparse lowest strata showed independent evidence for an association of lung cancer with occupational BaP. However, no patterns of slope were apparent beyond the imprecision indicated by the width of the confidence intervals.

Table 5

Slopes of fitted linear exposure–response relationships after stratifying by calendar year and time from first exposure


We set out answer two questions. What is the shape of the PAH–cancer exposure response? How do risks from occupational PAH and smoking combine? Unfortunately, despite this being the largest data set (most cases) analysed in relation to either of these questions to date, and the exposure estimates painstakingly researched, the answer to both of these questions was equivocal.

The shape of the exposure–response function clearly departed from linear, in the direction of lower slope at higher cumulative exposure. This has been observed in occupational cohorts exposed to some other pollutants, notably arsenic.30 This could be a real effect, for example due to saturation of binding sites for PAH metabolites on the carcinogenic pathway. However, some artefacts could also be responsible. We discuss some candidates in the next two paragraphs.

Despite the thoroughness of the exposure ascertainment,22 some errors are inevitable. As well as possible misclassification of BaP exposure, variation in the carcinogenic potency of the mixture per unit BaP over different occupations in the study would be a source of measurement error. BaP is frequently used as an index of PAH exposure because BaP, as well as being an important carcinogen itself, is usually quite highly correlated with the other known carcinogens in the PAH mixture.31 However, the correlation is not perfect. Random measurement error generally attenuates the slope of an association, but there can also be effects on shape. Multiplicative measurement error, whereby precision on an absolute scale and hence attenuation is greater at higher levels, would lead to a supra-linear shape of exposure–response even if the true shape were linear.32 33 The use for modelling of mid-points of exposure groups would also have introduced measurement error, although in our seven exposure groups between-group variation is much greater than within-group variation, so the impact of this error seems unlikely to be great. Also, grouping error is of Berkson type and consequently less likely to give rise to important bias in estimated regression parameters.31

We have considered exclusively cumulative exposure, and thus have not explored alternative ways in which duration and concentration of exposure might combine. Our limited exploration of latency and calendar time did not reveal patterns suggesting that cumulative exposure was a poor index, but we would not have been able to distinguish the fit of subtly different indices. We did not examine “lagged” cumulative exposures (which would be time consuming given the way that the data are organised), but would expect such analyses to have a similar limitation to those by time from first exposure due to lack of variation of time patterns of exposure in this cohort. This was a rather stable cohort, with those starting before 1970 (the more informative component due to long follow-up) on average starting at age 24 (interquartile range 20–29 years) and working for 33 years (IQR 25–38). The results we give for cumulative exposure can be therefore considered with least assumptions to be experienced over this duration (eg, 100 μg/m3 years = 3.3 μg/m3 over 30 years).

Uncertainty on the shape of the relationship translates to uncertainty on the magnitude of risk at given levels of exposure, especially at low exposures. For example, at 100 μg/m3 BaP years, the linear and power curve models predict relative risks of 1.35 and 2.68, respectively. If a linear relative risk model is assumed, a comparison of slope estimated in this study can be made with previous estimates: 0.0035(0.0022–0.0051) per μg/m3 year compared with 0.0028(0.0011–0.0051) from 338 cases in an earlier follow-up of the same cohort,1 and 0.0020(0.0011–0.0029) from a meta-analysis of 39 cohorts from a variety of exposure settings.34

The equivocal result on the mode of combination of risks from occupational exposure and smoking is not unusual in occupational epidemiology, but was disappointing given the large number of cases and substantial excess risks in this cohort. It may reflect a combination of risks intermediate between additive and multiplicative. It may also, however, reflect limitations in the smoking histories. Although the smoking data were unusually good for a cohort going back so long, they were approximate. In particular, changes in smoking habit during the life of an ever-smoker were not captured. The absence of a smooth monotonic increase in risk by number of cigarettes per day adds to concerns that this information may be subject to important error. Error in a putative modifier of an association is known to obscure such modification if present.32

We are confident, however, that the information on smoking is sufficient to exclude important confounding by smoking, because adjusting for even error-prone smoking data would be expected in such cases to cause some change in the association of interest.

There are some study limitations that might affect both focuses of our analyses. Working in these aluminium smelters entails exposures in some workers to other hazards. Heat, carbon monoxide, fluorides and alumina dust are prominent among them,19 but these are not suspected lung carcinogens. There is likely to have been some exposure to asbestos, but there seems to be no reason to assume that this would have been correlated with PAH exposure. Residual confounding by other factors and the differential action of a healthy worker effect – although less likely for cancer – could also play a part. Errors in exposure have been discussed. We see little reason to expect much misclassification of outcome, since lung cancer is an easily diagnosed cause of death, and follow-up of mortality was good.23

In conclusion, we have analysed the PAH–lung cancer exposure–response relationship in a large cohort with long follow-up. Despite this, the shape of the exposure–response function and the mode of combination of risks due to occupational PAH and smoking remain uncertain. If a linear exposure–response function is nevertheless assumed, the estimated slope is broadly in line with the estimate from a previous follow-up of the same cohort, and somewhat higher than the average found in a meta-analysis of B(a)P exposed populations in a variety of industries.



  • See Commentary, p 716

  • Funding This research was carried out with the financial support and full co-operation of Alcan aluminium smelters.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.

Linked Articles