Smoothing in occupational cohort studies: an illustration based on penalised splines
  E A Eisen (1), I Agalliu (2), S W Thurston (3), B A Coull (4), H Checkoway (5)

  1. Occupational Health Program, Harvard School of Public Health, Boston; Department of Work Environment, School of Health and Environment, University of Massachusetts Lowell, Lowell, MA, USA
  2. Department of Work Environment, University of Massachusetts Lowell, Lowell, MA, USA
  3. Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY, USA
  4. Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
  5. Department of Occupational and Environmental Health Sciences, University of Washington School of Public Health, Seattle, WA, USA
Correspondence to: Prof. E A Eisen, Occupational Health Program, Harvard School of Public Health, 665 Huntington Avenue, Boston, MA 02115, USA


Aims: To illustrate the contribution of smoothing methods to modelling exposure-response data, Cox models with penalised splines were used to reanalyse lung cancer risk in a cohort of workers exposed to silica in California’s diatomaceous earth industry. To encourage application of this approach, computer code is provided.

Methods: Plots of hazard ratios as smooth functions of exposure were used to evaluate the sensitivity of the curve to the amount of smoothing, the length of the exposure lag, and the influence of the highest exposures. Trimming and data transformations were used to down-weight influential observations.
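
The lagging and trimming steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the annual exposure records, and the subject values are hypothetical, and the lag is assumed to discard the most recent whole years of exposure before a given follow-up year.

```python
from typing import List, Sequence


def lagged_cumulative_exposure(
    annual_exposure: Sequence[float], year_index: int, lag_years: int
) -> float:
    """Cumulative exposure at the end of follow-up year `year_index` (0-based),
    ignoring exposure accrued in the most recent `lag_years` years."""
    if lag_years < 0:
        raise ValueError("lag_years must be non-negative")
    # Number of leading years whose exposure still counts after lagging.
    counted_years = year_index + 1 - lag_years
    if counted_years <= 0:
        return 0.0
    return float(sum(annual_exposure[:counted_years]))


def trim_highest(exposures: Sequence[float], n_drop: int) -> List[int]:
    """Indices of the subjects retained after dropping the `n_drop`
    subjects with the highest exposure (original order preserved)."""
    if n_drop < 0:
        raise ValueError("n_drop must be non-negative")
    order = sorted(range(len(exposures)), key=lambda i: exposures[i])
    kept = order[: max(len(order) - n_drop, 0)]
    return sorted(kept)


# Hypothetical annual exposure history for one worker (arbitrary units).
annual = [1.0, 2.0, 0.5, 0.5, 3.0]
unlagged = lagged_cumulative_exposure(annual, year_index=4, lag_years=0)
lagged_2y = lagged_cumulative_exposure(annual, year_index=4, lag_years=2)

# Hypothetical cumulative exposures for five subjects; drop the two highest.
cohort = [0.2, 9.0, 1.5, 12.0, 0.7]
retained = trim_highest(cohort, n_drop=2)
```

Refitting the model on the lagged exposures, or on the retained subjects only, and comparing the resulting curves is the kind of sensitivity check the Methods describe.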

Results: The estimated hazard ratio increased steeply with cumulative silica exposure before flattening and then declining over the sparser regions of exposure. The curve was sensitive to changes in degrees of freedom, but insensitive to the number or location of knots. As the length of the lag increased, so did the maximum hazard ratio, but the shape of the curve remained similar. Deleting the two most highly exposed subjects eliminated the top half of the exposure range and allowed the hazard ratio to continue to rise. The shape of the splines suggested that a parametric model with log hazard as a linear function of log-transformed exposure would fit well.
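
The parametric form suggested by the splines can be written out explicitly. The following is a sketch under stated assumptions: x denotes cumulative exposure, x_0 > 0 is an arbitrary reference exposure, h_0(t) is the baseline hazard, and β is the coefficient on log exposure. The abstract does not report β or say how zero exposures were handled before taking logs.

```latex
\log h(t \mid x) = \log h_0(t) + \beta \log x
\quad\Longrightarrow\quad
\mathrm{HR}(x; x_0) = \frac{h(t \mid x)}{h(t \mid x_0)} = \left(\frac{x}{x_0}\right)^{\beta}
```

Under this form the hazard ratio is a power function of exposure. For 0 < β < 1 it rises steeply at low exposure and flattens at higher exposure, without turning downward.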

Conclusions: This flexible statistical approach reduces the dependence on a priori assumptions, while pointing to a suitable parametric model if one exists. In the absence of an appropriate parametric form, however, splines can provide exposure-response information useful for aetiological research and public health intervention.

Abbreviations:

  • P-splines, penalised splines
  • HR, hazard ratio
  • RR, relative risk
  • GAM, generalised additive model
  • df, degrees of freedom
  • AIC, Akaike’s Information Criterion

Keywords:

  • Cox regression
  • exposure-response models
  • regression diagnostics



  • Supported by Grant CA81345-03 from National Cancer Institute
