Validity of empirical models of exposure in asphalt paving
  1. I Burstyn1,
  2. P Boffetta2,
  3. G A Burr3,
  4. A Cenni4,
  5. U Knecht5,
  6. G Sciarra4,
  7. H Kromhout1
  1. 1Division of Environmental and Occupational Health, Institute for Risk Assessment Sciences, Utrecht University, PO Box 80176, 3508TD Utrecht, Netherlands
  2. 2Unit of Environmental Cancer Epidemiology, International Agency for Research on Cancer, 150 cours Albert-Thomas, 69372 Lyon Cedex 08, France
  3. 3The National Institute for Occupational Safety and Health, 5555 Ridge Avenue, Cincinnati, Ohio 45213, USA
  4. 4Operative Unit of Industrial Hygiene—National Health Service, Strada del Ruffolo, 53100 Sienna, Italy
  5. 5Institute of Occupational and Social Medicine, Aulweg 129/ III, 35392 Giessen, Germany
Correspondence to: Dr H Kromhout, Environmental and Occupational Health Group, Institute for Risk Assessment Sciences, Utrecht University, PO Box 80176, 3508TD Utrecht, Netherlands


Aims: To investigate the validity of empirical models of exposure to bitumen fume and benzo(a)pyrene, developed for a historical cohort study of asphalt paving in Western Europe.

Methods: Validity was evaluated using data from the USA, Italy, and Germany not used to develop the original models. Correlation between observed and predicted exposures was examined. Bias and precision were estimated.

Results: Models were imprecise. Furthermore, predicted bitumen fume exposures tended to be lower (−70%) than concentrations found during paving in the USA. This apparent bias might be attributed to differences between Western European and USA paving practices. Evaluation of the validity of the benzo(a)pyrene exposure model revealed an effect of re-paving similar to that expected and an effect of tar use larger than expected. Overall, benzo(a)pyrene models underestimated exposures by 51%.

Conclusions: Possible bias as a result of underestimation of the impact of coal tar on benzo(a)pyrene exposure levels must be explored in sensitivity analysis of the exposure–response relation. Validation of the models, albeit limited, increased our confidence in their applicability to exposure assessment in the historical cohort study of cancer risk among asphalt workers.

  • bitumen
  • occupational exposure
  • reproducibility of results

Increasingly in occupational and environmental epidemiology, the quality of studies and their subsequent usefulness for risk assessors and regulators depend on the validity of their exposure assessment. This trend is partly a result of the recognition that most of the remaining unidentified health risks from occupational and environmental factors are likely to be low (relative risks of the order of 2–3) and can easily be missed because of misclassification of exposure.1 These weak associations, however, can have a profound public health impact if their causative agents are highly prevalent.1

Two examples illustrate the crucial role that exposure assessment plays in modern occupational epidemiology. Between 1976 and 1993, 20 studies of cancer risk among asphalt workers (mostly road pavers) were conducted, many of which suggested that this occupation entailed an increased lung cancer risk.2 However, these studies failed to differentiate between coal tar and bitumen (or asphalt, as it is known in the USA) exposures. As a result, this substantial 17 year research effort has proved to be of limited use in evaluating the carcinogenicity of the main agent to which asphalt workers are currently exposed, bitumen,3,4 hampering preventive measures such as the setting of scientifically based exposure limits and exposure controls.

Another example is that of exposure to electromagnetic fields and occupational cancer. A study's ability to detect an association between electromagnetic fields and increased cancer risks to a large extent depends on assumptions made in exposure modelling, emphasising the importance of validating exposure models.5–7 Consequently, analyses of sensitivity of risk estimates to assumptions made in exposure assessment are becoming an integral part of analysis of epidemiological studies.5–11 This paper addresses validation of exposure models for asphalt paving workers, the most numerous bitumen exposed group in an international historical cohort study of bitumen.

The International Agency for Research on Cancer (IARC) is coordinating a multicentre investigation of cancer among asphalt workers. The study is an industry based historical cohort assembled in seven European countries (Denmark, Finland, France, Germany, the Netherlands, Norway, Sweden) and Israel. Coal tar use has been progressively discontinued in Western Europe, making it possible to disentangle the effects of coal tar exposure from those of bitumen exposure. The overall aim of the exposure assessment for the study was to develop an exposure matrix that can be used to assess exposures to the agents of interest (either quantitatively or semiquantitatively) in a country, company, job, and time period specific manner. With this goal in mind, we developed statistical models of bitumen fume, organic vapour, and benzo(a)pyrene (as a representative of 4–6 ring polycyclic aromatic hydrocarbons) exposure during paving operations.12 Organic vapour and benzo(a)pyrene can originate from both bitumen and coal tar. These models were based on exposure data gathered from previously collected industrial hygiene measurements in the participating countries.13 Since then, additional exposure data from road paving operations were obtained, allowing us to validate the empirical models against these newly acquired measurements.


Methods

The mixed effects models evaluated in this paper have been described in detail elsewhere12 and are summarised in table 1. These models were aimed at identifying the factors that predict the exposure levels of road paving workers to bitumen fume and benzo(a)pyrene. They revealed a declining trend in exposures to bitumen fume and benzo(a)pyrene with time, 6% and 11% per year, respectively. Furthermore, differences in exposure levels were observed between different methods of paving. Coal tar use was shown to be the most important predictor of benzo(a)pyrene exposure, but the magnitude of this effect was somewhat less than that expected on the basis of laboratory studies. The differences in sampling and analytical methods, and in applied measurement strategies, were accounted for. There were no differences between comparable paving operations among countries.
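The annual declines quoted above compound over time. The following is a minimal sketch, assuming that the 6% and 11% figures act as constant year-on-year multiplicative changes; the function names and the 20 year horizon are illustrative and are not taken from the original models.

```python
import math


def relative_exposure(annual_decline: float, years_elapsed: float) -> float:
    """Exposure relative to a reference year after a constant annual decline.

    annual_decline is a fraction (0.06 for 6% per year). The constant-rate
    assumption is an illustration, not a fitted model coefficient.
    """
    return (1.0 - annual_decline) ** years_elapsed


def log_scale_slope(annual_decline: float) -> float:
    """Equivalent per-year slope on the natural-log scale."""
    return math.log(1.0 - annual_decline)


# Hypothetical 20 year horizon.
bitumen_fume_20y = relative_exposure(0.06, 20)  # about 0.29 of the reference level
bap_20y = relative_exposure(0.11, 20)           # about 0.10 of the reference level
```

Under this assumption, an exposure measured today would correspond to a roughly threefold higher bitumen fume level and a roughly tenfold higher benzo(a)pyrene level 20 years earlier, all other determinants being equal.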

Table 1

Multiple linear mixed effects models of bitumen fume (mg/m³) and benzo(a)pyrene (ng/m³) (adapted from Burstyn et al12)

The general model used to study the fixed and random effects is described by the following expression:

$$Y_{ij} = \mu + \beta_1 + \dots + \beta_n + \chi_i + \varepsilon_{ij} \qquad (1)$$

where:

Yij = natural logarithm of the exposure concentration measured on the jth day of the ith worker in presence of the β1 . . . βn determinants of exposure;

μ = true underlying mean of log transformed exposure averaged over all determinants of exposure;

β1. . . βn = fixed effects of the determinants of exposure;

χi = random effect of the ith worker;

εij = random within worker variation.

Variance components of the mixed effects models were estimated with the restricted maximum likelihood algorithm. The algorithm assumes that χi and εij are mutually independent and normally distributed with zero means and variances $\sigma^2_{BW}$ (between worker logarithmic exposure variance) and $\sigma^2_{WW}$ (within worker logarithmic exposure variance), respectively.
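To make the two variance components concrete, the sketch below estimates them for a balanced data set with no fixed effects. It uses the one-way analysis of variance (method of moments) estimator as a simple stand-in for restricted maximum likelihood, so it is an illustration of the decomposition and not the procedure used in the study; the function name and input layout are assumptions.

```python
from statistics import mean


def variance_components(log_exposures_by_worker):
    """Between- and within-worker variance of log exposures.

    log_exposures_by_worker maps a worker id to a list of log-transformed
    measurements. Assumes a balanced design (same number of repeats per
    worker) and no fixed effects; uses ANOVA estimators, not REML.
    """
    groups = list(log_exposures_by_worker.values())
    k = len(groups)     # number of workers
    n = len(groups[0])  # repeated measurements per worker
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 workers, each with the same number (>= 2) of repeats")

    grand_mean = mean(x for g in groups for x in g)
    worker_means = [mean(g) for g in groups]

    ms_between = n * sum((m - grand_mean) ** 2 for m in worker_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, worker_means) for x in g
    ) / (k * (n - 1))

    var_ww = ms_within
    # Negative method-of-moments estimates are truncated at zero.
    var_bw = max((ms_between - ms_within) / n, 0.0)
    return var_bw, var_ww
```

For example, `variance_components({"a": [1.0, 3.0], "b": [5.0, 7.0]})` separates the spread between the two worker means from the day to day spread within each worker.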

The models were validated against external data obtained from the United States,14–19 Germany (measurements were only partially described in this publication),20 and Italy (unpublished). These data were made available to us after the original statistical models were constructed. Table 2 summarises the key features of these newly acquired data. Three US benzene soluble matter measurements from 1982 were excluded from the current analysis because they were collected during an experimental application of sulphur containing asphalt. All bitumen fume and Italian benzo(a)pyrene measurements were collected using personal samplers. However, because of the limited number of benzo(a)pyrene measurements, we also used stationary samples collected in Germany.20

Table 2

Description of data used in evaluation of external validity

We computed the Pearson correlation between predicted and observed values. We also compared 95% confidence intervals of the geometric means of survey specific predicted and measured exposure levels. Bias and precision of the models were estimated using a procedure similar to that proposed by Hornung21 (see equations 2, 3, and 4). Bias was defined as the mean difference between predicted and measured values on the logarithmic scale; precision was defined as the standard deviation of these differences. These were calculated on a logarithmic scale, because bias followed a left skewed distribution that approximated normal distribution after logarithmic transformation. The effect of the German sampling method on benzo(a)pyrene exposure could not be estimated from the data on which the original model was based. Therefore, we recalculated predicted values for the German study, taking into account that (1) benzo(a)pyrene is present mostly in the fume phase in asphalt work, and (2) the GGP sampler used in Germany tends to collect 3.3 (= exp(1.20)) times more dust than the sampler used in the original benzo(a)pyrene exposure models (table 1).

$$\text{bias} = \frac{1}{n}\sum_{k=1}^{n} d_k, \qquad d_k = \ln(\text{predicted}_k) - \ln(\text{measured}_k) \qquad (2)$$

$$\text{precision} = \sqrt{\frac{\sum_{k=1}^{n} (d_k - \text{bias})^2}{n - 1}} \qquad (3)$$

$$\text{relative bias} = \left(e^{\text{bias}} - 1\right) \times 100\% \qquad (4)$$

where n = number of pairs of measured and predicted values being compared.
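The bias and precision definitions given above translate directly into a few lines of code. The sketch below is illustrative only: the function name is hypothetical, the sample standard deviation (n − 1 denominator) is an assumption, and relative bias is taken here as (exp(bias) − 1) × 100%, which is also an assumption of this sketch.

```python
import math
from statistics import mean, stdev


def bias_precision(predicted, measured):
    """Return (bias, precision, relative_bias_percent).

    predicted and measured are paired concentrations on the original scale;
    differences are taken on the natural-log scale.
    """
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need at least two predicted/measured pairs")
    diffs = [math.log(p) - math.log(m) for p, m in zip(predicted, measured)]
    bias = mean(diffs)
    precision = stdev(diffs)  # sample SD of the log differences (assumed n - 1)
    relative_bias = (math.exp(bias) - 1.0) * 100.0  # assumed definition
    return bias, precision, relative_bias


# Sampler correction factor quoted in the text: exp(1.20) is about 3.3.
ggp_factor = math.exp(1.20)
```

With these definitions, predictions that are consistently 30% of the measured values give a bias of ln(0.3) on the log scale and a relative bias of −70%.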

Analyses were carried out using SAS version 6.12 (SAS Institute, Cary, NC), Microsoft Excel 7.0 (Microsoft Corporation, Seattle, WA), and SigmaPlot 4.01 (SPSS Inc., Chicago, IL). Data management and acquisition were facilitated by the use of Microsoft Access 2.0 (Microsoft Corporation, Seattle, WA).


Results

The correlation between observed and predicted bitumen fume exposures for the US data was weak, but statistically significant (Pearson correlation coefficient (r) = 0.28, p = 0.004; n = 98). For the data obtained from Germany and Italy, the relation between observed and predicted benzo(a)pyrene exposure levels was much stronger (r = 0.45, p = 0.0001; n = 339). Figures 1 and 2 illustrate the relation between medians, estimated as geometric means, of measured and predicted values of bitumen fume and benzo(a)pyrene exposure. In fig 1, measured mean values correspond to the results of each Health Hazard Evaluation conducted by US NIOSH. Some years in fig 1 show one predicted and two measured values because the predicted value is the same for both surveys conducted in that year. The figure indicates that the bitumen fume model tends to underestimate exposure levels, although for five out of six surveys these differences could be attributed to chance (95% confidence intervals of observed and estimated values overlap).
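The comparison of 95% confidence intervals of geometric means can be sketched as follows. This is a simplified illustration: it uses a normal quantile where a t quantile would be more appropriate for small surveys, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_ci(concentrations, level=0.95):
    """Return (geometric mean, lower limit, upper limit).

    The interval is built on the log scale and back-transformed. A normal
    quantile is used as an approximation (assumption of this sketch).
    """
    logs = [math.log(c) for c in concentrations]
    centre = mean(logs)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * stdev(logs) / math.sqrt(len(logs))
    return math.exp(centre), math.exp(centre - half_width), math.exp(centre + half_width)


def intervals_overlap(a, b):
    """True if two (gm, lower, upper) intervals share any values."""
    return a[1] <= b[2] and b[1] <= a[2]
```

Overlapping intervals for a survey would be read, as in the text, as a difference between measured and predicted geometric means that could be attributed to chance.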

Figure 1

Measured versus predicted median bitumen fume exposures for each one of six NIOSH Health Hazard Evaluations used in external validation; all measurements are from hot mix paving in the USA (n = number of observations/measurements).

Figure 2

Measured versus predicted median benzo(a)pyrene exposures; measurements from Italy and Germany. Whiskers: 95% CI. All samples collected during tar use were stationary one hour measurements at the expected location of a worker, obtained in Germany. Predictions for German data were not corrected for the GGP sampler use in this figure. All samples from tar free paving were collected in Italy.

In fig 2, measured mean values correspond to different exposure scenarios, as indicated by the legend. It would appear that there is a reasonable degree of agreement between observed and predicted medians for recent measurements in tar free environments. However, the benzo(a)pyrene exposure model tended to underestimate exposure levels for the circumstances in which tar was used in Germany. The observed effect of in situ recycling on benzo(a)pyrene exposure was similar to that expected on the basis of the previously developed statistical model.

Table 3 illustrates bias and precision of estimated exposures to bitumen fume and benzo(a)pyrene relative to external measurement data. Bitumen fume and benzo(a)pyrene models showed negative bias, −70% and −82% respectively. Correction for the sampling method used in Germany reduced the magnitude of the relative bias of the benzo(a)pyrene model to −51%.

Table 3

Assessment of bias, precision, and relative bias of the models of bitumen fume and benzo(a)pyrene with respect to external data (external validity)


Discussion

Overall comparison of the individual predictions of bitumen fume and benzo(a)pyrene exposure models to external data revealed a significant, but weak correlation. This arose from the fact that measured data had a much wider range, implying that absolute values predicted by our models for each individual observation may be inaccurate. Furthermore, discrepancies in ranges suggest that our models may underestimate any contrasts that exist between different groups of subjects, leading to a reduction in power to detect quantitative exposure–response relations. The models had relatively poor precision, probably resulting mostly from large day to day variances. This is acceptable because our goal was to model between worker differences in exposure. The original models did explain a substantial 54–79% of the estimated between worker variance, implying that the between worker variance is reflected by the variables used in the calculation of exposure intensity estimates for the exposure matrix to be employed in epidemiological analyses.12 Bias estimates indicated that our models can underestimate bitumen fume and benzo(a)pyrene concentrations by 50–70%. Underestimating exposures, especially those occurring further in the past and lying at the upper edge of the exposure distribution, as suggested by fig 2, would lead to overestimation of dose–response relations based on quantitative indices of exposure (for example, cumulative exposure or career average exposure).
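The last point can be illustrated with a deliberately simplified example. The sketch below assumes a hypothetical linear response and a uniform 70% underestimation of exposure (the real bias is unlikely to be uniform); all numbers and names are invented for illustration.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: response rises by 0.5 units per unit of true exposure.
true_exposure = [1.0, 2.0, 4.0, 8.0]
response = [0.5 * e for e in true_exposure]

# Exposure estimates that are uniformly 70% too low.
predicted_exposure = [0.3 * e for e in true_exposure]

slope_true = ols_slope(true_exposure, response)            # 0.5
slope_predicted = ols_slope(predicted_exposure, response)  # 0.5 / 0.3, about 1.67
```

Because the same response is spread over a compressed exposure scale, the estimated slope per unit of exposure is inflated by the reciprocal of the underestimation factor.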

The original exposure models were based primarily on measurements collected in Scandinavia. The USA and Italy are not participating in the cohort study and therefore did not contribute any data to the original models. Some bitumen fume exposure data from Germany were used in constructing the original bitumen fume exposure model. However, these German data were from a different survey than that used in validation.

It is possible that the negative bias with respect to bitumen fume measurements from the USA is caused by either chance or systematic differences between road paving practices in Western Europe and the USA. Unfortunately, comparable data on bitumen fume exposure were not available in Western Europe, and we had to resort to US data for assessment of external validity. The two to four times greater pace of paving in the USA compared to Western Europe and differences in the types of asphalt mixes used (Max von Devivere, personal communication) may well explain the higher exposures observed in the USA. If that is the case, the apparent bias in our bitumen fume models probably does not impede their applicability to the European situation.

In validation of the benzo(a)pyrene model, differences in sampling methods, such as use of short term stationary samples, and differences in the method of extraction of organic matter may also contribute to the observed discrepancies between predicted and observed values.12,22 An alternative reason for this discrepancy may be that the coal tar content of asphalt binder, an important predictor of benzo(a)pyrene exposure,23 was not taken into account in the original models. Even if a model that takes this factor into account could be constructed, the coal tar content of asphalt binder would be impossible to estimate with any precision in an industry wide cohort study. Thus, our models may underestimate benzo(a)pyrene exposure under circumstances similar to those monitored in the German data (25–30% coal tar in asphalt binder). Nonetheless, the estimate of the effect of coal tar use derived from the validation data set is within a range that can be expected on the basis of the results of laboratory studies.12

In the IARC study of the asphalt industry we resorted to empirical modelling of exposures for paving workers. Model based exposure assessment might have numerous advantages over exposure assessment based solely on experts' opinions. Quantitative exposure assessment is important for both identification of “weak associations”1 and establishment of scientifically based exposure limits. Another advantage is that the use of statistical models of exposure in epidemiological studies allows data driven sensitivity analysis of exposure–response relations.5,6,11,24 We have also shown previously that interpretable empirical models can be derived despite limitations of the data.12,23 In this paper we have further shown that these models can be expected to produce reasonably accurate predictions, that is, predictions with inaccuracy on the order of day to day variability in exposure. Such examination of patterns of exposure within the road paving industry would not have been possible on the basis of previously published reports, which would have formed the basis of subjective expert evaluations as a method for exposure assessment.22 Therefore, when measurement data are available, they should be considered the primary basis for assessing exposure intensity. Subjective evaluation of exposures should be used only as the last resort (for example, when there are few or no measurements, or well known errors in the analytical procedures used to obtain the available measurements). This is especially relevant for large multicentre international studies in which calibration of different assessors may prove to be very difficult. If expert evaluation of exposures is the only available option in exposure assessment, the penalty is paid in the form of uncertainty about where the weaknesses of the exposure assessment protocol lie. Subjective evaluations, just like any other exposure assessment procedure, must be validated.8 The most direct method of validating any exposure model is through workplace measurements,25,26 but other validation methods are also available.8,21,27,28

We have shown that previously developed models of bitumen fume and benzo(a)pyrene can underestimate exposures under certain circumstances. Model estimates can be expected to be imprecise, making them most suitable for group based exposure predictions in which all members of a group are assigned the same average exposure, instead of individual based exposure predictions. Limited validation against external measurement data revealed that the patterns described by the original bitumen fume and benzo(a)pyrene models were also present in the validation data sets. This provided an additional guarantee that the originally derived models provided useful estimates of bitumen fume and benzo(a)pyrene exposure. Despite these encouraging findings, we should note that the evaluation of the external validity was possible only to a limited extent, since validation data sets lacked the diversity of exposure scenarios needed for a more comprehensive evaluation. Furthermore, models do seem to contain bias, but its magnitude appears reasonable given the nature of cohort design and uncertainties usually encountered in retrospective exposure assessment.29–31 Overall, results indicated that further improvements could not presently be made to the two original models, given the quality of data available on determinants of exposure and retrospective design of the epidemiological study. However, the consequences of assigning a higher multiplier to the effect of coal tar on benzo(a)pyrene will be explored in sensitivity analysis. Validation of the models increased our confidence in their applicability to exposure assessment in the historical cohort study of cancer risk among asphalt workers.
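Group based assignment, as described above, gives every member of a group the same exposure value. The sketch below is one possible reading of that idea, assuming the group value is the geometric mean of the members' predicted concentrations; the record layout, group labels, and function name are hypothetical.

```python
import math
from collections import defaultdict


def assign_group_exposure(records):
    """Map each worker to the geometric mean exposure of their group.

    records is a list of (worker_id, group, predicted_concentration) tuples.
    Using the geometric mean as the group average is an assumption.
    """
    logs_by_group = defaultdict(list)
    for _, group, concentration in records:
        logs_by_group[group].append(math.log(concentration))

    group_gm = {
        group: math.exp(sum(logs) / len(logs))
        for group, logs in logs_by_group.items()
    }
    return {worker: group_gm[group] for worker, group, _ in records}


# Hypothetical example: two pavers share one value, the raker keeps its own.
example = [("w1", "paver", 1.0), ("w2", "paver", 4.0), ("w3", "raker", 3.0)]
assigned = assign_group_exposure(example)
```

Averaging within groups removes individual level noise from the assigned values, at the cost of ignoring real differences between members of the same group.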

Main messages

  • Exposure models developed for study of cancer risk among European asphalt workers were suitable only for group based exposure assessment.

  • The observed bias and imprecision were acceptable given the constraints of the study design.

  • Weaknesses of the constructed models (especially apparent bias with respect to some production conditions) should be explored in sensitivity analysis of any exposure–response relations identified on the basis of the evaluated models.

  • Quantitative exposure assessment based on occupational exposure measurements can produce reasonably unbiased exposure estimates.

  • Quantitative exposure assessment, critical for modern occupational epidemiology, can be achieved in multicentre international retrospective cohort studies.

Policy implications

  • Statistical exposure models developed for the study of cancer risk among European asphalt workers, being suitable for group level exposure assessment in the epidemiological study, will help to establish whether bitumen is a human carcinogen.

  • However, imprecision in modelled exposure estimates and uncertainty in absolute levels of model predictions will hamper establishment of occupational exposure limits on the basis of exposure–response relations, if any, identified through use of these exposure models.


Acknowledgments

Igor Burstyn was supported by an IARC Special Training Award. The study was partially supported by a grant from the European Commission, DG-XII, Biomed-2 programme (Contract No. BMH4-CT95–1100), and by European industrial associations: European Asphalt Pavement Association, Eurobitume, and CONCAWE (the oil companies' European organisation for environment, health, and safety). Pamela Cruise edited the manuscript. Dr Hiltrud Merzenich provided valuable assistance in retrieving and recoding data from Germany. Dr Elisabeth Ward helped us to access the US data, and Dr Lucia Miligi facilitated our access to the Italian data. Dr Bert Brunekreef made valuable comments on the final version of the manuscript. Statistical advice from Edwin Martens (Centrum voor Biostatistiek, Utrecht University) was appreciated.