Exposure to chemical, physical, and biological agents in the workplace is difficult to characterise. A worker’s exposure is never constant over time. Workers within groups with similar tasks and working environments are rarely uniformly exposed. Hence, assigning workers to “exposed” and “unexposed” groups or to exposure categories becomes a problem. Insight is required into the reasons why exposure variability exists, how large this variability is, and which factors determine differences in exposure levels among workers. This knowledge is essential in design, conduct, and interpretation of epidemiological studies and workplace intervention programmes. In recent years statistical techniques have become available that allow simultaneous evaluation of the magnitude of variance components as well as determinants of this variability. These techniques are powerful instruments in the design of measurement strategies in epidemiological studies and in implementation of control and prevention strategies to reduce hazardous exposure.^{1}
EXPOSURE VARIABILITY AND EXPOSURE MODELS
Assessment of exposure to hazardous substances at the workplace has shown that exposure is rarely constant. In workplaces, tasks, activities, work processes, and locations change over time, resulting in occupational exposures that vary both within workers over time and between workers in the same job. Figure 1 depicts the variability in exposure to inhalable flour dust within and between workers in Swedish bakeries,^{2} illustrating that the (geometric) mean and (geometric) standard deviation of an exposure parameter in an occupational group present only limited information on the underlying exposure pattern among workers in this occupational group. The phenomenon of exposure variability needs to be understood for a number of purposes, including planning exposure measurements, assigning estimates of exposure to subjects in a study, identifying determinants of exposure, evaluating compliance with exposure limits, and establishing efficiency of control measures.
The first approach to evaluate the influence of particular factors on exposure levels is a linear regression model. This model requires a dataset consisting of single measurements on various workers and additional information for each worker on their work characteristics (see box 1). This information may be derived from questions in a self-administered questionnaire on job descriptions or may be collected during walk-through surveys. Assuming a log normal exposure distribution, the exposure level of worker i is predicted by the model:
ln(C_{i}) = a + b_{1}X_{1i} + ... + b_{k}X_{ki} + ε_{i}
Box 1 Examples of determinants of exposure in selected studies
Agent
  Physical characteristics (e.g. vapour pressure, pH level)
Process characteristics
  Intermittent/continuous process
  Level of automation
  Confinement (enclosed/open)
  Type of process (e.g. welding v thermal cutting)
  Technology development
Ventilation
  General ventilation
  Local exhaust ventilation
Worker characteristics
  Job activities
  Work techniques
  Mobile/stationary
Environmental characteristics
  Indoor/outdoor
  Temperature
where a is the intercept representing the exposure concentration when all independent variables equal zero, b_{1} to b_{k} are regression coefficients of k independent variables X describing the fixed effects of these determinants on the exposure level, and ε represents the random deviation with mean 0. Among others, crucial assumptions are that the measurements (lnC) are independent of one another (this requires a single measurement per worker) and that the variance of lnC is the same for any fixed combination of X_{1}...X_{k} (equal covariances). A linear regression model presents the investigator with information on the relative importance of various determinants of exposure. For example, a regression coefficient of 0.50 for the presence of local exhaust ventilation indicates that the exposure among workers with local exhaust ventilation is a factor 1.65 [exp(0.50)] higher than among workers without local exhaust ventilation. This insight into exposure determinants will greatly facilitate decisions about the type of control measures that are deemed most appropriate at specific workplaces. The linear regression model may also be translated into a mathematical expression to assign exposure levels to all individual workers in a population for which only information on determinants of exposure is available. One of the first applications of this approach was a linear regression model assessing the impact of different plant processes, dust control measures, and job assignments on historical exposure levels in an asbestos textile plant. Subsequently, all workers in the cohort were assigned exposure levels based on the regression equation.^{8}
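The interpretation of a regression coefficient as a multiplicative factor can be illustrated with a minimal sketch. It assumes a single binary determinant and hypothetical concentrations; the function name and the data are illustrative and are not taken from any of the cited studies.

```python
import math

def fit_single_binary_determinant(concentrations, present):
    """Least-squares fit of ln(C) = a + b*X for one 0/1 determinant X.

    With a single binary regressor the OLS estimates reduce to group means
    of ln(C): a is the mean for X = 0, b the difference between the groups.
    """
    logs_without = [math.log(c) for c, x in zip(concentrations, present) if x == 0]
    logs_with = [math.log(c) for c, x in zip(concentrations, present) if x == 1]
    a = sum(logs_without) / len(logs_without)
    b = sum(logs_with) / len(logs_with) - a
    return a, b

# Hypothetical concentrations (mg/m^3), one measurement per worker.
concentrations = [1.0, 2.0, 4.0, 2.0, 4.0, 8.0]
present = [0, 0, 0, 1, 1, 1]  # 1 = determinant present for that worker

a, b = fit_single_binary_determinant(concentrations, present)
ratio = math.exp(b)  # multiplicative effect of the determinant on exposure
print(f"a = {a:.3f}, b = {b:.3f}, exposure ratio = {ratio:.2f}")
```

Because the model is fitted on the log scale, exp(b) is the ratio of geometric mean exposures with and without the determinant, which is how a coefficient of 0.50 translates into a factor of about 1.65.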
An important disadvantage of linear regression analysis is that it cannot take into account repeated measurements on workers. When repeated measurements are available (at least two measurements for a proportion of workers) an analysis of variance (ANOVA) technique can be employed. The basic information in an ANOVA is the estimates of variance and this approach may be used to evaluate and optimise the grouping of workers into comparable exposure groups. In a simple one-way ANOVA model with repeated measurements on workers in an occupational group the measured exposure of worker i at time j, assuming a lognormal exposure distribution, is expressed in the formula:
ln(C_{ij}) = μ + α_{i} + ε_{ij}
where μ is the long-term group mean exposure, α_{i} is the random deviation of the mean exposure of person i from the group mean (contributing to the between-worker variance), and ε_{ij} represents the random deviation of the exposure of person i on day j from the mean exposure of person i (contributing to the within-worker variance). This error term also includes the measurement error due to analytical errors in the measurement technique. The total variance in exposure is the sum of the between-worker variance and the within-worker variance. The ANOVA model assumes that α_{i} and ε_{ij} are normally distributed and independent and that two important conditions are met: measurements have equal variances at each of the repeated measurements, and pairs of measurements on the same subject are equally correlated, regardless of the time lag between measurements. The latter two conditions are known as the “compound symmetry” assumption. Violation of this restrictive assumption of homogeneous between-worker variance and homogeneous within-worker variance may result in invalid estimates and invalid standard errors for these estimates.
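The partition of the total variance into a between-worker and a within-worker component can be sketched as follows. This is a minimal method-of-moments estimator that assumes a balanced design (the same number of repeated measurements for every worker) and hypothetical log-transformed data; statistical packages typically use likelihood-based estimators instead.

```python
from statistics import mean

def variance_components(log_exposures):
    """Estimate (group mean, between-worker variance, within-worker variance).

    log_exposures: one inner list of ln(C_ij) values per worker; every
    worker must have the same number of measurements (balanced design).
    """
    k = len(log_exposures)       # number of workers
    n = len(log_exposures[0])    # repeated measurements per worker
    if any(len(worker) != n for worker in log_exposures):
        raise ValueError("this sketch only handles balanced data")
    worker_means = [mean(worker) for worker in log_exposures]
    grand_mean = mean(worker_means)
    # Mean squares from the one-way ANOVA table.
    ms_between = n * sum((m - grand_mean) ** 2 for m in worker_means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for worker, m in zip(log_exposures, worker_means) for y in worker
    ) / (k * (n - 1))
    var_within = ms_within
    # Truncate at zero: the moment estimator can become negative by chance.
    var_between = max((ms_between - ms_within) / n, 0.0)
    return grand_mean, var_between, var_within

# Hypothetical ln(C_ij) values: three workers, two measurement days each.
data = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
mu_hat, var_b, var_w = variance_components(data)
print(mu_hat, var_b, var_w)
```

The estimated total variance is the sum of the two components, matching the decomposition described above.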
The ANOVA model can easily be expanded to include occupational groups in the analysis, allowing the exposure variability to be partitioned into three variance components: between-group variance (do a priori defined groups in a study population really differ in mean exposure?), between-worker variance (are subjects a priori assigned to an exposure group really similar?), and within-worker variance (do repeated samples on an individual show similar exposure levels?). These random effects ANOVA models have been used in many occupational groups to illustrate the presence of substantial variability in exposure to chemical substances,^{3,}^{4} physical agents,^{5} aeroallergens,^{6} and physical workload.^{7} A comprehensive evaluation of exposure to chemical substances among 165 occupational groups showed that the individual mean exposures within a group varied considerably. In fact, only in 25% of the groups was the 95% range of individual mean exposures within a factor of 2, almost 30% were within a 10-fold range, and 10% of the groups showed a range of over 50-fold. In general, the differences in exposure across activities and shifts within workers exceeded the differences in exposure between workers in the same job in the same factory, suggesting that grouping strategies solely based on job titles may result in considerable misclassification and, hence, in attenuation of the true association between exposure and health effect.^{4}
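The fold ranges quoted above can be related to the between-worker variance with a short calculation. The sketch assumes that the 95% range is defined as the ratio of the 97.5th to the 2.5th percentile of a lognormal distribution of individual mean exposures; the text does not state this definition, so it should be read as an assumption.

```python
import math

Z_975 = 1.96  # standard normal 97.5th percentile (rounded)

def fold_range_95(var_between):
    """Ratio of the 97.5th to the 2.5th percentile of individual mean exposures."""
    return math.exp(2 * Z_975 * math.sqrt(var_between))

def var_between_for_fold_range(fold_range):
    """Between-worker variance (log scale) that corresponds to a given fold range."""
    return (math.log(fold_range) / (2 * Z_975)) ** 2

for fold in (2, 10, 50):
    print(f"{fold}-fold range -> between-worker variance {var_between_for_fold_range(fold):.3f}")
```

Under this assumption a factor of 2 corresponds to a very small between-worker variance on the log scale, whereas a 50-fold range corresponds to a variance close to 1.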
Random effects ANOVA models and linear regression models have been applied in a rich diversity of occupational settings, showing that estimating and modelling of exposure determinants is a suitable addition to the classical exposure assessment strategies. An extensive review of studies on determinants of exposure concluded that observational studies can be used to identify sources of exposure and guide towards appropriate control measures which can be tested in experimental studies and, subsequently, re-evaluated in observational studies at the workplace.^{9} However, a word of caution is necessary since both statistical techniques only produce valid results within the constraints of their mathematical assumptions. In several publications with exposure models based on linear regression analysis the available data for the analysis were not limited to a single measurement per worker, thus disregarding the correlation between repeated measurements.^{10} In some publications ANOVA models have been described without a formal evaluation of the required compound symmetry covariance structure, whereas the description of the dataset suggests that important assumptions may have been ignored. The assumption of equal between-worker variance should certainly be tested when including occupational groups from a wide range of situations, such as working outdoors versus working indoors or workers in intermittent processes versus operators in continuous processes. The within-worker variance may be influenced by the timeframe of the measurements, since measurements taken close in time have higher correlations than those taken further apart in time.
APPLICATION OF MIXED EFFECTS MODELS
In the classical linear regression model the effects of work characteristics on observed exposure levels are determined without taking into account the role of within-worker variability. The random effects ANOVA model does not present information on the influence of particular factors on the actual measured personal exposure level and has very restrictive assumptions on the variance components. Hence, there is a need for a statistical technique that can combine both models and is less restrictive on the specific covariance structure of repeated measurements. With the recent introduction of the linear mixed-effects model in common statistical packages, such as SPSS (mixed model), SAS (Proc Mixed) and S-Plus, a powerful tool has become available to combine the prediction of workers’ exposure by process characteristics, job title, or other exposure determinants (fixed effects), while accounting for the within-worker and between-worker variance (random effects). The basic idea of a mixed effects model is that the variance in measured exposure is partly explained by fixed determinants of exposure, thereby reducing the remaining random variance between and within workers (see fig 2). The primary objective is to make inferences about the fixed effects in the model, for example to estimate differences between occupational groups at specific times, differences between exposure conditions averaged over time, or changes over time in specific exposure conditions.^{11}
A straightforward linear mixed effects model describes the exposure level of worker i on day j, assuming a lognormal exposure distribution, in the model:
ln(C_{ij}) = μ + b_{1}X_{1ij} + ... + b_{k}X_{kij} + α_{i} + ε_{ij}
where μ is the intercept representing the true underlying exposure concentration (fixed) averaged over all workers, b_{1} to b_{k} are regression coefficients of the fixed effects of particular determinants of exposure, α_{i} represents the random effect of the ith worker corresponding to the difference between his mean exposure and the overall mean exposure, and ε_{ij} represents the random effect of the jth day for the ith worker. The underlying model assumptions play a crucial role in the estimation of the parameters and, hence, specifying a model for the covariance structure is an essential first step in the analysis. This involves evaluation of different structures in the selection of the best model.^{11} In the most restricted model all workers have the same within-worker variance (correlations between repeated measurements are equal) and the same between-worker variance (variance between workers is equal across all fixed determinants of exposure); this is the aforementioned compound symmetry covariance structure. A less restricted model only assumes that the within-worker variance is constant, whereas in the least restrictive model the variance in repeated measurements within workers may vary as well as the variance between workers for different fixed effects. In this last model with an unstructured covariance each worker has his own regression model with different regression coefficients and different true mean exposure. In table 1 the estimated parameters are presented for a theoretical situation. The analysis will present estimates of the variance components, which may be similar across jobs depending on the assumptions on the variance structure. The fixed determinants of exposure can be interpreted similarly to regression coefficients in a linear regression model.
In general, more restrictions in a linear mixed effects model are required when fewer measurements are available, for example with only 2–3 repeated samples per worker the assumption of a common within-worker variance is essential for fitting an appropriate model (that is, in the example of table 1 the error term is equal across jobs).
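The compound symmetry structure can be made concrete with a small sketch that builds the covariance matrix of the repeated log-transformed measurements on one worker. The variance values are hypothetical and the function names are illustrative only.

```python
def compound_symmetry(var_between, var_within, n_repeats):
    """Covariance matrix of n_repeats measurements on the same worker.

    Diagonal: var_between + var_within (equal variance on every occasion).
    Off-diagonal: var_between (equal covariance for every pair of occasions).
    """
    return [
        [var_between + (var_within if row == col else 0.0) for col in range(n_repeats)]
        for row in range(n_repeats)
    ]

def within_worker_correlation(var_between, var_within):
    """Correlation between any two measurements on the same worker."""
    return var_between / (var_between + var_within)

# Hypothetical variance components on the log scale.
matrix = compound_symmetry(var_between=3.0, var_within=2.0, n_repeats=3)
for row in matrix:
    print(row)
print(within_worker_correlation(3.0, 2.0))
```

Every pair of measurements shares the same correlation regardless of how far apart in time they were taken, which is exactly the restriction that less constrained covariance structures relax.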
A good illustration of a mixed effects model with restrictions on both within-worker and between-worker variance was recently presented by Burstyn and colleagues.^{12} A large database with over 1500 exposure measurements among asphalt workers from 37 different sources in eight European countries was constructed. This database enabled the researchers to present three models on the important determinants of bitumen fume, bitumen vapour, and polycyclic aromatic hydrocarbons (PAH) exposure intensity among paving workers. The geometric mean for bitumen fume among about 1200 workers was 0.28 mg/m^{3} with a wide range between 0.02 and 260 mg/m^{3}. The exposure model identified as important determinants: mastic laying (+0.88 mg/m^{3}), recycling operations (+0.89 mg/m^{3}), oil gravel paving (−1.51 mg/m^{3}), and years before 1997 (+0.06 mg/m^{3}). The specific sampling techniques applied also were significantly associated with the exposure level of bitumen fume. It appeared that the fixed effects explained about 41% of the total variability in exposure to bitumen fume. These fixed effects reduced the within-worker variance by only 8% but the between-worker variance by about 56%. This exposure model was used for exposure assessment in a historical cohort study of asphalt paving in Western Europe. A validation of the empirical exposure model with exposure information from the USA showed large systematic differences in predicted bitumen fume exposures between Western European and USA paving practices.^{13} This finding argues for caution in the application of an exposure model to occupational populations not included in the development of the original model, since the relative importance of determinants of exposure may vary across workplaces and populations.
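The percentages of variance explained by fixed effects are obtained by comparing the variance components of a model without fixed effects with those of the model that includes them. The sketch below uses hypothetical variance components, chosen only so that the resulting percentages resemble those quoted above; they are not the values estimated in the asphalt study.

```python
def percent_explained(var_without_fixed, var_with_fixed):
    """Percentage reduction of a variance component after adding fixed effects."""
    return 100.0 * (var_without_fixed - var_with_fixed) / var_without_fixed

# Hypothetical variance components on the log scale (NOT the study's estimates).
between_null, within_null = 2.2, 1.0      # random effects only
between_full, within_full = 0.968, 0.92   # after adding determinants of exposure

between_pct = percent_explained(between_null, between_full)
within_pct = percent_explained(within_null, within_full)
total_pct = percent_explained(between_null + within_null, between_full + within_full)
print(f"between: {between_pct:.0f}%, within: {within_pct:.0f}%, total: {total_pct:.0f}%")
```

The total percentage is a weighted combination of the two component reductions, so a large reduction in the between-worker variance can coexist with a nearly unchanged within-worker variance.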
A comprehensive evaluation of the application of mixed effects models with different variance structures was performed by Rappaport and colleagues.^{14} Almost 200 measurements on total particulates were performed at nine workplaces among boilermakers, ironworkers, pipefitters, and welder-fitters in the USA. Most workers were repeatedly sampled with a range of 3–14 measurements per worker, and six process and task related parameters were collected during the measurements. The comparison between three different models showed that it was reasonable to pool the within-worker variance across jobs and, hence, to increase the statistical power. The between-worker variance was sufficiently different among the jobs, with welder-fitters showing the largest between-worker variance and boilermakers and ironworkers the lowest. Thus, in the mixed effects model a fixed term was introduced for job title, allowing the between-worker variance to vary across the four jobs. With regard to the fixed effects the exposure was significantly lower with the use of ventilation or when less than half of the day involved hot processes. The interaction of both fixed effects was also significant. This analysis of important determinants of exposure and sources of variability suggests that control of particulate exposure among boilermakers and ironworkers (with low between-worker variance) should focus on broad environmental changes, such as engineering or administrative controls, and among welder-fitters and pipefitters should address individual personal environments and working techniques.^{14}
The need to apply a linear mixed effects model rather than the classical linear regression analysis is largely determined by the influence of fixed effects on the between-worker and within-worker variance. A comparison of both statistical models on two datasets showed that in a study on endotoxin exposure among pig farmers the results from both analyses were very similar, due to the fact that inclusion of 12 fixed farm characteristics in the exposure model reduced the between-worker variance by 82% and the overall between-worker variance was very low. However, in a dataset on exposure to inhalable dust among workers in the rubber manufacturing industry, the mixed effects model resulted in fewer exposure determinants with a lower estimated effect on exposure. In this example the between-worker variance was large and could only be reduced by 35% through inclusion of fixed effects.^{10}
CONSEQUENCES FOR EXPOSURE ASSESSMENT
What are the advantages of these new statistical techniques in the evaluation of exposure patterns at the workplace? The mixed effects models require a substantial number of measurements, their application needs a thorough statistical evaluation of underlying assumptions on variance structures, and specific software packages are required for the calculations. Then why bother to use these techniques?
The main advantages of a full exploration of exposure variability are that planning of measurement campaigns can be improved substantially and that interpretation of results is less biased by neglected sources of variation in exposure. These advantages can be illustrated in three key areas of exposure assessment: dose-response relations in epidemiological studies, testing for compliance with exposure limits, and design and evaluation of control measures.
The consequences of exposure variability have been explored in detail in the context of their effects on the exposure-response relation in epidemiological studies and subsequent adjustment of the exposure assessment strategy. Random error in exposure measurements usually biases the risk estimate (for example, odds ratio, relative risk, regression coefficient) towards the null, that is, no association. The within-worker variability is an important component of the total measurement error and the ratio of the within-worker variance over the between-worker variance is directly linked to the expected attenuation in the observed risk estimate.^{15} Given the reported magnitude of the within-worker component in the exposure variability in several occupational groups, the true risk in an epidemiological study may be missed by a large margin. The consequences are that in these situations a group based exposure strategy may be preferred over an individual based strategy. Such a group strategy will result in little or no attenuation in the dose-response relation if workers can be assigned to exposure groups that sufficiently differ in their average exposures.^{15} The information on the relative magnitude of sources of exposure variability can be used to evaluate the effect of different grouping strategies on observed associations between exposure and health outcomes. Equations using variance components have been developed to predict the effect of different strategies on the risk estimate and standard error.^{16} In a large study among carbon black workers it was shown that differences in grouping schemes had a large effect on the slope and standard error of the regression coefficient of dust exposure on lung function parameters.
The similarities in predicted and observed attenuation of risk prompted the authors to conclude that these equations appear to be a useful tool in establishing the most efficient way of utilising exposure measurements.^{17} The same information may also guide towards an optimum sampling scheme for exposure measurements since the efficiency of increasing the number of repeated measurements per subject or, vice versa, increasing the number of subjects, is partly determined by the ratio of within-worker over between-worker variance.^{15,}^{18}
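The link between the variance ratio and attenuation can be sketched with the classical measurement error result for an individual based strategy, in which the observed regression coefficient is expected to equal the true coefficient divided by (1 + λ/n), with λ the ratio of within-worker to between-worker variance and n the number of repeated measurements per worker. This formula is an assumption brought in for illustration; the equations in the cited references may differ in detail.

```python
def attenuation_factor(var_within, var_between, n_repeats):
    """Expected ratio of observed to true regression coefficient."""
    variance_ratio = var_within / var_between
    return 1.0 / (1.0 + variance_ratio / n_repeats)

def min_repeats(var_within, var_between, target_factor, max_repeats=1000):
    """Smallest number of repeats per worker reaching the target attenuation factor."""
    for n in range(1, max_repeats + 1):
        if attenuation_factor(var_within, var_between, n) >= target_factor:
            return n
    raise ValueError("target not reached within max_repeats")

# Hypothetical variance components: within-worker variance twice the between-worker variance.
for n in (1, 2, 4, 8):
    print(n, round(attenuation_factor(2.0, 1.0, n), 3))
needed = min_repeats(2.0, 1.0, target_factor=0.85)
print("repeats needed:", needed)
```

With a variance ratio of 2, a single measurement per worker would be expected to recover only a third of the true slope, which illustrates why either repeated sampling or a group based strategy becomes attractive.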
The first test for compliance of workplaces against occupational exposure limits assumed that all exposure measurements were less than the limit. Subsequent testing procedures incorporated information on exposure variability, either by using a predetermined value for the geometric standard deviation in the observed occupational group or by requesting a few measurements to estimate the geometric standard deviation of the exposure distribution within a particular group. These strategies implicitly assumed that for each worker in the group the same exposure distribution was present, hence ignoring the presence of substantial between-worker variance. A new compliance testing procedure for agents with chronic health effects has been proposed that accounts for both within-worker and between-worker sources of variability.^{19} This procedure starts with two shift-long measurements randomly collected from each of 10 randomly chosen workers from an occupational group. In the first step it is evaluated whether the selected occupational group may be regarded as a monomorphic group. A random effects analysis of variance model is fitted to the data to evaluate whether the individual mean exposures of workers in that group can be described by a log normal distribution, and thus whether these workers can be regarded as members of the same group. For these monomorphic groups the probability is assessed that a randomly selected worker’s mean exposure is above the occupational exposure limit. For non-monomorphic groups alternative grouping should be attempted since the initially identified group of workers most likely constitutes several different groups, or particular workers were assigned to the wrong group. For occupational groups with unacceptable exposure levels, resampling is suggested to increase the power of the compliance test.
If it appears that workers in the occupational group are uniformly exposed to unacceptable levels, engineering or administrative controls are recommended. For non-uniformly exposed workers in a group, interventions at individual level should be considered, such as modifications of tasks and work practices. This compliance testing strategy combines a compliance protocol showing that exposure at the workplace will not exceed the threshold limit value with an analysis of the type of control measures best suited to reduce exposure levels at the workplace.
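The probability that a randomly selected worker's mean exposure exceeds the limit can be sketched from the variance components of the random effects model. The sketch assumes lognormally distributed exposures, so that the mean exposure of worker i equals exp(μ + α_{i} + σ²_{W}/2), and it treats the parameters as known; the actual procedure also has to account for their estimation uncertainty, which is omitted here. All numbers are hypothetical.

```python
import math
from statistics import NormalDist

def exceedance_probability(mu_log, var_between, var_within, limit):
    """P(mean exposure of a randomly selected worker > limit).

    ln(individual mean exposure) is normal with mean mu_log + var_within / 2
    and variance var_between under the random effects model.
    """
    mean_of_log_means = mu_log + var_within / 2.0
    z = (math.log(limit) - mean_of_log_means) / math.sqrt(var_between)
    return 1.0 - NormalDist().cdf(z)

# Hypothetical parameters on the log scale and a hypothetical limit of 2.0 mg/m^3.
theta = exceedance_probability(mu_log=-1.0, var_between=0.5, var_within=1.0, limit=2.0)
print(f"probability of overexposure: {theta:.3f}")
```

A larger between-worker variance widens the distribution of individual means and so raises this probability even when the group mean is unchanged, which is why the procedure does not rely on the group geometric mean alone.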
In the design and evaluation of control measures, traditional methods of analysis may regard exposure variability as a nuisance since it will diminish the power of the exposure survey. These methods fail to exploit the fact that the sources of exposure variability in themselves contain important information as to the most appropriate control measures. For example, the presence of substantial between-worker variance may suggest that a few workers have exposures well in excess of their co-workers and, thus, generic controls affecting everyone (such as engineering or administrative controls) are far less effective than specific measures to modify work practices of the highest exposed individual workers. A linear mixed effects model is the best method to derive all possible information from exposure variability, as was illustrated in the aforementioned study on determinants of exposure during hot processes in the construction industry.^{14} Another example is a survey among 19 small machine shops where determinants of exposure to water based metalworking fluids (MWF) were examined using a mixed effects multiple regression analysis. Contamination of MWF with tramp oil, MWF pH, MWF temperature, and type of MWF were all significant predictors for sump fluid endotoxin concentration. The high within-shop correlation of sump fluid endotoxin levels indicated that contamination of one sump is a sign to change the metalworking fluids in all sumps.^{20}
The application of these new techniques will require measurement strategies that accommodate estimation of all relevant sources of exposure variability, including repeated measurements on workers under various working conditions.^{21} This argument is not particularly new, but the application of new techniques such as linear mixed effects models offers great opportunities for a more meaningful analysis of determinants of exposure. This will greatly enhance the design of exposure assessment strategies in occupational epidemiology and the decision process on appropriate control measures.
Box 2 Sources of exposure variability
Fixed effects
  Variable with a constant and repeatable effect on the exposure level across workplaces and workers
  Determinants of exposure
Random effects
  Differences in exposure among a classification variable
  Between-group variance: random deviation of the mean exposure of group k from the mean exposure of all measurements in the total sample of workers
  Between-worker variance: random deviation of the mean exposure of person i from the mean exposure of group k
  Within-worker variance: random deviation of the exposure of person i on day j from the mean exposure of person i
QUESTIONS (SEE ANSWERS ON P 289)

An analysis of variance with repeated measurements can be used to:
Evaluate the grouping of workers into comparable exposure groups.
Estimate the differences in mean exposure between workers.
Calculate the effect of determinants of exposure on grouping strategies.
Demonstrate systematic misclassification in exposure.

Which violation of underlying assumptions in an analysis of variance with repeated measurements may invalidate the results:
A large analytical error in the measurement technique.
A substantial correlation between repeated measurements within the same worker.
A between-worker variance that exceeds the within-worker variance.
A heterogeneous variance both within and between workers.

What is the basic idea behind a mixed effects model for the analysis of exposure patterns?
To take into account repeated measurements on the same workers.
To introduce fixed determinants of exposure in the model that will reduce random variation between workers.
The estimation of temporal and individual variability in exposure.
To predict the worker’s exposure by process characteristics.

In a measurement strategy repeated measurements are conducted within workers in different occupational groups. The mixed effects model shows large between-worker variance and small within-worker variance within specified jobs. What conclusion may be drawn with regard to the most appropriate control measures?
Specific measures are needed to modify work practices of the highest exposed individuals.
Engineering controls are needed to reduce the mean exposure of workers within the same job.
General ventilation is probably not sufficient to reduce exposure among workers in the same workplace.
Personal protective equipment is required in jobs with the highest exposure.

A prediction model for exposure is used to assign exposure levels to workers in a retrospective cohort study. What situation will hamper the application of such a model?
Fixed determinants of exposure that are interrelated.
A large variance between workers in the same occupational group.
A large variance within workers in the same occupational group.
Differences in work processes over time.
Acknowledgments
This paper is based partly on a book chapter on analysis and modelling of personal exposure from the book Exposure assessment in occupational and environmental epidemiology, edited by Mark Nieuwenhuijsen for Oxford University Press.^{1}