Abstract
Background: Rectal cancer has been previously associated with exposure to metalworking fluids in a cohort mortality study of autoworkers.
Objective: To better specify the exposure–response relationship with straight metalworking fluids (mineral oils) by applying nonparametric regression methods that avoid linearity constraints and arbitrary exposure cut points and by lagging exposure to account for cancer latency, in a nested case–control analysis.
Methods: In addition to the classical Poisson regression with categorical exposure, survival models with penalised splines were used to estimate the exposure–response relationship between cumulative exposure to straight metalworking fluid and mortality from rectal cancer. Exposures to water-based metalworking fluids were treated as potential confounders, and all exposures were lagged by 5, 10, 15 and 20 years to account for cancer latency. The influence of the highest exposures was dealt with by a log transformation and outlier removal. The sensitivity of the penalised splines to alternative criteria for model selection and to the placement of knots was also examined.
Results: The hazard ratio for mortality from rectal cancer increased essentially linearly with cumulative exposure to straight metalworking fluid (with narrow confidence bands) up to a maximum of 2.2 at the 99th centile of exposure and then decreased (with wide confidence bands). Lagging exposure up to 15 years increased the initial steepness of the curve and raised the maximum hazard ratio to 3.2.
Conclusions: Nonparametric smoothing of lagged exposures has shown stronger evidence for a causal association between straight metalworking fluid and rectal cancer than was previously described using standard analytical methods. This analysis suggests an exposure–response trend that is close to linear and statistically significant over most of the exposure range and that increases further with lagged exposures. Smoothing should be regularly applied to environmental studies with quantitative exposure estimates to refine characterisation of the dose–response relationship.
AIC, Akaike’s Information Criterion
PMR, proportionate mortality ratio
Metalworking fluids are complex mixtures of oils and chemicals used to cool and lubricate metal machining operations in many industrial settings. Automobile manufacturing is the largest sector with widespread metalworking fluid exposure. More than 1.2 million workers are exposed to metalworking fluids in the US,^{1} and their use is increasing internationally as developing countries industrialise. This growth is occurring despite the presence of known carcinogens, such as chlorinated compounds and polycyclic aromatic hydrocarbons in oil-based fluids, and ethanolamines and nitrosamines in water-based fluids.^{2}
Metalworking fluids are generally classified into four types in two overlapping groups, according to base composition as oil based (straight and soluble mineral oil) or water based (soluble, semisynthetic and synthetic). All types include some oil (except for synthetics or “chemical fluids”), as well as a variety of additional compounds. The earliest report of cancer among workers exposed specifically to oils used in metalworking appeared in 1950.^{3} Between 1950 and 1967, two thirds of the 187 cases of scrotal cancer in the West Midlands area of England were attributed to exposure to machining fluid.^{4} With the introduction of high-speed machining and grinding, concern has shifted from dermal exposures to inhalation of aerosolised metalworking fluids and possible respiratory and gastrointestinal cancers.
The cancer mortality study of a United Autoworkers–General Motors cohort is the largest study with quantitative estimates of lifetime exposure to specific types of metalworking fluid.^{5} In the extended follow-up of this cohort, results were presented from exposure–response models for 12 cancers, including colon and rectal cancers, using a uniform set of exposure categories.^{6} Although most epidemiological studies examine rectal and colon cancer as a single outcome, they were considered separately because there is growing evidence that they are distinct diseases.^{7} Results of that study suggested an increasing risk of rectal cancer, but not of colon cancer, with increasing exposure to straight metalworking fluids. In light of widespread industrial exposure and poor survival rates for rectal cancer, we have taken this opportunity to re-examine the data more closely by refining the measures of exposure to account for latency and applying more informative nonparametric exposure–response models with fewer statistical assumptions.
METHODS
Cohort data
The autoworkers cohort and exposure data have been described in detail previously.^{5,6,8,9} We present a brief outline of the study. The cohort consists of 46 399 autoworkers from three manufacturing plants in Michigan, USA. All employees who worked for at least 3 years before 1 January 1985 were included in the cohort and followed from 1941 to 1985. The end of the follow-up was later extended to 1995.^{6} Demographic information was collected from work records. Vital status was obtained from the Social Security Administration and the National Death Index. Plant, union and state vital statistics office records were used along with the National Death Index to determine the underlying cause of death as recorded on the death certificate. The original study was approved by the internal review board of Harvard School of Public Health and the extended follow-up by internal review boards of University of Massachusetts and General Motors Corporation.
Exposure
An extensive exposure assessment was conducted to retrospectively estimate past exposure levels to specific types of metalworking fluids.^{8} All three major types of fluid had been used at each of the three study sites, and each plant provided past industrial hygiene records for exposure reconstruction. Plant II was the newest of the three and used the most water-based fluids. Based on sampling data collected by the research team and plant records, the amount of exposure to each of three types of metalworking fluid, namely straight mineral oils, soluble (mineral oils emulsified in water) and synthetic (chemical fluids without oil), was estimated for homogeneous exposure categories in each calendar year of the study. (Semisynthetics were combined with soluble metalworking fluids in all analyses.) Combining this exposure matrix with employment records, cumulative exposure to each type of metalworking fluid was estimated for each participant in the cohort. Participants with entirely missing work histories were excluded. For each fluid, cumulative exposure was measured in mg/m^{3}-years and lagged by 5, 10, 15 and 20 years to account for disease latency.^{10} Previous findings have indicated an association with straight mineral oils; we therefore focused primarily on straight oils and included exposure to soluble and synthetic fluids as potential confounders.
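The lagging step described above can be illustrated with a minimal sketch. It assumes one exposure intensity per calendar year and treats the lag cut-off year as inclusive; the function name, the dictionary layout and the example work history are hypothetical and are not taken from the study.

```python
def cumulative_exposure(yearly_exposure, at_year, lag=0):
    """Cumulative exposure in mg/m^3-years, counted up to (at_year - lag).

    yearly_exposure maps calendar year -> average intensity (mg/m^3) in
    that year, so one year of work at intensity c adds c mg/m^3-years.
    Treating the cut-off year as inclusive is an assumption of this sketch.
    """
    cutoff = at_year - lag
    return sum(c for year, c in yearly_exposure.items() if year <= cutoff)


# Hypothetical work history: 0.5 mg/m^3 in every year from 1960 to 1979.
history = {year: 0.5 for year in range(1960, 1980)}

unlagged = cumulative_exposure(history, at_year=1990)            # all 20 years
lagged_15 = cumulative_exposure(history, at_year=1990, lag=15)   # 1960-1975 only
```

Because a lag discards the most recent years of exposure, the lagged value can never exceed the unlagged one.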
Statistical methods
We began by fitting a Poisson regression model to the full dataset with a categorical variable defined for exposure to straight metalworking fluids, controlling for cumulative exposure to each of the other fluid types. We repeated this for synthetic and soluble exposures. Along with a nonexposed reference group, cut points for the exposure levels were chosen as the quartiles of the case distribution and exposures were lagged up to 20 years. Relative risks (RRs) were estimated in these models as adjusted mortality ratios.
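As a rough illustration of the categorisation described above, the sketch below derives cut points from the quartiles of the exposed cases and assigns each exposure to a category, with category 0 as the nonexposed reference group. The quantile interpolation rule (the default of Python's `statistics.quantiles`) and the handling of values that fall exactly on a cut point are assumptions; the example exposures are invented.

```python
import statistics
from bisect import bisect_left


def case_quartile_cutpoints(case_exposures):
    """Return the three quartile cut points of the exposed cases."""
    exposed = [x for x in case_exposures if x > 0]
    return statistics.quantiles(exposed, n=4)


def exposure_category(x, cutpoints):
    """0 = nonexposed reference; 1..4 = quartile groups of the exposed cases."""
    if x <= 0:
        return 0
    # A value equal to a cut point is placed in the lower category (assumption).
    return 1 + bisect_left(cutpoints, x)


# Hypothetical cumulative exposures (mg/m^3-years) among cases.
cases = [0, 1, 2, 3, 4, 5, 6, 7]
cuts = case_quartile_cutpoints(cases)
categories = [exposure_category(x, cuts) for x in cases]
```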
We then fit Cox regression models with continuous exposure variables and adjusted hazard ratios (HRs) as estimates of RR. We used a flexible smoothing approach to estimate the exposure–response curve by fitting penalised splines^{11} of exposure to straight metalworking fluids. Soluble and synthetic fluids were included as linear terms in the models. The full dataset consists of approximately 1.5 million person-years of data, and to make the analysis more efficient, we created a nested case–control sample by randomly selecting 20 controls for each case from the cohort who were alive at the time of the case’s death. Values of the time-dependent cumulative exposure variables were determined in the year the control reached the age of the case at death.
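The control selection described above can be sketched as risk-set sampling on the age scale. This is only an illustration: the record layout (`id`, `entry_age`, `exit_age`) and the function name are hypothetical, and the eligibility rule (under follow-up and still alive at the age at which the case died) is an assumption about how "alive at the time of the case's death" was applied.

```python
import random


def sample_risk_set_controls(cohort, case, n_controls=20, rng=None):
    """Randomly pick controls still at risk at the case's age at death.

    cohort: list of dicts with 'id', 'entry_age' and 'exit_age' (age at
    death or end of follow-up). Subjects who die later remain eligible
    as controls for earlier cases.
    """
    rng = rng or random.Random()
    age = case["exit_age"]
    eligible = [
        p for p in cohort
        if p["id"] != case["id"] and p["entry_age"] <= age < p["exit_age"]
    ]
    # If fewer than n_controls are at risk, use everyone who is eligible.
    return rng.sample(eligible, min(n_controls, len(eligible)))


# Hypothetical cohort of 30 people who all enter at age 30.
cohort = [{"id": i, "entry_age": 30, "exit_age": 60 + i} for i in range(30)]
controls = sample_risk_set_controls(cohort, cohort[10], n_controls=5,
                                    rng=random.Random(0))
```

Each sampled control would then have its cumulative exposure evaluated at the case's age at death, so that cases and controls are compared at the same attained age.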
Splines were used to model exposure–response to avoid parametric assumptions about the shape of the curve. The exposure for subject i, denoted x_{i}(t), is first transformed using a power basis expansion of degree p,
f(x_{i}) = β_{1}x_{i} + β_{2}x_{i}^{2} + … + β_{p}x_{i}^{p} + Σ_{k=1}^{K} b_{k}(x_{i} − K_{k})_{+}^{p},
where K_{k} are known knots, x_{+} = max(x,0) and the dependency on time is suppressed. The unknown parameters, β_{1}, …, β_{p}, b_{1}, …, b_{K}, are estimated in the Cox regression model by maximising the penalised partial log-likelihood,^{12} where the b_{k} coefficients are constrained to control the amount of smoothness in the fitted curve. This smoothness can be described in terms of the degrees of freedom (df), the effective number of parameters estimated after constraining the b_{k} coefficients. We initially applied the standard method for selecting the df to optimise the smoothing, Akaike’s Information Criterion (AIC).^{13} The criterion is calculated as AIC = −2 log L + 2 df, where L denotes the likelihood evaluated at the maximum likelihood estimates of the parameters in the model. The optimal model minimises the AIC with respect to df, so that models with similar fits based on the log likelihood will be penalised for having more df, that is, less smoothness. This balances good fit to the data with smoothness of the spline. One drawback of AIC is that it is known to underpenalise,^{14} and so we also applied corrected AIC, AICc, which adjusts the AIC formula by substituting r(df+1)/(r−(df+2)), where r is the number of events, for degrees of freedom.^{15} Finally, we considered cross validation^{12,16} as a third alternative for selecting the optimal amount of smoothness. Cross validation involves the sequential removal of each observation, the prediction of this deleted observation using the remaining data and the partial log likelihood as a measure of how well the predicted values fit the observed data.
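The basis expansion and the two information criteria described above can be written out as a short sketch. It is not the fitting routine used in the analysis (which was carried out in R): it only evaluates a truncated power basis for one exposure value and computes AIC and AICc from a given log likelihood. The function names and the log likelihood value in the example are hypothetical; r = 90 events and df = 1.86 are taken from the text.

```python
def truncated_power_basis(x, knots, degree=3):
    """Basis terms x, x^2, ..., x^p followed by (x - knot)_+^p per knot."""
    polynomial = [x ** j for j in range(1, degree + 1)]
    truncated = [max(x - k, 0.0) ** degree for k in knots]
    return polynomial + truncated


def aic(log_lik, df):
    """AIC = -2 log L + 2 df."""
    return -2.0 * log_lik + 2.0 * df


def aicc(log_lik, df, n_events):
    """Corrected AIC: df is replaced by r(df + 1) / (r - (df + 2))."""
    if n_events <= df + 2:
        raise ValueError("AICc needs more events than df + 2")
    penalty_df = n_events * (df + 1.0) / (n_events - (df + 2.0))
    return -2.0 * log_lik + 2.0 * penalty_df


# One exposure value, two hypothetical knots, cubic basis.
basis = truncated_power_basis(2.0, knots=[1.0, 3.0], degree=3)

# Hypothetical partial log likelihood of -100 for a fit with df = 1.86.
aic_value = aic(-100.0, 1.86)
aicc_value = aicc(-100.0, 1.86, n_events=90)
```

For the same fit, AICc is always larger than AIC when r exceeds df + 2, which is why it tends to select a smoother curve with fewer degrees of freedom.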
We also used three approaches to deal with the influence of the highest exposures in the tail of the distribution on the shape of the exposure–response curve:

We took a natural logarithm transformation of the exposure variable.

We used alternative criteria for the number of knots and placement, locating knots at fixed centiles of the exposure distribution rather than equally spaced.

We examined models with outliers held out of the analysis, defined as those participants with exposures larger than the highest exposed case.
Results provide a sensitivity analysis for the final exposure–response model.
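The second and third approaches listed above can be sketched as follows. The knot functions contrast equally spaced knots with knots at centiles of the exposure distribution, and the hold-out function drops participants whose exposure exceeds that of the highest exposed case. All names, the linear interpolation rule for centiles and the example values are assumptions made for illustration.

```python
import math


def equally_spaced_knots(exposures, n_knots):
    """Interior knots evenly spaced between the minimum and maximum."""
    lo, hi = min(exposures), max(exposures)
    step = (hi - lo) / (n_knots + 1)
    return [lo + step * (i + 1) for i in range(n_knots)]


def centile_knots(exposures, n_knots):
    """Interior knots at evenly spaced centiles (linear interpolation)."""
    data = sorted(exposures)
    knots = []
    for i in range(1, n_knots + 1):
        pos = i / (n_knots + 1) * (len(data) - 1)
        lower = math.floor(pos)
        upper = min(lower + 1, len(data) - 1)
        knots.append(data[lower] + (pos - lower) * (data[upper] - data[lower]))
    return knots


def hold_out_outliers(records):
    """Keep (exposure, is_case) records at or below the highest case exposure."""
    max_case = max(x for x, is_case in records if is_case)
    return [(x, is_case) for x, is_case in records if x <= max_case]


# Hypothetical right-skewed exposures: one extreme value pulls the
# equally spaced knots far into the sparse tail.
exposures = [0, 1, 2, 3, 4, 100]
spaced = equally_spaced_knots(exposures, 3)
centiles = centile_knots(exposures, 3)

records = [(5.0, True), (108.6, True), (20.0, False), (150.0, False), (188.2, False)]
kept = hold_out_outliers(records)
```

With skewed data, centile-based knots concentrate where most observations lie, whereas equally spaced knots place several knots in the region with few observations.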
RESULTS
There were 90 deaths due to rectal cancer in the cohort (table 1). Most of the cases were white and male, and had worked in plant I, the oldest of the three plants. The oldest case was 94 years of age and the youngest 32 years. The maximum cumulative exposure to straight metalworking fluid was 108.6 mg/m^{3}-years among cases and 188.2 mg/m^{3}-years among the noncases selected for the nested case–control sample. All models included the three types of metalworking fluids, as well as race, sex, year of hire, calendar year and manufacturing plant at which the participant worked as potential confounders.
Poisson models
Table 2 shows the results for the Poisson regression models with unlagged cumulative exposure to each type of metalworking fluid. Cut points for the categories were selected separately for each fluid type by the exposure quartiles for the exposed cases, using participants unexposed to that particular fluid as the reference group. Models with lags of 5, 10, 15 and 20 years were also considered (data not shown). Raised RRs were found for exposures to both oil-based (straight and soluble) and synthetic metalworking fluids; however, the strongest associations were seen for straight metalworking fluid, with a significant RR of 2.7 in the highest exposure category. The adjusted RR for this category increased up to 3.6 for a lag of 20 years (with a 95% confidence interval (CI) that excluded 1). Mortality ratios for synthetic and soluble fluids were raised, with wide 95% CIs and without trends for all lagged and unlagged exposures.
Cox models with penalised splines
Figure 1 presents the results of the penalised spline model with unlagged exposure to straight metalworking fluid. The R software package (R Development Core Team, Vienna, Austria) provides automatic selection of the smoothing parameter by minimising AIC or AICc using a default of cubic splines with 17 terms in the power basis expansion. In this application, the optimal smoothing parameter using AIC was df = 1.86, whereas the AICc criterion selected a slightly lower df = 1.67. Cross validation gave an identical model to AIC. The resulting log RR of rectal cancer death for the two selected models illustrates the smoothness of each model fit (fig 1). Both exposure–response curves increase to a maximum RR of 2.2. The AIC-selected curve reaches the maximum earlier, at a cumulative exposure of 69.4 versus 83.8 mg/m^{3}-years for the AICc curve. The pointwise confidence bands (similar for both curves, shown for AIC) support an increasing exposure–response curve up to approximately the 99th centile of exposure, after which they widen dramatically. The shape of the confidence bands reflects the sparseness of data beyond the 99th centile, illustrated by the vertical line in fig 1.
Lagged exposures
We then fit a series of Cox models with penalised splines for the exposure variable with increasing lags, from 0 to 20 years. AIC and AICc were used to determine the optimal degrees of freedom. Figure 2 gives the results for the 15-year and 20-year lags along with the unlagged model, focusing on the exposures up to 57 mg/m^{3}-years, the 99th centile of 15-year lagged exposures. Beyond that point, the confidence curves widened and the plot resembles fig 1. The lag 5 and lag 10 models had overall shapes similar to the unlagged model up to 40 mg/m^{3}-years, and then decreased at slower rates than the 15-year and 20-year lag models (data not shown). Up to the 15-year lag, the curves increased in initial steepness and maximum RR. The RR reached 3.2 at 40 mg/m^{3}-years when exposure was lagged by 15 years. The estimated RRs at fixed cumulative exposure values for the unlagged, 15-year lagged and 20-year lagged models (table 3) indicate that the exposure–response was stronger as the lag increased up to 15 years. Although the intent of the lagging was to account for latency, it also had the unintended consequence of reducing exposures.
Influence
As is typical of environmental data and seen in the rug plot on the horizontal axis of fig 1, the exposure distribution is highly right skewed. We used three approaches to deal with the influence of the highest exposures in the tail of the distribution. Results of alternative models are shown in fig 3.
Firstly, we examined the natural logarithm of the exposure variable. AIC, AICc and cross-validation model-fit criteria selected a Cox model with penalised splines having df = 1, suggesting that the optimal model was linear with respect to the logarithm of the exposure data. The solid curve in fig 3 gives the RR for this model. The RR reaches a maximum value of 2.22, and is significant beyond a cumulative exposure of 0.04 mg/m^{3}-years (as reflected by the lower pointwise confidence curve, output omitted).
In another approach to deal with the skewed distribution, we considered alternative numbers and placement of the knots. With knots placed at centiles of exposure on the original scale, instead of equally spaced (R default), the three model-fit criteria were all optimised at df = 1. With just a linear term for the untransformed exposure variable, the RR increased with cumulative straight exposure across the range. Thus, the data are consistent with models for RR that are linear on both the raw and log-transformed straight metalworking fluid exposure scale. We also used 5, 8, 12 and 25 equally spaced knots. The number of knots did not affect the overall shape and size of the exposure–response curve (fig 3). This is consistent with the findings of Ruppert,^{17} who suggested that the number of knots typically has a small effect on the smoothness provided a sufficient number has been chosen.
Finally, we examined models with outliers held out of the analysis. Of the 10 most extreme observations (down to the exposure of the case with the highest exposure), all but one represented distinct noncases. Sequentially removing the three most extreme participants had only a local effect on the curves, where with each successive deletion the end of the truncated curves rose (data not shown). Holding out ⩾4 of the outliers resulted in models close to linear with a maximum RR of 3.5 at the highest cumulative exposure with a slower rate of initial increase.
DISCUSSION
Straight metalworking fluid contains a number of known or suspected carcinogens, including polycyclic aromatic hydrocarbons, chlorinated paraffins and sulphur compounds. Epidemiological analysis of metalworking fluids and cancer is further complicated by changes in the composition of these complex mixtures over time. For example, as mineral oils have become more highly refined, the amount of polycyclic aromatic hydrocarbons has decreased, presumably reducing carcinogenic risk.^{18–20} The composition of metalworking fluid, however, is also a function of the industrial application itself, as well as the recycling of used fluids, whereby polycyclic aromatic hydrocarbons can be reintroduced in response to heat, and metals can accumulate. In addition to changes in composition, the route and nature of exposure have also changed over calendar time as the industrial operations have evolved. Originally dermal exposures were the primary concern, but as machine speeds have increased, metalworking fluid particulates have become aerosolised, resulting in exposure via inhalation and ingestion. The effect of past and present changes in industrial processes and machinery on the particle size distribution of aerosolised metalworking fluid has not been fully explored, leaving open the question of greater deposition and dose to some target tissues in the modern workplace.^{21} Thus, even with the benefit of an extensive exposure assessment, we need to interpret results assuming that the presence of nondifferential exposure misclassification has produced some bias towards the null.
Most epidemiological studies of rectal cancer have considered this disease in combination with colon cancer; however, there is growing evidence that these are two distinct diseases with different aetiologies and risk factors.^{7,22,23} There are several differences between the two organs that might explain why an environmental exposure might cause cancer at one site and not at the other. The rectum transports less water and electrolytes than the proximal colon, is more acidic and is exposed longer to waste.^{7,24} In addition, molecular pathways leading to cancer of the large bowel are thought to differ by subsite. For example, people with the inherited syndrome familial adenomatous polyposis, characterised by mutations in the APC gene, are more likely to develop polyps first in the rectum, whereas those with hereditary nonpolyposis colorectal cancer, arising from mutations in mismatch repair genes, are more likely to develop cancer in the proximal colon.^{25} These differences between the colon and rectum underscore the need to consider their aetiology separately.
Eisen et al^{6} reported a marginally significant linear trend in rate ratios for rectal cancer with cumulative exposure to straight metalworking fluid in a Poisson regression and a nonsignificant slope in a standard Cox model with a linear exposure term. (No associations between colon cancer and type of metalworking fluid were observed.) The present results based on penalised splines suggest a much stronger exposure–response trend for rectal cancer that is close to linear and statistically significant over most of the exposure range before lagging. For example, at 10 mg/m^{3}-years (20 years at the current recommended exposure limit^{1} of 0.5 mg/m^{3}), the previous linear model predicted an RR of 1, in contrast with a significantly increased RR of 1.3 based on penalised splines. When exposure to straight metalworking fluids was lagged by 20 years, the smoothed models predicted a significant RR of 1.6 at this cumulative exposure. Penalised spline models for colon cancer were consistent with past null results^{6} for all fluid types, with straight exposure lagged up to 20 years (data not presented). Thus, applying the more flexible penalised splines provided a clearer picture of the exposure–response relationship and showed a strong near-linear trend over the denser region of the exposure distribution.
A few other studies show relevant findings specifically regarding rectal cancer and exposure to metalworking fluids, but most are limited in size, study design and/or exposure assessment. Increased RR has been reported for rectal cancer in several proportionate mortality ratio (PMR) studies. Silverstein et al^{26} found a PMR of 1.4 (95% CI 0.8 to 2.3) in white ball-bearing plant workers potentially exposed to oil-based metalworking fluid and Vena et al^{27} reported a PMR of 2.8 (based on four observed deaths) among engine plant workers employed for >20 years. In an extended follow-up of an automobile manufacturing plant, Kazerouni et al^{28} found no association with rectal cancer among those with heavy exposure to metalworking fluid. There was an association in a study of magazine production workers, with a standardised mortality ratio of 1.5, with a wide CI, among a small number of unskilled machine assistants.^{29} Overall, these studies contribute little additional evidence to support the findings reported here because they lack an adequate quantitative exposure assessment for metalworking fluid.
There have also been several relevant studies that report results for exposure to metalworking fluids and colon and rectal cancer combined into a single outcome. (The American Cancer Society presents national incidence and mortality statistics only for the two cancers combined because of potential misclassification.^{32}) Cohort studies of autoworkers^{31} and aerospace workers^{32} reported an RR close to 1, whereas an RR of 1.6 was reported for unskilled rotary press operators in a study of magazine production workers.^{29} In a population-based case–control study of colorectal cancer in Sweden, Gerhardsson de Verdier et al^{33} found a significantly increased RR of 2.1 for workers ever exposed to cutting fluids. The relative mix of colon and rectal cancers in any combination will presumably be dominated by colon cancers, although poorer survival rates for rectal cancer will reduce the imbalance among deceased cases. In this autoworkers cohort study, 90 of the 374 colorectal cancer deaths were due to cancer of the rectum, and in a recent prospective study the incidence of colon cancer was close to four times that of rectal cancer.^{7} With this uncertainty, the contribution of studies of colon and rectal cancer combined to assessing the role of metalworking fluids in the aetiology of rectal cancer alone is limited.
In the context of existing studies, the United Autoworkers–General Motors study thus provides a unique opportunity to quantify the relationship between mortality from rectal cancer and exposure to straight metalworking fluids. The use of Cox regression serves as an improvement over other techniques, such as standardised mortality ratio and Poisson regression analyses, which ignore the continuity of the exposure variable in favour of a dichotomous or categorical variable and cannot handle time-varying covariates well. In categorical analyses, the cut points between exposure categories are arbitrarily selected and can influence the shape of the dose–response curve.^{34–36} Penalised splines allow more flexibility in analysing the shape of exposure–response curves than models with a linear term for exposure by avoiding parametric constraints.
All the models considered here suggest increasing risk with increasing cumulative exposure up to about the 99th centile of exposure. The model of most interest was semiparametric and based on penalised splines with equally spaced knots throughout the upper region of sparser data. These better reflect the observed data in both the denser and sparser regions of the exposure distribution. The initial rise in HR was essentially linear for lagged and unlagged data, and the confidence curves widened where the data were sparse. These models also allowed us to observe the decrease in risk at the highest exposure levels that may reflect the common bias due to the healthy worker survivor effect.^{37–39} Once this decrease is seen, we can choose between lagged or truncated exposures as a method to downweight the tail. Although truncating graphs or deleting outliers may, in the end, be the best summary of the exposure–response, fitting the curve over the full range of exposure provides the information necessary to select the best representation of the data. This added flexibility makes penalised splines an attractive method.
CONCLUSIONS
The HR increased essentially linearly with cumulative exposure to straight metalworking fluids lagged 15 years, to a maximum of 3.2 at 40 mg/m^{3}-years and then decreased (with wide confidence bands). All three model-fit criteria selected roughly the same degrees of freedom, resulting in similar curves over the entire exposure range. Where data were sparse in the tail, confidence curves widened, and the shape of the curves depended on the knot location. However, results over the relevant range of human exposures suggest a moderately strong and essentially linear association, and further clarify the evidence that exposure to straight metalworking fluid is causally associated with rectal cancer.
REFERENCES
Footnotes

Published Online First 9 November 2006

Funding: This study was supported by Grant CA74386 from the National Cancer Institute.

Competing interests: None.

The original study was approved by the internal review board of Harvard School of Public Health and the extended follow-up was approved by internal review boards of University of Massachusetts and General Motors Corporation.