Background The association between lung cancer and occupational exposure to organic solvents remains debated. Since different solvents are often used simultaneously, it is difficult to assess the role of individual substances.
Objectives The present study is focused on an in-depth investigation of the potential association between lung cancer risk and occupational exposure to a large group of organic solvents, taking into account the well-known risk factors for lung cancer, tobacco smoking and occupational exposure to asbestos.
Methods We analysed data from the Investigation of occupational and environmental causes of respiratory cancers (ICARE) study, a large French population-based case–control study conducted between 2001 and 2007. A total of 2276 male cases and 2780 male controls were interviewed, and lifelong occupational histories were collected. To overcome the analytical difficulties created by multiple correlated exposures, we carried out a novel type of analysis based on Bayesian profile regression.
Results After analysis with conventional logistic regression methods, none of the 11 solvents examined were associated with lung cancer risk. Through a profile regression approach, we did not observe any significant association between solvent exposure and lung cancer. However, we identified clusters at high risk that are related to occupations known to be at risk of developing lung cancer, such as painters.
Conclusions Organic solvents do not appear to be substantial contributors to the occupational risk of lung cancer for the occupations known to be at risk.
What this paper adds
The role of some organic solvents in some cancers has been established, but in lung cancer it is still the object of discussion.
The multiple highly correlated exposures in our data made the results of traditional analyses difficult to interpret.
We proposed a new Bayesian approach allowing us to take into account the whole set of solvents as well as confounding factors, for example, tobacco smoking and occupational exposure to asbestos.
Organic solvents did not appear to be substantial contributors to the occupational risk of lung cancer.
Lung cancer has been and remains the most frequent cancer and the most common cause of death from cancer worldwide, especially in men. The French Institute for Public Health Surveillance (InVS) estimated around 40 000 new lung cancer cases in 2012 in France, representing the first cause of cancer mortality in men and the second in women.1 Although tobacco smoking is recognised as the leading cause of lung cancer,2 occupational exposures contribute to the burden of the disease, with the fraction of cases attributable to these multiple risk factors estimated to be in the range 13–29% in French men.3
The presence of multiple exposure patterns is commonly encountered in occupational epidemiological studies investigating disease aetiology. The large number of substances which are typically present in working environments, and their interdependence do not easily allow for isolation of the causative agent contributing to the development of pathologies, such as cancer. A typical case is that of painters and rubber production industry workers who are exposed to multiple agents. Working in these occupations is considered to be a risk factor for lung cancer.4 However, the specific agents that potentially contribute to this risk have not been identified yet. For diseases such as lung cancer, an additional source of complexity for teasing out the role of occupational exposures is the need to account for their well-known set of non-occupational risk factors, the most predominant being tobacco smoking.
Millions of workers are exposed to organic solvents due to their wide application in practically all branches of modern industry, such as the manufacture of surface coating (paints, varnishes and printing inks), the manufacture of synthetic fibres, the cleaning sector and the construction industry. Among these substances, benzene and trichloroethylene were classified in group 1 by the International Agency for Research on Cancer (IARC) because sufficient evidence exists of their association with the risk of leukaemia and cancer of kidney, respectively.4 ,5 Contrasting results were found about their association with lung cancer. Indications for a positive association for benzene were shown in a few cohort studies6–8 and in studies on animals.9–14 Moderate increased risks were also identified in cohorts of female dry-cleaners,15 and in a few case–control studies on the exposure to tetrachloroethylene,15–17 including a previous analysis performed in our population.18
Since various solvents are often used simultaneously, and sometimes as components of solvent mixtures, it is difficult to assess the role of individual substances. For this reason, this study is focused on an in-depth investigation of the potential association between lung cancer and occupational exposure to a large group of organic solvents. In order to overcome the analytical difficulties created by the presence of multiple correlated exposures, we followed an analytic strategy, which complements the traditional analysis approach, based on stepwise logistic regressions, with a Bayesian approach, based on profile regression (PR).19
Study design and population
The Investigation of occupational and environmental causes of respiratory cancers (ICARE) study is a multicentre, population-based, case–control study of respiratory cancers, conducted from 2001 to 2007 in 10 French départements (ie, administrative regions) including a cancer registry.20 Eligible cases were participants aged <76 years, who were newly diagnosed with primary cancer of the lung and upper aerodigestive tract during the study period. All histologically confirmed lung cancer types were considered (codes C33 and C34 International Classification of Diseases for Oncology (ICDO) 3rd edition21). Controls, with no history of previous respiratory cancer, were randomly selected from the general population through incidence density sampling, in the same départements as cases. Controls were frequency matched to cases by gender and age (<40, 40–54, 55–64, ≥65 years). An additional stratification was used to achieve a distribution by socioeconomic status comparable to that of the general population living in the départements.
Of the 4865 eligible cases and 4673 eligible controls identified, 6481 took part in the study: 2926 cases (2276 men and 650 women) and 3555 controls (2780 men and 775 women). Further details about study design and participant selection are given elsewhere.20
Ethics approval was obtained from the Institutional Review Board of the French National Institute of Health and Medical Research (IRB-Inserm, number 01-036 and CNIL number 90120).
Trained interviewers collected data during face-to-face interviews using standardised questionnaires, which included information about sociodemographic characteristics, residential history, medical history, familial history of cancer, detailed information about tobacco, alcohol and non-alcoholic beverage consumption, and detailed lifetime occupational history (about all jobs held for at least 1 month). Job titles and industrial activities were coded blinded to case–control status according to the International Standard Classification of Occupation (ISCO), 1968 revision,22 and the French Nomenclature of Activities (NAF), 1999 edition.23
A shortened version of the questionnaire, designed mainly to collect smoking data and occupational history, was administered to proxy respondents and to participants too sick or too tired to complete the full interview (5% of men and 3% of women).
Study agents and exposure assessment
Substances of interest were several organic solvents or families of solvents, which can be grouped into three main categories: chlorinated solvents, fuels and petroleum-based solvents, and oxygenated solvents.
We applied different job-exposure matrices (JEM),24–27 developed by the InVS, to assess separately the exposure to each substance (or family or group of substances): tetrachloroethylene, trichloroethylene, methylene chloride, chloroform, carbon tetrachloride, benzene, kerosene/diesel oil/fuel oils, mineral spirits and other light aromatic fractions (white spirits), special boiling point spirits and other aliphatic petroleum-based solvents (SBPs), gasoline, alcohol, ketones/esters, diethyl ether, ethylene glycol and tetrahydrofuran.
For each combination of ISCO and NAF code, the JEM assigned three indices of exposure: (1) probability, expressed by the percentage of exposed workers, (2) intensity, and (3) frequency, given by the proportion of exposed working time (see eTable 1). Information on the duration of each job was collected in the questionnaire.
Asbestos exposure was assessed using a specific JEM,24 following the same methodological procedure described for solvents exposure. In this case, JEM provided two indices for frequency and intensity of exposure according to whether the exposure was due to the working environment or to the specific job tasks.
To account for changes in exposure over time, indices were provided for different calendar periods from 1950 to 2007 for chlorinated solvents, from 1947 to 2005 for fuels and petroleum-based solvents, from 1950 to 2012 for oxygenated solvents, and from 1945 to 2007 for asbestos. For job periods that started or were held before these dates (3.3%), we used the indices corresponding to the earliest period of each JEM.
In order to improve the specificity of the JEMs, we assigned exposure to solvents and asbestos only to those jobs which had a probability of exposure greater than a cut-off, chosen according to the categories of the JEMs (30% for asbestos, chlorinated and oxygenated solvents, and 50% for fuels and petroleum-based solvents, eTable 1). The jobs below the cut-off were considered as non-exposed and not included in further exposure classification. For all the exposed jobs, we computed the Cumulative Exposure Index (CEI) by multiplying the duration of each job period by the weights for exposure probability, frequency and intensity, and summing these products over the entire work history. For asbestos exposure, the CEI was calculated taking into account frequency and intensity linked to both working environment and job tasks.28
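The CEI calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the analyses (which were run in R): the `JobPeriod` class, the function name and every numeric weight below are hypothetical, and the actual JEM weights are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class JobPeriod:
    years: float        # duration of the job period (from the questionnaire)
    probability: float  # JEM probability of exposure, as a fraction between 0 and 1
    frequency: float    # JEM weight for the proportion of exposed working time
    intensity: float    # JEM weight for exposure intensity


def cumulative_exposure_index(jobs, cutoff):
    """Sum duration * probability * frequency * intensity over exposed jobs.

    Jobs whose probability of exposure is not greater than `cutoff`
    are treated as non-exposed and contribute nothing.
    """
    return sum(
        job.years * job.probability * job.frequency * job.intensity
        for job in jobs
        if job.probability > cutoff
    )


# Hypothetical work history; all weights are made up for illustration.
history = [
    JobPeriod(years=10, probability=0.60, frequency=0.50, intensity=2.0),
    JobPeriod(years=5, probability=0.20, frequency=0.90, intensity=3.0),
    JobPeriod(years=8, probability=0.40, frequency=0.25, intensity=1.0),
]

cei_30 = cumulative_exposure_index(history, cutoff=0.30)  # second job is dropped
cei_50 = cumulative_exposure_index(history, cutoff=0.50)  # only the first job counts
```

Raising the cut-off from 30% to 50% removes the third job as well, which shows how the choice of cut-off trades sensitivity for specificity.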
Results are presented for men only, because the small number of exposed women precluded analyses for most solvents. We restricted the study population to participants with a known working history (11 men excluded). Analyses were performed on substances to which at least 15 participants were exposed; thus, carbon tetrachloride, chloroform, tetrachloroethylene and methylene chloride were excluded. CEIs were transformed into categorical variables according to tertiles of controls distribution. This allowed for the definition of low, medium and high levels of exposure for each substance.
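The categorisation step can be illustrated as follows. This is a minimal sketch under two assumptions that the text does not spell out: that tertiles are computed among exposed controls only (CEI > 0), and that participants with a CEI of zero form a separate non-exposed category. The function names and the example values are hypothetical.

```python
import statistics


def tertile_cutpoints(control_ceis):
    """Return the two tertile boundaries of the CEI among exposed controls."""
    exposed = [cei for cei in control_ceis if cei > 0]
    low_cut, high_cut = statistics.quantiles(exposed, n=3)
    return low_cut, high_cut


def exposure_category(cei, cutpoints):
    """Map a CEI value to non-exposed, low, medium or high."""
    low_cut, high_cut = cutpoints
    if cei <= 0:
        return "non-exposed"
    if cei <= low_cut:
        return "low"
    if cei <= high_cut:
        return "medium"
    return "high"


# Hypothetical CEI values for controls (zeros are non-exposed controls).
controls = [0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
cuts = tertile_cutpoints(controls)

# The same control-based cut points are then applied to cases and controls alike.
categories = [exposure_category(cei, cuts) for cei in (0, 2, 5, 8)]
```

Deriving the cut points from controls only keeps the category boundaries independent of case status.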
We used Spearman correlations (r, correlation coefficient) to assess correlation among substances, including asbestos. We estimated ORs of lung cancer and their corresponding 95% CIs using multivariable unconditional logistic regression models. Models were fitted both with all exposure substances included and with only those selected by forward and backward stepwise procedures. In order to take into account the degree of multicollinearity present in our data and to produce regression coefficients which are more stable and reliable, a ridge regression model was also fitted.29
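For readers unfamiliar with the Spearman coefficient, it is simply the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a plain Python illustration of that definition, not the Hmisc routine used in the analyses; the function names and example data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical exposure categories (0 = non-exposed ... 3 = high) for two solvents.
solvent_a = [0, 0, 1, 2]
solvent_b = [0, 1, 1, 3]
r = spearman(solvent_a, solvent_b)
```

Because exposure categories contain many ties (most participants are non-exposed), the average-rank treatment matters in this setting.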
Analyses were systematically adjusted for age at interview (<50, 50–60, 61–70, >70 years), départements, exposure to asbestos30 (CEI in tertiles) and cigarette smoking history, which was summarised using the Comprehensive Smoking Index (CSI).31 This index combines duration of smoking, time since cessation and intensity. It is null in never smokers.
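To make the structure of the CSI concrete, the sketch below implements the functional form as we understand it from the published definition (ref 31): a saturating term in duration, an exponential decay in time since cessation, and the logarithm of intensity. It should be checked against that reference before reuse. The half-life and lag parameters are not reported in this paper, so the values used in the example are purely hypothetical placeholders.

```python
import math


def comprehensive_smoking_index(duration, time_since_cessation, intensity,
                                half_life, lag):
    """CSI = (1 - 0.5^(dur*/tau)) * 0.5^(tsc*/tau) * ln(intensity + 1).

    duration and time_since_cessation are in years, intensity in cigarettes
    per day; half_life (tau) and lag (delta) must be supplied by the user.
    """
    tsc_star = max(time_since_cessation - lag, 0.0)
    dur_star = max(duration + time_since_cessation - lag, 0.0) - tsc_star
    return (
        (1 - 0.5 ** (dur_star / half_life))
        * 0.5 ** (tsc_star / half_life)
        * math.log(intensity + 1)
    )


# Hypothetical parameter values, chosen only to make the example easy to follow.
TAU, DELTA = 20.0, 0.0

never_smoker = comprehensive_smoking_index(0, 0, 0, TAU, DELTA)
current_smoker = comprehensive_smoking_index(30, 0, 20, TAU, DELTA)
former_smoker = comprehensive_smoking_index(30, 20, 20, TAU, DELTA)
```

With these placeholder values, the never smoker scores zero and the former smoker who quit one half-life ago scores half as much as an otherwise identical current smoker.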
We also fitted a PR model, a Bayesian clustering method which partitions observations into clusters with respect to risk factor characteristics and to outcome status. Both risk factors and outcome are used jointly to form the clusters and each cluster is associated via a logistic link to the outcome of interest.32 The number of clusters is not fixed in advance but explored throughout the algorithm. Postprocessing of the output produces a ‘representative clustering’, which is an appropriate summary of all clusterings explored.33 In particular, the representative clusters together with 95% credibility intervals for the associated logOR with respect to a baseline cluster are given. An estimation of the cluster-specific proportions for each category of exposure, the ‘exposure profile of the cluster’, is also given, and it is useful for understanding the main characteristics of each cluster. For example, some clusters may be characterised by a high probability of exposure to benzene and gasoline and a much smaller probability of other exposures, while for a different cluster other solvent combinations are predominant. Additionally, the PR model and the Bayesian output allow predictive inference and quantification of the range of risk associated with prespecified combinations of covariates of interest. In our data, clustering was based on CEI of solvents (4 categories) and CEI of asbestos (4 categories). LogORs were adjusted for age at interview (in classes), départements and CSI (as continuous variable). To quantify the role of a group of covariates, we specified a number of simulated predictive scenarios, called pseudoprofiles, which are used to compare and contrast risk predictions under different patterns of exposure.34
To study the robustness to the chosen cut-off for the probability of exposure to solvents, sensitivity analyses were performed on different cut-off points (20% and 40% for chlorinated and oxygenated solvents, and 10% for fuels and petroleum-based solvents). The categories of probability of exposure to asbestos did not allow sensitivity analysis to be performed.
All analyses were implemented in R (V.3.1.2). We used package ‘Hmisc’ (V.3.15–0) for Spearman correlations; package ‘MASS’ (V.7.3–33) for stepwise logistic regressions, based on Akaike information criterion (AIC); package ‘ridge’ (V.2.1–3) for ridge regression; and package ‘PReMiuM’33 (V.3.0.32) for PR.
All p values were two-sided, and a p value ≤0.05 was the threshold for statistical significance. For the Bayesian analyses, we report 95% credibility intervals.
Population and substances description
The main sociodemographic characteristics of the study population and the histological subtypes of lung cancer cases are presented in table 1. Mean age was slightly higher for cases than for controls. As expected, a statistically significantly higher proportion of blue-collar workers was found among cases as compared with controls.
Squamous cell carcinomas and adenocarcinoma were the most frequently diagnosed cancer types (35% and 34%, respectively), and small cell carcinomas represented 15% of cases.
Smoking data were available for 5012 men (2241 cases); among cases, 2.6% (59 cases) were never smokers. As expected, a strong association emerged between lung cancer and CSI, with a clear dose–response effect (p value trend <0.001) (see eTable 2).
Occupational exposures to solvents and asbestos were correlated (see eFigure1). Particularly strong and statistically significant (p<0.001) were correlations between exposures to ethylene glycol and gasoline (r=0.78), benzene and gasoline (r=0.62), benzene and ethylene glycol (r=0.60), and benzene and white spirits (r=0.58).
Logistic regression analysis
First, we incorporated into the logistic regression models the exposure to one solvent at a time (table 2). Statistically significant and borderline significant associations emerged only for medium and high levels of white spirits exposure, respectively (ORmedium=1.66, 95% CI 1.12 to 2.46 and ORhigh=1.36, 95% CI 0.92 to 2.02), but no dose–response relationship was observed. When we included all solvents in the model, the regression did not show any association with lung cancer. We also applied stepwise forward and backward approaches. Both selected the same solvents for the final model, with ORs and 95% CIs very similar to those obtained from the single-solvent analysis, namely: (1) white spirits, which were statistically significantly associated with lung cancer at medium level of exposure (ORmedium=1.67, 95% CI 1.13 to 2.48) and borderline significantly associated at high level (ORhigh=1.40, 95% CI 0.94 to 2.09); (2) tetrahydrofuran, which was not significantly associated. We finally estimated ORs using ridge regression, which typically produces ORs that are shrunk towards the null as compared with the unpenalised maximum likelihood estimates. For the model including all solvents, the highest ORs were observed for medium and high exposure to white spirits, and medium exposure to SBPs and tetrahydrofuran (table 2).
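The distinction between the significant and borderline results for white spirits can be checked directly from the reported intervals. The sketch below back-calculates the standard error of the logOR from each 95% CI and derives an approximate two-sided p value. It assumes the reported intervals are Wald intervals, symmetric on the log scale, which is the usual output of logistic regression software but is not stated explicitly in the text; the function name is hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_summary(odds_ratio, ci_low, ci_high):
    """Recover the log-scale SE, z statistic and two-sided p value from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Single-solvent estimates for white spirits quoted in the text.
se_medium, z_medium, p_medium = wald_summary(1.66, 1.12, 2.46)
se_high, z_high, p_high = wald_summary(1.36, 0.92, 2.02)
```

Under this assumption the two standard errors are almost identical (about 0.20 on the log scale), so the difference in significance comes from the size of the point estimates rather than from their precision.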
Postprocessing of PR output produced a representative clustering dividing the entire male sample into 13 clusters. Each of them was characterised by a specific exposure profile summarised in table 3, which shows the posterior mean of the associated exposure probabilities.
For each cluster, we highlighted in bold the modal probability for every substance. For example, in cluster 6, 46% of participants were exposed to a high level of alcohol, 74% to a high level of ketones/esters and to no other solvents; in cluster 9, 49% were exposed to a low level of benzene, 48% to a medium level of white spirits, 52% to a low level of alcohol and 49% to a medium level of ketones/esters.
We assumed cluster 5 as the baseline for logORs estimation, as it is made up of non-exposed participants. All clusters have been presented in table 3 ranked in ascending order of logOR.
Figure 1 shows box plots of the distribution for logORs for each cluster (relative to the non-exposed cluster 5), with their corresponding 95% credible intervals (CI*). The horizontal line represents the average logOR in the whole population, obtained as a weighted average among clusters. We call it the logOR mean. We chose to display this as we want to use the clustering as a guide to uncover subgroups of participants who are clearly different both in terms of their exposure and their risk from the rest of the sample. Hence, we have coloured in grey what we call ‘baseline subgroups’, baseline in so far as their 95% CI* logORs contained the logOR mean representing the average logOR of the sample. We do not interpret in detail the profiles of baseline clusters.
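The rule used to separate ‘baseline subgroups’ from clusters at distinct risk can be sketched as follows. This is an illustration only, working on simulated draws rather than PReMiuM output. It assumes that the logOR mean is a weighted average of cluster posterior means with cluster sizes as weights, and that intervals are equal-tailed; neither detail is specified in the text, and all names and numbers are hypothetical.

```python
import random
import statistics


def credible_interval_95(draws):
    """Equal-tailed 95% interval: 2.5th and 97.5th percentiles of the draws."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points in 2.5% steps
    return cuts[0], cuts[-1]


def classify_clusters(draws_by_cluster, size_by_cluster):
    """Label each cluster relative to the size-weighted mean logOR."""
    means = {c: statistics.fmean(d) for c, d in draws_by_cluster.items()}
    total = sum(size_by_cluster.values())
    log_or_mean = sum(means[c] * size_by_cluster[c] for c in means) / total
    labels = {}
    for cluster, cluster_draws in draws_by_cluster.items():
        low, high = credible_interval_95(cluster_draws)
        if low > log_or_mean:
            labels[cluster] = "higher"
        elif high < log_or_mean:
            labels[cluster] = "lower"
        else:
            labels[cluster] = "baseline"  # interval contains the logOR mean
    return log_or_mean, labels


# Simulated posterior draws of logOR for three hypothetical clusters.
rng = random.Random(0)
draws = {
    "A": [rng.gauss(0.0, 0.1) for _ in range(2000)],
    "B": [rng.gauss(0.2, 0.1) for _ in range(2000)],
    "C": [rng.gauss(1.5, 0.2) for _ in range(2000)],
}
sizes = {"A": 500, "B": 450, "C": 50}

log_or_mean, labels = classify_clusters(draws, sizes)
```

In this toy example the small cluster C sits well above the weighted mean and is the only one flagged, mirroring the way the white clusters are singled out in figure 1.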
Only clusters 11 and 13 (displayed in white in figure 1) have logORs significantly higher than the logOR mean, meaning that the association with lung cancer was significantly stronger in these clusters than in the others. Cluster 13, which had the highest risk, was composed of participants characterised by exposure to high levels of asbestos and white spirits, and a low level of benzene. Cluster 11 was the largest cluster, made up of participants with a high level of asbestos exposure and a low level of ketones/esters. The small number of participants in cluster 12 did not allow a stable logOR estimate to be obtained (table 3).
We also cross-classified clusters according to their participants’ jobs. The job distribution of cluster 13 encompasses a large proportion of painters in the construction field (35%). In cluster 11, plumbers represented 14%, carpenters and joiners 13%, and sheet-metal workers 4% of all jobs grouped in that cluster (see eFigure 2).
Since the exposure to white spirits and benzene was common to high-risk clusters, we were interested to understand if and how the exposure to these substances affected the risk. With this purpose, we set up several pseudoprofiles of covariate patterns.
Figure 2 shows the posterior predictive density of logORs (relative to pseudoprofile 1) for different combinations of CEI of white spirits and benzene levels. CEI of asbestos and other solvents reflected the covariate patterns present in the main sample (for technical details, see ref 34).
These plots allow one to see how the logOR changes as these covariates are altered from low to high exposure. We observe a small, non-significant shift to the right of the logOR for the predicted risks, with wide credible intervals.
ICARE is one of the largest case–control studies investigating the role of occupational risk factors in respiratory cancer that collects detailed information about lifelong working history. Participants are exposed to many substances during their working life, and in this study we focused on investigating the role of several solvents on lung cancer.
The study was carried out in collaboration with the French network of cancer registries, allowing us to recruit cancer cases in almost all healthcare facilities in the included départements. The frequency matching strategy allowed for selection of controls comparable to cases with respect to gender distribution. Whereas the unique control group assumed for the two groups of cancer cases (lung and upper aerodigestive tract) encompassed a slight difference in terms of age, the large number of participants in each age group allowed for satisfactory adjustment with respect to age.
Thanks to the frequency matching, controls were comparable to cases with respect to age and gender distribution. Controls were randomly selected in the same département as cases, through incidence density sampling, and according to the socioeconomic distribution of the corresponding general population.
Owing to the large number of participants, we could not use an expert assessment based on job title history and free job description, known to be the gold standard technique, to determine past occupational exposure. Instead, we applied specific JEMs, developed at the InVS. JEMs are conceived to assign exposure in a reproducible and automatic way so that the assessment can be considered objective and independent of case–control status. Being aware of the risk of non-differential misclassification bias, we increased specificity by establishing a cut-off of 30% (or 50% as applicable) on the probability of exposure to consider a job title as ‘exposed’.
As often occurs in environmental epidemiological studies, our data were characterised by multiple highly correlated exposures, which led to known problems for statistical analyses.36 Fitting multiple logistic regression models incorporating one exposure variable at a time does not allow for control of other exposure variables, which could be confounding factors.37 In addition, when a study generates a large number of tests, as in our case, multiple comparison issues arise and should be addressed.38
Alternatively, incorporating all exposure variables in the same regression model can yield results that are difficult to interpret. In our data, no statistically significant associations emerged when all exposure variables were included in the model. When we performed a variable selection based on stepwise methods, the final model returned two substances, white spirits and tetrahydrofuran, which were respectively weakly and not significantly associated with lung cancer. It is known that in the presence of multicollinearity maximum likelihood estimates may be unstable.39 Ridge regression produced lower estimates for most substances, highlighting again the sensitivity to multicollinearity and the difficulty of interpreting logistic regression results in this case.
PR is instead a Bayesian statistical approach conceived to examine the effect of combinations of variables that structure the variability of the data, allowing one to overcome several limitations of traditional regression methods.19 ,36 An important difference between the two approaches arises in the main unit of inference: logistic regression models the risk of individual participants, while PR infers the risk of groups of participants.32
PR divided the sample into 13 clusters, including one composed of participants never exposed to any solvent. All the other clusters were characterised by different profiles of exposure to solvents and asbestos. Only two, clusters 11 and 13, were statistically significantly associated with an increased lung cancer risk. The cluster at highest risk was composed of participants exposed to white spirits and benzene, in addition to asbestos. A pseudoprofile analysis of benzene and white spirits revealed a small, non-significant shift of predicted logORs with wide 95% CIs*. We also reran pseudoprofiles on exposure to white spirits only. Overall, predicted logORs were higher but not statistically significant and the 95% CIs* were still wide. The pseudoprofile analysis led us to conclude that neither white spirits nor benzene appears to play a role in lung cancer development.
In order to ensure that failure to observe any significant role of solvents was not due to our choice of cut-off, we performed a sensitivity analysis with different cut-points for the probability of exposure to each solvent. Results were similar. Overall, we did not find any evidence in support of the carcinogenicity of any of the organic solvents investigated.
To finalise our analysis, it was important to see how job titles were distributed among clusters. Although we did not observe any significant association between solvent exposure and lung cancer, clusters identified by the method were related to occupations that are known to be at risk of developing lung cancer. We found an interesting concentration of painters,40 construction workers, plumbers and pipe fitters, carpenters, joinery and parquetry workers in the high-risk clusters.
These results are in agreement with previous analyses of occupations performed on the same study population, where statistically significant associations had been obtained for painters (in the construction field) with an OR estimated at 2.68; for plumbers and pipe fitters with an OR of 2.27 and for carpenters and joiners with an OR of 1.45.41 What our new analysis shows is that organic solvents do not appear to be substantial contributors to the occupational risk of lung cancer for these occupations.
However, these occupations are exposed to other substances which are known or suspected carcinogens for the lung, such as cadmium and chromium compounds for painters,30 silica dust30 and wood dust for wood industries,30 and welding fumes42 and asbestos for plumbers and pipe fitters.40 ,42 The question of which of these agents contribute to increasing the risk of lung cancer for these occupations is still open.
With this multidimensional comprehensive analysis, we dissected a complex pattern of exposure to a large group of solvents using appropriate methodology. We did not detect any strong effect of solvents on risk of lung cancer. The large sample size and the careful study design aimed at reducing the different sources of bias give additional weight to our findings.
The authors thank Ms Joëlle Fevotte for designing occupational questionnaires; all members of the MatGéné working group from the Institut de Veille Sanitaire and, in particular, Ms Brigitte Dananché for providing job-exposure matrices; and Mr Gwenaël GR Leday for his support in performing statistical analyses.
SR and IS equally contributed.
Collaborators ICARE Study Group: Anne-Valérie Guizard; Arlette Danzon; Anne-Sophie Woronoff; Velten Michel; Antoine Buemi; Émilie Marrer; Brigitte Tretarre; Marc Colonna; Patricia Delafosse; Paolo Bercelli; Florence Molinie; Simona Bara; Benedicte Lapotre-Ledoux; Nicole Raverdy; Oumar Gaye; Farida Lamkarkach; Mireille Matrat; Florence Guida; Sylvie Cénée; Matthieu Carton; Diane Cyr; Gwen Menvielle; Sophie Paget-Bailly; Loredana Radoï; Annie Schamus; Alexandra Papadopoulos; Danièle Luce; Isabelle Stücker; Corinne Pilorget; Joëlle Fevotte.
Contributors IS and DL are the co-principal investigators of the ICARE Study. They designed the study, directed its implementation, and oversaw all aspects of the study, including patients and controls recruitment, funding, and quality control of data. SR developed the methodological Bayesian approach, provided critical comments on results interpretation and revised the manuscript critically for intellectual content. FM has performed the statistical analyses and interpreted the results. She wrote the manuscript and reviewed the whole document. SL made a substantial contribution to the analysis and interpretation of data. LA contributed to the statistical analysis. FG contributed to the interpretation of the results. SC and MS performed the data management, applied job-exposure matrices, and prepared data sets for statistical analyses. MM and GM have been involved in the conception of the different variables included in the analysis and in the strategy of analysis. They participated in the writing of the manuscript. CP was involved in exposure assessment. BL-L is responsible for a cancer register and was particularly involved in the coding of the histology of the lung cancer cases. The ICARE Study Group implemented the ICARE Study.
Funding This analysis was supported by La Fondation ARC pour la Recherche sur le Cancer. The ICARE study was funded by the French National Research Agency (ANR); French National Cancer Institute (INCA); French Agency for Food, Environmental and Occupational Health and Safety (ANSES); French Institute for Public Health Surveillance (InVS); Fondation pour la Recherche Médicale (FRM); La Fondation de France; Ministry of Labour (Direction Générale du Travail); Ministry of Health (Direction Générale de la Santé).
Competing interests None declared.
Patient consent Obtained.
Ethics approval The Institutional Review Board of the French National Institute of Health and Medical Research (IRB-Inserm, number 01-036 and CNIL number 90120).
Provenance and peer review Not commissioned; externally peer reviewed.