If we date the origins of molecular epidemiology (defined as the reliance on tools of molecular biology in epidemiology, in particular, but not only, to assess exposures) to the publication, in 1993, of the book carrying this title,1 then this research field could be considered as having now reached the age of adulthood. While this young adult has some achievements already and has developed real strengths and recognition, many challenges lie ahead. The article by Lenters et al2 offers an opportunity to discuss one of the most promising challenges, namely, the one related to the application of the Exposome concept.
The Exposome concept
The Exposome encompasses lifetime environmental exposures from the prenatal period onwards.3 Although sometimes limited to exposures that can be assessed through biomarkers (the ‘internal Exposome’), the Exposome should consider all exposures regardless of their most relevant mode of assessment, including lifestyle factors and factors assessed through questionnaires, environmental measurements and models. Consequently, this notion extends beyond the borders of molecular epidemiology. A first embodiment of the Exposome concept is the literature on biomonitoring, which describes the levels of biomarkers of exposure to as many as several hundred environmental contaminants in population samples from some areas of the world, as illustrated by the National Health and Nutrition Examination Survey (NHANES) in the USA.4 This line of research can help identify patterns of co-varying exposures or behaviours, as is already done in dietary studies that group individuals sharing specific dietary patterns (eg, Mediterranean diet or fast-food type eating behaviours). It opens the prospect of identifying population subgroups simultaneously exposed to numerous adverse exposures, and of studying whether these exposures correlate with sociodemographic factors; to this extent, it is connected to the issue of environmental equity,5 in which the focus is on identifying communities or population subgroups that carry a larger share of environmental hazards. A second area is the study of the relations of the Exposome with specific health outcomes.6
Characterising the possible effect of the Exposome on health
The study by Lenters et al2 is actually not presented by its authors as an Exposome study, possibly because the number of exposure biomarkers considered (15) is an order of magnitude smaller than in earlier Exposome studies, such as those by Patel et al6 based on NHANES data, in which over 200 exposure variables were considered. The complexity of Exposome data entails statistical, epidemiological and exposure assessment challenges,7,8 several of which are nevertheless present in, and nicely tackled by, the study by Lenters et al.
The associations between health factors and exposures within a given study population (eg, a cohort) tend to be reported in separate papers, sometimes by family of exposures, as has been the case until now for the INUENDO project on which the Lenters et al study is based.9,10 This approach, although justified in practice by the fact that exposures are not always all assayed by the same laboratories at the same time, appears, in retrospect, equivalent to selling salami in slices. The risk here is selective reporting of the clearest associations and discarding of the least appetising salami slices (eg, ‘not statistically significant’ associations). Exposome studies2,11 do show the whole salami in a single publication and allow each highlighted association to be put in the context of all associations considered a priori.12
How should these (correlated) exposures be considered simultaneously? The first publications explicitly relating the Exposome to health adopted the so-called EWAS (Environment-wide association study) approach.11 EWAS is an adaptation of the GWAS (Genome-wide association study) approach developed for genomic data. It consists of separately testing the association of each exposure with the outcome considered. An obvious issue is multiple testing. This issue has usually been handled by applying a False Discovery Rate control procedure to the p values obtained from the distinct simple regression models estimating the association between each individual exposure and the health outcome. Such procedures, however, cannot distinguish exposure factors causally linked to the outcome from exposure factors that are merely correlated with a causally linked covariate, which is a real issue because of the already mentioned correlation between some exposures. Consequently, studies of the correlation structure of the covariates associated with the outcome, and multiple regression modelling of all exposures associated with the outcome, have been used.6 This constitutes an attempt to distinguish truly influential covariates from exposures associated with the outcome simply because of a correlation. In contrast (more precisely, in addition) to this type of approach, Lenters et al have opted for sPLS (sparse Partial Least Squares) regression modelling. Its implementation includes many steps, such as centring exposure variables in each area (as a cautious way to limit bias resulting from the multicentric nature of the study), regression of exposures and outcomes on all potential confounders (since the sPLS approach in itself does not explicitly allow adjustment for confounders), dimension reduction (as a way to tackle the multiple testing issue), and selection of the model’s sparsity and dimension by cross validation.
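The multiple-testing step of the EWAS approach described above can be illustrated with a minimal sketch. It assumes that the correction applied is the Benjamini-Hochberg step-up procedure, one common way of controlling the false discovery rate; the studies cited may have used a different variant. The function name `benjamini_hochberg`, the threshold `q` and the p values are hypothetical and serve only as an illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flag the hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    p_values: one p value per exposure, eg, from separate simple regressions.
    q: target false discovery rate (illustrative default).
    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the exposures, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose ordered p value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p values for 10 exposures (not taken from any study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p_values, q=0.05)
print(flags)
```

With these made-up values, five p values fall below 0.05 but only the two smallest survive the correction. As noted in the text, such a procedure says nothing about whether a retained exposure is causal or merely correlated with a causal one.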
Whether this sPLS approach is more efficient (in terms of bias, sensitivity or specificity to highlight determinants of the outcome and their possible synergy) than the EWAS approach6 previously used and the many other possible Bayesian or frequentist approaches13 remains to be seen. Indeed, so far, very few simulation studies have systematically investigated the relative efficiency of the statistical approaches suggested to analyse Exposome effects on health.14
Another challenge relates to the identification of possible synergy (cocktail effects, or interactions, to use loose terminology) between exposures. Considering statistical interactions in an EWAS setting may be challenging, as the number of tests performed would increase from p (the number of exposure covariates) to p(p+1)/2, that is, the p main effects plus p(p−1)/2 pairwise interaction terms, if all pairwise combinations among covariates are considered. Whether the sPLS approach, other dimension reduction techniques or clustering approaches are more efficient in this regard should be investigated.
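The growth in the number of tests can be checked with a few lines of arithmetic. The sketch below assumes one test per exposure plus one test per unordered pair of exposures; the helper name `n_tests` is hypothetical, and the values 15 and 200 simply echo the orders of magnitude mentioned earlier for the studies by Lenters et al and Patel et al.

```python
from math import comb


def n_tests(p, pairwise_interactions=False):
    """Number of tests for p exposures, with or without pairwise interactions."""
    main_effects = p
    if not pairwise_interactions:
        return main_effects
    # One extra test per unordered pair of distinct exposures: p(p-1)/2.
    return main_effects + comb(p, 2)


for p in (15, 200):
    print(p, n_tests(p), n_tests(p, pairwise_interactions=True))
# prints:
# 15 15 120
# 200 200 20100
```

The total p + p(p−1)/2 equals p(p+1)/2, so the multiple-testing burden grows roughly with the square of the number of exposures.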
This variety of potential statistical methods available to analyse Exposome-health associations13 will constitute a challenge for future meta-analyses, if only because the nature of the effect estimates that they provide differs. Up to now, authors of meta-analyses generally had to combine ORs, HRs and, sometimes, prevalence ratios, and to correct for their being reported for different increases in exposure levels in different studies. In the future, will they be able to combine ORs and HRs associated with single exposures with measures of association corresponding to linear combinations of exposure variables (as provided when data reduction techniques are applied), and with clusters of high-risk subjects such as those provided by clustering methods?15 Even when specific tools deemed relevant are used, conducting an additional analysis relying on more classical regression techniques, as done by Lenters et al with their additional Ordinary Least Squares analysis, will prove useful in this regard.
Challenges and perspectives for Exposome research
Many other challenges exist. These relate in particular to the assessment of the Exposome: assessing more exposures should not come at the cost of accuracy in the assessment of each exposure. New exposure sensors and biochemical assays have been and are being developed, allowing assessment of several hundred exposure variables per individual. Each of these metrics has different features in terms of analytical reproducibility (see table S12 for an illustration) and within-subject variability,16 thus inducing various degrees of exposure misclassification for each exposure variable. Models should move towards explicitly incorporating these varying amounts of exposure misclassification. Consider one chemical whose concentration has a high day-to-day variability, such as a phthalate compound, and another one with a much weaker day-to-day variability, such as a compound of the polychlorinated biphenyl (PCB) family, and assume that for both compounds the toxicologically relevant exposure window is the year before the assessment of the health outcome. Then a study like this one,2 in which exposure assessment is based on a spot biospecimen collected when the health outcome is assessed, is more likely to show an effect (if any) for the polychlorinated biphenyl than for the phthalate, for which the spot biospecimen will probably approximate exposure during the toxicologically relevant window more poorly. Furthermore, there is no reason for the toxicologically relevant exposure window to be the same for each exposure, which calls for prospective studies of the Exposome (some are under way8) and repeated within-subject assessment of time-varying exposures.
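The phthalate versus PCB contrast above can be illustrated with a small simulation. This is only a sketch under strong assumptions that are not taken from the article: a classical measurement error model in which the spot measurement equals the true window-average exposure plus independent within-subject noise, a linear effect of identical size for both compounds, and purely illustrative intraclass correlation coefficients (ICC) of 0.9 for the PCB-like compound and 0.3 for the phthalate-like compound. All function and variable names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slope(icc, beta=1.0, n=20000, seed=1):
    """Estimate the exposure-outcome slope when exposure is a single spot sample.

    icc: assumed share of the spot-measurement variance that is between-subject.
    beta: assumed true effect of the window-average exposure on the outcome.
    """
    rng = random.Random(seed)
    # Between-subject variance is fixed at 1, so within-subject variance
    # must be (1 - icc) / icc for the ICC to take the requested value.
    within_sd = ((1 - icc) / icc) ** 0.5
    true_exposure = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [beta * t + rng.gauss(0, 1) for t in true_exposure]
    spot = [t + rng.gauss(0, within_sd) for t in true_exposure]
    return ols_slope(spot, outcome)


slope_pcb_like = simulate_slope(icc=0.9)        # weak day-to-day variability
slope_phthalate_like = simulate_slope(icc=0.3)  # high day-to-day variability
print(round(slope_pcb_like, 2), round(slope_phthalate_like, 2))
```

Under these assumptions the estimated slope is attenuated towards zero by a factor close to the ICC, so the same true effect appears roughly three times weaker for the phthalate-like compound. This is one way of seeing why models that ignore differences in within-subject variability can rank exposures misleadingly.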
Finally, another development would be the simultaneous consideration of several health outcomes in Exposome studies. Lenters et al2 actually provide an illustration of what such Diseasome-Exposome-wide association studies (for which the DEWAS acronym may be used) could be, by linking the 15 exposures considered to 22 biomarker outcomes.
These are daunting challenges that will require collaborations between all fields of environmental health research. Let us hope that molecular epidemiology, now a promising adult, will, together with toxicology, exposure science, biostatistics, biology and other cousin disciplines, face them efficiently.
Contributors RS drafted the manuscript and MV critically reviewed it.
Funding This work is supported by the European Community's Seventh Framework Programme (FP7/2007–2013) under grant agreement 308333–the HELIX project.
Competing interests None.
Provenance and peer review Commissioned; internally peer reviewed.