Article Text
Abstract
Objectives Numerous environmental contaminants have been linked to adverse reproductive health outcomes. However, the complex correlation structure of exposures and multiple testing issues limit the interpretation of existing evidence. Our objective was to identify, from a large set of contaminant exposures, exposure profiles associated with biomarkers of male reproductive function.
Methods In this crosssectional study (n=602), male partners of pregnant women were enrolled between 2002 and 2004 during antenatal care visits in Greenland, Poland and Ukraine. Fifteen contaminants were detected in more than 70% of blood samples, including metabolites of di(2ethylhexyl) and diisononyl phthalates (DEHP, DiNP), perfluoroalkyl acids, metals and organochlorines. Twentytwo reproductive biomarkers were assessed, including serum levels of reproductive hormones, markers of semen quality, sperm chromatin integrity, epididymal and accessory sex gland function, and Y:X chromosome ratio. We evaluated multipollutant models with sparse partial least squares (sPLS) regression, a simultaneous dimension reduction and variable selection approach which accommodates joint modelling of correlated exposures.
Results Of the over 300 exposure–outcome associations tested in sPLS models, we detected 10 associations encompassing 8 outcomes. Several associations were notably consistent in direction across the three study populations: positive associations between mercury and inhibin B, and between cadmium and testosterone; and inverse associations between DiNP metabolites and testosterone, between polychlorinated biphenyl153 and progressive sperm motility, and between a DEHP metabolite and neutral αglucosidase, a marker of epididymal function.
Conclusions This global assessment of a mixture of environmental contaminants provides further indications that some organochlorines and phthalates adversely affect some parameters of male reproductive health.
 environmental pollutants
 partial least squares
 reproductive health
 sex hormones
 variable selection
Statistics from Altmetric.com
What this paper adds

Some ubiquitous environmental contaminants have been linked to changes in male reproductive health; however, interpretation of the mostly singlepollutant assessments is difficult given the complex mixture of daily exposures.

In a study of around 600 fertile men from Greenland, Poland and Ukraine, we assessed associations between 15 contaminants measured in serum and 22 biomarkers of male reproductive function using sparse partial least squares regression for variable selection.

We identified 8 perturbed biomarker outcomes, including a decrease in serum testosterone levels with increasing diisononyl phthalate levels, and confirmed a previously reported decrease in progressive sperm motility with increasing polychlorinated biphenyl153 levels.
Introduction
There has been considerable scientific and public concern, particularly over the past two decades, that exposure to environmental contaminants may impede development in utero, perturb hormonal homoeostasis and regulation, and contribute to subfertility in humans. This concern stems in part from evidence of geographic variability in semen quality,1 secular trends of decreasing testosterone2 and reports, although inconsistent and contested, of secular declines in sperm concentrations and semen quality.3 ,4
Despite the surge in studies on environmental contaminants and male reproductive health, interpretation of the evidence base is hampered by the piecemeal approach to many investigations. In observational epidemiology, often one class of compounds is correlated with a subset of outcomes, disregarding correlations among background lowlevel exposures. We chose to simultaneously evaluate a large set of environmental contaminant exposures and biomarkers of male reproductive function in a pregnancybased study of men from Greenland, Poland and Ukraine.5 ,6 The set of contaminants comprises high priority7 legacy and emerging contaminants of concern, suspected to interfere with the endocrine system: highmolecularweight phthalates, plasticisers used in a wide variety of consumer products, including polyvinyl chloride; perfluoroalkyl acids (PFAAs), surfactants used in many applications, such as water and stainrepellent coatings; three nonessential metals; one polychlorinated biphenyl and two organochlorine pesticides. These leach from products or are deposited directly into the environment, leading to nearubiquitous human exposure,8 ,9 and all, except phthalates, are persistent and bioaccumulate.
We sought an approach to identify exposure–outcome associations between the 19 exposures, representing four classes of xenobiotic compounds, and 22 biomarker outcomes, reflecting various aspects of male reproductive function. Given the relatively weak prior information for the many possible associations, we decided to take a more agnostic, datadriven approach. That the exposures are highly correlated represented a methodological challenge. For this reason, we used sparse partial least squares (sPLS) regression modelling, which relates linear combinations of a subset of the most predictive exposures to an outcome via linear regression.10 Unlike conventional ordinary least squares (OLS) regression, this dimension reduction approach does not suffer from instability in estimates and failure to converge due to multicollinearity, and enabled a joint assessment of the full set of measured biomarkers of lowlevel environmental contaminant exposures and reproductive function.
Materials and methods
Study populations
Study design and data collection procedures have been described previously.6 Briefly, pregnant women and their male partners were recruited during routine antenatal care visits from 2002 through 2004 inclusive at a large central hospital in Warsaw, Poland, at three hospitals and eight antenatal clinics in Kharkiv, Ukraine, and at local hospitals in 19 municipalities and settlements across Greenland. Of the 1710 couples enrolled (45% participation rate), male partners were consecutively invited to participate in a semen study until approximately 200 men at each of the three locations had agreed (38% participation rate). Participants provided a semen sample, a venous blood sample and information on lifestyle factors (further details provided in the online supplementary methods).
Exposure and outcome assessment
Phthalate metabolites and PFAAs were analysed in serum samples by liquid chromatographytandem mass spectrometry9 (expanded in the online supplementary methods). Phthalates included secondary oxidised metabolites of di(2ethylhexyl) phthalate (DEHP) [mono(2ethyl5hydroxyhexyl) phthalate (MEHHP, alternatively 5OHMEHP), mono(2ethyl5oxohexyl) phthalate (MEOHP, alternatively 5oxoMEHP) and mono(2ethyl5carboxypentyl) phthalate (MECPP, alternatively 5cxMEPP)], and of diisononyl phthalate (DiNP) [mono(4methyl7hydroxyoctyl) phthalate (MHiNP, alternatively 7OHMMeOP), mono(4methyl7oxooctyl) phthalate (MOiNP, alternatively 7oxoMMeOP) and mono(4methyl7carboxyheptyl) phthalate (MOiCP, alternatively 7cxMMeHP)]. Analysed PFAAs [a subset of per and polyfluoroalkyl substances (PFASs), also referred to as perfluorinated compounds (PFCs)] included perfluorooctane sulfonic acid (PFOS), perfluorooctanoic acid (PFOA), perfluorhexane sulfonic acid (PFHxS), perfluorononanoic acid (PFNA), perfluorodecanoic acid (PFDA), perfluoroundecanoic acid (PFUnDA) and perfluorododecanoic acid (PFDoDA). Metals were measured in whole blood samples; cadmium (Cd) and lead (Pb) by inductively coupled plasmamass spectrometry (Thermo X7, Thermo Elemental, Winsford, UK) and mercury (Hg) by cold vapour atom fluorescence spectrophotometry. 2,2′,4,4′,5,5′hexachlorobiphenyl (PCB153) and the dichlorodiphenyltrichloroethane metabolite, 1,1dichloro2,2bis(pchlorophenyl)ethylene (p,p′DDE), were analysed by gas chromatographymass spectrometry,11 as was hexachlorobenzene (HCB) according to a modified method by Otero et al.12
Twentytwo markers of reproductive function (listed in table 1) were assessed following standardised protocols13–17: reproductive hormones, conventional semen characteristics, markers of sperm chromatin integrity and apoptosis, markers of epididymal and accessory sex gland function, and the proportion of Y:X chromosomebearing sperm. Analytical details are provided in the online supplementary methods and table S1.
Statistical analysis
We used sPLS regression modelling to assess associations between the exposure profiles and each outcome. In PLS regression, an Xmatrix of exposures and Y outcome vector are simultaneously decomposed into latent variables and regressed in a way that maximises the covariance between X and Y.18 ,19 These latent variables, called components, are orthogonal, linear combinations of the full set of input variables. Chun and Keleş10 proposed sPLS, embedding variable selection into the PLS algorithm by eliminating ‘uninformative’ Xvariables via penalisation. A penalty term (L_{1}) is applied during dimension reduction, and regression coefficients are obtained via PLS for the reduced set of Xvariables (see online supplementary methods). Model complexity is a function of the number of components used to construct the model (K) and the degree of sparsity (η). To determine the optimal model (avoid overfitting), we performed Monte Carlo crossvalidation,20 and added a null (intercept only) model in the crossvalidation loop to assess significance. We tested models with K values 1–5 and η values between 0.01 and 0.99 (more sparse) in steps of 0.01, using a fivefold crossvalidation, repeated (with different random partitions of the data) 500 times. We selected the most parsimonious model within one SE of the overall minimum mean squared error of prediction (MSEP).19
We took several data pretreatment and exploration steps before undertaking sPLS analyses. We imputed exposure data below the limit of detection (LOD) (0–18%) and values missingatrandom in the matrix of exposures (2–16%). Values were imputed from a lognormal probability distribution via single conditional imputation, dependent on the population, and detected values for the other exposures. In the primary analyses, HCB, PCB153 and p,p′DDE were lipid adjusted (see online supplementary methods), and four contaminants for which <70%21 of the samples had concentrations above the LOD were excluded: MEOHP, MOiNP, PFUnDA and PFDoDA. To approximate normality, we used natural logarithm (ln)transformed values for all exposure variables and for 14 of the 22 outcome variables.
To address potential confounding, we employed a twostage regression approach22: first, each outcome and each exposure were separately regressed on potential confounders, and second, sPLS regression models were fit inputting the residuals. We a priori selected the set of potential confounders,9 including study population and serum cotinine level, and variably age, body mass index (BMI), abstinence period and time of blood sampling, depending on the outcome (as specified in table 1). We imputed missing covariate data (0.8–16%; as elaborated in the online supplementary methods). To assess the potential for overadjustment, in a secondary analysis we evaluated sPLS models with exposure and outcome variables only prestandardised (centred) for study population. This was considered the minimal adjustment, necessary due to the large differences in central tendencies between the populations for the exposures (figure 1), and to a lesser degree but still conspicuous, for the outcomes. Mean centring also removed baseline offsets, while scaling by logtransformation reduced the variance differences between Xvariables, and thus the relative importance during dimension reduction.23
As an additional screening approach, we tested 330 singlepollutant, covariateadjusted OLS linear regression models for associations between the 15 exposures (with >70% detection frequency) and 22 outcomes. To adjust for multiple comparisons, we computed the false discovery rate (FDR),24 and set the significance threshold at an FDR <10%. In sensitivity analyses, we tested the four contaminant exposures with <70% detection frequency dichotomised as detect versus nondetect variables, and tested models excluding participants with imputed confounder and (missingatrandom) exposure data. We also assessed models with lipidunadjusted organochlorines, including total lipids as a covariate. Finally, we tested for heterogeneity in effect estimates between study populations with an interaction term, and by assessing plots of the populationstratified regressions.
sPLS and OLS regression coefficients are presented per lnunit increase in exposure, and for interpretability, converted into the per cent change in outcome per IQR increase in exposure; for lntransformed outcomes, the proportional change, and for untransformed outcomes, the absolute change relative to the mean outcome level. Statistical analyses were performed using R V.3.0.1 (R Foundation for Statistical Computing, Vienna, Austria), and specifically the spls package for sPLS modelling.25
Results
Descriptive characteristics of the study populations are shown in table 1. Participants from Kharkiv were slightly younger than their counterparts from Greenland and Warsaw, and fewer participants from Warsaw smoked. In exploring the structure of the exposure data, we observed large variations within and between the three study populations (figure 1). Levels differed across study populations for all contaminant exposures (analysis of variance p value <0.05, online supplementary table S2), except for MOiCP. Levels were higher in samples from Greenland compared to Warsaw and Kharkiv for many exposures, especially for Hg and PCB153. Correlations within the Xmatrix ranged from r_{p}=−0.48 to 0.91 (figure 2). We also observed distributional differences for many outcomes (table 1), although distributions for the three populations were generally more similar for the outcomes than for the exposures.
Of the 22 sPLS regression models tested in the primary analysis (n=286–600 complete cases), the Xmatrix significantly improved the prediction error beyond the null model for 8 outcomes: luteinising hormone (LH), inhibin B, total testosterone, free testosterone, semen volume, progressive sperm motility, terminal deoxynucleotidyl transferase dUTP nick endlabelling assay (TUNEL) DNA fragmentation index (DFI), and neutral αglucosidase (NAG). Within each of these significant sPLS regression models, 1 to 3 exposures within the Xmatrix contributed to a total of 10 significant exposure–outcome associations (table 2). The optimal models were all K=1 component models and quite sparse (η between 0.64 and 0.99).
In the secondary analysis with exposure and outcome variables only prestandardised for study population but not for other potential confounders, sex hormonebinding globulin (SHBG) was additionally selected (the inclusion of the Xmatrix improved the MSEP beyond the null model), and the LH model was no longer selected (see online supplementary table S3). Extra exposures were also selected, for a total of 24 associations.
In the covariateadjusted OLS regression analyses, 8 overlapping exposure–outcome associations, comprising the same 8 outcomes but fewer exposures (8 of 10) compared to the primary sPLS analyses, were identified as significant (FDR <10%). Two associations, between MHiNP and total testosterone and TUNEL DFI, were significant at an FDR <5% significance threshold (see online supplementary figure S2). An additional association between MEHHP and TUNEL DFI, not selected in the sPLS model, was significant in OLS analyses (see online supplementary table S3). Consistent with the sPLS modelling results, a greater number of associations was significant in the secondary analysis with OLS models only adjusted for study population compared to the primary analysis with further adjusted models (15 compared to 9; 14 of which overlapped with those selected in sPLS secondary analysis only prestandardised for study population). No additional associations were detected in the sensitivity analysis testing contaminants with <70% detection frequency inputted as binary detect versus nondetect variables. In a reanalysis using only the complete (nonimputed) exposure and covariate data, 2 associations—p,p′DDE and LH, and MHiNP and total testosterone—were selected in both the sPLS and OLS modelling (with n=321–480 and 328–586 complete cases, respectively, except for BclxL positivity models which had 192–252 complete cases). This analysis had reduced power, and while CIs were wider, coefficients for the complete case analysis were remarkably similar to the primary analysis with imputed data (data not shown). Coefficients for lipid unadjusted organochlorines, adjusted for total lipids, were only slightly different in magnitude (see footnotes of table 2).
For 5 of the 10 sPLSselected associations, there was a significant interaction between exposure and study population in the OLS sensitivity analysis (table 2). It was apparent that 8 associations were consistent in slope direction across study populations in regression models stratified by population (see online supplementary figure S3). Poland exhibited a slope in the opposite direction for LH and p,p′DDE, and for the oxidative phthalates and TUNEL DFI. Associations with consistent slope directions, and their β_{sPLS} corresponding overall per cent change per IQR increase in exposure, included a 11% increase in inhibin B with Hg; a 6% decline in total testosterone with MHiNP; a 3% and 2% decline in free testosterone with DiNP metabolites, and a 5% increase with Cd; a 10% decline in semen volume with MEHHP; a 11% decline in the fraction of progressively motile sperm with PCB153; and a 16% decline in NAG with MEHHP.
Discussion
Using a multipollutant modelling approach, we identified 10 associations between biomarkers of environmental contaminant exposure and biomarkers of male reproductive function from several hundred associations tested in (sub)fertile men from Greenland, Warsaw and Kharkiv. This is the first time several of the relationships have been assessed in an epidemiological study, or simultaneously.
One of the most statistically robust and consistent associations was between DiNP metabolites and decreasing free (bioactive) testosterone and total testosterone. DEHP metabolites did not exceed the significance threshold in confounderadjusted analyses but exhibited inverse relationships. DiNP exposure is outpacing DEHP exposure in Europe, where DiNP use is less restricted than that of DEHP. These inverse associations are consistent with an antiandrogenic effect, which has more often been observed compared to proandrogenic effects in experimental studies.26 Phthalates are generally considered to act by inhibiting Leydig cell synthesis of testosterone. While most experimental studies have focused on in utero exposure, DEHP and its primary metabolite MEHP were recently shown to suppress steroidogenesis of adult human testis explants in vitro, without affecting production of the hormone insulinlike factor 3 (INSL3).27 Several epidemiological studies have found inverse associations between (urinary) MEHP and testosterone in men; in 74 occupationally exposed workers and 63 controls in China,28 and in a pooled analysis of 425 men of infertile couples and 425 men recruited through a US infertility clinic.29 No association was found in 234 young Swedish men sampled during routine medical conscript examinations.30 Joensen et al31 considered another exposure metric, the proportion of the summed urinary metabolites excreted as the primary metabolite, as a marker of DEHP and DiNP metabolism; %MEHP and %MiNP were associated (significant at p<0.05 or nearsignificant) with total testosterone and free androgen index in 881 healthy young men from Denmark around 19 years of age. However, no significant associations were reported for the sum of primary and secondary metabolites. Primary monoester phthalate metabolites were not measured in the current study—only secondary oxidative metabolites—due to the lipase activity of serum and risk of contamination from the diesters from the sampling devices.
We found a clear dosedependent decrease in serum testosterone levels with increasing levels of MHiNP. We recently discovered, however, that the measurement of MHiNP was possibly confounded by an unknown coeluting agent. Nevertheless, in an additional analysis using the molar sum of the other two DiNP metabolites versus all three oxidative metabolites tested in relation to total testosterone, the magnitude of the regression coefficients was similar: β_{OLS} −0.713 (p value=0.0037) for ΣDiNP_{MOiNP+MOiCP} versus −0.929 (0.0007) for ΣDiNPom. Moreover, that other phthalate metabolites were significantly inversely associated with testosterone lends weight to the evidence that one or more phthalates is indeed related to decreasing testosterone levels.
We observed a positive association between cadmium and free testosterone in the sPLS analyses, although this association was not significant in the OLS analyses adjusted for potential confounders, notably cotinine. Disentangling the effect of smoking and cadmium is difficult as smoking is an important source of cadmium exposure. A cadmium–testosterone association has previously been reported in 219 men recruited from two US infertility clinics,32 and in a representative sample of 1262 US men,33 although this latter association did not sustain statistical significance after adjustment for smoking. An experimental study of chronic lowdose oral exposure in rats reported cadmiumrelated elevated testosterone34; however, the underlying mechanism remains unknown.
We observed decreasing NAG levels with increasing MEHHP. Sperm undergo maturation in the epididymis, and NAG is a marker of epididymal function. A smaller study (n=234) of urinary phthalates and NAG in Swedish conscripts was null.30 In a study of 40 rats administered dinbutyl phthalate, epididymal weight and NAG activity decreased dose dependently, in line with our finding.35
We found decreasing sperm DNA damage with increasing levels of the secondary hydroxylated metabolite of DiNP, counter to the hypothesis of a deleterious effect of exposure. However, the exposure–outcome relationship was inconsistent in direction, exhibiting a positive slope in the Polish study population. We detected associations for DNA fragmentation assessed with TUNEL, but not the sperm chromatin structure assay (SCSA). TUNEL represents DNA strand breaks measured directly in the ejaculated spermatozoa, whereas SCSA is a measure of susceptibility to acid denaturation.36 Sperm DNA damage might be mediated by testicular oxidative stress. In rats, DEHP decreased antioxidant enzymes and concomitantly increased reactive oxygen species.37 In 379 US men seeking infertility treatment, urinary monoethyl phthalate and MEHP were positively associated with neutral Comet assay parameters.38 MEHP was more strongly associated when adjusted for oxidative metabolites (MEHHP and MEOHP), and similar to our finding for MEHHP, the metabolites were inversely associated with DNA damage. Hauser et al38 suggested that an increased ratio of oxidative metabolites to MEHP could represent interindividual differences in rates of DEHP metabolism to ‘less toxic’ metabolites, or it may reflect inhibition of phase 1 enzymes involved in xenobiotic metabolism.
Some of the exposure–outcome associations tested in the current analysis have previously been tested in the same or overlapping study populations in analyses considering a singlepollutant or a single class of chemicals (refer to refs. 39–41; previous investigations of PCB153 and p,p′DDE were reviewed by Bonde et al5). In addition to identifying several novel associations, we confirmed, using a more conservative significance threshold and joint modelling approach, several previously reported associations, including an unfavourable inverse association between PCB153 and the proportion of progressively motile sperm.5 ,17 Other smaller studies have reported comparable findings, for instance, for PCB153 in 305 Swedish men,42 and for PCB138 but not PCB153 and the odds of <50% progressive motility in 212 US men recruited at an infertility clinic (uncorrected for multiple comparisons).43
We also confirm a previously reported strong positive association between mercury and inhibin B.40 As inhibin B, predominantly excreted by Sertoli cells, is correlated with sperm concentration and counts and serves as a marker of spermatogenesis,44 this positive association is contrary to a deleterious effect of exposure. The authors40 suggest that this association may be driven by a higher consumption of seafood—particularly relevant for Greenlandic Inuit—and the resultant concomitant exposure to mercury and ω3 polyunsaturated fatty acids. There are indications that the latter may beneficially impact certain semen quality parameters.45 This mercury–inhibin B association was not observed in 219 US men recruited from infertility clinics.32
We observed an association between MEHHP and a decline in semen volume, as previously reported for this study population.39 A null finding for MEHP was reported for a study of 234 young Swedish men,30 and a statistically significant increase in semen volume for the highest versus lowest %MiNP quartile in 881 Danish men was interpreted by the authors as a chance finding.31
We also observed a positive, yet inconsistentacrosspopulations, association between p,p′DDE and increasing LH, as previously reported.17 As LH regulates testosterone production by Leydig cells, this is in line with an antiandrogenic effect of p,p′DDE. A null finding was reported for 341 men from a US infertility clinic with lower p,p′DDE levels (geometric mean 236 ng/g lipids).46
We found indications of associations between phthalates, metals and organochlorines and SHBG, which did not sustain significance on adjustment for confounders, specifically BMI. These nonsignificant perturbations in SHBG, together with the other associations with hormones, might be indicative of effects of exposure profiles on hypothalamo–pituitary–gonadal axis regulation. Similar to the potential (over)adjustment for cotinine, adjustment for BMI may introduce overadjustment bias rather than (only) prevent confounding bias, given that some exposures may causally influence BMI.
A limitation of this assessment is the crosssectional nature of this study, although, besides phthalates, all measured contaminants are relatively stable over time and as such were most likely reasonable proxies of recent past exposure as related to the measured outcomes. However, while blood and semen samples were collected on the same day for nearly all of the participants from Poland and Ukraine, for around 60% of the participants from Greenland, samples were collected more than 3 days apart (elaborated in the online supplementary methods), contributing to misclassification of exposure, especially for the phthalates. Analyses excluding these participants yielded generally consistent results.
Recent advances in molecular epidemiology (highthroughput screening techniques) and the introduction of the exposome concept47 imply that epidemiological researchers will be confronted more and more with the challenging task of analysing data with high numbers of measured analytes with often complex correlation structures. Singlepollutant modelling may result in a high potential for false positives (type I errors), although multiple testing corrections and the recently proposed environmentwide association study48 design addresses this. Using multipollutant OLS regression modelling may lead to inflated variances and flipped signs of parameter estimates, thus making it difficult to disentangle the independent effects of these variables. Methods which project data onto a lower dimensional space, like PLS regression, can better cope with collinear variables and, as a corollary, reduce the number of models tested.10 ,18 While PLSbased methods are well established in chemometrics and bioinformatics, they are only recently gaining traction in genomics, metabolomics and epidemiology.49
In the current analysis, the associations selected using sPLS rather closely matched those we would have selected using singlepollutant OLS models. However, if we had screened for associations using multipollutant OLS models, accounting for correlations between exposures, these estimates would have suffered from multicollinearity and been unreliable (variance inflation factors exceeded 3 for 8 to 9 of the 15 exposure terms across multipollutant OLS models).
As sPLS captures the complementary contribution of Xvariables, it has the power to detect multiple pollutants which might not be detected in singlepollutant modelling. This can be explained by additivity, and/or consistency at large, in that noise from measurement error and biological variation can be averaged out when a larger number of Xvariables inform the latent variable. This was evident for the free testosterone model, and for several models in the sensitivity analysis with inputs only prestandardised for study population, in which up to 8 exposures were selected. Further, in constructing latent variables, sPLS regression delineates groupings in the Xmatrix of exposures, which is a step forward in tackling the longstanding challenge of assessing mixture effects of contaminants.
Some aspects of (s)PLS modelling implementation are underdeveloped, such as quantifying uncertainty (stability of the selection), and explicitly correcting for multiple testing, beyond the implicit sparseness step. We estimated an empirical null distribution using a Monte Carlo approach to establish a significance threshold. A caveat of sPLS is that penalisation methods optimise prediction and will tend to select one of two highly correlated exposures with nearly equal coefficient sizes.19 Competing multipollutant modelling approaches exist22 (eg, treebased models, elastic net penalised regression); however, simulation and validation assessments, specifically for data structures relevant for environmental epidemiology (often low dimensional), are lacking.
There is no straightforward way to adjust for potential confounders in (s)PLS analyses. Adjusting for confounders in sPLS models by inputting residuals in a twostep approach should, as with OLS modelling,50 yield unbiased regression coefficients, although with a slight loss of efficiency under certain (sample size, correlation) scenarios. We observed large contrasts in exposures across the three study populations, and some differences in outcomes such as TUNEL DFI. To reduce the risk of ecological bias due to potential betweenpopulation confounding, in all analyses we prestandardised (mean centred) the exposure and outcome data by population (sPLS models) or adjusted by population (OLS models). This may have artificially reduced true contrast in exposure and outcome data between study populations, and negatively affected our power to detect some associations.
While (s)PLS modelling has some power to detect nonlinear relationships (weighted over components), the PLS and OLS regression coefficients reflect linear relationships. In an exploratory analysis with generalised additive modelling, this assumption of linearity was met for most exposure–outcome relationships (<2% deviated from linearity, with an estimated degree of freedom >1.0). Further research is needed to compare the efficacy of sPLS regression with other multipollutant modelling and variable selection approaches, including nonparametric techniques.
Conclusions
We used a systematic, multipollutant modelling approach to perform a global assessment of biomarkers of contaminant exposure and male reproduction function in one of the largest studies to date. We identified several environmentally perturbed outcomes, including a robust inverse association between DiNP phthalates and circulating testosterone, and between PCB153 and sperm motility.
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
 Data supplement 1  Online supplement
Footnotes

Contributors VL, LP, DH and RV conceived and designed the current data analyses. JPB, GT, AG, LR, BAGJ, MS, HSP, JKL and LC designed and collected data for the cohort study. BAGJ and CHL performed chemical analyses. VL analysed the data and drafted the manuscript, with supervision from LP and RV. All authors, particularly LP, RV, DH, LAMS, AHP, JPB, GT, AG and MS, participated in the interpretation of results and revision of the manuscript. All authors approved the final version of the manuscript.

Funding This work was supported by the European Commission's 7th and 5th Framework Programmes (grants FP7ENV20081226217 and QLK4CT200100202).

Competing interests None.

Patient consent Obtained.

Ethics approval Polish Bioethical Committee, Ethical Committee for Human Research in Greenland, Commission on Ethics and Bioethics of Kharkiv National Medical University.

Provenance and peer review Not commissioned; externally peer reviewed.