Abstract
Objectives: This study seeks to assess the impact of measurement errors in cumulative exposure on estimates of a gene-environment interaction in a nested case-control study in occupational epidemiology. In the approach considered here, exposure intensity is assessed at the group level and the exposure duration individually (both with error). Genetic susceptibility is assumed to be known exactly. Differences in “gene” are assumed to affect disease risk only in exposed subjects.
Methods: Three data analysis strategies were considered: one using a correctly specified disease model (exposure and exposure-gene interaction), and two using mis-specified disease models, one with “gene” as the only risk factor (“gene-only” model) and the other with main effects of both gene and exposure along with their interaction (“full” model).
Results: In simulations, estimates of the gene-environment interaction based on the correctly specified disease model were greatly attenuated and power was diminished appreciably even when errors in exposure were modest. Significant associations were detected more frequently in the gene-only model when errors in exposure were large. When the “full” mis-specified model was fitted to the simulated data, it yielded erratic estimates. This is illustrated in an analysis of the interaction of cumulative exposure to organophosphate pesticides and paraoxonase gene on the risk of chronic neuropsychological effects among farmers who dip sheep.
Conclusion: If “gene” contributes to disease risk only in the presence of exposure, the existence of the gene-environment interaction can be efficiently inferred from a deliberately mis-specified “gene-only” disease model in nested case-control studies.
Considerable attention has been devoted to issues of power and sample size in studies of gene-environment interaction,1–5 and to study designs that maximise the efficiency of such investigations.6–8 The impact of exposure and gene misclassification on the power of case-control studies to detect gene-environment interactions was reviewed by Rothman et al.9 In this paper, we focus on the problem of detecting gene-environment interactions in occupational epidemiology when exposure has complex measurement error structures and where there is strong a priori knowledge that genetic variation can be associated with disease risk only in the presence of exposure. We consider a case-control study nested in an occupational cohort with information on cumulative exposure: intensity of exposure is assessed at the group level10–12 and duration of exposure is assessed at the individual level, both with error. Although duration of employment is typically known precisely from personnel records, duration of exposure does not necessarily correlate with it and must be inferred via error-prone methods.13 14 We assume that genotyping is free of errors, while recognising that in some circumstances epidemiological results can be biased by such errors.15 In the studies under consideration, all subjects are exposed to some extent because they belong to a cohort defined by hazardous occupation. We wish to know to what extent, under these circumstances, the estimate of gene-environment interaction will be biased and how power for detecting the interaction is affected. Next, we investigate whether the knowledge that gene per se is not a risk factor for the disease can be used to improve the chance of detecting gene-environment interaction by fitting a deliberately mis-specified disease risk model to the case-control data.
In doing so, we exploit Mendelian randomisation of genetic traits,16 which implies that exposure is independent of genetic status so that errors and biases inherent in exposure assessment will not affect the observed effect of genetic status on health. The fundamental assumptions of our approach are that gene per se does not cause the disease of interest, all subjects are exposed to some extent and that genetic status is independent of both exposure and potential confounders/effect modifiers.
The key question we will try to address is whether fitting a correct disease model or the deliberately mis-specified model that exploits a Mendelian randomisation approach gives us the better chance of detecting biologically meaningful gene-environment interactions. For the moment, we do not consider how estimates of gene-environment interaction can be adjusted for measurement error in cumulative exposure. However, this is an obvious alternative analytical approach to be explored if measurement error structure is known (or estimated with sufficient confidence).
SHEEP DIPPERS’ STUDY: MOTIVATING PROBLEM
Epidemiological methods and exposure assessment
Study design and methods of data collection were described previously, and only the key features will be highlighted here. In the case-referent study of chronic organophosphate poisoning, 175 farmers who dipped sheep and reported that they suffered from chronic ill health were asked to nominate, as referents, a non-blood relative with a similar dipping pattern but apparently in good health (233 referents; one fewer than in the original report, because this person was reported to have been exposed to pesticides only before 1970, the start of the observation period for this study).17 A subsequent discriminant analysis showed that cases and referents differed in neuropsychological symptoms such as difficulty concentrating and standing up, smell of chemicals and ringing sounds.18 The referents were also sheep dippers and had a duration of work with pesticides similar to that of cases, so the design can be considered as nested within the (unidentified) total cohort of people starting work as sheep dippers and surviving to 2000. For every year (t) between 1970 and 2000, subjects were asked to report retrospectively, among other things, which pesticide they used in the sheep dip, how many days they dipped sheep (DIPt) and the number of times they mixed/handled sheep-dip concentrate on an average day (MIXt). All subjects were genotyped for specific polymorphisms in the paraoxonase gene. Cumulative exposure for each organophosphate pesticide group was estimated individually according to the procedure previously developed and validated by exposure measurements19 (further details are in Appendix 1). Environmental and working conditions during the measurements used to derive the exposure intensity models may have differed from those experienced by cases and referents, creating an additional source of error in estimated cumulative exposure.
Motivation for analytical strategy
If it were true (but see below) that diazinon causes symptoms among sheep dippers through the accumulation of chronic irreversible damage to the nervous system, then it is plausible that cumulative exposure adequately models the biologically effective dose20 in the absence of genetic variation in susceptibility. It is reasonable to assume that this effect can occur in “exposed” subjects even in the absence of genetic susceptibility and can be enhanced by some variants of the paraoxonase gene, as was recently demonstrated.17 There is no reason to suspect that variations in the paraoxonase gene per se lead to the studied health conditions in the absence of exposure to organophosphates, although the same paraoxonase gene polymorphisms have been associated with coronary heart disease21 and Parkinson’s disease.22 Consequently, the true disease model (equation (3), Appendix 2) appropriate for the population and exposure under investigation was assumed to be a logistic model with cumulative exposure and the gene-cumulative exposure interaction as the only risk factors. It was fitted to the case-control data with different cumulative exposure metrics reflecting the type of sheep dip used (assuming that cumulative exposure to diazinon will be the most specific, though still assessed with error), as were the mis-specified disease models (see expressions (4) and (5), Appendix 2). For illustrative purposes, we focused on only one genetic variation that was shown to be significant in previous analyses of these data: at least one paraoxonase R allele at position 192 (a glutamine-to-arginine amino acid substitution; the R allele accounts for about one-quarter of alleles and is carried by about half of the subjects).
Results
Neither the effects of cumulative exposure metrics, nor the estimates of gene-environment interactions were statistically significant in the analysis of risk of organophosphate poisoning among farmers who dip sheep using the disease model that is presumed to be true (table 1). The estimates of gene-environment interaction were in the expected direction (and approached statistical significance for both diazinon and all organophosphates combined), but the estimates of the effect of cumulative exposure, though of borderline statistical significance, were not in the expected direction except for diazinon. It must be noted that the estimates of the cumulative exposure effect were intentionally minimised by the selection of referents with comparable duration and conditions of employment (and consequently exposure). Overall, unambiguous interpretation of the disease model that is presumed (for illustrative purposes) to be correct is not possible in this case: consistent with the results, it is not even certain that the cumulative exposure model is appropriate for this type of health effect. Further, as in many occupational studies, cases stopped work when their health deteriorated, leading to shorter duration of exposure than that among healthy referents.
When the “full” model (which includes the main effects of gene and exposure plus their interaction; see Appendix 2 for details) was fitted to the data, the results were likewise difficult to interpret (details not shown). For example, when the exposure metric reflected exposure to all organophosphates, the effect of cumulative exposure was (on the log-odds scale) −0.0002 (p = 0.2), the effect of “gene” was 0.7 (p = 0.001) and the gene-exposure interaction term was estimated to be 0.0002 (p = 0.2). When a more specific exposure metric that assesses exposure to diazinon is used in the “full” model, the results are similar: estimates of −0.0002 (p = 0.2), 0.8 (p = 0.004) and 0.0002 (p = 0.3) for the effects of cumulative exposure, “gene” and the interaction, respectively. This suggests a slight protective effect of exposure and increased risk in the presence of the susceptibility gene, but only the effect of the gene is statistically significant. Consequently, a naïve analyst not familiar with Mendelian randomisation would tend to reduce this model by eliminating the non-significant terms, yielding a gene-only model. In the Mendelian randomisation approach, on the other hand, the estimate of the effect of the measure of genetic susceptibility alone was positive (in the expected direction) and statistically significant: odds ratio (OR) = 2.27; 95% CI 1.52 to 3.39. We now turn to simulation studies that aid in the interpretation of these findings.
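As a worked check on the estimates quoted above, the gene-only OR and its 95% CI can be placed on the log-odds scale used for the “full” model coefficients. The sketch below assumes the reported CI is a Wald interval, symmetric on the log scale; the function name is illustrative and not part of the original analysis.

```python
import math


def log_or_and_se(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover the log odds ratio and its standard error from a reported OR
    and a 95% CI that is assumed to be symmetric on the log scale."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return log_or, se


# Gene-only estimate reported in the text: OR = 2.27; 95% CI 1.52 to 3.39
log_or, se = log_or_and_se(2.27, 1.52, 3.39)
z = log_or / se
# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"log OR = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p_value:.1e}")
```

On this scale the gene-only estimate, ln(2.27) ≈ 0.82, sits close to the 0.7 and 0.8 gene coefficients reported for the “full” model.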
SIMULATION STUDY
Methods
Detailed descriptions of the statistical models of exposure and disease used in the simulations are provided in Appendix 2; all expressions referenced below by number are in Appendix 2.
Source population: cohort
We conducted simulation studies with plausible values for the inputs into the logistic model, with particular emphasis on the bias in the estimate of the gene-environment interaction and on power (the probability that the estimated interaction, β̂G×E, will be statistically significantly different from zero at a type I error probability of 0.05 or less).
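Power in this sense is estimated empirically as the proportion of simulated studies in which the interaction estimate is significant. A minimal sketch follows, assuming a two-sided Wald z-test on each fitted coefficient (the text does not state which test was used); the function names are hypothetical. Because each configuration was replicated 1000 times, a power estimate near 55% carries a binomial Monte Carlo standard error of about 1.6 percentage points.

```python
import math


def wald_p_value(estimate, se):
    """Two-sided p-value of a Wald z-test of H0: coefficient = 0."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


def empirical_power(fits, alpha=0.05):
    """Share of simulated studies whose interaction estimate differs
    significantly from zero at level alpha.

    fits: iterable of (estimate, standard_error) pairs, one per replicate.
    """
    fits = list(fits)
    hits = sum(1 for est, se in fits if wald_p_value(est, se) <= alpha)
    return hits / len(fits)


def monte_carlo_se(power, n_replicates):
    """Binomial standard error of an empirical power estimate."""
    return math.sqrt(power * (1 - power) / n_replicates)
```

For example, `monte_carlo_se(0.55, 1000)` is about 0.016.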
In each simulation, we considered a population with five exposure groups, each containing 10 000 subjects. True exposure intensity for each group was assumed to follow a lognormal distribution with logarithmic means of 0.1, 1, 2, 3 and 4. True exposure intensity for each subject (μg+γgi, on the log scale) was generated randomly around the group means, allowing the between-worker standard deviation (σB) to take on values of 0, 0.5, 1.5 or 2. Small (n = 20) and large (n = 100) samples of workers were drawn from each group. For these sampled workers, two repeated subject-specific exposure intensity measurements (Igij) were generated according to expression (1), allowing the within-worker standard deviation (σW) to take on values of either 0.5 or 2. These within-subject and between-subject variances were selected to represent typical values of these variance parameters in occupational settings for exposures to chemicals.23 24 We assumed that true exposure duration is lognormal with a geometric mean of 10 ( = exp(μτ)) and a geometric standard deviation of 1.5 ( = exp(στ)). Different errors in exposure duration were considered in accordance with expression (2), allowing the error variance of duration (σ²D) to take on values of 0, 0.5 or 1.5 in generating observed durations of exposure. True cumulative exposure was assigned to each subject as the product of exposure intensity and duration.
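The exposure part of this set-up can be sketched as follows. This is a Python illustration under stated assumptions, not the authors’ SAS macro: σB, σW and σD are treated as standard deviations of normal errors on the log scale, and the dictionary field names are hypothetical.

```python
import math
import random

# Log-scale group means mu_g for the five exposure groups (values from the text)
GROUP_LOG_MEANS = [0.1, 1.0, 2.0, 3.0, 4.0]
MU_TAU = math.log(10.0)    # true duration: geometric mean 10 = exp(mu_tau)
SIGMA_TAU = math.log(1.5)  # true duration: geometric SD 1.5 = exp(sigma_tau)


def simulate_cohort(sigma_b, sigma_d, group_size=10_000, rng=None):
    """Generate true log intensity, true cumulative exposure and observed
    duration for every worker (one dict per worker)."""
    rng = rng or random.Random()
    cohort = []
    for g, mu_g in enumerate(GROUP_LOG_MEANS):
        for _ in range(group_size):
            mu_gi = mu_g + rng.gauss(0.0, sigma_b)    # mu_g + gamma_gi
            tau_gi = rng.gauss(MU_TAU, SIGMA_TAU)     # log of true duration
            ld_gi = tau_gi + rng.gauss(0.0, sigma_d)  # log observed duration, expression (2)
            cohort.append({
                "group": g,
                "mu_gi": mu_gi,
                "eta": math.exp(tau_gi) * math.exp(mu_gi),  # true cumulative exposure
                "observed_duration": math.exp(ld_gi),
            })
    return cohort


def measure_group(cohort, group, n_workers, sigma_w, n_repeats=2, rng=None):
    """Sample n_workers from one group and return their log-scale
    measurements LI_gij = mu_gi + eps_gij (expression (1))."""
    rng = rng or random.Random()
    members = [w for w in cohort if w["group"] == group]
    sampled = rng.sample(members, n_workers)
    return [[w["mu_gi"] + rng.gauss(0.0, sigma_w) for _ in range(n_repeats)]
            for w in sampled]
```

For example, `simulate_cohort(sigma_b=1.5, sigma_d=0.5, rng=random.Random(2024))` builds one cohort of 50 000 workers, and calling `measure_group` for each of the five groups with `n_workers=100` and `sigma_w=2.0` yields the repeated measurements from which group geometric means are later computed.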
The simulation was set up with exposure structure and variances that are typical of work among sheep dippers. In assessing exposure of farmers who dip sheep, repeated measurements from subjects within groups were not collected and, therefore, between- and within-worker variances in the study of Buchanan et al19 were unknown. However, the magnitudes of between- and within-worker variances were similar across routes of exposure, industries and agents, and were covered by the conditions in the simulation study.23 24
We assumed that the genotype that confers susceptibility was present in 25% of the subjects. All subjects in the cohort were assigned either susceptible or resistant genotype at random based on the Bernoulli distribution using RANBIN function of SAS (version 9.1, SAS Institute, Cary, North Carolina).
Disease status was then assigned to each subject according to expression (3). The risk parameter for cumulative exposure (βE) was set to 0.04, the intercept, β0, was −20 (a very rare disease in the unexposed population: P(Hgi = 1 | ηgi = 0) ≈ 2×10⁻⁹), and the true value of the gene-environment interaction was fixed at 0.04. Thus, the true model for the probability of having the disease (p) for all simulations was: p = 1/{1+exp(20−0.04ηgi−0.04[Ggi×ηgi])}. On the basis of this probability, each subject was assigned either 1 (diseased) or 0 (healthy) by sampling from the Bernoulli distribution (described above).
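Genotype and disease assignment can be sketched in the same way. In this hypothetical Python version, `random.Random.random()` draws stand in for the SAS RANBIN Bernoulli generator, and the parameter values are those given above; evaluating the model at ηgi = 0 reproduces the baseline risk of about 2×10⁻⁹.

```python
import math
import random

BETA_0 = -20.0       # intercept: very rare disease when unexposed
BETA_E = 0.04        # effect of true cumulative exposure
BETA_GXE = 0.04      # gene x exposure interaction
P_SUSCEPTIBLE = 0.25  # share of subjects with the susceptible genotype


def disease_probability(eta, gene):
    """Expression (3) with the parameter values used in the simulations."""
    logit = BETA_0 + BETA_E * eta + BETA_GXE * gene * eta
    return 1.0 / (1.0 + math.exp(-logit))


def assign_gene_and_disease(etas, rng=None):
    """Bernoulli draws for genotype, then for disease given (eta, gene).

    Returns (eta, gene, diseased) tuples."""
    rng = rng or random.Random()
    records = []
    for eta in etas:
        gene = 1 if rng.random() < P_SUSCEPTIBLE else 0
        diseased = 1 if rng.random() < disease_probability(eta, gene) else 0
        records.append((eta, gene, diseased))
    return records
```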
Nested case-control study
Random samples of 200 subjects with disease and 400 disease-free subjects were drawn from the simulated cohort. For each subject, exposure intensity was assigned based on either the geometric or the arithmetic mean exposure intensity of the group to which they belonged. The observed cumulative exposure was the product of the assigned group-mean exposure and the observed exposure duration. The status of genetic susceptibility was assumed to be measured exactly. The logistic regression model with cumulative exposure and interaction as the only risk factors (correct model, expression (3)) was fitted in each simulated case-control study, as were the model based on the Mendelian randomisation approach (mis-specified gene-only model, expression (4)) and the mis-specified “full” model (expression (5)). We also considered larger case-control studies with the same case-control ratio (400 and 800 cases). Each simulation was repeated 1000 times. All simulations and data analyses were performed using SAS.
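A sketch of the case-control sampling and the single-imputation step follows. It assumes each cohort member is a dictionary with hypothetical keys `group`, `observed_duration` and `diseased`, and that log-scale measurements are stored per group as a list of per-worker lists; the group geometric mean is taken as the exponentiated mean of all log measurements in the group. Fitting the logistic models themselves is left to standard software (SAS in the original study).

```python
import math
import random
from statistics import fmean


def group_geometric_means(log_measurements_by_group):
    """Exponentiated mean of all log-scale measurements LI_gij in each group."""
    return {
        g: math.exp(fmean(x for worker in workers for x in worker))
        for g, workers in log_measurements_by_group.items()
    }


def nested_case_control(cohort, n_cases=200, n_controls=400, rng=None):
    """Random samples of diseased and disease-free members of the cohort."""
    rng = rng or random.Random()
    cases = [s for s in cohort if s["diseased"] == 1]
    controls = [s for s in cohort if s["diseased"] == 0]
    return rng.sample(cases, n_cases) + rng.sample(controls, n_controls)


def observed_cumulative_exposure(subject, group_gm):
    """E_gi = D_gi x group geometric mean (single imputation)."""
    return subject["observed_duration"] * group_gm[subject["group"]]
```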
Results
The results of the simulated nested case-control studies with 200 cases and 400 controls (with 100 subjects per group (n) selected for exposure monitoring and geometric group means used) are summarised in tables 2 and 3. It is evident that, even with modest uncertainty in the estimate of the exposure duration (σD = 0.5), the estimate of gene-environment interaction can be severely attenuated. For example, when there is no between-worker variability, the attenuation is fourfold regardless of the magnitude of the within-worker (day-to-day) variability. A fourfold attenuation in the estimate of gene-environment interaction (down to 0.01 from the true value of 0.04) is also seen when duration of exposure is known exactly but there is small between-worker variability (σB = 0.5). The attenuation can reach orders of magnitude for large, yet realistic, values of between-worker variability and/or when uncertainty in the estimate of exposure duration increases: for example, from a true value of 0.04 to an estimate of 0.0006 when σB = σD = 1.5. Attenuation does not appear to be appreciably affected by within-worker variability or by the size of the sample drawn from each group to estimate geometric mean exposures (details for n = 20 not shown). The use of arithmetic group means to estimate exposure intensity resulted in more severe attenuation (results not shown), so only exposure situations where geometric group means were used will be considered hereafter.
The power to detect the gene-environment interaction deteriorates to below 80% once between-worker variability becomes large (σB = 1.5); power appears to be affected to a lesser degree by errors in exposure duration. Increasing the sample size used to estimate group means appears to only improve power, albeit marginally, when within-worker variability is large. For example, for σB = σD = 1.5, power increases from 55 to 58% when 100 workers instead of 20 workers are measured per group (results not shown).
All factors in the simulation appear to affect power when the correct disease model is fitted to the data. However, when we fitted the mis-specified gene-only model to the simulated case-control data (expression (4)), we observed: (a) most estimates were positive and (b) power depended only on between-worker variability (the overall results summarised in table 4). This is not surprising, because the observed effect of gene, if any, would depend on its interaction with true exposure, rather than its mis-measured estimate.
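The reasoning can be made concrete with a population-level calculation (not a case-control fit). Because genotype is independent of true exposure, the average risk for each genotype is taken over the same true exposures ηgi, so the gene log odds ratio is non-zero only when βG×E ≠ 0, and no mis-measured exposure enters the calculation. The default coefficients below are the simulation values; the exposure values in any call are illustrative, and the function names are hypothetical.

```python
import math


def expected_risk(etas, gene, beta_0=-20.0, beta_e=0.04, beta_gxe=0.04):
    """Average disease probability over true exposures for one genotype."""
    total = 0.0
    for eta in etas:
        logit = beta_0 + beta_e * eta + beta_gxe * gene * eta
        total += 1.0 / (1.0 + math.exp(-logit))
    return total / len(etas)


def gene_only_log_odds_ratio(etas, **betas):
    """Population log OR for gene when G is independent of true exposure."""
    r1 = expected_risk(etas, 1, **betas)
    r0 = expected_risk(etas, 0, **betas)
    return math.log(r1 / (1 - r1)) - math.log(r0 / (1 - r0))
```

For instance, with illustrative true exposures of 50, 200, 300 and 500, the gene log OR is positive, and it is exactly zero when `beta_gxe=0.0`.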
The probability for detecting a positive association for the gene in a mis-specified model exceeded the power for detecting a gene-environment interaction in the correctly specified model. For example, when errors in exposure duration and between-worker variability are large (σB = 2.0, σD = 1.5), the power of the correct model is between 30 and 40% (tables 2 and 3), but the probability of the effect of the gene in the mis-specified gene-only model being statistically significant is approximately 70% (table 4). It must be noted that in the simulation study the true distribution of exposure durations and contrast between true group means were held constant and therefore their impact on bias and power was not investigated. However, it can be expected that they affect estimates in both the correct and mis-specified gene-only models, just as the between-worker variance does, being the determinants of the true cumulative exposure values.
As table 5 suggests, when measurement error is present, the mis-specified gene-only model appears to have greater power for the same sample size. For example, the nominal power of 80% is exceeded in direct estimation of the gene-environment interaction with 800 cases and 1600 controls, but only 400 cases and 800 controls are needed for a “power” of 94% if the gene-only model is used to infer the presence of gene-environment interaction. This suggests that greater efficiency in study design (fewer subjects and no need to assess exposure in detail), aimed at detecting rather than quantifying the gene-environment interaction, can be achieved by both carefully selecting the hypothesis about gene-environment interaction (ie, gene influences disease risk only when exposure is present) and selecting an appropriate analytical approach (ie, the gene-only model based on the Mendelian randomisation approach).
When disease model was mis-specified according to expression (5), the observed estimates of gene-environment interaction were erratic (200 cases and 400 controls, details not shown). For example, when there was no measurement error (σB = σD = 0), the estimate of the interaction tended to be inflated: 0.07 (σW = 0.5, n = 100) and 0.06 (σW = 2, n = 100) with negligible power: 0.2 and 2%, respectively. However, a small increase in measurement error resulted in the attenuated estimate of the interaction (0.008 in both cases considered above), but with reasonable power of approximately 70%. Nonetheless, in most simulations, the mis-specified “full” model tended to produce severely attenuated estimates of the interaction with poor power.
Main message
Direct estimation of gene-environment interactions in occupational settings requires large populations and accurate exposure estimates, both of which may not be attainable. However, the existence of an interaction between a gene and occupational exposure can be efficiently detected in case-referent studies nested in occupational cohorts by examining only the gene-disease association under the assumption that the “gene” is allocated at random and associated with occupational disease only through interaction with exposure.
Policy implication
Our methodological research supports the previous conclusion that exposure to organophosphates is related to chronic neuropsychological conditions among sheep dippers.
RE-APPRAISAL OF THE ANALYSIS OF SHEEP DIPPERS’ STUDY
Our illustrative analyses of the case-control study of ill health among farmers who dip sheep yielded ambiguous results (table 1). We observed no significant gene-environment interaction with cumulative exposures to organophosphate pesticides. However, it was previously shown that genetic variation per se was associated with ill health in the presence of exposure. The simulation study helps us interpret these findings, on the assumption that cumulative exposure is relevant, in light of (a) the likely extent of measurement error in the cumulative exposure estimate; and (b) the hypothesised mechanism of modification of the effect of exposure by the susceptibility gene. According to the simulation, the absence of the effect in the true disease model can be attributed to errors in estimates of cumulative exposure that would dramatically attenuate the estimate of a strong gene-environment interaction and substantially degrade power. The mis-specified “full” model is not expected to yield stable estimates, and its behaviour appears to be erratic. At the same time, a deliberately mis-specified gene-only disease model that relies on a Mendelian randomisation approach would yield an elevated risk estimate for the gene only if there were a true gene-environment interaction in the population. This was the premise of the original study design: the estimate of the gene effect did not depend on accurate knowledge of the dose metric. Analogous results, combined with understanding of the underlying biology, led to the original conclusion in favour of causation of chronic neuropsychological conditions by exposure to diazinon, at least among susceptible individuals.17 It must be noted that if cumulative exposure is not an appropriate dose metric for a given study (as may well be the case for sheep dippers), this would further contribute to the lack of association with cumulative exposure even in the absence of measurement error.
This problem of model mis-specification would not affect the performance of the mis-specified gene-only model, which does not depend on knowledge of true dose metric.
DISCUSSION AND CONCLUSIONS
It appears that the choice of the optimal analytical strategy for studies of gene-environment interaction should include the deliberately mis-specified gene-only disease model, if our assumptions can be justified. This is especially the case when the goal is to demonstrate the existence of an interaction rather than to characterise the exact extent of risk due to exposure and its modification in susceptible individuals. Our results emphasise the importance of a clear hypothesis for an efficient design of studies of gene-environment interaction. If there is a strong a priori hypothesis that matches the assumptions in our simulation study (that gene per se does not cause the disease of interest, all subjects are exposed to some extent, and genetic status is independent of both exposure and potential confounders/effect modifiers), reasoning based on Mendelian randomisation can be used to infer gene-environment interaction in a modest sample of exposed individuals using a case-control design without even attempting to estimate occupational exposures accurately. To apply this approach successfully, it is sufficient to establish that all subjects (cases and controls) are exposed to an agent of interest to some biologically meaningful extent. Occupational cohorts, with relatively well-defined highly exposed populations, are a very promising setting for meeting the assumption of Mendelian randomisation and ensuring that both cases and controls are “exposed” (perhaps to a different, unknown, extent). If such assumptions are not sensible, the investigators will have to invest in a large study with sophisticated (and precise) exposure assessment in order to detect an existing interaction. However, if the existence of the interaction can be shown, then greater motivation exists to conduct further inquiries into the actual magnitude of the interaction, which would require a substantial investment in exposure assessment and correction of the estimate of the interaction for measurement error.
Our results do not support mis-specification of the disease model by including the effect of gene in a model that contains the main effect of exposure and its interaction with the gene: such over-parameterisation is expected, on average, to produce inconsistent results that are very sensitive to measurement error. But if gene does indeed have an effect on its own, or through interaction with another exposure that is not assessed, then the gene effect should be included in the disease model. Because it is difficult to be assured a priori that the strong assumption we put forward is satisfied, this constitutes a major weakness of the approach we propose. For example, a failure to assess non-occupational exposures might lead to a significant gene-only term in the full model, and a good fit of such a model, if levels of occupational and environmental exposure are not strongly related. However, occupational exposures will in most cases be of greater magnitude than non-occupational exposures, so the failure to assess exposure to the agent of interest from the general environment is likely to have a negligible effect. After all, occupational populations are studied because they have an unusually high exposure to a particular agent of interest.
It must be emphasised that our simulations do not cover all possible situations and are based on strong assumptions that appear to be appropriate for the motivating example — the study of sheep dippers. Therefore, an investigator planning to utilise this approach is encouraged to assess the power of alternative study designs and analytical strategies in situations that reflect specifics of their research question. Pending development of a better understanding of theoretical reasons for methodological issues illustrated by the simulations, caution is advised in generalising our experience, because a different constellation of simulation parameters (eg, a stronger gene-environment interaction, different frequency of susceptible genotype, presence of truly unexposed group in the study, different true underlying exposure-response or measurement error models) may well lead to different conclusions.
Acknowledgments
The project was supported by a research grant from the Canadian Institutes for Health Research. Drs Igor Burstyn and Yutaka Yasui are supported by salary awards from the Canadian Institutes for Health Research and the Canada Research Chair program, respectively, and both by the Alberta Heritage Foundation for Medical Research. Mr Suwen Li provided invaluable help developing an efficient SAS macro for the simulation study. In the sheep dippers’ study, Drs Nicola Cherry and Andy Povey were the principal investigators, Mr Martin Dippnall assisted with coding of exposure information and Drs Mike and Barti Mackness analysed the polymorphisms. The authors wish to thank Dr Andy Povey of Manchester University and Dr Paul Gustafson of the University of British Columbia for their valuable comments on the draft of this manuscript.
APPENDIX 1: ESTIMATING CUMULATIVE EXPOSURE
In a study of sheep dippers in Scotland, it was shown that the cumulative number of events in which concentrate was handled had a nearly perfect correlation (Pearson correlation of 0.99) with cumulative exposure estimated using a model derived from personal measurements and estimates of duration of pesticide use.19 Thus, the units of cumulative exposure estimates were “cumulative number of events in which concentrate was handled”, which is assumed to be a correlate of “organophosphate metabolite (nmol/mmol creatinine) × years”. An equivalent cumulative exposure estimate can be obtained from our data as the sum, from 1970 to 2000, of year-specific products of DIPt and MIXt (see text for definitions). Given that inputs into these calculations were obtained from self-reports, they can be deemed imprecise and the resultant cumulative exposure estimates can be deemed to contain error. The model-based exposure intensity estimation used by Buchanan et al19 to validate their various exposure metrics was analogous to group-based assessment of exposure intensity.25
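The summation can be sketched as below. The input format (dictionaries keyed by calendar year) is a hypothetical choice, and the optional `years_used` filter is an assumption about how a pesticide-specific metric (eg, for diazinon) could be formed by summing only the years in which that pesticide was used.

```python
def cumulative_handling_events(dip_days, mix_per_day, start=1970, end=2000,
                               years_used=None):
    """Sum over calendar years t of DIP_t x MIX_t.

    dip_days, mix_per_day: dicts keyed by year (hypothetical input format).
    years_used: optional set of years in which a given pesticide was used;
    if supplied, only those years contribute (an assumption, see text).
    Missing years contribute zero.
    """
    total = 0
    for t in range(start, end + 1):
        if years_used is not None and t not in years_used:
            continue
        total += dip_days.get(t, 0) * mix_per_day.get(t, 0)
    return total
```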
APPENDIX 2: EXPOSURE AND DISEASE MODELS
Exposure to chemicals in occupational epidemiology is typically assessed as a product of its duration (D) and intensity (I). We assume that this cumulative exposure (E = I×D) leads to a biologically effective dose, the effects of which are modified by genetic susceptibility (G). The assumption about the proportionality of biologically effective dose and cumulative exposure is reasonable under first-order kinetics for the relation between the two.26 We assume that genetic susceptibility alone is not sufficient to cause the health outcome (H).
In the assessment of exposure intensity, the log-transformed measurements (LIgij) are assumed to satisfy the model below (classical measurement error structure):

$$LI_{gij} = \ln(I_{gij}) = \mu_g + \gamma_{gi} + \epsilon_{gij} \qquad (1)$$
where g denotes groups (g = 1, …, M), i denotes workers in the gth group (i = 1, …, Kg) and j denotes measurements taken on the ith worker (j = 1, …, Ngi). In this model, μg is the fixed parameter representing the group mean exposure, γgi are the random effects for the ith worker in the gth group, normally distributed with zero mean and variance σ²B, and ϵgij is the random error term for the jth measurement on the ith worker in the gth group, normally distributed with zero mean and variance σ²W. It is assumed that γgi and ϵgij are mutually independent. Each worker’s true exposure is μgi = μg+γgi. We will consider group-based exposure assessment in which only a sample of subjects from each group is measured (eg, each several times on randomly chosen days). Group-specific geometric means are then assigned to all members of the respective groups (single imputation). Therefore, the observed exposure intensity in this setting is a group geometric mean calculated as:

$$\bar{I}_g = \exp\left(\frac{1}{\sum_{i=1}^{K_g} N_{gi}} \sum_{i=1}^{K_g} \sum_{j=1}^{N_{gi}} LI_{gij}\right)$$
We also considered the consequences of using arithmetic group means (mean of Igij from each group) to estimate observed exposure intensity, as is commonly done.27
Duration of exposure is typically specific to each study subject and therefore can be assumed to have the classical error structure. Specifically, the logarithm of the observed duration of exposure of the ith worker in the gth group (LDgi) is the sum of the logarithm of the true duration of exposure (a normally distributed random variable τgi ∼ N(μτ, σ²τ)) and a random error (a normally distributed random variable δgi ∼ N(0, σ²D)), where τgi and δgi are independent:

$$LD_{gi} = \ln(D_{gi}) = \tau_{gi} + \delta_{gi} \qquad (2)$$
We denoted the true cumulative exposure of the ith worker in the gth group as ηgi = exp(τgi)×exp(μgi) and the observed cumulative exposure as Egi = Dgi×Īg. We assume that errors in exposure intensity (ϵgij) and duration (δgi) are independent.
In summary, the errors in the exposure estimate arise from σ²W, σ²D and the sampling of workers, whereas the true exposure is determined by μg and μτ with random variations due to σ²B and σ²τ.
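We assume that the binary health outcome Hgi arises in accordance with the following logistic model (assuming that the susceptibility gene per se does not lead to disease: βG = 0 in the term βGGgi):

$$\operatorname{logit} P(H_{gi} = 1) = \beta_0 + \beta_E\,\eta_{gi} + \beta_{G\times E}\,(G_{gi}\times\eta_{gi}) \qquad (3)$$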
The estimated correct disease model is the same as expression (3), but with β̂0 instead of β0, β̂E instead of βE, Egi instead of ηgi and β̂G×E instead of βG×E.
To simulate an analysis utilising Mendelian randomisation, we considered a disease model that had gene as the only risk factor even though βG = 0:

$$\operatorname{logit} P(H_{gi} = 1) = \beta_0 + \beta_G\,G_{gi} \qquad (4)$$
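We also considered a mis-specified “full” model that includes both main effects and their interaction, to examine the consequences of trying to guard against association of gene with exposure in a particular dataset even though gene per se is known not to cause the disease:

$$\operatorname{logit} P(H_{gi} = 1) = \beta_0 + \beta_E\,E_{gi} + \beta_G\,G_{gi} + \beta_{G\times E}\,(G_{gi}\times E_{gi}) \qquad (5)$$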
REFERENCES
Footnotes
Competing interests: None declared.