Article Text
Abstract
Objectives People are rarely, if ever, subject to only a single exposure, and predisposition to a disease is generally thought to result from the cumulative effects of a multitude of exposures and lifestyle factors in combination with individual susceptibility. However, incorporating all factors and their interactions to assess disease risk requires increasingly complex statistical models. Failure to include all relevant variables and their interactions may result in biased risk estimates, decreased power or computational difficulties.
Methods We describe a hierarchical Bayesian mixture framework (BMF) incorporating a variable-selection prior. Its performance was compared with that of a fully adjusted logistic regression model (LM) in simulated case-control studies. Up to twenty dichotomous or continuous exposures and confounders were simulated, with prevalences ranging between 10% and 80% and correlations ranging between 0.20 and 0.80. Either 10% or 50% of exposures were associated with disease (OR∼2.0), with 2 exposures interacting via an increase in log-odds of 2.0-4.0.
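As an illustration of the kind of simulation described above, the following is a minimal sketch and not the authors' code. It assumes dichotomous exposures only, generated by thresholding correlated latent normal variables (so the correlation between the binary exposures will be somewhat lower than the latent value `rho`), and it assumes cases and controls are sampled from a large simulated source population. The function name `simulate_case_control`, the baseline `intercept`, the sample sizes and `pop_size` are hypothetical choices made for illustration.

```python
import numpy as np
from statistics import NormalDist


def simulate_case_control(n_cases=500, n_controls=500, n_exposures=10,
                          frac_associated=0.5, rho=0.5, log_or=np.log(2.0),
                          interaction=2.0, intercept=-4.0,
                          pop_size=100_000, seed=0):
    """Simulate one case-control data set (illustrative; needs n_exposures >= 2)."""
    rng = np.random.default_rng(seed)

    # Latent normals with an exchangeable correlation structure (assumption).
    cov = np.full((n_exposures, n_exposures), rho)
    np.fill_diagonal(cov, 1.0)
    latent = rng.multivariate_normal(np.zeros(n_exposures), cov, size=pop_size)

    # Threshold each latent variable so exposure prevalence spans 10% to 80%.
    prevalence = np.linspace(0.10, 0.80, n_exposures)
    thresholds = np.array([NormalDist().inv_cdf(1.0 - p) for p in prevalence])
    x = (latent > thresholds).astype(int)

    # A fraction of exposures has OR ~ 2.0; the first two also interact.
    n_assoc = max(2, int(round(frac_associated * n_exposures)))
    beta = np.zeros(n_exposures)
    beta[:n_assoc] = log_or
    logit = intercept + x @ beta + interaction * x[:, 0] * x[:, 1]

    # Disease status in the source population.
    diseased = rng.random(pop_size) < 1.0 / (1.0 + np.exp(-logit))

    # Case-control sampling: fixed numbers of cases and controls.
    cases = rng.choice(np.flatnonzero(diseased), size=n_cases, replace=False)
    controls = rng.choice(np.flatnonzero(~diseased), size=n_controls, replace=False)
    idx = np.concatenate([cases, controls])
    return x[idx], diseased[idx].astype(int), beta
```

Calling `simulate_case_control(n_exposures=20, frac_associated=0.1, rho=0.8)` would, under these assumptions, correspond to a scenario with twenty strongly correlated exposures of which two are associated with disease. The returned `beta` holds the true main-effect log-odds ratios against which estimates from either model could be compared.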
Results Mean-squared errors (MSE) of the BMF were smaller than those of the LM in all simulations. Whereas the MSE of the LM increased with the number of parameters in the model, the MSE of the BMF was independent of the number of parameters. The number of BMF type I errors was minimal (≤1), whereas for the LM it increased with the number of model parameters and with the correlation between exposures. The numbers of type II errors were comparable.
Conclusions This Bayesian mixture framework is an improvement over standard logistic regression models for the statistical analysis of (correlated) exposure mixtures in case-control studies.