Performance of variable selection methods for assessing the health effects of correlated exposures in case–control studies

Virissa Lenters; Roel Vermeulen; Lützen Portengen

doi:10.1136/oemed-2016-104231

Methodology

Original article

Virissa Lenters¹ (ORCID: http://orcid.org/0000-0002-0444-9150),
Roel Vermeulen¹,² (ORCID: http://orcid.org/0000-0003-4082-8163),
Lützen Portengen¹

¹ Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
² Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands

Correspondence to Dr Lützen Portengen, Institute for Risk Assessment Sciences, Utrecht University, PO Box 80.178, 3508 TD Utrecht, The Netherlands; L.Portengen{at}uu.nl

Abstract

Objectives There is growing recognition that simultaneously assessing multiple exposures may reduce false positive discoveries and improve epidemiological effect estimates. We evaluated the performance of statistical methods for identifying exposure–outcome associations across various data structures typical of environmental and occupational epidemiology analyses.

Methods We simulated a case–control study, generating 100 data sets for each of 270 different simulation scenarios that varied the number of exposure variables, the correlation between exposures, the sample size, the number of effective exposures and the magnitude of effect estimates. We compared conventional analytical approaches, that is, univariable (with and without multiplicity adjustment), multivariable and stepwise logistic regression, with variable selection methods: sparse partial least squares discriminant analysis, boosting, and frequentist and Bayesian penalised regression approaches.
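As a rough illustration of this kind of simulation design, the sketch below generates one case–control data set with correlated exposures. It is not the authors' code (their simulations were written in R), and the abstract does not give the scenario settings, so the details here are assumptions: the function name `simulate_case_control`, the exchangeable correlation structure (one common correlation `rho` between every pair of exposures), the standard normal exposure distributions, the intercept of -3.0 and the sampling scheme are all hypothetical.

```python
import math
import random


def simulate_case_control(n_cases, n_controls, n_exposures, rho, betas,
                          intercept=-3.0, seed=0):
    """Simulate one case-control data set (illustrative assumptions only).

    rho must lie in [0, 1) and betas must hold one log odds ratio per
    exposure; exposures with beta == 0 have no effect on the outcome.
    """
    rng = random.Random(seed)
    cases, controls = [], []
    # Draw subjects from a hypothetical source population until both
    # groups reach their target size.
    while len(cases) < n_cases or len(controls) < n_controls:
        # A shared factor gives every pair of exposures correlation rho.
        shared = rng.gauss(0.0, 1.0)
        x = [math.sqrt(rho) * shared
             + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(n_exposures)]
        # Logistic disease model on the simulated exposures.
        eta = intercept + sum(b * xi for b, xi in zip(betas, x))
        p_disease = 1.0 / (1.0 + math.exp(-eta))
        if rng.random() < p_disease:
            if len(cases) < n_cases:
                cases.append(x)
        elif len(controls) < n_controls:
            controls.append(x)
    X = cases + controls
    y = [1] * n_cases + [0] * n_controls
    return X, y


# Example: 5 exposures, pairwise correlation 0.5, one effective exposure
# with an odds ratio of 1.5 per unit (values chosen for illustration).
X, y = simulate_case_control(50, 50, 5, 0.5, [math.log(1.5), 0, 0, 0, 0])
```

Repeating such a call with different seeds and settings would give replicate data sets per scenario, in the spirit of the 100 data sets per scenario described above.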

Results The variable selection methods consistently yielded more precise effect estimates and generally improved selection accuracy compared with conventional logistic regression methods, especially for scenarios with higher correlation levels. Penalised lasso and elastic net regression both seemed to perform particularly well, specifically when the aim is statistical inference that balances high sensitivity against a low proportion of false discoveries.
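The two selection criteria named here can be made concrete with a small sketch. It assumes the usual definitions: sensitivity is the share of truly effective exposures that a method selects, and the false discovery proportion is the share of selected exposures that have no true effect. The function name `selection_metrics` and the convention of reporting a false discovery proportion of 0 when nothing is selected are illustrative choices, not taken from the article.

```python
def selection_metrics(selected, true_exposures):
    """Return (sensitivity, false discovery proportion) for one data set."""
    selected, true_exposures = set(selected), set(true_exposures)
    true_positives = len(selected & true_exposures)
    # Sensitivity is undefined when no exposure has a true effect.
    if true_exposures:
        sensitivity = true_positives / len(true_exposures)
    else:
        sensitivity = float("nan")
    # Assumed convention: no selections means no false discoveries.
    if selected:
        fdp = (len(selected) - true_positives) / len(selected)
    else:
        fdp = 0.0
    return sensitivity, fdp


# Exposures 0, 1 and 2 are truly effective; a method selects 0, 1 and 4.
sens, fdp = selection_metrics({0, 1, 4}, {0, 1, 2})
# sens is 2/3 (two of three true exposures found);
# fdp is 1/3 (one of three selections is false).
```

Averaging these two quantities over the replicate data sets of a scenario gives one way to compare methods on both criteria at once.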

Conclusions In this extensive simulation study with multicollinear data, we found that most variable selection methods consistently outperformed conventional approaches, and demonstrated how performance is influenced by the structure of the data and underlying model.

collinearity
environment-wide association
model selection
multipollutant
variable selection

https://doi.org/10.1136/oemed-2016-104231

Footnotes

RV and LP contributed equally.
Contributors LP and RV conceived and designed the analysis. LP performed the simulations and statistical analyses, with contributions from VL. All authors were actively involved in interpretation of results. VL drafted and revised the manuscript, with contributions from LP and RV. All authors approved the final version of this manuscript.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The R code for the simulation and analyses is available upon request.

Linked Articles

Commentary
Modern statistics, multiple testing and wishful thinking

Graham Byrnes
Occupational and Environmental Medicine 2018;75:477-478. Published Online First: 09 Mar 2018. doi: 10.1136/oemed-2017-104807
