0199 Using machine learning to efficiently use multiple experts to assign occupational lead exposure estimates in a case-control study

Melissa C Friesen; Sarah J Locke; Dennis Zaebst; Susan Viet; Susan Shortreed; Yu-Cheng Chen; Dong-Hee Koh; Larissa Pardo; Kendra L Schwartz; Faith G Davis; Patricia A Stewart; Joanne S Colt; Mark P Purdue

doi:10.1136/oemed-2014-102362.79


Oral presentation


Melissa C Friesen¹,
Sarah J Locke¹,
Dennis Zaebst²,
Susan Viet²,
Susan Shortreed³,
Yu-Cheng Chen¹,
Dong-Hee Koh⁴,
Larissa Pardo⁵,
Kendra L Schwartz⁶,
Faith G Davis⁷,
Patricia A Stewart⁸,
Joanne S Colt¹,
Mark P Purdue⁹

¹Occupational and Environmental Epidemiology Branch, NCI, Bethesda, MD, USA
²Westat, Rockville, MD, USA
³Biostatistics Unit, Group Health Research Institute, Seattle, WA, USA
⁴Carcinogenic Hazard Branch, National Cancer Center, Seoul, Republic of Korea
⁵Emory University Rollins School of Public Health, Atlanta, GA, USA
⁶Department of Family Medicine and Public Health Sciences and Barbara Ann Karmanos Institute, Wayne State University School of Medicine, Detroit, MI, USA
⁷School of Public Health, University of Alberta, Edmonton, AB, Canada
⁸Stewart Exposure Assessments, LLC, Arlington, VA, USA

Abstract

Objectives We applied machine learning approaches to help multiple experts estimate occupational lead exposure efficiently and transparently in a case-control study of renal cell carcinoma.

Method We used hierarchical cluster models to classify the 7154 study jobs that had occupational history and job/industry questionnaire data into 360 groups with similar responses. Two or three experts independently reviewed each group and assigned a probability of lead exposure (<5%, ≥5 to <50%, or ≥50%) for each of three time periods (<1980, 1980–1994, ≥1995). When a group’s mean response pattern suggested within-group exposure variability, the experts either identified programmable conditions that defined the rating differences, where possible, or flagged the group for further review. Jobs that overlapped time periods were split at the calendar cut points, and each of the resulting 9992 job/time periods was assigned the relevant expert/group/time period estimate. For each expert, we developed classification and regression tree (CART) models, based on that expert’s previous decisions, to predict the expert’s expected assignment for jobs in groups the expert had not assessed and for jobs requiring further review.
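The splitting step described above can be illustrated with a minimal sketch. It is not the study's code: the function name `split_job`, the `JobPeriod` record, and the use of inclusive start and end years are assumptions made for illustration. Only the three period boundaries come from the abstract.

```python
from dataclasses import dataclass

# Period boundaries taken from the abstract: <1980, 1980-1994, >=1995.
# Each tuple is (label, first_year, last_year); None means open-ended.
PERIODS = [
    ("<1980", None, 1979),
    ("1980-1994", 1980, 1994),
    (">=1995", 1995, None),
]


@dataclass(frozen=True)
class JobPeriod:
    """One job/time period (hypothetical record layout)."""
    job_id: str
    period: str
    start_year: int
    end_year: int


def split_job(job_id: str, start_year: int, end_year: int) -> list[JobPeriod]:
    """Split one job into job/time periods at the calendar cut points.

    Years are assumed to be inclusive calendar years.
    """
    if end_year < start_year:
        raise ValueError("end_year must not precede start_year")
    pieces = []
    for label, first, last in PERIODS:
        # Clip the job's span to the current period.
        lo = start_year if first is None else max(start_year, first)
        hi = end_year if last is None else min(end_year, last)
        if lo <= hi:  # the job overlaps this period
            pieces.append(JobPeriod(job_id, label, lo, hi))
    return pieces


# A made-up job held from 1975 to 1998 spans all three periods.
for piece in split_job("J1", 1975, 1998):
    print(piece)
```

Under these assumptions, a job that falls entirely within one period yields a single job/time period, which is why the number of job/time periods (9992) exceeds the number of jobs (7154).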

Results In preliminary analyses, the CART models correctly predicted 91–96% of the experts’ pre-1995 estimates and 77–96% of their ≥1995 estimates. Depending on the expert, CART estimates were assigned to 3–48% of the job/time periods. Overall, at least two experts assigned the same estimate to 92% of the job/time periods.
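The overall agreement figure can be read as the share of job/time periods for which some probability category was chosen by at least two experts. The sketch below shows one way to compute such a share. The function name `share_with_agreement`, the dictionary layout, and the toy ratings are all hypothetical and are not study data.

```python
from collections import Counter


def share_with_agreement(ratings: dict[str, list[str]], min_experts: int = 2) -> float:
    """Fraction of job/time periods where at least `min_experts` experts
    assigned the same probability category."""
    if not ratings:
        raise ValueError("no job/time periods supplied")
    agreed = sum(
        1
        for estimates in ratings.values()
        # The most common category must be shared by enough experts.
        if estimates and max(Counter(estimates).values()) >= min_experts
    )
    return agreed / len(ratings)


# Made-up ratings: key = job/time period, value = one estimate per expert.
toy_ratings = {
    "J1|<1980": [">=50%", ">=50%", "5-<50%"],      # two experts agree
    "J1|1980-1994": ["<5%", "5-<50%", ">=50%"],    # no agreement
    "J2|>=1995": ["<5%", "<5%"],                   # two experts agree
    "J3|>=1995": ["<5%", "5-<50%"],                # no agreement
}
print(share_with_agreement(toy_ratings))
```

Whether an estimate came directly from the expert or from that expert's CART model does not matter to this calculation; both are treated as the expert's assignment.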

Conclusions Our framework reduced the number of exposure decisions required of each expert compared with job-by-job assessment. Future work will use the CART models to identify differences between experts that need to be resolved and will incorporate estimates of the frequency and intensity of lead exposure.

