0199 Using machine learning to efficiently use multiple experts to assign occupational lead exposure estimates in a case-control study
  1. Melissa C Friesen1,
  2. Sarah J Locke1,
  3. Dennis Zaebst2,
  4. Susan Viet2,
  5. Susan Shortreed3,
  6. Yu-Cheng Chen1,
  7. Dong-Hee Koh4,
  8. Larissa Pardo5,
  9. Kendra L Schwartz6,
  10. Faith G Davis7,
  11. Patricia A Stewart8,
  12. Joanne S Colt1,
  13. Mark P Purdue9
  1. Occupational and Environmental Epidemiology Branch, NCI, Bethesda, MD, USA
  2. Westat, Rockville, MD, USA
  3. Biostatistics Unit, Group Health Research Institute, Seattle, WA, USA
  4. Carcinogenic Hazard Branch, National Cancer Center, Seoul, Republic of Korea
  5. Emory University Rollins School of Public Health, Atlanta, GA, USA
  6. Department of Family Medicine and Public Health Sciences and Barbara Ann Karmanos Institute, Wayne State University School of Medicine, Detroit, MI, USA
  7. School of Public Health, University of Alberta, Edmonton, AB, Canada
  8. Stewart Exposure Assessments, LLC, Arlington, VA, USA


Objectives We applied machine learning approaches to help multiple experts estimate occupational lead exposure efficiently and transparently in a case-control study of renal cell carcinoma.

Method We used hierarchical cluster models to classify the 7154 study jobs with occupational history and job/industry questionnaires into 360 groups with similar responses. Each group was reviewed independently by two or three experts, who assigned a probability of lead exposure (<5%, 5% to <50%, ≥50%) for each of three time periods (<1980, 1980–1994, ≥1995). When a group’s mean response pattern suggested within-group exposure variability, the experts either identified programmable conditions that defined the rating differences, where possible, or flagged the group for further review. After jobs that overlapped time periods were split at the calendar cut points, each of the resulting 9992 job/time periods was assigned its relevant expert/group/time period estimate. Classification and regression tree (CART) models were then developed to predict each expert’s expected assignment, based on previous decisions. These models supplied estimates for jobs in groups that the expert had not assessed and for jobs requiring further review.
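The splitting step described above can be illustrated with a minimal sketch. It assumes each job is recorded with whole start and end years, both inclusive; the abstract does not state how boundary years were handled, and the names `PERIODS` and `split_job` are hypothetical, not taken from the study.

```python
from typing import List, Optional, Tuple

# Time periods named in the abstract: <1980, 1980-1994, >=1995.
# Each entry is (label, first_year, last_year), both inclusive.
# None marks an open-ended bound. Whole-year resolution is an assumption.
PERIODS: List[Tuple[str, Optional[int], Optional[int]]] = [
    ("<1980", None, 1979),
    ("1980-1994", 1980, 1994),
    (">=1995", 1995, None),
]


def split_job(start_year: int, end_year: int) -> List[Tuple[str, int, int]]:
    """Split one job into job/time-period segments at the calendar cut points."""
    if end_year < start_year:
        raise ValueError("end_year must not precede start_year")
    segments = []
    for label, first, last in PERIODS:
        # Clip the job's span to the current period.
        seg_start = start_year if first is None else max(start_year, first)
        seg_end = end_year if last is None else min(end_year, last)
        if seg_start <= seg_end:  # the job overlaps this period
            segments.append((label, seg_start, seg_end))
    return segments


if __name__ == "__main__":
    # A job held from 1975 through 1998 spans all three periods.
    for segment in split_job(1975, 1998):
        print(segment)
```

Under these assumptions, a job held entirely within one period yields a single segment, and a job that crosses a cut point yields one segment per period it touches, each of which can then receive its own expert/group/time period estimate.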

Results In preliminary analyses, CART models predicted 91–96% of the experts’ pre-1995 estimates and 77–96% of ≥1995 estimates. CART estimates were assigned to 3–48% of the job/time periods, varying by expert. Overall, 92% of the job/time periods were assigned the same estimate by at least two experts.
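The agreement figure reported above (job/time periods given the same estimate by at least two experts) can be computed along the following lines. This is a sketch on made-up toy ratings, not the study data; the function names and category strings are hypothetical.

```python
from collections import Counter
from typing import Sequence


def at_least_two_agree(ratings: Sequence[str]) -> bool:
    """True if any single category was assigned by two or more experts."""
    if not ratings:
        return False
    return max(Counter(ratings).values()) >= 2


def share_with_agreement(all_ratings: Sequence[Sequence[str]]) -> float:
    """Fraction of job/time periods where at least two experts agree."""
    if not all_ratings:
        raise ValueError("no job/time periods supplied")
    agreeing = sum(1 for ratings in all_ratings if at_least_two_agree(ratings))
    return agreeing / len(all_ratings)


if __name__ == "__main__":
    # Toy example only: one tuple of expert ratings per job/time period.
    toy = [
        ("<5%", "<5%", ">=50%"),     # two of three experts agree
        ("<5%", "5-<50%", ">=50%"),  # no two experts agree
        (">=50%", ">=50%"),          # both experts agree
        ("<5%", "5-<50%"),           # the two experts disagree
    ]
    print(share_with_agreement(toy))
```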

Conclusions Our framework reduced the number of exposure decisions needed from each expert compared with job-by-job assessment. Future work will use CART models to identify differences between experts that need to be resolved, and will incorporate estimates of the frequency and intensity of lead exposure.
