O-467 Evaluating the impact of sex and gender on the performance of machine learning for auto encoding of job titles
  1. Christopher Baker,
  2. Anil Adisesh,
  3. Cesar Augusto Suarez,
  4. Ellen Sweeney,
  5. Amanda Von Seehausen1,
  6. Mohammad Sadnan Al Manir,
  7. Deobrah Addey,
  8. Yunsong Cui,
  9. Hicks Jason,
  10. Cheryl Peters
  1. University of New Brunswick, Canada

Abstract

Introduction Ongoing studies into the use of algorithms for the automated coding of job titles to the Canadian National Occupational Classification (NOC) have reported accuracy at least equivalent to that of manual coding. Moreover, automated coding provides significant time savings. These studies have identified that both natural language processing (NLP) and machine learning algorithms are effective for auto coding. Although NLP-based and machine learning approaches both rely on bespoke rules and existing data sets, machine learning models can propagate bias from their training data if it is not corrected.

Objectives The goal of the study is to explore the impact of altering sex/gender ratios in training data sets on the overall performance of machine-learning-based prediction of NOC codes from patient-provided job titles.

Methods Using participant data provided by Atlantic PATH, training data sets were prepared for 100 4-digit NOC categories. The data sets were prepared with sex/gender ratios of 50/50, 30/70, and 70/30. The data sets were used to train the ENENOC machine learning platform, which was then tested on a set of manually coded job titles provided by Atlantic PATH/CanPATH. Performance levels were contrasted for all 4-digit NOC categories used in the study.
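
The abstract does not describe how the ratio-controlled training sets were drawn or how ENENOC scores its predictions, so the following is only a minimal sketch of one way such an experiment could be set up, not the authors' procedure. The record fields ("sex", "title", "noc"), the function names, and the stratified draw by fixed counts are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_with_ratio(records, n, female_share, seed=0):
    """Draw n records so that round(n * female_share) have sex == "F".

    `records` is a list of dicts with a hypothetical "sex" field ("F" or "M").
    Raises ValueError if either group is too small for the requested ratio.
    """
    rng = random.Random(seed)
    females = [r for r in records if r["sex"] == "F"]
    males = [r for r in records if r["sex"] == "M"]
    n_female = round(n * female_share)
    n_male = n - n_female
    if n_female > len(females) or n_male > len(males):
        raise ValueError("not enough records to reach the requested ratio")
    # Sample without replacement within each group, then mix the order.
    sample = rng.sample(females, n_female) + rng.sample(males, n_male)
    rng.shuffle(sample)
    return sample


def accuracy_by_code(true_codes, predicted_codes):
    """Return the fraction of correct predictions for each true code."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(true_codes, predicted_codes):
        totals[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {code: correct[code] / totals[code] for code in totals}
```

Under these assumptions, calling `sample_with_ratio` once per ratio (0.5, 0.3, 0.7) for each NOC category would give three training sets of equal size, and `accuracy_by_code` applied to each trained model's predictions on the same manually coded test set would allow the per-category contrast described above.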

Results Initial results in this preliminary study indicate that sex and gender are variables that can influence auto coding performance; however, the extent to which overall coding accuracy is affected is relatively minor. Further studies with larger training sets are required to fully explore the extent to which sex and gender contribute to bias in ENENOC.

Conclusion We initiated studies to investigate the impact of sex and gender bias on the performance of the ENENOC algorithm. Together, ENENOC and the contributed training and test sets provide a suitable framework for ongoing work in this area.
