S-499 The application of artificial intelligence in the coding of occupational information
  1. Anil Adisesh1,
  2. Christopher JO Baker
  1. 1University of Calgary, Canada

Abstract

Many research studies seek to identify the social determinants of health, and occupation is an important predictor at the level of both the individual and the population. Job titles are usually collected during interviews or by questionnaire, but before this information can be used the responses need to be categorized using a coding system such as the Canadian National Occupational Classification (NOC).

Manual coding is the usual method, but it is time-consuming and error-prone, and outcomes vary between coders and coding teams. In recent work the ACA-NOC algorithm [1] was developed to perform automated coding by matching job title text against the NOC’s job titles and textual descriptions. The algorithm was benchmarked on a small manually coded data set, with subject matter experts subsequently reviewing coding discrepancies to guide functional improvements to the algorithm. The performance achieved illustrated the viability of the approach, although larger benchmarking data sets were required.
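The matching idea described above can be illustrated with a minimal sketch. This is not the ACA-NOC algorithm itself: the lookup table, the placeholder codes (`X001` and so on), the token-overlap scoring and the threshold are all assumptions made for illustration only.

```python
import re

# Hypothetical miniature lookup table. The codes are placeholders,
# not real NOC codes, and the titles are illustrative only.
TOY_CLASSIFICATION = {
    "X001": ["registered nurse", "staff nurse"],
    "X002": ["truck driver", "long haul truck driver"],
    "X003": ["elementary school teacher", "primary school teacher"],
}


def normalize(text):
    """Lowercase, turn non-alphanumerics into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def jaccard(a, b):
    """Token-set overlap between two normalized strings (0.0 to 1.0)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def code_job_title(raw_title, classification=TOY_CLASSIFICATION, threshold=0.5):
    """Return (code, score) for the best-matching title.

    Returns (None, score) when the best score is below the threshold,
    so that the record can be routed to a human coder instead.
    """
    title = normalize(raw_title)
    best_code, best_score = None, 0.0
    for code, titles in classification.items():
        for candidate in titles:
            score = jaccard(title, normalize(candidate))
            if score > best_score:
                best_code, best_score = code, score
    if best_score < threshold:
        return None, best_score
    return best_code, best_score
```

Returning `None` below the threshold reflects one possible design choice: uncertain records are left for manual review instead of being assigned a low-confidence code.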

CanPATH [2] has collected data from approximately 330,000 Canadian volunteers, including information about health, lifestyle, occupation, environment and behavior. We report on the further benchmarking and development of this algorithm in a CanPATH-funded project using over 60,000 manually coded job titles from the constituent Alberta Tomorrow Project (ATP). The algorithm was also applied to over 100,000 uncoded job titles from Atlantic PATH, drawn from the Core questionnaire and occupational history data.

The core outcome of the project was that auto-coding results are comparable to manual coding in accuracy and superior in speed: for example, 2 years of manual coding (64,000 records) can be auto-coded in 72 hours. The algorithm was considered ready for deployment in operational settings: at the point of care and as decision support for manual coders.
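As a rough worked figure, the throughput implied by the numbers quoted above can be written out as follows. The second ratio assumes that "2 years" means elapsed calendar time, which the abstract does not specify, so it is indicative only.

```latex
\[
\text{throughput} = \frac{64{,}000\ \text{records}}{72\ \text{h}} \approx 889\ \text{records/h},
\qquad
\frac{2 \times 365 \times 24\ \text{h}}{72\ \text{h}} = \frac{17{,}520}{72} \approx 243
\]
```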

Additional insights were gained during the project: (i) the NOC and ATP data sets have a distribution bias, with some NOC categories over- or under-represented, and numerous non-standard lexical features were found in job titles and NOC job descriptions; (ii) the ATP benchmarking data sets included coding errors, which expert coders corrected, leading to the creation of gold standard test sets for further algorithm improvement studies; (iii) a study of 17 categories of occupations that were initially difficult to code identified some job categories with near 90% coding accuracy.
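Benchmarking of this kind, comparing automated codes against a gold standard category by category, can be sketched as below. This is an illustrative outline, not the project's evaluation code; the function name and the placeholder codes in the usage example are assumptions.

```python
from collections import defaultdict


def accuracy_by_category(gold_codes, predicted_codes):
    """Return {gold code: (record count, fraction coded identically)}.

    The record count exposes over- or under-represented categories;
    the fraction is the per-category agreement with the gold standard.
    """
    if len(gold_codes) != len(predicted_codes):
        raise ValueError("gold and predicted lists must have equal length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for gold, predicted in zip(gold_codes, predicted_codes):
        totals[gold] += 1
        if gold == predicted:
            correct[gold] += 1
    return {code: (totals[code], correct[code] / totals[code]) for code in totals}


# Usage with placeholder codes (not real NOC codes):
gold = ["X001", "X001", "X002", "X002", "X002", "X003"]
auto = ["X001", "X002", "X002", "X002", "X002", "X001"]
report = accuracy_by_category(gold, auto)
```

Reporting the count alongside the accuracy matters because a high or low score on a category with very few records is weak evidence either way.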

Automated coding of job titles to the NOC has been shown to be practicable with good levels of accuracy, and to reduce coding time from years of manual effort to a matter of hours without a decrease in accuracy. Autocoding can replace costly, error-prone manual labor with accurate point-of-care coding, so that patient occupation information collected during healthcare encounters could now supplement existing administrative data sets in electronic health record systems. These data can be used to better understand the socioeconomic consequences of health conditions, to advise patients about returning to work with a health condition, and to recognize occupations at risk of disease, as in the COVID-19 pandemic.

References

  1. Bao H, Baker CJO, Adisesh A. Occupation coding of job titles: iterative development of an automated coding algorithm for the Canadian National Occupation Classification (ACA-NOC). JMIR Form Res 2020 Aug 5;4(8):e16422. doi:10.2196/16422

  2. CanPath, the Canadian Partnership for Tomorrow Project. https://canpath.ca/ (accessed 01.09.2021)
