P194 Recommendations for prioritising expert review of free-text job descriptions that underwent computer-based coding using the SOCcer algorithm
  1. Daniel Russ1,
  2. Thomas Remen2,
  3. Kwan-Yuet Ho1,
  4. Wong-Hi Chow3,
  5. Faith Davis4,
  6. Jonathan Hofmann5,
  7. Huang Huang6,
  8. Mark Purdue5,
  9. Kendra Schwartz7,
  10. Jack Siemiatycki2,
  11. Yawei Zhang6,
  12. Debra Silverman5,
  13. Calvin Johnson1,
  14. Jerome Lavoué2,
  15. Melissa Friesen5
  1. Center for Information Technology, National Institutes of Health, Bethesda, USA
  2. CHUM Research Centre, Université de Montréal, Montréal, Canada
  3. University of Texas MD Anderson Cancer Centre, Houston, USA
  4. University of Alberta, Edmonton, Canada
  5. Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, USA
  6. Department of Environmental Health Sciences, Yale School of Public Health, New Haven, USA
  7. Department of Family Medicine and Public Health Sciences, Wayne State University, Detroit, USA


Objectives Previous evaluations of algorithms that code job descriptions to standardised occupational classification (SOC) codes suggest that some jobs will need expert coding to reduce misclassification. For jobs coded using the SOCcer algorithm, we evaluated the utility of several metrics for identifying discordances between expert and automated SOC assignments, in order to develop recommendations for prioritising expert review.

Methods The SOCcer algorithm was applied to expert-coded job descriptions from three studies to obtain each job’s ten top-scoring U.S. SOC-2010 codes and their ‘score’ (a measure of fit; continuous, 0–1). The SOCcer and expert SOC codes were linked to the CANJEM job-exposure matrix, which comprises exposure estimates for 258 agents (probability, intensity, and exposure status: probability >0 vs. 0). We evaluated agreement between the expert code and the top-scoring SOCcer code (proportion of agreement), and between their agent-specific CANJEM estimates (kappa for exposure status; intra-class correlation coefficient, ICC, for probability and intensity), in subsets of jobs stratified by metrics derived from the SOCcer score and CANJEM. We describe the overall patterns.
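The agreement statistics named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code: exposure status follows the abstract's definition (probability >0), kappa is the standard two-rater Cohen's kappa, and, because the abstract does not state which ICC form was used, the two-way random-effects, absolute-agreement, single-rater ICC(2,1) of Shrout and Fleiss is assumed here. All function names and input values are hypothetical.

```python
import math
from typing import Sequence


def exposure_status(probabilities: Sequence[float]) -> list[int]:
    """Binary exposure status: 1 if the CANJEM probability is > 0, else 0."""
    return [1 if p > 0 else 0 for p in probabilities]


def proportion_agreement(a: Sequence, b: Sequence) -> float:
    """Share of jobs on which two codings are identical (e.g. expert vs top SOCcer code)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) ratings of the same jobs."""
    p_obs = proportion_agreement(a, b)
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    # Chance agreement: both exposed or both unexposed by chance alone.
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_exp == 1.0:
        return math.nan  # undefined when both ratings are constant and identical
    return (p_obs - p_exp) / (1 - p_exp)


def icc_2_1(x: Sequence[float], y: Sequence[float]) -> float:
    """ICC(2,1) for two 'raters': expert-linked vs SOCcer-linked CANJEM estimates.

    The choice of ICC(2,1) is an assumption; the abstract does not specify the form.
    """
    n, k = len(x), 2
    if n != len(y) or n < 2:
        raise ValueError("need at least two jobs with paired estimates")
    rows = list(zip(x, y))
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Toy probabilities for one agent across six jobs (invented, not CANJEM values).
expert = exposure_status([0.0, 0.2, 0.5, 0.0, 0.1, 0.0])
soccer = exposure_status([0.0, 0.3, 0.0, 0.0, 0.1, 0.4])
print(round(cohens_kappa(expert, soccer), 3))
print(round(icc_2_1([1, 2, 3, 4], [1, 3, 3, 5]), 3))
```

In the study these statistics were computed per agent within each stratum of jobs; the sketch shows only a single agent and a single set of jobs.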

Results Moderate agreement was usually achieved for jobs with a maximum score ≥0.3. Agreement was higher for jobs whose top two SOC codes differed in SOCcer score by ≥0.1 than for jobs where they differed by <0.1. Combining these two criteria, kappas and ICCs were 0.7–0.8 for jobs with a maximum score ≥0.3 and a score distance ≥0.1 (36–53% of all jobs), compared with 0.3–0.5 for jobs that did not meet both thresholds. Agreement was also higher for jobs whose top two SOC codes had the same, rather than different, exposure status.

Conclusions When applying SOCcer to uncoded jobs, expert review would be most informative (i.e., most likely to reduce misclassification) for jobs with a maximum score <0.3 and for jobs whose top two SOC codes had a score distance <0.1 or differing exposure estimates.
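The review criteria in the Conclusions can be expressed as a simple flagging rule. The sketch below is hypothetical: it assumes each job comes with its ranked SOCcer output as (SOC code, score) pairs and that a lookup mapping each SOC code to its CANJEM exposure status for one agent is available. The `ScoredCode` type, the `review_reasons` function and the exposure lookup are illustrative stand-ins, not part of SOCcer or CANJEM. The SOC codes are used only as labels, and their exposure statuses are invented toy values.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence

# Thresholds taken from the abstract.
MAX_SCORE_THRESHOLD = 0.3
SCORE_DISTANCE_THRESHOLD = 0.1


@dataclass
class ScoredCode:
    soc_code: str
    score: float  # SOCcer score, 0-1


def review_reasons(top_codes: Sequence[ScoredCode], exposed: Mapping[str, bool]) -> list[str]:
    """Return the reasons a job should be prioritised for expert review.

    An empty list means none of the criteria were met. `exposed` is a
    hypothetical per-agent lookup of CANJEM exposure status (probability > 0).
    """
    if not top_codes:
        raise ValueError("job has no SOCcer codes")
    ranked = sorted(top_codes, key=lambda c: c.score, reverse=True)
    best = ranked[0]
    reasons = []
    if best.score < MAX_SCORE_THRESHOLD:
        reasons.append("max score < 0.3")
    if len(ranked) > 1:
        second = ranked[1]
        if best.score - second.score < SCORE_DISTANCE_THRESHOLD:
            reasons.append("top-two score distance < 0.1")
        if exposed[best.soc_code] != exposed[second.soc_code]:
            reasons.append("top two SOC codes differ in exposure status")
    return reasons


# Invented exposure statuses for one agent (not CANJEM values).
exposed = {"47-2061": True, "53-3032": True, "11-1021": False, "41-2031": False}
jobs = {
    "A": [ScoredCode("47-2061", 0.62), ScoredCode("53-3032", 0.15)],
    "B": [ScoredCode("53-3032", 0.41), ScoredCode("11-1021", 0.36)],
    "C": [ScoredCode("41-2031", 0.22), ScoredCode("11-1021", 0.05)],
}
for job_id, codes in jobs.items():
    print(job_id, review_reasons(codes, exposed))
```

Returning the list of reasons, rather than a single yes/no flag, lets reviewers see which criterion triggered the referral. One possible choice, which the abstract does not prescribe, is to review jobs that meet several criteria first.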
