Table 1

Proportion of exposure predictions from CART and random forest models that agreed with the expert exposure estimate in the validation dataset (n=2248)

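Values in the four model columns are agreement (%).

| Expert exposure metric | Number of jobs | CART, OH variables | CART, OH and module variables | Random forest, OH variables | Random forest, OH and module variables |
|---|---|---|---|---|---|
| Probability, binary | | | | | |
| Negligible/low | 1887 | 93 | 95 | 93 | 95 |
| Medium/high | 358 | 79 | 89 | 80 | 92 |
| Overall | 2245 | 92 | 94 | 92 | 94 |
| Probability, ordinal | | | | | |
| Negligible | 1705 | 97 | 97 | 98 | 98 |
| Low | 182 | 32 | 43 | 23 | 32 |
| Medium | 73 | 7 | 21 | 8 | 14 |
| High | 285 | 68 | 85 | 72 | 85 |
| Overall | 2245 | 85 | 89 | 85 | 89 |
| Intensity | | | | | |
| None | 1708* | 98 | 97 | 96 | 97 |
| Low | 394 | 64 | 68 | 67 | 71 |
| Medium | 86 | 48 | 56 | 41 | 57 |
| High | 57 | 61 | 65 | 61 | 60 |
| Overall | 2245 | 87 | 89 | 88 | 90 |
| Frequency | | | | | |
| None | 1757* | 97 | 97 | 98 | 98 |
| Low | 209 | 32 | 52 | 26 | 34 |
| Medium | 141 | 12 | 39 | 13 | 35 |
| High | 141 | 57 | 63 | 59 | 65 |
| Overall | 2248† | 83 | 87 | 84 | 86 |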
  • *The number of unexposed jobs based on intensity and frequency exceeds the number of unexposed jobs based on the probability metric. This occurred because the estimated level of exposure intensity and frequency for some jobs did not exceed the minimum threshold (<0.25 µg m−3 REC and <0.25 h per week, respectively), even though a diesel exposure source was identified.

  • †There were three additional observations for the frequency metric; they were excluded from the probability and intensity analyses because an ‘unknown’ rating had been assigned for those metrics.

  • CART, classification and regression tree; OH, occupational history; REC, respirable elemental carbon.