0303 Evaluating differences in expert agreement between subgroups to identify where to prioritise use of multiple raters
  Pamela Dopart(1), Hormuzd Katki(2), Bu-Tian Ji(1), Patricia Stewart(1,3), Melissa Friesen(1)
  1. Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
  2. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
  3. Formerly of the National Cancer Institute; currently Stewart Exposure Assessments, LLC, Arlington, VA, USA


The validity and reliability of expert-based assessments can be improved by using multiple raters. However, to maximise scarce resources, use of multiple raters should focus on jobs for which experts are more likely to disagree. For comparisons of agreement across subgroups, the standard metric Kappa must be used cautiously because it is sensitive to the ratings’ marginal distribution. As an alternative, we used Kappa’s numerator: the difference between observed and expected agreement. This value equals the Mean Risk Stratification (MRS), a novel metric also used to evaluate the predictiveness of risk models. MRS is interpreted as the number of observations (per 100) on which raters will agree beyond chance. For subgroups of jobs in three industries stratified by four characteristics, we evaluated quadratically weighted MRS from six experts’ ordinal, 4-category exposure ratings (67–74 workers per industry). For all industries, MRS was consistently lower for jobs in far vs. near proximity to an exposure source and for jobs with multiple vs. a single work location, with experts agreeing on 2–8 fewer jobs (per 100) for far proximity jobs and 0.4–12 fewer jobs (per 100) for jobs with multiple work locations. MRS was also lower for jobs with subject-reported non-visible vs. visible dust accumulation in two industries (difference: 1–6 jobs) and for non-production vs. production jobs in one industry (difference: 9 jobs). The use of MRS allowed us to identify job characteristics that are associated with lower agreement between experts and to quantify the potential benefit of using multiple raters.
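To make the metric concrete: for two raters and an ordinal scale, MRS is the numerator of weighted Kappa, i.e. observed weighted agreement minus chance-expected weighted agreement computed from the raters’ marginal distributions. The sketch below is a minimal illustration of that formula for two raters on a 4-category scale; the function name and the example ratings are hypothetical, not data from the study (which pooled six experts).

```python
from collections import Counter

def mrs_quadratic(r1, r2, k=4):
    """Quadratically weighted MRS (Kappa's numerator) for two raters.

    Ratings are assumed coded 0..k-1. Returns observed minus
    chance-expected weighted agreement; multiply by 100 to read it
    as "observations agreed on beyond chance, per 100".
    """
    n = len(r1)
    # Quadratic agreement weights: 1 for exact agreement, 0 for the
    # maximum possible disagreement on a k-category ordinal scale.
    w = [[1 - ((i - j) ** 2) / ((k - 1) ** 2) for j in range(k)]
         for i in range(k)]
    # Observed weighted agreement across the rated items.
    p_obs = sum(w[a][b] for a, b in zip(r1, r2)) / n
    # Chance-expected weighted agreement from the marginal distributions.
    c1, c2 = Counter(r1), Counter(r2)
    p_exp = sum(w[i][j] * (c1[i] / n) * (c2[j] / n)
                for i in range(k) for j in range(k))
    return p_obs - p_exp

# Hypothetical 4-category exposure ratings for eight jobs.
rater_a = [0, 1, 2, 3, 1, 2, 0, 3]
rater_b = [0, 1, 3, 3, 1, 1, 0, 2]
mrs = mrs_quadratic(rater_a, rater_b)
```

Comparing this difference (rather than Kappa itself, which divides by 1 minus expected agreement) across subgroups avoids the sensitivity to marginal distributions noted in the abstract.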
