289 Using hierarchical clustering methods to identify jobs with similar response patterns in a population-based case-control study
  1. M C Friesen1,
  2. Shortreed2,
  3. Wheeler3,
  4. Stewart4,
  5. Burstyn5,
  6. Vermeulen6,
  7. Pronk7,
  8. Colt1,
  9. Baris1,
  10. Karagas8,
  11. Schwenn9,
  12. McCoy10,
  13. Silverman1,
  14. Yu1
  1. National Cancer Institute, North Bethesda, United States of America
  2. Group Health Research Institute, Seattle, United States of America
  3. Virginia Commonwealth University, Richmond, United States of America
  4. Stewart Exposure Assessments, Arlington, United States of America
  5. Drexel University, Philadelphia, United States of America
  6. Utrecht University, Utrecht, The Netherlands
  7. TNO, Zeist, The Netherlands
  8. Dartmouth Medical School, Hanover, United States of America
  9. Maine Cancer Registry, Augusta, United States of America
  10. Vermont Cancer Registry, Burlington, United States of America


Objectives Studies have demonstrated the utility of expert-based decision rules, derived from questionnaire response patterns, for assigning exposure in population-based studies. However, different experts may select different response patterns to represent a given exposure scenario. To improve the reproducibility of identifying these patterns and the efficiency of exposure assignment, we used hierarchical clustering to identify groups of jobs (clusters) with similar response patterns.

Methods For each job module in the New England Bladder Cancer Case-Control Study, we applied hierarchical clustering with Ward's (minimum-variance) linkage to the questionnaire responses related to occupational diesel exhaust exposure and partitioned each module's jobs into 25 and into 50 clusters. We assessed each cluster's homogeneity as the proportion of its jobs assigned the same probability category (<50% vs. ≥50% probability of occupational diesel exhaust exposure) in a previous expert-based, job-by-job assessment. A cluster was considered 'homogeneous' if ≥75% of its jobs shared the same probability category. Here we present the results for three modules: carpenter (357 jobs, 17% exposed, 52 unique response patterns), office professional (3,328 jobs, 22% exposed, 87 unique response patterns), and truck driver (508 jobs, 74% exposed, 404 unique response patterns).
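The clustering step can be illustrated with a minimal sketch of Ward's agglomerative clustering. It assumes each job's diesel-related answers are encoded as a 0/1 vector and compared with squared Euclidean distance. The encoding, the function names (`centroid`, `ward_cost`, `ward_clusters`), and the toy data are hypothetical and are not the study's actual implementation. Starting from one cluster per job, the sketch repeatedly merges the pair of clusters whose union increases the within-cluster sum of squares the least, and it stops when `k` clusters remain.

```python
from itertools import combinations

def centroid(rows):
    """Column-wise mean of a list of equal-length numeric vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def ward_cost(rows_a, rows_b):
    """Increase in within-cluster sum of squares if clusters a and b merge (Ward's criterion)."""
    na, nb = len(rows_a), len(rows_b)
    ca, cb = centroid(rows_a), centroid(rows_b)
    return na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))

def ward_clusters(vectors, k):
    """Agglomerate jobs with Ward's criterion until k clusters remain; return one label per job."""
    clusters = [[i] for i in range(len(vectors))]  # start with every job in its own cluster
    while len(clusters) > k:
        # Find the pair whose merge increases total within-cluster variance the least.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            cost = ward_cost([vectors[i] for i in clusters[a]],
                             [vectors[i] for i in clusters[b]])
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # b > a, so deleting b leaves index a valid
        del clusters[b]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical toy data: 5 jobs x 4 yes/no diesel-related questions.
jobs = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
labels = ward_clusters(jobs, k=2)
print(labels)  # [0, 0, 1, 1, 0]
```

This naive loop is roughly cubic in the number of jobs and is meant only to show the merge rule. For a module the size of the office-professional module (3,328 jobs), a library routine such as SciPy's `linkage(X, method="ward")` followed by `fcluster(Z, t=25, criterion="maxclust")` would be the more practical choice.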

Results For carpenters, 76% and 90% of clusters were homogeneous in the 25- and 50-cluster solutions, respectively. For office professionals, the corresponding proportions were 84% and 78%, and for truck drivers, 76% and 70%.
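The homogeneity proportions reported here follow from the rule in Methods: a cluster counts as homogeneous when at least 75% of its jobs share one expert probability category. The following is a minimal sketch of that calculation. It assumes each job carries a cluster label and a boolean `exposed` flag (True if the expert rated it ≥50% probability). The function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def homogeneous_share(labels, exposed, threshold=0.75):
    """Fraction of clusters in which >= threshold of jobs share one expert probability category."""
    members = defaultdict(list)  # cluster label -> list of that cluster's exposure categories
    for label, is_exposed in zip(labels, exposed):
        members[label].append(is_exposed)
    n_homogeneous = 0
    for categories in members.values():
        majority_count = Counter(categories).most_common(1)[0][1]
        if majority_count / len(categories) >= threshold:
            n_homogeneous += 1
    return n_homogeneous / len(members)

# Hypothetical toy data: 3 clusters with 4, 3, and 2 jobs.
labels = [0, 0, 0, 0, 1, 1, 1, 2, 2]
exposed = [True, True, True, False, True, False, False, False, False]
print(homogeneous_share(labels, exposed))  # 2 of 3 clusters are homogeneous
```

Here cluster 0 (3 of 4 jobs exposed, i.e., 75%) meets the threshold exactly, cluster 1 (2 of 3 unexposed, about 67%) does not, and cluster 2 (2 of 2 unexposed) does.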

Conclusions Homogeneity was reasonable with 25–50 clusters per module (representing 3–15% of the number of jobs per questionnaire), but important heterogeneity remained. A more efficient use of expert judgment may be to assess exposure at the cluster level first and then, within clusters that experts identify as heterogeneous, at the job level.
