Abstract
Objectives The Integrated Management Information System (IMIS) is the largest multi-industry source of exposure measurements available in North America. However, the censoring value of non-detected (ND) measurements, which depends on sampling duration, is not recorded, and this considerably limits the usefulness of the databank. Released in 2010, the Chemical Exposure Health Database (CEHD) contains analytical results and measurement details, including sampling duration, for some of the records in IMIS. We assessed which ND results stored in IMIS are short-term (ST) samples and which are shift-long (LT) samples, based on information available in CEHD.
Methods We analyzed exposure measurements for 54 agents from 1984–2009 (n=238,826). First, we calculated kappa coefficients (κ) for each agent to investigate the agreement between the exposure type of IMIS detected records (already indicated as ST or LT, i.e. selected by OSHA officers) and the exposure type suggested by the sampling duration found in CEHD. If κ exceeded 0.3 for an agent, we employed classification and regression tree (CART) models to predict whether the ND results from IMIS should be classified as ST or LT samples. The CART models were developed using CEHD and applied to IMIS, relying on predictors common to both databanks: industry, reason for inspection, scope of inspection, region, union status, and year of sampling.
Results The median proportion of ND results per agent was 37% (interquartile range (IQR)=22%–62%). The median κ was 0.45 (IQR=0.37–0.64) and 0.03 (IQR=0.01–0.16) for solvents/gases and metals/isocyanates, respectively. Solvents (n=22) and gases (n=7) were selected for CART modeling. Industry was the most important predictor variable in classifying ND results into either ST or LT.
Conclusions This novel approach can be used to assign a censoring value to ND results, thus allowing more accurate inference about the distribution of exposure levels in IMIS.
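The agreement screen described in the Methods can be illustrated with a minimal sketch of Cohen's kappa for a single agent. This is not the authors' code: the function names, the toy records, and especially the 60-minute duration cutoff are assumptions made only for illustration, since the abstract does not state how sampling duration was mapped to ST or LT. Only the 0.3 threshold on κ comes from the text.

```python
from collections import Counter


def type_from_duration(minutes, cutoff_minutes):
    """Label a sample ST or LT from its duration.

    The cutoff is a caller-supplied assumption; the abstract gives no value.
    """
    return "ST" if minutes < cutoff_minutes else "LT"


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed proportion of records on which the two sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each source's marginal frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category occurs")
    return (observed - expected) / (1.0 - expected)


# Toy records for one hypothetical agent (invented values, not study data).
imis_types = ["ST", "ST", "LT", "LT", "LT", "ST"]   # type recorded in IMIS
cehd_minutes = [15, 30, 480, 240, 20, 10]           # duration found in CEHD
ASSUMED_CUTOFF_MINUTES = 60                         # illustrative assumption

cehd_types = [type_from_duration(m, ASSUMED_CUTOFF_MINUTES) for m in cehd_minutes]
kappa = cohen_kappa(imis_types, cehd_types)

# Threshold from the Methods: agents with kappa above 0.3 go on to CART modeling.
selected_for_cart = kappa > 0.3
print(f"kappa = {kappa:.2f}, selected for CART: {selected_for_cart}")
```

In this toy example five of the six records agree, chance agreement is 0.5, and κ is therefore about 0.67, so the hypothetical agent would pass the 0.3 screen.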