Predicting occupational diseases
  1. Eva Suarthana1,2,3,
  2. Evert Meijer1,
  3. Diederick E Grobbee2,
  4. Dick Heederik1,2
  1. IRAS (Institute for Risk Assessment Sciences), Division of Environmental Epidemiology, Utrecht University, Utrecht, The Netherlands
  2. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
  3. Community Medicine Department, Faculty of Medicine, University of Indonesia, Jakarta Pusat, Indonesia

Correspondence to Dr Eva Suarthana, IRAS (Institute for Risk Assessment Sciences), Environmental Epidemiology Division, Utrecht University, PO Box 80178, 3508 TD, Utrecht, The Netherlands; E.Suarthana{at}

Prediction research is relatively new in the occupational health field,1 2 3 4 although it is well established in clinical medicine.5 6 Prediction models are developed to estimate an individual's probability of the presence (diagnostic model) or future occurrence (prognostic model) of an outcome (ie, disease). As an example from clinical practice, Wells and colleagues demonstrated that a diagnostic model (comprising the patient’s history and physical examination) in combination with impedance plethysmography can safely rule out the presence of deep vein thrombosis. This approach reduced health care costs by avoiding expensive venography.5 Assessment of the 10-year risk of coronary heart disease (CHD) using the Framingham scores is a well-known example of prognostic prediction.6 Such prediction allows physicians to identify a subset of patients with a higher probability of CHD in whom preventive action should be more effective.

The development of prediction models makes it possible to identify a small number of predictors to provide the best possible knowledge base for diagnosis. These models enable risk groups to be easily identified by quantifying the individual probability of having an occupational disease. Recently, we developed and validated diagnostic models to predict IgE sensitisation to occupational high molecular weight allergens in laboratory animal workers and bakers, and silicosis in construction workers. The models resulted in user-friendly prediction rules. This approach allows sequential testing in which more burdensome tests are reserved for the high-risk group.1 3 4

For example, the traditional approach does not specify which workers exposed to crystalline silica should undergo periodic health surveillance with extensive questionnaires, chest x rays and lung function tests.7 On the basis of disease prevalence, a large number of negative x ray results can be expected. However, a diagnostic model based on questionnaire results and lung function tests accurately identified workers with a low probability of pneumoconiosis, who could then be excluded from further investigation, and thus improved diagnostic efficiency compared with using no model.3

Issues in model development

The first consideration when developing prediction models is the choice of the relevant outcome and how to define it.8 In occupational allergy, identification should focus not on clinically established allergic disease but on highly associated preliminary symptoms and signs. Sensitisation to occupational allergens is an outcome strongly associated with work-related allergy and is the most appropriate characteristic that can easily be investigated.9 A logical approach is therefore to first identify sensitised workers, and then carry out sequential diagnostic investigations at the clinical level only in these workers. We therefore aimed to identify individuals with a high probability of sensitisation.1 4

The second consideration is the choice of predictors to be included in the model. In contrast to aetiological studies, prediction studies do not aim to identify causal associations.10 Furthermore, a time sequence (a cause should precede an effect) is not required in diagnostic studies. These studies are inherently cross-sectional, and most predictors are actually consequences of disease presence. In the diagnostic model for pneumoconiosis, being a “current smoker” was selected as an important independent predictor with an odds ratio (OR) of 2.4. This OR means that, compared with non-smokers, current smokers have 2.4 times higher odds of having a positive chest x ray result; it implies nothing about causality between smoking and pneumoconiosis.3
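As a numerical aside, an OR multiplies the odds, so the corresponding ratio of probabilities depends on the baseline probability. The following minimal Python sketch illustrates this for the OR of 2.4 quoted above. The baseline probabilities for non-smokers and the function name are hypothetical and are not taken from the study.

```python
def probability_from_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Probability in the comparison group, given the reference-group
    probability and the odds ratio between the two groups."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must lie strictly between 0 and 1")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    comparison_odds = baseline_odds * odds_ratio
    return comparison_odds / (1.0 + comparison_odds)


OR_CURRENT_SMOKER = 2.4  # odds ratio quoted in the text

# Hypothetical probabilities of a positive chest x ray in non-smokers.
for p_non_smoker in (0.01, 0.05, 0.20):
    p_smoker = probability_from_odds_ratio(p_non_smoker, OR_CURRENT_SMOKER)
    print(
        f"non-smoker {p_non_smoker:.2f} -> current smoker {p_smoker:.3f} "
        f"(probability ratio {p_smoker / p_non_smoker:.2f})"
    )
```

When the outcome is rare, the probability ratio stays close to the OR; as the baseline probability rises, the probability ratio falls below the OR.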

In practice, diagnosis is rarely established by a single test result, and many test results generate more or less identical diagnostic information. Therefore, multivariable regression analysis is used to evaluate the independent diagnostic or prognostic value of each test.10 Candidate predictors with a univariable p value <0.5 are included in the multivariable regression analysis. The final multivariable model can usually be generated with a stepwise selection method using p<0.157 for inclusion. To adjust for over-optimism, bootstrap procedures are performed and shrinkage factors are calculated.10
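Two numerical ingredients of this paragraph can be sketched in Python. The first part shows that 0.157 is the p value of a chi-square statistic of 2 with one degree of freedom, the value often associated with selection by Akaike's Information Criterion for a single predictor. The second part applies a uniform shrinkage factor to regression coefficients. Apart from ln(2.4) for current smoking, the coefficients, the intercept, the predictor names and the shrinkage factor of 0.9 are hypothetical. In practice the shrinkage factor would be estimated by bootstrapping, and the intercept would normally be re-estimated after shrinkage, which this sketch does not do.

```python
import math


def chi_square_1df_p_value(statistic: float) -> float:
    """Upper-tail p value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def shrink_coefficients(coefficients: dict, shrinkage_factor: float) -> dict:
    """Multiply every regression coefficient by a uniform shrinkage factor."""
    return {name: beta * shrinkage_factor for name, beta in coefficients.items()}


def predicted_probability(intercept: float, coefficients: dict, predictors: dict) -> float:
    """Logistic model: probability from the intercept plus the weighted predictors."""
    linear_predictor = intercept + sum(
        beta * predictors[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# A chi-square statistic of 2 (1 df) corresponds to p of about 0.157.
print(f"p value for chi-square = 2, 1 df: {chi_square_1df_p_value(2.0):.3f}")

# Hypothetical model; only the OR of 2.4 for current smoking comes from the text.
intercept = -3.0
coefficients = {"current_smoker": math.log(2.4), "low_fev1": 1.2}
shrunken = shrink_coefficients(coefficients, shrinkage_factor=0.9)

worker = {"current_smoker": 1, "low_fev1": 1}  # hypothetical worker
print(f"original model: {predicted_probability(intercept, coefficients, worker):.3f}")
print(f"shrunken model: {predicted_probability(intercept, shrunken, worker):.3f}")
```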

Issues in model application

In general, prediction models perform less well in new populations than in the population from which they were derived. Therefore, external validation of the model in a new, but related, population is necessary to ensure its generalisability. A model is also more generalisable when applied in the same domain.11 For example, the Dutch diagnostic model for sensitisation to laboratory animal allergens was externally validated and updated to make it generalisable to a Canadian animal health apprentice setting.12 Once we have a valid model, we can transform it into an easy-to-use score chart or nomogram to facilitate its use in practice.
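One common way to turn regression coefficients into a score chart is to divide each coefficient by the smallest one and round to whole points. The Python sketch below illustrates that idea only; it is not necessarily the procedure used for the models cited here. Apart from ln(2.4) for current smoking, the predictor names and coefficients are hypothetical.

```python
import math


def to_score_chart(coefficients: dict) -> dict:
    """Convert regression coefficients to integer points by dividing each
    by the smallest non-zero absolute coefficient and rounding."""
    reference = min(abs(beta) for beta in coefficients.values() if beta != 0)
    return {name: round(beta / reference) for name, beta in coefficients.items()}


def total_score(chart: dict, predictors: dict) -> int:
    """Sum the points of the predictors that are present (coded 1) in a worker."""
    return sum(points * predictors.get(name, 0) for name, points in chart.items())


# Hypothetical coefficients; only the OR of 2.4 for current smoking is from the text.
coefficients = {
    "current_smoker": math.log(2.4),
    "older_than_40": 0.5,
    "low_fev1": 1.2,
}

chart = to_score_chart(coefficients)
print(chart)

worker = {"current_smoker": 1, "older_than_40": 0, "low_fev1": 1}  # hypothetical
print("total score:", total_score(chart, worker))
```

Rounding loses some precision, so the performance of the simplified chart would need to be checked against the original model.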

The next important issue is to determine probability thresholds for risk stratification. Overall, a higher threshold leads to a smaller high-risk group (lower referral number) and higher specificity (lower false positive rate), but at the cost of lower sensitivity (higher false negative rate). Therefore, the choice of threshold must be a balance between the proportion of missed cases and a reduction in unnecessary diagnostic tests. It will also depend on the severity of disease, disease prognosis when missed, or, alternatively, improved prognosis when detected.
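The trade-off can be made concrete with a small Python sketch that classifies workers as high risk at two different thresholds. The predicted probabilities, outcomes and thresholds below are invented for illustration and do not come from any of the cited studies.

```python
def classify_at_threshold(probabilities, outcomes, threshold):
    """Count referrals and compute sensitivity and specificity when workers
    with a predicted probability at or above the threshold are referred."""
    tp = fp = tn = fn = 0
    for probability, has_disease in zip(probabilities, outcomes):
        referred = probability >= threshold
        if referred and has_disease:
            tp += 1
        elif referred and not has_disease:
            fp += 1
        elif not referred and has_disease:
            fn += 1
        else:
            tn += 1
    return {
        "referred": tp + fp,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical predicted probabilities and true disease status (1 = diseased).
probabilities = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.45, 0.60, 0.80]
outcomes = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

low_threshold = classify_at_threshold(probabilities, outcomes, threshold=0.10)
high_threshold = classify_at_threshold(probabilities, outcomes, threshold=0.40)

for label, result in (("0.10", low_threshold), ("0.40", high_threshold)):
    print(
        f"threshold {label}: referred {result['referred']}, "
        f"sensitivity {result['sensitivity']:.2f}, "
        f"specificity {result['specificity']:.2f}"
    )
```

In this invented data set, raising the threshold from 0.10 to 0.40 reduces the number of referrals and raises specificity, but two of the four cases are missed.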

Ethical and legal aspects should also be carefully considered when applying a prediction model in occupational practice. Generally, the application of prediction models should be limited to health surveillance programmes and to occupational physicians screening individuals for disease and subclinical illness for early intervention.

The future of prediction modelling in occupational health

The application of prediction models in occupational health care is a novel approach. Detection of disease and subclinical illness may decrease the burden of occupational disease. Nevertheless, investigators should carefully design the model and clearly state the context where and how it can be used. There is now an opportunity to develop prediction models for diverse occupational diseases, especially given the existing large epidemiological studies. Future research is also needed to more explicitly evaluate the cost-effectiveness of these models in occupational health practice.



  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.
