1154 Big data and occupational health surveillance: use of French medico-administrative databases for hypothesis generation regarding occupational risks in agriculture
C Maugard (1,2), D Bosson-Rieutort (1), O François (2), V Bonneterre (1,3)

  1. EPSP Team, TIMC-IMAG Laboratory, Grenoble-Alpes University, Grenoble, France
  2. BCM Team, TIMC-IMAG Laboratory, Grenoble-Alpes University, Grenoble, France
  3. Occupational Medicine and Health Dpt., CHU Grenoble-Alpes, Grenoble, France


Introduction Surveillance of diseases and associated exposures is a major issue in occupational health, especially for identifying new work-related diseases. In addition to classical epidemiology (hypothesis-driven studies), complementary methods relying on data mining of health insurance data must be developed for the early detection of work-related diseases without a prior hypothesis.

Methods Data from the insurance fund of French agricultural workers (‘Mutualité Sociale Agricole’, MSA), which covers about 3 million individuals, were considered. The study population included all affiliates, whether self-employed or employees, over the 2006–2015 period. MSA holds medico-administrative databases, which include information on occupational activities as well as long-term diseases identified with ICD-10 codes. Following authorisation by MSA and by the French National Commission on Informatics and Liberty, these databases were cross-linked for the first time. After preliminary data processing, generalised linear models and latent factor models were applied to detect statistically over-represented associations between occupational activity and long-term disease. Results were represented as p-value plots in order to highlight the key statistical signals.
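As an illustration only, the idea of screening every occupational activity against every long-term disease can be sketched as follows. This is not the authors' pipeline: it replaces the generalised linear models and latent factor models with a much simpler one-sided hypergeometric test per pair, and the activity names, disease label, counts and function names are all hypothetical toy values, not MSA data.

```python
from math import comb, log10


def hypergeom_upper_tail(k, n_activity, n_disease, n_total):
    """P(X >= k) when n_activity people are drawn from n_total,
    of whom n_disease have the disease (one-sided over-representation test)."""
    upper = min(n_activity, n_disease)
    favourable = sum(
        comb(n_disease, i) * comb(n_total - n_disease, n_activity - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_activity)


def screen(records):
    """Test every (activity, disease) pair and return results sorted by p-value.

    records: list of (activity, disease_or_None) tuples, one per individual.
    """
    n_total = len(records)
    activities = sorted({a for a, _ in records})
    diseases = sorted({d for _, d in records if d is not None})
    results = []
    for a in activities:
        n_a = sum(1 for act, _ in records if act == a)
        for d in diseases:
            n_d = sum(1 for _, dis in records if dis == d)
            k = sum(1 for act, dis in records if act == a and dis == d)
            p = hypergeom_upper_tail(k, n_a, n_d, n_total)
            results.append((a, d, k, p))
    return sorted(results, key=lambda r: r[3])


# Hypothetical toy population: 60 individuals, 3 activities, 1 disease group.
records = (
    [("cereal_farming", "respiratory")] * 3
    + [("cereal_farming", None)] * 17
    + [("viticulture", "respiratory")] * 9
    + [("viticulture", None)] * 11
    + [("livestock", "respiratory")] * 2
    + [("livestock", None)] * 18
)

results = screen(records)
for activity, disease, k, p in results:
    # -log10(p) is the quantity usually drawn on a p-value plot.
    print(f"{activity:15s} {disease:12s} cases={k:2d} -log10(p)={-log10(p):.2f}")
```

Pairs with the largest -log10(p) would stand out on a p-value plot. A real screen over many pairs would also need adjustment for confounders such as age and sex, and for multiple testing, both of which this sketch omits.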

Results The population covered by this study comprised more than 2 million individuals (n=2,250,177), with a majority of men (64%) and an average age of 46 years. Within this population, 245,748 individuals were reported as having long-term diseases. Key statistical signals are presented for several disease groups (respiratory, neoplasms, neurodegenerative, etc.).

Discussion The approach presented has the following advantages:

  • systematic evaluation of all disease–occupational activity associations,
  • high statistical power, and
  • no data acquisition cost.

The main drawback is the lack of direct information on exposure. For this reason, further work is under way to estimate exposure to pesticides retrospectively, relying on other databases. This data mining approach will later be enriched by identifying diseases from the medical expense databases (including information on medications, laboratory tests, etc.).

  • data mining
  • long-term diseases
  • agricultural workers
