Abstract
Introduction Surveillance of diseases and associated exposures is a major issue in occupational health, especially for identifying new work-related diseases. In addition to classical epidemiology (hypothesis-driven studies), complementary methods relying on data mining of health insurance data must be developed for early detection of work-related diseases, without prior hypothesis.
Methods Data from the insurance fund of French agricultural workers (‘Mutualité Sociale Agricole’, MSA), which covers about 3 million individuals, were considered. The study population included all affiliates, whether self-employed or employees, over the 2006–2015 period. MSA holds medico-administrative databases, which include information on occupational activities as well as long-term diseases identified with ICD-10 codes. Following authorisation by MSA and by the French National Commission on Informatics and Liberty, these databases were cross-linked for the first time. After preliminary data processing, generalised linear models and latent factor models were applied to detect over-represented associations between occupational activity and long-term disease. Results were represented as p-value plots in order to highlight the key statistical signals.
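As an illustration of the screening idea only, and not the models actually fitted in this study, the sketch below tests one activity-disease pair at a time. It assumes the simplest possible generalised linear model: a logistic regression with a single binary covariate (activity yes/no), whose estimate reduces to the log odds ratio of a 2x2 table and can be tested with a one-sided Wald test. The function name, the activity and disease labels, and all counts are hypothetical; adjustment for age and sex, latent factors and multiple-testing correction are deliberately left out.

```python
import math


def overrepresentation_signal(cases_in, n_in, cases_out, n_out):
    """One-sided Wald test for over-representation of a disease in an activity.

    Equivalent to a logistic GLM with one binary covariate (hypothetical
    simplification): the coefficient is the log odds ratio of the 2x2 table.
    Returns (log odds ratio, one-sided p-value).
    """
    a, b = cases_in, n_in - cases_in      # in the activity: diseased / not
    c, d = cases_out, n_out - cases_out   # outside the activity: diseased / not
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells of the 2x2 table must be positive")
    log_or = math.log((a * d) / (b * c))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / std_err
    # Upper-tail probability of the standard normal: P(Z >= z)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return log_or, p_value


# Hypothetical counts: (cases in activity, workers in activity,
#                       cases elsewhere, workers elsewhere)
toy_counts = {
    ("activity_A", "disease_X"): (120, 10_000, 900, 100_000),
    ("activity_B", "disease_X"): (10, 100, 100, 1_000),
}

signals = []
for pair, counts in toy_counts.items():
    log_or, p_value = overrepresentation_signal(*counts)
    # -log10(p) is the quantity usually drawn on a p-value plot
    signals.append((pair, log_or, p_value, -math.log10(p_value)))

# Strongest signals (smallest p-values) first
signals.sort(key=lambda s: s[2])
for pair, log_or, p_value, score in signals:
    print(pair, f"log-OR={log_or:.3f}", f"p={p_value:.4g}", f"-log10(p)={score:.2f}")
```

Ranking pairs by -log10(p) mirrors the p-value plots mentioned above; in a real screening over all pairs, the p-values would need correction for multiple comparisons before any signal is interpreted.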
Results The study population comprised more than 2 million individuals (n=2,250,177), a majority of them men (64%), with an average age of 46 years. Within this population, 245,748 individuals were reported with long-term diseases. Key statistical signals are presented for several disease groups (respiratory diseases, neoplasms, neurodegenerative diseases, etc).
Discussion The approach presented has the following advantages:
systematic evaluation of all associations between diseases and occupational activities,
high statistical power, and
no data acquisition cost.
The main drawback is the lack of direct information on exposure. For this reason, further work is under way to estimate exposure to pesticides retrospectively, relying on other databases. This data mining approach will later be enriched by identifying diseases from the medical expense databases (which include information on medications, biological examinations, etc).