
0431 How can we avoid re-identification risk in big-data analysis? Proposition of a new strategy of geographical subdivisions using GIS tools
  1. Charlotte Maugard (1,2)
  2. Christophe Cancé (3)
  3. Pauline Achard (1)
  4. Olivier François (2)
  5. Vincent Bonneterre (1,4)
  6. Delphine Bosson-Rieutort (1,2)

  1. Grenoble-Alpes University (UGA)/TIMC-IMAG Laboratory (UMR CNRS 5525)/EPSP Team (Environment and Health Prediction of Populations), Grenoble, France
  2. Grenoble-Alpes University (UGA)/TIMC-IMAG Laboratory (UMR CNRS 5525)/BCM Team (Computational and Mathematical Biology), Grenoble, France
  3. Grenoble-Alpes University (UGA)/UMS GRICAD, Grenoble, France
  4. Grenoble-Alpes Teaching Hospital (“CHU Grenoble-Alpes”)/Occupational Medicine and Health Department, Grenoble, France


To detect signals of emerging occupational diseases among agricultural workers, we developed a data-mining approach applied to health-insurance data (see the communication by C. Maugard). Applied to the databases of the French social security system dedicated to agricultural workers (MSA), this approach first aims to identify associations between chronic diseases and occupational activities (recorded as activity sector codes in the MSA contributors database).

To avoid re-identification, workers' locations were not provided, although location is recognised as closely related to cropping practices. It was therefore not possible to directly estimate individuals' involvement in specific crops (for instance through the existing parcel register and agricultural census) and, ultimately, to use crop × pesticide combinations to estimate pesticide exposures.

To deal with this issue, we used an innovative approach that divides the national territory into “meshes”, yielding a geographical variable precise enough to assess crop types while keeping enough agricultural workers in each mesh to avoid re-identification. The approach is an iterative process that divides each geographical unit into 4 parts while respecting a minimum threshold of workers in each mesh. The process continues until the meshes contain comparable numbers of individuals. Taking into account the prevalence of the chronic diseases of interest and the typology of crops, we set the minimum number of individuals per mesh at n=1500. This methodological development allows MSA to provide indirect location information at a level fine enough to identify crops (a proxy for pesticide use) while restricting the possibility of re-identifying individuals.
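
The iterative four-way subdivision described above can be sketched as a quadtree-style recursion. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the GIS tooling or say how sparsely populated quadrants are handled. The function name `subdivide`, the use of planar (x, y) coordinates, rectangular cells, and the rule that a split is accepted only when all four children reach the threshold are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # hypothetical planar worker location (x, y)
Cell = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def subdivide(points: List[Point], cell: Cell,
              min_count: int = 1500) -> List[Tuple[Cell, int]]:
    """Split `cell` into 4 quadrants for as long as every quadrant keeps
    at least `min_count` points; return the final meshes with their counts.

    Assumption: a split is rejected as soon as one quadrant would fall
    below the threshold, so no returned mesh is smaller than `min_count`
    (provided the starting cell itself meets the threshold).
    """
    if min_count < 1:
        raise ValueError("min_count must be at least 1")

    xmin, ymin, xmax, ymax = cell
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    children = [
        (xmin, ymin, xmid, ymid),  # south-west
        (xmid, ymin, xmax, ymid),  # south-east
        (xmin, ymid, xmid, ymax),  # north-west
        (xmid, ymid, xmax, ymax),  # north-east
    ]

    # Assign each point to a quadrant: bit 0 = east half, bit 1 = north half.
    buckets: List[List[Point]] = [[], [], [], []]
    for x, y in points:
        index = (1 if x >= xmid else 0) + (2 if y >= ymid else 0)
        buckets[index].append((x, y))

    # Stop here if any quadrant would hold too few workers.
    if any(len(bucket) < min_count for bucket in buckets):
        return [(cell, len(points))]

    meshes: List[Tuple[Cell, int]] = []
    for child, bucket in zip(children, buckets):
        meshes.extend(subdivide(bucket, child, min_count))
    return meshes
```

Under this stopping rule, densely populated areas end up with small meshes and sparse areas with large ones, which is one way to keep mesh populations above the threshold while refining the geography where the data allow it.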
