The use of Benford's law for evaluation of quality of occupational hygiene data

Ann Occup Hyg. 2013 Apr;57(3):296-304. doi: 10.1093/annhyg/mes067. Epub 2012 Sep 20.

Abstract

Benford's law is the counterintuitive empirical observation that the digits 1-9 are not equally likely to appear as the initial digit in numbers resulting from the same phenomenon. Manipulated, unrelated, or fabricated numbers usually do not follow Benford's law, and the law has therefore been used to investigate fraudulent data in, for example, accounting, and to identify errors in data sets arising from, for example, data transfer. We describe the use of Benford's law to screen occupational hygiene measurement data sets, using exposure data from the European rubber manufacturing industry as an illustration. Four data sets held in the European Union ExAsRub database were compared with the expected first-digit (1BL) and second-digit (2BL) Benford distributions: two rubber process dust measurement data sets originally collected by the UK Health and Safety Executive (HSE) and the British Rubber Manufacturers' Association (BRMA), and one pre-treatment and one post-treatment N-nitrosamines data set collated in the German MEGA database. The data sets collated by the UK HSE and by industry (BRMA) showed only small deviations from the expected 1BL and 2BL distributions, while larger deviations were observed for the MEGA data. To a large extent the latter could be attributed to imputation, and replacement by a constant, of N-nitrosamine measurements below the limit of detection, but further evaluation of these data to determine why other deviations from the expected 1BL and 2BL distributions exist may be beneficial. Benford's law is a straightforward and easy-to-implement analytical tool for evaluating the quality of occupational hygiene data sets. It can be used to detect potential problems in large data sets that may be caused by malicious a priori or a posteriori manipulation of the data and by issues such as the treatment of observations below the limit of detection, rounding, and data transfer.
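The screening idea described in the abstract can be illustrated with a minimal Python sketch. It computes the expected first-digit and second-digit Benford distributions from the standard formulas, P(d) = log10(1 + 1/d) for the first digit and the sum over leading digits k = 1..9 of log10(1 + 1/(10k + d)) for the second digit, and compares them with observed digit counts. The function names are hypothetical, and the Pearson chi-square statistic is an assumption chosen for illustration; the abstract does not say which goodness-of-fit measure the authors used.

```python
import math
from collections import Counter


def benford_first_digit():
    """Expected probability of each first significant digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_second_digit():
    """Expected probability of each second significant digit d = 0..9,
    obtained by summing over all possible first digits k = 1..9."""
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


def significant_digits(x):
    """Significant digits of a nonzero number as a string of 11 characters.
    Scientific notation strips leading zeros, so 0.0456 starts with '456'."""
    mantissa = f"{abs(x):.10e}".split("e")[0]
    return mantissa.replace(".", "")


def digit_counts(values, position):
    """Count the digit at a given significant position (0 = first, 1 = second).
    Zeros and non-finite values carry no leading digit and are skipped."""
    usable = (v for v in values if v != 0 and math.isfinite(v))
    return Counter(int(significant_digits(v)[position]) for v in usable)


def chi_square(observed, expected):
    """Pearson chi-square statistic of observed counts against expected
    probabilities (an illustrative choice, not the paper's stated method)."""
    n = sum(observed.values())
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p) for d, p in expected.items()
    )
```

As a usage example, `chi_square(digit_counts(measurements, 0), benford_first_digit())` gives a first-digit statistic that can be compared with a chi-square critical value on 8 degrees of freedom (9 for the second digit). A data set in which many values below the limit of detection were replaced by a single constant would pile up counts on that constant's digits and inflate the statistic, which is the kind of deviation the abstract attributes to the MEGA data.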

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Factual / standards*
  • European Union
  • Humans
  • Industry
  • Occupational Health / statistics & numerical data*
  • Quality Control
  • Research Design / standards*
  • Rubber
  • Scientific Misconduct
  • Statistics as Topic / standards

Substances

  • Rubber