Abstract
Objectives Retrospective assessment of environmental pesticide exposure is challenging. Exposure measurements or information on crop-specific pesticide use are often lacking historically. We applied expert assessment to reconstruct historical pesticide use patterns in the Netherlands, and evaluated reliability and accuracy of this procedure.
Methods For six main crops in the Netherlands, two experts per crop individually rated the probability (percentage of farmers applying) and frequency of use of authorised active ingredients between 1961 and 2005 per 5-year period. Inter-rater agreement was investigated by the percentage overall agreement and weighted Cohen's κ's (κw). Experts’ ratings were compared with self-reported pesticide use from recent farmer surveys to determine accuracy of the ratings.
Results Inter-rater agreement on the probability of use varied between crops (κw 0.25 to 0.69), as well as agreement on the frequency of use (κw 0.32 to 0.64). Inter-rater agreement was marginally higher for herbicides and fungicides than insecticides. Comparisons with survey data indicated fair to good accuracy of the experts’ ratings for the probability (κw 0.48 to 0.65) and frequency of use (κw 0.38 to 0.68). For all crops except fruit, the specificity of the experts’ ratings was higher than the sensitivity.
Conclusions Overall inter-rater agreement between experts was fair to good and experts’ ratings were reasonably accurate. Results of this study indicate that expert assessment can be used to derive information on historical pesticide use, which is essential for epidemiological studies evaluating the effect of (past) environmental exposure to pesticides on health.
What this paper adds
- Retrospective assessment of pesticide exposure is a major challenge for epidemiological studies evaluating the effect of (environmental) pesticide exposure on health.
- It is often unclear which authorised pesticides have actually been applied by farmers in the past.
- Expert assessment was used to evaluate the probability and frequency of use of individual pesticides on six main crops in the Netherlands between 1961 and 2005.
- Reliability between experts and accuracy of experts’ ratings compared with external data were overall fair to good.
- Expert assessment appears to be a valuable tool to reconstruct complex crop-specific pesticide use patterns back in time for use in epidemiological investigations.
Introduction
Exposure to pesticides has been associated with a wide range of health effects, including adverse reproductive outcomes, cancers and neurodegenerative diseases such as Parkinson's disease.1–3 Retrospective assessment of exposure to pesticides is a considerable challenge, but necessary to study health outcomes that involve past exposures and long latency periods, or that are dependent on the timing of exposure. Accurate assessment of historical exposure to pesticides is critical for the validity and power of such studies, as misclassification can lead to bias and attenuated risk estimates.4–6
Although occupational pesticide exposure levels are likely to be higher than exposure from environmental sources, the number of people exposed will be relatively small in the general population (eg, farmers and commercial applicators). In contrast, environmental pesticide exposure levels are likely to be low, but the number of people exposed will be much larger, including a potentially more sensitive part of the population (eg, children and the elderly). Studies addressing the association between environmental pesticide exposure and health have traditionally focused on the distance between residences and agricultural land, as a proxy for environmental pesticide exposure.7,8 More recent investigations have used geographic information systems (GIS) to assess environmental exposure to pesticides, combining spatial and temporal data on the location of residences and crops, with information on crop-specific pesticide use.9,10 Often only a limited time frame is covered in these studies due to limitations in the collected residential histories, land-use data or pesticide-use data available. While spatially resolved data on agricultural land-use may be present back in time, records on crop-specific pesticide use will only be available in some countries and predominantly for recent years. The variety of active ingredients used as pesticides in agriculture, their numerous trade names and changes in application practices over time are complicating factors. In the Netherlands the number of active ingredients and pesticide products peaked around 1990 with roughly 1800 pesticide products being commercially available.11 As records on crop-specific pesticide use are lacking in the Netherlands before the mid-1990s, other sources need to be used to reconstruct this information for use in epidemiological investigations.
Expert assessment has been widely applied to assess historical occupational exposures12 and several studies have used an expert-based approach to address occupational exposure to pesticides in agricultural and forestry workers.13–16 However, to the best of our knowledge, no study to date has used expert ratings as an integral part of the retrospective assessment of environmental exposure to pesticides.
We used expert assessment to reconstruct historical pesticide use for six main crops in the Netherlands to facilitate retrospective assessment of environmental exposure to pesticides based on residential histories from a multicenter hospital-based case–control study on Parkinson's disease.17 To evaluate the reliability and accuracy of this procedure, we assessed inter-rater agreement between experts and agreement between the experts’ ratings and external data sets, that is, self-reported pesticide use by farmers from national surveys.
Materials and methods
Expert selection
Former agricultural extension workers and crop protection researchers affiliated with the main Dutch agricultural organisations (governmental and private) and crop-cooperatives were approached to participate in this study if they started their work as crop protection specialists around 1970. Owing to the small pool of eligible (retired) experts per crop, it was decided to select experts by peer nomination. An initial set of candidates was identified through contact with the organisations and cooperatives. Subsequently, each expert was asked to identify other experts who met the defined criteria. Two experts were interviewed for each crop.
Data collection
The experts participated in an interview and completed a crop-specific questionnaire on the use of active ingredients between 1961 and 2005. These questionnaires were based on available historical agricultural pesticide guidebooks issued annually by the Ministry of Agriculture, which contained crop-specific information on authorised pesticide active ingredients and commercially available products. Each questionnaire contained checklists of active ingredients that were authorised to be used on the crop, providing information on the time period each active ingredient was authorised as well as several related trade names to enhance recall. The experts rated the probability of use, by estimating the percentage of farmers who would annually apply the active ingredient, for each 5-year time period between 1961 and 2005 if the active ingredient was authorised during this period. Furthermore, experts rated the average annual frequency of application per active ingredient. This latter rating was not time-specific. Ratings were performed independently by each rater; there was no contact or discussion between the experts before or during the rating process. To determine the accuracy of the experts’ ratings we compared these with external data sets, that is, recent national survey data on self-reported pesticide use among farmers, obtained from Statistics Netherlands (CBS).18 We used the available survey data sets from 1995, 1998, 2000 and 2004, which provided information on the self-reported use of active ingredients per crop on a national level, based on approximately 3000 respondents.
Crops
We selected six main crops in the Netherlands: beets (predominantly sugar beets), cereals, maize, potatoes, fruit (predominantly apple and pear) and tulip bulbs. These crops were selected because of their total share in Dutch agriculture (beets, cereals, maize and potatoes), or relatively high pesticide use (fruit and tulip bulbs). In addition, agricultural land-use maps of the Netherlands, which were available for multiple points in time, contained information on the location where these crops were cultivated. The initial aim was to obtain expert ratings of pesticide use on flower bulbs in general, but during the interviews it became apparent that flower bulbs were too complex a culture, with many subspecies (eg, tulips, lilies, narcissus and hyacinths) with distinct pesticide use. As the experts were unable to capture this diversity in one estimate, the questionnaire was repeated and restricted to tulip bulbs, which is the largest bulb species cultured in the Netherlands. Greenhouses were not included, as available spatial data on agricultural land use in the Netherlands did not include information on the type of greenhouse (eg, vegetables or ornamental plant cultivation) or individual crops grown in the greenhouse, limiting our ability to use this information in subsequent work. Also, previous studies have shown the complexity of pesticide use in Dutch greenhouses,19 and we anticipated that experts would not be able to derive overall ratings for the use of active ingredients in greenhouses in general.
Statistical analysis
Statistical analyses were performed using SAS V.9.2 Software. The probability of use was categorised into no or limited use (≤10% of farmers), potential use (>10–50% of farmers) and probable use (>50% of farmers), with assigned values of 0, 1 and 2, respectively. The exception was tulip bulbs, for which the experts rated the probability of use directly in the categories as defined above. Frequency of application was also categorised into three categories: not used (0), used once or twice per year (1) and used more than twice per year (2). Inter-rater agreement on the categorical metrics was evaluated by the percentage of overall agreement and weighted Cohen's κ scores (κw). Overall agreement was calculated by dividing the sum of the frequencies of the main diagonal of the contingency table by the sample size. The κw gives the proportion of agreement that cannot be expected by chance alone.20 Linear weights were applied to take into account the extent of disagreement.21 The strength of the agreement (κw) was interpreted using the following cut-off points: <0.4 poor, 0.4–0.75 fair to good and >0.75 excellent agreement.22 All analyses were restricted to authorised combinations of active ingredients and time periods. Stratified analyses were performed to assess whether agreement between experts changed over time and differed between functional groups of pesticides (insecticides, herbicides and fungicides). Accuracy of the experts’ responses was assessed by comparing the experts’ ratings (average and individual ratings) with data on self-reported pesticide use from recent farmer surveys. Per crop the average rating was calculated from the continuous ratings, and subsequently categorised following the defined cut-off values. Tulip bulbs were the exception as the probability ratings were already in categories. Here the midpoint of each category was used to calculate the average rating, which was then again categorised.
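The agreement measures described above can be illustrated with a minimal Python sketch. The study itself used SAS, so this is not the authors' code; the function names and the example ratings are hypothetical and serve only to show the three-category cut-offs, the overall agreement and a linearly weighted κ.

```python
from typing import List, Sequence


def categorise(pct: float) -> int:
    """Map a percentage of farmers to the three categories used in the text."""
    if pct <= 10:
        return 0  # no or limited use
    if pct <= 50:
        return 1  # potential use
    return 2      # probable use


def overall_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of ratings on the main diagonal of the contingency table."""
    if len(a) != len(b):
        raise ValueError("both raters must rate the same items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a: Sequence[int], b: Sequence[int], k: int = 3) -> float:
    """Cohen's kappa with linear agreement weights w_ij = 1 - |i - j| / (k - 1)."""
    if len(a) != len(b):
        raise ValueError("both raters must rate the same items")
    n = len(a)
    # Contingency table expressed as proportions of all rated items.
    table: List[List[float]] = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        table[x][y] += 1.0 / n
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    weight = [[1.0 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    observed = sum(weight[i][j] * table[i][j] for i in range(k) for j in range(k))
    expected = sum(weight[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if 1.0 - expected < 1e-12:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings (percentage of farmers applying), not study data.
expert_1 = [0, 5, 20, 60, 80, 40, 10, 90]
expert_2 = [0, 15, 30, 55, 40, 45, 5, 95]
cat_1 = [categorise(p) for p in expert_1]
cat_2 = [categorise(p) for p in expert_2]
print(overall_agreement(cat_1, cat_2), weighted_kappa(cat_1, cat_2))
```

With linear weights, a disagreement of one category counts as half agreement and a disagreement of two categories counts as none, which is how the extent of disagreement enters the statistic.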
Expert ratings of the probability of use of active ingredients during the periods 1990–1995, 1996–2000 and 2001–2005 were compared with self-reported use from the farmer surveys of 1995, 1998/2000 (average of the two surveys) and 2004, respectively. Only the 1995 survey contained information on the frequency of application (of the most important active ingredients per crop), and agreement with the experts’ frequency rating was calculated for the active ingredients included in this survey. All survey data were categorised using the same cut-points as used for the experts’ ratings. Sensitivity and specificity were calculated with the survey data as reference, in dichotomised variables (≤10% of farmers vs >10% of farmers applying an active ingredient). For the crops where continuous ratings were collected, we investigated the effect of using different cut-points for the categorisation of the ratings on the κw. In addition, we evaluated agreement using the intraclass correlation coefficient (ICC).
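The dichotomised comparison with the survey data can be sketched as follows. This is an illustrative Python example, not the authors' SAS code; the function name and the percentages are hypothetical, and only the 10% cut-off is taken from the text.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    expert_pct: Sequence[float],
    survey_pct: Sequence[float],
    cutoff: float = 10.0,
) -> Tuple[float, float]:
    """Sensitivity and specificity of expert ratings, with survey data as reference.

    An active ingredient counts as 'used' when more than `cutoff` percent
    of farmers apply it.
    """
    if len(expert_pct) != len(survey_pct):
        raise ValueError("expert and survey series must cover the same items")
    tp = fp = tn = fn = 0
    for expert, survey in zip(expert_pct, survey_pct):
        expert_used = expert > cutoff
        survey_used = survey > cutoff
        if survey_used and expert_used:
            tp += 1
        elif survey_used and not expert_used:
            fn += 1
        elif not survey_used and expert_used:
            fp += 1
        else:
            tn += 1
    # Return NaN when the reference contains no positives or no negatives.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical percentages of farmers applying six active ingredients.
expert = [5, 20, 60, 8, 30, 2]
survey = [12, 25, 70, 3, 5, 1]
sens, spec = sensitivity_specificity(expert, survey)
print(sens, spec)
```

Sensitivity here is the share of ingredients used according to the survey that the expert also rated as used; specificity is the share of ingredients not used according to the survey that the expert also rated as not used.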
Results
Nine experts participated in this study, of whom two participated for multiple crops. The experts’ experience in crop protection ranged from 36 to 47 years (see online supplementary table S1). The number of active ingredients investigated per crop varied between 47 for maize and 138 for fruit, and the total number of ratings (number of active ingredients multiplied by the relevant time periods) ranged from 228 (maize) to 855 (fruit). Of the total number of active ingredients assessed, experts denoted 27–57% as ever used (by >10% of farmers), of which 3–25% was likely ever used by the majority of farmers (table 1).
Inter-rater reliability
The overall percentage agreement and weighted Cohen's κ scores (κw) with corresponding 95% CI for inter-rater agreement on the probability of use are shown in table 2. Agreement between the experts varied between crops, but was overall fair to good (κw 0.45 to 0.69), except for cereals (κw 0.25) and tulip bulbs (κw 0.32). Restricting the analyses to the time period both experts were employed as crop protection specialists did not change the level of agreement (data not shown). Inter-rater agreement on the frequency of application was also fair to good for all crops (κw 0.57 to 0.64) except cereals (κw 0.32; table 2).
No trends in agreement between the experts were observed over time (data not shown), which is in line with the observation that restricting the analyses to experts’ overlapping years of employment did not change the level of agreement. Agreement between the experts on the probability of use tended to be slightly higher for herbicides and fungicides (table 3), although for potatoes and fruit the highest level of inter-rater agreement was found for insecticides. The overlap in ratings of insecticide and fungicide use on cereals was poor (κw 0.08 and κw 0.04 respectively). Despite the low κw, the percentage overall agreement is relatively high due to the high number of ratings in the no or limited use category.
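A simplified worked example may help to show why a high percentage agreement can coexist with a low κ. The numbers below are hypothetical and not taken from the study, and the example uses two categories and unweighted κ rather than the three-category κw of the analysis. Suppose two experts rate 100 combinations: both assign "no or limited use" to 90, both assign "use" to 2, and they disagree on the remaining 8 (4 in each direction), so that each expert rates 94 combinations as "no or limited use".

```latex
\begin{align*}
p_o &= \frac{90 + 2}{100} = 0.92 \\
p_e &= 0.94 \times 0.94 + 0.06 \times 0.06 = 0.8872 \\
\kappa &= \frac{p_o - p_e}{1 - p_e} = \frac{0.92 - 0.8872}{1 - 0.8872} \approx 0.29
\end{align*}
```

Because nearly all ratings fall in one category, the agreement expected by chance is already about 89%, leaving little room for the observed 92% to exceed it.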
Accuracy of expert ratings
Comparisons between the average experts’ ratings of the probability of use and self-reported pesticide use from the farmer surveys indicated fair to good agreement for all crops (κw 0.48 to 0.65; table 4). Information on the frequency of application was only available for the most important active ingredients per crop in one edition of the surveys (1995). Agreement (κw) between the average experts’ rating of application frequency and survey data ranged between 0.38 and 0.68 (table 4).
The level of agreement between the individual expert ratings and survey data differed between the experts, however (see online supplementary tables S2 and S3), and this difference was substantial for some of the crops, for example, cereals (expert 1 κw 0.24 vs expert 2 κw 0.50, probability of use). For three of the crops, the average expert ratings showed higher agreement with the survey data than either one of the individual ratings, indicating that experts disagreed with the survey data on different active ingredients. Sensitivity and specificity were calculated with the survey data as reference. Sensitivity was moderate (0.58–0.82) but specificity was high (0.85–0.95), with the exception of fruit where specificity was only moderate (0.64) and sensitivity high (0.92; see online supplementary table S4).
No significant correlation was found between years of relevant employment of the experts and the κw values of the individual experts’ ratings compared with the survey data. Overall, fungicides and herbicides were most heavily used on the crops of interest (highest probability and/or highest frequency of use) according to the experts (see online supplementary table S5). Using different cut-points for the categorisation of the experts’ ratings did not result in different κw and the ICC calculated from the continuous ratings corresponded well to the κw presented (data not shown).
Discussion
Accurate retrospective assessment of environmental pesticide exposure is challenging but critical for the validity and power of epidemiological studies evaluating possible associations between environmental pesticide exposure and health. In this study, we applied expert assessment to reconstruct historical crop-specific pesticide use for six main crops in the Netherlands. Inter-rater agreement varied between crops, but was overall fair to good. Inter-rater agreement tended to be marginally higher for herbicides and fungicides than for insecticides. Comparison with self-reported pesticide use data from recent farmer surveys indicated that experts’ ratings were reasonably accurate.
Expert assessment has some important benefits over self-reported exposure, which can be prone to differential recall and large uncertainties.4,5 In the case of environmental pesticide exposure, people will likely be unaware of the pesticides applied in their vicinity, which excludes the possibility of using self-reported information. Even self-reported proximity to crops, a proxy used for environmental pesticide exposure, has been shown to be subject to differential recall between cases and controls.8 Combining expert-derived historical pesticide use data with spatially resolved information on agricultural land use in GIS provides a powerful tool to retrospectively assess environmental exposure to pesticides with minimal recall bias when residential histories of the study population are available.9,23 Information on historical pesticide use was not available in the Netherlands and therefore these data had to be collected from experts in the field.
Our study focuses on individual active ingredients and takes into account their relative importance (probability and frequency of use) as well, which is important to evaluate potential health effects of specific pesticides and to address exposure-response relationships in subsequent epidemiological analyses. Our extensive questionnaires included all active ingredients authorised for use on the crops of interest in the Netherlands between 1961 and 2005, while other studies mainly focused on a limited number of pesticides, or broad categories (eg, pesticides and herbicides). Gathering information on such a wide range of active ingredients will provide the opportunity to study potential health effects of combined exposures as well.
We limited this study to six main crops (which cover approximately 80% of arable land in the Netherlands), and did not collect information on pesticide use in greenhouses and on ‘other open field crops’. Both these classifications comprise a wide range of crops cultivated, adding to the complexity of pesticide use, as shown for Dutch greenhouses.19 We anticipated that experts would not be able to derive overall estimates for these complex crop-groups.
We used weighted Cohen's κ scores to evaluate agreement between experts and between experts and external data, which are regarded as a chance-corrected metric of agreement.20 The use of κ scores has been criticised as they are considered to be difficult to interpret and cannot easily be compared between studies and groups, due to their high dependence on weighting scheme, sample size, number of rating categories and marginal distributions of the ratings.24–26 The number of ratings and their distribution differed for each of the crops and pesticide groups presented in this paper, which makes a direct comparison between these strata difficult. However, we think that the κw and overall percentage agreement do provide insight into the performance of the experts and the value of the data collected. The experts’ ratings were treated as being independent in the analyses. As active ingredients were used over multiple consecutive time periods and a wide range of (chemically) related substances have been used on the same crop, they are not fully independent. Experts will likely use their ratings of previous time periods or related active ingredients as reference for subsequent ratings. As the number of active-ingredient time period combinations was limited, we did not have enough power to apply more complex hierarchical modelling approaches to account for this correlation.
Further improvements in the expert assessment procedure are possible. No confidence score was gathered for each rating, as this was considered too time consuming for this questionnaire, which covered the use of up to 138 active ingredients over nine 5-year time periods. No prior training and calibration of the experts was performed. Training and calibrating of experts by providing them with current data on crop-specific pesticide use could potentially result in more accurate estimates.12,15,27 However, improved ratings after presenting calibration data are not a consistent finding28,29 and we decided to use the external information available to assess the accuracy of the experts’ responses instead of performing a calibration exercise.
The weighted Cohen's κ (κw) scores ranged between 0.25 and 0.69 for the probability of use, and between 0.32 and 0.64 for the frequency of application. This indicates that the experts in this study were able to reliably assign time-specific use of pesticides to crops. Lack of full agreement indicates that subjective recall and/or differences in prior knowledge influenced the experts’ ratings. We did observe variation in inter-rater reliability between crops. The observed κw were generally within the range found by other (occupational) studies using expert assessment,12 despite the large number of chemicals and time periods covered in this study. We did not find trends in agreement between the experts over time and restricting the analyses to the period both experts were employed as crop protection specialists did not improve the level of inter-rater agreement. The large variety of active ingredients authorised over time and changes in trade names tend to make recalling the use of specific pesticides over time difficult. The questionnaire provided experts with information on the time periods the active ingredients were authorised to be used, and examples of trade names used over time. Presenting such information in checklists and providing a timeline may enhance recall.12,30 Inter-rater agreement appeared to be slightly higher for herbicides and fungicides, although this is a difficult comparison to make as the κw are highly influenced by the number and distribution of ratings within each group. In general, herbicides and fungicides are more frequently and preventively applied than insecticides, where application depends primarily on the occurrence of pests and exceeding a certain threshold of pest abundance.
We found that mainly herbicides and fungicides have been widely applied to the different crops according to the experts, which is in line with available data on self-reported use of these pesticide groups by farmers.18 The higher inter-rater agreement observed for these groups corresponds with previous observations that experts find it easier to make estimates for commonly used agents and classes of chemicals.12
Comparison between the average expert ratings and self-reported pesticide use from recent farmer surveys indicated fair to good agreement for all crops, for the probability (κw 0.48 to 0.65) as well as frequency of use (κw 0.38 to 0.68). For cereals, fruit and tulip bulbs the averaged experts’ ratings showed higher agreement with the survey data than either one of the individual ratings. This indicates that the experts performed better for some active ingredients and poorer for others, possibly reflecting their individual experience with these specific compounds or regional differences in pesticide use. Increasing the number of raters could potentially improve the accuracy of the experts’ assessment.13,31 There were differences in the accuracy of the individual expert ratings. For example, for cereals the ratings of one expert showed substantially lower agreement with the survey data than those of the other expert. This highlights the need for using multiple raters to obtain more stable and accurate average estimates and to identify deviant raters and/or ratings if additional external sources of information are available. We will take such differences between experts into account in further applications of this expert-derived data. In general, specificity was higher than sensitivity, indicating that experts underestimated use of active ingredients, but did not produce many false-positive ratings. High specificity of an exposure assessment method is important for application in population-based epidemiological studies to minimise attenuation in exposure-response relationships.4,5 A consideration for future applications is the observation that specificity decreased slightly for three crops when expert ratings were averaged (maize, fruit and tulip bulbs). Sensitivity analyses using averaged and individual expert ratings might be needed to determine the effect in subsequent epidemiological analyses.
It should be noted that the survey data used as reference for this comparison were based on self-reported pesticide use by a large sample of Dutch farmers who participated in the surveys on a voluntary basis (approximately 3000 participants in total) and therefore might not reflect the entire population of farmers. The self-reported pesticide use was not validated by means of accounting records, but there are reports showing similar trends in these survey data and national sales figures.11 These data sets could be prone to some participation and reporting bias, although it is unclear to what extent and how this might affect accuracy of the experts’ ratings in our study. Although not a perfect gold standard, we believe that the comparison with these self-reported data does provide insight into the performance of our experts. We could only assess accuracy for a recent time period (1995–2004). As we did not observe trends in inter-rater agreement over time, we expect that the accuracy observed for this recent time window also applies to the preceding periods, and that these expert ratings of historical crop-specific pesticide use can be used in the assessment of lifetime environmental exposure to pesticides.
Although the obtained pesticide use ratings are likely driven by factors such as local farming practices and meteorology and thus represent country specific estimates, this approach using expert assessment could be applied also in other countries, provided that there are agricultural crop-protection specialists that advise farmers. We will combine this expert-derived information on historical pesticide use with spatially and temporally resolved data on agricultural land use and residential histories in GIS, to evaluate potential health risks from lifetime environmental exposure to pesticides in the Netherlands.
In conclusion, we observed that experts were capable of identifying pesticides used historically and of reliably assigning a probability and frequency of use to individual active ingredients. Comparison with external data also revealed fair to good agreement, indicating that the experts’ ratings were sufficiently accurate. Results of this study indicate that expert assessment is a valuable tool to reconstruct complex crop-specific pesticide use patterns back in time.
Acknowledgments
The authors would like to thank the participating experts for the time and effort invested and their valuable contribution to this study.
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Footnotes
- Contributors All authors have been involved in the work submitted, share responsibility for, and approved of the submission of the manuscript.
- Funding This study was partly supported by funding from the Stichting Internationaal Parkinson Fonds (The Netherlands), research grant 2007-18.
- Competing interests None.
- Provenance and peer review Not commissioned; externally peer reviewed.