
This article has a correction. Please see:


Clinical validation of methods of diagnosis of neuropathy in a field study of United Kingdom sheep dippers
  1. D Buchanan1,
  2. G A Jamal3,
  3. A Pilkington2,
  4. S Hansen4
  1. 1ISD, Scottish Executive, Edinburgh, UK
  2. 2Wellwork Ltd, Edinburgh, UK
  3. 3Department of Neurology, Charing Cross Hospital, London, UK
  4. 4Department of Neurology, INS, Southern General Hospital, Glasgow, UK
  1. Correspondence to:
 Dr S Hansen, Department of Clinical Physics, Southern General Hospital, 1345 Govan Road, Glasgow G51 4TF, UK;


Objectives: To investigate the reproducibility of measured indices of chronic peripheral neuropathy from a field study of sheep dippers when compared with similar measurements carried out in a clinical setting.

Methods: A stratified random sample of field study subjects was invited to attend a clinic. Neuropathy was measured both in the field and at the clinic with a modified version of a standard symptoms questionnaire and quantitative sensory thresholds for hot, cold, and vibration. These were combined into a classification of the likelihood of neuropathy with a neuropathy scoring system. Indicators of sensory abnormality were based on comparison of sensory thresholds to age dependent reference values derived from an external reference group.

Results: Only 51% of subjects were assigned similar classifications in the field and clinic based on the neuropathy scoring system. Of the component indices, grouped symptom scores, with 65% of subjects showing exact agreement, proved to be more reproducible than quantitative sensory test indicators. There were biases in the comparison of field and clinic measurements of hot and vibration sensations, but no evidence of greater variation between individual people in sensory thresholds in the field relative to at the clinic.

Conclusions: The neuropathy scoring system proved to be of limited reproducibility, due in a large part to the lack of reproducibility of the indicators of sensory test abnormality caused by inadequate temperature control. However, the symptoms score and measured sensory thresholds could be used separately as indices of neuropathy in exposure-response analyses.

  • chronic peripheral neuropathy
  • reproducibility
  • scoring system
  • OPs, organophosphates
  • Ip, index of nerve pathology
  • QST, quantitative sensory tests


An epidemiological study has recently been carried out into the relations between exposure to organophosphates and indices of chronic peripheral neuropathy in United Kingdom sheep farmers and dippers.1 This study comprised a field survey of neurological health effects and long term exposure to organophosphates (OPs) among over 600 working sheep farmers in two regions of the United Kingdom. Two smaller unexposed control groups, consisting of non-sheep farmers and workers from ceramics factories, were also included.

A clinical substudy was also carried out, in which a sample of sheep dippers were invited to attend a clinical examination in which the field measurements were repeated and additional tests carried out. The objective of the clinical substudy was twofold. Firstly, the more detailed clinical examination gave more precise information on the nature of any adverse neurological health effects indicated in the field study, together with an assessment of any additional neuropsychological abnormalities. The results of this component of the clinical study are described in detail elsewhere.2 Secondly, repeating the field survey procedures in the clinic gave the opportunity to investigate the reproducibility of the field survey results and hence the reliability of the field survey measurements as indicators of neuropathy, within the sample of sheep dippers. The present paper considers this reliability aspect of the clinical substudy.

The Mayo Clinic method has for many years provided a system for the diagnosis of polyneuropathy among hospital attendees. This method was derived from comparison of different clinical tests against a quantitative measure of abnormality in peripheral nerves, the index of nerve pathology (Ip).3 The index compares the density of myelinated fibres in nerve biopsies from healthy subjects with fibre density in nerve biopsies from diabetic patients, both with and without neuropathy. Abnormality within any two of the clinical tests occurring independently resulted in the same allocation of subjects to normal and abnormal categories as did the Ip.

The Mayo Clinic method for diagnosing neuropathy has only been validated in a clinical setting with professional staff. The current study allowed the opportunity to compare the method in the clinic and the field.


Subject selection

Selection of field study subjects for invitation to the clinic (at the Institute of Neurological Sciences, Glasgow) was by stratified random sample with the strata corresponding to predicted diagnosis of neuropathy based on field survey results as described later. Eighty subjects were invited from each of the none and possible neuropathy categories. Due to small numbers in the group, all 44 sheep farmers from the probable or definite group were invited. In the letter of invitation, subjects were given some indication of their results from the field study. Non-responders were followed up by phone about 2 weeks after the initial invitation. Only farmers or farm workers were invited to attend, although this included those who had never been exposed to sheep dips.

In the field study, subjects were assessed in farm premises, usually the living quarters.1 Surveys were carried out in the winter months to avoid distortion by acute effects resulting from recent dipping, which normally took place over the summer. Clinical assessment took place at a single clinic (the Institute of Neurological Sciences, Glasgow) between 7 and 18 months after the field survey, with a mean (SD) gap of 13.4 (1.9) months. As the field study had investigated indices of chronic peripheral neuropathy, it was considered unlikely that clinical subjects would show any major changes in their neurological deficit during the interval between the field and clinical studies.

Diagnosis of neuropathy

The method used to diagnose neuropathy in the field was a modified version of the Mayo Clinic method4 which was originally designed to detect the prevalence and severity of polyneuropathy among subgroups of the general population in a hospital setting. The full battery of clinical tests included in the Mayo Clinic method consisted of a neurological symptoms score, neurological disability score, quantitative sensory tests (QST), and nerve conduction studies. It was considered that only the neurological symptom score and the QST components, both after modification, could be used in the field study by non-clinical personnel.

Symptoms questionnaire

A modified version of the Mayo Clinic symptoms questionnaire was devised specifically to detect chronic neurological effects which may be associated with exposure to OP pesticides.1 Peripheral nerves were assessed for muscle weakness and sensory functions in the upper and lower limbs. Both negative and positive sensory symptoms were investigated. The autonomic nervous system questions considered symptoms occurring within specific organ groups—for example, bladder function or gastrointestinal symptoms. The modification excluded questions involving the cranial nerves and included others to ensure that symptoms were only scored if they occurred bilaterally. Only symptoms that had been present for at least 1 month within the previous year were recorded. Symptoms were summed to give a symptoms score, although autonomic symptoms were down weighted by a factor of 0.5 to increase specificity. The questionnaire was administered in the field by a trained technician and in the clinic by a neurologist.
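The scoring rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the argument names, and the assumption that each recorded symptom contributes one point before weighting are illustrative only.

```python
def symptoms_score(peripheral_symptoms, autonomic_symptoms, autonomic_weight=0.5):
    """Sum recorded symptoms, down weighting autonomic symptoms.

    Assumption (not stated in the text): every recorded symptom counts
    as one point before the autonomic weight is applied.
    """
    return len(peripheral_symptoms) + autonomic_weight * len(autonomic_symptoms)


# Hypothetical subject with two peripheral symptoms and one autonomic symptom.
example_score = symptoms_score(["numbness", "weakness"], ["bladder"])
```

Under these assumptions the hypothetical subject scores 2 + 0.5 × 1 = 2.5.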

Quantitative sensory tests

Two thermal sensory tests, for hot and cold sensation, and a vibration sensory test were used in both field and clinic.

The Somedic Vibrameter was used for measurement of vibration threshold.5 This equipment follows physiological principles by indicating the stimulus strength as the amplitude of vibration and not the voltage input to the vibration transducer.6 The vibration threshold, which tests the function of large peripheral nerve fibres, was measured over the right metatarsal bone in the foot.

The method for measuring thermal thresholds has been developed at the Institute of Neurological Sciences, Glasgow7 and the instruments used were manufactured by Medelec.8 This method assesses the function of the small peripheral nerve fibres: cold threshold for small diameter myelinated Aδ fibres, and hot threshold for small diameter unmyelinated (C) fibres. The thresholds were measured over the dorsum of the foot and calculated as the change from basic skin temperature (34°C) under the probe. The foot skin temperature was measured at the start of the assessment, and if lower than 31°C the foot was warmed using warm water.

The QSTs were administered in the field by a trained technician1 and in the clinic by the scientist who developed the thermal threshold technique. The same equipment models were used in both the field and clinical surveys. In the field study, where ambient temperatures were not easily controlled, it was often necessary to warm the foot before and during thermal sensory testing.

Each QST threshold was scored as normal or abnormal by comparison with clinical reference values. The reference values for the thermal sensory tests were calculated from a sample of 68 healthy male and female volunteers,8 aged between 16 and 76, to estimate the upper 95th percentiles adjusted for age. For the vibration threshold, age dependent reference values were obtained from the handbook for the Somedic Vibrameter based on a study of 100 normal subjects.
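The comparison with age dependent reference values can be sketched as below. This is an illustration only: the function names are hypothetical, and the linear age dependent limit is a placeholder that does not reproduce the published thermal or Vibrameter reference values.

```python
def is_abnormal(threshold, age, upper_limit_for_age):
    """Return True when a threshold exceeds the age-adjusted upper reference limit."""
    return threshold > upper_limit_for_age(age)


def placeholder_limit(age):
    """Hypothetical upper 95th percentile limit, rising linearly with age.

    Placeholder coefficients for illustration; NOT the clinical reference values.
    """
    return 2.0 + 0.05 * age


# A hypothetical threshold of 5.0 units in a 40 year old exceeds the placeholder limit of 4.0.
flag = is_abnormal(5.0, 40, placeholder_limit)
```

Passing the limit as a function keeps the scoring rule separate from whichever reference population is judged appropriate.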

Neuropathy scoring system

A neuropathy scoring system, based on the Mayo Clinic method, was used to classify subjects into categories of none, possible, probable, and definite neuropathy (table 1).

Table 1

Classification of likelihood of neuropathy based on neurological symptoms and quantitative sensory testing (QST)

It was originally intended that all three QST results would be used in the scoring system. However, after initial investigation of the field survey results it was found that there was an unexpectedly high number of apparently abnormal cold thresholds in comparison with clinical reference values.1 This resulted in many subjects falling into the possible neuropathy category, even among the non-exposed groups. This was thought to be due to the adverse effect of cold ambient temperatures on thermal thresholds in the field and therefore the inapplicability of hospital based clinical reference values. Therefore, to ensure a wide selection of true disease states among those attending the clinic, a modified scoring system was used for selection that excluded results from the cold QST.

Statistical methods

Agreement between field and clinical classifications of neuropathy was investigated, initially by cross tabulation. More formal comparisons were made using χ2 tests of association and κ statistics.9 The κ statistic is a unitless measure of the level of agreement between two categorical variables, where 0 represents agreement no better than chance and 1 perfect agreement. With ordered categories, a weighted κ statistic takes account of the level of disagreement in terms of the number of categories by which the two classifications differ, and weights accordingly. Altman10 presents guidelines for interpreting κ statistics which indicate that values less than 0.20 signify poor agreement. The level of agreement between continuous variables was assessed with scatter plots, linear correlation coefficients, and the method of paired differences,11 after log transformation of thresholds to approximate normality. Statistical analysis was carried out with Genstat for Windows version
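For readers unfamiliar with the κ statistic, the following Python sketch computes unweighted and weighted κ from a square cross tabulation. It is a generic illustration rather than the Genstat procedure used in the study; in particular, the linear disagreement weights are an assumption, as the weighting scheme is not stated, and the example table is invented.

```python
def cohen_kappa(table, weighted=False):
    """Cohen's kappa for a square table of counts (rows: field, columns: clinic).

    With weighted=True, linear disagreement weights |i - j| / (k - 1) are
    used (an assumed scheme); otherwise every disagreement has weight 1.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    # Observed and chance-expected weighted disagreement proportions.
    observed = sum(weight(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / n ** 2
    return 1.0 - observed / expected


# Invented 3 x 3 table (none, possible, probable or definite); not study data.
example_table = [[10, 2, 0], [3, 8, 1], [0, 2, 9]]
kappa = cohen_kappa(example_table)
kappa_weighted = cohen_kappa(example_table, weighted=True)
```

In the invented table every disagreement is by a single category, so the weighted κ exceeds the unweighted κ. When weighted and unweighted values are close, disagreements by two categories are relatively more common.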


Study group

Seventy nine subjects attended the clinic, most of whom were men (87%), and all of whom had been exposed to sheep dips. Ages ranged from 20 to 66 years with a mean (SD) of 45 (11.2) years. There was no evidence of a difference in recruitment rates from the two regions surveyed, with 65% of those attending being from Scottish farms. Subjects drank on average 9.5 units of alcohol a week (SD 8.6 units).

Neuropathy scoring system

The comparison of field and clinic classifications of neuropathy is shown in table 2. The probable and definite categories have been combined due to the small numbers in the definite category. Forty (51%) of the 79 subjects were classified in the same category in both the field and clinic. A χ2 test of association was significant (p=0.02). However, agreement based on the κ statistic (κ 0.26, SE 0.08) would only be described as fair based on the guidelines of Altman.10 The weighted κ statistic was very similar (κ=0.27), indicating that the level of disagreement was as likely to be by two categories as one. There was no evidence of bias in the direction of disagreement, 18 (46%) of the 39 off diagonal elements being in the direction of greater likelihood of neuropathy in the clinic compared with the field. However, 11% were classified at opposite ends of the diagnostic scale on the separate occasions.

Table 2

Comparison of field and clinic classification of neuropathy

Symptoms score

Table 3 shows the cross tabulation of symptoms score in the field and clinic. Symptoms scores have been grouped into three categories that correspond to how they are used within the overall neuropathy scoring system. Exact agreement of the grouped neuropathy symptom score was found for 51 (65%) of the 79 subjects. Overall agreement was significantly better than chance, based on the χ2 test of association (p<0.001). The unweighted κ statistic, κ 0.37 (SE 0.10), points to reasonable reproducibility of the questionnaire between field and clinic, whereas the higher value of the weighted κ of 0.46 suggested that disagreement was often only by one, rather than two, categories.

Table 3

Comparison of field and clinic symptoms scores

Agreement of QST outcomes

Among the three QST outcomes, the proportion of subjects for which field and clinical outcomes were concordant, representing exact agreement, ranged from 60% to 68% (table 4). This was significantly better than chance for the hot and vibration tests, but not the cold test (p=0.10). Relative to chance, overall agreement was higher for the vibration test (κ 0.30, SE) than for either the hot (κ 0.22) or cold tests (κ 0.18), but, in all three, was poorer than for the symptoms score.

Table 4

Reproducibility of outcomes between field and clinic, showing number of concordant and discordant pairs of measurements, together with χ2 test of association and κ statistic

Measured QST thresholds

A comparison of the actual measured sensory test thresholds is more informative about the magnitude of differences between field and clinic than a comparison of binary test outcomes. Figure 1 shows scatter plots of the three QST thresholds, on the log scale, as they were measured in both field and clinic. These show, for each test, a positive linear correlation between field and clinic. Linear correlation coefficients were 0.71, 0.44, and 0.66 for the log transformed hot, cold, and vibration thresholds respectively. There was evidence of bias in the hot and vibration, but not the cold, measurements (table 5). Hot thresholds tended to be lower by a factor of two, and vibration thresholds higher by a factor of two, in the field relative to the clinic. Adjusted for these biases, differences between field and clinic varied within subjects with a geometric standard deviation (GSD) of about 2.8 across all three sensory thresholds. Based on the ratio of GSDs, variation between subjects in the field was comparable with that in the clinic for cold and vibration thresholds (0.99 for cold, 0.96 for vibration) and was only slightly higher in the field for hot thresholds (1.25).
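The paired differences approach on the log scale can be sketched as follows. This is an illustrative calculation, not the analysis code used in the study: the function name and the example values are invented, and the sketch simply back-transforms the mean and SD of the log ratios into a geometric mean ratio (the multiplicative bias) and a GSD.

```python
import math
import statistics


def paired_log_comparison(field, clinic):
    """Summarise paired field/clinic thresholds on the log scale.

    Returns (geometric mean ratio, geometric SD of the ratios). A ratio
    of 2 means field thresholds are on average twice the clinic ones.
    """
    log_ratios = [math.log(f / c) for f, c in zip(field, clinic)]
    mean_log = statistics.mean(log_ratios)
    sd_log = statistics.stdev(log_ratios)
    return math.exp(mean_log), math.exp(sd_log)


# Invented thresholds for three subjects; not study data.
bias, gsd = paired_log_comparison([2.0, 4.0, 8.0], [1.0, 2.0, 4.0])
```

In this invented example every field value is exactly twice the clinic value, so the bias is a factor of two and the GSD is 1 (no scatter about the bias).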

Table 5

Comparison of field and clinic QST thresholds (log scale)

Figure 1

Comparison of field and clinic measurement of QST thresholds. The solid line represents the line of equality.

Other factors

In the main epidemiological phase of the study it was found that, for the same age, sex, and exposure, farmers in the English regions reported symptoms more often than farmers in Scotland, with an odds ratio (OR) of 2.0.1 Therefore, the reproducibility of symptoms (categorised as <1 and ≥1 due to generally low prevalence of any symptoms) in the clinic was analysed by region. Reproducibility among the 26 English farmers (κ 0.55; SE 0.16) was found to be better than that among the 53 Scottish farmers (κ 0.24; SE 0.14), principally because the Scottish farmers tended to underreport symptoms in the field relative to the clinic. The reason for this is unclear but may have reflected greater media exposure in England about the health effects of organophosphates around the time of the epidemiological field survey. Also in the main epidemiological phase, women of the same age and exposure were found to report symptoms more often than men (OR 2.9). However, there was no evidence of markedly different reproducibility of symptoms between men and women, although data were only available for 11 women who attended the clinic.

The biases noted for the hot and vibration thresholds in the field relative to the clinic were found to be consistent when subjects were categorised by region and sex.


The current study allowed the opportunity to compare a modified version of the Mayo Clinic method for diagnosis of neuropathy in both a field and clinic setting. The response rates, representativeness, and the selection of the subgroup of subjects for the clinical study from the epidemiological survey have been described elsewhere.1,2 The question of response rates and representativeness is less important here, as this paper only compares the findings from the field and clinic measurements in the same subjects.

The reproducibility of the overall neuropathy scoring system was only marginally better than chance, with a significant minority of subjects being classified at opposite ends of the scale on the two separate occasions. Therefore, to elucidate this finding, the reproducibility of symptoms score and QST score components of the classification system were assessed separately.

The symptom scores showed an agreement that was significantly better than chance and there was no evidence of bias between the scores obtained in the field and at the clinic. Furthermore, disagreements were often by one category rather than two.

The QST scores reflected an inconsistency between the QST field measurements, especially cold, and the hospital based clinical reference values from which abnormality was defined. There was an unexpectedly high number of abnormal cold thresholds in the field, even among the non-exposed control group of ceramics factory workers (48%), suggesting a methodological problem with this measurement in the field.

Clinical experience with QST measurements shows these methods to be very valuable in detecting early nerve damage because of their high sensitivity. For this reason, and because they can be applied by trained field officers, the measurement of QST thresholds was included in the epidemiological study. The repeatability of the thermal sensory tests in the clinic has been evaluated with a group of 320 normal people.7,13 The method is used in many centres and the high sensitivity of the technique is supported by a study of 143 patients with neuropathies of diverse aetiologies.14 A consensus report of the International Panel of the Peripheral Neuropathy Association has endorsed the principle of the technique.15 About 40 original publications by our group at the Institute of Neurological Sciences, Glasgow and others confirmed the reproducibility of the results and the usefulness of the technique.

Biases in the field QST thresholds were thought to be due to lack of control of the limb temperatures of farmers who were mostly surveyed during winter months, and often just after working out of doors. It was necessary to carry out this study during winter to have a time window in which dipping did not take place and when farmers are able to give up time to take part in a study.

The function of peripheral nerve is highly temperature dependent and the temperature dependency of QST thresholds is well known.7 In the clinic, room temperature is kept at a constant level, limb temperature measured often, and heating applied if necessary. Steps were taken to rectify this problem in the field study. Digital thermometers for measuring skin temperature were used by the field survey teams. Although heating was applied before the test, it was impossible to maintain the temperature of the foot at a reasonable level during the whole investigation. Skin temperatures measured in the field before sensory testing ranged from 20°C to 35°C, with a median of 31°C, significantly below the optimum 34°C normally used in the hospital setting. Body and core temperatures of limbs may also have been lower than normal.

Receptors in the skin for heat and cold sensation are different and have different temperature dependencies (Kenshalo 1970).16 In cold temperatures increased sensitivity to heat might reduce hot thresholds, thus biasing field measurements downwards compared with the clinic. Cold temperatures would similarly reduce sensitivity to cold and vibration sensations, the thresholds for both of which were higher on average in the field.

It therefore seems likely that the cold temperatures experienced during the field study are a major factor in the poorer agreement between field and hospital clinic measurements, although other causes cannot be excluded.

How can QST measurements be applied in future field studies? It would be difficult to establish a more appropriate reference population. Allowing room and skin temperatures to vary across a larger range would lead to higher variability among thresholds and thus decrease the sensitivity of the method. No data are available to allow correction of threshold values for different temperatures.

However, the deliberate exclusion from the clinic of subjects with an unreliably high cold threshold is likely to have reduced the observed bias of clinical measurements relative to those in the field, and this may explain why the observed bias for the cold threshold is lower in magnitude than for the vibration threshold. It is therefore likely that the reproducibility of the scoring system, across the study group as a whole, has been underestimated. The time lapse between field and clinic surveys was not thought to be a significant factor when looking at chronic health effects.

Based on these results, in the field study it was decided to use the four neuropathy indices as separate response variables in analyses of exposure-response relations, rather than the neuropathy scoring system based on the Mayo method. In a linear regression framework, a bias in the sensory thresholds that applied independently of exposure would not affect the detection of a statistical exposure-response gradient, whether one existed or not. Random measurement error is an unavoidable component of any system and, in this context, would apply in both clinic and field. As there was no evidence that the field measurements incorporated significant random error relative to the clinical ones, the random scatter within the plots in figure 1 is most likely to reflect inherent measurement error in sensory tests of this type. The effect of this random error is to weaken the power of the regression of sensory thresholds against exposure to detect an exposure-response gradient when one truly exists.
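The point about bias and exposure-response gradients can be demonstrated with a short sketch. The exposure values and log thresholds below are invented, and the least squares function is a generic textbook formula rather than the study's analysis; the sketch only shows that a multiplicative bias, which becomes a constant shift on the log scale, changes the regression intercept but not the slope.

```python
import math


def ols_slope(x, y):
    """Least squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Invented exposure index and log thresholds; not study data.
exposure = [0.0, 1.0, 2.0, 3.0, 4.0]
log_threshold_clinic = [0.10, 0.35, 0.40, 0.75, 0.90]

# A field bias of a factor of two adds log(2) to every log threshold.
log_threshold_field = [v + math.log(2) for v in log_threshold_clinic]

slope_clinic = ols_slope(exposure, log_threshold_clinic)
slope_field = ols_slope(exposure, log_threshold_field)
```

The two slopes agree to within floating point rounding, whereas added random error would leave the expected slope unchanged but widen its confidence interval.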


The overall neuropathy scoring system showed an agreement that, although limited, was better than chance with no evidence of a systematic bias between the scores obtained in the field and in the clinic. However, the neurology symptom questionnaire in isolation was more reproducible and this method could be used in the field by non-neurologists with basic training.

There was a significant positive linear correlation between the threshold measurement in the field and the clinic for each QST, indicating that if a threshold is high in the clinic, it is also likely to be high in the field. However, there were substantial biases among the field measurements of sensory tests relative to the clinical measurements, particularly for the hot and vibration sensation tests.

The QST measurements should still be considered in epidemiological field studies, but it is essential to enforce much tighter control of conditions, in particular avoidance of low temperatures. The tests should be carried out in a well heated room and the subject allowed to become acclimatised beforehand. Such rooms could be found nearer clusters of patients and the test applied by trained field officers. Tests would take longer than in the current study but the costs would still be considerably lower than referring subjects to a hospital clinic.

Main messages

  • The overall neuropathy scoring system showed an agreement that was better than chance with no evidence of a systematic bias between the scores obtained in field and in the clinic.

  • The neurology symptom questionnaire in isolation was more reproducible and this method could be used in the field by non-neurologists with basic training.

  • There was a significant positive linear correlation between the threshold measurement in the field and the clinic for each QST, but there were biases among the field measurements of sensory tests relative to the clinical measurements.

Policy implications

  • The QST measurements should still be considered in epidemiological field studies, but it is essential to enforce much tighter control of conditions, in particular avoidance of low temperatures.


We gratefully acknowledge the help of all farmers, their families, and farm workers who took part in this study. We also acknowledge the work and diligence of the nursing staff (Miss M MacKinnon and Miss M J Dolan) and clerical staff (Ms E Jackson) who assisted throughout the study at the Institute of Neurological Sciences, Glasgow. We also thank Ms Mags Parker, Mrs Marion Brebner, and Mrs Margaret Burnett of the Institute of Occupational Medicine for their administrative support during the project. Finally we acknowledge, with thanks, the funding from the Health and Safety Executive, the Department of Health, and the Ministry of Agriculture, Fisheries and Food. This work was supported by the Health and Safety Executive, the Ministry of Agriculture, Fisheries and Food, and the Department of Health, UK.

