Neurobehavioural testing in workers occupationally exposed to lead: systematic review and meta-analysis of publications
M Goodman¹, N LaVerda¹, C Clarke², E D Foster¹, J Iannuzzi¹, J Mandel³

¹ Exponent Health Group, Alexandria, VA, USA
² Johns Hopkins University School of Public Health, Baltimore, MD, USA
³ Exponent Health Group, Menlo Park, CA, USA
Correspondence to: Dr M Goodman, Exponent Health Group, 1800 Diagonal Road, Suite 355, Alexandria, Virginia 22314, USA


Although the toxic effects of lead on the central nervous system have been well described, the blood concentration at which lead begins to exert adverse effects remains the focus of debate. A meta-analysis of occupational studies was conducted evaluating the association between neurobehavioural testing results and moderate blood lead concentrations.

  • lead
  • epidemiology
  • meta-analysis
  • neurobehavioural testing
  • PNS, peripheral nervous system
  • CNS, central nervous system
  • ATSDR, Agency for Toxic Substances and Disease Registry


The inclusion criteria for the meta-analysis were (a) central tendency for blood lead concentration less than 70 μg/dl; (b) numbers of exposed and unexposed workers were reported; and (c) test score arithmetic means and measures of dispersion were available for the exposed and the unexposed workers. The data were extracted only for those tests that were included in three or more studies. Analyses involved both the fixed and the random effects models, adjusted, whenever possible, for the test reliability. Publication bias was evaluated by calculating the fail safe N, defined as the number of studies with a non-significant result (p>0.05) that would bring a significant pooled analysis to non-significant levels. Additional analyses examined only those studies that implemented blinding procedures.

Twenty two studies provided information on 22 tests. Only two of the 22 tests showed an unequivocal significant difference between people with high blood lead concentrations and controls. The results were sensitive to stratification of studies, adjustment for reliability, and choice of statistical analysis, and were not consistent with the results of an earlier meta-analysis.

The data available to date are inconsistent and do not provide adequate information on the neurobehavioural effects of exposure to moderate blood concentrations of lead. Lack of true measures of premorbid state, observer bias, and publication bias affect the results.

The major targets of lead toxicity are the peripheral and central nervous systems (PNS and CNS).1 In the CNS, symptoms of lead poisoning include dullness, forgetfulness, irritability, poor attention span, headache, fatigue, impotence, dizziness, and depression.1 Lead encephalopathy, a progressive and potentially fatal degeneration of the brain, is the most severe neurological effect of lead poisoning.1

The concentration at which lead begins to exert adverse health effects is not known. Several studies have suggested the existence of subclinical abnormalities in the absence of overt signs and symptoms of clinical lead poisoning among people with moderately (≤70 μg/dl) increased blood lead concentrations.2–4 By contrast, Parkinson et al found few significant differences in neurobehavioural performance between workers exposed and not exposed to lead and concluded that concentrations of current exposure of lead have “no detectable impact on psychological functioning.”5

A review of the literature on the neurobehavioural effects of cumulative exposure to lead concluded that the current scientific evidence is flawed because of inadequate estimation of cumulative exposure to or absorption of lead and inadequate adjustment for age and intellectual ability before exposure.6 Another review by Ehle and McKee discussed the difficulties of applying psychological testing in field research and concluded that, for exposure to low to moderate concentrations (<70 μg/dl) of lead, the evidence of neurobehavioural effects is suggestive, but not conclusive.7 To quantitatively summarise the available evidence, we conducted a meta-analysis of occupational studies evaluating the association between results of neurobehavioural testing and moderate blood lead concentrations.


Literature review and study selection

Study selection involved Medline searches using the keywords: “lead AND neurotoxic AND occupational exposure”; “lead AND neurotoxic AND occupation”; “blood lead AND neurobehavioural effects”; “lead AND health effects AND occupation”. A research librarian conducted additional searches of other relevant databases, including those referencing ongoing research. These searches were supplemented by a review and retrieval of references from the Toxicological profile for lead published by the Agency for Toxic Substances and Disease Registry (ATSDR).1

As well as studies published in English, potentially relevant reports were translated from Hebrew, Chinese, Japanese, German, and Danish. In situations where information was missing, attempts were made to contact the author(s) to obtain the missing data. Descriptive information on each neurobehavioural test was reviewed from standard textbooks.8–10

The following inclusion criteria were used for the meta-analysis:

  • Central tendency for blood lead concentration was less than 70 μg/dl

  • Numbers of exposed and unexposed were reported

  • Test score arithmetic means and measures of dispersion were reported for both the exposed and the unexposed workers.

Data extraction

Pertinent data from the studies were entered into an electronic database, including central tendency measures for blood lead concentrations among the occupationally exposed and unexposed groups. The outcome information included mean test scores and their corresponding units and measures of dispersion for the exposed and unexposed (control) groups. Before data entry, blood lead concentrations given in μmol/l were converted to μg/dl and all measures of dispersion were converted to SDs. Some studies included test score information for subgroups of exposed workers with a mean blood lead concentration >70 μg/dl. For these studies, only information on those subgroups of exposed workers with an average blood lead concentration <70 μg/dl was extracted. Several studies provided test scores for certain strata within the exposed and unexposed groups. In these instances, information on each stratum was entered separately. Thus, some articles provided more than one study group.
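The two conversions mentioned in this step can be sketched in Python. This is a minimal illustration, not the authors' procedure: the paper does not say which measures of dispersion were reported, so the standard error of the mean is assumed here as one common case, and the function names are hypothetical.

```python
import math

# Molar mass of lead in g/mol. 1 umol/l of lead = 207.2 ug/l = 20.72 ug/dl.
LEAD_MOLAR_MASS_G_PER_MOL = 207.2


def umol_per_l_to_ug_per_dl(conc_umol_per_l):
    """Convert a blood lead concentration from umol/l to ug/dl."""
    # umol/l * g/mol gives ug/l; dividing by 10 gives ug/dl.
    return conc_umol_per_l * LEAD_MOLAR_MASS_G_PER_MOL / 10.0


def se_to_sd(standard_error, n):
    """Convert a standard error of the mean to an SD (assumed input type)."""
    return standard_error * math.sqrt(n)


# Example with made-up values: a group mean of 2.0 umol/l,
# and a test score SE of 1.5 from 36 workers.
print(umol_per_l_to_ug_per_dl(2.0))
print(se_to_sd(1.5, 36))
```

Other dispersion measures, such as confidence intervals or ranges, would need their own conversions and are not covered by this sketch.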

Data were extracted only for neurobehavioural tests included in three or more studies. We omitted tests involving self rating of affect such as profile of mood states and multiple adjective affect checklist. Only studies reporting non-overlapping data were included.

Data analysis

The analysis of the data included two steps. The first step involved a review of the studies to assess their quality using the following criteria:

  • Evaluation of the pre-exposure status

  • Adjustment for age

  • Adjustment for other occupational exposures

  • Adjustment for alcohol use

  • Adjustment for socioeconomic confounding factors such as income level, education, etc

  • Use of blinding procedures.

The second step included a quantitative meta-analysis of the pooled data. The statistical analysis protocol is illustrated in figure 1. There are two approaches for combining the data: a fixed effects model and a random effects model. The fixed effects method assumes no heterogeneity between studies and attributes all observed variation between results to sampling error alone.11 The random effects model assumes that the study specific effect sizes come from a random distribution of effect sizes with a certain mean and variance.

Figure 1

Plan of statistical analysis.

Although several tests of heterogeneity have been proposed, their interpretation is problematic because: (a) the power of the test may be insufficient to detect significant heterogeneity12; (b) the studies may be subject to similar design flaws leading to consistent bias and providing a false impression of homogeneity; and (c) the effect sizes may seem to be more consistent than they really are if studies with zero or negative effects are less likely to be published.13

Because of the wide variation in testing procedures and scoring practices among the studies, as well as variations in the selected study populations, assumptions of homogeneity, even in the presence of a non-significant test, were generally inappropriate. Therefore, we used both the fixed and the random effects models for each of our analyses. Where the variance between studies was negligible (high level of homogeneity), the random effects models reduced to fixed effects models. To correct for measurement error, whenever possible, all results were adjusted for test-retest reliability.8,14–17

Using the fixed effects assumption, the general formula for the weighted average effect size of k studies is

$$T_{\cdot} = \frac{\sum_{i=1}^{k} w_i T_i}{\sum_{i=1}^{k} w_i} \tag{1}$$

where $T_i$ is the effect size estimate of the ith study and $w_i$ is the weight associated with it. The weights that minimise the variance of $T_{\cdot}$ are given by

$$w_i = \frac{1}{v_i} \tag{2}$$

where $v_i$ is the conditional variance in each study. The average effect size $T_{\cdot}$ has a conditional variance $v_{\cdot}$ given by

$$v_{\cdot} = \frac{1}{\sum_{i=1}^{k} w_i} \tag{3}$$

The effect size of each study ($T_i$) was estimated from the standardised mean difference statistic with the small sample bias correction applied as proposed in Hedges and Olkin.14 The statistic $d_i$ is calculated as follows:

$$d_i = \left(1 - \frac{3}{4N_i - 9}\right)\frac{\bar{X}_{\mathrm{exp},i} - \bar{X}_{c,i}}{s_i\sqrt{r_i}}$$

where $\bar{X}_{\mathrm{exp},i}$ is the mean of the exposed group, $\bar{X}_{c,i}$ is the mean of the control group, $s_i$ is the pooled SD of the two groups, $N_i$ is the total sample size of the two groups, and $r_i$ is the reliability of the psychological test used in the study. The conditional variance of $d_i$ is calculated as follows:

$$v_i = \frac{n_{\mathrm{exp},i} + n_{c,i}}{n_{\mathrm{exp},i}\, n_{c,i}} + \frac{d_i^2}{2\,(n_{\mathrm{exp},i} + n_{c,i})}$$

where $n_{\mathrm{exp},i}$ and $n_{c,i}$ are the sample sizes of the exposed and control groups respectively.
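The effect size calculation can be illustrated with a short Python sketch. It rests on two assumptions that are not confirmed by the text: that the small sample correction factor is the usual Hedges and Olkin approximation, 1 − 3/(4N − 9), and that the reliability adjustment divides the standardised difference by the square root of the reliability coefficient (the classical correction for attenuation). The function and argument names are illustrative only.

```python
import math


def hedges_d(mean_exp, mean_ctl, sd_exp, sd_ctl, n_exp, n_ctl, reliability=1.0):
    """Return (d, variance) for one study.

    Assumptions: correction factor 1 - 3/(4N - 9); reliability adjustment
    by division by sqrt(reliability). reliability=1.0 means no adjustment.
    """
    n_total = n_exp + n_ctl
    # Pooled SD of the exposed and control groups.
    pooled_sd = math.sqrt(
        ((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2) / (n_total - 2)
    )
    # Small sample bias correction.
    correction = 1.0 - 3.0 / (4.0 * n_total - 9.0)
    d = correction * (mean_exp - mean_ctl) / pooled_sd
    # Assumed reliability adjustment (correction for attenuation).
    d /= math.sqrt(reliability)
    # Large sample conditional variance of d, applied to d as computed above.
    variance = n_total / (n_exp * n_ctl) + d ** 2 / (2.0 * n_total)
    return d, variance


# Example with made-up numbers: exposed workers score 5 points lower on average.
d, v = hedges_d(45.0, 50.0, 10.0, 10.0, 30, 30, reliability=0.8)
print(d, v)
```

A negative d here means the exposed group scored lower than the controls; whether lower is worse depends on the scoring direction of each test.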

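A test of whether the fixed effects assumption holds uses the following statistic:

$$Q = \sum_{i=1}^{k} w_i \left(T_i - T_{\cdot}\right)^2 = \sum_{i=1}^{k} \frac{\left(T_i - T_{\cdot}\right)^2}{v_i}$$

When the assumption holds, Q has a χ² distribution with k−1 degrees of freedom.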
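A minimal Python sketch of the fixed effects calculation and the homogeneity statistic follows. It assumes the standard inverse variance form, in which each weight is the reciprocal of the study's conditional variance and Q is the weighted sum of squared deviations from the pooled estimate; the function name and the example values are hypothetical.

```python
def fixed_effects(effects, variances):
    """Return (pooled effect, its variance, Q) under a fixed effects model."""
    # Inverse variance weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * t for w, t in zip(weights, effects)) / total_weight
    pooled_variance = 1.0 / total_weight
    # Homogeneity statistic: weighted squared deviations from the pooled effect.
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, effects))
    return pooled, pooled_variance, q


# Example with made-up effect sizes and variances for three studies.
pooled, pooled_variance, q = fixed_effects([-0.2, -0.5, -0.1], [0.04, 0.09, 0.05])
print(pooled, pooled_variance, q)
print("degrees of freedom for Q:", 3 - 1)
```

The sketch stops at Q itself. Comparing Q with a χ² critical value needs a χ² distribution function, which the Python standard library does not provide.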

Equations for variance change under the random effects assumption. The total variance of an effect size estimate is given by:

$$v_i^{*} = \sigma^2 + v_i$$

where $\sigma^2$ is the random effects variance and $v_i$ is the conditional variance already given. There are two methods for estimating $\sigma^2$. The first uses the ordinary unweighted sample estimate for the variance of the effect sizes computed as:

$$s^2(T) = \frac{\sum_{i=1}^{k} \left(T_i - \bar{T}\right)^2}{k - 1}$$

where $\bar{T}$ is the unweighted mean of the k effect sizes.

The random variance is then estimated by:

$$\hat{\sigma}^2 = s^2(T) - \frac{1}{k}\sum_{i=1}^{k} v_i$$

If this estimate is negative, it is assumed to be zero and the fixed effects model applies.

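The second method for estimating the random variance uses Q, which is taken as an estimate of the weighted sample estimate of the unconditional variance of $T_i$. In this method, the random variance is estimated by:

$$\hat{\sigma}^2 = \frac{Q - (k - 1)}{c}, \qquad c = \sum_{i=1}^{k} w_i - \frac{\sum_{i=1}^{k} w_i^2}{\sum_{i=1}^{k} w_i}$$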

This second method will give a non-zero estimate only when Q is greater than its expected value. Otherwise, it is assumed to be zero and the fixed effects model applies.

When the random effects model applies, the average effect size $T_{\cdot}$ and its variance $v_{\cdot}$ are calculated from equations 1–3; however, $v_i^{*}$ is substituted for $v_i$.
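The two ways of estimating the random effects variance, and the fallback to the fixed effects model, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the unweighted estimate is taken as the sample variance of the effect sizes minus the mean conditional variance, and the weighted estimate as (Q − (k − 1))/c with c = Σw − Σw²/Σw. All names are hypothetical.

```python
def random_effects(effects, variances, method="weighted"):
    """Return (pooled effect, its variance, estimated random variance)."""
    k = len(effects)
    if method == "unweighted":
        # Ordinary sample variance of the effect sizes minus the mean
        # conditional variance.
        mean_t = sum(effects) / k
        s2 = sum((t - mean_t) ** 2 for t in effects) / (k - 1)
        sigma2 = s2 - sum(variances) / k
    elif method == "weighted":
        # Estimate based on the homogeneity statistic Q.
        w = [1.0 / v for v in variances]
        sum_w = sum(w)
        t_fixed = sum(wi * ti for wi, ti in zip(w, effects)) / sum_w
        q = sum(wi * (ti - t_fixed) ** 2 for wi, ti in zip(w, effects))
        c = sum_w - sum(wi ** 2 for wi in w) / sum_w
        sigma2 = (q - (k - 1)) / c
    else:
        raise ValueError("method must be 'weighted' or 'unweighted'")
    # A negative estimate is set to zero, which reduces to the fixed effects model.
    sigma2 = max(0.0, sigma2)
    # Total variance per study replaces the conditional variance in the weights.
    w_star = [1.0 / (v + sigma2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * ti for wi, ti in zip(w_star, effects)) / sum_w_star
    return pooled, 1.0 / sum_w_star, sigma2


# Example with made-up effect sizes and variances for three studies.
effects = [-0.2, -0.9, 0.1]
variances = [0.04, 0.09, 0.05]
print(random_effects(effects, variances, method="weighted"))
print(random_effects(effects, variances, method="unweighted"))
```

Running both methods on the same inputs shows how the pooled variance, and hence significance, can differ with the choice of estimator.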

When a study provided data for several subgroups, to prevent its overweighting in the meta-analysis, we combined subgroups to calculate a single effect size and then included the result in the final meta-analysis. If the strata represented different control and exposure groups, the combined effect size was calculated from the fixed effects model where k=number of strata. If the strata represented the same exposure and control groups that were tested two or more times with different test versions—such as simple reaction time with different types of stimuli—the effect size was calculated with the fixed effects model where k=number of tests. If a study consisted of one control group and two or more exposure groups, then the exposure groups were combined to calculate a common mean (SD), provided the exposure level fit the criteria.
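For the last case, the paper does not give the formula used to obtain a common mean (SD) from several exposure groups. One standard way, shown here purely as an assumption, is to pool the group summaries so that the result equals the mean and SD that would be obtained from the combined raw data. The function name and tuple layout are hypothetical.

```python
import math


def combine_groups(groups):
    """Combine (n, mean, sd) summaries into one (n, mean, sd) summary."""
    n_total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / n_total
    # Sum of squares within each group plus sum of squares between group means.
    within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    between = sum(n * (m - mean) ** 2 for n, m, _ in groups)
    sd = math.sqrt((within + between) / (n_total - 1))
    return n_total, mean, sd


# Example with made-up values: two exposure groups measured on the same test.
print(combine_groups([(20, 48.0, 9.0), (15, 44.0, 11.0)]))
```

The combined SD includes the spread between group means, so it is larger than a simple average of the group SDs when the means differ.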

Publication bias was examined by calculating the fail safe N, defined as the number of studies with a non-significant result (p>0.05) that would bring a significant pooled analysis to non-significant levels. The calculations were based on Rosenthal's application of the Stouffer-Liptak inverse normal method of combining p values.18 To evaluate observer bias, studies that implemented blinding procedures were included in a separate analysis.
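A minimal Python sketch of a fail safe N calculation in the spirit of Rosenthal's approach is given below. It assumes one tailed p values, a one tailed significance level of 0.05, and that each hypothetical unpublished study contributes a standard normal deviate of zero; the paper does not report these details, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def fail_safe_n(one_tailed_p_values, alpha=0.05):
    """Number of null studies needed to make the combined result non-significant."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1.0 - alpha)
    # Convert each p value to a standard normal deviate.
    z_sum = sum(normal.inv_cdf(1.0 - p) for p in one_tailed_p_values)
    k = len(one_tailed_p_values)
    # Stouffer combined Z for the observed studies.
    combined_z = z_sum / math.sqrt(k)
    if combined_z <= z_alpha:
        # The pooled result is not significant to begin with.
        return 0.0
    # Solve z_sum / sqrt(k + N) = z_alpha for N.
    return (z_sum / z_alpha) ** 2 - k


# Example with made-up one tailed p values from five studies.
print(fail_safe_n([0.01, 0.03, 0.20, 0.04, 0.08]))
```

A small fail safe N relative to the number of included studies suggests that a few unpublished null results could overturn the pooled finding.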


Review of the literature

About 140 papers, reports, and books were reviewed. Twenty two studies met the inclusion criteria (table 1). Mean blood lead concentrations among study subjects ranged from 24 to 63 μg/dl for exposed and from 0 to 28 μg/dl for unexposed workers. Although these ranges overlap across studies, in no individual study did the mean blood lead concentrations of the exposed and unexposed groups overlap.

Table 1

Summary of studies included in the meta-analysis

There was considerable variation in quality among the 22 studies that met the inclusion criteria. None of the studies directly compared the test scores in their study populations before and after exposure. However, Bolla et al,19 recognising the importance of controlling for premorbid verbal intelligence, used the vocabulary subtest score of the WAIS-R in their regression model.

Five studies (Baker et al,3 Campara et al,20 Chia et al,21 Maizlish et al,22 and Williamson and Teo23) adjusted results for the confounders age, education, and alcohol use. One study did not control for any of these variables.24 The rest of the studies showed varying and inconsistent attention to controls for these potential confounders (table 1).

There seemed to be no consistent pattern for adjustment for other potential confounders. For example, Chia et al21 adjusted for smoking history and ethnicity, Campara et al20 matched the study group and referents on “domestic status” (marital status), distance from work, and number of chronic illnesses, and Pasternak et al32 eliminated subjects with evidence of current illness or injury.

Six of the studies (Bolla et al,19 Campara et al,20 Chia et al,21 Johnson et al,30 Pasternak et al,32 and Valciukas et al4) indicated that they had used observer blinding procedures.


Twenty two neurobehavioural tests met the inclusion criteria. The results of the meta-analysis for each test are presented in table 2. Two tests (digit symbol and D-2 errors) showed a significant effect for all three models used. The result for one test (D-2 speed) was significant when using the fixed effects model and weighted random effects model, but was not significant when using the unweighted random effects model. Also, several tests (simple reaction time, grooved pegboard, trail making A and B, picture completion, visual reproduction, eye-hand coordination, and vocabulary) showed a significant effect, but for the fixed effects model only.

Table 2

Meta-analysis results for three models: fixed effects, weighted random effects, and unweighted random effects

Correction for reliability coefficients did not seem to affect the results substantially. When the data from different subgroups in a single study were combined as opposed to included separately, the results of the meta-analysis changed in some instances, but only for the fixed effects model. The visual reproduction test result changed from statistically non-significant to statistically significant, whereas the result for paired associates remained the same.

Separate inclusion of strata led to overestimation of homogeneity. For example, the random effects model for the visual reproduction test with Ryan et al36 included as four separate strata was reduced to a fixed effects model because the variance between studies was estimated to be zero. By contrast, inclusion of Ryan et al36 as a single estimate (after combining the results of the four strata) produced a meaningful random effects model result.


Our evaluation of the literature indicates that none of the studies reviewed allow definitive conclusions about the presence or absence of adverse neurobehavioural effects at low (<70 μg/dl) current blood lead concentrations. Perhaps the most difficult aspect of observational cross sectional studies involving neurobehavioural outcomes is their limited ability to control for wide interpersonal variability of test results.

Only one study (Mantere et al37) attempted to use testing results before employment as a background measure of neurobehavioural performance. The authors then re-evaluated the lead workers and controls for the number of people whose testing improved, declined, or stayed the same. They also performed a linear regression analysis of the relation between blood lead time weighted average and performance score. However, these results were reported selectively for two tests only: Santa Ana coordination and block design. Unfortunately, the study by Mantere et al37 reported the mean test scores and SD only before the start of exposure and, therefore, was not included in our analysis.

Various attempts to control for factors such as age, education, and alcohol use, although useful, are probably insufficient because of the many confounding variables. Therefore, our review of the literature indicates a need for prospective studies that would directly compare the neurobehavioural function before exposure to test results after exposure.

Our meta-analysis shows that changes in study selection criteria or methods of analysis may greatly affect the results. For example, the results for the visual reproduction test changed several times depending on the model used. This finding indicates a need for caution in interpreting the results of meta-analyses in general, and meta-analyses of studies evaluating minor subclinical changes in particular.

In conducting meta-analyses, it is necessary to also consider the possibility of publication bias. Easterbrook et al38 surveyed 487 research projects approved by the Central Oxford Research Ethics Committee between 1984 and 1987. By 1990, only 52% of these studies were published. Studies with significant positive findings were far more likely to be published. The observational studies were at particularly high risk of such publication bias.

We considered publication bias with two approaches. Firstly, we attempted to contact authors of published studies to identify unpublished and ongoing research. Although we became aware of several potentially relevant studies, we were unable to obtain the data. Secondly, we calculated a fail safe N. These calculations showed that in some instances the results might have been attributed to publication bias.

Our review of individual study results showed substantial variability, even in the absence of a significant homogeneity test. In some cases, the differences in mean values reported by different studies lead one to question whether these studies did, indeed, use the same tests. For example, for simple reaction time (non-preferred hand), Braun and Daigneault,26 Haenninen et al,28 and Repko et al34 reported means for the exposed groups of 178.78, 1310, and 29.62, respectively. The source of such discrepancy is not apparent, although it is possible that different studies used different numbers of items.

In some cases, the source of discrepancy is more evident. For example, the articles by Araki et al,25 Baker et al,3 Bolla et al,19 Campara et al,20 Chia et al,21 Hogstedt et al,29 Jeyaratnam et al,2 Maizlish et al,22 and Valciukas et al,4 with means for the digit symbol test ranging from 29.4 to 59.9, indicate that this test is taken from the WAIS, whereas the papers by Parkinson et al,5 Pasternak et al,32 and Ryan et al,36 with means ranging from 8.3 to 9.83 for the digit symbol test, indicate that the source for the test was the WAIS-R.

We think that in instances when the tests are evaluating the same psychological construct, although perhaps using different procedures and scoring, they legitimately may be combined in a meta-analysis using a random effects model. One methodological feature of this paper is the fact that analyses were conducted and presented with both the fixed effects and random effects models so that the results of these two models can be directly compared. Our reluctance to rigidly follow a certain analytical model is related to the fact that each of the proposed approaches has its limitations.

Main messages

  • None of the individual studies is conclusive or adequate in providing information on the subclinical neurobehavioural effects of exposure to lead.

  • The reviewed studies showed a lack of uniform testing methods.

  • The meta-analysis results are extremely sensitive to changes in study inclusion criteria and use of statistical methods.

  • Lack of true measures of premorbid state, observer bias, and publication bias, as well as the selection of individual studies, can all affect the results of the meta-analysis.

  • There is a need for prospective studies that would take into consideration variability between people by comparing test scores before and after exposure.

Policy implications

  • The data available to date are inconsistent and do not provide adequate information to draw firm conclusions about the biological effects of exposure to lead at current blood lead concentrations of less than 70 μg/dl.

  • It is not clear whether regulating exposure to lead based on current blood lead concentration will provide adequate protection against potential neurotoxic effects of lead.

  • In making occupational health and safety decisions, the quality of scientific data is more important than the results of pooled analyses based on many studies.

A recent meta-analysis by Meyer-Baron and Seeber39 evaluated 12 tests from 22 studies using similar selection criteria, but a different analytical approach. For each test, a fixed effects model was assumed, the effect sizes calculated, and a test of homogeneity performed. If the test for homogeneity was significant, then a random effects model was used.

With this approach, the results for the block design, logical memory, and Santa Ana (preferred hand only) tests showed significant effects. By splitting the studies that included the digit symbol test into two groups and dropping one study,5 two homogeneous groups were formed and the effect sizes of both groups were significant. The authors concluded that “The evidence of neurobehavioural deficits at a current blood lead concentration of ∼40 μg/100 ml is obvious.” However, several concerns make this conclusion seem far reaching. Their meta-analysis included studies with a mean current lead concentration of <70 μg/dl. Therefore, many of the observed effects could be attributed to the inclusion of people whose lead concentrations were much higher than those claimed to be the focus of the study. The authors proposed that their calculated effect sizes for lead exposure might correspond to the effects of aging. This statement is not supported by the data. Moreover, as noted by Balbus-Kornfield et al,6 the effects of current lead exposure may reflect only transient and reversible changes, whereas the interpretation offered by Meyer-Baron and Seeber assumes permanent effects.

Also, concerns about study selection, inclusion and exclusion of tests, and choice of statistical analyses cast doubt on the conclusions of their meta-analysis. For example, Lindgren et al40 reported score results for cumulative exposure categories only. Therefore, it is inappropriate to combine Lindgren et al40 with other studies. Exclusion of the Lindgren study would exclude the digit span, trail making, and logical memory tests in accordance with their study criteria.

For the aiming test, the authors used the eye-hand coordination test from Repko et al34 and the pursuit aiming test from Maizlish et al22 and Chia et al.21 However, eye-hand coordination measures time “between successive pulses,” whereas pursuit aiming measures the “number of dotted circles.” Thus, the study by Repko et al34 should have been excluded, thereby eliminating the aiming test from their meta-analysis.

Their block design test meta-analysis included data from Lindgren et al,40 Haenninen et al,28 Valciukas et al,4 Campara et al,20 and Mantere et al.37 As already stated, the study by Lindgren et al40 should have been excluded, but papers by Baker et al3 and Parkinson et al5 that contain block design data should have been included.

Comparison of our results with those of Meyer-Baron and Seeber39 indicates that all three tests (block design, Santa Ana and logical memory) reported as showing significant effects were no longer significant in our analysis. However, our analysis of the digit symbol test with a fixed effects model, unlike the result of Meyer-Baron and Seeber, was significant. In conducting analyses involving a long list of tests, it would be expected that at least some tests would show significant effects by chance. To interpret the results as reflecting an underlying biological process, consistency of findings across studies would be expected. A lack of consistency should caution against far reaching conclusions about the true biological effects of low to moderate lead concentrations.