Abstract
Although the toxic effects of lead on the central nervous system have been well described, the blood concentration at which lead begins to exert adverse effects remains the focus of debate. A metaanalysis of occupational studies was conducted evaluating the association between neurobehavioural testing results and moderate blood lead concentrations.
 lead
 epidemiology
 metaanalysis
 neurobehavioural testing
 PNS, peripheral nervous system
 CNS, central nervous system
 ATSDR, Agency for Toxic Substances and Disease Registry
The inclusion criteria for the metaanalysis were (a) central tendency for blood lead concentration less than 70 μg/dl; (b) numbers of exposed and unexposed workers reported; and (c) test score arithmetic means and measures of dispersion available for both the exposed and the unexposed workers. Data were extracted only for tests that were included in three or more studies. Analyses used both the fixed and the random effects models, adjusted for test reliability whenever possible. Publication bias was evaluated by calculating the fail safe N, defined as the number of studies with a nonsignificant result (p>0.05) that would bring a significant pooled analysis to nonsignificant levels. Additional analyses examined only those studies that implemented blinding procedures. Twenty two studies provided information on 22 tests. Only two of the 22 tests showed an unequivocally significant difference between people with increased blood lead concentrations and controls. The results were sensitive to stratification of studies, adjustment for reliability, and choice of statistical analysis, and were not consistent with the results of an earlier metaanalysis. The data available to date are inconsistent and cannot provide adequate information on the neurobehavioural effects of exposure to moderate blood concentrations of lead. Lack of true measures of premorbid state, observer bias, and publication bias affect the results.
The major targets of lead toxicity are the peripheral and central nervous systems (PNS and CNS).^{1} In the CNS, symptoms of lead poisoning include dullness, forgetfulness, irritability, poor attention span, headache, fatigue, impotence, dizziness, and depression.^{1} Lead encephalopathy, a progressive and potentially fatal degeneration of the brain, is the most severe neurological effect of lead poisoning.^{1}
The concentration at which lead begins to exert adverse health effects is not known. Several studies have suggested the existence of subclinical abnormalities in the absence of overt signs and symptoms of clinical lead poisoning among people with moderately (≤70 μg/dl) increased blood lead concentrations.^{2–4} By contrast, Parkinson et al found few significant differences in neurobehavioural performance between workers exposed and not exposed to lead and concluded that current levels of lead exposure have “no detectable impact on psychological functioning.”^{5}
A review of the literature on the neurobehavioural effects of cumulative exposure to lead concluded that the current scientific evidence is flawed because of inadequate estimation of cumulative exposure to or absorption of lead and inadequate adjustment for age and intellectual ability before exposure.^{6} Another review by Ehle and McKee discussed the difficulties of applying psychological testing in field research and concluded that, for exposure to low to moderate concentrations (<70 μg/dl) of lead, the evidence of neurobehavioural effects is suggestive, but not conclusive.^{7} To quantitatively summarise the available evidence, we conducted a metaanalysis of occupational studies evaluating the association between results of neurobehavioural testing and moderate blood lead concentrations.
METHODS
Literature review and study selection
Study selection involved Medline searches using the keywords: “lead AND neurotoxic AND occupational exposure”; “lead AND neurotoxic AND occupation”; “blood lead AND neurobehavioural effects”; “lead AND health effects AND occupation”. A research librarian conducted additional searches of other relevant databases, including those referencing ongoing research. These searches were supplemented by a review and retrieval of references from the Toxicological profile for lead published by the Agency for Toxic Substances and Disease Registry (ATSDR).^{1}
As well as studies published in English, potentially relevant reports were translated from Hebrew, Chinese, Japanese, German, and Danish. Where information was missing, attempts were made to contact the author(s) to obtain the missing data. Descriptive information on each neurobehavioural test was obtained from standard textbooks.^{8–10}
The following inclusion criteria were used for the metaanalysis:

Central tendency for lead exposure was less than 70 μg/dl

Numbers of exposed and unexposed were reported

Test score arithmetic means and measures of dispersion were reported for both the exposed and the unexposed workers.
Data extraction
Pertinent data from the studies were entered into an electronic database, including central tendency measures for blood lead concentrations among the occupationally exposed and unexposed groups. The outcome information included mean test scores, with their units and measures of dispersion, for the exposed and unexposed (control) groups. Before data entry, blood lead concentrations given in μmol/l were converted to μg/100 ml, and all measures of dispersion were converted to SDs. Some studies included test score information for subgroups of exposed workers with a mean blood lead concentration >70 μg/dl; for these studies, only information on those subgroups of exposed workers with an average blood lead concentration <70 μg/dl was extracted. Several studies provided test scores for certain strata within the exposed and unexposed groups. In these instances, information on each stratum was entered separately, so some articles provided more than one study group.
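The unit and dispersion conversions described above can be sketched in Python. This is a minimal illustration, not the authors' procedure: the text does not say which dispersion measures the source studies reported, so the standard error and 95% confidence interval conversions are assumptions, and the function names are hypothetical. The lead conversion uses the atomic mass of lead (207.2 g/mol), so 1 μmol/l corresponds to 20.72 μg/dl.

```python
import math

# Atomic mass of lead (g/mol): 1 umol/l of Pb = 207.2 ug/l = 20.72 ug/dl
PB_MOLAR_MASS = 207.2


def umol_l_to_ug_dl(conc_umol_l: float) -> float:
    """Convert blood lead from umol/l to ug/dl (the same as ug/100 ml)."""
    ug_per_l = conc_umol_l * PB_MOLAR_MASS  # umol/l x ug/umol -> ug/l
    return ug_per_l / 10.0                  # 1 l = 10 dl


def se_to_sd(se: float, n: int) -> float:
    """Assumed input: standard error of a group mean -> SD (SD = SE * sqrt(n))."""
    return se * math.sqrt(n)


def ci95_to_sd(lower: float, upper: float, n: int) -> float:
    """Assumed input: 95% CI of a mean -> SD, using a normal approximation."""
    return math.sqrt(n) * (upper - lower) / (2 * 1.96)


# Hypothetical example: a group mean of 1.93 umol/l
print(round(umol_l_to_ug_dl(1.93), 1))  # 40.0
```

For small groups, a t quantile rather than 1.96 would normally be used in the confidence interval conversion; the normal approximation is kept here only for brevity.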
Data were extracted only for neurobehavioural tests included in three or more studies. We omitted tests involving self rating of affect such as profile of mood states and multiple adjective affect checklist. Only studies reporting nonoverlapping data were included.
Data analysis
The analysis of the data included two steps. The first step involved a review of the studies to assess their quality using the following criteria:

Evaluation of the preexposure status

Adjustment for age

Adjustment for other occupational exposures

Adjustment for alcohol use

Adjustment for socioeconomic confounding factors such as income level, education, etc

Use of blinding procedures.
The second step included a quantitative metaanalysis of the pooled data. The statistical analysis protocol is illustrated in figure 1. There are two approaches for combining the data: a fixed effects model and a random effects model. The fixed effects method assumes no heterogeneity between studies and attributes all observed variation between results to sampling error alone.^{11} The random effects model assumes that the study specific effect sizes come from a random distribution of effect sizes with a certain mean and variance.
Although several tests of heterogeneity have been proposed, their interpretation is problematic because: (a) the power of the test may be insufficient to detect significant heterogeneity^{12}; (b) the studies may be subject to similar design flaws leading to consistent bias and providing a false impression of homogeneity; and (c) the effect sizes may seem to be more consistent than they really are if studies with zero or negative effects are less likely to be published.^{13}
Because of the wide variation in testing procedures and scoring practices among the studies, as well as variations in the selected study populations, assumptions of homogeneity were generally inappropriate, even when the test of heterogeneity was nonsignificant. Therefore, we used both the fixed and the random effects models for each of our analyses. Where the variance between studies was negligible (high level of homogeneity), the random effects models reduced to fixed effects models. To correct for measurement error, all results were adjusted for test-retest reliability whenever possible.^{8,14–17}
Using the fixed effects assumption, the general formula for the weighted average effect size of k studies is

$$T_{\cdot} = \frac{\sum_{i=1}^{k} w_i T_i}{\sum_{i=1}^{k} w_i} \tag{1}$$

where T_{i} is the effect size estimate of the ith study and w_{i} is the weight associated with it. The weights that minimise the variance of T. are given by

$$w_i = \frac{1}{v_i} \tag{2}$$

where v_{i} is the conditional variance in each study. The average effect size T. has a conditional variance v. given by

$$v_{\cdot} = \frac{1}{\sum_{i=1}^{k} w_i} \tag{3}$$

The effect size of each study (T_{i}) was estimated from the standardised mean difference statistic, with the small sample bias correction proposed by Hedges and Olkin.^{14} The statistic d_{i} is calculated as follows:

$$d_i = \left(1 - \frac{3}{4N_i - 9}\right)\frac{\bar{X}^{\mathrm{exp}}_i - \bar{X}^{\mathrm{c}}_i}{s_i\sqrt{r_i}}$$

where X^{exp}_{i} is the mean of the exposed group, X^{c}_{i} is the mean of the control group, s_{i} is the pooled SD of the two groups, N_{i} is the total sample size of the two groups, and r_{i} is the reliability of the psychological test used in the study. The conditional variance of d_{i} is calculated as follows:

$$v_i = \frac{n^{\mathrm{exp}}_i + n^{\mathrm{c}}_i}{n^{\mathrm{exp}}_i\, n^{\mathrm{c}}_i} + \frac{d_i^2}{2\left(n^{\mathrm{exp}}_i + n^{\mathrm{c}}_i\right)}$$

where n^{exp}_{i} and n^{c}_{i} are the sample sizes of the exposed and control groups, respectively.
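The per study effect size and the fixed effects pooling of equations 1–3 can be sketched in Python. This is a minimal sketch of the standard Hedges and Olkin calculation, not the authors' software: it assumes the reliability adjustment is the classical correction for attenuation (dividing the mean difference by √r_i), and the names `Study`, `effect_size`, and `fixed_effects`, as well as the example inputs, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    mean_exp: float           # mean test score, exposed group
    mean_ctrl: float          # mean test score, control group
    sd_pooled: float          # pooled SD of the two groups (s_i)
    n_exp: int
    n_ctrl: int
    reliability: float = 1.0  # test-retest reliability r_i (1.0 = no adjustment)


def effect_size(s: Study) -> tuple[float, float]:
    """Return (d_i, v_i): bias-corrected, reliability-adjusted d and its variance."""
    n_total = s.n_exp + s.n_ctrl                       # N_i
    bias_correction = 1 - 3 / (4 * n_total - 9)        # small sample correction
    raw = (s.mean_exp - s.mean_ctrl) / (s.sd_pooled * math.sqrt(s.reliability))
    d = bias_correction * raw
    v = n_total / (s.n_exp * s.n_ctrl) + d ** 2 / (2 * n_total)
    return d, v


def fixed_effects(effects: list[float], variances: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted mean T. and its variance v. (equations 1-3)."""
    weights = [1 / v for v in variances]               # w_i = 1 / v_i
    t_bar = sum(w * t for w, t in zip(weights, effects)) / sum(weights)
    return t_bar, 1 / sum(weights)


# Hypothetical input: two studies of the same test
studies = [Study(48.0, 52.0, 10.0, 40, 40, reliability=0.85),
           Study(45.5, 47.0, 9.0, 25, 30, reliability=0.85)]
pairs = [effect_size(s) for s in studies]
t_bar, v_bar = fixed_effects([d for d, _ in pairs], [v for _, v in pairs])
z = t_bar / math.sqrt(v_bar)   # compare |z| with 1.96 for a two-sided 5% test
```

The sign of d_i follows the scoring direction of each test, so tests where a higher score means worse performance need consistent orientation before pooling.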
A test of whether the fixed effects assumption holds uses the following statistic:

$$Q = \sum_{i=1}^{k} w_i \left(T_i - T_{\cdot}\right)^2$$

When the assumption holds, Q has a χ^{2} distribution with k − 1 degrees of freedom.
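Under the random effects assumption, the variance equations change. The total variance of an effect size estimate is given by:

$$v^{*}_i = \sigma^2 + v_i$$

where σ^{2} is the random effects variance and v_{i} is the conditional variance already given. There are two methods for estimating σ^{2}. The first uses the ordinary unweighted sample estimate of the variance of the effect sizes, computed as:

$$s^2(T) = \frac{1}{k - 1}\sum_{i=1}^{k}\left(T_i - \bar{T}\right)^2, \qquad \bar{T} = \frac{1}{k}\sum_{i=1}^{k} T_i$$

The random variance is then estimated by:

$$\hat{\sigma}^2 = s^2(T) - \frac{1}{k}\sum_{i=1}^{k} v_i$$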
If this estimate is negative, it is assumed to be zero and the fixed effects model applies.
The second method for estimating the random variance uses Q, which serves as a weighted sample estimate of the unconditional variance of T_{i}. In this method, the random variance is estimated by:

$$\hat{\sigma}^2 = \frac{Q - (k - 1)}{\sum_{i=1}^{k} w_i - \sum_{i=1}^{k} w_i^2 \Big/ \sum_{i=1}^{k} w_i}$$
This second method will give a nonzero estimate only when Q is greater than its expected value. Otherwise, it is assumed to be zero and the fixed effects model applies.
When the random effects model applies, the average effect size T. and its variance v. are calculated from equations 1–3; however, v*_{i} is substituted for v_{i}.
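The homogeneity statistic Q, the two estimators of the random variance, and the random effects pooling can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the second estimator is the moment estimator commonly attributed to DerSimonian and Laird. Negative estimates are truncated to zero, in which case the random effects result coincides with the fixed effects result.

```python
def _weighted_mean(effects, weights):
    return sum(w * t for w, t in zip(weights, effects)) / sum(weights)


def q_statistic(effects, variances):
    """Q = sum w_i (T_i - T.)^2; chi-square with k - 1 df under homogeneity."""
    weights = [1 / v for v in variances]
    t_bar = _weighted_mean(effects, weights)
    return sum(w * (t - t_bar) ** 2 for w, t in zip(weights, effects))


def sigma2_unweighted(effects, variances):
    """Method 1: unweighted variance of the T_i minus the mean conditional variance."""
    k = len(effects)
    mean = sum(effects) / k
    s2 = sum((t - mean) ** 2 for t in effects) / (k - 1)
    return max(0.0, s2 - sum(variances) / k)   # negative -> 0 (fixed effects)


def sigma2_from_q(effects, variances):
    """Method 2: moment estimator based on Q and its expected value k - 1."""
    k = len(effects)
    weights = [1 / v for v in variances]
    denom = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q_statistic(effects, variances) - (k - 1)) / denom)


def random_effects(effects, variances, sigma2):
    """Equations 1-3 with v*_i = sigma^2 + v_i substituted for v_i."""
    weights = [1 / (v + sigma2) for v in variances]
    return _weighted_mean(effects, weights), 1 / sum(weights)


# Hypothetical effect sizes and conditional variances for three studies
effects, variances = [0.1, 0.5, 0.9], [0.04, 0.04, 0.04]
sigma2 = sigma2_from_q(effects, variances)
t_random, v_random = random_effects(effects, variances, sigma2)
```

With equal conditional variances, as in this hypothetical example, both estimators give the same σ²; they diverge when study sizes differ, which is one reason the choice of estimator can change a pooled result.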
When a study provided data for several subgroups, to prevent its overweighting in the metaanalysis, we combined subgroups to calculate a single effect size and then included the result in the final metaanalysis. If the strata represented different control and exposure groups, the combined effect size was calculated from the fixed effects model where k=number of strata. If the strata represented the same exposure and control groups that were tested two or more times with different test versions—such as simple reaction time with different types of stimuli—the effect size was calculated with the fixed effects model where k=number of tests. If a study consisted of one control group and two or more exposure groups, then the exposure groups were combined to calculate a common mean (SD), provided the exposure level fit the criteria.
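Combining two or more exposure groups that share one control group into a common mean (SD) requires pooling both within-group and between-group variation. The text does not give the formula used; the sketch below assumes the standard decomposition of the total sum of squares about the combined mean, and `combine_groups` is a hypothetical name.

```python
import math


def combine_groups(groups):
    """Merge (n, mean, sd) summaries of several groups into one (n, mean, sd).

    Total sum of squares about the combined mean = within-group SS
    plus between-group SS, so no raw data are needed.
    """
    n_total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / n_total
    ss = sum((n - 1) * sd ** 2 + n * (m - mean) ** 2 for n, m, sd in groups)
    return n_total, mean, math.sqrt(ss / (n_total - 1))


# Two hypothetical exposure subgroups sharing one control group
print(combine_groups([(3, 2.0, 1.0), (4, 5.5, math.sqrt(5 / 3))]))
```

These hypothetical summaries correspond to the raw scores 1, 2, 3 and 4, 5, 6, 7, and the combined result matches the sample SD of all seven values, √(28/6) ≈ 2.16, with a mean of 4.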
Publication bias was examined by calculating the fail safe N, defined as the number of studies with a nonsignificant result (p>0.05) that would bring a significant pooled analysis to nonsignificant levels. The calculations were based on Rosenthal's application of the Stouffer-Liptak inverse normal method of combining p values.^{18} To evaluate observer bias, studies that implemented blinding procedures were analysed separately.
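Rosenthal's fail safe N follows from the Stouffer combined statistic ΣZ_i/√k: setting ΣZ_i/√(k + N) equal to the one-tailed critical value z_α and solving gives N = (ΣZ_i)²/z_α² − k, where the N added studies are assumed to average Z = 0. The Python sketch below illustrates that calculation; the function names are hypothetical, and the assumption that each study contributes one one-tailed p value in the direction of impaired performance is ours, since the text does not state how study level p values were derived.

```python
import math
from statistics import NormalDist


def p_to_z(p_one_tailed: float) -> float:
    """One-tailed p value -> standard normal deviate Z."""
    return NormalDist().inv_cdf(1 - p_one_tailed)


def stouffer_z(z_scores):
    """Stouffer combined Z: sum(Z_i) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def fail_safe_n(z_scores, alpha: float = 0.05) -> float:
    """Number of additional Z = 0 studies that would pull the combined
    Stouffer Z down to the one-tailed critical value (Rosenthal)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for alpha = 0.05
    k = len(z_scores)
    return max(0.0, sum(z_scores) ** 2 / z_crit ** 2 - k)


# Hypothetical example: four studies, each with one-tailed p of about 0.023
z_scores = [p_to_z(0.02275)] * 4
n_fs = fail_safe_n(z_scores)
```

A small fail safe N relative to the number of included studies suggests that a handful of unpublished null studies could overturn the pooled result.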
RESULTS
Review of the literature
About 140 papers, reports, and books were reviewed. Twenty two studies met the inclusion criteria (table 1). Lead concentrations among study subjects ranged from 24 to 63 μg/dl for exposed and from 0 to 28 μg/dl for unexposed workers. However, no study had overlapping mean blood lead concentrations for its exposed and unexposed groups.
There was considerable variation in quality among the 22 studies that met the inclusion criteria. None of the studies directly compared test scores in their study populations before and after exposure. However, Bolla et al,^{19} recognising the importance of controlling for premorbid verbal intelligence, included the vocabulary subtest score of the WAIS-R in their regression model.
Five studies (Baker et al,^{3} Campara et al,^{20} Chia et al,^{21} Maizlish et al,^{22} and Williamson and Teo^{23}) adjusted results for the confounders age, education, and alcohol use. One study did not control for any of these variables.^{24} The rest of the studies showed varying and inconsistent attention to controls for these potential confounders (table 1).
There seemed to be no consistent pattern for adjustment for other potential confounders. For example, Chia et al^{21} adjusted for smoking history and ethnicity, Campara et al^{20} matched the study group and referents on “domestic status” (marital status), distance from work, and number of chronic illnesses, and Pasternak et al^{32} eliminated subjects with evidence of current illness or injury.
Six of the studies (Bolla et al,^{19} Campara et al,^{20} Chia et al,^{21} Johnson et al,^{30} Pasternak et al,^{32} and Valciukas et al^{4}) indicated that they had used observer blinding procedures.
Metaanalysis
Twenty two neurobehavioural tests met the inclusion criteria. The results of the metaanalysis for each test are presented in table 2. Two tests (digit symbol and D2 errors) showed a significant effect for all three models used. The result for one test (D2 speed) was significant with the fixed effects model and the weighted random effects model, but not with the unweighted random effects model. Also, several tests (simple reaction time, grooved pegboard, trail making A and B, picture completion, visual reproduction, eye-hand coordination, and vocabulary) showed a significant effect with the fixed effects model only.
Correction for reliability coefficients did not seem to affect the results substantially. When data from different subgroups in a single study were combined rather than entered separately, the results of the metaanalysis changed in some instances, but only for the fixed effects model: the visual reproduction test result changed from statistically nonsignificant to statistically significant, whereas the result for paired associates remained the same.
Separate inclusion of strata led to overestimation of homogeneity. For example, the random effects model for the visual reproduction test with Ryan et al^{36} included as four separate strata reduced to a fixed effects model because the variance between studies was estimated to be zero. By contrast, inclusion of Ryan et al^{36} as a single estimate (after combining the results of the four strata) yielded a nonzero between study variance and hence a distinct random effects result.
DISCUSSION
Our evaluation of the literature indicates that none of the studies reviewed allows definitive conclusions about the presence or absence of adverse neurobehavioural effects at low (<70 μg/dl) current blood lead concentrations. Perhaps the most difficult aspect of observational cross sectional studies involving neurobehavioural outcomes is their limited ability to control for wide interpersonal variability of test results.
Only one study (Mantere et al^{37}) attempted to use testing results before employment as a background measure of neurobehavioural performance. The authors then reevaluated the lead workers and controls for the number of people whose testing improved, declined, or stayed the same. They also performed a linear regression analysis of the relation between blood lead time weighted average and performance score. However, these results were reported selectively for two tests only: Santa Ana coordination and block design. Unfortunately, the study by Mantere et al^{37} reported the mean test scores and SD only before the start of exposure and, therefore, was not included in our analysis.
Attempts to control for factors such as age, education, and alcohol use, although useful, are probably insufficient because of the many potential confounding variables. Our review of the literature therefore indicates a need for prospective studies that directly compare neurobehavioural function before exposure with test results after exposure.
Our metaanalysis shows that changes in study selection criteria or methods of analysis may greatly affect the results. For example, the results for the visual reproduction test changed several times depending on the model used. This finding indicates a need for caution in interpreting the results of metaanalyses in general, and metaanalyses of studies evaluating minor subclinical changes in particular.
In conducting metaanalyses, it is also necessary to consider the possibility of publication bias. Easterbrook et al^{38} surveyed 487 research projects approved by the Central Oxford Research Ethics Committee between 1984 and 1987. By 1990, only 52% of these studies had been published. Studies with significant positive findings were far more likely to be published, and observational studies were at particularly high risk of such publication bias.
We considered publication bias with two approaches. Firstly, we attempted to contact authors of published studies to identify unpublished and ongoing research. Although we became aware of several potentially relevant studies, we were unable to obtain the data. Secondly, we calculated a fail safe N. These calculations showed that in some instances the results might be attributable to publication bias.
Our review of individual study results showed substantial variability, even in the absence of a significant homogeneity test. In some cases, the differences in mean values reported by different studies lead one to question whether these studies did, indeed, use the same tests. For example, for simple reaction time (nonpreferred hand), Braun and Daigneault,^{26} Haenninen et al,^{28} and Repko et al^{34} reported means for the exposed groups of 178.78, 1310, and 29.62, respectively. The source of such a discrepancy is not apparent, although it is possible that different studies used different numbers of items.
In some cases, the source of discrepancy is more evident. For example, the articles by Araki et al,^{25} Baker et al,^{3} Bolla et al,^{19} Campara et al,^{20} Chia et al,^{21} Hogstedt et al,^{29} Jeyaratnam et al,^{2} Maizlish et al,^{22} and Valciukas et al,^{4} with means for the digit symbol test ranging from 29.4 to 59.9, indicate that this test was taken from the WAIS, whereas the papers by Parkinson et al,^{5} Pasternak et al,^{32} and Ryan et al,^{36} with means ranging from 8.3 to 9.83 for the digit symbol test, indicate that the source for the test was the WAIS-R.
We think that when tests evaluate the same psychological construct, albeit with different procedures and scoring, they may legitimately be combined in a metaanalysis using a random effects model. A methodological feature of this paper is that analyses were conducted and presented with both the fixed effects and random effects models, so that the results of the two models can be compared directly. Our reluctance to follow a single analytical model rigidly reflects the fact that each of the proposed approaches has its limitations.
Main messages

None of the individual studies is conclusive or adequate in providing information on the subclinical neurobehavioural effects of exposure to lead.

The reviewed studies lacked a uniform testing method.

The metaanalysis results are extremely sensitive to changes in study inclusion criteria and choice of statistical methods.

Lack of true measures of premorbid state, observer bias, and publication bias, as well as the selection of individual studies, can all affect the results of the metaanalysis.

There is a need for prospective studies that would take into consideration variability between people by comparing test scores before and after exposure.
Policy implications

The data available to date are inconsistent and do not provide adequate information to draw firm conclusions about the biological effects of exposure to lead at current blood lead concentrations of less than 70 μg/dl.

It is not clear whether regulating exposure to lead based on current blood lead concentration will provide adequate protection against potential neurotoxic effects of lead.

In making occupational health and safety decisions, the quality of scientific data is more important than the results of pooled analyses based on many studies.
A recent metaanalysis by Meyer-Baron and Seeber^{39} evaluated 12 tests from 22 studies using similar selection criteria but a different analytical approach. For each test, a fixed effects model was assumed, the effect sizes were calculated, and a test of homogeneity was performed. If the test for homogeneity was significant, a random effects model was used instead.
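The conditional decision rule attributed to Meyer-Baron and Seeber can be sketched in Python. This is an illustration under stated assumptions, not their code: the text does not say which between study variance estimator they used, so the Q-based moment estimator is assumed here, the function names are hypothetical, and the χ² tail probability is computed from the standard series for the regularised incomplete gamma function so that the sketch needs only the standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(chi2_df > x) from the series for the regularised lower
    incomplete gamma function (adequate for the modest Q of a few studies)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def conditional_pool(effects, variances, alpha=0.05):
    """Fixed effects first; switch to random effects only if Q is significant."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / v for v in variances]
    t_fixed = sum(wi * t for wi, t in zip(w, effects)) / sum(w)
    q = sum(wi * (t - t_fixed) ** 2 for wi, t in zip(w, effects))
    if chi2_sf(q, k - 1) >= alpha:
        return "fixed", t_fixed, 1 / sum(w)
    # Assumed estimator: moment estimator of the between study variance from Q
    sigma2 = max(0.0, (q - (k - 1)) / (sum(w) - sum(x * x for x in w) / sum(w)))
    w_star = [1 / (v + sigma2) for v in variances]
    t_random = sum(wi * t for wi, t in zip(w_star, effects)) / sum(w_star)
    return "random", t_random, 1 / sum(w_star)
```

Because the heterogeneity test may have insufficient power when few studies are pooled, a rule of this kind can default to the fixed effects model even when study results differ substantially; this is the concern that led the present analysis to report both models side by side.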
With this approach, the results for the block design, logical memory, and Santa Ana (preferred hand only) tests showed significant effects. By splitting the studies that included the digit symbol test into two groups and dropping one study,^{5} the authors formed two homogeneous groups, and the effect sizes of both groups were significant. The authors concluded that “The evidence of neurobehavioural deficits at a current blood lead concentration of ∼40 μg/100 ml is obvious.” However, several concerns make this conclusion seem far reaching. Their metaanalysis included studies with a mean current lead concentration of <70 μg/dl. Therefore, many of the observed effects could be attributed to the inclusion of people whose lead concentrations were much higher than those claimed to be the focus of the study. The authors also proposed that their calculated effect sizes for lead exposure might correspond to the effects of aging; this statement is not supported by the data. Moreover, as noted by Balbus-Kornfeld et al,^{6} the effects of current lead exposure may reflect only transient and reversible changes, whereas the interpretation offered by Meyer-Baron and Seeber assumes permanent effects.
Also, concerns about study selection, inclusion and exclusion of tests, and choice of statistical analyses cast doubt on the conclusions of their metaanalysis. For example, Lindgren et al^{40} reported score results for cumulative exposure categories only, so it is inappropriate to combine Lindgren et al^{40} with other studies. Under Meyer-Baron and Seeber's own study criteria, excluding the Lindgren study would also remove the digit span, trail making, and logical memory tests from their metaanalysis.
For the aiming test, the authors used the eye-hand coordination test from Repko et al^{34} and the pursuit aiming test from Maizlish et al^{22} and Chia et al.^{21} However, eye-hand coordination measures time “between successive pulses,” whereas pursuit aiming measures the “number of dotted circles.” Thus, the study by Repko et al^{34} should have been excluded, which would have eliminated the aiming test from their metaanalysis.
Their block design test metaanalysis included data from Lindgren et al,^{40} Haenninen et al,^{28} Valciukas et al,^{4} Campara et al,^{20} and Mantere et al.^{37} As already stated, the Lindgren study should have been excluded, whereas the papers by Baker et al^{3} and Parkinson et al,^{5} which contain block design data, should have been included.
Comparison of our results with those of Meyer-Baron and Seeber^{39} shows that none of the three tests they reported as showing significant effects (block design, Santa Ana, and logical memory) was significant in our analysis. However, our analysis of the digit symbol test with a fixed effects model, unlike the result of Meyer-Baron and Seeber, was significant. In analyses involving a long list of tests, at least some tests would be expected to show significant effects by chance. For the results to be interpreted as reflecting an underlying biological process, consistency of findings across studies would be expected. A lack of consistency should caution against far reaching conclusions about the true biological effects of low to moderate lead concentrations.