W Kent Anger
Correspondence to: Dr W Kent Anger, Oregon Health & Science University, 3181 SW Samuel Jackson Park Road, Portland, OR 97034, USA; anger{at}

The nervous system has, since the earliest recorded history of workplace hazards, been a sensitive target organ for chemical exposures.w1 Technological advances as well as disasters such as the mercury exposures during the 1950s in Minamataw2 and the 1970s in Iraqw3 led to reduced workplace exposures during the mid 1900s and a consequent shift from the detection of obvious debility (for example, tremors, paralysis) detectable with even gross clinical methods,w4, w5 to the detection of subtle subclinical effects. Hänninen and colleagues1 were the first to tackle this issue by studying carbon disulfide exposures in the viscose rayon industry. Hänninen, a clinical psychologist at Finland’s Institute of Occupational Health, employed the tools of her discipline, and experimental psychologists brought in the tools employed in the laboratory. A new field, human behavioural neurotoxicology,w6 began to emerge.


Arguably, the primary stimuli in the USA for human neurobehavioural research on occupational and environmental chemical exposures were the use of behavioural methods to set occupational standards in the Soviet Union,w7 and Beard and Wertheim’s2 exposure chamber research on carbon monoxide. The demonstration that the accuracy of estimates of the duration of a stimulus by college students was reduced by exposure to very low concentrations of carbon monoxide2 convinced federal agencies in the USA that behaviour was the bellwether of damage to the nervous system.w8 Federal programmes that included human neurobehavioural research began in the 1970s at the National Institute for Occupational Safety and Health (NIOSH)w9 and the Environmental Protection Agency (EPA)w10 long after similar programmes were developed in Europe.

Behavioural measures have been well established for a century as reliable and valid indicators of nervous system function in experimental research, performance assessment,w11 and clinical assessment.w12 However, before the development of behavioural neurotoxicology, behavioural measures had not been used to compare populations exposed to chemicals with unexposed controls for evidence that those chemicals had damaged the nervous system. It has now been clear for well over a decade that behavioural measures do reliably identify deficits in populations exposed to neurotoxic chemicals, and have done so in many different countries in populations exposed to marker neurotoxic chemicals.3–6,w13 Further, while consistent patterns of effect are seen for different chemicals,3,w13 and those findings have played a role in risk assessment leading to lower standards,w14–21 it must be noted that there is insufficient specificity to employ these tests to diagnose neurotoxic disorders in individuals.7

The focus of this article is on the behavioural methods used to detect adverse nervous system effects caused by occupational and environmental chemical exposures, typically for population based research in a workplace or community setting.5,w6, w22 While behavioural methods have added significantly to our knowledge of short term exposures due to laboratory research,4 intentional (laboratory) exposures have effectively disappeared from the research scene. Though not addressed in the published literature, the potential for accidents is likely a primary reason.


Recognising the growth in human tests of neurotoxicity in the 1970s and 1980s,3 Barry Johnson of NIOSH and Charles Xintaras (also of NIOSH but then assigned to the World Health Organization) convened an international meeting. They invited research scientists studying neurotoxic exposures in the workplace with, primarily, behavioural measures. An expert group of participants was charged with recommending a battery of tests for detecting neurotoxic effects in humans. Because of the broad range of functions that might conceivably be affected by the vast number of neurotoxic chemicals, very few of which had (or have since) been studied, the group decided on a “screening” battery with the goal of sampling the widest possible range of functions to detect any adverse change, but one that could be completed in an hour. The WHO group selected seven of the most widely used tests in human behavioural neurotoxicology research that were judged to be sensitive to marker neurotoxic chemicals—lead, mercury, and carbon disulfide. This set of tests was named the WHO recommended “neurobehavioral core test battery” (NCTB): digit symbol, digit span, Benton visual retention, pursuit aiming II, simple reaction time (SRT), Santa Ana, and profile of mood states (POMS).8

The recommendation of the NCTB had an impact, but it is difficult to gauge the degree of that impact. While the individual NCTB tests have been used frequently to detect neurotoxicity,3 the most accurate conclusion is that they continued to be used frequently,w13 occasionally as a complete battery.9,w23 The WHO guidance continues to affect test selection and produce new findings (for example, Farahat et al10). The neuropsychological and more broadly neurobehavioural tests in this field were initially administered individually by technically trained personnel or by neuropsychologists, as in Bowler’s CNS/B.11,12,w24, w25 Individual neuropsychological examinations can be highly effective at discriminating neurotoxic effects, but they are expensive and time consuming. Efficiency dictated that the field move to automation. Thus, rather than the NCTB recommendations, the greatest impact on this field during the 1980s and 1990s was the development of computer based testing and specifically the “neurobehavioral evaluation system” (NES)6 and the parallel but less widely used “Swedish performance evaluation system” (SPES).13


Letz and Baker developed the NES in the mid 1980s,14 contemporary with the NCTB. The NES is a computer based testing system that incorporates the cognitive tests from the NCTB and a number of other tests used in clinical neuropsychology. From the broad menu of available tests, a subset is selected for any given study. Letz’s subsequent adaptation of the battery (the NES2) became the dominant testing system in the 1990s, largely documenting the adverse effects of workplace exposures in that decade.6,14–28,w26–33


After the NCTB, a second consensus screening battery was developed, for environmental exposures. When Barry Johnson moved from NIOSH to the Agency for Toxic Substances and Disease Registry (ATSDR), he followed the same approach as he had 10 years earlier with WHO, convening a group of experts to recommend neurobehavioural tests to detect adverse effects of environmental exposures. By way of rationale, it was assumed that chemicals found in the environment would produce exposures at a much lower level than found in the workplace, so more sensitive tests would be needed to detect more subtle deficits. That expert group took the same approach and set in place essentially the same criteria as had the group that proposed the NCTB (indeed, there was some overlap in the membership of these two groups). A wider series of functions was identified than in the NCTB, due to the wider range of functions known to be affected by neurotoxic chemicals in the mid 1990s (for example, the 1990 paper by Anger3 was a primary source for this group), as well as the need for greater sensitivity for the lower exposures found in the environment. The battery was named the “adult environmental neurobehavioral test battery” (AENTB).29

So, by 1994, there were two consensus batteries, respectively for worksite and environmental research, both stimulated by Johnson.8,29 At about the same time, Iregren and Letz30 recommended a “minimum common core computerized battery” (MCCCB), consisting of the symbol digit, tapping, and SRT tests, although this sound recommendation is not often cited. In summary, the growth of human neurobehavioural testing to identify adverse effects of neurotoxic exposures has been guided by the recommendations of the NCTB, which were confirmed and expanded by the AENTB, and fuelled and channelled in the 1990s by the NES2 testing engine.


Extensive cross sectional research has demonstrated that many neurobehavioural tests detect effects of neurotoxic substances. The following exemplify tests that have frequently revealed group differences.3,5,w13

Symbol digit—The digit symbolw34 test of psychomotor performance and its computer based alternative, the symbol digit test of complex scanning and visual tracking, constitute the most widely used and sensitive tests in human behavioural neurotoxicology research.3 These tests present nine symbols, each paired with a number between 1 and 9 in a matrix or 2 × 9 table. Below the matrix is a similar matrix but with only the number (digit symbol) or symbol (symbol digit), and the participant must add the missing member of each pair, as quickly as possible. In the digit symbol, motor performance is more challenging in that people have more practice writing numbers than symbols. While the motor component of the digit symbol would appear to make this a very different test than when the person writes numbers or types them, the two tests correlate well.w35

Digit span—The digit span is a simple test of attention in which a series of numbers between 1 and 9 are read or shown to a participant who must, after the series is completed, repeat the series in order orally or by typing the numbers. The test is then repeated with new numbers, but participants are to repeat them backwards (that is, reverse of the order in which they were read).

Continuous performance test—The continuous performance test (CPT) measures sustained attention. Symbols are presented in an unpredictable order, and the participant has to press a button quickly at the appearance of a pre-selected symbol or when two symbols appear consecutively.

Simple reaction time—The SRT test of response speed presents a visual or auditory stimulus to which the participant is to respond as quickly as possible on a button indicating detection, producing a “reaction time”.

Tapping—The participant is instructed to press a button as many times as possible in a fixed time period, such as 30 seconds, in this test of motor speed and coordination. This may occur with the dominant and subsequently the non-dominant hand, and alternating between two buttons with one or both hands.
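To make the scoring of one of these tests concrete, the digit span procedure described above can be sketched in Python. This is only an illustrative sketch and not the implementation used in the NCTB, NES, BARS, or any other system named in this article: the function names, the example series, and the scoring rule (span taken as the longest series repeated correctly) are assumptions chosen for clarity, and individual testing systems may apply different stopping and scoring rules.

```python
from typing import List, Sequence, Tuple

# One trial: (digits presented, digits the participant repeated)
Trial = Tuple[Sequence[int], Sequence[int]]


def is_correct(presented: Sequence[int], response: Sequence[int],
               backward: bool = False) -> bool:
    """Return True if the response matches the presented series.

    In the backward condition the expected response is the presented
    series in reverse order.
    """
    expected = list(reversed(presented)) if backward else list(presented)
    return list(response) == expected


def span_score(trials: List[Trial], backward: bool = False) -> int:
    """Assumed scoring rule: length of the longest series repeated correctly."""
    correct_lengths = [len(p) for p, r in trials if is_correct(p, r, backward)]
    return max(correct_lengths, default=0)


# Hypothetical data: the last series in each condition is repeated incorrectly.
forward_trials = [([3, 8, 6], [3, 8, 6]),
                  ([6, 1, 5, 8], [6, 1, 5, 8]),
                  ([5, 2, 1, 8, 6], [5, 2, 8, 1, 6])]
backward_trials = [([2, 5], [5, 2]),
                   ([5, 7, 4], [4, 7, 5]),
                   ([7, 2, 9, 6], [6, 9, 7, 2])]

forward_span = span_score(forward_trials)
backward_span = span_score(backward_trials, backward=True)
```

With these hypothetical responses the forward span is 4 and the backward span is 3. Group comparisons of the kind shown in figure 1 would then be made on such per-participant scores.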

While motor and cognitive tests dominated research through the 1980s, sensory tests measuring vibration,w36, w37 colour vision,31,w38 and balance32 emerged to round out, for now, the testing arsenal of human behavioural neurotoxicology. Hudnell and colleaguesw39 recommend that contrast sensitivity be used to adjust other test results for vision, but this call has yet to receive widespread support.

The individual tests described above may be termed screening tests because each test reflects the concerted action of many neurobehavioural functions rather than specific functions that could reveal the behavioural mechanism or the brain system affected. And they are but a small selection from among the hundreds of tests that have been used in this field.w6 Because the brain serves so many functions and so few neurotoxic substances have been studied in humans, the goal of any assessment must be to include tests that would detect a very wide range of potential effects, to avoid omission of the critical function or brain region for any chemical.

Although not reflected in the measures listed above, there is considerable value in using test variants also used in animal research. This affords access to complementary experimental research and ultimately to mechanisms. Paule and colleagues developed the National Center for Toxicological Research (NCTR) operant test battery and use it with both non-human primates and humans,33 Davidson and colleagues34 employ the CANTAB batteryw40 used with both non-human primates and humans, and Anger, Rohlman and colleagues have implemented tests used in the animal literature along with traditional neuropsychological tests in their “behavioral assessment and research system” (BARS).35,36,w41 Figure 1 reflects data from the NCTB recommended digit span test presented in the BARS computer based testing system. Results are presented by age decade from orchard workers exposed to pesticides for a working lifetime and compared to unexposed controls.w42

Figure 1

Digit span backwards (participant repeats a multi-digit number from last digit to first) in orchard workers and unexposed controls. Error bars are standard errors.


Children, too, are exposed to neurotoxic chemicals, and it is widely assumed that they represent the most sensitive end point for some chemical exposures.37,w43 Parenthetically, this same point was made about behavioural tests of nervous system function in the 1970s.38 Children, of course, represent a far more challenging problem from the standpoint of selecting tests because of the many developmental stages through which they progress.w43 The initial exposure studied extensively in children was lead.37,39,w44, w45 Needleman and Bellinger took screening to an even more macro level than had those testing adults. They employed the venerable intelligence quotient (IQ) that summed the results of many different tests and the functions measured by those tests, into a single measure.40,w44 Further still, to maximise their sensitivity, they combined the results of many different studies and in the process defined the meta-analytic approach, in this case to find the lowest level at which lead affected children. They point out that the cost of such global indices is a loss of specificity.7,37 Targeted batteries have been developed to study childhood exposures to individual chemicals, most notably methylmercury in the Seychelles34 and Faroe Islands,41,42 and polychlorinated biphenyls in the Great Lakesw46 and North Carolina.w47

At the same time that ATSDR developed the AENTB, the agency convened an expert group to suggest a parallel series of tests for children. The group recommended a strategy rather than a specific battery.43 ATSDR implemented the strategy following pilot testing in a range of children, naming the system the “pediatric environmental neurobehavioral test battery” (PENTB).44 The PENTB is heavily tilted toward observer or caregiver rating scales for younger children, with performance tests employed in those 4 years old and above. This testing system has not been used in published research, although ATSDR is currently using it to study children exposed to an organophosphate pesticide (Kaye W, personal communication, 5 November 2001).

Generic test batteries for children that employ computer based testing have been developed and evaluated in different population groups. An early example employed the NES2, using tests of adults with substantial support from examiners.25 Rohlman and colleagues45,w41, w48 conducted a series of mini-studies using some PENTB recommended tests and computer based tests from the “behavioral assessment and research system” (BARS) with different parameters in different cultural groups (English speaking US majority, Latino, and Brazilian children). To maintain interest, children earned tokens, redeemable for toys, following correct performance (fig 2 shows a child working on a BARS test), and they used a response unit with nine large buttons to simplify responding (see table 1 for additional information on BARS). This group has developed a battery of tests that can be used across cultures in ages 4 years and above.45,w41, w48

Table 1

Testing systems for human behavioural neurotoxicity research

Figure 2

Picture of child working on a behavioral assessment and research system (BARS) test.

This is a nascent specialisation within behavioural neurotoxicology that has only scratched the surface of the multitude of issues of age, culture, sex, and, cutting across all issues, development.


When selecting neurobehavioural tests to assess potential neurotoxicity in an exposed population, the primary factor should be the chemical(s) to which the target population is exposed. The peer reviewed literature on the chemical and any symptoms reported by the exposed population should drive the initial selection of tests.3,5,w13 In addition, the symbol digit, SRT, and tapping tests recommended by Iregren and Letz30 should be included in the core set of tests.

This article is focused on testing in the field based on the (often unstated) assumption that it is less expensive to bring the tests to the affected population at the workplace or in the community than to bring the population to a clinical testing centre. One of the goals of the 1983 WHO-NIOSH meeting8 was to select tests that could be administered by technically trained individuals. By 1994 and the development of the AENTB, this recommendation had been elevated to recommending the use of computers for test administration whenever possible,29 chiefly to improve consistency and minimise examiner variability and bias. Rohlman and colleagues36 have focused on making the instructions in the BARS system as intuitive as possible to further reduce the need for the technically trained examiners to explain the instructions to the test takers, as have Letz and colleagues23 in the NES3.

Methods employed in human behavioural neurotoxicology

  • Repeated or long term chemical exposure is the primary problem under study today, requiring methods that can distinguish between exposed and non-exposed groups

  • Neurobehavioural methods are reliable and valid indicators of brain function and dysfunction, and they can discriminate group differences caused by chemical exposure

  • The neurobehavioral core test battery (NCTB) provides a consensus recommendation for tests to be used in “screening” for deficits

  • Primary tests include digit symbol/symbol digit, digit span, simple reaction time, tapping, and continuous performance test. Many more tests are employed to measure the diverse nervous system functions that may be affected by chemical exposure

  • Testing children requires a special set of tests, but the only consensus battery for children remains untested, suggesting an especially fertile area of needed research

Testing systems for adults are summarised in table 1, including benefits and limitations. Those batteries that are consensus recommendations, the NCTB and AENTB, are listed first. The following two batteries are the most widely used in recent years, BARS and NES2, followed by the related NES3 that has been used recently. Batteries that have been recommended for neurotoxicity research in adults, but with limited results to date, are listed last. Computer based tests, when affordable, can be superior if they eliminate the inconsistencies inherent to individual administration without compromising instruction clarity, and if they present the tests with graphics equivalent to “paper” versions.36,62 The computer based batteries offer a fully packaged option, in cases where the relevant tests are available.


One of the most frequently voiced concerns of industry regarding neurobehavioural testing is that workers will purposefully do poorly on tests to change working conditions in their favour, and that this will reduce their productivity. While the concerns are reasonable, experience suggests that most people work in a manner that may be described as “trying one’s best”, an instruction that should be paraphrased in any study. Anger and colleaguesw22 confronted this issue when developing a test battery to identify neurobehavioural effects from possible but unknown exposures to US military personnel during the Persian Gulf War. They drew from the robust literature addressing purposeful attempts to perform poorly on neurobehavioural tests, termed malingering or more generically “motivation”.w96 Anger and colleaguesw22 modified Binder’sw96, w97 test of motivation into the “Oregon dual task procedure” (ODTP). This test instructs the participant that the test will become progressively more difficult because of an increasing delay. While the increasing delay does not produce any significant increase in difficulty, people with options for secondary gain for poor performance demonstrate highly significant increases in errors.w96, w97 There was no evidence of poor motivation or malingering in the veterans tested, although neurobehavioural deficits did emerge in a subset of those tested.w22, w72, w75

Questionnaire measures of symptoms, standardised if possible, should be included in most neurotoxicity research. While listening to the target population is of course important, standardised questionnaires increase the likelihood of incorporating a range of symptoms known to be associated with neurotoxic exposures.66,w98 There are specialised tests for both depression (for example, Beck depression inventory (BDI)w99) and anxiety (for example, Beck anxiety inventory (BAI)w100) that have been employed to reveal symptoms and can be used to rule out competing explanations of poor performance. Anger and colleagues have employed the widely used SF-36 test of psychological and physical health symptoms, available at no cost, in studies of both Persian Gulf veterans and solvent exposures to exclude those with serious psychological distress from data analyses and at the same time provide a standardised measure of symptoms.w22, w74, w101 Of course, in some cases, psychological distress may be the outcome of the exposure incident, rather than the chemical itself.11,67,w24, w102–105


Often, neurotoxicity assessments are conducted by personnel with intimate knowledge of the exposure but limited knowledge of neurobehavioural testing. While it is well within the capabilities of diverse health professionals to carry out a neurotoxicity assessment, extensive consultation during planning with one of the testing system developers identified in table 1 (or other established investigators cited here) is essential to the conduct of a competent, interpretable study. This is due particularly to limitations in the tests and the testing systems that may not be immediately apparent. Notably, those from different educational and cultural groups may need adaptations of the testing battery to obtain viable data.53 For example, people with little or no education cannot use pencils because they do not have the coordination learned in practice with writing instruments.46,49

Once a set of tests has been selected, they should be administered in exactly the planned manner to a sample drawn from the target population for the research study. This is especially important when the sample is to be drawn from a cultural group that is different from that with which the tests were developed and validated.w41, w106 This provides an accurate estimate of the time duration and range of per-subject testing and the mean and variance of the target population for statistical power calculations. Power calculations should be based on an estimate of the quantitative difference that is realistically or clinically relevant and that the investigator will later want to describe as “a deficit”. This calculation should also take into consideration the number of test measures that will be compared (exposed v controls) to confront statistically the issue of multiple comparisons. If the number of tests is so large as to jeopardise the potential for detecting realistic effects or performance deficits, “primary measures” should be designated, leaving the remaining measures as hypothesis generating.w106 Alternatively, measures could be combined into a single metric for analyses,w107 although this should be undertaken only in concert with analyses of the individual measures. As sufficient data accumulate on a specific chemical, along with sufficient exposure data, techniques such as regression analysis or principal components analysis may extract useful outcome measures from multiple tests or single tests with multiple measures. Finally, the comparison group needs to be chosen carefully to be the same as the exposed group(s), particularly in age, sex, ethnicity, and cultural background, and it is these factors that should be included in the analysis,21,55,63,w108, w109 at a minimum.w110
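The power calculation described above can be illustrated with a short Python sketch. It is not a method taken from the studies cited here: it assumes a two sided comparison of group means using the standard normal approximation, n per group = 2((z(1 − α/2m) + z(power)) / (Δ/SD))², with a Bonferroni adjustment for m test measures as one simple way to account for multiple comparisons. The function name `n_per_group` and the pilot values used below are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sd: float, n_measures: int = 1,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of participants needed per group (exposed v controls).

    delta      -- smallest difference the investigator would call "a deficit"
    sd         -- standard deviation estimated from the pilot sample
    n_measures -- number of test measures compared (Bonferroni adjustment)
    """
    if delta <= 0 or sd <= 0 or n_measures < 1:
        raise ValueError("delta and sd must be positive; n_measures must be >= 1")
    adjusted_alpha = alpha / n_measures              # Bonferroni correction
    z_alpha = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    effect_size = delta / sd                         # standardised difference
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical pilot estimates: a 2.5 point deficit with an SD of 5.0 points.
single_measure = n_per_group(delta=2.5, sd=5.0)
ten_measures = n_per_group(delta=2.5, sd=5.0, n_measures=10)
```

Under these assumed values, about 63 participants per group are needed for a single measure and about 107 per group when 10 measures are each tested at the adjusted significance level. The increase shows why designating a small number of “primary measures” can preserve the power to detect realistic deficits.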

Planning human neurobehavioural research to detect exposure effects

  • The primary factor guiding test selection is the chemical(s) to which the target population is exposed

  • Computer based test systems are preferred for standardisation and to avoid bias

  • Include supplemental tests of motivation, standardised measures of physical and psychological symptoms and intellectual or educational level

  • Administer tests to a sample from the target population to identify problems and develop an accurate estimate of statistical power, considering multiple comparisons

  • Select a comparison population as similar to exposed participants as possible

  • Collect information on sex, age, education (years), and cultural or ethnic group for data analysis, at a minimum


Human behavioural neurotoxicity research programmes began with the development of methods and were soon in the laboratory and the field. Extramural research at universities became focused on lead in children and heavy metals and solvents in workers. In the 1990s, however, the large European research programmes turned to problems other than neurotoxicity, although a focus on solvent research continued with ever more people being diagnosed with solvent encephalopathy.w111 In the USA, NIOSH’s National Occupational Research Agenda (NORA) dropped neurotoxicity as a priority,w112 and the US EPA scaled back on human research. Extramural (that is, university based) funding has grown in recent years, although much of it has been pre-focused by funding agencies to create large studies such as research on Gulf War veterans,w22, w73–75, w77, w113, w114 lead,68,w115 and mercury,34,41,42 which limits investigator initiated ideas to the topic at hand. Other priorities have been established by evidence of serious memory disturbances in a few people who came into contact with Pfiesteria,w116 and by new evidence on manganese31,58–60,69–71,w82–84 and fuel exposures,w72 while solvent research continues to thrive in Europe.24,66,w30, w117, w118 Research on exposures in children has grown substantially. There are large studies on methylmercury in the Seychelles34 and the Faroe Islands,41,42 organic mercury exposure from amalgam placement in the USA and Portugal,w119 and agricultural exposures to migrant children as well as their parents working in agriculture.45,w41, w48, w70 As this implies, research on neurotoxicant exposures to children is an area of growing funding, and it is in this area that neurobehavioural test batteries are the least well developed. This suggests one opportunity for scientists in the field of behavioural neurotoxicology.


  1. What differentiates the NCTB and AENTB from the NES2, BARS, and SPES?

    1. The NCTB and AENTB are consensus test recommendations developed by expert groups, while the NES2, BARS, and SPES are computerised testing systems that implement recommended tests

    2. The NES2, BARS, and SPES are consensus test recommendations developed by expert groups, while the NCTB and AENTB are computerised testing systems that implement recommended tests

    3. The NCTB and AENTB are widely used testing systems while the others are not

    4. Nothing. The NCTB, AENTB, NES2, BARS, and SPES are all widely used testing systems to assess neurotoxicity

    5. The NCTB and AENTB assess psychological symptoms while the NES2, BARS, and SPES assess neurobehavioural performance

  2. Which of the following is not a behavioural test used to assess neurotoxicity in humans?

    1. Symbol digit

    2. Digit symbol

    3. Mini-mult

    4. Continuous performance test

    5. Digit span

  3. Which of the following is not a true statement about preparing and developing a study to assess neurotoxic chemicals in humans?

    1. Symptoms are a major basis for choosing tests for the study

    2. Computerised tests are always the preferred choice for neurobehavioural testing to improve consistency and eliminate bias, regardless of the target population

    3. Select a core set of tests from consensus test recommendations

    4. Include tests of motivation when possible

    5. Before initiating any study, test 10–15 people from the target population to assess the duration of testing and statistical power factors

  4. Which of the following is a true statement about neurotoxicity testing in children?

    1. The WHO recommended NCTB was specifically developed to assess neurotoxic effects in children

    2. It is not difficult to select appropriate neurobehavioural tests to assess neurotoxic effects in children because they are highly sensitive to neurotoxic chemicals

    3. The ATSDR pediatric environmental neurobehavioral test battery (PENTB) has revealed adverse effects of organophosphate pesticides, in a recent publication

    4. Because of the extensive neurobehavioural research that has demonstrated adverse effects of lead and solvents in children, the tests can be used to diagnose lead and solvent poisoning in individual children

    5. The NCTR operant test battery, CANTAB and BARS are testing systems that include tests derived from animal research and have been used to assess neurotoxicity in children

  5. Which of the following is a major area of research opportunity in human behavioural neurotoxicology?

    1. Assessing the validity of widely used neurobehavioural tests from the WHO recommended NCTB

    2. Developing new computerised neurobehavioural tests for adults

    3. Research on neurobehavioural tests to assess neurotoxic effects in children

    4. Developing a test of liver damage to parallel adverse neurobehavioural effects

    5. Developing a new research programme focused on the adverse effects of mercury exposures


This article was supported by funding from NIEHS R01 ES08707 and the Center for Research on Occupational and Environmental Toxicology at Oregon Health & Science University. The author is a co-developer of the BARS computerised testing system and he is also associated with the development of the NCTB and AENTB, all of which are favourably reviewed in this article. Other colleagues and co-developers of BARS who have contributed to the information in this article are Drs Diane S Rohlman, Daniel Storzbach, and David A Eckerman.


Supplementary materials

    Web-only References
    Available as a PDF (printer-friendly file)
