Abstract
Community based case–control studies are an efficient means to study disease aetiologies, and may be the only practical means to investigate rare diseases. However, exposure assessment remains problematic. We review the literature on the validity and reliability of common case–control exposure assessment methods: occupational histories, job–exposure matrices (JEMs), self reported exposures, and expert assessments. Given the variable quality of current exposure assessment techniques, we suggest methods to improve assessments, including the incorporation of hygiene measurements: using data from administrative exposure databases; using results of studies identifying determinants of exposure to develop questionnaires; and where reasonable given latency and biological half life considerations, directly measuring exposures of study subjects.
- case-control studies
- occupational exposure
- reproducibility of results
Community based case–control studies remain one of the most efficient epidemiological study designs, especially for investigating the aetiologies of rare diseases. For certain extremely low incidence outcomes, such as childhood cancers, case–control studies may be the only viable study method. In comparison to cohort studies, the other most common design used in occupational epidemiology, exposure assessment in case–control studies offers certain advantages, but also poses major challenges.
For exposures which occur in widely dispersed segments of the population, a population based case–control design theoretically allows examination of the broadest possible range of exposure levels, though the prevalence of exposure to most agents is likely to be low. When the exposed individuals are scattered in small worksites (for example, farmers), a case–control study centred in a geographical area where these workers reside may be logistically simpler than assembling a cohort. Perhaps most importantly, case–control studies offer the opportunity to enumerate multiple exposures, including occupational and residential exposures throughout a subject's lifetime, as well as medical and lifestyle factors that may confound or modify an exposure–disease association. Information on such a broad range of exposures is generally not available in industry based cohort studies.
Despite these advantages, exposure assessment remains one of the most problematic elements of case–control studies. Exposure data are usually gathered by interviewer administered questionnaires, or occasionally from mailed questionnaires, medical records, or vital statistics. Exposures ascertained using these sources are almost never quantitative measurements, but subject or proxy reported job histories, tasks, or recalled exposures to specific agents. On occasion, expert judgement is used to infer exposures from job histories, or to review and modify exposure self reports. The merit of these methods is therefore an essential consideration in the interpretation of study results.
The purpose of this paper is to review evidence about the validity and reliability of qualitative or semiquantitative exposure assessment techniques commonly used in case–control studies, with the aim of identifying means to optimise these methods. In addition, we will discuss some opportunities for greater quantification of exposures in case–control studies.
METHODS
A number of methods were used to gather the literature. The Medline database was searched from 1966 to April 2001 using the following terms: validity, reliability, sensitivity, specificity, agreement, kappa, intraclass, reproducibility of results, expert, subjective estimate, self-report, exposure estimate, semiquantitative estimate, qualitative estimate, or job–exposure matrix. Search results were limited using the following terms: occupation, hygiene, work, job, industry, or occupational exposure. All English and French abstracts and/or titles were reviewed for relevance.
There is little standardised terminology for identifying the literature on validity and reliability of exposure assessment methods. Therefore more manual search methods were also used, including a review of the citations in identified articles and the publications resulting from four international initiatives on assessment of occupational exposures in epidemiology: a conference in Woods Hole, USA in 1988 (reported in Rappaport SM, Smith TJ, eds, Exposure Assessment for Epidemiology and Hazard Control, Chelsea, MI: Lewis Publishers, 1991); a conference in Leesburg, USA in 1990 (reported in Applied Occupational and Environmental Hygiene 1991;6:417–558); a European concerted action (reported in International Journal of Epidemiology 1993;22(suppl 2):S1–S133); and a conference in Lyon, France in 1994 (reported in Occupational Hygiene 1996;3:1–208). Stewart and Dosemeci's bibliography of exposure assessment literature1 was also consulted, as were two review articles on exposure assessment in case–control studies.2,3
This review does not include studies of the following issues: proxy reporting, questionnaire delivery methods, ergonomic or work organisation exposures, and industry specific job–exposure matrices developed for cohort studies or industry based nested case–control studies.
The paper begins with a review of the most common exposure assessment methods used in population based case–control studies: subject reported occupational histories; use of occupational histories to infer exposure (that is, job–exposure matrices); self reported exposures; and expert assessment of exposures. It then examines additional methods which should allow more quantification of exposure: using measurements from exposure databases; using determinants of exposure studies to design exposure questionnaires; and measuring exposures among study subjects. Some of the terminology of validity and reliability studies is briefly described in the appendix.
OCCUPATIONAL HISTORIES
Collection of data on each subject's employment history, including product manufactured or service provided, job title, and usual duties, has become a routine part of many population based case–control studies using questionnaires. Studies using medical records, birth or death certificates, or other administrative data sources also usually include information on at least one job, often the most recent or usual job. Data on occupation and industry, whether from medical records or questionnaires, are usually derived from self reports or, when a subject is dead or in some way incapable, reports by next of kin.
A number of studies4–16 have examined the validity of self reported occupational histories by comparisons with company, pension, or union records; others have examined reliability by comparisons to previous self reports (table 1). Validity and reliability studies report rather consistent results, with levels of raw agreement for employer, job classification, person-years in a job, and start and termination dates generally in the range of 70–90% and with kappas from 0.65 to 0.82.5,7–16 Some studies within single industries found lower agreement on the number of work area assignments (50.6%),10 job title held longest (67%),8 and starting date (62%),11 perhaps because there may be minor distinctions between jobs within a company that are difficult to elicit by questionnaire.
The reliability and validity of occupational histories have also been tested by examining whether they can be used to accurately assign exposures. Rosenberg17 examined the reliability of estimates of cumulative PCB exposure based on occupational histories taken first in 1976, then again in 1979. Average measured exposure in each job was cumulated using the two job histories and the results compared: the intraclass correlation was 0.94 for early jobs and 0.96 for jobs in the most recent 10 years. Birdsong and colleagues18 assessed the validity of solvent exposure assignments based on self reported job histories by comparisons to those based on personnel records, and found that 99% of subjects were correctly classified as exposed or unexposed, but that the correlation between measures of exposure duration was only moderate (Pearson r = 0.63).
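The cumulation step described above can be sketched as follows. This is a minimal illustration, not the procedure of the cited study: the job names, intensities, and dates are hypothetical, and cumulative exposure is assumed to be the sum of mean intensity multiplied by years in each job.

```python
from dataclasses import dataclass


@dataclass
class JobEpisode:
    job: str
    start_year: float
    end_year: float


def cumulative_exposure(history, mean_intensity):
    """Sum of (mean measured intensity in job) x (years in job)."""
    total = 0.0
    for episode in history:
        years = episode.end_year - episode.start_year
        # Jobs with no measured intensity are treated as unexposed (an assumption).
        total += mean_intensity.get(episode.job, 0.0) * years
    return total


# Hypothetical job-level mean intensities, arbitrary units.
mean_intensity = {"winder": 2.0, "assembler": 0.5}

# The same subject's history as reported at two interviews (hypothetical).
first_report = [JobEpisode("winder", 1960, 1965), JobEpisode("assembler", 1965, 1975)]
second_report = [JobEpisode("winder", 1960, 1966), JobEpisode("assembler", 1966, 1975)]

cum_first = cumulative_exposure(first_report, mean_intensity)
cum_second = cumulative_exposure(second_report, mean_intensity)
print(cum_first, cum_second)
```

Repeating this for every subject gives two cumulative exposure estimates per person, which can then be compared with an intraclass correlation.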
True validities of self reported jobs are likely to be somewhat higher than measured, as the reference standards are not likely to be true gold standards.5 For example, personnel records may not reflect changes in the tasks an employee performs if the title or pay has not changed. Conversely, human resources personnel may record a change in job title when the functional characteristics of a job are unaltered. In addition, jobs may simply be labelled differently in administrative records than by employees. Reliability studies should avoid problems with job title terminology, because they test recall of a person's own way of describing a job.
Two reliability studies12,14 raise the parallel issue of job coding: even if job histories are well reported by study subjects, the way that research staff code each job can affect the exposure group to which a subject is assigned. Wärneryd and colleagues14 found the worst agreement for difficult to code clerical and administrative jobs. Kennedy and colleagues19 found that errors in coding jobs were responsible for reducing an odds ratio for asthma of 1.5 to 1.0, because a job's potential for exposure to known allergens could not be properly classified when incorrectly coded.
Factors consistently found to reduce validity and reliability of occupational histories include increasing complexity of a subject's occupational history, shorter duration of a job, and longer period of recall.5–7,10,11,13,14,17,18 Other factors, such as age, race, language, and education had either little or no association with recall.5,7,8,13,18 Two studies were able to check for differences in validity of job reporting between cases and controls, and found no evidence to suggest recall bias.5,10
Given the reasonable quality of self reported occupational histories, epidemiological analyses by occupation and industry are likely to be useful initial steps towards the identification of hazardous exposures. Where exposures to complex mixtures are of interest, an industry or occupation may be an appropriate way to represent the combined exposures. The main shortcoming of analyses by occupation and industry is that they do not identify specific agents as risk factors. For example, painters may be exposed to solvents, but they also have varying potential for exposures to other agents, such as metals, pesticides, isocyanates, epoxies, wood dust, formaldehyde, and silica. In addition, although most painters use solvents, some may not. An increased risk in a job can only be suggestive of risks from particular agents. In addition, the lack of an association with a particular job may mask the effect of an agent to which only some individuals in the job are exposed.
EXPOSURE MATRICES: USING JOBS TO INFER EXPOSURES
In an effort to use the reasonably accurate recall by subjects of their occupational histories, but overcome the indirect connection to exposures, there was a movement in occupational epidemiology starting in the 1980s to develop job–exposure matrices (JEMs). JEMs list a wide range of occupations and/or industries on one axis, a wide range of exposure agents on the other, and the cells of the matrix indicate the presence, intensity, frequency, and/or probability of exposure to a specific agent in a specific job. In some JEMs, calendar period may form a third axis of the matrix. Industry based cohort studies have long used this matrix format for assigning exposures to cohort members' job histories within a company; the new idea was to create JEMs which could describe exposures across the range of jobs and industries that might be observed in a population based study.
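The matrix structure can be illustrated with a small lookup table. The sketch below is hypothetical throughout: the job codes, agents, calendar periods, and cell values are invented for illustration and do not come from any published JEM.

```python
# Hypothetical JEM cells keyed by (job, agent). Each cell lists calendar periods
# as (first_year, last_year, probability of exposure, intensity category).
JEM = {
    ("painter", "solvents"): [(1950, 1979, 0.9, "high"), (1980, 2000, 0.9, "medium")],
    ("painter", "silica"): [(1950, 2000, 0.2, "low")],
    ("clerk", "solvents"): [(1950, 2000, 0.0, "none")],
}


def lookup(job, agent, year):
    """Return the JEM cell for a job, agent, and calendar year, or None if uncovered."""
    for first, last, probability, intensity in JEM.get((job, agent), []):
        if first <= year <= last:
            return {"probability": probability, "intensity": intensity}
    return None  # this job-agent-period combination is not in the matrix


painter_1975 = lookup("painter", "solvents", 1975)
welder_1975 = lookup("welder", "solvents", 1975)
print(painter_1975, welder_1975)
```

Every subject with the same job in the same period receives the same cell value, which is why a matrix of this kind cannot represent variation in exposure within a job.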
Several such JEMs, using European or American occupation and industry coding systems, have been made publicly available (hereafter, these are called “generic” JEMs, in contrast to study specific JEMs). Some were created using expert judgement, usually aided by published literature and communication with industry personnel19–24; others were based on observations of potential exposure to hazardous agents in walkthrough surveys of a representative sample of US worksites25; a more recent Finnish JEM used a database of exposure measurements to aid expert assessments26; and a Swedish JEM of magnetic field exposures was created using measurement data.27
Table 2 lists studies23,29–45 which have attempted to examine the validity of generic JEMs.20–23,25,26,28 Only one of these used quantitative exposure measurements as the basis of evaluation. Tielemans and colleagues43 compared the JEM of Hoar and colleagues20 to urinary measurements of toluene, xylene, and chromium, and found only slight agreement and low specificities and sensitivities. Several studies examined agreement between two generic JEMs. Most found kappas to be slight to fair.23,30,33,34,41 Other investigators have compared JEMs to self reports23,30,33,34,36,45 or expert assessments.31,32,35,37,38,40,42,45 Although neither self reports nor expert assessments can be considered gold standards, sensitivity and specificity of the JEMs against these assessment techniques were the usual comparison measures. Sensitivities were most often below 0.5, with specificities usually higher, above 0.85. Kappas for agreement tended to be low, similar to the JEM to JEM comparisons. Some studies compared odds ratios derived from generic JEMs and study specific expert exposure evaluations.31,40 Although both methods produced increased odds ratios where expected, only the study specific expert assessments produced clear exposure–response trends. In McNamee's evaluation, a study specific JEM also performed better than a generic JEM.40 Study specific “internal” JEMs are, in most instances, essentially the same as expert assessments; these are discussed later in the review.
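For readers less familiar with these comparison measures, the sketch below computes sensitivity, specificity, and Cohen's kappa from a 2 × 2 table of a JEM against a reference method. The counts are hypothetical, chosen only to resemble the pattern described above (low sensitivity, high specificity, modest kappa).

```python
def agreement_stats(tp, fp, fn, tn):
    """Sensitivity, specificity, and Cohen's kappa for a 2 x 2 table.

    tp: exposed by both methods; fp: exposed by the JEM only;
    fn: exposed by the reference only; tn: unexposed by both.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts: 500 subjects, 50 exposed according to the reference method.
sens, spec, kappa = agreement_stats(tp=20, fp=45, fn=30, tn=405)
print(round(sens, 2), round(spec, 2), round(kappa, 2))
```

With these counts, raw agreement is 85% yet kappa is low, because most of the agreement arises from the large number of unexposed subjects.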
Most authors investigating the properties of generic JEMs concluded that they were not sensitive, and in only slight to fair agreement with techniques in which they had more confidence. The often low sensitivities of generic JEMs are understandable given the number of cells for which exposures need to be evaluated, and the often unpredictable circumstances in which exposures may occur in industry. A major factor which contributes to the poor performance of generic JEMs is their inability to account for variability in exposures within jobs or, in most cases, across time.19,31,35,36,41,45 In addition, generic JEMs may not be useful if the jobs or exposures under investigation are not included in the matrix, or are grouped in such a way as to obscure their impact. These limitations have tempered the early enthusiasm for generic JEMs and promoted study specific exposure assessment methods.
SELF REPORTED EXPOSURES
Questionnaires used in case–control studies commonly ask about more than a subject's occupational history, querying use of specific agents, trade name products, or classes of compounds. Over the past two decades, there have been numerous reports4,9,10,42,43,46–72 examining the validity and reliability of this method of exposure assessment (table 3).
Many have compared self reported exposures to industrial hygiene measurements of exposure to one or a few agents. Most of these have found significant associations between the two measures, though the proportions of variance in exposure explained (R2) by the self reports have varied widely, from as low as 0.03 to as high as 0.71, with a median of about 0.2.47,49,51,53,58,67–69 Some of the problem is likely to lie with the gold standard. Self reported exposures are often elicited to represent “usual” exposures, whereas exposure measurements quantify exposure over individual shifts. Exposures are well known to be extremely variable over time and place,47,49,51,53,58,68 so even a single worker may have measurements on different days that vary by orders of magnitude. This day to day variability can account for a large proportion of exposure variability, but is not meant to be explained by self reports.47,51,68 When Kromhout and colleagues47 restricted exposure variability to the between task variability estimated by the subjects, the median proportion of variance explained improved somewhat, from 0.14 to 0.23; though the range over all plants and substances became even wider (0 to 0.62).
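The effect of day to day variability on these comparisons can be illustrated with a simple variance components calculation. The sketch assumes an additive model in which measured exposure is the sum of a group mean and independent day to day variation, and in which a self report tracks the group mean perfectly. Under those assumptions, the proportion of variance that the self report can explain is capped by the ratio of between group variance to total variance. The variance values are hypothetical.

```python
def r2_ceiling(between_var, within_var, shifts=1):
    """Upper bound on R^2 for a perfect estimate of usual exposure.

    between_var: variance of long term mean exposure between groups.
    within_var: day to day variance around each group's mean.
    shifts: number of measured shifts averaged per subject.
    """
    return between_var / (between_var + within_var / shifts)


# Hypothetical case: day to day variance three times the between group variance.
single_shift = r2_ceiling(1.0, 3.0)
three_shifts = r2_ceiling(1.0, 3.0, shifts=3)
print(single_shift, three_shifts)
```

In this hypothetical case even a perfect self report could explain only a quarter of the variance in single shift measurements; averaging three shifts per subject raises the ceiling to one half.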
Two studies summarised the validity of self reports against quantitative measurements in terms of sensitivity and specificity.43,62 Both measures were extremely variable, ranging from 0 to 0.85 and from 0.34 to 1.0, respectively.
Other studies have compared self reported exposure estimates to estimates by experts (note that sometimes the experts used the self reported exposures or jobs as one of their data sources). In these studies, kappa for agreement was the most frequent measure of validity. Once again, a striking characteristic of the measures of agreement was their variability from study to study and within studies for different agents, with kappas varying from −0.05 to 0.94, median ∼0.6.9,56,61,63,64,70
A few studies examined the reliability of self reported exposures estimated at different points in time. Kappas and intraclass correlation coefficients ranged from 0.36 to 0.84, median ∼0.6.50,54,56,59,65 Proportions of variance explained in continuous measures ranged from 0.16 to 0.84, median ∼0.6.4,71
Two studies examined the characteristics of both generic JEMs and self reported exposures. Rybicki and colleagues42 compared the two methods to an expert industrial hygiene review of exposures to copper, lead, and iron. They found that self reports had much higher sensitivities (0.65 to 0.84) than the JEM25 (0 to 0.21), and slightly improved specificities (0.88 to 0.96 versus 0.86 to 0.93). Tielemans and others43 used urinary measurements of chromium, toluene, and xylene as the basis for validity comparisons. Again sensitivities were higher using exposure self reports (0.41 for chromium and 0.85 for the solvents) than for the JEM20 (0.26 and 0.6, respectively); however, specificities suffered as a result (0.68 for chromium and 0.34 for the solvents versus 0.79 and 0.63 respectively for the JEM), and therefore so did positive predictive values.
Given the variability in subjects' ability to accurately and reliably report their own exposures, it is worthwhile to consider whether there are characteristics that are consistently associated with improved reporting. Investigators have found that subjects were better able to estimate exposure to agents which they can easily sense, for example, solvents they can smell,47,52 dusts with larger particle sizes,68 and vibrations they can feel.65,72 In a similar vein, they were more able to report exposures when queried in terms they recognised, for example, “oils and greases”, “degreasers”, or “stainless steel”, rather than about specific chemical compounds, for example, “chromium” or “imidazoline”.55,62 Those involved in the purchasing or selection of chemicals were more likely to accurately recall exposures (for example, farmers or applicators using pesticides),57,66 than labourers who were not involved in such tasks (for example, farmworkers harvesting crops).73 Most investigators prompt recall with a list of exposure agents of interest. This method resulted in higher sensitivities than open ended questioning, without an equivalent loss in specificity.57,62,74 Other characteristics of subjects, such as age, sex, duration of employment, socioeconomic status, education, disease symptoms, and language had little or no effect on the accuracy of reporting exposures.42,55,59,65,68
An important concern with exposure self reports is recall bias—that is, whether reporting is influenced by disease status. Most investigators who compared the responses of cases and controls found little or no difference in the validity or reliability of their exposure assessments.55,57,63,70 Rodvall and colleagues64 did find some variations in the accuracy of reporting between cases and controls; for some agents cases were better estimators, for some controls were better, but for most agents there was little substantive difference. A recent study indicated that exposures volunteered on open ended questioning were more likely to be subject to recall bias than exposures cited after probing with a list of agents.75 There is also evidence that the potential for recall bias may be greater in studies which use subjective measures of both exposure and outcome (a design more commonly used in cross sectional studies).58
A difficulty that subjects face when deciding whether to report exposures is the lack of relative or objective benchmarks against which to judge their work conditions. For example, office workers whose building was sprayed with insecticides might consider themselves exposed, but might not give the same answer if asked to compare their exposure to that of pesticide applicators. In the study of Ising and colleagues,67 subjects were able to categorise their noise exposure intensity very well; they were provided with examples of well known machines against which to gauge each noise category. In the studies of Kromhout and colleagues,47 Hertzman and colleagues,49 and Teschke and colleagues,51 workers who rated only their own exposures tended to do so less well than workers or supervisors who ranked exposures in all jobs, illustrating that even relative comparisons help subjects put their exposures in context.
The variable quality of self reported exposure information indicates that although subjects can reliably and accurately report exposures in certain circumstances, it is also possible for subjects to provide exposure data of such low quality that true exposure–effect relations will be obscured or even reversed in direction.76 It is incumbent on study designers to consider features which improve subjects' reporting accuracy, including prompted questions about agents they can sense, using familiar terms common in worksite discourse, and presenting guideposts which will help them to place their exposure in relation to that of others.
EXPERT ASSESSMENT OF EXPOSURES
There has been an increasing trend to use experts, such as occupational hygienists, chemists, engineers, and other professionals, to infer exposures from job histories or make exposure estimates based on review of subject reported information. Experts are expected to have a better vantage point than subjects: by training, they understand the mechanisms of occupational exposures and know where to find data about them; within the context of a study, they know the types of exposures considered relevant; and based on study data, they have an overview of the range of jobs whose exposures need to be estimated. But experts also bear some handicaps: they may not be familiar with many of the jobs and industries which appear in subjects' occupational histories; and unless they have detailed reports from subjects, they are certainly unlikely to be aware of conditions present in specific worksites of subjects. How these trade offs balance can be examined through studies of the validity and reliability of experts' exposure assessments (table 4).43,47,51,77–93
Because expert assessments have generally been considered the best possible exposure estimation method short of exposure measurements,2 studies examining their validity have exclusively used comparisons to measurements. As in similar tests of subjects' self reports, these validity studies have examined experts' estimates of exposure intensity for only a few agents. Many have reported results in such a way that the proportions of variance explained can be compared. As noted for self reported exposures, variability in the validity results is the most striking feature, with proportions of variance explained ranging from 0 to 0.86, with a median of about 0.3.47,51,81,84,86,92 These results are slightly better overall than those of self reports. As with self reported exposures, it is likely that a portion of the unexplained variability is caused by day to day variation in measured exposure. The report of Kromhout and colleagues47 excluded this variation, and found a considerable improvement in the median proportion of variance explained, from 0.25 to 0.45; though the range over all plants, substances, and hygienist estimators once again increased somewhat (0 to 0.63).
Two studies examined validity in terms of sensitivity and specificity.43,88 The sensitivities were extremely variable, ranging from 0.21 to 0.79, median 0.35, but specificities were higher and more stable, from 0.91 to 0.98. In studies where exposure prevalence is low, as in most case–control studies, it is vital to maximise specificity to minimise attenuation of effect estimates as a result of exposure misclassification94; therefore the high specificities are an encouraging result.
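The importance of specificity at low exposure prevalence can be shown with a short calculation. The sketch assumes non-differential misclassification (the same sensitivity and specificity in cases and controls); the true odds ratio, exposure prevalence, and error rates are hypothetical.

```python
def observed_odds_ratio(true_or, prevalence_controls, sensitivity, specificity):
    """Expected odds ratio after non-differential exposure misclassification."""
    odds_controls = prevalence_controls / (1 - prevalence_controls)
    odds_cases = true_or * odds_controls
    prevalence_cases = odds_cases / (1 + odds_cases)

    def classified_exposed(p):
        # True positives plus false positives among the truly unexposed.
        return sensitivity * p + (1 - specificity) * (1 - p)

    p_cases = classified_exposed(prevalence_cases)
    p_controls = classified_exposed(prevalence_controls)
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


# Hypothetical scenario: true odds ratio 2.0, 5% of controls truly exposed.
or_low_sensitivity = observed_odds_ratio(2.0, 0.05, sensitivity=0.5, specificity=1.0)
or_low_specificity = observed_odds_ratio(2.0, 0.05, sensitivity=1.0, specificity=0.9)
print(round(or_low_sensitivity, 2), round(or_low_specificity, 2))
```

In this scenario, halving sensitivity while keeping specificity perfect barely changes the odds ratio (about 1.95), whereas a specificity of 0.9 with perfect sensitivity attenuates it to about 1.34, because false positives among the many unexposed subjects swamp the few who are truly exposed.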
Studies examining agreement between experts' ratings have mainly compared exposure assessments of different experts, with kappas or intraclass correlation coefficients ranging from 0 to 1.0 with a median of about 0.6.51,78,79,82,83,87,90,91 Two studies have examined repeatability of ratings by the same experts, with similar results (kappas from 0.26 to 0.77, median ∼0.6).89,91
Three of the studies examining the validity of experts' assessments against exposure measurements similarly examined the validity of self reports, so provide a basis for comparison. Kromhout and colleagues47 found slightly higher proportions of variance in solvent and dust measurements explained by hygienists' estimates, as did Teschke and colleagues51 in a study of chlorophenate fungicide exposures. In the study by Tielemans and colleagues43 of solvent and chromium exposure, sensitivities were higher for self reported exposures, but specificities and positive predictive values were higher for the experts' estimates.
Although expert assessments are often thought of as a single method, many different assessment structures and tools can be used by experts to assign exposures in case–control studies. One common structure involves using a subject's job description as the basis for assigning exposures; another is to have experts estimate exposures of jobs and/or industries, without subject supplied information. The data used to create exposure estimates are often published literature and judgement, as used in many of the first generic JEMs.20–23,28 “Internal” JEMs differ from generic JEMs in that the exposures and jobs selected for assessment are study specific, and the assessors can be chosen for their particular expertise in these areas. Experts' estimates can be made subject specific, usually by providing experts with subjects' self reported exposure and job duty information. In a method developed by Gérin and colleagues95 and elaborated for more jobs by Stewart and colleagues,96 experts are guided by subjects' answers to detailed questions about tasks, materials, equipment, and control measures in occupation or industry specific modules. Finally, some expert assessment methods augment the above tools with whatever measurement data might be available, for example, measurements of similar jobs or industries from national exposure databases.97
Several studies have compared the validity and reliability of different levels of expert assessment. Stewart and colleagues93 evaluated experts' assessments of formaldehyde exposure in manufacturing plants, starting with information on job title, then adding department, industry, date, and plant reports in stages. There was little difference in the quality of the assessments with the amount of data provided. Similarly, de Cock and colleagues86 found little effect on experts' estimates of captan exposure among fruit growers between phases of assessment which started with a video about factors affecting exposure, then added information on pesticide application tasks, and finally information on pesticides. Segnan and colleagues87 compared assessments by experts based on occupational histories to assessments based on industry specific modules (using as the gold standard, the same experts' estimates with additional product information and exposure measurements). They found little change in sensitivity using the industry specific modules, but median specificities increased from 0.52 to 0.77. Tielemans and colleagues43 compared two very similar methods using urinary measurements of chromium, toluene, and xylene as the gold standard. Compared to using occupational histories alone, sensitivities increased slightly when industry specific questionnaires were used, specificities were nearly unchanged, and kappas increased.
Other investigators have examined the effect of offering industrial hygiene measurement data to the experts conducting the assessments. Hawkins and Evans80 examined the ability of occupational hygienists to estimate toluene exposures of workers in the chemical industry, and found that initial estimates without data overestimated exposures by more than twofold, but that offering some limited measurement data allowed the hygienists to “calibrate” their estimates so they were less biased. Post and colleagues81 examined hygienists' estimates of exposures to styrene and methylene chloride among polyester factory workers. Although the relative ranking of jobs did not seem to improve as the hygienists were provided with additional measurement data, the added data did improve their classification of jobs into quantitative exposure categories.
Other factors which might influence the validity and reliability of experts' assessments include the agents being assessed, and the expertise of the assessors. Segnan and colleagues87 found higher intraclass correlations for insecticides, fungicides, nickel, copper, chromium, and aliphatic hydrocarbons than for specific pesticides, inorganic compounds, and halogenated organics. Sensitivities and specificities followed a similar pattern. Benke and colleagues45 found that kappas for agreement were higher for cutting fluids, welding and soldering fumes, oils and greases, and solvents than for specific agents such as phenol, vinyl chloride, acrylonitrile, and toluene di-isocyanate. Post and colleagues81 found that hygienists were able to rank exposures to methylene chloride better than styrene, perhaps because of differences in the odour thresholds. These studies suggest that experts are influenced by some of the same factors as subjects: sensory perceptions affect judgements, and estimation is easier for broad classes of agents than for specific chemical compounds.
Some studies have examined the extent to which prior expertise affects assessments. In a study of fungicides in sawmills, Teschke and colleagues51 found that lumber industry hygienists had higher inter-rater agreement, but the validity of their exposure estimates was very similar to that of hygienists from other industry sectors. In their study of pesticide use in fruit growing, de Cock and colleagues86 did not find a consistent pattern for inter-rater agreement between their three groups of experts, but hygienists and pesticide experts gave more valid ratings than fruit growing experts, suggesting that the critical expertise is understanding the exposure rather than intimate knowledge of the work activity.
The evidence to date on expert assessments supports the belief that experts are better able to estimate exposures than study subjects, though this evidence is not as strong or consistent as epidemiologists might hope. Experts' estimates can be so poor that true exposure–effect relations are obscured or even reversed in direction,76 indicating the value of testing reliability and validity for the most important exposures in a study, and ensuring that experts have access to information that may incrementally improve performance, such as subject reported exposures and work conditions, and measurement data.
QUANTITATIVE DATA
The above review of exposure assessment methods in common use in case–control studies indicates that there remains much room for improvement. Incorporation of quantitative exposure measurements into case–control studies has always seemed a quixotic goal, but developments in occupational hygiene data collection, management, and analysis suggest several means to systematically include measurements in exposure estimation for population based studies.
Exposure databases
Exposure databases are not new—data on ionizing radiation exposures have been collected on designated workers since 1950 in Canada98 and elsewhere. The Mine Safety and Health Administration in the United States has been storing data on coal dust, silica dust, and other mining exposures since 1970,99 and the German Institute for Occupational Safety began its comprehensive chemical exposure database a couple of years later.100 However, the number of such databases98–107 has increased substantially over the past two decades (see examples in table 5), with advances in computer technology. International conferences have been held to promote thoughtful data collection and compatibility between data sets.108–110
Administrative exposure data sets have only rarely been used in case–control studies, but they present many interesting possibilities. Databases such as the National Dose Registry in Canada offer the opportunity to assign cumulative radiation exposures over five decades to individual study subjects, since personal identifying information has been retained in the registry.98 However, this level of detail is the exception.
Most exposure databases include job and industry information, but no data identifying individuals whose exposures were measured. This means that average exposures for an occupation and/or industry can be calculated and used to estimate exposures of subjects with those jobs. Of course, this method does not account for within job variations in exposure, and is not helpful where there are no measurements for a particular job–exposure combination. These problems might be addressed in part by using database information as only one component of exposure assessment. For example, Stewart and Stewart97 proposed supplementing detailed occupational questionnaires and job specific modules with data from the US Occupational Safety and Health Administration Integrated Management Information System. The potential for tailoring database information to individual subjects depends on the supplementary data fields included in the database. For example, if information on tasks, control measures, raw materials, etc is included, as in the French COLCHIC system,106 reports by subjects about these conditions in their own worksites could be used to adjust job based exposure estimates.
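The job based averaging described above can be sketched in a few lines of Python. This is a minimal illustration only: the records, job titles, units, and the choice of the arithmetic mean are all assumptions made for the example, and are not taken from any of the databases cited.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical database records: (job title, industry, measured concentration).
# All values are invented for illustration only.
records = [
    ("welder", "shipbuilding", 2.0),
    ("welder", "shipbuilding", 4.0),
    ("painter", "shipbuilding", 1.0),
    ("painter", "shipbuilding", 3.0),
    ("painter", "shipbuilding", 5.0),
]


def job_means(records):
    """Return the arithmetic mean exposure for each (job, industry) pair."""
    grouped = defaultdict(list)
    for job, industry, value in records:
        grouped[(job, industry)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def assign_exposure(job_history, means):
    """Give each job in a subject's history its group mean, or None if unmeasured."""
    return [means.get((job, industry)) for job, industry in job_history]


means = job_means(records)
# A hypothetical subject who held one measured job and one unmeasured job.
subject_history = [("welder", "shipbuilding"), ("clerk", "shipbuilding")]
estimates = assign_exposure(subject_history, means)
```

The `None` returned for the unmeasured job makes the second limitation explicit: where no measurements exist for a job–exposure combination, another assessment method has to fill the gap.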
Given that exposure measurements in administrative databases are not likely to have arisen from subjects' workplaces, validity and reliability studies of estimates derived from databases should be conducted. There are other possible problems with administrative data. The original purpose of data collection (for example, complaint, compliance, research), changes in measurement techniques, and clustering of data in one or a few workplaces, all have the potential to bias exposure measurements. If information on these factors is included in the database, it may be possible to adjust for any biases using empirical modelling.111
Determinants of exposure studies
A method which holds promise for improving the validity of exposures assessed by questionnaires is to guide the formulation of questions and interpretation of responses using results of “determinants of exposure” studies. Such studies examine which characteristics (for example, workplace, process, employee) are associated with increased or decreased exposure levels. There is a growing body of literature on the determinants of exposure in a wide range of industries.112 Factors which have been examined as potential exposure determinants are extremely varied, for example, type of facility, worksite construction materials, industrial processes, automation, raw materials and machinery used, geographical location, indoor versus outdoor work, ambient environmental conditions, tasks, work practices, training, ventilation, use of enclosures, skin contact, protective clothing, and cleaning facilities.
Translating these data into questions useful to assess exposures in case–control studies is not a simple process. Questions must be answerable by study subjects, therefore determinants such as tasks and equipment will be more feasible to query than technical ones such as air flow rates of ventilation systems. Given that determinants data are likely not to have been collected in the worksites or residences of the study subjects, it would also be necessary to consider the transferability of the information. Where determinants studies show consistent patterns and where there is greater variability between the determinants of interest than between worksites, it should be possible to develop useful questions to distinguish exposure levels.
Where sufficient information on exposure determinants is not available in existing scientific literature, researchers might consider designing their own determinants studies prior to embarking on an epidemiological investigation. There are some interesting examples of studies which have measured exposures in a large number of worksites to create predictive models for use in questionnaire based epidemiological research.112,113
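As a rough sketch of how a determinants model might be applied to questionnaire answers, the example below assumes a log-linear form, in which each determinant multiplies the predicted exposure by a fixed factor. The determinant names, the coefficients, and the function `predict_exposure` are hypothetical and are not drawn from any published model.

```python
import math

# Hypothetical coefficients of a log-linear determinants model:
#   ln(exposure) = intercept + sum of coefficients for determinants that are present.
# The values are invented for illustration, not taken from any study.
INTERCEPT = math.log(1.0)  # baseline exposure of 1.0 (arbitrary units)
COEFFICIENTS = {
    "indoor_work": math.log(2.0),        # doubles predicted exposure
    "local_ventilation": math.log(0.5),  # halves predicted exposure
    "manual_task": math.log(3.0),        # triples predicted exposure
}


def predict_exposure(answers):
    """Predict exposure from yes/no questionnaire answers (dict of name -> bool)."""
    log_exposure = INTERCEPT + sum(
        coef for name, coef in COEFFICIENTS.items() if answers.get(name, False)
    )
    return math.exp(log_exposure)


# A hypothetical subject reporting indoor manual work with local ventilation.
predicted = predict_exposure(
    {"indoor_work": True, "local_ventilation": True, "manual_task": True}
)
```

Each question corresponds to a determinant that a subject could plausibly answer, which is the constraint on questionnaire design noted above.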
Subject specific exposure measurements
An avenue for exposure assessment which has only rarely been used in case–control studies is direct exposure measurements of the study subjects. For outcomes with short induction and latency periods, measurements of current exposures may serve as reasonable surrogates for exposures in the disease induction period. Measurements of exogenous agents in biological tissues assess the body burden at the time the sample was taken, but can provide information on historical exposures in a limited set of circumstances—that is, where the chemical of interest has a sufficiently long biological half life, and the body burden is not affected by the disease or its treatment.114
There are a number of case–control studies which have used exposure measurements. For example, Floderus and colleagues,115 in a case–control study of brain cancer and leukaemia, made 924 magnetic field measurements of 169 jobs (those held longest) in the workplaces of study subjects. Veulemans and colleagues116 measured urinary metabolites of methoxy and ethoxy acetic acid in 1019 infertile men and 475 controls. Tielemans and colleagues43 measured levels of industrial solvents in the urine of 99 cases with reduced semen quality and 27 controls. Caldwell and colleagues117, and Scheele and colleagues118,119 measured pesticide levels in bone marrow and serum in adult and childhood cancer cases and controls.
One of the great difficulties of measuring exposures in case–control studies is the potentially wide geographical dispersion of study subjects. It may be possible to overcome this logistical difficulty with advances in sample collection and preservation methods. For example, urine and semen samples can be collected by study participants in their homes and shipped to the study site. Blood samples can be collected by a family physician or local clinic and forwarded to the appropriate laboratory for analysis. Advances in occupational hygiene monitoring equipment over the past several decades also make it reasonable to consider mailing simple sampling equipment, such as passive dosimeters or electronic data loggers, to study subjects for exposure assessment. As an example, Kromhout and colleagues120 mailed magnetic field dosimeters to subjects of a cohort study in geographically dispersed locations in the United States.
If these more quantitative methods of exposure assessment are adopted in case–control studies, the issues involved will be similar to those faced by researchers using measured exposure data in cohort or cross sectional studies—that is, sampling strategy issues such as how many measurements to take, and epidemiological analysis issues such as whether and how to group subjects.120–124
DISCUSSION
This review illustrates that exposure assessment methods typically used in case–control studies, though often thought of as distinct from each other, are inter-related and interdependent. Generic job–exposure matrices have most often been based on experts' judgements. Some JEMs use self reports to provide estimates of the proportions of exposed individuals in each job.40,125 Assessments by experts almost always rely on self reports as the starting point, using job history data at a minimum, but often utilising subjects' exposure reports and sometimes information on work tasks and conditions. Self reports themselves are answers to questions formulated by experts. Not surprisingly then, the results of validity and reliability studies of these estimation methods show similarities. Foremost is the conclusion that questionnaire based methods commonly used in case–control studies do not produce consistently valid and reliable results, underscoring the importance of continued development and testing of methods.
Evidence to date also reveals a number of strategies which can optimise these exposure estimation methods. Self reported exposure estimates may be improved by using terms familiar to workers, by asking about exposures that can be smelled, seen, or felt by subjects, and by presenting benchmarks against which exposures can be gauged. Instead of asking about exposures themselves, subjects can be asked about factors related to exposures, but more likely to be known and accurately recalled (for example, tasks, raw materials, equipment, processes); empirical models can be used to relate these factors to exposures. Experts find it easier to make estimates for commonly used agents and classes of chemicals, rather than arcane individual agents. In addition, experts' assessments may be improved by providing experts with exposure measurement data, information about the properties of the agents, and data reported by subjects about their work conditions and exposures. Occupational history taking would benefit from techniques such as chronicling of major life events to enhance recall,126 particularly where the job history is complex, for example, multiple short term jobs or jobs in the distant past.
There are a number of issues important to exposure estimation methods which have not yet received much attention. Although studies have investigated the effect of time since a job was held on the quality of an occupational history,7,10,11,13,14,17,18 the effect of the duration of elapsed time on the validity of subjects' or experts' exposure estimates has not been examined. In many epidemiological investigations using experts, more than one expert is used, but the optimum number of experts and the value of independent versus consensus estimates have rarely been tested.88
Although many studies examining the validity of exposure estimation methods indicated rather disappointing performance, it is important to remember that gold standards are never perfect. This was particularly extreme for studies of generic job exposure matrices; all comparisons, except one, were to self reported or expert estimates of exposure. Studies of self reports and expert assessments more frequently used measured exposure levels as the basis for evaluation, usually using one of two techniques. Where continuous exposure estimates were made, proportions of variance explained or correlations were calculated. In almost every case, exposure estimates assigned to a study subject were compared to measurements of exposure taken on individual days, thus requiring the estimation method to predict not only subject to subject variability in exposure, but also day to day variability within subject. Short term variations in exposure are not thought to be related to body burden or disease development, except where biological half lives are very short.127 Therefore, for studies of chronic diseases, it would be more reasonable to test whether an estimation method is related to the long term average exposure level. In studies where only the presence or absence of exposure was estimated, sensitivity, specificity, and/or positive predictive value were used as the measures of validity. The issue of individual daily measurements of exposure versus long term average exposure is also a consideration here. But in addition, calculations of sensitivity and specificity require that the gold standard measurements be dichotomised. The definition of a value above which exposure “exists” is difficult and often arbitrary, for example, the analytical detection limit has often been used. 
Ideally the cut point would be set at a level above which there is disease potential, but case–control studies are often conducted at the initial stages of aetiological research, before such knowledge has accumulated. Another consideration in defining what constitutes exposure is that in most case–control studies exposure prevalence is low, so specificity is more important than sensitivity for minimising attenuation in exposure–response relations.94 Therefore it is usually better to use a stringent definition of exposure (for example, only highly exposed subjects considered exposed) in epidemiological analyses.
There is room for an increase in the sophistication of validation studies. In cohort and cross sectional studies, where quantitative measurements are usually made, the major methodological developments in exposure assessment in the past decade have focused on the benefit of grouping study subjects for analysis, based on similarities in exposure. By assigning subjects the mean exposure of their group, the precision of the exposure estimate is increased, and the error structure approximates the Berkson error model. The advantage is a reduction in misclassification bias that can attenuate the observed association in exposure–response analyses.121,123,128 Since the advantage of grouping was recognised, methodological research on quantitative exposure measurements for epidemiology has been directed at finding the best ways to group study subjects.120,122,124 It seems reasonable that validity testing of experts' or subjects' estimates should incorporate these methods. Thus in validity studies, instead of comparing exposure estimates for individual subjects to individual exposure measurements, the exposure estimation method could be used to group subjects and these groups compared to optimal groupings based on exposure measurements. This idea is an extension of that of Kromhout and colleagues,47 who examined the proportion of between group exposure variability explained by exposure estimates for individual subjects, as a way to exclude day to day variations in measured exposures. The proposed approach will provide a more reasonable (and likely less stringent) test of the validity of estimation methods.
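The attenuation that grouping is intended to reduce can be illustrated with the standard result for classical measurement error: when a subject's long term mean exposure is estimated from a small number of daily measurements, the expected regression slope is the true slope multiplied by a factor that depends on the ratio of within subject to between subject variance. The sketch below assumes that simple model. It does not model grouping itself, and the variance values are invented.

```python
def attenuation_factor(var_between, var_within, n_repeats):
    """Expected ratio of observed to true slope under classical error.

    var_between: variance of true long term mean exposure between subjects.
    var_within:  day to day variance within a subject.
    n_repeats:   number of daily measurements averaged per subject.
    """
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    return var_between / (var_between + var_within / n_repeats)


# Hypothetical variance components: day to day variability three times
# larger than the variability between subjects.
single_day = attenuation_factor(1.0, 3.0, 1)
ten_days = attenuation_factor(1.0, 3.0, 10)
```

With these assumed values, a single day's measurement per subject retains only a quarter of the true slope, which illustrates why comparisons against individual daily measurements are such a stringent test of an estimation method.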
In summary, among the exposure estimation methods in common use today, expert assessment is usually the best approach. All exposure estimation methods, whether by subjects or experts, can have low validity and reliability; they therefore need to be carefully designed using evidence about techniques which improve performance and, where possible, tested. A new generation of case–control studies could evolve if methods which incorporate exposure measurements are adopted. Direct measurements of study subjects, if the science and logistics permit, would be ideal. A more frequently feasible method would be to combine questionnaires and measurements: subjects can be asked about factors shown to be related to exposures in determinants of exposure models, and the models used to predict exposure levels. If quantitative methods are embraced, many of the methodological developments in exposure assessment for cohort and cross sectional studies could be applied directly to case–control studies. In addition, the inclusion of exposure measurement data would extend the utility of results of case–control studies to risk assessments and exposure standard setting.
APPENDIX: Selected terms used in validity and reliability studies
The following is a brief and simplified overview of some terminology used in the validity and reliability studies reviewed in this paper. For a full understanding, it is best to consult the methodological literature, some of which is cited below.
Note that although the following discussion separates terminology according to whether the measures are usually used in validity versus reliability studies, the measures are sometimes used in either type of study.
Common measures of validity when using a dichotomous classification of exposure—that is, exposed versus unexposed
- Sensitivity: proportion of those truly exposed who are classified as exposed by the assessment method being evaluated (values between 0 and 1).
- Specificity: proportion of those truly not exposed who are classified as unexposed by the assessment method being evaluated (values between 0 and 1).
- Positive predictive value: proportion of those classified as exposed who are truly exposed (values between 0 and 1). This proportion depends on the sensitivity and specificity of the classification method and the prevalence of exposure in the population being assessed.
The effect of misclassification of dichotomous exposure estimates has been described in a number of methodological papers (see Flegal and colleagues76 and Dosemeci and Stewart94). Non-differential misclassification will usually attenuate relative risk estimates towards the null value. The resulting relative risk estimate will depend on the strength of the true relative risk and the extent of misclassification. If sensitivity and specificity are so low that their sum is less than 1, a relative risk estimate using the estimated exposure values will indicate an association opposite in direction to the true association.76 When the prevalence of exposure is low, as in most population based case–control studies, it is important for the specificity to be as high as possible (that is, >0.9, and ideally very close to 1) to ensure that the small exposed group is not diluted by a large number of unexposed individuals.94
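The effect of non-differential misclassification on an odds ratio can be worked through numerically. The sketch below assumes a simple case–control setting with a dichotomous exposure and the same sensitivity and specificity in cases and controls; the function names and the example values (true odds ratio of 2, exposure prevalence of 5% in controls) are illustrative assumptions.

```python
def classified_prevalence(true_prevalence, sensitivity, specificity):
    """Proportion classified as exposed: true positives plus false positives."""
    return (sensitivity * true_prevalence
            + (1.0 - specificity) * (1.0 - true_prevalence))


def observed_odds_ratio(true_or, prevalence_controls, sensitivity, specificity):
    """Expected odds ratio after non-differential exposure misclassification."""
    # True exposure prevalence among cases implied by the true odds ratio.
    odds_cases = true_or * prevalence_controls / (1.0 - prevalence_controls)
    prevalence_cases = odds_cases / (1.0 + odds_cases)

    q_controls = classified_prevalence(prevalence_controls, sensitivity, specificity)
    q_cases = classified_prevalence(prevalence_cases, sensitivity, specificity)
    return (q_cases / (1.0 - q_cases)) / (q_controls / (1.0 - q_controls))


# Hypothetical scenario: true odds ratio 2, 5% of controls truly exposed.
perfect = observed_odds_ratio(2.0, 0.05, 1.0, 1.0)
high_specificity = observed_odds_ratio(2.0, 0.05, 0.9, 0.99)
lower_specificity = observed_odds_ratio(2.0, 0.05, 0.9, 0.90)
reversed_direction = observed_odds_ratio(2.0, 0.05, 0.3, 0.5)  # sens + spec < 1
```

Under these assumptions, lowering specificity from 0.99 to 0.90 moves the expected odds ratio noticeably closer to 1 even though sensitivity is unchanged, and the final scenario, where sensitivity and specificity sum to less than 1, gives an expected odds ratio below 1.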
Common measures of validity when using continuous measures of exposure
- R²: proportion of the variance in true exposure explained by the exposure estimation method being evaluated (values between 0 and 1).
- Pearson r: correlation coefficient (values between −1 and 1); sign the same as the slope of the relation between the true exposure and the estimated exposure, and magnitude related to degree of linear association between the two. The square of r is R².
- Spearman rank r: rank correlation coefficient (values between −1 and 1); same as Pearson r, except that it is based on the ranks of the true and estimated exposures, rather than the data themselves.
The impact of misclassification of continuous exposure estimates is generally the same as for categorical data, and has been described in a number of papers (see Armstrong121). Non-differential misclassification will usually attenuate relative risk estimates towards the null value, with the degree of attenuation dependent on the true relative risk and the extent of misclassification. If the correlation coefficient is negative, a relative risk estimate using the estimated exposure values will indicate an association opposite in direction to the true association.
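For readers who wish to see how the continuous measures relate to one another, the following sketch computes Pearson r, its square, and Spearman rank r using only the Python standard library. The paired values are invented, and the helper names are this example's own.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical measured exposures and an expert's ordinal estimates.
measured = [1.0, 2.0, 4.0, 8.0, 16.0]
estimated = [1.0, 2.0, 3.0, 4.0, 5.0]
r = pearson_r(measured, estimated)
r_squared = r ** 2
rho = spearman_r(measured, estimated)
```

In this invented example the estimates rank the subjects perfectly, so the Spearman coefficient is 1, while Pearson r is lower because the relation between the estimates and the measurements is not linear.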
Main messages
- The main techniques currently used for exposure assessment in population based case–control studies include generic job–exposure matrices (JEMs), exposure self reports by study subjects, and assessment of exposures by experts.
- An extensive literature is now available with which to evaluate the validity and reliability of these methods.
- Most generic JEMs do not perform well, no matter how they are evaluated. Self reported exposures are usually better than generic JEMs, but vary greatly in validity and reliability. The accuracy of self reports is improved by using terms familiar to employees, asking about agents that can be sensed, and providing relative or absolute benchmarks against which to gauge exposures. Expert assessments are usually somewhat better than self reports, though validity and reliability are also variable. Experts are aided in their assessments by subject reported data on exposures and work conditions, and measurement data. Careful design and evaluation are required for all exposure estimation techniques.
- Exposure assessment methods which incorporate quantitative measurements are difficult in population based studies, but increasingly possible with improvements in measurement techniques and administrative databases. These methods offer the possibility of a new generation of exposure assessment in case–control studies.
Common measures of reliability
- Percent agreement: percent of exposure estimates, estimated on two different occasions or by two different raters, which agree with each other (values between 0 and 100). This measure does not account for the proportion of agreement likely by chance alone.
- Kappa: proportion of agreement beyond that expected by chance alone (values between −1 and 1); for categorical measures of exposure.
- Intraclass correlation: proportion of the total variability as a result of differences in exposure between subjects (rather than differences between repeated estimates for individual subjects) (values between 0 and 1); for continuous estimates of exposure.
Reliability (precision) is a component of validity, with the effect of non-differential misclassification indicated above. Landis and Koch129 gave the following verbal interpretations of the strength of the kappa statistic; these have also been used to describe intraclass correlations: <0 = poor agreement; 0–0.20 = slight agreement; 0.21–0.40 = fair agreement; 0.41–0.60 = moderate agreement; 0.61–0.80 = substantial agreement; 0.81–1.00 = almost perfect agreement.
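The difference between percent agreement and kappa is easiest to see in a worked example. The sketch below assumes two raters each classifying the same ten jobs as exposed (1) or unexposed (0), and uses the two-rater (Cohen) form of kappa; the ratings are invented, and the helper `landis_koch_label` simply encodes the verbal categories listed above.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Percent of paired ratings that are identical (0 to 100)."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Agreement beyond chance for two raters using categorical ratings.

    Undefined (division by zero) if chance agreement equals 1, that is,
    if both raters put every item in the same single category.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement from the product of each rater's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Verbal interpretation of kappa following the categories listed above."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings of ten jobs by two experts (1 = exposed, 0 = unexposed).
rater_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
agreement = percent_agreement(rater_a, rater_b)
kappa = cohen_kappa(rater_a, rater_b)
label = landis_koch_label(kappa)
```

Here the raters agree on 8 of 10 jobs, but because just over half of that agreement would be expected by chance given how often each rater assigns "exposed", kappa falls only in the moderate range.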
Acknowledgments
This work was supported in part by an Izaak Walton Killam Memorial Fellowship. The authors greatly appreciate the helpful comments of Jane Schroeder, Hugh Davies, the reviewers, and University of North Carolina Epidemiology Department students, staff, and faculty who participated in a seminar series on this subject.
Commentary
In their very comprehensive review on methods for assessment of occupational exposure in case–control studies, Teschke et al state that “among the exposure estimation methods in common use today, expert assessment is usually the best approach”. They do so despite the fact that it is well known that subjective assessments by experts are of a relative nature1 and that in order to have a more quantitative assessment the experts have to be calibrated.2,3 The main reason for choosing experts can be traced back to the alternative methods of self reported exposures and generic job–exposure matrices (JEM) which, as they claim, suffer from severe limitations. Recently, the limitations and possibilities of exposure assessment on the basis of JEM were extensively discussed.4 From a somewhat broader perspective, expert assessment and JEM are not as different as is often suggested. A study in which an expert judges the job history of every case and control is actually applying a very detailed (job) exposure matrix where the input axis is made up of exposure determinants which the expert thinks of as being important. The problem with the case by case expert assessment is that the process of assigning exposure to an individual on the basis of determinants of exposure generally takes place in the black box made up by the mind and heart of an occupational hygienist or exposure assessor (in the best case). Teschke et al show that results of determinants of exposure studies (pointing at determinants of exposure such as physical properties of the agent, work environment, tasks, and use of control measures, including personal protective equipment) have recently become increasingly available to the expert and the field at large. With this in mind, I would like to propose that we use the results of such studies together with the hidden treasures in the minds and hearts of experts to elaborate deterministic exposure models.
These models can subsequently be used to assign exposure to individual subjects on the basis of information collected on a priori identified determinants of exposure in standardised interviews (of next of kin) or questionnaires.5 In other words, experts should be used collectively to devise these deterministic–exposure models (DEM). The models will combine the specificity of experts and the structured approach of the JEM. Exposure assessment for case–control studies in this way will become more reproducible and reliable and less prone to biases and the resulting harsh critiques it is often (justifiably) exposed to.6
With occupational risk assessment becoming more quantitative, it is conceivable that case–control studies (in the general population) will become less popular. The main reason for this is that the retrospective nature and resulting limitations of the exposure assessment will at best produce semiquantitative estimates of past exposures. However, case–control studies on short term health effects, such as reproductive effects,7,8 as discussed by Teschke et al, point in a new direction. Banking of biological material in large community based studies (for instance, the European Community Respiratory Health Survey)9 together with adequate collection of deterministic information will enable the future exposure assessor to produce more quantitative estimates of (internal) exposure. In addition, much needed expert calibration studies have been shown to be possible with the introduction of simple sampling methods based on passive monitoring.7 Self assessment of occupational exposure10 and a more rigorous use of experts as described above are needed in order to have a future for community based occupational case–control studies. Nevertheless, no one considering such a study should set out on that path without consulting the insightful review of exposure assessment methods by Teschke and her colleagues.