Reanalysis: lessons great and small
David Kriebel
School of Health & Environment, University of Massachusetts Lowell, Lowell, Massachusetts 01854, USA; david_kriebel{at}

Debate and critique are fundamental to the scientific method. The closest scientists ever get to “proof” is the persistence of a hypothesis after repeated attempts to refute it. For this reason, free exchange of opinions about the interpretation of data is essential. In epidemiology, it can be useful to challenge a study’s conclusions by conducting a reanalysis of the data, with new investigators starting from different assumptions and using different methods. Consistent findings can strengthen confidence in the conclusions, as in the Health Effects Institute’s reanalysis of the Harvard Six Cities and American Cancer Society air pollution studies.1 2 But reanalysis can also create confusion and impede scientific progress if it is not done in the service of impartial inquiry. After initial comments on an important reanalysis of a study of beryllium and lung cancer in this issue (see page 379), I will step back to consider broader themes in reanalysis: why it is done, and the problem of conflicts of interest.

Schubauer-Berigan and colleagues3 conducted a reanalysis of a nested case–control study of beryllium and lung cancer originally published by Sanderson and colleagues.4 The reanalysis makes a valuable contribution by strengthening the original study’s finding of an association between beryllium and lung cancer. The authors investigated potential confounding and effect modification by birth year, and assessed the findings’ sensitivity to a small but potentially important methodological choice – the use of a “start” value when calculating the logarithm of exposure metrics to avoid taking the log of zero. The reanalysis revealed complicated temporal patterns in the lung cancer data which probably resulted from hiring patterns, secular trends in underlying lung cancer risk and smoking habits. These important determinants make it difficult to isolate the independent effect of exposure.
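To see why the "start" value can matter, consider a minimal sketch. It assumes the transform has the form log(exposure + start), which is a common convention but is not specified here; the function names and the exposure levels are hypothetical and are not taken from either study.

```python
import math

def log_exposure(exposure, start):
    """Log-transform an exposure after adding a positive start value.

    Assumed form: log(exposure + start). The start value keeps
    unexposed subjects (exposure == 0) on a finite scale.
    """
    if exposure < 0:
        raise ValueError("exposure must be non-negative")
    if start <= 0:
        raise ValueError("start must be positive")
    return math.log(exposure + start)

def log_contrast(high, low, start):
    """Difference on the log scale between two exposure levels."""
    return log_exposure(high, start) - log_exposure(low, start)

# Hypothetical exposure levels in arbitrary units, not study data.
for start in (0.01, 0.1, 1.0):
    gap = log_contrast(10.0, 0.0, start)
    print(f"start={start}: log-scale gap between 10 and 0 = {gap:.2f}")
```

Under these assumptions, a smaller start value pushes unexposed subjects further from exposed ones on the log scale, so a seemingly minor constant can change the shape of the fitted exposure-response relation.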

The essential conclusion of the Schubauer-Berigan reanalysis was that the strongest evidence for an association between beryllium and lung cancer was not found with cumulative exposure lagged by 20 years (as reported by Sanderson) but with average exposure lagged by 10 years.

The choices of summary measure of exposure and of a lag are necessary methodological steps when using any standard epidemiological model. But these choices probably represent awkward impositions of static concepts onto dynamic biological processes.5 Thus it is not surprising to find different investigators reporting different “best” lags or summary measures of exposure. It is likely that neither average nor cumulative exposure is, in a biological sense, the “correct” metric of beryllium dose, just as arbitrarily drawn time windows may only roughly capture dynamic changes in potency of exposures. If we could know the correct dose metric, we might expect to find that it was correlated with both cumulative and average exposures, as well as with calendar time and even with birth year (exposures often decrease with time as environmental controls are installed). But in such a scenario of correlated time dependent covariates, one cannot rely upon regression models to partition variance accurately between birth year and a summary measure of exposure which is only a rough surrogate for the true but latent dose metric.
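The dependence on these choices can be illustrated with a small sketch. It assumes that a lag discards exposure received in the final `lag` years before an index year, and that average exposure is cumulative exposure divided by the number of counted exposed years; both definitions, the function name and the two work histories are hypothetical simplifications, not the methods of either analysis.

```python
def lagged_metrics(history, index_year, lag):
    """Return (cumulative, average) exposure under a given lag.

    history: list of (year, intensity) pairs, one per exposed year.
    Exposure in years after index_year - lag is ignored (assumed
    definition of a lag). Average is cumulative exposure divided by
    the number of counted exposed years (assumed definition).
    """
    cutoff = index_year - lag
    counted = [intensity for year, intensity in history if year <= cutoff]
    cumulative = float(sum(counted))
    average = cumulative / len(counted) if counted else 0.0
    return cumulative, average

# Hypothetical work histories, arbitrary intensity units.
worker_a = [(year, 10.0) for year in range(1950, 1952)]  # short, intense
worker_b = [(year, 1.0) for year in range(1960, 1980)]   # long, low level

for lag in (10, 20):
    for name, history in (("A", worker_a), ("B", worker_b)):
        cumulative, average = lagged_metrics(history, 1990, lag)
        print(f"lag={lag} worker {name}: "
              f"cumulative={cumulative}, average={average}")
```

With a 10-year lag the two hypothetical workers have identical cumulative exposure but a tenfold difference in average exposure, and lengthening the lag to 20 years changes the cumulative ranking while leaving the averages untouched. Different summary measures and lags can therefore order the same workers differently, which is one reason different analyses may favour different "best" metrics.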

The bottom line, then, is that these reanalyses strengthen the evidence by showing that under varying assumptions about the best ways to manage time dependent covariates, an association between beryllium and lung cancer remains; the fact that the reanalysis found a different “best” lag and summary exposure measure is of lesser concern.

Schubauer-Berigan and her colleagues at NIOSH have begun a large retrospective cohort study of U.S. beryllium production facilities, with quantitative exposure reconstruction. The results may help to disentangle the complexities of the time varying risk factors. In the meantime, beryllium users should continue to follow the recommendations of the International Agency for Research on Cancer6 and the U.S. National Toxicology Program7 that beryllium be treated as a carcinogen. Their conclusions were based on several lines of evidence, including two cohort studies which preceded the Sanderson study and its reanalyses.8 9

Deubner and colleagues (with financial support from a major beryllium producer) have criticised the Sanderson study, prompting Schubauer-Berigan's reanalysis published in the current issue.3 Deubner’s letter,10 Sanderson’s response11 and now the Schubauer-Berigan reanalysis illustrate the constructive contributions of open debate in science. But the beryllium industry has not limited its role to letters to the editor;12 among its recent efforts is a reanalysis of the Sanderson data that finds no evidence of an association between beryllium and lung cancer.13 A second paper from the same group provides a complex critique, which the authors call an “empirical evaluation”, of the nested case–control study design, arguing that the method has fundamental flaws.14 While framed as a methodological investigation, the results could also be aimed at refuting the Sanderson findings. A recent letter challenges the basic assumptions behind this critique.15

Anecdotally, reanalyses often seem to be undertaken by parties with a stake in the outcome of a study not favourable to their interests.16 This creates a distorting asymmetry in the evidence whereby studies which do not find associations between exposures and disease are never re-examined, while those that do (and where enough money is at stake) are re-examined. If public funds were to be used, one could imagine a different prioritisation of which studies to reanalyse, for example those where a false negative result might lead to much disability or death.

Parties with financial interests must proceed cautiously when entering scientific debates about their products or the health of their workers, and in particular when undertaking as complex a procedure as a reanalysis. In observational epidemiology, there are many assumptions and choices that must be made to translate a working hypothesis – beryllium causes lung cancer – into a relative risk. And, while these choices and assumptions should be made as transparently as possible, subject to scrutiny by other investigators, it is difficult in practice to avoid hidden assumptions that may bear on the results. Also, because epidemiologists are looking for weak signals inside highly uncertain data, it is much easier to “inadvertently” fail to find an effect than to artificially create one. For these reasons, analyses and reanalyses should be undertaken by investigators who are free of even the appearance of conflicts of interest.

Sadly, manufactured uncertainty has become an increasingly common strategy to slow down regulation of environmental hazards.16 Because of this climate, all investigators must take pains to insulate themselves from conflicts of interest. A company wishing to have a study reanalysed should create a firewall between itself and the researchers, perhaps through a free-standing science advisory board which would be charged with funding independent investigators.2 17 The board would protect the investigators from interference and guarantee that the results are published, regardless of the outcome.

Francis Bacon famously said: “truth emerges more readily from error than confusion”.18 Removing conflicts of interest from epidemiology can ensure that reanalysis truly advances science by finding errors while avoiding confusion.


  • Competing interests: None.
