

Oral Session 10 – Methodological issues



P. Morfeld. Institut für Arbeitswissenschaften der RAG Aktiengesellschaft, Institut und Poliklinik für Arbeits- und Sozialmedizin der Universität zu Köln, Germany

Introduction: Excess years of potential life lost due to exposure (EYPLL) are an important measure of health impact in occupational epidemiology, complementing rate or risk statistics. They are calculated in two steps: first, for each age group, the number of excess deaths among the exposed is multiplied by the expected remaining years of life at the age of death; second, these products are summed over all age groups. Recently, following a presentation at EPICOH 2001, Park et al1 extended this approach from SMR based calculations to EYPLL estimates based on Poisson regression models, which estimate age specific and endpoint specific exposure effects. We investigated the limitations of this concept.
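
The two-step calculation can be sketched in Python as follows. This is a minimal illustration, not the method of Park et al1: the age groups, death counts, and life expectancies are hypothetical, and negative excesses (fewer deaths than expected) are assumed to be kept with their sign rather than truncated at zero.

```python
from dataclasses import dataclass


@dataclass
class AgeGroup:
    """One age stratum of a hypothetical exposed cohort."""
    label: str
    observed_deaths: float
    expected_deaths: float            # from reference rates or a fitted model
    remaining_life_expectancy: float  # expected remaining years at age of death


def eypll(groups):
    """Excess years of potential life lost.

    Sum over age groups of (observed - expected deaths) * remaining life
    expectancy. Negative excesses keep their sign (an assumption).
    """
    total = 0.0
    for g in groups:
        excess = g.observed_deaths - g.expected_deaths  # step 1: excess deaths in the group
        total += excess * g.remaining_life_expectancy   # step 1 weighting, step 2 summation
    return total


groups = [
    AgeGroup("60-64", 12, 8.0, 18.0),
    AgeGroup("65-69", 15, 12.5, 14.5),
    AgeGroup("70-74", 9, 9.5, 11.2),
]
print(round(eypll(groups), 2))  # 102.65
```

According to the Results of this abstract, a total of this kind estimates the total EYLL without bias; splitting it by age at death or by cause of death is where bias can enter.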

Methods: Counterfactual logic is used to explore whether EYPLL measures the true excess years of life lost due to exposure (EYLL) without bias. This approach follows the abstract reasoning presented by Robins and Greenland.2

Results: I show that the total EYLL can be estimated without bias by calculating the corresponding EYPLL. Using life table examples, I further demonstrate that the EYLL conditional on age at death, and the EYLL for a specific cause of death such as lung cancer, cannot be estimated without bias unless speculative causal models are adopted. This potential bias can be fairly extreme.

Conclusions: EYLL estimates calculated from life tables or regression models, as presented by some authors for lung cancer or after stratification by age, are potentially biased. Although statistics conveying information about the advancement of disease onset are helpful in exposure impact analysis, and especially worthwhile in communicating exposure impact, we believe that attention should be drawn to the difficulties involved and that epidemiologists should always be aware of these conceptual limits of the years of potential life lost method when applying it as a routine tool in cohort analysis.




L. M. Carpenter, D. R. Cox, J. Doughty1, N. T. Fear1, G. Law1, J. Simpson1, E. Roman1, N. E. S. Maconochie2. University of Oxford, 1Leukaemia Research Fund, University of York, 2London School of Hygiene and Tropical Medicine, University of London, UK

Introduction: Occupational groups at increased risk of cancer were identified using a novel graphical approach applied to cancer registration data from England and Wales for 1971–1990, when over 3 million cancers were registered in adults aged 20–74 years.

Methods: Standard UK occupational classifications were bridge coded to form 212 occupational groups, and bridge coding of cancer sites yielded 40 cancer sites/types. A graphical approach (using a simple empirical Bayesian technique) was applied because analysing these data with traditional statistical methods entails multiple comparisons, which inevitably generate chance associations.
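
The abstract does not specify which empirical Bayes estimator was used. As a hedged illustration of the general idea only, the sketch below applies a common method-of-moments gamma-Poisson shrinkage to observed/expected ratios for hypothetical occupational groups: ratios based on small expected counts are pulled strongly towards the pooled ratio, which damps the chance extremes that multiple comparisons produce. The function name and data are invented for illustration.

```python
def eb_shrink(observed, expected):
    """Shrink observed/expected ratios towards their pooled mean.

    Method-of-moments gamma-Poisson shrinkage (an assumed choice; the
    abstract does not name its estimator). Returns one shrunken ratio
    per group.
    """
    n = len(observed)
    ratios = [o / e for o, e in zip(observed, expected)]
    total_expected = sum(expected)
    pooled = sum(observed) / total_expected  # prior mean of the true ratios
    # Expected-weighted variance of the crude ratios around the pooled ratio
    s2 = sum(e * (r - pooled) ** 2 for e, r in zip(expected, ratios)) / total_expected
    # Subtract the average Poisson sampling variance to estimate between-group variance
    between = max(s2 - pooled / (total_expected / n), 0.0)
    shrunk = []
    for r, e in zip(ratios, expected):
        # Weight on the crude ratio grows with the group's expected count
        weight = between / (between + pooled / e) if between > 0 else 0.0
        shrunk.append(pooled + weight * (r - pooled))
    return shrunk


# Hypothetical occupational groups: observed and expected cancer registrations
observed = [30, 2, 5, 12]
expected = [10.0, 1.0, 5.0, 10.0]
crude = [o / e for o, e in zip(observed, expected)]
for c, s in zip(crude, eb_shrink(observed, expected)):
    print(f"crude {c:.2f} -> shrunk {s:.2f}")
```

In a graphical screen, the shrunken ratios (rather than the crude ones) would be plotted, so that only groups with both a large ratio and enough data stand out.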

Results: Occupational information was available and coded for approximately 1.3 million people (0.9 million men and 0.4 million women). Among men, clear excesses of cancer of the nose and nasal sinuses were observed among workers exposed to wood dust and bladder cancer among rubber workers. Also outstanding were excesses of lip cancer in outdoor workers, nasopharyngeal cancer in restaurateurs and cooks, and pleural cancer in asbestos exposed workers. Among women, excess risks of stomach cancer among workers exposed to organic dust were evident. Other notable findings suggest excess risk of nasopharyngeal cancer among waiters; liver cancer among cooks; cancer of the larynx, liver, and mouth/pharynx among publicans; and mouth/pharynx cancers among bar staff and kitchen hands.

Conclusions: As well as identifying several well established associations between occupation and cancer, this novel graphical approach can identify new associations that may warrant further investigation. It may also be appropriate for analysing other routinely collected datasets, such as mortality data, and for other applications, such as genetic epidemiology.


E. A. Eisen1,2, I. Agalliu1, B. R. Coull2, S. W. Thurston3, H. Checkoway4. 1University of Massachusetts Lowell, 2Harvard School of Public Health, 3University of Rochester Medical Center, 4University of Washington School of Public Health, WA, USA

Introduction: To illustrate the contribution of smoothing methods to modelling exposure–response data, Cox models with penalised splines were used to reanalyse lung cancer risk in a cohort of workers exposed to silica in California’s diatomaceous earth industry.

Methods: Relying on graphical plots of hazard ratios as smooth functions of exposure, we evaluated the sensitivity of the curve to the amount of smoothing, the length of the exposure lag, and the influence of the highest exposures. Trimming and data transformations were used to downweight influential observations.
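
Two of the data manipulations described above, lagging cumulative exposure and downweighting extreme exposures, can be sketched as follows. The exposure history is hypothetical; exposure accrued in calendar year y is assumed to count towards the lagged cumulative exposure at year t only if y <= t - lag; and capping at an upper quantile is shown as one possible form of trimming, since the abstract does not say which form was used. The penalised spline Cox model itself is not reproduced here.

```python
import math


def lagged_cumulative_exposure(annual_exposure, year, lag):
    """Cumulative exposure at `year`, ignoring the most recent `lag` years.

    annual_exposure maps calendar year -> exposure accrued that year
    (units hypothetical here, e.g. mg/m3-years).
    """
    return sum(dose for y, dose in annual_exposure.items() if y <= year - lag)


def cap_at_quantile(values, q):
    """Cap values at their q-th quantile (nearest-rank method), 0 < q <= 1.

    One way to downweight the highest exposures; an assumed choice.
    """
    ordered = sorted(values)
    cap = ordered[max(0, math.ceil(q * len(ordered)) - 1)]
    return [min(v, cap) for v in values]


def log_exposure(x):
    """log(1 + x): keeps zero exposure defined and compresses the upper tail."""
    return math.log1p(x)


# Hypothetical exposure history for one worker
history = {1950: 1.0, 1951: 2.0, 1952: 0.5, 1960: 3.0}
for lag in (0, 10, 15):
    print(lag, lagged_cumulative_exposure(history, 1965, lag))
```

Longer lags discard recent exposure, which is why the exposure-response curve can change as the lag is varied.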

Results: The estimated hazard ratio increased steeply with cumulative silica exposure before flattening and then declining over the sparser regions of exposure. The curve was sensitive to changes in degrees of freedom, but insensitive to the number or location of knots. As the length of lag increased, so did the maximum hazard ratio, but the shape was similar. Deleting the two most highly exposed subjects eliminated the top half of the exposure range and allowed the hazard ratio to continue to rise. The shape of the splines suggested that a parametric model with log hazard as a linear function of log transformed exposure would fit well.

Conclusions: This flexible statistical approach reduces the dependence on a priori assumptions. In the absence of an appropriate parametric form, splines provide exposure–response information that is useful for aetiological research and public health intervention.


S. McCracken, J. Langley, A.-M. Feyer, J. Broughton. Dept of Preventive and Social Medicine, University of Otago, Dunedin, New Zealand

Introduction: Text descriptions of the circumstances leading to injury are stored electronically for the more than 220 000 work related injury claims registered each year with New Zealand’s Accident Compensation Corporation (ACC). This study sought to evaluate methods for automatically classifying these descriptions into injury mechanism categories.

Methods: Three machine learning techniques (Support Vector Machine (SVM), k-Nearest-Neighbour, and Naïve-Bayes) were used to classify 3000 injury descriptions into standard major and minor injury mechanism categories. A 60-fold learn and test regime (cross validation) was used to establish the accuracy of each technique against category labels assigned by a human coder.
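
To make the learn and test regime concrete, the sketch below implements one of the three techniques, a multinomial Naïve-Bayes classifier with add-one smoothing, together with a k-fold loop that predicts each description only from a model trained on the other folds. It is a hypothetical illustration: the whitespace tokeniser, the deterministic fold assignment (index modulo k), the class names, and the example descriptions and categories are assumptions, not the study's data or software.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple tokeniser (assumption): lower-case, split on whitespace
    return text.lower().split()


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.word_totals = {c: sum(self.word_counts[c].values()) for c in self.doc_counts}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            score = math.log(n / self.n_docs)  # log prior of the category
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (self.word_totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validated_predictions(texts, labels, k):
    """Predict every text with a model trained on the other k - 1 folds (k >= 2)."""
    predictions = [None] * len(texts)
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        model = NaiveBayesText().fit([texts[i] for i in train], [labels[i] for i in train])
        for i in range(fold, len(texts), k):
            predictions[i] = model.predict(texts[i])
    return predictions


texts = [
    "slipped on wet floor",
    "fell from ladder",
    "tripped on step and fell",
    "cut hand with knife",
    "knife cut finger",
    "sliced thumb with blade",
]
labels = ["fall", "fall", "fall", "cut", "cut", "cut"]
model = NaiveBayesText().fit(texts, labels)
print(model.predict("fell off ladder"))        # fall
print(model.predict("cut finger with knife"))  # cut
```

Comparing the cross-validated predictions with the human labels gives the accuracy figures reported in the Results.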

Results: The SVM was the most accurate classifier overall. For major categories, the microaveraged sensitivity of the SVM was 79.9% (95% confidence interval 78.4 to 81.3) and the specificity was 97.5% (97.3 to 97.7). For minor categories, the sensitivity of the SVM fell to 53.1% (51.3 to 54.8), whereas the specificity increased slightly to 98.5% (98.4 to 98.6). The accuracy of the SVM as measured by Cohen’s kappa was 74% and 46% across the major and minor injury categories, respectively. Inter-rater kappa agreement between human classifiers for major and minor injury categories was 88% and 76%, respectively. In terms of classification efficiency, manually classifying the 3000 cases took more than 8 hours, whereas the SVM took 2.5 minutes for the same cases.
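
The summary statistics above can be computed from paired human and machine labels as sketched below. Micro-averaging is taken here to mean pooling the one-vs-rest true/false positive and negative counts over all categories before forming the ratios, which is an assumption about the authors' exact definition; with one label per description, micro-averaged sensitivity then equals overall accuracy. Cohen's kappa uses the usual (p_o - p_e)/(1 - p_e) form. The labels are hypothetical.

```python
from collections import Counter


def micro_sensitivity_specificity(y_true, y_pred):
    """Pool one-vs-rest confusion counts over all categories."""
    tp = fp = fn = tn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c:
                fp += 1  # predicted c, truly another category
            elif t == c:
                fn += 1  # truly c, predicted another category
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def cohens_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the two raters' marginal category frequencies
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)


human = ["fall", "fall", "cut", "cut", "strike", "strike"]
machine = ["fall", "cut", "cut", "cut", "strike", "fall"]
print(micro_sensitivity_specificity(human, machine))
print(cohens_kappa(human, machine))
```

Because specificity pools many true negatives across categories, it stays high even when sensitivity is modest, as in the minor category results.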

Conclusions: Automated techniques for efficiently classifying large volumes of free text data appear promising, although they are not as accurate as human classification. The problems created by this lower accuracy can, however, be overcome to a certain extent, provided that the systematic error associated with a classification technique is established for each category and then incorporated into the confidence interval of each category estimate.


I. Veyalkin. International Sakharov Environmental University, Minsk, Belarus

Introduction: This study evaluates proportionate cancer mortality among workers employed in the largest tanning plant in Belarus, situated in Minsk.

Methods: A total of 768 dye industry workers who were actively employed for at least 6 months were followed from 1 January 1953 to 31 December 2000. There were 328 women and 440 men. Proportionate mortality ratios (PMRs) were calculated using the population of Minsk to generate expected numbers. Analyses were conducted by job (tanning/liming, dyeing/stuffing, and administration), age at hire, latency, and duration of employment.
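
In outline, a PMR compares the observed number of deaths from a cause with the number expected if the cohort's deaths were distributed over causes as in the reference population (here, Minsk). A minimal sketch, assuming strata defined by sex and age group and using invented counts and reference proportions; the function name is hypothetical:

```python
def proportionate_mortality_ratio(strata):
    """PMR = observed cause-specific deaths / expected cause-specific deaths.

    Each stratum (e.g. sex x age group) is a tuple of
    (all-cause deaths in the cohort, cause-specific deaths in the cohort,
     proportion of all deaths due to that cause in the reference population).
    """
    observed = sum(cause for _, cause, _ in strata)
    # Expected = cohort deaths in each stratum x reference proportion for the cause
    expected = sum(total * ref_prop for total, _, ref_prop in strata)
    return observed / expected, observed, expected


# Hypothetical strata, e.g. women aged 50-64 and women aged 65 and over
pmr, obs, expected = proportionate_mortality_ratio([(50, 3, 0.02), (80, 4, 0.03)])
print(obs, round(expected, 2), round(pmr, 2))  # 7 3.4 2.06
```

Because the expected count depends only on the cohort's own deaths, a PMR reflects the share of deaths from a cause, not the absolute death rate.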

Results: Among women employed in the tannery, there was a significant excess of pancreatic cancer, based on eight deaths (PMR 3.13; 95% confidence interval 1.35 to 6.17). Six of the eight pancreatic cancer deaths occurred among women working in the dyeing/stuffing workshops (3.67; 1.34 to 7.97), all of them among workers hired and made redundant between 1962 and 1984 (6.54; 2.6 to 13.4). Significantly elevated mortality was also found for melanoma and skin cancers (4.55; 1.3 to 11.3) among women working in the dyeing/stuffing workshops, all of whom were hired before 1970 (4.4; 1.21 to 11.37). We also found a significant increase in mortality from buccal cavity–pharynx cancers in male workers hired between 1971 and 1980 (5.13; 1.05 to 14.97), and from uterine cancers in female workers hired before 1950 (2.9; 1.06 to 6.33) and made redundant before 1960 (3.4; 1.36 to 7.0). In the analysis by seniority, we found significantly elevated mortality from melanoma and skin cancers in men with 6–10 years’ seniority (12.90; 1.5 to 46.6). Women employed for >10 years had significantly elevated mortality from melanoma and skin cancers (8.0; 1.0 to 28.46), uterine cancers (2.87; 1.05 to 6.2), and pancreatic cancer (4.9; 1.0 to 14.5).
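
The headline interval for pancreatic cancer is consistent with exact Poisson limits for the observed count divided by the expected count. As a check, the sketch below computes those limits by bisection on the Poisson distribution function. The expected count of 8/3.13 is back-calculated from the reported PMR, which is an assumption since the abstract does not state it; the helper names are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), with terms computed in log space."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    return sum(math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k + 1))


def _solve(f, target, lo, hi, increasing, iterations=200):
    """Bisection for f(mu) = target on a monotone function f."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_limits(observed, alpha=0.05):
    """Exact two-sided limits for the mean of a Poisson count."""
    hi = observed + 20 * math.sqrt(observed + 1) + 20  # comfortably above the upper limit
    if observed == 0:
        lower = 0.0
    else:
        # P(X >= observed | mu) = alpha / 2; increases with mu
        lower = _solve(lambda mu: 1 - poisson_cdf(observed - 1, mu), alpha / 2, 0.0, hi, True)
    # P(X <= observed | mu) = alpha / 2; decreases with mu
    upper = _solve(lambda mu: poisson_cdf(observed, mu), alpha / 2, 0.0, hi, False)
    return lower, upper


def pmr_with_ci(observed, expected, alpha=0.05):
    """PMR with exact Poisson confidence limits, treating `expected` as fixed."""
    lower, upper = exact_poisson_limits(observed, alpha)
    return observed / expected, lower / expected, upper / expected


pmr, lo, hi = pmr_with_ci(8, 8 / 3.13)
print(round(pmr, 2), round(lo, 2), round(hi, 2))  # 3.13 1.35 6.17
```

The width of these intervals for small counts (for example 1.5 to 46.6 for the seniority analysis in men) reflects how few deaths each subgroup estimate rests on.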

Conclusions: Women in this tanning industry cohort experienced excess mortality from pancreatic cancer, with suggested increases in melanoma and skin, corpus, cervical, and uterine cancers. The period before 1980 was characterised by high cancer mortality, which could be associated with the direct black azo dyes used in leather staining. Chromium III compounds could promote pancreatic cancer.
