Self-reported health problems and sickness absence in different age groups predominantly engaged in physical work
  1. Simo Taimela (1),
  2. Esa Läärä (2),
  3. Antti Malmivaara (4),
  4. Jaakko Tiekso (1),
  5. Harri Sintonen (3),
  6. Selina Justén (1),
  7. Timo Aro (5)

  1. Evalua International, Vantaa, Finland
  2. University of Oulu, Department of Mathematical Sciences, Oulu, Finland
  3. University of Helsinki, Department of Public Health, Helsinki, Finland
  4. Finnish Office for Health Technology Assessment, FinOHTA/Stakes, Helsinki, Finland
  5. Mutual Pension Insurance Company Ilmarinen, Helsinki, Finland
  1. Dr S Taimela, Evalua International, PO Box 35, FIN-01531 Vantaa, Finland; simo.taimela{at}


Objectives: To study the associations between self-reported health problems and sickness absence from work.

Methods: The results of a questionnaire survey were combined with archival sickness absence data for 1341 employees (88% males; 62% blue-collar) in construction, service and maintenance work within one corporation in Finland. Sex, age and occupational grade were controlled for as confounders. A zero-inflated negative binomial (ZINB) regression model was used in the statistical analysis of the sickness absence data.

Results: The prevalence of self-reported health problems increased with age, from 23% in 18–30-year-olds to 54% in 55–61-year-olds. However, 71% of those aged 18–30 years had been absent from work, whereas in those aged 55–61 years this proportion was 53%. When health problems and occupational grade were accounted for in the ZINB model, age as such was not associated with the number of days on sick leave, but young workers still had a higher propensity for any sickness absence than older workers. Self-rated future working ability and musculoskeletal impairment were strong determinants of sickness absence. Among those susceptible to taking sick leave, the estimated mean number of absence days increased by 14% for each rise of 1 unit in the impairment score (scale 0–10).

Conclusions: Young subjects had surprisingly high probability for sickness absence although they reported better health than their older colleagues. A higher total count of absence days was found among subjects reporting health problems and poorer working ability, regardless of age, sex and occupational grade. These findings have implications for both management and the healthcare system in the prevention of work disability.

Sickness absence means non-attendance by an employee at work due to a (certified) health complaint when the employer expects attendance. Despite the straightforward definition, sickness absence has proved to be a complex phenomenon. In addition to illness, it has been associated with, for example, demographic and socioeconomic factors, organisational features, job content and attitudes to work.1 The key psychosocial predictors of sickness absence include individuals' own perceptions of health and working ability.2 3

It is a common belief that older (supposedly in poorer health) employees are absent from work more often than their younger (supposedly healthier) colleagues.4 5 However, the young seem to stay out of work because of minor health complaints more than older workers do. Some earlier studies have also found that older age increases the risk of overall sickness absence but decreases that of one-day absences.6

We investigated how age and self-reported health problems are associated with sickness absence within a cohort predominantly employed in physical work.


Study design and ethics

The design was cross-sectional: data from questionnaires were combined with records of demographics and sickness absence from the employer's salary register. The Helsinki University Research Ethics Board approved the study, and it was performed according to the Declaration of Helsinki.


Inclusion criteria were permanent employment and age 18–60 years. Questionnaires were sent to a cohort of 3115 employees in one corporation in September 2004. The proposed study design, implications of the trial and alternative options were explained in the cover letter. The letter also emphasised that taking part in the trial was voluntary and that employees would get the best treatment available and the full attention of the occupational doctor even if they did not want to participate. Those invited were told that they were free to withdraw from the trial at any point, and that this would not prejudice their treatment. At most two reminders were sent. The respondents signed an informed consent form. Of the target group, 49% were employed in the construction industry: civil engineering, building contracting, technical building services and the building materials industry. The remaining 51% were employed in installation, repair, service and maintenance of buildings, industrial installations or communications networks.

Self-reported health problems

The self-administered questionnaire contained items about lifestyle, anthropometrics, sleep disturbances, work-related stress and fatigue, depression, pain, disability due to musculoskeletal problems and a prediction of future working ability. It included previously validated items7–14 (table 1).

Table 1 Questionnaire topics

The responses were interpreted on the basis of a priori defined cut-off limits. Subjects who reported problems with future working ability, pain, impairment due to musculoskeletal problems, insomnia or insufficient sleep, frequent stress or fatigue, or had a high depression score, were rated as having health problems (table 2). Furthermore, the presence of health problems was classified as none, one, or two or more in order to take into account coexisting health problems in each participant.

Table 2 Health problems: findings in one or more of these topics. Percentages have been calculated within the group

Sickness absence from work

Sickness absence data were obtained from the employer's records, covering a one-year period from 1 October 2003 to 30 September 2004 (although without medical diagnoses). Data privacy was strictly followed. Records were checked for inconsistencies. Overlapping and consecutive spells of sickness absence were combined. The employer records the sick leave periods, including the dates when each spell started and ended. In the company involved in our study, permanent employees are paid a full salary during their sick leave from the first day. The blue-collar employees cannot complete their own certificates for any sick leave. White-collar employees must provide a written explanation for short sick leaves and a medical certificate for sick leaves longer than three days.

Maternity/paternity leave and absence from work to care for a sick child are not included in the sickness absences.

We also received the sickness absence records of the non-respondents in an anonymous manner, which made it possible to compare the respondents and non-respondents as groups regarding sickness absence.


Sickness absence was operationalised as the accumulated number of days on sick leave during the one-year study period. When analysing how sickness absence depends on covariates (explanatory variables and prognostic factors), we initially tried four different types of regression models: the simple Poisson regression model, the zero-inflated Poisson model, the simple negative binomial model, and the zero-inflated negative binomial model (ZINB). It turned out that (i) there was great overdispersion in relation to the Poisson model, and (ii) an essential excess of zero absences compared with what could be reasonably expected in the simple non-inflated Poisson and negative binomial models. Therefore, as it was necessary to allow for both of these features, we concentrated on using the ZINB model in subsequent analyses.

The ZINB model15 16 starts by postulating that the study population is latently divided into two subsets: (A) subjects with a very high propensity to have zero days on sick leave, and (B) subjects with a substantial probability of at least one absence day. The zero-inflation part of the ZINB model predicts the odds of membership in the "immune" subpopulation A rather than in the "susceptible" subpopulation B. Dependency of these odds on covariates was modelled according to a logistic model, its regression coefficients describing the logarithms of the corresponding odds ratios associated with the covariates. The estimated odds ratios (with 95% CI) will also be presented in tabulated form. For easier interpretation and coherence with the negative binomial part below, we switched the outcome to be membership of the susceptible subset B. This reparametrisation of the mathematically equivalent original model implies only a change of sign of the regression coefficients and the inversion of odds ratios from the original zero-inflation model.

It is further postulated that in the immune subpopulation A the probability of zero absence is simply 100%. In contrast, in the susceptible subpopulation B the number of days on sick leave is assumed to obey the negative binomial distribution. In this negative binomial part of the ZINB model the mean number of absence days is assumed to depend on the relevant covariates according to a log-linear model. Hence, in this part a given regression coefficient represents the natural logarithm of the ratio of mean values of the response variable associated with a unit change in the pertaining covariate. When presenting results, the estimated ratios of means (with 95% CI) are reported. See the Appendix for a more detailed description of the ZINB model.
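The two-subset structure described above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' code: the parameter values (30% immune, mean 10 days, dispersion 1.5) and the function names are hypothetical, and the negative binomial draw is generated as a gamma-Poisson mixture, which is one standard way to obtain it.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_zinb(p_immune, mu, alpha, rng):
    """Draw one yearly absence count from a zero-inflated negative binomial."""
    if rng.random() < p_immune:
        return 0  # immune subset A: zero days with probability 1
    # Susceptible subset B: gamma-Poisson mixture with mean mu and
    # variance mu * (1 + alpha * mu), i.e. a negative binomial count.
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)
    return sample_poisson(lam, rng)


# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(2004)
days = [sample_zinb(0.3, 10.0, 1.5, rng) for _ in range(20000)]
zero_share = sum(d == 0 for d in days) / len(days)
mean_days = sum(days) / len(days)
print(f"share with zero days: {zero_share:.3f}, mean days: {mean_days:.2f}")
```

Note that the simulated share of zeros exceeds the immune share, because susceptible subjects can also have zero absence days in a given year.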

The parameters of the ZINB model were estimated by maximum likelihood using the function zeroinfl() in the package pscl15 for the R environment for statistical computing and graphics. The models were compared using the Akaike information criterion (AIC), and goodness-of-fit was evaluated by comparing the marginal observed frequencies with the expected frequencies, the latter being based on the fitted model in classes of the categorised outcome.
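As a brief illustration of model comparison by AIC, the sketch below defines the criterion in its usual form (twice the number of parameters minus twice the maximised log-likelihood) and ranks the four candidate models by the AIC values reported in the Results. The function and variable names are illustrative only.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion, 2k - 2 log L; smaller is better."""
    return 2 * n_params - 2 * log_likelihood


# AIC values as reported in the Results for the four candidate models.
reported_aic = {
    "Poisson": 28963,
    "zero-inflated Poisson": 20441,
    "negative binomial": 7124,
    "zero-inflated negative binomial": 7029,
}

best = min(reported_aic, key=reported_aic.get)
# Difference of each model's AIC from the best (lowest) one.
delta = {name: value - reported_aic[best] for name, value in reported_aic.items()}

for name, gap in sorted(delta.items(), key=lambda item: item[1]):
    print(f"{name}: AIC difference from best = {gap}")
```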


We received 1507 responses (48.4%), of which 166 were excluded for the following reasons: inadequately filled questionnaire (n = 29), age-related pension granted (n = 1), part-time or disability pension granted (n = 24), or the subject did not provide consent to analyse sickness absence or pension records (n = 110). Additionally, two subjects had missing absence data.

The final study population thus consisted of 1341 subjects. At the time of the questionnaire survey, the respondents were on average 44 years old (range 19–61 years). Of them, 12% were females and 61% were blue-collar workers.

The distribution of sickness days among non-respondents was very similar to that in respondents (table 3). Non-respondents were on average somewhat younger (mean 40 years) than respondents. Five per cent of non-respondents were females.

Table 3 The prevalence of self-reported health problems and characteristics of the distribution of the number of days on sick leave by gender, occupational grade and age

A total of 12837 days of sickness absence were recorded in the study population during the 12 months. The distribution was heavily right-skewed in all age groups. Moreover, 42% had not been on sick leave at all, indicating a substantial zero-component in the response distribution (tables 3 and 4). The proportions of zero-absences were 31%, 73% and 47% in blue-collar males, white-collar males and white-collar females, respectively. The mean numbers of absence days among those with any sickness absence were 19, 11 and 8 days in these three groups, respectively. In blue-collar males and white-collar females the proportions with no sickness absence were lower in young employees than among those at least 40 years of age. An increasing trend of absence days by age was observed among those with any sick leave in the male groups. Thirty one per cent of subjects reported health problems (table 3). Their share of the total number of days on sick leave was 61%.

Table 4 The observed counts in 11 classes of the outcome variable and the expected frequencies predicted by the three fitted zero-inflated negative binomial regression (ZINB) models, including their values of the Akaike Information Criterion (AIC)

Our first regression model, Model 1, included as covariates: the combination of gender and occupational grade (categories: male and blue-collar, male and white-collar, female and white-collar), age (seven groups), and self-reported health complaints (none, 1, ≥2). The AICs were 28963, 20441, 7124 and 7029 for the simple Poisson, the zero-inflated Poisson, the simple negative binomial and the zero-inflated negative binomial (ZINB) model, respectively. Based on these figures we chose the ZINB model for the subsequent analyses and presentation of results. The statistical appendix provides instructions on how the estimated model coefficients can be translated into predicted probabilities of susceptibility to sickness absence and mean numbers of days on sick leave for any combination of prognostic factors. As the baseline odds for susceptibility to any sickness absence was more than 50%, the reported odds ratios exaggerate the respective relative risks. Hence, we avoid direct quantitative interpretation of these odds ratios.
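The remark that odds ratios exaggerate relative risks when the outcome is common can be illustrated numerically. The sketch below converts an odds ratio into the implied relative risk for a given baseline probability. The odds ratio of 3 and the two baseline probabilities are hypothetical values chosen for illustration; they are not estimates from this study.

```python
def relative_risk_from_odds_ratio(odds_ratio, baseline_risk):
    """Relative risk implied by an odds ratio at a given baseline probability."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    exposed_odds = odds_ratio * baseline_odds
    exposed_risk = exposed_odds / (1.0 + exposed_odds)
    return exposed_risk / baseline_risk


# Hypothetical odds ratio of 3 at a rare and at a common baseline outcome.
rr_rare = relative_risk_from_odds_ratio(3.0, 0.05)
rr_common = relative_risk_from_odds_ratio(3.0, 0.50)
print(f"baseline 5%:  OR 3.0 corresponds to RR {rr_rare:.2f}")
print(f"baseline 50%: OR 3.0 corresponds to RR {rr_common:.2f}")
```

With a rare outcome the odds ratio and the relative risk are close, whereas at a baseline probability of 50% the same odds ratio corresponds to a much smaller relative risk.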

The results from fitting Model 1 are displayed in table 5. The high odds ratios for being susceptible to any sickness absence in male blue-collar and female white-collar workers, respectively, when compared with male white-collar employees were very consistent with the great contrasts observed in the proportions of workers with any sickness absence between these groups, as implied in table 3. The average number of sickness days among those susceptible to any sickness absence was about twice as high in male blue-collar workers as in male white-collar employees, but female white-collar subjects were not seen to differ from male white-collar employees in this regard. There was some evidence of an overall decreasing trend in the susceptibility to sickness absence with increasing age, but not in the average number of days on sick leave. The presence of health problems was associated with both the susceptibility to and the mean number of days on sick leave. Those who reported one health problem had on average almost twice the number of sickness absence days, and those with two or more health complaints had both a higher propensity for any sickness absence and a 3.4 times higher total number of absence days than those who did not report any health problems, when adjusted for gender, occupational group and age (table 5).

Table 5 Predicting the propensity to being susceptible versus immune to any sickness absence (zero-inflation part) and the duration of sickness absence, if susceptible (negative binomial part)

In our second ZINB model, Model 2, we included as covariates gender, age, body mass index, alcohol consumption, depression score (DEPS score), stress and fatigue, shortage of sleep (in hours), daytime alertness (ESS score), pain, impairment due to musculoskeletal problems at work (scale 0–10), and self-predicted future work ability (categories: able to work, uncertain, unable to work). The goodness-of-fit improved from Model 1 (table 4). However, apart from age, occupational grade and gender, only musculoskeletal problems, insufficient sleep and predicted future work ability appeared to have any major effect on the outcome (data not shown). As it also became apparent that the independent effect of age was essentially similar within the broad age classes 19–39 years and 45–61 years, respectively, we pooled the age factor into three levels only.

We then fitted a third model, Model 3, with these covariates: combination of gender and occupational grade, age, musculoskeletal impairment at work, insufficient sleep and predicted work ability. The AIC was clearly smaller than in the previous models, and the expected counts were very similar to those of Model 2 (table 4). The results on age, gender and occupational grade were very similar to those from Model 1 (table 5) apart from some changes in the mean ratios across the subgroups defined by gender and occupational grade. In this model both the self-predicted future working ability and the score for musculoskeletal impairment were strong predictors of the number of sickness absence days (table 6). Among the susceptible, the estimated mean number of absence days increased by 14% for each rise of 1 unit in the impairment score. Those susceptible to any sickness absence whose prediction of their future working ability was "uncertain" or "not able" had a mean number of days on sick leave twice or three times as high, respectively, as those whose own prediction of working ability was positive. In addition, insufficient sleep predicted a somewhat increased propensity for any sickness absence, but not the total number of absence days.
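Because the negative binomial part is log-linear, a regression coefficient translates into a multiplicative effect on the mean. The sketch below shows the conversion, using the rounded Model 3 coefficients that appear in the worked examples of the statistical appendix (0.13 per impairment unit, 0.69 for "uncertain" and 1.13 for "not able"). The function names are illustrative.

```python
import math


def percent_change_per_unit(coef):
    """Percentage change in the mean for a one-unit rise in the covariate."""
    return (math.exp(coef) - 1.0) * 100.0


def mean_ratio(coef, units=1.0):
    """Ratio of means for a rise of `units` in the covariate."""
    return math.exp(coef * units)


# Rounded coefficients taken from the appendix worked examples.
impairment_pct = percent_change_per_unit(0.13)   # per unit of the 0-10 score
ratio_uncertain = mean_ratio(0.69)               # "uncertain" vs "able"
ratio_not_able = mean_ratio(1.13)                # "not able" vs "able"
ratio_seven_units = mean_ratio(0.13, units=7)    # impairment score 7 vs 0

print(f"per unit of impairment: +{impairment_pct:.0f}%")
print(f"uncertain: x{ratio_uncertain:.1f}, not able: x{ratio_not_able:.1f}")
print(f"impairment score 7 vs 0: x{ratio_seven_units:.2f}")
```

The effect compounds across units: seven impairment units multiply the mean by about 2.5, not by 1 + 7 × 0.14.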

Table 6 Predicting the propensity to being susceptible versus immune to any sickness absence (zero-inflation part) and the duration of sickness absence, if susceptible (negative binomial part)


Main findings

The prevalence of health problems increased with age, and blue-collar workers had far more sickness absence days than white-collar employees. When self-reported health problems and occupational grade were accounted for, age was not associated with the total number of absence days, and older workers were less likely to stay out of work than the young. Self-reported health problems predicted sickness absence in a dose-related manner. Of the individual items of self-reported health problems, self-rating of future working ability and impairment due to musculoskeletal problems showed strongest associations with sickness absence.

Strengths and weaknesses of the study

Sickness absences serve as a measure of health in the working population when health is understood as a mixture of social, psychological and physiological functioning.17 18 Recorded sickness absence data have several advantages: the quality of the data in terms of coverage, accuracy and consistency over time is superior to that achievable via self-reports.19 However, their analysis is difficult with traditional statistical methods because a substantial fraction is clustered at value zero, and this proportion is greater than predicted by any basic probability model for count data. Also, the residual variability in the non-zero part of the distribution exceeds that predicted by a Poisson model for counts. For these reasons we chose the zero-inflated negative binomial (ZINB) regression model15 16 as our analysis tool, which provided a reasonably acceptable fit. Although it was perhaps not able to deal with all the complexity associated with this type of response variable, among computationally feasible approaches it is clearly more appropriate than the common simpler alternative models in dealing with both the extra-zero component and the overdispersion. However, the observed counts in response classes 1–2 and 21–42 absence days were systematically lower than the expected counts predicted by the ZINB models, whereas in classes 3–6 absence days the situation was vice versa (table 4). This pattern suggests that the fit of the ZINB model was not as good as desired, although it was the best of the realistically available models. The relative peak at 3–6 days could be interpreted as indicating that the outcome distribution may in reality have more than two components: the excess zero part, a component centred around small values (3–6) of absence days, and a third component centred around a relatively high mean level, perhaps more than 84 days. It is difficult to evaluate the quantitative implications of this observed deficiency of our model for the validity and precision of the estimates based on it. One likely consequence is, however, that the confidence intervals reported here underestimate to some extent the true uncertainty associated with our estimation.

A healthy worker effect might be present if employees with worse health level (long-term absence and disability states) had not responded. This potential bias would underestimate the associations as the respondents would be healthier, and possibly have had less sickness absence than non-respondents. The participation rate was in line with other studies in occupational populations in many countries.20 In our study, the non-respondents were slightly younger than respondents. When comparing the distribution of absence days between respondents and non-respondents, there was no relevant difference in mean absence. Therefore we think that the study population is reasonably representative of the original target population in this respect.

As our study is based on cross-sectional data, there is a possibility of reverse causality. That is, sickness absence due to any reason could potentially modify the reporting of health problems. Although this may partly explain the results, especially because those on sick leave at the time of responding to the survey were also included, we believe that experienced health problems determine sickness absence, and not vice versa.

Some differences in comparison to previous studies

Besides age, gender and occupational grade, the assessment of future working ability and the score for musculoskeletal impairment were strong determinants of sickness absence, in line with our hypothesis and previous studies.21 22 Contrary to our expectations and earlier findings,23–25 the prevalence of depression, fatigue or stress was fairly low and was not significantly associated with sickness absence in this cohort. Although greater decision authority predicts low sickness absence,26 27 it may increase the risk of psychological distress and fatigue,28 29 especially if the employees are exposed to high job demands. Our cohort mainly included blue-collar workers with low decision authority concerning which job tasks to perform, but good job-related autonomy concerning how to perform the task. This may partly explain our results that the prevalence of psychological distress or fatigue was low (table 2) and not associated with sickness absence, and that the most frequently reported health problem was physical impairment from musculoskeletal problems. Neither alcohol consumption nor smoking explained the associations of self-reported health problems or age with sickness absence.

Many previous studies have reported that females have more sickness absence than males, but this was not the case in our study. Female white-collar workers had a higher propensity for any sickness absence but, if susceptible, similar numbers of absence days to their male counterparts.

Meaning of the study

Construction workers are apparently at a greater risk of developing certain health disorders and sickness absence than workers in many other industries.30 31 Physically demanding job tasks and occupational injuries are likely determinants of the high prevalence. Subjects exposed to physically challenging tasks are more likely to report underlying health problems than subjects in sedentary tasks. However, this does not explain the inverse association between age and propensity to sickness absence.

The healthy worker survivor effect describes a continuing selection process: those who remain employed in a specific profession tend to be healthier than those who leave employment. This phenomenon is particularly true in the construction industry32 as well as in other physically demanding jobs. Maybe this partly explains the inverse association between age and propensity to absence, which was contrary to some previous reports.4 5 However, all employees participating in the present study were paid a full salary during their sick leave from the first day and there was no diversity in this respect due to age. We think that there may also be psychosocial and behavioural differences between the younger and older workers: perhaps their attitudes and values towards work are different. This may have implications for the prevention of work absence among young construction workers. In addition, irrespective of age, the healthcare system needs to address health and working ability, which are strongly related to sickness absence.

Unanswered questions and future research

It remains to be seen whether similar associations between age, self-reported health problems and sickness absence also exist in, for example, knowledge-intensive sedentary occupations. The order of the causality (that is, that age and self-reported health problems determine sickness absence) must also be confirmed in prospective studies. Further research is needed to find out the medical, psychosocial and behavioural determinants of sickness absence in the young.

Main messages

  • Higher total counts of absence days were found among subjects reporting certain health problems and weakened working ability, regardless of age, sex and occupational grade.

  • One third of the subjects reported named health problems, but their share of the total number of days on sick leave was over 60%.

  • When self-reported health problems, gender and occupational grade were accounted for, age was not associated with the total number of absence days, and older workers were less likely to stay out of work than younger employees.

  • A zero-inflated negative binomial regression model provided a reasonably acceptable fit to sickness absence data characterised by skewness, overdispersion and heavy clumping at zero value.

Policy implications

  • It is possible to identify individuals at a high risk of sickness absence with a simple health questionnaire among employees predominantly engaged in physical work.

  • Irrespective of age, the healthcare system needs to pay more attention to the health problems and working ability experienced by employees, as these are strongly related to sickness absence.

  • Psychosocial and behavioural differences between younger and older workers should be taken into account in the prevention of work absence among the young.

Statistical appendix

We provide here a detailed technical description of the zero-inflated negative binomial (ZINB) model (see also references 15 and 16). The outcome or response variable is denoted by $Y$, the number of days on sick leave during the one-year observation period, and it can take non-negative integer values. The ZINB model is a mixture of (a) the zero-inflation (ZI) part, and (b) the negative binomial (NB) part.

The zero-inflation part

We postulate that the study population is latently divided into two subsets:

A: subjects with a very high propensity to have zero days on sick leave

B: subjects with substantial probability of at least one absence day.

Let $p_B$ be the probability that an individual is susceptible, that is, he/she belongs to subset B, and let $p_A = 1 - p_B$ be the probability of being immune, that is, the subject belongs to subset A. It is assumed that these probabilities depend on the individual values of the model terms $X_1, X_2, \ldots, X_m$ that are appropriately constructed from the relevant explanatory variables or covariates, according to the common logistic regression model:

$$\log(p_A/p_B) = a_0 + a_1 X_1 + \cdots + a_m X_m,$$

in which log stands for the natural logarithm function. Each coefficient $a_j$ ($j = 1, \ldots, m$) is interpreted as the change of the log-odds of the subject belonging to subset A rather than to subset B corresponding to a unit change in the value of covariate term $X_j$ when all the other covariates are kept unchanged. Thus, $\mathrm{OR}_j = \exp(a_j)$, the antilog of $a_j$, is the odds ratio describing the effect of a unit change in $X_j$ on the chances of being immune rather than susceptible, adjusted for the other covariates.

Equally, this logistic model can be specified in terms of contrasting the odds for B versus A, in which case the regression coefficients will only have their signs changed, and the odds ratios will be inverted. In fact, when presenting our results (tables 5 and 6), we chose to display the relative odds in this way to describe the covariate effects on the probability of being susceptible rather than immune.

If the subject belongs to the immune subset A, the distribution of the response is assumed to be degenerate such that the probability of zero days on sick leave is 1.
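The relationship between the two equivalent parametrisations of the zero-inflation part can be checked with a few lines of code. This is a minimal sketch: the log-odds of −0.54 for susceptibility is the baseline value used in Case 1 below, whereas the coefficient 0.8 is a hypothetical value chosen only to show the sign change and the inversion of the odds ratio.

```python
import math


def inv_logit(log_odds):
    """Convert log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Baseline log-odds of being susceptible (subset B), as used in Case 1.
log_odds_b = -0.54
p_b = inv_logit(log_odds_b)    # probability of being susceptible
p_a = inv_logit(-log_odds_b)   # same model, outcome switched to immune (A)

# Hypothetical coefficient a_j in the "immune versus susceptible" form.
a_j = 0.8
or_immune = math.exp(a_j)        # odds ratio for A versus B
or_susceptible = math.exp(-a_j)  # odds ratio for B versus A: sign flipped

print(f"p_B = {p_b:.3f}, p_A = {p_a:.3f}, sum = {p_a + p_b:.3f}")
print(f"OR (A vs B) = {or_immune:.3f}, OR (B vs A) = {or_susceptible:.3f}")
```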

The negative binomial part

When a subject belongs to the susceptible subset B, the response variable $Y$ may get either a zero or any positive integer value. Let $q_y$ be the conditional probability of being exactly $y$ days on sick leave, given membership in this subpopulation. This probability is assumed to come from the negative binomial (NB) distribution, obeying the following formula for any $y = 0, 1, 2, \ldots$:

$$q_y = (\alpha\mu)^{y}\,(1 + \alpha\mu)^{-(y + 1/\alpha)}\;\Gamma(y + 1/\alpha)\,\bigl[\Gamma(y + 1)\,\Gamma(1/\alpha)\bigr]^{-1}$$

in which $\mu$ is the expected value or theoretical mean and $\alpha > 0$ is the dispersion parameter of the NB distribution, and $\Gamma(u)$ refers to the gamma function evaluated at real number value $u$. After some manipulation this probability can also be expressed in a simplified form as

$$q_y = \frac{\mu^{y}\exp(-\mu)}{y!}\; R(y, \mu, \alpha)$$

which is a product of the simple and familiar Poisson probability formula and the more complicated function $R(y, \mu, \alpha)$ describing the relative deviation of the NB distribution from the Poisson one at each value of $y$. The NB variance is $\mu(1 + \alpha\mu)$, which is obviously greater than the Poisson variance $\mu$.
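The negative binomial probabilities can be evaluated numerically on the log scale, which avoids overflow in the gamma functions. The following is a minimal sketch assuming the mean-dispersion parametrisation with mean mu and variance mu(1 + alpha·mu); the values of mu and alpha are hypothetical and the function name is illustrative.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial probability of y days, with mean mu and dispersion alpha."""
    inv = 1.0 / alpha
    log_q = (math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
             + y * math.log(alpha * mu)
             - (y + inv) * math.log1p(alpha * mu))
    return math.exp(log_q)


# Hypothetical parameter values for illustration.
mu, alpha = 5.67, 1.2

# Truncated sums; the tail beyond 2000 days is numerically negligible here.
probs = [nb_pmf(y, mu, alpha) for y in range(2000)]
total = sum(probs)
mean = sum(y * q for y, q in enumerate(probs))
variance = sum((y - mean) ** 2 * q for y, q in enumerate(probs))

print(f"sum of probabilities: {total:.6f}")
print(f"mean: {mean:.3f} (theory {mu})")
print(f"variance: {variance:.3f} (theory {mu * (1 + alpha * mu):.3f})")
```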

In the NB part of the ZINB model the mean number of days on sick leave in the susceptible is postulated to depend on the covariates according to a log-linear structure:

$$\log(\mu) = b_0 + b_1 X_1 + \cdots + b_m X_m.$$

Here a regression coefficient $b_j$ refers to the change in the logarithm of the expected value per unit change in covariate term $X_j$, keeping the other covariates constant. Accordingly, $\mathrm{MR}_j = \exp(b_j)$ is the ratio of mean responses, that is, the multiplicative effect of a unit change in covariate $X_j$ on the expected response among the susceptible, adjusted for the other covariates.

Note that we have the same set of covariate terms $X_1, X_2, \ldots, X_m$ to predict both the probability $p_A$ of being immune and the mean response $\mu$ among the susceptible. However, it may well be that certain covariates $X_j$ have no effect on predicting $p_A$, in which case the parameters $a_j$ associated with these covariates in the ZI part are zero-valued, whereas in the NB component some other covariate terms $X_k$ may have no effect on the mean response in the susceptible.

ZINB model: mixture of the two parts

Finally, the total or marginal probability Qy for a subject being exactly y days on sick leave during the one-year period is combined from the above probabilities as follows:

$$Q_0 = p_A + p_B\,q_0 \quad \text{(probability of zero days)},$$

$$Q_y = p_B\,q_y \quad \text{(probability of } y \text{ days), for } y = 1, 2, \ldots$$

Hence the marginal probability distribution is a mixture of the degenerate distribution (concentrated at zero) pertinent to the immune subjects and the NB distribution, which is presumed to hold for the susceptible individuals, such that the mixing proportions are $p_A$ and $p_B$, respectively. The marginal expected value $E(Y)$ of the response is a weighted average of the conditional means, which simplifies into $E(Y) = p_B\,\mu$. The variance of this mixture distribution is $\mathrm{var}(Y) = p_B\,\mu\,[1 + (p_A + \alpha)\,\mu]$.
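The mixture probabilities and the closed-form mean and variance can be verified numerically. The sketch below is an illustration under hypothetical parameter values (40% immune, conditional mean 12 days, dispersion 1.0); the function names are not from any particular library.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and dispersion alpha."""
    inv = 1.0 / alpha
    return math.exp(math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
                    + y * math.log(alpha * mu)
                    - (y + inv) * math.log1p(alpha * mu))


def zinb_pmf(y, p_a, mu, alpha):
    """Marginal ZINB probability Q_y of exactly y days on sick leave."""
    p_b = 1.0 - p_a
    q = nb_pmf(y, mu, alpha)
    return p_a + p_b * q if y == 0 else p_b * q


# Hypothetical parameter values for illustration.
p_a, mu, alpha = 0.4, 12.0, 1.0
p_b = 1.0 - p_a

# Truncated sums; the tail beyond 3000 days is numerically negligible here.
probs = [zinb_pmf(y, p_a, mu, alpha) for y in range(3000)]
mean = sum(y * q for y, q in enumerate(probs))
variance = sum((y - mean) ** 2 * q for y, q in enumerate(probs))

theory_mean = p_b * mu
theory_var = p_b * mu * (1.0 + (p_a + alpha) * mu)
print(f"mean: {mean:.4f} (formula {theory_mean:.4f})")
print(f"variance: {variance:.4f} (formula {theory_var:.4f})")
```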

This general specification of the ZINB model contains the following special cases: the zero-inflated Poisson (ZIP) model is obtained when the dispersion parameter $\alpha$ is put to approach 0. On the other hand, keeping $\alpha$ positive but putting $p_A = 0$, we get the non-inflated NB model. When both $\alpha \to 0$ and $p_A = 0$, the model reduces to the simple Poisson model.
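These limiting cases can be checked numerically by comparing the ZINB probabilities with the Poisson and NB probabilities. In this sketch the dispersion value 1e-4 stands in for the limit of the dispersion parameter approaching 0, and all parameter values are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability of y events with mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def nb_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and dispersion alpha."""
    inv = 1.0 / alpha
    return math.exp(math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
                    + y * math.log(alpha * mu)
                    - (y + inv) * math.log1p(alpha * mu))


def zinb_pmf(y, p_a, mu, alpha):
    """Marginal ZINB probability of exactly y days."""
    q = nb_pmf(y, mu, alpha)
    return p_a + (1.0 - p_a) * q if y == 0 else (1.0 - p_a) * q


mu, small_alpha = 10.0, 1e-4  # hypothetical values; small_alpha mimics the limit
ys = range(60)

# Dispersion near 0 with zero inflation: zero-inflated Poisson.
gap_zip = max(abs(zinb_pmf(y, 0.3, mu, small_alpha)
                  - (0.3 * (y == 0) + 0.7 * poisson_pmf(y, mu))) for y in ys)
# No zero inflation, positive dispersion: plain negative binomial.
gap_nb = max(abs(zinb_pmf(y, 0.0, mu, 1.5) - nb_pmf(y, mu, 1.5)) for y in ys)
# Neither zero inflation nor dispersion: plain Poisson.
gap_pois = max(abs(zinb_pmf(y, 0.0, mu, small_alpha) - poisson_pmf(y, mu)) for y in ys)

print(f"largest gaps: ZIP {gap_zip:.2e}, NB {gap_nb:.2e}, Poisson {gap_pois:.2e}")
```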

The likelihood function is created straightforwardly from the definitions of the probabilities $Q_y$ expressed as functions of the $2m + 2$ regression coefficients $a_0, a_1, \ldots, a_m, b_0, b_1, \ldots, b_m$, and the dispersion parameter $\alpha$ (see Moon and Shin16). Estimation of the parameters and assessment of their precision (by standard errors and confidence intervals) applying the principle of maximum likelihood can be computationally effected in some statistical programmes such as R, Stata, Limdep and S-Plus.

Predicting sickness absence by the model

We illustrate how the fitted model can be used for individual predictions on sickness absence days given any covariate profile. From the results of Model 3 reported in table 6 we find the following:

Case 1

Male, white-collar, age 30 years, no musculoskeletal impairment, sufficient sleep, and self-predicted working ability rated "able". The baseline log-odds of −0.54 converts to baseline odds of exp(−0.54) = 0.58 and an estimated probability of 0.58/(1 + 0.58) = 37% of being susceptible to sick leave. Given susceptibility, the conditional baseline mean number of days on sick leave is 5.67. Hence, the marginal expected value for this type of worker is 0.37 × 5.67 = 2.1 days.

Case 2

Male, blue-collar, age 50 years, musculoskeletal impairment score 7, insufficiency of sleep 2 h/night, predicted working ability "not able". The log-odds for belonging to subset B works out to 2.94, from which the estimated probability of susceptibility is exp(2.94)/[1 + exp(2.94)] = 95%. The mean number of sickness days for susceptible workers like him is obtained as

exp(1.73 + 0.34 − 0.12 + 7 × 0.13 + 2 × (−0.09) + 1.13) = exp(3.81) = 45.2 days

from which the marginal expected value is 0.95 × 45.2 = 42.9 days.

Case 3

Female, white-collar, age 42 years, musculoskeletal impairment score 3, insufficiency of sleep 1 h/night, predicted working ability "uncertain". The log-odds for belonging to subset B works out to 1.35, from which the probability of susceptibility is estimated as exp(1.35)/[1 + exp(1.35)] = 79%. The mean number of sickness days for susceptible workers like her is obtained as

exp(1.73 − 0.26 − 0.11 + 3 × 0.13 + 1 × (−0.09) + 0.69) = exp(2.35) = 10.5 days

from which the marginal expected value is 0.79 × 10.5 = 8.3 days.
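The arithmetic of the three cases can be reproduced with a few lines of code. This is a minimal sketch: the helper name `predict` is hypothetical, the log-odds and the coefficient values are those quoted in the cases above, and the signs of the individual terms in the linear predictors are an assumption, chosen so that the sums agree with the printed totals exp(3.81) and exp(2.35).

```python
import math

def predict(log_odds_b, log_mean):
    """Return (probability of susceptibility, conditional mean, marginal mean)."""
    p_b = 1.0 / (1.0 + math.exp(-log_odds_b))  # inverse logit
    mu = math.exp(log_mean)                    # mean days among the susceptible
    return p_b, mu, p_b * mu

# Linear predictors for the conditional mean (signs assumed, see above).
lp_case2 = 1.73 + 0.34 - 0.12 + 7 * 0.13 + 2 * (-0.09) + 1.13
lp_case3 = 1.73 - 0.26 - 0.11 + 3 * 0.13 + 1 * (-0.09) + 0.69

cases = {
    "Case 1": (-0.54, math.log(5.67)),  # baseline log-odds and baseline mean
    "Case 2": (2.94, lp_case2),
    "Case 3": (1.35, lp_case3),
}

results = {name: predict(*args) for name, args in cases.items()}
for name, (p_b, mu, marginal) in results.items():
    print(f"{name}: P(susceptible) = {p_b:.2f}, "
          f"conditional mean = {mu:.1f} days, marginal mean = {marginal:.1f} days")
```

Because the code carries full precision through each step, intermediate values can differ slightly from the rounded figures in the text, but the marginal means agree to one decimal place.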

Adequacy of the ZINB model

In our application, the ZINB model proved more suitable for analysing sickness absence data than some popular but simpler models for discrete counts. It was certainly more appropriate than common procedures for continuous outcome variables, such as normal-theory linear modelling or non-parametric testing. However, assuming complete immunity is obviously an oversimplification: in reality such subjects have a very low propensity for sick leave rather than none. Moreover, inspection of observed and expected frequencies (table 4) suggested that the probability distribution of the response variable may actually be composed of three components: one with nearly zero mean, another with low mean, and a third with high mean number of days on sick leave. Applications of finite mixture models with a low and a high mean component in analogous contexts have been reported,33 34 but these were based on the simpler Poisson distribution for the separate components. Fitting more complicated mixture models would also require a tailored computing solution. In our case, it is difficult to say how much these shortcomings in model specification affected the results. One likely consequence is that the reported standard errors and confidence intervals underestimate, to some extent, the uncertainty in the estimates of the quantities of interest. Nevertheless, we believe that allowing for a nearly zero mean and a high mean component in the ZINB model enabled us to capture the essential features of the response distribution, and thus to obtain reasonably realistic estimates of the effects of relevant covariates and adequate predictions of the overall mean level and variability of the number of days on sick leave.



  • Ethics: The Helsinki University Research Ethics Board for the Occupational Health reviewed the study plan and gave their approval in advance. Record number (Dnro): 28/E2/04 (23 April 2004). All subjects received written information regarding the study according to the principles of the Declaration of Helsinki. Only subjects who gave their signed informed consent were included in the study. The consent letters are stored with other study material.

  • Competing interests: ST and JT are shareholders of and SJ employed by Evalua International. EL, AM, HS and TA have no competing interests to declare.

  • Funding: Finnish Funding Agency for Technology and Innovation (TEKES); The Finnish National Fund for Research and Development (SITRA); Pfizer Oy. The authors' work was independent of the funders.

  • Abbreviations:
    AIC: Akaike information criterion
    ZINB: zero-inflated negative binomial