Abstract
Objectives: To study the associations between self-reported health problems and sickness absence from work.
Methods: The results of a questionnaire survey were combined with archival data of sickness absence of 1341 employees (88% males; 62% blue-collar) in construction, service and maintenance work within one corporation in Finland. Sex, age and occupational grading were controlled as confounders. A zero-inflated negative binomial (ZINB) regression model was used in the statistical analysis of sickness absence data.
Results: The prevalence of self-reported health problems increased with age, from 23% in 18–30-year-olds to 54% in 55–61-year-olds. However, in those aged 18–30 years, 71% had been absent from work and in those aged 55–61 years this proportion was 53%. When health problems and occupational grading were accounted for in the ZINB model, age as such was not associated with the number of days on sick leave, but the young workers still had a higher propensity for any sickness absence than the old. Self-rated future working ability and musculoskeletal impairment were strong determinants of sickness absence. Among those susceptible to taking sick leave, the estimated mean number of absence days increased by 14% for each rise of 1 unit of the impairment score (scale 0–10).
Conclusions: Young subjects had a surprisingly high probability of sickness absence although they reported better health than their older colleagues. A higher total count of absence days was found among subjects reporting health problems and poorer working ability, regardless of age, sex and occupational grade. These findings have implications for both management and the healthcare system in the prevention of work disability.
Sickness absence means non-attendance by an employee at work due to a (certified) health complaint when the employer expects attendance. Despite the straightforward definition, sickness absence has proved to be a complex phenomenon. In addition to illness, it has been associated with, for example, demographical and socioeconomic factors, organisational features, job content and attitudes to work.1 The key psychosocial predictors of sickness absence include individuals' own perceptions of health and working ability.2 3
It is a common belief that older (supposedly in poorer health) employees are more absent from work than their younger (supposedly healthier) colleagues.4 5 However, the young seem to stay out of work due to minor health complaints more than older workers. Also, some earlier studies have found that older age increases the risk of overall sickness absences, but decreases that of one-day absences.6
We investigated how age and self-reported health problems are associated with sickness absence within a cohort predominantly employed in physical work.
METHODS
Study design and ethics
The design was cross-sectional: data from questionnaires were combined with records of demographics and sickness absence from the employer's salary register. The Helsinki University Research Ethics Board approved the study, and it was performed according to the Declaration of Helsinki.
Participants
Inclusion criteria were permanent employment and age 18–60 years. Questionnaires were sent to a cohort of 3115 employees in one corporation in September 2004. The proposed study design, implications of the trial and alternative options were explained in the cover letter. The letter also emphasised that taking part in the trial was voluntary and that employees would get the best treatment available and the full attention of the occupational doctor even if they did not want to participate. Those invited were told that they were free to withdraw from the trial at any point, and that this would not prejudice their treatment. At most two reminders were sent. The respondents signed an informed consent form. Of the target group, 49% were employed in the construction industry: civil engineering, building contracting, technical building services and building materials industry. The other 51% were employed in installing, repairing, service and maintenance of buildings, industrial installations or communications networks.
Self-reported health problems
The self-administered questionnaire contained items about lifestyle, anthropometrics, sleep disturbances, work-related stress and fatigue, depression, pain, disability due to musculoskeletal problems and a prediction of future working ability. It included previously validated items7–14 (table 1).
The responses were interpreted on the basis of a priori defined cut-off limits. Subjects who reported problems with future working ability, pain, impairment due to musculoskeletal problems, insomnia or insufficient sleep, frequent stress or fatigue, or had a high depression score, were rated as having health problems (table 2). Furthermore, the presence of health problems was eventually classified as none, one, or two or more in order to take into account coexisting health problems in each participant.
Sickness absence from work
Sickness absence data were obtained from the employer's records, covering a one-year period from 1 October 2003 to 30 September 2004 (although without medical diagnoses). Data privacy was strictly followed. Records were checked for inconsistencies. Overlapping and consecutive spells of sickness absence were combined. The employer records the sick leave periods, including the dates when each spell started and ended. In the company involved in our study, permanent employees are paid a full salary during their sick leave from the first day. The blue-collar employees cannot complete their own certificates for any sick leave. White-collar employees must provide a written explanation for short sick leaves and a medical certification for sick leaves longer than three days.
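The combining of overlapping and consecutive spells can be pictured as a standard interval merge. The following is a minimal sketch in Python, not the procedure actually used on the employer's register: the function names and the example records are hypothetical, and counting inclusive calendar days is an assumption, since the text does not say whether calendar or working days were counted.

```python
from datetime import date, timedelta

def merge_spells(spells):
    """Merge overlapping or back-to-back sick-leave spells.

    Each spell is a (start, end) pair of dates, both inclusive.
    Two spells are combined when the later one starts no later than
    the day after the earlier one ends.
    """
    merged = []
    for start, end in sorted(spells):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Overlapping or consecutive: extend the previous spell.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged

def total_days(spells):
    """Calendar days covered by the merged spells (assumption: inclusive count)."""
    return sum((end - start).days + 1 for start, end in merge_spells(spells))

# Hypothetical records for one employee
records = [
    (date(2004, 1, 5), date(2004, 1, 7)),
    (date(2004, 1, 8), date(2004, 1, 9)),   # consecutive with the first spell
    (date(2004, 3, 1), date(2004, 3, 3)),
    (date(2004, 3, 2), date(2004, 3, 5)),   # overlaps the third spell
]
print(merge_spells(records))
print(total_days(records))
```

Merging before counting avoids double-counting days that appear in two overlapping register entries.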
Maternity/paternity leave and absence from work to care for a sick child are not included in the sickness absences.
We also received the sickness absence records of the non-respondents in an anonymous manner, which made it possible to compare the respondents and non-respondents as groups regarding sickness absence.
Statistics
Sickness absence was operationalised as the accumulated number of days on sick leave during the one-year study period. When analysing how sickness absence depends on covariates (explanatory variables and prognostic factors), we initially tried four different types of regression models: the simple Poisson regression model, the zero-inflated Poisson model, the simple negative binomial model, and the zero-inflated negative binomial (ZINB) model. It turned out that (i) there was great overdispersion in relation to the Poisson model, and (ii) there was an essential excess of zero absences compared with what could be reasonably expected in the simple non-inflated Poisson and negative binomial models. Therefore, as it was necessary to allow for both of these features, we concentrated on using the ZINB model in subsequent analyses.
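The two features named here, overdispersion and excess zeros relative to a Poisson model, can be screened with simple descriptive quantities before any regression is fitted. The sketch below is only an illustration in Python (the study itself used R); the function name and the toy counts are hypothetical and are not study data.

```python
import math

def poisson_diagnostics(counts):
    """Compare a sample of counts with what a simple Poisson model implies.

    Under a Poisson model the variance equals the mean and the share of
    zeros equals exp(-mean); large departures hint at overdispersion and
    zero inflation.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return {
        "mean": mean,
        "variance": variance,
        "dispersion_ratio": variance / mean,   # about 1 under Poisson
        "observed_zero_share": sum(1 for c in counts if c == 0) / n,
        "poisson_zero_share": math.exp(-mean),
    }

# Hypothetical absence-day counts for ten employees
toy_counts = [0, 0, 0, 0, 0, 1, 2, 3, 10, 34]
diagnostics = poisson_diagnostics(toy_counts)
for key, value in diagnostics.items():
    print(f"{key}: {value:.3f}")
```

A dispersion ratio far above 1 together with an observed zero share far above the Poisson zero share is the pattern that points towards a zero-inflated negative binomial model rather than a plain Poisson model.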
The ZINB model15 16 starts by postulating that the study population is latently divided into two subsets: (A) subjects with a very high propensity to have zero days on sick leave, and (B) subjects with substantial probability of at least one absence day. The zero-inflation part of the ZINB model predicts the odds of membership in the immune subpopulation A rather than in the susceptible subpopulation B. Dependency of these odds on covariates was modelled according to a logistic model, its regression coefficients describing the logarithms of the corresponding odds ratios associated with the covariates. The estimated odds ratios (with 95% CI) will also be presented in tabulated form. For easier interpretation and coherence with the negative binomial part below we switched the outcome to be membership of the susceptible subset B. This reparametrisation of the mathematically equivalent original model implies only a change of sign of the regression coefficients and the inversion of odds ratios from the original zero-inflation model.
It is further postulated that in the immune subpopulation A the probability of zero absence is simply 100%. In contrast to this, in the susceptible subpopulation B the number of days on sick leave is assumed to obey the negative binomial distribution. In this negative binomial part of the ZINB model the mean number of absence days is assigned to be dependent on the relevant covariates according to a log-linear model. Hence, in this part a given regression coefficient represents the natural logarithm of the ratio of mean values of the response variable associated with a unit change in the pertaining covariate. When presenting results, the estimated ratios of means (with 95% CI) are reported. See the Appendix for a more detailed description of the ZINB model.
The parameters of the ZINB model were estimated by maximum likelihood using the function zeroinfl() in the package pscl15 for the R environment for statistical computing and graphics (http://www.r-project.org/). The models were compared using the Akaike information criterion (AIC), and goodness-of-fit was evaluated by comparing the marginal observed frequencies to the expected frequencies, the latter being based on the fitted model in classes of categorised outcome.
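As a reminder of how the AIC comparison works, the criterion is twice the number of estimated parameters minus twice the maximised log-likelihood, and the model with the smallest value is preferred. The short Python sketch below applies that rule to the four AIC values reported in the Results for Model 1; the helper names are hypothetical, and the log-likelihoods and parameter counts behind the reported values are not given in the text.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

# AIC values as reported in the Results for Model 1
reported_aic = {
    "Poisson": 28963,
    "zero-inflated Poisson": 20441,
    "negative binomial": 7124,
    "ZINB": 7029,
}

best_model = min(reported_aic, key=reported_aic.get)
# Difference of each model from the best one
delta_aic = {name: value - reported_aic[best_model]
             for name, value in reported_aic.items()}

print(best_model)
print(delta_aic)
```

The differences make the ranking easy to read: moving from Poisson to negative binomial removes most of the lack of fit, and adding the zero-inflation part improves the criterion further.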
RESULTS
We received 1507 responses (48.4%), of which 166 were excluded for the following reasons: inadequately filled questionnaire (n = 29), age-related pension granted (n = 1), part-time or disability pension granted (n = 24), or the subject did not provide consent to analyse sickness absence or pension records (n = 110). Additionally, two subjects had missing absence data.
The final study population thus consisted of 1341 subjects. At the time of the questionnaire survey, the respondents were on average 44 years old (range 19–61 years). Of them, 12% were females and 61% were blue-collar workers.
The distribution of sickness days among non-respondents was very similar to that in respondents (table 3). Non-respondents were on average somewhat younger (mean 40 years) than respondents. Five per cent of non-respondents were females.
A total of 12837 days of sickness absence were recorded in the study population during the 12 months. The distribution was heavily right-skewed in all age groups. Moreover, 42% had not been on sick leave at all, indicating a substantial zero-component in the response distribution (tables 3 and 4). The proportions of zero-absences were 31%, 73% and 47% in blue-collar males, white-collar males and white-collar females, respectively. The mean numbers of absence days among those with any sickness absence were 19, 11 and 8 days in these three groups, respectively. In blue-collar males and white-collar females the proportions with no sickness absence were lower in young employees than among those at least 40 years of age. An increasing trend of absence days by age was observed among those with any sick leave in the male groups. Thirty-one per cent of subjects reported health problems (table 3). Their share of the total number of days on sick leave was 61%.
Our first regression model, Model 1, included as covariates: the combination of gender and occupational grade (categories: male and blue-collar, male and white-collar, female and white-collar), age (seven groups), and self-reported health complaints (none, 1, ≥2). The AICs were 28963, 20441, 7124 and 7029 for the simple Poisson, the zero-inflated Poisson, the simple negative binomial and the zero-inflated negative binomial (ZINB) model, respectively. Based on these figures we chose the ZINB model for the subsequent analyses and presentation of results. The statistical appendix provides instructions on how the estimated model coefficients can be translated into predicted probabilities of susceptibility to sickness absence and into mean numbers of days on sick leave for any combination of prognostic factors. As the baseline odds for susceptibility to any sickness absence was more than 50%, the reported odds ratios exaggerate the respective relative risks. Hence, we avoid direct quantitative interpretation of these odds ratios.
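Why a high baseline risk makes odds ratios overstate relative risks can be seen from the standard conversion RR = OR / (1 − p0 + p0 × OR), where p0 is the risk in the reference group. The Python sketch below uses hypothetical values of the odds ratio and baseline risk, not estimates from tables 5 or 6, and the conversion is only approximate for covariate-adjusted odds ratios.

```python
def relative_risk_from_odds_ratio(odds_ratio, baseline_risk):
    """Relative risk implied by an odds ratio at a given reference-group risk."""
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)

# Hypothetical odds ratio of 3 evaluated at increasing baseline risks
example_or = 3.0
for baseline_risk in (0.05, 0.37, 0.60):
    rr = relative_risk_from_odds_ratio(example_or, baseline_risk)
    print(f"baseline risk {baseline_risk:.2f}: OR {example_or:.1f} -> RR {rr:.2f}")
```

With a rare outcome the two measures nearly coincide, whereas at a baseline risk above one half the same odds ratio corresponds to a much smaller relative risk, which is the reason for the caution expressed above.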
The results from fitting Model 1 are displayed in table 5. The high odds ratios for being susceptible to any sickness absence in male blue-collar and female white-collar workers, respectively, when compared to male white-collar employees were very consistent with the great contrasts observed in the proportions of workers with any sickness absence between these groups, as implied in table 3. The average number of sickness days among those susceptible to any sickness absence was about twice as high in male blue-collar workers as in male white-collar employees, but female white-collar subjects were not seen to differ from male white-collar employees in this regard. There was some evidence of an overall decreasing trend in the susceptibility to sickness absence with increasing age, but not in the average number of days on sick leave. The presence of health problems was associated with both the susceptibility to and the mean number of days on sick leave. Those who reported one health problem had on average almost twice the number of sickness absence days, and those with two or more health complaints had both a higher propensity for any sickness absence and a 3.4 times higher total number of absence days than those who did not report any health problems, when adjusted for gender, occupational group and age (table 5).
In our second ZINB model, Model 2, we included as covariates gender, age, body mass index, alcohol consumption, depression score (DEPS score), stress and fatigue, shortage of sleep (in hours), daytime alertness (ESS score), pain, impairment due to musculoskeletal problems at work (scale 0–10), and self-predicted future work ability (categories: able to work, uncertain, unable to work). The goodness-of-fit improved from Model 1 (table 4). However, apart from age, occupational grade and gender, only musculoskeletal problems, insufficient sleep and predicted future work ability appeared to have any major effect on the outcome (data not shown). As it also became apparent that the independent effect of age was essentially similar within the broad age classes 19–39 years and 45–61 years, respectively, we pooled the age factor into three levels only.
We then fitted a third model, Model 3, with these covariates: combination of gender and occupational grade, age, musculoskeletal impairment at work, insufficient sleep and predicted work ability. The AIC was clearly smaller than in the previous models, and the expected counts were very similar to those of Model 2 (table 4). The results on age, gender and occupational grade were very similar to those from Model 1 (table 5) apart from some changes in the mean ratios across the subgroups defined by gender and occupational grade. In this model both the self-predicted future working ability and the score for musculoskeletal impairment were strong predictors for the number of sickness absence days (table 6). Among the susceptible, the estimated mean number of absence days increased by 14% for each rise of 1 unit of the impairment score. Those susceptible to any sickness absence and whose prediction of their future working ability was "uncertain" or "not able" had twice or three times as high a mean number of days on sick leave, respectively, when compared to those whose own prediction on working ability was positive. In addition, insufficient sleep predicted a somewhat increased propensity for any sickness absence, but not the total number of absence days.
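Because the negative binomial part is log-linear, a per-unit ratio of means compounds multiplicatively over several units. As an illustrative reading of the 14% per-unit estimate (the five-unit contrast is our own example, not a figure reported in the tables), a difference of $k$ units in the impairment score corresponds to

$$\mathrm{MR}^{k} = \exp(k\,b), \qquad \text{for example } 1.14^{5} \approx 1.93,$$

that is, nearly a doubling of the expected number of absence days among the susceptible for a five-unit difference, other covariates being equal.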
DISCUSSION
Main findings
The prevalence of health problems increased with age, and blue-collar workers had far more sickness absence days than white-collar employees. When self-reported health problems and occupational grade were accounted for, age was not associated with the total number of absence days, and older workers were less likely to stay out of work than the young. Self-reported health problems predicted sickness absence in a dose-related manner. Of the individual items of self-reported health problems, self-rating of future working ability and impairment due to musculoskeletal problems showed the strongest associations with sickness absence.
Strengths and weaknesses of the study
Sickness absences serve as a measure of health in the working population when health is understood as a mixture of social, psychological and physiological functioning.17 18 Recorded sickness absence data have several advantages: the quality of the data in terms of coverage, accuracy and consistency over time is superior to that achievable via self-reports.19 However, their analysis is difficult with traditional statistical methods because a substantial fraction is clustered at value zero, and this proportion is greater than predicted by any basic probability model for count data. Also, the residual variability in the non-zero part of the distribution exceeds that predicted by a Poisson model for counts. For these reasons we chose the zero-inflated negative binomial (ZINB) regression model15 16 as our analysis tool, which provided a reasonably acceptable fit. Although it was perhaps not able to deal with all the complexity associated with this type of response variable, among computationally feasible approaches it is clearly more appropriate than the common simpler alternative models in dealing with both the extra-zero component and the overdispersion. However, the observed counts in response classes 1–2 and 21–42 absence days were systematically lower than the expected counts predicted by the ZINB models, whereas in classes 3–6 absence days the situation was vice versa (table 4). This pattern suggests that the fit of the ZINB model was not as good as desired, although it was the best of the realistically available models. The relative peak at 3–6 days could be interpreted to mean that the outcome distribution may in reality have more than two components: the excess zero part, a component centred around small values (3–6) of absence days, and a third component centred around a relatively high mean level, perhaps more than 84 days.
It is difficult to evaluate what the quantitative implications are of this observed deficiency of our model to the validity and precision of the estimates based on it. One likely consequence is, however, that the confidence intervals reported here underestimate to some extent the true uncertainty associated with our estimation.
A healthy worker effect might be present if employees with worse health level (long-term absence and disability states) had not responded. This potential bias would underestimate the associations as the respondents would be healthier, and possibly have had less sickness absence than non-respondents. The participation rate was in line with other studies in occupational populations in many countries.20 In our study, the non-respondents were slightly younger than respondents. When comparing the distribution of absence days between respondents and non-respondents, there was no relevant difference in mean absence. Therefore we think that the study population is reasonably representative of the original target population in this respect.
As our study is based on cross-sectional data, there is a possibility of reverse causality. That is, sickness absence due to any reason could potentially modify the reporting of health problems. Although this may partly explain the results, especially because those on sick leave at the time of responding to the survey were also included, we believe that experienced health problems determine sickness absence, and not vice versa.
Some differences in comparison to previous studies
Besides age, gender and occupational grade, the assessment of future working ability and the score for musculoskeletal impairment were strong determinants of sickness absence, in line with our hypothesis and previous studies.21 22 Contrary to our expectations and earlier findings,23–25 the prevalence of depression, fatigue or stress was fairly low and was not significantly associated with sickness absence in this cohort. Although greater decision authority predicts low sickness absence,26 27 it may increase the risk of psychological distress and fatigue,28 29 especially if the employees are exposed to high job demands. Our cohort mainly included blue-collar workers with low decision authority concerning which job tasks to perform, but good job-related autonomy concerning how to perform the task. This may partly explain our results that the prevalence of psychological distress or fatigue was low (table 2) and not associated with sickness absence, and that the most frequently reported health problem was physical impairment from musculoskeletal problems. Neither alcohol consumption nor smoking explained the associations of self-reported health problems or age to sickness absence.
Many previous studies have reported that females have more sickness absence than males, but this was not the case in our study. Female white-collar workers had a higher propensity for any sickness absence but, if susceptible, similar numbers of absence days as their male counterparts.
Meaning of the study
Construction workers are apparently at a greater risk of developing certain health disorders and sickness absence than workers in many other industries.30 31 Physically demanding job tasks and occupational injuries are likely determinants for the high prevalence. Subjects exposed to challenging tasks are more likely to report underlying health problems than subjects in sedentary tasks. However, this does not explain the inverse association between age and propensity to sickness absence.
The healthy worker survivor effect describes a continuing selection process: those who remain employed in a specific profession tend to be healthier than those who leave employment. This phenomenon is particularly true in the construction industry32 as well as in other physically demanding jobs. This may partly explain the inverse association between age and propensity to absence, which was contrary to some previous reports.4 5 However, all employees participating in the present study were paid a full salary during their sick leave from the first day and there was no diversity in this respect due to age. We think that there may also be psychosocial and behavioural differences between the younger and older workers: perhaps their attitudes and values towards work are different. This may have implications for the prevention of work absence among young construction workers. In addition, irrespective of age, the healthcare system needs to address health and working ability, which are strongly related to sickness absence.
Unanswered questions and future research
It remains to be seen whether similar associations between age, self-reported health problems and sickness absence exist also in, for example, knowledge-intensive sedentary occupations. The order of the causality (that is, that age and self-reported health problems determine sickness absence) must also be confirmed in prospective studies. Further research is needed to find out the medical, psychosocial and behavioural determinants of sickness absence in the young.
Main messages

Higher total counts of absence days were found among subjects reporting certain health problems and weakened working ability, regardless of age, sex and occupational grade.

One third of the subjects reported named health problems, but their share of the total number of days on sick leave was over 60%.

When self-reported health problems, gender and occupational grade were accounted for, age was not associated with the total number of absence days, and older workers were less likely to stay out of work than younger employees.

A zero-inflated negative binomial regression model provided a reasonably acceptable fit to sickness absence data characterised by skewness, overdispersion and heavy clumping at zero value.
Policy implications

It is possible to identify individuals at a high risk of sickness absence with a simple health questionnaire among employees predominantly engaged in physical work.

Irrespective of age, the healthcare system needs to pay more attention to the health problems and working ability experienced by employees, as these are strongly related to sickness absence.

Psychosocial and behavioural differences between younger and older workers should be taken into account in the prevention of work absence among the young.
Statistical appendix
We provide here a detailed technical description of the zero-inflated negative binomial (ZINB) model (see also references 15 and 16). The outcome or response variable is denoted by $Y$ = number of days on sick leave during the one-year observation period, and it can take non-negative integer values. The ZINB model is a mixture of (a) the zero-inflation (ZI) part, and (b) the negative binomial (NB) part.
The zeroinflation part
We postulate that the study population is latently divided into two subsets:
A: subjects with a very high propensity to have zero days on sick leave
B: subjects with substantial probability of at least one absence day.
Let $p_{B}$ be the probability that an individual is susceptible, that is, he/she belongs to subset B, and $p_{A} = 1 - p_{B}$ is the probability of being immune, that is, the subject belongs to subset A. It is assumed that these probabilities depend on the individual values of the model terms $X_{1}, X_{2}, \ldots, X_{m}$ that are appropriately constructed from the relevant explanatory variables or covariates, according to the common logistic regression model:
$$\log(p_{A}/p_{B}) = a_{0} + a_{1}X_{1} + \cdots + a_{m}X_{m},$$
in which log stands for the natural logarithm function. Each coefficient $a_{j}$ ($j = 1, \ldots, m$) is interpreted as the change of the log-odds of the subject belonging to subset A rather than to subset B corresponding to a unit change in the value of covariate term $X_{j}$ when all the other covariates are kept unchanged. Thus, $\mathrm{OR}_{j} = \exp(a_{j})$, the antilog of $a_{j}$, is the odds ratio describing the effect of a unit change in $X_{j}$ on the chances of being immune rather than susceptible, adjusted for the other covariates.
Equally, this logistic model can be specified in terms of contrasting the odds for B versus A, in which case the regression coefficients will only have their signs changed, and the odds ratios will be inverted. In fact, when presenting our results (tables 5 and 6), we chose to display the relative odds in this way to describe the covariate effects on the probability of being susceptible rather than immune.
If the subject belongs to the immune subset A, the distribution of the response is assumed to be degenerate such that the probability of zero days on sick leave is 1.
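The equivalence of the two parametrisations (odds of A versus B, or of B versus A) is easy to verify numerically. The Python sketch below is a minimal illustration; the function name is hypothetical, the intercept 0.54 is the sign-flipped baseline log-odds used in Case 1 further below, and the coefficient 0.32 is used purely as an example value.

```python
import math

def probability_from_log_odds(log_odds):
    """Inverse logit: probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Log-odds of being immune (A) rather than susceptible (B); illustrative value
log_odds_a = 0.54
p_a = probability_from_log_odds(log_odds_a)
p_b = probability_from_log_odds(-log_odds_a)   # same model, outcome switched to B

# A single coefficient (example value) in the two parametrisations
coef_b = 0.32                     # effect on the log-odds of B versus A
odds_ratio_b = math.exp(coef_b)   # odds ratio for being susceptible
odds_ratio_a = math.exp(-coef_b)  # odds ratio for being immune

print(f"p_A = {p_a:.3f}, p_B = {p_b:.3f}, sum = {p_a + p_b:.3f}")
print(f"OR for B = {odds_ratio_b:.3f}, OR for A = {odds_ratio_a:.3f}")
```

Switching the outcome changes only the sign of each coefficient, so the fitted probabilities are unchanged and each odds ratio is replaced by its reciprocal.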
The negative binomial part
When a subject belongs to the susceptible subset B, the response variable $Y$ may take either a zero or any positive integer value. Let $q_{y}$ be the conditional probability of being exactly $y$ days on sick leave, given membership in this subpopulation. This probability is assumed to come from the negative binomial (NB) distribution, obeying the following formula for any $y = 0, 1, 2, \ldots$
$$q_{y} = (\alpha\mu)^{y}\,(1 + \alpha\mu)^{-(y + 1/\alpha)}\,\Gamma(y + 1/\alpha)\,[\Gamma(y + 1)\,\Gamma(1/\alpha)]^{-1}$$
in which $\mu$ is the expected value or theoretical mean and $\alpha > 0$ is the dispersion parameter of the NB distribution, and $\Gamma(u)$ refers to the gamma function evaluated at real number value $u$. Actually, after some manipulation this probability can also be expressed in a simplified form as
$$q_{y} = \frac{\mu^{y}\exp(-\mu)}{y!} \times R(y, \mu, \alpha)$$
which is a product of the simple and familiar Poisson probability formula and the more complicated function $R(y, \mu, \alpha)$ describing the relative deviation of the NB distribution from the Poisson one at each value of $y$. The NB variance is $\mu(1 + \alpha\mu)$, being obviously greater than $\mu$, which is the Poisson variance.
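The negative binomial probabilities in the mean and dispersion parametrisation can be evaluated on the log scale with the log-gamma function, and the stated mean and variance checked numerically. The following Python sketch assumes the usual form of the NB probability with mean `mu` and dispersion `alpha`; the function name and the parameter values are illustrative, not estimates from the study.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial probability of y with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + y * math.log(alpha * mu)
             - (y + r) * math.log1p(alpha * mu))
    return math.exp(log_p)

# Illustrative parameter values (hypothetical)
mu, alpha = 5.67, 1.5

# The tail beyond 2000 days is numerically negligible for these values
support = range(2000)
probs = [nb_pmf(y, mu, alpha) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum(y * y * p for y, p in zip(support, probs)) - mean ** 2

print(f"sum of probabilities: {total:.6f}")
print(f"mean: {mean:.4f} (theory {mu:.4f})")
print(f"variance: {variance:.4f} (theory {mu * (1 + alpha * mu):.4f})")
```

Working with log-gamma values avoids overflow of the gamma function for large counts, which matters for long sick-leave spells.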
In the NB part of the ZINB model the mean number of days on sick leave in the susceptible is postulated to depend on the covariates according to a log-linear structure:
$$\log(\mu) = b_{0} + b_{1}X_{1} + \cdots + b_{m}X_{m}.$$
Here a regression coefficient $b_{j}$ refers to the change in the logarithm of the expected value per unit change in covariate term $X_{j}$, keeping the other covariates constant. Accordingly, $\mathrm{MR}_{j} = \exp(b_{j})$ is the ratio of mean responses, that is, the multiplicative effect of a unit change in covariate $X_{j}$ on the expected response among the susceptible, adjusted for the other covariates.
Note that we have the same set of covariate terms $X_{1}, X_{2}, \ldots, X_{m}$ to predict both the probability $p_{A}$ of being immune and the mean response $\mu$ among the susceptible. However, it may well be that certain covariates $X_{j}$ have no effect on predicting $p_{A}$, in which case the parameters $a_{j}$ associated with these covariates in the ZI part are zero-valued, whereas in the NB component some other covariate terms $X_{k}$ may have no effect on the mean response in the susceptible.
ZINB model: mixture of the two parts
Finally, the total or marginal probability $Q_{y}$ for a subject being exactly $y$ days on sick leave during the one-year period is combined from the above probabilities as follows:
$$Q_{0} = p_{A} + p_{B}\,q_{0} \quad \text{(probability of zero days)},$$
$$Q_{y} = p_{B}\,q_{y} \quad \text{(probability of } y \text{ days for } y = 1, 2, \ldots).$$
Hence the marginal probability distribution is a mixture of the degenerate distribution (concentrated at zero) pertinent to the immune subjects and the NB distribution, which is presumed to hold for the susceptible individuals, such that the mixing proportions are $p_{A}$ and $p_{B}$, respectively. The marginal expected value $E(Y)$ of the response is a weighted average of the conditional means, which simplifies into $E(Y) = p_{B}\,\mu$. The variance of this mixture distribution is $\mathrm{var}(Y) = p_{B}\,\mu\,[1 + (p_{A} + \alpha)\,\mu]$.
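The mixture construction and the resulting marginal moments can be checked numerically. The Python sketch below builds the marginal probabilities from a degenerate zero component and an NB component and compares the computed mean and variance with p_B × mu and p_B × mu × [1 + (p_A + alpha) × mu]; the function names are hypothetical and the parameter values, in particular the dispersion `alpha`, are illustrative rather than estimates reported in the study.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial probability of y with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                    + y * math.log(alpha * mu)
                    - (y + r) * math.log1p(alpha * mu))

def zinb_pmf(y, p_b, mu, alpha):
    """Marginal ZINB probability: immune share (1 - p_b) adds mass at zero only."""
    q = nb_pmf(y, mu, alpha)
    return (1.0 - p_b) + p_b * q if y == 0 else p_b * q

# Illustrative values (hypothetical): 79% susceptible, mean 10.5 days if susceptible
p_b, mu, alpha = 0.79, 10.5, 1.2
p_a = 1.0 - p_b

support = range(5000)   # tail beyond this is numerically negligible here
probs = [zinb_pmf(y, p_b, mu, alpha) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum(y * y * p for y, p in zip(support, probs)) - mean ** 2

print(f"sum of probabilities: {total:.6f}")
print(f"mean: {mean:.4f} (formula {p_b * mu:.4f})")
print(f"variance: {variance:.4f} (formula {p_b * mu * (1 + (p_a + alpha) * mu):.4f})")
```

Expected class frequencies for a goodness-of-fit comparison can be obtained in the same way, by summing the marginal probabilities over each class of absence days and multiplying by the number of subjects.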
This general specification of the ZINB model contains the following special cases: the zero-inflated Poisson (ZIP) model is obtained when the dispersion parameter $\alpha$ is put to approach 0. On the other hand, keeping $\alpha$ positive but putting $p_{A} = 0$, we get the non-inflated NB model. When both $\alpha \to 0$ and $p_{A} = 0$, the model reduces to the simple Poisson model.
The likelihood function is created straightforwardly from the definitions of the probabilities $Q_{y}$ expressed as functions of the $2m + 2$ regression coefficients $a_{0}, a_{1}, \ldots, a_{m}, b_{0}, b_{1}, \ldots, b_{m}$, and the dispersion parameter $\alpha$ (see Moon and Shin16). Estimation of the parameters and assessment of their precision (by standard errors and confidence intervals) applying the principle of maximum likelihood can be computationally effected in some statistical programmes such as R, Stata, Limdep and S-Plus.
Predicting sickness absence by the model
We illustrate how the fitted model can be used for individual predictions on sickness absence days given any covariate profile. From the results of Model 3 reported in table 6 we find the following:
Case 1
Male, white-collar, age 30 years, no musculoskeletal impairment, sufficient sleep, and self-predicted working ability rated "able". The baseline log-odds of −0.54 converts to baseline odds of exp(−0.54) = 0.58 and an estimated probability of 0.58/(1 + 0.58) = 37% of being susceptible to sick leave. Given susceptibility, the conditional baseline mean number of days on sick leave is 5.67. Hence, the marginal expected value for this type of worker is 0.37 × 5.67 = 2.1 days.
Case 2
Male, blue-collar, age 50 years, musculoskeletal impairment score 7, insufficiency of sleep 2 h/night, predicted working ability "not able". The log-odds for belonging to subset B is computed as
$$-0.54 + 2.02 - 0.72 + 7 \times 0.13 + 2 \times 0.32 + 0.63 = 2.94$$
from which the estimated probability of susceptibility is $\exp(2.94)/[1 + \exp(2.94)] = 95\%$. The mean number of sickness days for susceptible workers like him is obtained as
$$\exp(1.73 + 0.34 - 0.12 + 7 \times 0.13 + 2 \times (-0.09) + 1.13) = \exp(3.81) = 45.2 \text{ days}$$
from which the marginal expected value is $0.95 \times 45.2 = 42.9$ days.
Case 3
Female, white-collar, age 42 years, musculoskeletal impairment score 3, insufficiency of sleep 1 h/night, predicted working ability "uncertain". The log-odds for belonging to subset B is
$$-0.54 + 1.44 - 0.37 + 3 \times 0.13 + 1 \times 0.32 + 0.11 = 1.35$$
from which the probability of susceptibility is estimated as $\exp(1.35)/[1 + \exp(1.35)] = 79\%$. The mean number of sickness days for susceptible workers like her is obtained as
$$\exp(1.73 - 0.26 - 0.11 + 3 \times 0.13 + 1 \times (-0.09) + 0.69) = \exp(2.35) = 10.5 \text{ days}$$
from which the marginal expected value is $0.79 \times 10.5 = 8.3$ days.
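The three worked cases can be reproduced with a few lines of code. The Python sketch below is only an illustration: the function name is hypothetical, and the individual terms and their signs are entered as they must be for the sums to match the totals stated in the cases (2.94 and 3.81 for Case 2, 1.35 and 2.35 for Case 3). The intercept 1.73 of the mean part is the rounded logarithm of the baseline mean 5.67, so Case 1 differs slightly from the text through rounding.

```python
import math

def predict(log_odds_terms, log_mean_terms):
    """Return (P(susceptible), mean days if susceptible, marginal mean days)."""
    p_b = 1.0 / (1.0 + math.exp(-sum(log_odds_terms)))
    mean_b = math.exp(sum(log_mean_terms))
    return p_b, mean_b, p_b * mean_b

cases = {
    # Case 1: baseline profile, intercepts only
    "Case 1": predict([-0.54], [1.73]),
    # Case 2: blue-collar male, age 50, impairment 7, sleep deficit 2 h, "not able"
    "Case 2": predict([-0.54, 2.02, -0.72, 7 * 0.13, 2 * 0.32, 0.63],
                      [1.73, 0.34, -0.12, 7 * 0.13, 2 * (-0.09), 1.13]),
    # Case 3: white-collar female, age 42, impairment 3, sleep deficit 1 h, "uncertain"
    "Case 3": predict([-0.54, 1.44, -0.37, 3 * 0.13, 1 * 0.32, 0.11],
                      [1.73, -0.26, -0.11, 3 * 0.13, 1 * (-0.09), 0.69]),
}

for name, (p_b, mean_b, marginal) in cases.items():
    print(f"{name}: P(susceptible) = {p_b:.2f}, "
          f"mean if susceptible = {mean_b:.1f}, marginal mean = {marginal:.1f}")
```

Any other covariate profile can be evaluated in the same way by listing the relevant terms of the two linear predictors from table 6.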
Adequacy of the ZINB model
In our application, the ZINB model proved to be a more suitable approach to analyse sickness absence data as compared with some popular but simpler models for discrete counts. It was certainly more appropriate than common procedures for continuous outcome variables, like normal-theory linear modelling or non-parametric testing. However, assuming complete immunity is obviously an oversimplification of having a very low propensity of being on sick leave. On the other hand, inspection of observed and expected frequencies (table 4) suggested that the probability distribution of the response variable may actually be composed of three components: one with nearly zero mean, another with low mean, and a third with high mean value for the number of days on sick leave. Applications of finite mixture models with a low and a high mean component in analogous contexts have been reported,33 34 but these were based on the simpler Poisson distribution for the separate components. Fitting complicated mixture models would also require tailoring of special computing solutions. In our case, it is difficult to say how essential the impact of the shortcomings in our model specification was. One likely consequence is that the reported standard errors and confidence intervals underestimate, to some extent, the uncertainty associated with the estimation of the quantities of interest. Nevertheless, we believe that allowance of a nearly zero mean and a high mean component in the ZINB model enabled us to capture essential features of the response distribution in order to obtain reasonably realistic estimates of the effects of relevant covariates and adequate predictions of the overall mean levels and variability of the number of days on sick leave.
REFERENCES
Footnotes

Ethics: The Helsinki University Research Ethics Board for the Occupational Health reviewed the study plan and gave their approval in advance. Record number (Dnro): 28/E2/04 (23 April 2004). All subjects received written information regarding the study according to the principles of the Declaration of Helsinki. Only subjects who gave their signed informed consent were included in the study. The consent letters are stored with other study material.

Competing interests: ST and JT are shareholders of and SJ employed by Evalua International. EL, AM, HS and TA have no competing interests to declare.

Funding: Finnish Funding Agency for Technology and Innovation (TEKES); The Finnish National Fund for Research and Development (SITRA); Pfizer Oy. The authors' work was independent of the funders.
Abbreviations:
AIC: Akaike information criterion
ZINB: zero-inflated negative binomial