Objectives: One of the challenges of conducting meta-analyses on the relationship between workplace mechanical exposures and low back pain is that mechanical exposures are reported in a wide variety of ways. We aimed to develop common metrics to apply in the translation of literature-based workplace mechanical exposures for use in meta-analyses, and to test the metrics’ measurement properties.
Methods: We developed a set of 7-point scales to capture the intensity of important aspects of mechanical exposures that may be related to the development of low back pain in workers. The scales represented three dimensions of mechanical exposures at work: (1) trunk posture, (2) weight lifted or force exerted and (3) spinal loading, and estimated both peak and cumulative loads. Measurement properties of the scales were tested through a survey of experts in biomechanics and ergonomics who were asked to rate literature-based workplace exposure definitions using the scales and provide estimates of their confidence in their ratings.
Results: For each dimension the ratings for peak loads tended to be higher than the cumulative load ratings. The inter-rater reliability for the scales ranged from 0.3 to 0.5; we would need to average the ratings of at least four expert raters to have an acceptable level of reliability (>0.7). Inter-expert reliability was positively related to the experts’ level of confidence in their ratings. In most cases the ranking of intensity ratings from the experts matched the ranking of exposure intensity from the original articles.
Conclusions: This study provides insight into estimating the intensity of literature-based mechanical exposure metrics using a common set of scales which can be applied across epidemiologic studies. These metrics may be useful to quantify the relationship between workplace mechanical exposure and low back pain in a systematic review and meta-analysis.
Physical factors at work and during leisure time have long been associated with low back pain.1 2 Although workplace mechanical exposure is recognised as important, its relative contribution to low back pain burden remains contested3 as back pain has multi-factorial causation.4 Clarifying the mechanical exposure–back pain relationship has been difficult for a number of reasons, including the complexity of the relationship. In a multi-stage process, factors in the external environment (outside the body), such as weights lifted, job tasks and job descriptions, are transformed into forces on the lumbar spine inside the body, with consequent responses of back tissues and the occurrence of back pain.5 6
Another reason for the lack of clarity on the relative contribution of mechanical exposure is the non-uniformity of methods used to measure it in occupational epidemiological studies. The indicators of exposure range from job title,7 8 through descriptions of work tasks9 and measures of posture and load10–12 to measures of muscle activation.13 14 A review of mechanical exposure methods by Burdorf15 revealed that job title was the most commonly used proxy for mechanical exposure. Although job title is readily available information, it has been shown to be only weakly correlated with actual mechanical exposure and thus can lead to misclassification.16 Systematic observation and direct measures, which are more costly and require greater effort by participants and researchers, had been used in less than 15% of the studies reviewed by Burdorf. The additional cost and effort of using the latter methods is, however, balanced by a demonstrated increase in validity and reliability in measuring mechanical exposure.17–19
One way to better understand a complicated relationship like that of mechanical exposure and low back pain is to undertake a systematic review and meta-analysis. The purpose of a meta-analysis is to combine estimates of effect from a homogeneous group of studies to produce a more precise and robust summary estimate. We have undertaken a consensus process to determine which low back pain outcome definitions can be combined in a meta-analysis.20 Variation in the way exposure is measured, however, still poses a challenge to applying meta-analytic methods in this area of research.
To explore whether similar mechanical exposures could be identified in the back pain literature, we abstracted detailed exposure information from a sample of 48 of the 218 studies identified in our systematic review.21 The 48 studies were those for which the authors provided individual-level data to be used in an individual participant data (IPD) meta-analysis. The most common exposure reported was occupation (n = 26, 54%). Qualitative descriptors, such as “heavy work”, were reported in 16 studies. A 15-point scale measuring perceived exertion (Borg scale) was used in five studies; however, each reported different cut-points. Lifting was reported in 21 different ways (eg, heavy, moderate, light; ⩽10 kg, 11–20 kg, 21–50 kg, >50 kg) with very little overlap in cut-points between studies. Lifting was analysed using an ordinal scale about two-thirds of the time and dichotomised about one-third of the time. Bending was the most common trunk posture reported. About two-thirds of the variables did not provide quantitative information on the degree of flexion. Of those reporting the degree of flexion, there was little uniformity of cut-points used in analysis. Gross postures such as sitting, standing and walking were reported in many studies. Of the seven studies reporting sitting, there was overlap in the cut-points used for two studies. There was a similar lack of uniformity in reporting walking, kneeling and standing.
The lack of uniformity in exposure measures and cut-points limits our ability to summarise these data comprehensively. One option to meet this challenge is to translate the diverse mechanical exposure measures into common metrics. Given the wide variety of methods used to capture mechanical exposure, we realised that conversion of the study-specific measures of exposure into a small set of common metrics may not be possible. Thus we undertook a study to develop common metrics for use in the translation of literature-based workplace mechanical exposures for the purpose of meta-analysis, and to test the metrics’ measurement properties. Our objectives were (1) to measure the inter-expert agreement on the translation of mechanical exposures to common metrics and (2) to better understand the type of information (eg, posture and weight) that experts consider when translating these exposures in the face of imperfect and heterogeneous exposure information.
The overall approach taken was to develop a set of scales to capture the important aspects of mechanical exposure that may be related to the development of low back pain in workers. These scales were then tested through a survey of experts in the fields of biomechanics and ergonomics who were asked to rate literature-based workplace exposure definitions using the scales. Ethical approval was granted by the McMaster University Research Ethics Board.
We selected three dimensions of low back mechanical exposure at work: (1) trunk posture, (2) weight lifted or force exerted and (3) spinal loading. In choosing these dimensions we reasoned that weight and posture are commonly measured exposures, especially in more recent studies. Spinal loading is an internal exposure, conceptually related to tissue load and a function of weight lifted and the posture adopted.
For each dimension we considered two load types: cumulative and peak. Cumulative (or average) loads are the result of the magnitude of the loads, the frequency of loads and the duration of those efforts.22 Peak loads capture another feature of the variation in load over time, the largest magnitude. Both “peak” and “cumulative” load measures have been shown to be independently associated with back pain.23
For each scale constructed, the value 1 represented the lowest intensity of exposure and 7 represented the highest intensity of exposure (fig 1). Verbal anchors were provided for the extreme values, for example, 1 = neutral posture and 7 = extreme posture, for peak and average posture. Use of these scales was piloted with a small group of biomechanists. We asked for feedback on clarity and comprehensiveness of the scales to refine them before use in our survey.
Selection and presentation of biomechanical exposures
Recognising the methodological challenge of combining studies that measured exposure differently, we undertook a preliminary non-systematic search as a substudy to determine the range of mechanical exposures described in the literature prior to our systematic review. We identified 55 studies from previous reviews,24–26 through citation searches, and from our personal files. From this pool of studies we wanted to exemplify the range of mechanical exposures found in the low back pain literature; we identified 32 specific exposure definitions from six studies.7–12
The exposure descriptions presented in the survey were taken verbatim from the original articles and will henceforth be referred to as “descriptive text”. We asked a group of experts (see participants below) to assign the 32 literature-based descriptive texts to the level of intensity of peak and cumulative loads for each of the three dimensions (posture, weight, spinal load). Ratings were on the 1–7 scale noted above. The experts were also asked to rate their confidence in their rating for each dimension using a 7-point scale with options: 1 = not at all confident to 7 = very confident. Thus for each descriptive text there were six intensity ratings and three confidence ratings. For each scale an option of “not applicable” was also provided. Figure 2 is an example of survey format for one descriptive text; the entire survey is included in appendix B. Since each article contributed more than one descriptive text (eg, “low” risk and “high” risk groups), the descriptive texts were presented to the raters in random order.
An email explaining the study and providing the survey and instructions was sent to a purposive sample of 16 academic ergonomic experts from Canada, the United States, the Netherlands, and Sweden. The sample of experts was identified by one author (RPW) who chose them based on their extensive knowledge and publication history in the area of mechanical exposures. Nine ergonomists responded to the survey for a response rate of 56%. At least one expert from each country completed the survey; none of the experts were directly involved in the studies included in the survey.
Our general approach to the analysis was to first calculate descriptive statistics. We summarised the mean, standard deviation and range of intensities and confidence ratings for each of the six exposure scales and three confidence ratings. We then did a series of regression analyses treating the 7-point scales as continuous outcomes, to assess the measurement properties of our scales. We undertook a generalisability analysis to examine inter-expert reliability. Mixed model regression analysis was used to better understand the association between posture and weight intensity ratings and spinal load. We used similar methods to examine the level of confidence in the ratings and the effect of confidence on inter-expert reliability. Finally we used descriptive methods to make a preliminary assessment of the validity of the experts’ ratings using our scales.
To examine inter-expert agreement we used generalisability theory,27 28 which allows for estimation of a variety of reliability coefficients simultaneously by partitioning out different sources of error. As a first step, an inter-rater reliability coefficient was estimated for each of the six intensity ratings. A linear regression model was then constructed using the six intensity ratings per descriptive text as repeated measures. The model included random factors for expert (ie, rater) and descriptive text, and fixed factors for exposure dimension (posture, weight, spinal load) and type of load (peak or cumulative). Variance components to calculate reliability coefficients were derived using the VARCOMP procedure in SAS.29 A generalisability coefficient was then constructed to quantify the relative variation among experts. For this analysis, the expert was considered the facet of generalisation, the descriptive text was the facet of differentiation, and dimension and load type were considered fixed facets. Given the limited information in some of the descriptive texts, for example, “warehouse worker”, our a priori hypothesis was that inter-expert reliability would be low (<0.5) and thus the average of a number of expert ratings would be required to provide a reliable rating of the intensity of mechanical exposure for use in a meta-analysis.
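As a hedged illustration of the coefficient described above, the generalisability coefficient for the mean of several expert ratings can be written in the usual form below. The notation is ours and does not come from the original analysis: \sigma^2_t denotes the variance among descriptive texts (the facet of differentiation), \sigma^2_\delta the error variance attributable to experts (the facet of generalisation), and n_e the number of experts whose ratings are averaged. The exact composition of \sigma^2_\delta depends on design decisions that are not detailed here, so this is a sketch under those assumptions.

```latex
E\rho^2(n_e) \;=\; \frac{\sigma^2_{t}}{\sigma^2_{t} + \dfrac{\sigma^2_{\delta}}{n_e}}
```

Setting n_e = 1 gives the single-rater coefficient; increasing n_e shrinks the error term and so raises the coefficient.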
Correlation between posture, weight and spinal load
One of our objectives was to better understand the information experts use to rate exposures, because spinal loading is an internal exposure, conceptually related to both the weight lifted and the posture adopted. We divided the descriptive texts into four types: (1) those providing only specific posture information (eg, >90° trunk flexion for 5–10% of working time), (2) those providing only specific weight information (eg, >15 lifts of ⩾25 kg per working day), (3) those providing specific information about posture and weight (eg, 20°–45° trunk flexion with a load weighing 10 lbs) and (4) those providing no specific information on posture or weight (eg, heavy physical work such as carpenters, bricklayers and heavy-industry workers with a minimum of 10 years’ experience). To study the relative contributions of posture and weight information to ratings for spinal load we conducted a mixed model regression analysis in which the spinal load intensity rating was the dependent variable and the weight and posture intensity ratings were the independent variables, along with factors for descriptive text and expert. We report regression coefficients and 95% confidence intervals for the posture and weight ratings. Separate analyses were run for each descriptive text type (and all combined) and for peak and cumulative load types.
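One way to write the regression described above is sketched below. The symbols are illustrative assumptions: S, W and P are the spinal load, weight and posture intensity ratings given by expert e to descriptive text t, and u_t and v_e are terms for descriptive text and expert. Treating these two factors as random intercepts is our assumption for the sketch; the description above does not state whether they were entered as fixed or random.

```latex
S_{te} \;=\; \beta_0 + b_{\mathrm{weight}}\, W_{te} + b_{\mathrm{posture}}\, P_{te} + u_t + v_e + \varepsilon_{te},
\qquad u_t \sim N(0, \sigma^2_u),\quad v_e \sim N(0, \sigma^2_v),\quad \varepsilon_{te} \sim N(0, \sigma^2_\varepsilon)
```

Under this form, b_weight and b_posture are the expected change in spinal load rating (in scale points) per one-point change in the weight or posture rating, holding the other constant.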
To examine the effect of confidence in expert ratings we undertook a mixed model regression analysis with confidence rating as the dependent variable and descriptive text, dimension, and load type (fixed effects) and expert (random effect) as independent variables. We calculated least square mean confidence ratings for each dimension and Bonferroni adjusted p values for pair-wise comparisons.
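The Bonferroni adjustment mentioned above can be sketched in a few lines of Python. The p values below are hypothetical placeholders chosen only to show the arithmetic; they are not results from this study, and the function name is ours.

```python
def bonferroni_adjust(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p values for the three pair-wise comparisons
# among dimensions (illustrative only, not study results).
raw = {
    "posture vs weight": 0.004,
    "posture vs spinal load": 0.20,
    "spinal load vs weight": 0.40,
}
adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))

if __name__ == "__main__":
    for comparison, p in adjusted.items():
        print(f"{comparison}: adjusted p = {p:.3f}")
```

With three comparisons each p value is multiplied by three, so a comparison remains significant at the 0.05 level only if its unadjusted p value is below about 0.017.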
We also examined the effect of confidence rating on inter-expert reliability. We hypothesised that lower confidence ratings would result in greater random error and lower reliability. The 32 descriptive texts were ranked by average confidence rating (higher values indicating more confidence in ratings). We repeated the inter-expert reliability analysis including only the 10 descriptive texts with the highest confidence rating and again using only the 10 descriptive texts with the lowest ratings.
In the absence of a gold standard for mechanical exposure, we used the ranking of descriptive texts from the original studies as an indicator of the validity of the expert ratings. Each study had between two and four descriptive texts representing the intensity of exposure from lowest to highest. For example, one study compared low back pain in “office workers”i (who were considered to have lower mechanical exposure) to that of “machine operators and longshoremen”ii (who were considered to have higher mechanical exposure). The descriptive texts were randomly ordered in the survey to reduce the likelihood that the experts would directly compare them within the same study when assigning their ratings. If the scales were valid, then one should see a gradient in the assigned intensity ratings that corresponds to the ranking of exposure intensity in the original articles. We present the mean intensity ratings ordered from lowest to highest ranking for each of the original articles.
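The gradient check described above can be expressed as a short Python sketch. The ratings in the example are invented for illustration and are not data from the survey; the function names are ours. The check asks only whether the mean expert rating rises strictly from the lowest-ranked to the highest-ranked descriptive text within one study.

```python
from statistics import mean


def mean_ratings_by_rank(ratings_by_rank):
    """Mean expert rating for each descriptive text, ordered low to high
    according to the ranking in the original article."""
    return [mean(ratings) for ratings in ratings_by_rank]


def has_expected_gradient(ratings_by_rank):
    """True if the mean ratings increase strictly with the original ranking."""
    means = mean_ratings_by_rank(ratings_by_rank)
    return all(low < high for low, high in zip(means, means[1:]))


# Hypothetical ratings from three experts for one study with three
# descriptive texts, ordered from lowest to highest original exposure.
example_study = [
    [2, 3, 2],  # lowest-exposure text
    [4, 3, 4],  # intermediate text
    [6, 5, 6],  # highest-exposure text
]

if __name__ == "__main__":
    print(mean_ratings_by_rank(example_study))
    print(has_expected_gradient(example_study))
```

A stricter or more tolerant criterion (for example, allowing ties) could be substituted; the choice of strict inequality here is an assumption of the sketch.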
The mean (SD) peak ratings tended to be higher than the mean cumulative ratings (posture: peak 5.4 (1.3), cumulative 3.3 (1.2); weight: peak 4.6 (1.4), total 3.1 (1.3); spinal load: peak 4.9 (1.3), cumulative 3.3 (1.3)). Some exposures were rated very low and others rated very high, that is, the full range of response options (1–7) was used for cumulative scales (average posture, weight and spinal load), while six out of seven response options (2–7) were used for the peak dimension scales.
The inter-expert reliability coefficient for each of the six intensity scales ranged from 0.30 to 0.52. Figure 3 depicts the generalisability coefficient for one rater and estimated coefficients for the average of 2–10 ratings when ratings on all six intensity scales were considered as repeated measures. Using an average of several raters reduces the component of variance associated with the rater and thus improves the reliability of the average rating. The reliability coefficients for a single rater, the average rating of four raters, and the average rating of six raters were 0.40, 0.73 and 0.80, respectively.
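The gain from averaging raters can be illustrated with a minimal Python sketch. It assumes that the coefficient for the mean of k ratings follows the Spearman-Brown form, which is what a generalisability coefficient reduces to when a single error term is divided by the number of raters; this is our simplification, not a reproduction of the SAS analysis. The function name is ours.

```python
def projected_reliability(single_rater: float, n_raters: int) -> float:
    """Reliability of the mean of n_raters ratings (Spearman-Brown form),
    given the reliability of a single rater."""
    if not 0.0 < single_rater < 1.0:
        raise ValueError("single_rater must lie strictly between 0 and 1")
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    return n_raters * single_rater / (1.0 + (n_raters - 1) * single_rater)


if __name__ == "__main__":
    # Single-rater coefficient of 0.40, as reported above.
    for k in (1, 4, 6):
        print(k, round(projected_reliability(0.40, k), 2))
```

Starting from the single-rater value of 0.40, this form gives 0.73 for four raters and 0.80 for six, matching the coefficients reported above.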
Correlation between posture, weight and spinal load
Results from the regression models predicting peak and cumulative spinal load ratings using posture and weight ratings are presented in table 1. To determine a spinal load rating, we hypothesised that the experts integrated information about both posture and weight. Results are presented for all 32 descriptive texts combined and separately by the type of descriptive text: posture only (n = 12), weight only (n = 3), posture and weight (n = 6) and neither posture nor weight (n = 11). Overall for both peak and cumulative loads, posture and weight were independently predictive of spinal load. Although both regression coefficients were statistically significant, the magnitude of the association was higher for weight when predicting peak spinal load (bweight = 0.51 points vs bposture = 0.33 points). When predicting cumulative spinal load the coefficients were nearly identical (bweight = 0.47 points vs bposture = 0.48 points). The relative magnitude of the associations differed somewhat depending on the type of descriptive text. As one would predict, posture ratings were more closely associated with spinal load rating when only posture exposure descriptive text was considered. Similarly, the weight rating was more highly associated with overall spinal load rating when only weight descriptive text was analysed. When the analysis was restricted to the six descriptive texts explicitly reporting posture and weight information, the peak posture rating was more strongly associated with peak spinal load rating. When considering the descriptive text in which neither posture nor weight information was explicit, both posture rating and weight rating were independently associated with spinal load rating.
The average (SD) confidence ratings were between 3.0 and 4.0, representing minimal to moderate confidence (posture: 3.9 (1.4); weight: 3.6 (1.4); spinal load: 3.7 (1.4)). The individual ratings ranged from 1 = not at all confident to 7 = very confident for all dimensions. On average, raters had more confidence in their posture ratings than in those for weight, with a similar trend for spinal load compared to weight, but the absolute differences in means were small. The average confidence rating for each scale depended on the type of descriptive text presented to the raters. After adjusting for multiple comparisons, the effect of scale was different when only posture information was presented; that is, the confidence rating for the posture scale (x̅ = 4.0) was greater than for the weight scale (x̅ = 3.2) or the spinal load scale (x̅ = 3.6). For the weight dimension, confidence was greater when posture and weight text (x̅ = 4.2) or neither posture nor weight text (x̅ = 3.9) were given than when posture information alone was given (x̅ = 3.2).
The 10 descriptive texts with the highest confidence rating had an average (SD) rating of 4.2 (1.6) compared to an average of 3.4 (1.5) for the 10 lowest rated descriptive texts. The difference in the means of approximately 0.8 was less than half of the difference between minimal and moderate confidence. The descriptive texts associated with higher confidence tended to be more elaborated and to be at the extremes of flexion or bending, although there were some exceptions: for example, “warehouse worker” was associated with high confidence, while “hospital worker” was associated with a lower level of confidence. The descriptive texts with higher confidence ratings yielded higher inter-rater reliability coefficients. The difference between reliability coefficients for high and low confidence descriptive texts ranged from 0.11 (0.45 vs 0.35) for one rater to 0.05 (0.89 vs. 0.84) for the average of 10 raters.
The average intensity ratings of spinal load within a study are presented in table 2. If the ratings are valid, one would expect a gradation in the magnitude of the ratings from low to high. This is the case for five of the six studies for both peak and cumulative spinal load. The trends are slightly less clear for the descriptive texts in which multiple options for trunk flexion were presented for differing percentages of work time, especially when the percentage of work time spent in a posture was ⩽10%.
Our evaluation of the measurement properties of a set of scales for rating workplace mechanical exposure descriptions across published aetiological studies of low back pain found acceptable inter-rater reliability when using the mean of several ratings from multiple experts. Taking the mean of several ratings reduces the variability associated with rater and thus improves the reliability of the rating. Overall, we found that we would need at least four expert raters to increase the reliability to over 0.7, and six raters to increase the reliability to over 0.8. Compared to the wide variation in inter-expert reliability for methods of exposure assessment in general,29 this level of reliability is quite good.
To our knowledge, only one other group has used expert consensus methods to rate occupational physical exposures. D’Souza et al30 developed expert ratings of physical activities for selected job categories using NHANES III data. They asked experts to estimate the proportion of time spent sitting, standing, walking/running, kneeling and working in cramped spaces for 40 job categories. Their maximum inter-expert agreement was for percentage of time sitting (weighted κ = 0.56, intraclass correlation = 0.80). Our reliability values were probably lower because D’Souza et al limited their study to NHANES III job category descriptors whereas we have included a broader range of mechanical exposure definitions. Both studies evaluated a heterogeneous group of jobs, but our study had additional variability due to the different type of information (eg, text descriptor, posture information, weight information) provided to the experts to characterise each job. More general scales, such as ours, would be required for use in a meta-analysis because one would typically need to translate a wide variety of mechanical exposure data.
In our data we found that inter-expert reliability was positively associated with confidence ratings and that confidence was associated with the type of information provided in the text describing the exposure. The descriptive texts associated with higher confidence tended to be more elaborated and to be at the extremes of flexion or bending. This finding is important for the use of common metrics in meta-analysis. Greater levels of measurement error tend to attenuate a relationship between exposure and outcome.31 The level of confidence in the experts’ ratings and type of exposure assessment are potentially important sources of heterogeneity to explore when interpreting the results of a meta-analysis.
We also found that both posture and weight information were independently predictive of spinal load ratings. This reflects the relationship found in engineering mechanics of low back loading being a function of both non-neutral postures and weight lifted (moment of force or torque). Their relative importance depended on the type of scale (peak or cumulative) and the type of specific descriptive text given (posture and not weight, both posture and weight, or neither). Overall, peak weight was more highly associated with peak spinal load, and average posture was more highly associated with cumulative spinal load.
Finally, the lack of a gold standard for occupational mechanical exposures meant that we were limited to comparisons with exposure metrics in the original studies as an indication of validity of our cross-cutting metrics. The results were promising, with similar rankings using the original or translated ratings, except for one study. There was no consistent gradient of exposure rankings for extent of trunk flexion when flexion was experienced for <10% of the work day. Perhaps raters felt that there was little difference in the spinal load across the gradient. Alternatively, our scales may not be sufficiently sensitive to distinguish between these more finely described exposure intensities. Further qualitative study of the rating process might improve our understanding of the reasons behind this.
The strengths of this study include the careful selection of descriptive texts on exposure to be included and their randomised placement in the survey. In choosing the descriptive texts for the survey, we attempted to present a broad range of exposure descriptors that are representative of the breadth of the work mechanical exposure literature. We were also able to recruit a multi-national group of experts to participate.
Our study also has some limitations. By translating all types of mechanical exposure into a relatively simple set of common metrics we lose the depth of information provided by studies using more complex methods of measurement like muscle activation. Furthermore, our participation rate was only 56% which may limit the generalisability of our findings to other expert groups. Having a smaller group of expert respondents also decreases the precision of our reliability estimates. On the other hand, achieving acceptable reliability with a smaller group improves feasibility for use in meta-analyses.
While there has been an ongoing call for the use of technical measures of mechanical exposures,32 33 the preponderance of even recent exposure measures reported21 are self-report or simple observations. Unfortunately these studies also use widely different measures and metrics. Until more consistent mechanical exposure measures are used in the study of low back pain, a method such as the one developed and reported here will need to be undertaken to create common mechanical exposure metrics for comparisons across studies. The results are promising, but further study is needed before these methods can be used generally by other groups. Further reliability studies are needed to determine factors that influence disagreement in classification, such as experience and training. Additional work is being done to examine the measurement properties of our scales when they are used by student raters who undergo a training session and a calibration exercise using the experts’ ratings before they rate mechanical exposures. This work will help us to better understand the robustness of our method to the use of less experienced raters.
This study provides insight into estimating the intensity of literature-based mechanical exposure metrics using a common set of scales which can be applied across epidemiological studies.
These metrics may be useful to quantify the relationship between workplace mechanical exposure and low back pain in a systematic review and meta-analysis.
Systematic reviews are increasingly being used to inform public policy. Our approach can be used to combine the evidence from several studies examining the relationship between workplace mechanical exposures and musculoskeletal outcomes to inform policy and potential interventions.
The authors would like to thank the members of the Meta-Analysis of Pain in the Lower Back and Work Exposures (MAPLE) Collaborative Group for their continued support of our project and the expert raters (Alex Burdorf, W Monroe Keyserling, Robert Norman, Patrick Neumann, Jim Potvin, David Rempel, Allard van der Beek, Judy Village and Richard Wells) for their invaluable contributions.
Alphabetical list of members of the Meta-Analysis of Pain in the Lower Back and Work Exposures (MAPLE) Collaborative Group
Yannis Alamanos, Elsa Bach, Stephen Bao, Margareta Barnekow-Bergkvist, Stan Bigos, Claire Bombardier, Paulien Bongers, Robert S Bridger, Alex Burdorf, Catherine Burke, Kim Burton, George Byrns, Fatih Cetisli, Chee Heng Leng, David Coggon, Patrick Dempsey, Inga-Lill Engkvist, Michael Feuerstein, John W Frank, Lytt Gardner, David Goldsheyder, Henrik Gonge, Lise Goulet, Mark Groom, Mats Hagberg, Heleen Hamberg, Jan Hartvigsen, Gudrun Hedberg, Antoine Helewa, Rudi Hiebert, Eva Horneij, Grant D Huang, Eva Jansson, Lone Donbæk Jensen, Wilfried Karmaus, Thomas J Keefe, Michael S Kerr, Nuray Kirdi, Nilufer Cetisli Korkmaz, Hiroshi Koyama, Niklas Krause, Thomas Laeubli, Douglas Landsittel, Ute Latza, Torsten Lauritzen, Annette Leclerc, Ling Lei, Paul Leigh, Leah Li, Youxin Liang, Gary Macfarlane, Akio Maeda, Jacques Malchaire, Isabella Märchy, Dominique Masset, Hisao Matsui, Irina Maul, John McBeth, Helena Miranda, Bente E Moen, Tone Morken, Nancy Nelson, Patrick Neumann, Robert W K Norman, Tetsuya Otani, Anna Ozguler, Karen Pachis, Keith Palmer, Jeffrey Peterson, Chris Power, Krishna Rampal, Glenn Reeder, Hilkka Riihimaki, Trond Riise, Michel Rossignol, Andreas Seidler, Julia Smedley, Hugh Smythe, Lorann Stallones, Larry W Stitt, George Stranjalis, Shosuke Suzuki, Tara Symonds, Michel Tousignant, Kiki Tsamandouraki, Eira Viikari-Juntura, Shira Schecter Weiner, Gustav Wickstrom, Christina Wiktorin, Huiyun Xiang, Lei Yang, Vera Yip, Evert Zinzen, Craig Zwerling.
Funding: This study was funded by the Canadian Institutes for Health Research (FRN-67042 and ICH-63069, part of the Interdisciplinary Capacity Enhancement (ICE) Teams Grant Program) with support from the Centre of Research Expertise for the Prevention of Musculoskeletal Disorders (CRE-MSD).
Competing interests: None declared.
i Office workers were non-executive office workers. About 40% of them performed clerical work with routine office tasks; others were professionals.
ii Machine operators comprised earthmover operators and longshoremen specialised in motorised stevedoring. Machine operators are exposed to low-frequency whole-body vibration and to static load due to prolonged sitting in a constrained posture and the handling of steering apparatus. Their work occasionally includes materials handling and maintenance of machines. The machine used most commonly by longshoremen is a forklift truck. Earthmover operators use heavier machinery, such as excavators, bulldozers, wheel-loaders, etc, in preparing the ground for buildings and in road construction.