Abstract
Experts disagree about the optimal classification of upper limb disorders (ULDs). To explore whether differences in associations with occupational risk factors offer a basis for choosing between case definitions in aetiological research and surveillance, we analysed previously published research. Eligible reports (those with estimates of relative risk (RR) for >1 case definition relative to identical exposures) were identified from systematic reviews of ULD and occupation and by hand-searching five peer-reviewed journals published between January 1990 and June 2010. We abstracted details by anatomical site of the case and exposure definitions employed and paired estimates of RR, for alternative case definitions with identical occupational exposures. Pairs of case definitions were typically nested, a stricter definition being a subset of a simpler version. Differences in RR between paired definitions were expressed as the ratio of RRs, using that for the simpler definition as the denominator. We found 21 reports, yielding 320 pairs of RRs (82, 75 and 163 respectively at the shoulder, elbow, and distal arm). Ratios of RRs were frequently ≤1 (46%), the median ratio overall and by anatomical site being close to unity. In only 2% of comparisons did ratios reach ≥4. We conclude that complex ULD case definitions (e.g. involving physical signs, more specific symptom patterns, and investigations) yield similar associations with occupational risk factors to those using simpler definitions. Thus, in population-based aetiological research and surveillance, simple case definitions should normally suffice. Data on risk factors can justifiably be pooled in meta-analyses, despite differences in case definition.
- Musculoskeletal
- occupational health practice
- epidemiology
- back disorders
- epidemiology
- risk assessment
- mortality studies
- longitudinal studies
What this paper adds
There is widespread disagreement between experts on the optimal classification of upper limb disorders and the best case definitions.
Complex case definitions yield similar associations with occupational risk factors to simpler definitions.
This provides encouragement to adopt simpler case definitions in population-based aetiological research and surveillance, and to pool data in meta-analyses despite differences in case definition.
Introduction
Musculoskeletal disorders of the upper limb (ULDs) are important causes of morbidity and sickness absence, resulting, for example, in an estimated loss of 3.75 million working days per year in the UK in 2008/9.1 However, their optimal classification remains controversial.2–5 Difficulty arises because of the multiplicity of disorders, disease labels and diagnostic criteria adopted by therapists and researchers, ambiguity in the coverage and boundaries of case definitions, and the lack of a suitable reference standard.6
This want of an agreed diagnostic classification has hindered the pooling and interpretation of research data, and also attempts at standardised occupational surveillance. Additionally, it limits the scope for compensation.
Editorials and commentaries have highlighted the impasse and called on experts to agree consensual case definitions,7 8 and several classificatory schemes have been proposed based upon this approach.9–15 However, in a systematic review by Van Eerd et al, no two of 27 schemes identified were identical,4 underscoring the scale of continuing disagreement.
There has been only limited debate about the rationale for preferring one scheme to another. However, a ‘diagnosis’ can be thought of as a means to an end, possible goals being to identify modifiable risk factors as an aid to prevention, and to improve the clinical management of patients. According to this utilitarian logic, good case definitions will be those which ‘add value’ by distinguishing categories of disorder that differ importantly in their associations with potential causes and/or in their prognosis and responses to different treatments. It will be worth distinguishing subgroups (splitting rather than lumping) only where such differences are demonstrable and materially influence follow-on actions.16
Optimal case definitions for prevention may not necessarily be identical to those for clinical care. To assess the relative merits of competing case definitions for ULDs in aetiological research, prevention and surveillance, we compared their utility by analysing results from previously published research.
Methods
Data sources
We sought peer-reviewed reports in which more than one case definition had been analysed against occupational risk factors that were defined identically.
Several sources were screened to find papers with these characteristics:
When searching the Medline and Embase electronic bibliographic databases we combined terms for ULD with those for relevant occupational physical risk factors (eg, repeated movements of the wrists/fingers, repeated bending and straightening of the elbow, repeated movements of the shoulder, work with the hands above shoulder height, lifting, use of a computer keyboard or mouse, vibration, cumulative trauma), and retrieved all systematic reviews and meta-analyses published since 1990 that met these search criteria (21 in all, listed in the online supplement). All primary research reports cited in these reviews were itemised, and cross-tabulated to eliminate duplicates.
Also, in case this strategy lacked sensitivity to detect reports containing more than one ULD case definition, we supplemented our electronic search with hand searches of five relevant and accessible occupational journals (Occupational and Environmental Medicine, Scandinavian Journal of Work, Environment and Health, American Journal of Industrial Medicine, International Archives of Occupational and Environmental Health and Occupational Medicine) published between January 1990 and June 2010. This search provided three potentially relevant additional primary reports for consideration.
We then excluded papers which were: (1) not in English; (2) published before 1980; (3) did not allow upper limb problems to be analysed separately from neck problems; (4) did not distinguish upper limb problems by anatomical site (eg, elbow as opposed to arm); (5) did not include at least two explicit case definitions; or (6) did not quantify risk estimates for one or more exposures in common against these alternative case definitions.
Figure 1 in the online supplement details the number of reports screened, retrieved, excluded and analysed in PRISMA17 format. Altogether, 162 reports were screened for eligibility (86 of these in full). Finally, 21 reports (from 16 studies) were included in data synthesis. The main reasons for exclusion were irrelevant hits (n=76), failure to analyse by anatomical site (n=29), absence of two case definitions (n=15) and insufficient quantification of risks (n=11).
From each eligible report we abstracted the main study characteristics (authors, date, setting, study design, study populations and numbers) and, separately for each anatomical site (shoulder, elbow, forearm, wrist/hand), the occupational exposure definitions and case definitions employed. We also abstracted or derived paired estimates of RR (sometimes expressed as an OR or prevalence ratio) with 95% CI, for alternative case definitions with the same occupational exposures. Data abstraction was piloted and then undertaken by two of us (ECH and CL) and checked in full by KTP.
Various categories of comparison were identifiable, most of which had a natural hierarchy. Distinguishable were definitions based on: (1) symptoms versus symptoms plus signs (eg, elbow pain vs elbow pain with tenderness on palpation); (2) symptoms defined generally versus symptoms defined specifically (eg, tingling/numbness in the hand vs nocturnal tingling/numbness in the median nerve distribution); and (3) symptoms/signs versus symptoms/signs plus a positive investigation (eg, Tinel's sign positive vs this and delayed median nerve conduction). Additionally, (4) one report allowed a comparison with no implicit hierarchy (one examination finding vs another).
Differences in association between pairs of case definitions were expressed as a ratio of RRs, based, for the hierarchical groupings, on the stricter definition (C2) as numerator divided by the simpler one (C1) as denominator, and for the sole non-hierarchical pairing, on an arbitrary choice of denominator.
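The ratio of RRs described above can be sketched in a few lines. This is a minimal illustration only: the function name `ratio_of_rrs` and the example values are hypothetical and are not taken from any of the reviewed reports.

```python
def ratio_of_rrs(rr_strict: float, rr_simple: float) -> float:
    """Ratio of relative risks for one exposure and one pair of case definitions.

    rr_strict: RR for the stricter definition (C2), used as numerator.
    rr_simple: RR for the simpler definition (C1), used as denominator.
    """
    if rr_strict <= 0 or rr_simple <= 0:
        raise ValueError("relative risks must be positive")
    return rr_strict / rr_simple


# Hypothetical example values, chosen only for illustration.
example_ratio = ratio_of_rrs(rr_strict=2.0, rr_simple=1.6)
print(round(example_ratio, 2))
```

A ratio above 1 means the stricter definition showed the stronger association with the exposure; a ratio at or below 1 means it did not.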
As hierarchical comparisons tended to be nested (ie, subjects fulfilling the stricter case definition, C2, were subsets of those fulfilling the simpler definition, C1, rather than statistically independent samples) or the degree of overlap was uncertain from published reports, SEs and CIs for their ratios could not be derived by standard statistical methods. Instead, the ratios were categorised into bands by magnitude and summarised by their median and IQR values. To gauge crudely the potential contribution of chance to any differences in RR, a comparison was also made between the lower 95% confidence limit of that for the stricter definition and the central estimate of that for the simpler definition. Separate summary statistics were compiled by anatomical site and for each category of comparison.
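The descriptive summary outlined above might be sketched as follows. The band boundaries are an assumption based on the thresholds mentioned in the text (≤1, ≥2 and ≥4); the exact bands used in the tables may differ. The quartile method is also an assumption, and all function names and example values are hypothetical.

```python
from statistics import median, quantiles


def band(ratio: float) -> str:
    """Assign a ratio of RRs to a magnitude band (assumed boundaries)."""
    if ratio <= 1:
        return "<=1"
    if ratio < 2:
        return ">1 to <2"
    if ratio < 4:
        return "2 to <4"
    return ">=4"


def summarise(ratios: list[float]) -> dict:
    """Median and interquartile range of a set of ratios."""
    q1, _, q3 = quantiles(ratios, n=4, method="inclusive")
    return {"median": median(ratios), "iqr": (q1, q3)}


def strict_lower_limit_exceeds_simple(lower_cl_strict: float, rr_simple: float) -> bool:
    """Crude check on chance: does the lower 95% confidence limit for the
    stricter definition exceed the central estimate for the simpler one?"""
    return lower_cl_strict > rr_simple


# Hypothetical ratios, for illustration only.
ratios = [0.8, 1.0, 1.2, 2.5, 4.0]
print([band(r) for r in ratios])
print(summarise(ratios))
```

Summarising by band, median and IQR avoids any claim about the sampling variance of each ratio, which is the quantity that nesting makes hard to estimate.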
A limitation of this method is that the reference groups for the RRs under comparison were not always identical. For example, one RR might be for C1 versus ‘not C1’ and the other for C2 versus ‘not C2’, rather than for C2 versus ‘not C1’. Each report was assessed for this potential bias and the size of bias (% by which the ratio changed when analysis was restricted to ‘not C1’ as compared with ‘not C2’) was estimated for all reports where it was known or calculable from the data supplied. The distribution (median and IQR) of estimated biases was then summarised.
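One way to picture this reference-group bias is with counts from a single exposed and a single unexposed group. The sketch below makes several assumptions that are not stated in the text: risks are expressed as odds ratios, C2 cases are a strict subset of C1 cases, and the percentage is taken as the change in the ratio when the C2 comparison uses 'not C1' rather than 'not C2' as its reference group. The function names and counts are hypothetical.

```python
def odds_ratio(cases_exp, noncases_exp, cases_unexp, noncases_unexp):
    """Odds ratio from case and reference-group counts by exposure."""
    return (cases_exp / noncases_exp) / (cases_unexp / noncases_unexp)


def reference_group_bias_pct(n_exp, c1_exp, c2_exp, n_unexp, c1_unexp, c2_unexp):
    """Percentage change in the ratio of ORs when the stricter definition (C2)
    is compared with 'not C1' instead of 'not C2'.

    n_*: total subjects; c1_*: cases meeting C1; c2_*: cases meeting C2
    (assumed nested within C1).
    """
    or_c1 = odds_ratio(c1_exp, n_exp - c1_exp, c1_unexp, n_unexp - c1_unexp)
    # Reference group 'not C2' still contains subjects who meet C1 only.
    or_c2_vs_not_c2 = odds_ratio(c2_exp, n_exp - c2_exp, c2_unexp, n_unexp - c2_unexp)
    # Reference group 'not C1' is shared with the simpler definition.
    or_c2_vs_not_c1 = odds_ratio(c2_exp, n_exp - c1_exp, c2_unexp, n_unexp - c1_unexp)
    ratio_not_c2 = or_c2_vs_not_c2 / or_c1
    ratio_not_c1 = or_c2_vs_not_c1 / or_c1
    return 100.0 * (ratio_not_c1 - ratio_not_c2) / ratio_not_c2


# Hypothetical counts, for illustration only.
print(round(reference_group_bias_pct(100, 20, 10, 100, 10, 5), 1))
```

When every C1 case also meets C2, the two reference groups coincide and the function returns zero, matching the situation in which the bias is absent.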
Results
The 16 studies (21 reports)10 18–37 yielded by our search provided a total of 320 paired comparisons of RR. The main characteristics of the reviewed studies are listed in table 1. Reports came mostly from northern Europe and the workplace setting; 13 of the 16 studies were cross-sectional in design. Sample sizes ranged from 96 to 6943, and in six of the studies (11 of the 21 reports), they exceeded 1000.
Tables 2–4 provide, separately for the shoulder, elbow and distal upper limb, details of the paired case definitions employed in reports, the occupational exposures analysed, the number of comparisons of RRs provided by each report, and the frequency distribution of the ratios of RRs.
Taking, as an example, the findings at the shoulder by Brandt,22 a comparison was drawn between a simple symptom definition (at least moderate shoulder pain over the past 7 days) and a stricter definition involving these symptoms together with substantial tenderness on palpation in various pre-specified anatomical locations. RRs for each case definition were obtainable for use of a computer mouse and use of a computer keyboard—two pairs of comparisons, the ratio of RRs (stricter vs simpler definition) being in each case ≤1; in neither did the lower 95% confidence limit for the stricter definition exceed the central estimate of the simpler definition.
On the same basis, 82 paired comparisons were found at the shoulder, 75 at the elbow and 163 in the distal arm, with some reports contributing in excess of 40 comparisons. However, ratios of RRs were seldom as high as 4 (2% of all comparisons) and often ≤1 (46%), the median ratio overall and by anatomical site being close to unity (table 5). In only 5% of comparisons were RRs sufficiently divergent for the lower 95% confidence limit of the stricter case definition to exceed the central estimate of that for the simpler one. Different patterns of hierarchical comparison (whether involving a contrast of symptoms, or symptoms with symptoms and signs, or additional investigation) yielded similar estimates of the ratio, as did comparisons at each anatomical site, although ratios above 2 were somewhat more common at the shoulder.
In a sensitivity analysis, we excluded data from the influential NUDATA and Finland Health 2000 studies,20–23 36 the linked reports of which contributed all but 116 of the 320 comparisons. Among the remainder, ratios of ≥2 and ≥4 were relatively less frequent, but the median and IQR values were scarcely different.
Finally, online supplementary table 6 presents estimates of potential bias arising from the use of different reference groups in comparisons of RRs. In nine studies (11 reports), it was either absent or calculable from the published data, in three it was present but not calculable, and in five studies (seven reports) the potential for bias was neither clear nor calculable. Eventually, estimates of potential bias were made for 45 comparisons. Effects were small, the median bias being 0% (IQR 0% to 1.7%, range −4.9% to 14.6%), and in 87% of comparisons it was <5%.
Discussion
This analysis indicates that hierarchical case definitions of increasing sophistication, involving confirmatory physical signs, more specific symptom patterns or additional investigations, yield remarkably similar associations with putative occupational risk factors to those obtained using simpler case definitions.
Our analysis has certain limitations. In particular, the search strategy is unlikely to have discovered every occupational report over the 2 decades of inquiry that involved multiple case definitions. Specifying sensitive electronic search terms for this topic is challenging. To make the task manageable, we focused on systematic reviews of occupational risk factors at each anatomical site as our main source of references to primary research reports, but to improve the detection rate we also hand-searched five leading peer-reviewed occupational journals. In the event, hand-searching identified only three potentially eligible papers missed by the database screen, lending some validity to our search. More importantly, papers were selected blind to information on the ratios of interest, and we have no reason to suppose that the reports retrieved were unrepresentative of the universe of relevant studies in the peer-reviewed literature.
A second area of difficulty lies in evaluating the role of chance in the ratio measures obtained. Scope to estimate CIs for the ratios was limited by the nested nature of many observations and the indeterminate extent of overlap in others. However, the infrequency of ratios of ≥2 across all 320 comparisons, and of lower confidence limits of RRs for stricter definitions exceeding the central estimates for simpler definitions, argues against important differences being overlooked by chance. Additionally, findings were insensitive to the exclusion of two large influential studies that contributed more than half the observations, suggesting that possible non-independence of data points from the same study was not a cause of material bias. A further sensitivity analysis suggested that the potential for bias arising from differences in the reference group for paired risk estimates was inconsequential.
A lack of detectable difference between case definitions (ratio close to unity) could arise if the occupational exposures studied were not risk factors for symptoms or disorders at the sites in question, or were defined with substantial measurement error, such as to appear so. However, many of the exposures evaluated were well established or plausible risk factors, and RRs exceeded 1.0 for 81% of the 640 estimates of RR identified during the review. True differences might also be masked by confounding. However, estimates were adjusted in the same statistical models and any superiority in favour of the stricter definition could only be masked by systematic negative confounding relative to the simpler definition. Differences in approach to exposure assessment between studies are unlikely to have biased findings, given the focus on ratios of RRs derived from within-study comparisons in which exposures were defined identically.
Alternatively, and more plausibly, little ‘added value’ is created by case definitions of increasing sophistication relative to simple ones in population-level studies, and if so a good case exists for simplification. In planning field studies of ULDs and occupational risk factors, or employers' surveys of hazard and risk, resource expended on additional physical examinations or investigations will generally yield marginal or no benefits, whereas costs will be predictably higher, with the added problems of inconvenience and non-compliance. Similarly, in health surveillance, a complex case definition will be more difficult to implement uniformly across settings and over time, with no clear offsetting benefits.
Our findings do not identify a preferred set of case definitions for ULDs in the context of prevention and surveillance at the population level. Rather, they suggest that variations make little difference. Thus, the preferred starting point for aetiological investigation will normally be a broad case definition that is simpler to apply and may produce more cases and therefore greater statistical power. This does not preclude additional exploratory analyses on subsets of cases defined according to more stringent diagnostic criteria if researchers wish to seek evidence of differential associations with risk factors. It may also be more efficient to use stricter criteria when cases have already been defined in this way (eg, if cases of carpal tunnel syndrome are readily identifiable from the records of a neurophysiology department, all of whom meet certain criteria following nerve conduction testing). In most situations, however, simple case definitions will suffice. Similarly, in surveillance, simple choices which are easier to implement will usually be preferable.
Finally, in appraising the research literature, our findings imply that heterogeneity of case mix and variations in approach to case classification have less impact than might be supposed. This gives a justification in systematic reviews and meta-analyses for pooling data on associations with risk factors at a given anatomical site between studies, even though such studies may have differences of case definition.
More generally, the utilitarian framework offers an empirical foundation for moving towards a simpler, more rational basis on which to classify ULDs for preventive purposes.
Acknowledgments
A full technical report, of which this abridged summary forms a part, was kindly commented on by the following experts: Dwayne Van Eerd, Dorcas Beaton, Sigurd Mikkelsen, Johan Hviid Andersen, Alexis Descatha, Eira Viikari-Juntura, Alex Burdorf, Jolanda Luime, Mats Hagberg, David Rempel and Barbara Silverstein. Sue Curtis helped prepare this manuscript; Clive Osmond, Hazel Inskip and Georgia Ntani offered comments on its statistical aspects.
Footnotes
Funding This study was supported by a grant from the UK Health and Safety Executive with the aim of improving consensus over case definitions for upper limb disorders (grant number OH1939).
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The data on which this report is based are all in the public domain; sources are identified in reference lists in the main text and in the online supplement.