

Occupational Asthma guidelines: a systematic quality appraisal using the AGREE II instrument
  1. Theodore Lytras1,
  2. Stefanos Bonovas2,3,
  3. Christos Chronis4,
  4. Athanasios K Konstantinidis4,
  5. Frixos Kopsachilis1,
  6. Dimitrios P Papamichail5,
  7. George Dounias1
  1. 1Department of Occupational and Industrial Hygiene, National School of Public Health, Athens, Greece
  2. 2Department of Epidemiology, ‘Mario Negri’ Institute for Pharmacological Research—IRCCS, Milan, Italy
  3. 3Department of Pharmacology, School of Medicine, University of Athens, Athens, Greece
  4. 4Department of Pulmonary Medicine, University General Hospital of Ioannina, Ioannina, Greece
  5. 5Department of Child Health, National School of Public Health, Athens, Greece
  1. Correspondence to Dr Theodore Lytras, National School of Public Health, Athens 11521, Greece; thlytras{at}


The quality of guidelines is often modest and highly variable.

We searched the Medline database for occupational asthma (OA) guidelines meeting our inclusion criteria and undertook a systematic appraisal of them. Six appraisers independently evaluated these guidelines using the AGREE II (Appraisal of Guidelines, Research and Evaluation II) instrument. Standardised scores for each domain and for overall quality were calculated, as well as intraclass correlation coefficients to assess agreement among appraisers.

Seven relevant guidelines were identified. Three were based on a systematic review of the evidence. Most guidelines scored high on the domains ‘Scope and purpose’ and ‘Clarity and presentation’, but scores on the other domains were variable. The lowest scores were for ‘Applicability’, suggesting that guideline developers did not pay sufficient attention to practical problems affecting the implementation of their recommendations. We also observed a trend toward improved scores in guidelines published after 2000. Inter-rater agreement was good for most domains, and particularly for ‘Rigour of development’. This domain was most strongly correlated with the overall assessment scores, together with ‘Scope and purpose’ and ‘Editorial independence’.

The quality of OA guidelines is variable, both within and across guidelines. There is significant room for improvement, and greater efforts to produce high-quality guidelines are warranted, in order to assist clinical decision-making.



Asthma is a chronic inflammatory disease of the airways, affecting both children and adults, and is marked by repeated attacks of dyspnoea and wheezing.1 It is estimated that about 300 million people worldwide have asthma, and the burden is increasing as communities become more urbanised.2 Asthma causes significant socioeconomic costs, both direct due to increased usage of healthcare services, and indirect due to increased absenteeism from work or school.3

Occupational asthma (OA) is defined as asthma ‘due to causes and conditions attributable to a particular occupational environment and not to stimuli encountered outside the workplace’.4 It accounts for a significant percentage of the total incidence of asthma, ranging between 10% and 25%; this is equivalent to the onset of 250–300 new cases per million population per year.5 In industrialised countries OA is the most commonly reported occupational lung disease,6 is a cause of significant morbidity and results in considerable cost to society and the individual.7 The prognosis of OA is often poor and its socioeconomic consequences severe, with only one-third of workers achieving symptomatic recovery8 and one-third becoming unemployed after diagnosis.9 In addition, OA is often diagnosed late,10 and there are also important special problems with OA prevention, management and surveillance, involving both the patient and the workplace.

As a result, it is imperative for decisions about prevention, diagnosis and management of OA to be based on a solid scientific background. Guidelines are a valuable tool to help clinicians translate best research evidence into best practice,11 being increasingly used to improve quality of care and patient outcomes.12 However, their quality is often modest, with significant heterogeneity in their objectives, methods and applicability.13 Thus, there is a clear need for common, accepted criteria for evaluating guidelines, in order to improve their quality and better inform their users.

To address this need, the AGREE Collaboration (Appraisal of Guidelines, Research and Evaluation) was established as an international group of guideline researchers. The product of this collaboration was the original AGREE instrument,14 which has been further refined and is now in its second version (AGREE II).15 AGREE II is a generic tool, which can be applied to the appraisal of guidelines for any disease, targeting any step in the healthcare continuum.16 It consists of 23 items (see online supplementary table 1) grouped into six domains (Scope and purpose, Stakeholder involvement, Rigour of development, Clarity and presentation, Applicability, and Editorial independence), plus two overall assessment items (overall quality of the guideline, and whether the guideline would be recommended for use, recommended with modifications or not recommended for use). Each item, except the last one, is scored on a seven-point Likert scale. For each guideline appraisal, at least two and preferably four raters are recommended.16 The raters’ scores for each domain are summed and expressed as a percentage of the maximum possible score. Both the original AGREE instrument and AGREE II have been validated and extensively applied in numerous guideline appraisals for many diseases, and are considered reliable and useful tools.17 ,18
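The domain score calculation described above can be sketched in a few lines of Python. As we understand the AGREE II user manual, the scaled score is computed relative to both the minimum and the maximum possible score, so that a domain rated 1 on every item by every appraiser scores 0%. The function name and the ratings below are illustrative assumptions, not data from this study.

```python
def scaled_domain_score(ratings):
    """Return the scaled AGREE II domain score as a percentage.

    ratings: one list per appraiser, each holding that appraiser's
    item scores (1-7) for the items of a single domain.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must score the same items")
    if any(not 1 <= s <= 7 for r in ratings for s in r):
        raise ValueError("item scores must lie on the 1-7 scale")

    obtained = sum(sum(r) for r in ratings)
    minimum = 1 * n_items * n_appraisers   # everyone scores 1 on every item
    maximum = 7 * n_items * n_appraisers   # everyone scores 7 on every item
    return 100.0 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings: four appraisers, three items in the domain.
example = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
score = scaled_domain_score(example)   # (53 - 12) / (84 - 12), about 57%
```

Scaling against the attainable range rather than the raw maximum keeps domains with different numbers of items comparable.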

A critical appraisal of asthma guidelines using AGREE II has been recently published,19 but to our knowledge there has not been a similar appraisal of guidelines for OA. Therefore our primary objective was to review published OA guidelines and assess their quality in detail using AGREE II. This was done as part of an initiative by the National School of Public Health to formulate local guidelines and promote awareness of OA among Greek physicians; there is evidence that OA and other occupational diseases are severely under-reported in Greece,20 ,21 which may mask both underdiagnosis and poor management. Such concerns may also exist in other countries, and given the importance of OA as a public health problem, we felt that an appraisal of OA guidelines would be of wider interest, for both practising physicians and policymakers. A secondary objective of our study was to add to the existing data on the reliability and validity of AGREE II, particularly for occupational health guideline appraisal.22 ,23


A literature search was undertaken to identify published guidelines focusing on OA. We searched Medline for papers published until March 2013, using the following query: ‘Asthma’[MeSH] AND (‘Occupational Exposure’[MAJR] OR ‘Occupational Health’[MeSH Terms] OR ‘Asthma, Occupational’[Mesh] OR ‘Occupational’[Title/Abstract]) AND (‘Guideline’[Publication Type] OR ‘Practice Guideline’[Publication Type] OR (‘Review’[Publication Type] AND (‘guideline’[Title/Abstract] OR ‘guidelines’[Title/Abstract] OR ‘recommendation’[Title/Abstract] OR ‘recommendations’[Title/Abstract] OR ‘consensus’[Title/Abstract]))). In order to distinguish between guideline and non-guideline documents, we used the definition by Field and Lohr: ‘Clinical practice guidelines are systematically developed statements to assist practitioner and patient decisions about appropriate healthcare for specific clinical circumstances’.24 We did not discriminate between guidelines and consensus statements or position papers, and we did not judge papers for inclusion by their use of evidence-based methods; as long as the main aim of a paper was to make recommendations developed in a systematic manner, we considered it to be a guideline.
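For readers who wish to rerun the Medline query above, the sketch below assembles it from its three parts and builds a request URL for the NCBI E-utilities `esearch` endpoint. It only constructs the URL and makes no network call. The helper names are assumptions made for illustration, the March 2013 date limit is not encoded, and the search in the study itself may well have been run through the PubMed web interface.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def any_of(terms):
    """Join search terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query():
    """Assemble the query: asthma AND occupational AND guideline-like."""
    occupational = any_of([
        '"Occupational Exposure"[MAJR]',
        '"Occupational Health"[MeSH Terms]',
        '"Asthma, Occupational"[Mesh]',
        '"Occupational"[Title/Abstract]',
    ])
    guideline_words = any_of(
        f'"{w}"[Title/Abstract]'
        for w in ("guideline", "guidelines", "recommendation",
                  "recommendations", "consensus")
    )
    publication = any_of([
        '"Guideline"[Publication Type]',
        '"Practice Guideline"[Publication Type]',
        f'("Review"[Publication Type] AND {guideline_words})',
    ])
    return f'"Asthma"[MeSH] AND {occupational} AND {publication}'


def esearch_url(term, retmax=200):
    """Build (but do not send) an esearch request against PubMed."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


url = esearch_url(build_query())
```

Building the query from parts makes unbalanced brackets, an easy mistake in long boolean searches, much less likely.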

We excluded papers in languages other than English, and general asthma guidelines or guidelines not focusing (entirely or in a substantial part) on aspects of diagnosis and/or management of OA. Publication date was not a criterion for selection, but we did exclude older guidelines if an updated version had been published. In addition, we performed a supplementary search of the National Guideline Clearinghouse and of the reference lists of all relevant articles, in order to identify additional pertinent guideline citations. For every guideline ultimately included in the appraisal, we thoroughly searched for any accompanying technical and supporting documents in order to better inform our assessments.

Each guideline was independently assessed by six appraisers (TL, SB, CC, AKK, FK, DPP) using AGREE II. The team drew on people with different backgrounds, and included two occupational physicians (TL, FK), two respiratory physicians (CC, AKK) and two public health epidemiologists (SB, DPP). The appraisers did not have previous experience of using the AGREE II instrument. All appraisers first read the AGREE II manual and watched the online overview tutorial; one (TL) also undertook the online AGREE II practice exercise. All were free to discuss the appraisal process or the guidelines’ content, but were instructed not to reveal their scores to each other.

For every guideline we calculated the standardised domain and overall quality scores. We hypothesised that guideline quality would improve over time,25 and to test this hypothesis we compared domain and overall quality scores of guidelines published before and after 2000, using the non-parametric two-sample Wilcoxon rank-sum (Mann–Whitney) test.
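With only seven guidelines, the rank-sum comparison can be carried out exactly by enumerating every possible assignment of ranks to the two groups. The Python sketch below does this; it is an illustration of the test, not the R code used in the study, and the scores and the three-versus-four split are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j share this rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def exact_rank_sum_p(group_a, group_b):
    """Two-sided exact p value for the Wilcoxon rank-sum test."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(ranks) + 1) / 2    # mean rank sum under the null
    deviation = abs(observed - expected)
    # Every way of picking n_a of the pooled ranks is equally likely under the null.
    sums = [sum(c) for c in combinations(ranks, n_a)]
    extreme = sum(1 for s in sums if abs(s - expected) >= deviation - 1e-9)
    return extreme / len(sums)


# Hypothetical domain scores with complete separation between the groups.
before_2000 = [30, 41, 45]
after_2000 = [58, 66, 72, 80]
p = exact_rank_sum_p(before_2000, after_2000)   # 2/35, about 0.057
```

Under these assumptions even a complete separation of three older from four newer guidelines gives a two-sided p value of 2/35, so a comparison of this size cannot reach the conventional 0.05 threshold.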

To examine the performance of AGREE II in our use case, we performed several analyses. As a measure of inter-rater reliability, we calculated single-rater two-way intraclass correlation coefficients (ICCs)26 for each domain across all guidelines. We classified the degree of agreement using the scale proposed by Altman: agreement for ICC<0.20, poor; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, good; 0.81–1.00, very good.27 To measure the internal consistency of each domain, we calculated the Cronbach α coefficient using the mean item scores per domain. An α value >0.80 was considered to indicate a good degree of consistency. In addition, as an indication of criterion validity we calculated Kendall's τ B rank correlation coefficients between the appraisers’ scores for each domain and for the overall guideline quality, in order to identify intercorrelations.
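The single-rater two-way ICC can be derived from the mean squares of a two-way layout with guidelines as rows and appraisers as columns. The Python sketch below computes both the agreement and the consistency form, since the text does not state which was used, and maps the result onto the Altman scale quoted above. It is an illustrative reimplementation with assumed function names, not the `irr` code used in the analysis.

```python
def two_way_mean_squares(scores):
    """Return (MS rows, MS columns, MS error) for a complete n x k table."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_error / ((n - 1) * (k - 1))


def icc_single(scores, agreement=True):
    """Single-rater two-way ICC: absolute agreement or consistency."""
    n, k = len(scores), len(scores[0])
    ms_rows, ms_cols, ms_error = two_way_mean_squares(scores)
    denominator = ms_rows + (k - 1) * ms_error
    if agreement:
        # Systematic differences between raters count against agreement.
        denominator += k * (ms_cols - ms_error) / n
    return (ms_rows - ms_error) / denominator


def altman_label(icc):
    """Classify an ICC using the cut-offs quoted in the text."""
    for upper, label in ((0.20, "poor"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "good")):
        if icc <= upper:
            return label
    return "very good"


# Hypothetical scores: rater 2 is always one point above rater 1.
table = [[1, 2], [2, 3], [3, 4]]
consistency = icc_single(table, agreement=False)   # 1.0: same ordering
agreement = icc_single(table, agreement=True)      # 2/3: penalised offset
```

The example shows why the choice matters: a rater who is uniformly more generous leaves consistency untouched but lowers absolute agreement.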

This study was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.28 The PRISMA statement offers guidance to ensure a clear presentation of what was planned, done and found in a systematic review, and facilitates clear reporting of all key information. The software used for the analysis was R V.2.15.2,29 using packages ‘irr’ and ‘psy’.


Our Medline search identified 61 citations. The selection process is depicted as a flow chart in figure 1. One article30 had been co-published in four journals, and the extra citations were eliminated, along with 11 citations in languages other than English. The full text of the remaining 47 articles was retrieved and reviewed by one of the authors (TL). We excluded one case report31 and two articles referring to deprecated versions of asthma guidelines32 ,33—that is, guidelines for which updated versions had been published. We also excluded 16 articles not directly relevant to OA and 21 non-guideline review articles. We ultimately selected seven publications that were deemed to fit the definition of a guideline and were specifically aimed at various aspects of OA (table 1). Our supplementary search did not identify any additional OA guidelines.
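The counts reported in the selection process can be checked with simple arithmetic. The short Python sketch below does so, assuming that the article co-published in four journals contributed three redundant citations; the variable names are ours.

```python
identified = 61                      # citations returned by the Medline search
duplicate_citations = 4 - 1          # one article co-published in four journals
non_english = 11

full_text_reviewed = identified - duplicate_citations - non_english

excluded = {
    "case report": 1,
    "superseded guideline versions": 2,
    "not directly relevant to OA": 16,
    "non-guideline review articles": 21,
}
included = full_text_reviewed - sum(excluded.values())

# The totals match the 47 full texts and seven guidelines reported above.
assert full_text_reviewed == 47
assert included == 7
```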

Table 1

Characteristics of identified guidelines

Figure 1

Article selection process flow chart. OA, occupational asthma.

All the selected guidelines had been published by professional societies: the American College of Chest Physicians (ACCP),34 the British Occupational Health Research Foundation (BOHRF),35 the Canadian Thoracic Society (CTS),36 the European Academy of Allergology and Clinical Immunology (EAACI),30 ,37 the European Respiratory Society (ERS)38 and the Société de Pneumologie de Langue Française (SPLF).39

Some guidelines were broad in scope, and others narrower. Of the two guidelines from EAACI, one focused on diagnostic criteria for OA,37 and the other on the use of peak expiratory flow (PEF) monitoring in the investigation of OA.30 The SPLF guideline focused on OA in its last section; this part was evaluated, along with the general part of the guideline. The remaining four guidelines34–36 ,38 were broader, and covered aspects of prevention, diagnosis, prognosis and management of OA, workplace surveillance and control of exposures.

The basic characteristics of the appraised guidelines are listed in table 1. Only three guidelines (ACCP, BOHRF, ERS) were based on systematic reviews of the literature, with defined criteria; these were published separately in all cases. Only four guidelines (BOHRF, CTS, ERS, SPLF) used a system to grade recommendations and their supporting evidence; however, there was considerable heterogeneity in the systems used.

The domain scores for each guideline are shown in table 2. Domains 1 (Scope and purpose) and 4 (Clarity and presentation) had the highest mean scores of 74 and 77, respectively, and domain 5 (Applicability) had the lowest mean score of 35. Domain 6 (Editorial independence) had the distinction of both the lowest individual score of 4 and the highest score of 96. Three guidelines (ACCP, BOHRF, ERS) scored higher on all domains and on the overall assessment, and these were the ones recommended by all raters. ERS was the guideline that scored highest on domain 1 (Scope and purpose). On domain 2 (Stakeholder involvement) BOHRF was the highest-scoring guideline; notably BOHRF and SPLF were the only guidelines that involved patient representatives, although the scope of this involvement was not specified. On domain 3 (Rigour of development) the two EAACI guidelines had the lowest score as expected, since these were not based on a systematic review of the literature. BOHRF had the highest score, and was the only guideline with a prespecified revision date. In addition, BOHRF was based on a thorough evidence review and used a formal system to grade quality of evidence and strength of recommendations. All guidelines scored high on domain 4 (Clarity and presentation), but particularly ACCP, BOHRF and ERS, all of which scored over 90%. ERS was the highest-scoring guideline on domain 5 (Applicability), although the score was low (53%); notably, ERS was the only guideline that provided a pocket version, which was freely available online under a Creative Commons Attribution-NonCommercial license. Finally, on domain 6 (Editorial independence) three guidelines scored very low (CTS, EAACI PEF and SPLF), as they did not report a funding source, nor any competing interests of their developers (table 1); one guideline (EAACI) did not report competing interests but reported a funding source, and scored somewhat higher on domain 6.

Table 2

Domain scores and overall assessment of occupational asthma (OA) guidelines using the AGREE II instrument

Guidelines published after the year 2000 had consistently higher domain and overall scores than guidelines published before 2000 (see online supplementary table 2); the differences did not reach statistical significance, though this might be expected given the small number of guidelines. The year 2000 was chosen post hoc, without looking at the AGREE II scores, to roughly divide the included guidelines into two equal parts.

To measure inter-rater reliability and internal consistency, we calculated ICCs and Cronbach's α coefficients for each domain. These are shown in table 3. Internal consistency was good (α coefficient >0.80) for all domains except domain 2 (Stakeholder involvement). Inter-rater agreement was good or very good for domains 2, 3 (Rigour of development), 6 (Editorial independence) and for the overall assessment; agreement was fair for domains 1 (Scope and purpose) and 4 (Clarity and presentation) and poor for domain 5 (Applicability).

Table 3

Inter-rater reliability and internal consistency for each domain of the AGREE II instrument
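As an illustration of the internal consistency measure, the Python sketch below computes Cronbach's α for one domain from a table with one row per guideline and one column per item. We assume here that each cell holds the item score averaged over appraisers; the exact layout used in the study is not reported, and the numbers are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table with rows = guidelines, columns = items."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across guidelines.
    item_variances = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the summed domain score.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical mean item scores for a three-item domain in seven guidelines.
domain = [
    [6.2, 5.8, 6.0],
    [5.5, 5.0, 5.7],
    [3.1, 2.8, 4.0],
    [2.5, 3.0, 2.2],
    [6.5, 6.3, 6.8],
    [4.0, 4.4, 3.6],
    [3.3, 2.9, 3.5],
]
alpha = cronbach_alpha(domain)
```

α approaches 1 when the items rise and fall together across guidelines, and falls when items capture unrelated aspects, as the three items of ‘Stakeholder involvement’ appear to do.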

Table 4 shows the correlation matrix for the AGREE II domain scores and the overall assessment score. Correlations between domain scores and the overall assessment score were all highly significant (p<0.001), and were higher for domains 1 (Scope and purpose), 3 (Rigour of development) and 6 (Editorial independence).

Table 4

Correlation matrix between domain scores and overall quality score, using Kendall's τ B rank correlation coefficient
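Kendall's τ B is suited to Likert-type scores because it corrects for tied pairs, which are frequent on a seven-point scale. A minimal Python sketch of the coefficient follows; it enumerates all pairs directly, which is adequate for small samples, and the data are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, with the correction for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                 # tied on both: ignored by tau-b
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator


# Hypothetical domain scores and overall scores given by one set of ratings.
domain_scores = [1, 2, 2, 3]
overall_scores = [1, 2, 3, 3]
tau = kendall_tau_b(domain_scores, overall_scores)   # 4 / sqrt(5 * 5) = 0.8
```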


The quality of the seven appraised OA guidelines was highly variable, both between guidelines and across domains of the same guideline. The high scores seen for the domains of ‘Scope and purpose’ and ‘Clarity and presentation’ suggest that these aspects of a guideline are more highly valued by guideline developers, or easier to achieve.

Most scores on the ‘Stakeholder involvement’ domain were modest; this indicates room for improvement, especially as regards the participation of patients and of all relevant professional groups. OA requires a multidisciplinary approach, involving occupational physicians, respiratory physicians and allergologists, among others. In addition, given the socioeconomic dimensions of OA,40 guideline developers should actively seek and take into account the preferences of the working and patient population.

Guidelines did not consistently and thoroughly report their funding sources or any competing interests of their developers. ACCP excelled in this regard, obtaining the highest score on the domain ‘Editorial independence’. This is important, as guideline developers often have competing interests,41 which are often under-reported,42 and may influence recommendations.43 Notably, in our appraisal the domain scores for ‘Editorial independence’ were highly correlated to the overall quality scores (table 4), indicating that transparency is perceived as a key element of a high-quality guideline.

The ‘Applicability’ domain had the lowest mean score in our appraisal and, in particular, item 21 (‘The guideline presents monitoring and/or auditing criteria’) had an even lower mean score than the other three items in the domain. This suggests that guideline developers do not pay sufficient attention to factors affecting the practical implementation of their recommendations. Similar findings have also been noted in guideline appraisals from other disease areas.19 ,44 ,45 Applicability is particularly pertinent for OA, the diagnosis and management of which may be affected by many external factors, such as the availability of specialists and specialist tests (eg, specific inhalation challenge testing), workplace organisation and medicolegal concerns. OA guidelines should therefore do significantly more to deal with these concerns.

On the ‘Rigour of development’ domain, the three guidelines with the highest scores (ACCP, BOHRF, ERS) had all been based on separate systematic reviews of the literature, and except for one (ACCP) had used a system for grading the quality of evidence and the strength of recommendations. Only one guideline (BOHRF) had a specified revision date, and most guidelines reported little or no information about external review and its scope; there is thus room for improvement. Scores for ‘Rigour of development’ were highly correlated to the overall quality scores (table 4), highlighting its importance in the assessment of a guideline. In addition we observed the highest inter-rater reliability for this domain (ICC=0.89, table 3), with a very good agreement between raters. This indicates that raters, and probably guideline users as well, are more familiar with judging the methodological qualities rather than other aspects of a guideline.

High inter-rater reliability was also seen for ‘Editorial independence’, suggesting a low degree of ambiguity for this domain, and for the overall quality scores. The poorest agreement among raters was for ‘Applicability’ (ICC=0.17), underlining the lack of specific relevant information in the appraised guidelines, which made the scoring of this domain complicated and challenging. All AGREE II domains had good internal consistency (Cronbach's α >0.80) except ‘Stakeholder involvement’. This indicates a low correlation between the three items in this domain (see online supplementary table 1), which is not unexpected; for example, inclusion of all relevant professional groups in the guideline development panel does not necessarily imply the participation of patient representatives. Indeed, only two guidelines (BOHRF and SPLF) included patient representatives, and in an unspecified capacity; in contrast, most guidelines were fairly interdisciplinary.

In this appraisal, we found that OA guidelines published after the year 2000 tended to have higher-quality scores than those published before 2000 (see online supplementary table 2). This finding is in line with previously published results,25 demonstrating a modest improvement in the quality of clinical practice guidelines over time.

Interestingly, although the guidelines scored very differently on the various domain scores, we only noted subtle differences in the recommendations put forward. For example BOHRF and ERS emphasise the role of surveillance for secondary prevention of OA, while ACCP and CTS less so. Also, ACCP and CTS explicitly recommend complete avoidance of exposure in cases of sensitiser-induced OA, BOHRF and SPLF do so in less adamant terms, while ERS specifically mentions reduction (as opposed to avoidance) of exposure as a second-line management option. For the most part, however, all guidelines agreed on their main recommendations: for example, on the need to confirm a diagnosis of OA by objective means; on the usefulness of serial PEF measurements (performed at least four times a day) for the evaluation of suspected cases of OA; on the importance of early diagnosis of OA; and on the need to avoid exposure to the offending agent in order to have the best possible outcome.

It is also notable that most recommendations across the guidelines were based on the same lower-quality evidence—namely, case–control studies, small cohorts and non-analytical, descriptive studies. In addition, most studies on OA were highly heterogeneous in the populations examined, interventions compared and outcomes reported, and were subject to various forms of bias.46 As a result, in those guidelines that employed a grading system (table 1), most recommendations obtained fairly low-quality scores.

Because guidelines were mostly based on the same evidence and reached similar conclusions, this, in our opinion, highlights the potential for increased collaboration among guideline developers. In particular, the literature search, evidence collection and appraisal, could be coordinated and shared between different guideline development groups.47 This would prevent significant duplication of effort and save a great amount of money and time, as this is probably the most resource-demanding step of guideline development. Such a collaboration would also free guideline panels to focus on other important aspects, such as adapting recommendations to the local context, improving their applicability and exploring ways to enhance guideline uptake.

A strength of our study is the good number of appraisers (six), which enhances the reliability of our results. In addition, the appraisal team provided a good balance by including people from different backgrounds, which improves the external validity of the results.

On the other hand, a potential limitation of our study is the exclusion of guidelines not in English. The main limitation, though, stems from the use of AGREE II itself. AGREE II does not differentiate between the relative importance of the domains, and provides no specific advice on how to make the overall assessment and decide whether or not the guideline should be recommended. It is clear that the domains are not equally important; for example, involving all stakeholders is no substitute for performing a systematic review of the evidence, and clarity of reporting and presentation does not guarantee optimal recommendations. Therefore, we suggest that before undertaking a guideline appraisal with AGREE II, the domains should be appropriately weighted according to the purpose of the appraisal, in order to derive an overall score and decide which guideline it is best to recommend.

In conclusion, the quality of OA guidelines in our appraisal was variable, although the newer guidelines appear to be better overall. There is significant room for improvement, especially in applicability and stakeholder involvement; although particularly important for OA, these two domains had relatively low scores in most guidelines. The AGREE II instrument performed well in our use case, as manifested by the good inter-rater reliability and internal consistency for most domains. Guideline developers should strive for optimal quality, as this can lead to better outcomes, and systematic appraisals can help to achieve this result.



  • Contributors TL and GD designed the study. TL, SB, CC, AKK, FK and DPP acquired the data. TL analysed, interpreted the data and drafted the manuscript. All authors critically revised the manuscript for important intellectual content and approved the final version to be published. TL is the guarantor of the study.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.
