Criteria | Categories | Proposed operationalisation |
Amount of evidence |
| Thresholds may be defined based on sample size, power or false-discovery rate considerations. As a simple rule, we suggest that category A requires a sample size of over 1000 (total number in cases and controls assuming 1:1 ratio) evaluated in the least common genetic group of interest; B corresponds to a sample size of 100–1000 evaluated in this group, and C corresponds to a sample size of <100 evaluated in this group. |
Replication |
| Between-study inconsistency entails statistical considerations (eg, defined by metrics such as I², where values of 50% and above are considered large inconsistency, and values of 25%–50% are considered moderate inconsistency) and also epidemiological considerations regarding the similarity, standardisation or at least harmonisation of phenotyping, genotyping and analytical models across studies. |
Protection from bias |
| A prerequisite for A is that the bias due to phenotype measurement, genotype measurement, confounding (population stratification) and selective reporting (for meta-analyses) can be appraised as not being high, and that there is no other demonstrable bias in any other aspect of the design, analysis or accumulation of the evidence that could invalidate the presence of the proposed association. In category B, although no strong biases are visible, there is no assurance that major sources of bias have been minimised or accounted for, because information is missing on how phenotyping, genotyping and confounding have been handled. Given that occult bias can never be ruled out completely, note that even in category A, we use the qualifier ‘probably’. |
Adapted from Ioannidis et al. Int J Epidemiol 2008;37:120–32 (See supplementary file for references).
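
The two numeric rules in the table (the sample-size thresholds for amount of evidence and the I² bands for between-study inconsistency) can be written out as a minimal Python sketch. The function names, the handling of the exact boundary values (exactly 1000 is treated as B, since A requires "over 1000"; exactly 50% is treated as large), and the label "below moderate" for I² under 25% are assumptions made for illustration and are not specified in the table. The sketch covers only these thresholds; the epidemiological considerations and the appraisal of bias are judgements that it does not attempt to encode.

```python
def amount_of_evidence_category(n_least_common_group: int) -> str:
    """Grade amount of evidence from the sample size (cases plus controls,
    assuming a 1:1 ratio) evaluated in the least common genetic group."""
    if n_least_common_group < 0:
        raise ValueError("sample size cannot be negative")
    if n_least_common_group > 1000:   # A: over 1000
        return "A"
    if n_least_common_group >= 100:   # B: 100-1000 (assumed inclusive at both ends)
        return "B"
    return "C"                        # C: under 100


def inconsistency_level(i2_percent: float) -> str:
    """Describe between-study inconsistency from I² expressed as a percentage."""
    if not 0 <= i2_percent <= 100:
        raise ValueError("I² must lie between 0 and 100")
    if i2_percent >= 50:              # 50% and above: large
        return "large"
    if i2_percent >= 25:              # 25%-50%: moderate
        return "moderate"
    return "below moderate"           # assumed label; the table names no band under 25%
```

For example, a meta-analysis with 1500 participants in the least common genetic group would be graded A for amount of evidence under this rule, and an I² of 30% would be described as moderate inconsistency.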