

Evidence Quality Assessment: GRADE, AMSTAR2, RoB2 Explained

Dr. Harry Power · February 8, 2026 · 10 min read
Tags: GRADE, AMSTAR2, evidence quality, critical appraisal, RoB2

Why Evidence Quality Assessment Matters

Not all medical evidence is created equal. A well-designed randomized controlled trial provides stronger evidence than a retrospective case series. But judging the quality of a study requires systematic assessment, and different study types require different tools.

Evidence quality assessment frameworks provide structured, reproducible methods for evaluating the reliability of medical research. They are essential for evidence-based practice, guideline development, and clinical decision-making.

The GRADE Framework

GRADE (Grading of Recommendations Assessment, Development and Evaluation) is the most widely adopted framework for rating the certainty (quality) of evidence and the strength of recommendations in healthcare. Organizations including the WHO and Cochrane use it.

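How it works: GRADE rates the certainty of evidence as High, Moderate, Low, or Very Low. The starting level depends on study design: evidence from RCTs starts as High, and evidence from observational studies starts as Low. Five factors can then rate the evidence down by one or two levels each, depending on severity:

1. Risk of bias: Methodological limitations in the studies
2. Inconsistency: Unexplained variation in results across studies
3. Indirectness: Evidence that doesn't directly address the clinical question (population, intervention, comparator, or outcome)
4. Imprecision: Wide confidence intervals or small sample sizes
5. Publication bias: Selective publication of studies

Three factors can rate the evidence up, mainly for observational studies: a large magnitude of effect, a dose-response gradient, and plausible residual confounding that would have reduced the observed effect.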
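The rating-down and rating-up logic can be sketched as a simple tally. This is a hypothetical illustration, not an official GRADE algorithm. GRADE guidance stresses that ratings are structured judgments, not arithmetic. The function name `grade_certainty`, the factor keys, and the encoding of each adjustment as 0, 1 or 2 levels are assumptions made for this example. The sketch covers both the five downgrading factors and the three upgrading factors: large effect, dose-response gradient, and plausible confounding.

```python
from enum import IntEnum
from typing import Mapping, Optional

# Hypothetical factor names for this sketch (not an official GRADE vocabulary)
DOWNGRADE_FACTORS = {"risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias"}
UPGRADE_FACTORS = {"large_effect", "dose_response", "plausible_confounding"}


class Certainty(IntEnum):
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def _check(adjustments: Mapping[str, int], allowed: set) -> int:
    """Validate factor names and 0-2 level adjustments; return their sum."""
    for factor, levels in adjustments.items():
        if factor not in allowed:
            raise ValueError(f"unknown factor: {factor}")
        if levels not in (0, 1, 2):
            raise ValueError(f"{factor}: adjust by 0, 1 or 2 levels")
    return sum(adjustments.values())


def grade_certainty(randomized: bool,
                    downgrades: Mapping[str, int],
                    upgrades: Optional[Mapping[str, int]] = None) -> Certainty:
    """Tally a GRADE-style rating: start by design, rate down, rate up."""
    start = Certainty.HIGH if randomized else Certainty.LOW
    level = (start
             - _check(downgrades, DOWNGRADE_FACTORS)
             + _check(upgrades or {}, UPGRADE_FACTORS))
    # Clamp to the four GRADE levels
    return Certainty(min(max(level, Certainty.VERY_LOW), Certainty.HIGH))


if __name__ == "__main__":
    # RCT evidence with serious imprecision: High -> Moderate
    print(grade_certainty(True, {"imprecision": 1}).name)  # MODERATE
    # Observational evidence with a large effect: Low -> Moderate
    print(grade_certainty(False, {}, {"large_effect": 1}).name)  # MODERATE
```

Clamping at Very Low and High mirrors the four-level scale. In practice, GRADE reserves rating up for observational evidence without serious limitations, and this sketch does not enforce that rule.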

When to use GRADE: GRADE is used for rating the overall body of evidence for a specific clinical question, particularly when developing clinical guidelines.

AMSTAR2 for Systematic Reviews

AMSTAR2 (A MeaSurement Tool to Assess systematic Reviews) is designed specifically for assessing the methodological quality of systematic reviews and meta-analyses.

How it works: AMSTAR2 is a 16-item checklist. Items include:
- Did the research question and inclusion criteria include the components of PICO?
- Was the literature search strategy comprehensive?
- Was study selection performed in duplicate?
- Was a satisfactory technique used to assess risk of bias in the included studies?
- Were appropriate statistical methods used for meta-analysis?

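Seven items are designated critical domains:
- protocol registration before the review
- adequacy of the literature search
- justification for excluding individual studies
- risk of bias assessment
- appropriateness of meta-analytic methods
- consideration of risk of bias when interpreting results
- assessment of publication bias

AMSTAR2 then rates overall confidence in the review's results as:
- High: no or one non-critical weakness
- Moderate: more than one non-critical weakness, no critical flaws
- Low: one critical flaw
- Critically Low: more than one critical flaw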
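The AMSTAR2 rating rule can be written as a small function. It returns High for at most one non-critical weakness and Moderate for more than one. It returns Low for one critical flaw and Critically Low for more than one. This sketch assumes the seven critical domains proposed by the AMSTAR 2 developers, which are items 2, 4, 7, 9, 11, 13 and 15. The developers note that appraisers may adapt this list. The function name `amstar2_confidence` is hypothetical; it takes the numbers of the items a review failed.

```python
from typing import Iterable

# Critical domains as proposed by the AMSTAR 2 developers (adaptable per review)
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})
ALL_ITEMS = frozenset(range(1, 17))  # AMSTAR 2 has 16 items


def amstar2_confidence(weak_items: Iterable[int]) -> str:
    """Map the set of failed AMSTAR 2 items to an overall confidence rating."""
    weak = set(weak_items)
    if not weak <= ALL_ITEMS:
        raise ValueError("AMSTAR 2 items are numbered 1 to 16")
    critical_flaws = len(weak & CRITICAL_ITEMS)
    non_critical = len(weak - CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "Critically Low"
    if critical_flaws == 1:
        return "Low"
    if non_critical > 1:
        return "Moderate"
    return "High"  # no critical flaws, at most one non-critical weakness


if __name__ == "__main__":
    print(amstar2_confidence([5, 10]))  # Moderate
    print(amstar2_confidence([4, 5]))   # Low
```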

When to use AMSTAR2: Apply AMSTAR2 when evaluating the quality of any systematic review or meta-analysis before relying on its conclusions for clinical decisions.

Cochrane Risk of Bias 2 (RoB2) for RCTs

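RoB2 is Cochrane's tool for assessing risk of bias in randomized controlled trials. Published in 2019, it replaced the original Cochrane RoB tool with a more structured approach. Assessors answer signalling questions, and algorithms map those answers to domain-level judgments.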

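How it works: RoB2 assesses five domains:

1. Randomization process: Was the allocation sequence random and concealed?
2. Deviations from intended interventions: Were participants and personnel aware of the assigned intervention, and was an appropriate analysis (e.g., intention-to-treat) used?
3. Missing outcome data: Were outcome data available for all, or nearly all, participants?
4. Measurement of the outcome: Was the outcome measured appropriately, and were outcome assessors aware of the intervention received?
5. Selection of the reported result: Was the reported result selected, on the basis of the results, from multiple outcome measurements or analyses?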

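Each domain is judged as Low risk, Some concerns, or High risk. The overall judgment generally follows the worst domain rating. A result is at Low risk only if every domain is Low risk, and at High risk if any domain is High risk. Reviewers may also rate a result High risk when Some concerns in several domains together substantially lower confidence in it.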
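The following sketches the RoB2 overall-judgment rule. It assumes five domain judgments keyed by hypothetical short names. The `escalate_multiple_concerns` flag is also an assumption. It stands in for the reviewer's decision that several Some concerns ratings together warrant High risk. RoB 2 leaves that call to the assessor rather than fixing a count.

```python
from enum import IntEnum
from typing import Mapping


class Risk(IntEnum):
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


# Hypothetical keys for the five RoB 2 domains
DOMAINS = ("randomization", "deviations", "missing_data",
           "measurement", "selection_of_result")


def rob2_overall(judgments: Mapping[str, Risk],
                 escalate_multiple_concerns: bool = False) -> Risk:
    """Combine domain judgments into an overall RoB 2-style judgment."""
    if set(judgments) != set(DOMAINS):
        raise ValueError(f"expected judgments for exactly: {DOMAINS}")
    worst = max(judgments[d] for d in DOMAINS)
    concerns = sum(judgments[d] == Risk.SOME_CONCERNS for d in DOMAINS)
    # Reviewer's call: several 'Some concerns' may justify 'High'
    if worst == Risk.SOME_CONCERNS and escalate_multiple_concerns and concerns > 1:
        return Risk.HIGH
    return worst


if __name__ == "__main__":
    trial = dict.fromkeys(DOMAINS, Risk.LOW)
    trial["deviations"] = Risk.SOME_CONCERNS
    print(rob2_overall(trial).name)  # SOME_CONCERNS
```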

When to use RoB2: Apply RoB2 when critically appraising an individual randomized controlled trial. Judgments are made for a specific result (outcome), not for the trial as a whole.

QUADAS2 for Diagnostic Accuracy Studies

QUADAS2 (Quality Assessment of Diagnostic Accuracy Studies) is the standard tool for evaluating diagnostic test accuracy studies.

How it works: QUADAS2 assesses four domains:

1. Patient selection: Was the sample representative? Was selection consecutive?
2. Index test: Was the test performed and interpreted without knowledge of the reference standard?
3. Reference standard: Was the reference standard appropriate and interpreted independently?
4. Flow and timing: Was the interval between tests appropriate? Did all patients receive the same reference standard? Were all patients included in the analysis?

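Every domain is judged Low, High, or Unclear for risk of bias. The first three domains (patient selection, index test, reference standard) are also judged for concerns regarding applicability.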
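The following sketches one way to record a QUADAS2 assessment. It assumes Low, High, or Unclear judgments and hypothetical domain keys, and the data structure is illustrative rather than an official format. QUADAS-2 guidance suggests an overall Low judgment only when every relevant domain is Low. The two summary methods apply that rule separately to bias and to applicability, which covers only the first three domains.

```python
from dataclasses import dataclass
from typing import Dict

LEVELS = {"low", "high", "unclear"}
# Hypothetical keys; applicability is assessed only for the first three domains
BIAS_DOMAINS = ("patient_selection", "index_test",
                "reference_standard", "flow_and_timing")
APPLICABILITY_DOMAINS = BIAS_DOMAINS[:3]


@dataclass
class Quadas2Assessment:
    risk_of_bias: Dict[str, str]
    applicability: Dict[str, str]

    def __post_init__(self) -> None:
        # Reject missing or extra domains and unknown judgment levels
        if set(self.risk_of_bias) != set(BIAS_DOMAINS):
            raise ValueError("risk_of_bias needs all four domains")
        if set(self.applicability) != set(APPLICABILITY_DOMAINS):
            raise ValueError("applicability covers only the first three domains")
        judgments = list(self.risk_of_bias.values()) + list(self.applicability.values())
        if not set(judgments) <= LEVELS:
            raise ValueError(f"judgments must be one of {sorted(LEVELS)}")

    def low_risk_of_bias(self) -> bool:
        return all(v == "low" for v in self.risk_of_bias.values())

    def low_applicability_concern(self) -> bool:
        return all(v == "low" for v in self.applicability.values())


if __name__ == "__main__":
    study = Quadas2Assessment(
        risk_of_bias={"patient_selection": "high", "index_test": "low",
                      "reference_standard": "low", "flow_and_timing": "unclear"},
        applicability={"patient_selection": "low", "index_test": "low",
                       "reference_standard": "low"},
    )
    print(study.low_risk_of_bias(), study.low_applicability_concern())  # False True
```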

When to use QUADAS2: Apply when evaluating studies that report the sensitivity, specificity, or other diagnostic accuracy metrics of a clinical test.

Newcastle-Ottawa Scale for Observational Studies

The Newcastle-Ottawa Scale (NOS) assesses the quality of non-randomized studies, specifically cohort and case-control studies.

How it works: NOS uses a star system across three domains:

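1. Selection (max 4 stars): Representativeness of the exposed cohort, selection of the non-exposed cohort, ascertainment of exposure, demonstration that the outcome was not present at study start.
2. Comparability (max 2 stars): Comparability of cohorts on the basis of design or analysis, with one star for controlling for the most important confounder and one for any additional factor.
3. Outcome (max 3 stars): Assessment of outcome, follow-up long enough for outcomes to occur, adequacy of follow-up.

These items come from the cohort version. The case-control version has different Selection items: case definition, representativeness of cases, selection of controls, and definition of controls. It also replaces Outcome with an Exposure domain: ascertainment of exposure, same method for cases and controls, and non-response rate.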

NOS itself does not define cut-offs. A common convention treats 7 to 9 stars as high quality, 4 to 6 as moderate, and 0 to 3 as low.
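The NOS star arithmetic and the conventional quality bands can be sketched as follows. The per-domain caps (4, 2, 3) come from the scale itself. The 7-9 / 4-6 / 0-3 bands are a widely used convention that individual reviews may vary. The function names `nos_total` and `nos_band` are hypothetical.

```python
# Maximum stars per NOS domain (cohort and case-control versions share these caps)
MAX_STARS = {"selection": 4, "comparability": 2, "outcome_or_exposure": 3}


def nos_total(selection: int, comparability: int, outcome_or_exposure: int) -> int:
    """Sum NOS stars after checking each domain against its cap."""
    stars = {"selection": selection, "comparability": comparability,
             "outcome_or_exposure": outcome_or_exposure}
    for domain, value in stars.items():
        if not 0 <= value <= MAX_STARS[domain]:
            raise ValueError(f"{domain}: {value} outside 0-{MAX_STARS[domain]}")
    return sum(stars.values())


def nos_band(total: int) -> str:
    """Convention-based bands (7-9 high, 4-6 moderate, 0-3 low); reviews may differ."""
    if not 0 <= total <= 9:
        raise ValueError("NOS totals range from 0 to 9")
    if total >= 7:
        return "high"
    if total >= 4:
        return "moderate"
    return "low"


if __name__ == "__main__":
    total = nos_total(selection=3, comparability=1, outcome_or_exposure=2)
    print(total, nos_band(total))  # 6 moderate
```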

When to use NOS: Apply when evaluating cohort or case-control studies, which make up a substantial proportion of the clinical literature.

Matching Framework to Study Type

Choosing the right assessment framework is essential. Here is a practical guide:

| Study Type | Framework |
|---|---|
| Randomized Controlled Trial | Cochrane RoB2 |
| Systematic Review / Meta-Analysis | AMSTAR2 |
| Diagnostic Accuracy Study | QUADAS2 |
| Cohort or Case-Control Study | Newcastle-Ottawa Scale |
| Clinical Guideline | AGREE II |
| Case Report | CARE Checklist (a reporting guideline, useful as a completeness check) |
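If you are building a triage step into an appraisal workflow, the study-type table can be encoded as a simple lookup. This is a minimal sketch. The lowercase study-type labels and the `choose_framework` name are hypothetical. Classifying the study design in the first place is out of scope here.

```python
# The study-type-to-framework table as a lookup (labels are hypothetical)
FRAMEWORK_BY_STUDY_TYPE = {
    "randomized controlled trial": "Cochrane RoB2",
    "systematic review": "AMSTAR2",
    "meta-analysis": "AMSTAR2",
    "diagnostic accuracy study": "QUADAS2",
    "cohort study": "Newcastle-Ottawa Scale",
    "case-control study": "Newcastle-Ottawa Scale",
    "clinical guideline": "AGREE II",
    "case report": "CARE Checklist",
}


def choose_framework(study_type: str) -> str:
    """Return the appraisal framework for a study type, or raise if unmapped."""
    key = study_type.strip().lower()
    if key not in FRAMEWORK_BY_STUDY_TYPE:
        raise ValueError(f"no framework mapped for {study_type!r}")
    return FRAMEWORK_BY_STUDY_TYPE[key]


if __name__ == "__main__":
    print(choose_framework("Cohort Study"))  # Newcastle-Ottawa Scale
```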

AI-powered tools like AttendMe.ai can apply these frameworks automatically, identifying the study type and generating the appropriate quality assessment. This gives you a structured starting point for critical appraisal.

Dr. Harry Power

Founder & CEO, AttendMe.ai

Last reviewed: February 8, 2026
