Methodologic quality and risk-of-bias in systematic reviews of healthcare interventions: a review of methods

OBJECTIVE: To compare the characteristics of systematic reviews of healthcare interventions that did or did not assess the methodologic quality/risk of bias of included studies, and to analyze the methodologic features of those that did. METHODS: The PubMed database was searched. From 25,571 systematic reviews retrieved, a random sample of 1,025 was screened. Frequencies were used to describe outcomes. Unadjusted and adjusted logistic regressions were performed to test associations with methodologic quality/risk-of-bias assessment. In a second analysis, systematic reviews that assessed methodologic quality/risk of bias were dichotomized according to the design of included studies (randomized clinical trials only versus non-randomized studies of interventions or a combination of both). RESULTS: 303 systematic reviews were fully analyzed. Methodologic quality/risk of bias was assessed by 278 (92%). Methodologic quality/risk-of-bias assessment was associated with a higher number of databases searched (>4, P= 0.008), the presence of a meta-analysis (P= 0.005), and the design of included studies (randomized clinical trials only, P= 0.042). The chance of using a suitable tool, and a tool designed for risk-of-bias rather than methodologic quality assessment, was higher for systematic reviews including only randomized clinical trials (P< 0.05). The most used tool was Cochrane’s RoB Tool, without a clear system for classifying studies’ overall risk of bias. CONCLUSION: Methodologic quality/risk-of-bias assessment was associated with included studies’ design (randomized clinical trials only), a meta-analysis of data, and the number of databases searched (>4). The most used tool was Cochrane’s RoB Tool, with no clearly defined rating system. The description of methodologic quality/risk-of-bias assessment methods, their results, and their impacts on meta-analysis, the certainty of evidence, and systematic reviews’ results are still to be consistently addressed.


Introduction
In the 1990s, David Sackett popularized the concept of Evidence-Based Medicine 1 (EBM, nowadays expanded to Evidence-Based Practice). In the search for the best evidence to support health professionals' clinical decision-making and to handle the vast amount of information available, Systematic Reviews (SR) arose as the soundest source of information.
Over the years, the scientific method for planning, conducting, and reporting SR has improved considerably, largely due to the introduction of reporting guidelines, such as the PRISMA Statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2 ), checklists for critical appraisal, such as AMSTAR 2 (A Measurement Tool to Assess Systematic Reviews 3 ), and systems for analyzing the certainty/strength of the evidence, such as the GRADE approach (Grading of Recommendations, Assessment, Development and Evaluations 4 ).
All of them highlight essential items that should be addressed in SR and are unanimous in pointing out the importance of analyzing the risk-of-bias of included studies to ensure consistent results. Therefore, the assessment of bias in included studies and a careful analysis of its impacts on SR results are central to drawing conclusions in the most impartial and objective manner 5 and should be carefully addressed by SR authors.
Poor design and/or execution of original studies of healthcare interventions often results in poor internal validity. Thus, assessing the internal validity of studies included in an SR should reveal the risk of under- or overestimation of the true intervention effect. 6 Study design features associated with effect overestimation include inadequate random sequence generation and allocation concealment, substantial unexplained or unexplored loss to follow-up, and unblinded outcome assessment. 7,8 All of these must be properly explored during the critical appraisal of included studies.
Both the first version of the PRISMA Statement 2 and the revised version 8 emphasize the importance of assessing risk of bias within and across studies in SR. The revised version additionally suggests analytic strategies to examine the influence of included studies' risk of bias on SR results: (i) restricting the primary analysis to studies at low risk of bias (sensitivity analysis); (ii) stratifying studies according to risk of bias through subgroup analysis or meta-regression; or (iii) adjusting the result from each study to remove the bias. 8 A methodologic study evaluating the adherence of SR to the PRISMA checklist found that, although most performed some risk-of-bias assessment within studies (72%), compliance was poor for risk-of-bias assessment across studies (38%) and for presentation of its results (30%). 9 Another study evaluated 1,114 SR of oral health interventions, of which 61.4% assessed the risk of bias; this assessment was more consistent in Cochrane than in non-Cochrane reviews (100% versus 56.3%, P < 0.001) and in reviews published after the PRISMA Statement release. 10
Cochrane's Handbook defines bias "as a systematic error, or deviation from the truth, in results" of included studies. Risk-of-bias analysis uses tools to evaluate the internal validity of included studies. Methodologic quality, on the other hand, is assessed by scales that frequently mix different concepts (risk of bias, imprecision, relevance, ethics, etc.), making their final scores hard to interpret. 6
Eleven years after the release of the first version of the PRISMA Statement 2 and twelve years after the introduction of the GRADE approach 4 , both of which consider methodologic quality and/or risk-of-bias (MQ/RoB) assessment and analysis mandatory in SR of healthcare interventions, uncertainty remains about SR authors' adherence to this requirement.
Therefore, this study aimed to compare the characteristics of systematic reviews of healthcare interventions that did or did not assess MQ/RoB, and to analyze the methodologic features of SR that assessed the MQ/RoB of included studies, including:
• the tools used and their suitability to the design of included studies;
• the MQ/RoB assessment methodology adopted;
• the presentation and interpretation of MQ/RoB assessment results.

Methods
The search strategy (Box 1) was developed for the PubMed database with the aid of an experienced librarian. The search was performed on September 1st, 2020.

Box 1. Search strategy used in PubMed database
The search returned 25,571 records, of which a random sample of 1,025 (4%) was retrieved for analysis. A random number generator (https://www.openepi.com/Random/Random.htm) was used to select the 1,025 records to retrieve. The research team selected the randomly chosen references by ticking the corresponding boxes on the PubMed results list and exported them to a reference manager (EndNote X9, version 3.3; Clarivate, Thomson Reuters Corporation, Fairfax, VA, USA).
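The authors drew their sample with OpenEpi's online generator. As a hedged, reproducible alternative (not the procedure used in this review), the same kind of draw, 1,025 result positions without replacement out of 25,571, can be sketched with Python's standard library. The function name `draw_sample` and the seed value are hypothetical; reporting a seed would let readers regenerate the identical sample.

```python
import random

TOTAL_RECORDS = 25_571  # records returned by the PubMed search
SAMPLE_SIZE = 1_025     # records retrieved for screening


def draw_sample(total, size, seed):
    """Return sorted 1-based positions of the records to retrieve (no replacement)."""
    rng = random.Random(seed)  # independent, seeded generator so the draw is reproducible
    return sorted(rng.sample(range(1, total + 1), size))


positions = draw_sample(TOTAL_RECORDS, SAMPLE_SIZE, seed=2020)  # hypothetical seed
print(len(positions), positions[:5])
print(f"sampling fraction: {SAMPLE_SIZE / TOTAL_RECORDS:.1%}")
```

The sorted positions can then be matched against the PubMed results list, which is displayed in a fixed order for a given search and date.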
The inclusion process followed a two-step approach. In the first step, five reviewers read the titles and abstracts of retrieved studies, applying specific eligibility criteria. In the second step, four reviewers read the studies' full texts, applying additional exclusion criteria for final inclusion. At both steps, each reviewer's decisions were cross-checked by a second reviewer, and conflicts were resolved by a third, experienced reviewer.
SR of healthcare interventions, with or without meta-analysis, were included. Exclusion criteria at step 1 comprised comments, opinions, letters, protocols, narrative reviews, scoping reviews, brief/rapid SR, overviews of reviews (umbrella reviews), network meta-analyses, association (including association and risk), proportion (including prevalence and incidence), descriptive (including phenomenon, case reports, case series, and surveys), prognostic, diagnostic, qualitative and mixed-methods, methodological, cost-effectiveness, in vitro and in vivo (in animal models), and non-healthcare SR (classified according to Munn et al. 11 ). Cochrane Collaboration SRs were also excluded since they were considered benchmarks.
In step 2, articles were excluded if they did not fulfill the key characteristics of SR according to Cochrane's Handbook 12 and Krnic Martinic et al. 13 (Box 2).
SR not assessing MQ/RoB were compared with those that, by any means (qualitative/narrative or quantitative through validated tools), assessed the MQ/RoB of included studies. Data collected comprised: author; country; health area; journal impact factor in 2020; funding; adherence to the PRISMA Statement; public a priori protocol registration; number of databases searched; search of grey literature; Methodologic Quality and/or Risk-of-bias (MQ/RoB) tool used and its suitability to included studies' design; number of reviewers involved in MQ/RoB assessment; strategies for resolving conflicts; timing of MQ/RoB assessment (before or after studies' inclusion); use of a cut-off point or other system to classify studies' quality/bias; and interpretation/use of risk-of-bias results (in the discussion, meta-analysis, and/or GRADE approach). Data extraction followed the same process described for study inclusion and was carried out by the same five reviewers.
Frequencies were used to describe the characteristics of included SR. Unadjusted and adjusted logistic regressions were performed to test associations with MQ/RoB assessment. The dependent variable was MQ/RoB assessment, dichotomized as yes (0) and no (1). In a second analysis, SR that assessed MQ/RoB were dichotomized according to the design of included studies: randomized clinical trials (RCT) only as yes (0), and non-randomized studies of interventions (NRSI) or a combination of both (RCT+NRSI) as no (1). All variables with a p-value <0.20 in the unadjusted model were entered into the adjusted (final) model. Odds ratios (OR) and 95% confidence intervals (CI) were calculated at a 0.05 level of significance. The Statistical Package for the Social Sciences (SPSS for Windows, version 21.0, SPSS Inc., Chicago, IL, USA) was used for the analyses.
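For a single binary predictor, the unadjusted logistic regression odds ratio equals the 2x2-table odds ratio, with standard error of the log OR equal to the square root of the summed reciprocal cell counts, so the Wald 95% CI and p-value follow directly. The sketch below computes these and then applies the p < 0.20 screening rule described above. The counts and variable names are hypothetical (they are not this review's data), the outcome is coded 1 = MQ/RoB not assessed, and fitting the adjusted multivariable model is not shown.

```python
import math


def unadjusted_or(a, b, c, d, z_crit=1.96):
    """Odds ratio, Wald 95% CI and two-sided p-value from a 2x2 table.

    a: exposed, outcome=1     b: exposed, outcome=0
    c: unexposed, outcome=1   d: unexposed, outcome=0
    Outcome 1 = MQ/RoB NOT assessed (coding as in the Methods).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return math.exp(log_or), ci, p


# Hypothetical counts (a, b, c, d) per candidate variable; NOT the review's data
candidates = {
    "4_or_fewer_databases": (18, 150, 7, 128),
    "no_meta_analysis": (15, 67, 10, 211),
    "grey_literature_not_searched": (16, 168, 9, 110),
}

SCREEN_P = 0.20  # variables with p < 0.20 enter the adjusted model
selected = []
for name, table in candidates.items():
    or_, (lo, hi), p = unadjusted_or(*table)
    print(f"{name}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), p={p:.3f}")
    if p < SCREEN_P:
        selected.append(name)
print("entered into adjusted model:", selected)
```

A liberal screening threshold such as 0.20 is meant to avoid discarding variables that only become informative after adjustment; the adjusted ORs themselves would come from the multivariable fit.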

Results
Of 1,025 randomly selected studies, 386 (38%) were initially classified as SR of healthcare interventions at step 1. Descriptive characteristics and the statistical comparison between SR of interventions with and without MQ/RoB assessment are available in Table 1.
Table 1. Characteristics and statistical comparison of the included systematic reviews according to MQ/RoB assessment (n= 303)
The main area of interest was medicine (n= 239, 79%), and the country with the highest number of publications was China (n= 71, 23%). Most SR were published in journals with an available impact factor (n= 233, 77%), and most authors received no funding for developing the SR or provided no information on funding (n= 185, 61%).
PRISMA Statement adherence was reported by 232 SR (77%), and more than half (n= 184, 61%) did not mention the availability of a published a priori review protocol. Of 119 SR that mentioned a protocol, 97% registered it at PROSPERO (International Prospective Register of Systematic Reviews, https://www.crd.york.ac.uk/prospero/). The mean number of databases searched was 4.1 (range 2 to 12) and most included SR did not search grey literature (n= 184, 61%). About half of SR (n= 160, 53%) included only randomized clinical trials (RCT). Most SR performed a meta-analysis (n= 221, 77%), and did not use the GRADE approach (or other analysis of evidence certainty) (n= 243, 80%).
A complete list of included SRs with descriptive characteristics can be found in the S2 Table. The adjusted logistic regression showed that the chance of not performing MQ/RoB assessment was higher for SR searching 4 or fewer databases (OR 6.046, 95% CI 1.598-22.879, P= 0.008) and for SR without meta-analysis (OR 5.300, 95% CI 1.672-16.804, P= 0.005), and lower for SR including only RCT (OR 0.356, 95% CI 0.131-0.964, P= 0.042).
Of the 278 SR assessing the MQ/RoB of included studies, 152 (55%) included only RCT, 86 (31%) both RCT and NRSI, and 40 (14%) only NRSI. The most used MQ/RoB tool in SR including only RCT was Cochrane's RoB Tool (2011) 15 (n= 101, 66%). SR including only NRSI used many different MQ/RoB tools, the most used being the Newcastle-Ottawa Scale 16 (NOS) (n= 10, 25%). Most SRs including both RCT and NRSI applied different tools for RCT and NRSI, and the most used combination was Cochrane's RoB Tool (2011) 15 for RCT and the Newcastle-Ottawa Scale 16 (NOS) for NRSI (n= 13, 15%). The most used tools and combinations for MQ/RoB assessment according to included studies' designs are shown in Figure 2 (RCT-only), Figure 3 (NRSI-only), and Figure 4 (RCT+NRSI), and a complete list is given in S4 Table. The main findings of the comparison between SR of interventions that assessed MQ/RoB including RCT-only or RCT+NRSI/NRSI-only are available in Table 2. MQ/RoB assessment was performed by two independent reviewers in 152 (55%) SR, while 108 (39%) SR did not report the number of assessors. The strategy for resolving conflicts was not reported by 163 (59%) SR.
MQ/RoB assessment was more often performed after the final inclusion of studies in SR (n= 239, 86%). The use of clear systems to classify the overall MQ/RoB of included studies or outcomes was described by only 79 (28%) SR.
Meta-analysis was carried out in 214 (77%) SR, of which only 53 (25%) evaluated the impact of the MQ/RoB of included studies on meta-analysis results, mainly through sensitivity analysis (n= 36). The GRADE approach or other methods for evaluating the certainty/strength/level of the body of evidence were used by only 59 (21%) SR. Most SRs using the GRADE approach (n= 46) described the impact of RoB on the certainty of evidence (n= 32, 70%).
Most SR presented results from MQ/RoB assessment in text or figures (n= 252, 91%). Seven SRs presented results as supplementary files, and 19 (7%) SRs did not present MQ/RoB results at all. Less than half of SR discussed the results of the MQ/RoB assessment (n= 138, 47%).
A list of qualitative characteristics of SR that assessed MQ/RoB can be found in the S3 Table. Adjusted logistic regression of SR assessing the MQ/RoB of included studies indicated that the chance of choosing an unsuitable MQ/RoB tool was lower for RCT-only SR (OR 0.009, 95% CI 0.001-0.077, P< 0.001), while the chance of using a tool designed for RoB assessment was higher (OR 29.26, 95% CI 9.15-93.56, P< 0.001).

Discussion
This review aimed to evaluate MQ/RoB assessment in a sample of SR of healthcare interventions. SR assessing MQ/RoB were compared with those not assessing it. MQ/RoB assessment was associated with searching more than four databases, the design of included studies (RCT-only), and performing a meta-analysis. The decision on whether to perform a meta-analysis depends on the MQ/RoB assessment, since a meta-analysis based on low-quality/high-risk-of-bias studies may impair the SR findings and conclusions 6 ; thus, the finding that SR not assessing MQ/RoB were also more likely to lack a meta-analysis is not surprising.
Increasing the number of databases searched, as well as searching the grey literature (the so-called comprehensive search), is a way of reducing publication bias and of meeting the criterion of finding "all references" answering a given focused question. 17 This finding, together with the associations with RCT design and with performing a meta-analysis, may suggest that SR assessing MQ/RoB achieved higher methodologic standards.
Considering only the SR that assessed MQ/RoB of included studies, it was noticed that most performed a meta-analysis (77%), a number slightly above the 68% reported by Pussegoda et al. 9 This could be explained by the focus on SR of interventions and by the inclusion of the term "meta-analysis" in the search strategy.
About 80% of included SR declared adherence to the PRISMA Statement 2 or other SR reporting checklists. Although no strict analysis of this declared adherence was performed in this study, one criterion of the PRISMA checklist is the mention of a public a priori protocol, yet less than half of included SR mentioned one. A study analyzing SR published between 1990 and 2014 found that only 6% fulfilled this criterion. 10 Considering the time elapsed since the PRISMA Statement introduction 2 , SR authors were expected to give more importance to protocol registration, especially given recent evidence that a registered protocol improves the quality of SR reporting. 18
Regarding the tools for MQ/RoB assessment, the most used was Cochrane's RoB Tool 15 (n= 156, 56%). Among SR including only RCT, 70% used Cochrane's RoB Tool 15 , even though version 2.0 18 has been available since 2016 (revised version 20 in 2019). Since the included SR were published between 2019 and 2020, RoB Tool 2.0 was expected to be more popular. Unlike its first version, RoB Tool 2.0 19 includes a system for classifying the overall RoB at the outcome level. 21 The lack of such a system in the original RoB Tool 15 resulted in many studies classifying the overall risk of bias with unclear rating criteria (38%).
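The overall-judgement mapping that RoB Tool 2.0 (RoB 2) added can be written as a short rule over its five domains: low if every domain is low, high if any domain is high, and some concerns otherwise. The sketch below encodes that mapping with one loudly labelled assumption: RoB 2 also lets reviewers rate a result as high when several domains raise some concerns, a judgement represented here by a hypothetical `escalate_at` threshold that review authors would have to pre-specify. All names are illustrative, not part of any official RoB 2 software.

```python
from enum import Enum


class Judgement(Enum):
    LOW = "low"
    SOME_CONCERNS = "some concerns"
    HIGH = "high"


ROB2_DOMAINS = (
    "randomization process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)


def rob2_overall(domain_judgements, escalate_at=None):
    """Overall RoB 2 judgement for one result from its five domain judgements."""
    missing = set(ROB2_DOMAINS) - set(domain_judgements)
    if missing:
        raise ValueError(f"missing domain judgements: {sorted(missing)}")
    values = [domain_judgements[d] for d in ROB2_DOMAINS]
    if Judgement.HIGH in values:
        return Judgement.HIGH  # high in any domain -> high overall
    n_concerns = values.count(Judgement.SOME_CONCERNS)
    if n_concerns == 0:
        return Judgement.LOW  # low in all domains -> low overall
    # Hypothetical stand-in for the reviewer judgement that concerns in several
    # domains substantially lower confidence in the result.
    if escalate_at is not None and n_concerns >= escalate_at:
        return Judgement.HIGH
    return Judgement.SOME_CONCERNS


example = {d: Judgement.LOW for d in ROB2_DOMAINS}
example["missing outcome data"] = Judgement.SOME_CONCERNS
print(rob2_overall(example).value)  # some concerns
```

Writing the rule down, including any escalation threshold, is one way to make the overall rating system explicit in an SR's methods section.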
For NRSI, the most used tool was the NOS (Newcastle-Ottawa Scale 16 ), yet it was used by fewer than one-third of SR including NRSI (n= 35/126, 28%). This suggests some uncertainty among authors about the most suitable tool for MQ/RoB assessment of NRSI. The chance of an RCT-only SR choosing an MQ/RoB tool suitable for its study design was much higher than that of RCT+NRSI/NRSI-only SR (108 times higher, P< 0.001). This is consistent with the observation in Cochrane's Handbook that "assessing the risk-of-bias in an NRSI has long been a challenge and has not always been performed or performed well". 22 NRSI do comprise a range of different study designs. However, if the aim of the SR is to assess the effectiveness of a certain intervention, the ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions 23 ) tool should be used for all included NRSI, regardless of study design. 24 Even though the NOS 16 is still accepted by the Cochrane Collaboration 25 , a risk-of-bias tool is preferable, so ROBINS-I should be chosen. 24 Some tools suit more than one study design (e.g., Coleman's Methodology Score 26 and the Downs & Black Scale 27 ); however, they are MQ checklists rather than RoB tools. The same is true for appraisal systems with a tool for each study design, such as the JBI Critical Appraisal Tools (available at https://jbi.global/critical-appraisal-tools). The results of this review showed that SR including NRSI-only or both RCT+NRSI tended to choose methodologic quality appraisal tools regardless of study design (48%), while RCT-only SR mainly used RoB tools (70%) (29 times higher chance, P< 0.001).
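ROBINS-I rates each of its seven domains as low, moderate, serious, or critical and derives the overall judgement essentially from the most severe domain. A minimal sketch of that "worst domain" rule follows. Treating a missing domain as "no information", and returning "no information" whenever no domain is serious or critical but any domain lacks information, is a simplification of the tool's guidance (which refers to key domains); the function and constant names are hypothetical.

```python
LEVELS = ("low", "moderate", "serious", "critical")  # least to most severe
NO_INFO = "no information"

ROBINS_I_DOMAINS = (
    "confounding",
    "selection of participants",
    "classification of interventions",
    "deviations from intended interventions",
    "missing data",
    "measurement of outcomes",
    "selection of the reported result",
)


def robins_i_overall(domain_judgements):
    """Overall ROBINS-I judgement: the most severe domain judgement (simplified)."""
    values = [domain_judgements.get(d, NO_INFO) for d in ROBINS_I_DOMAINS]
    unknown = {v for v in values if v != NO_INFO and v not in LEVELS}
    if unknown:
        raise ValueError(f"unrecognised judgements: {sorted(unknown)}")
    rated = [v for v in values if v != NO_INFO]
    worst = max(rated, key=LEVELS.index, default=None)
    if worst in ("serious", "critical"):
        return worst  # decisive even if other domains lack information
    if len(rated) < len(values):
        return NO_INFO  # simplification: any unrated domain, not only key domains
    return worst


judgements = {d: "low" for d in ROBINS_I_DOMAINS}
judgements["confounding"] = "serious"
print(robins_i_overall(judgements))  # serious
```

Because confounding is judged for every NRSI design, a single rule like this can be applied uniformly across cohort, case-control, and other non-randomized designs in the same review.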
Ideally, the process for reaching RoB judgments should be clearly described in the text 6 , mentioning the number of reviewers involved in MQ/RoB assessment, whether they worked independently, and the strategies for resolving conflicts. 8 The Cochrane Collaboration and the Joanna Briggs Institute recommend that MQ/RoB assessments be performed independently by at least two reviewers, to reduce assessment errors and the influence of individual preconceptions, with disagreements resolved through discussion until consensus is reached, by a third reviewer, or both. 6,28 However, many included SR did not mention how many reviewers were involved in the activity (39%) or how conflicts were resolved (59%).
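The dual independent assessment recommended above can be sketched as follows: two reviewers' per-study judgements are compared, agreements are accepted, and disagreements are passed to an arbitration step that stands in for consensus discussion and/or a third reviewer. Keeping the list of conflicts makes it straightforward to report how many disagreements occurred and how they were resolved. The study labels, judgements, and the "adopt the more severe judgement" arbitration policy are hypothetical.

```python
def reconcile(reviewer_a, reviewer_b, arbiter):
    """Merge two independent per-study judgements; disagreements go to `arbiter`.

    Returns the final judgements, the conflicting studies, and raw percent agreement.
    """
    if not reviewer_a or reviewer_a.keys() != reviewer_b.keys():
        raise ValueError("both reviewers must assess the same, non-empty set of studies")
    final, conflicts = {}, []
    for study in sorted(reviewer_a):
        a, b = reviewer_a[study], reviewer_b[study]
        if a == b:
            final[study] = a
        else:
            conflicts.append(study)
            final[study] = arbiter(study, a, b)  # consensus and/or third reviewer
    agreement = 1 - len(conflicts) / len(reviewer_a)
    return final, conflicts, agreement


SEVERITY = {"low": 0, "some concerns": 1, "high": 2}


def third_reviewer(study, a, b):
    """Hypothetical arbitration policy: adopt the more severe of the two judgements."""
    return max(a, b, key=SEVERITY.__getitem__)


reviewer_a = {"Study 1": "low", "Study 2": "high", "Study 3": "some concerns"}
reviewer_b = {"Study 1": "low", "Study 2": "some concerns", "Study 3": "some concerns"}
final, conflicts, agreement = reconcile(reviewer_a, reviewer_b, third_reviewer)
print(final, conflicts, f"{agreement:.0%}")
```

In practice the arbitration is a human decision; passing it in as a function simply keeps the record of who decided what separate from the merging logic.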
Most SR assessed the MQ/RoB of included studies after the studies' inclusion process (86%), and only about 2% of SR described MQ/RoB assessment as a tool for the final decision on studies' inclusion. Assessing MQ/RoB before studies' inclusion ensures that only moderate- to good-quality (or low-risk-of-bias) studies are included; although this practice is recommended by some SR guidelines (such as the JBI Manual for Evidence Synthesis 29 ), the Cochrane Collaboration recommends that all studies answering a certain focused question be included. Including all eligible studies may produce precise but biased results, owing to study characteristics; conversely, including only studies at low risk of bias may produce an unbiased but imprecise result. Therefore, strategies to cope with bias should be defined a priori, and SR authors should judiciously interpret and discuss how studies' biases affected the SR results. 6
Surprisingly, a few SR (9%), despite performing MQ/RoB assessment, did not report the results in the text. Some provided a supplementary file with MQ/RoB results; however, free access to it was often not assured. Ideally, MQ/RoB results should be clearly presented, preferably in tables or graphs alongside the text, allowing readers to fully understand and appraise the MQ/RoB assessment process. 6
MQ/RoB assessment results should inform the interpretation of the meta-analysis and the analysis of the certainty of evidence, and should be discussed in the text. 8 SR authors should not perform analyses and interpretations disregarding the results of the MQ/RoB assessment. 6 Among SR that meta-analyzed the data, only 25% evaluated the impact of the MQ/RoB of included studies on the results, mainly through sensitivity analysis. Although sensitivity analysis is considered the main procedure for assessing the impact of bias on SR results, alternatives include meta-regression, subgroup analysis, and careful discussion of findings in light of the MQ/RoB results. 6 Nevertheless, less than half of the authors discussed the impact of MQ/RoB on the SR results.
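As a minimal sketch of two of these strategies, the Python below pools hypothetical log odds ratios with an inverse-variance fixed-effect model: first for all studies, then restricted to studies at low risk of bias (sensitivity analysis), and separately within each risk-of-bias stratum (subgroup analysis). The data, the fixed-effect choice, and all function names are illustrative assumptions; many SR would use a random-effects model instead, and meta-regression is not shown.

```python
import math
from collections import defaultdict

# Hypothetical study-level data (NOT from this review): (log OR, standard error, RoB judgement)
STUDIES = [
    (-0.40, 0.20, "low"),
    (-0.25, 0.15, "low"),
    (-0.90, 0.30, "high"),
    (-0.70, 0.25, "high"),
]


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooled log OR and its standard error."""
    weights = [1 / se ** 2 for _, se, _ in studies]
    total = sum(weights)
    estimate = sum(w * y for w, (y, _, _) in zip(weights, studies)) / total
    return estimate, math.sqrt(1 / total)


def sensitivity_low_rob(studies):
    """Restrict the pooled analysis to studies judged at low risk of bias."""
    return pool_fixed_effect([s for s in studies if s[2] == "low"])


def stratify_by_rob(studies):
    """Pool separately within each risk-of-bias stratum (subgroup analysis)."""
    strata = defaultdict(list)
    for study in studies:
        strata[study[2]].append(study)
    return {rob: pool_fixed_effect(group) for rob, group in strata.items()}


for label, (est, se) in [("all studies", pool_fixed_effect(STUDIES)),
                         ("low RoB only", sensitivity_low_rob(STUDIES))]:
    print(f"{label}: OR {math.exp(est):.2f} (SE of log OR {se:.3f})")
print({rob: round(math.exp(est), 2) for rob, (est, _) in stratify_by_rob(STUDIES).items()})
```

With these made-up numbers the low-RoB-only estimate lies closer to the null than the all-studies estimate, and its standard error is larger: the precise-but-possibly-biased versus unbiased-but-imprecise trade-off described above, which authors would then need to discuss.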
All strategies to cope with bias, however, share the risk of being ignored in the SR's conclusions, so the MQ/RoB assessment should be incorporated into a measure of the certainty of evidence, such as the GRADE approach 6 , and SR authors should make judgments not only on the risk of bias within studies but also across studies. 2,6 Nevertheless, twelve years after the introduction of the GRADE approach 4 , only 17% of the included SR used it to rate the certainty of the evidence.
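As a rough illustration of how a risk-of-bias judgement feeds into the certainty of evidence, the sketch below mimics the classic GRADE starting points (evidence from randomized trials starts at high certainty, evidence from observational studies at low) and rates down one level for serious or two for very serious concerns in each of the five domains. GRADE ratings are structured judgements rather than arithmetic, rating up is not modelled, and `grade_certainty` is a hypothetical teaching aid, not an implementation of the approach.

```python
GRADE_LEVELS = ("very low", "low", "moderate", "high")
GRADE_DOMAINS = {"risk of bias", "inconsistency", "indirectness", "imprecision", "publication bias"}


def grade_certainty(randomized, downgrades):
    """Illustrative rating: start level by design, then rate down per domain."""
    unknown = set(downgrades) - GRADE_DOMAINS
    if unknown:
        raise ValueError(f"unknown GRADE domains: {sorted(unknown)}")
    if any(steps not in (0, 1, 2) for steps in downgrades.values()):
        raise ValueError("each domain is rated down by 0, 1 (serious) or 2 (very serious) levels")
    level = GRADE_LEVELS.index("high" if randomized else "low")
    level -= sum(downgrades.values())
    return GRADE_LEVELS[max(level, 0)]  # certainty cannot fall below "very low"


# Hypothetical outcome: RCT evidence with serious risk-of-bias concerns
print(grade_certainty(True, {"risk of bias": 1}))  # moderate
```

Recording the risk-of-bias downgrade explicitly in this way is what links the MQ/RoB assessment to the SR's conclusions rather than leaving it in a separate table.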

Conclusion
Most SR of healthcare interventions assessed the MQ/RoB of included studies. MQ/RoB assessment was associated with included studies' design (RCT-only), a meta-analysis of data, and the number of databases searched (>4). The most used tool was Cochrane's RoB Tool, with no clearly defined rating system. Compared with SR including other study designs, SR including only RCT more often used suitable tools, and RoB rather than MQ tools. Descriptions of MQ/RoB assessment methods, their results, and their impacts on meta-analysis, the certainty of the evidence, and SR results are still to be consistently addressed.