
journal club

Journal club 3: systematic reviews

Jennifer Reid's series aims to help you access the speech and language therapy literature, assess its credibility and decide how to act on your findings. Each instalment takes the mystery out of critically appraising a different type of journal article. Here, she looks at systematic reviews.


There has been an explosion of literature relevant to speech and language therapy over the course of my working life. An article which reviews the current state of play in a relevant area may appeal to time-pressed clinicians. Can we expect a review to be more comprehensive than an article on a single piece of research? Should it avoid the need to comb the literature for articles on original research? And may we assume that the scope of the review will be better than we could achieve ourselves, the reviewers being more knowledgeable than us? Well, the answer is probably both yes and no. Reviews may indeed provide a ready-made synthesis of the available research, but they too are open to that enemy of science, bias. This fact, along with the huge expansion, particularly in the medical literature, has led to the development of a new type of review method. The sorts of reviews many of us grew up with, like those presented in textbooks, have been reclassified variously as overviews, narrative reviews, or simply non-systematic reviews. The systematic review has become one of the core tools of evidence-based practice. If you can get your head round its principles and methods, you will find you are much better equipped to deal with the current literature.

The landscape of a systematic review may feel very foreign to the uninitiated; my advice is to persevere, because well-conducted systematic reviews on areas of current concern for speech and language therapy services are invaluable. In Fife, we have found appraising systematic reviews in our journal clubs really helpful (even if they do nip your head to begin with!).

The critical appraisal tool for speech and language therapists presented here has been developed primarily from CASP (PHRU, 2006). It provides a structured framework for reading and appraising reports which summarise the results of primary research studies. It can be used for systematic reviews with or without meta-analysis (when the reviewers attempt to combine the numerical results from various studies). These methods are at the top level of the evidence hierarchy, so authors often use the actual terms, systematic review or meta-analysis, in the title of the article. The results of a systematic review rely not only on the quantity and quality of the primary studies included but also on how well the review and synthesis was conducted. However, a well-conducted systematic review should provide more definitive evidence than any other type of study, even if the results relate only to a circumscribed area. The tool can also be used for other types of review but, for a non-systematic or narrative review, you may wish to use this tool selectively alongside the Expert Opinion tool presented in the first article in this series (Reid, 2010). As with other critical appraisal tools, the main themes to be addressed revolve around the study results, their validity (how true they are) and to what extent, if any, they might apply to the appraiser's own context. As previously, magazine subscribers may download a formatted version of the appraisal tool at www. to use on their own or with colleagues in a journal club.



Question 1: What question was being asked, and is this an important clinical question?

Try formulating the reviewers' stated aims into a research question if they have not done so explicitly in the article. Is the question clearly focused in terms of the PICO framework, which we discussed in the first article of the series (Reid, 2010): Population studied; Intervention given (if it is an intervention study); Control/comparison (if applicable); Outcomes considered? Is this question important for your clinical practice? If the reviewers' question does not quite fit the bill, what question(s) do you wish they had asked instead?

Question 2: Did the review include the right type of study?

The article should present clear inclusion and/or exclusion criteria, so home in on this section of the article to consider whether the included studies address the review's question. Sometimes the primary studies have been designed to answer a different question, so it is important to check that a study's inclusion in the review is justified. Do the included studies have an appropriate study design? In my experience, those who are new to critical appraisal or research design may be inclined to feel that the views of the reviewers are more valid than their own. Deal with any feelings of inadequacy by reading carefully the reviewers' rationale for inclusion or exclusion of particular designs. As you read more systematic reviews, you will begin to get a better feel for this. The prestigious Cochrane Collaboration, along with other authorities on evidence-based medicine, will try to convince you that a respectable review of an intervention should include only randomised controlled trials (RCTs). This may well be an attainable goal for medical treatments. However, in many areas of speech and language therapy practice, the only available evidence comes from small-scale, exploratory studies. Moreover, the UK Medical Research Council's Framework for Development and Evaluation of RCTs for Complex Interventions to Improve Health (2000) advocates the use of small-scale and exploratory designs in the early phases of development of evidence-informed interventions. Those engaged in systematic reviews may need to take into account the level of maturity of the field of research before deciding where to draw the line. Inclusion criteria set too high up the evidence hierarchy increase the danger of arriving at the nil result of, for example, a Cochrane review of treatment for acquired dysarthria (Sellars et al., 2005), which found that no studies met its inclusion criteria. This result may be a trigger for future research in this area, but it is distinctly unhelpful for clinicians looking for clues to potentially promising treatments, and by default it promotes the expert opinion route with all its potential biases.

Question 3: Did the reviewers try to identify all relevant studies?

Like other aspects of evidence-based practice, appraisal points are scored by playing the game by the rules: systematic reviews should provide an exhaustive summary of the literature relevant to the question in hand, so reviewers are expected to have tried to identify all sources of evidence, including those in the grey literature (such as dissertations, unpublished studies or articles in obscure publications). For appraising exhaustiveness, the relevant questions to ask are: Did they follow up reference lists? Did they make personal contact with experts? Did they search for unpublished studies? Did they search for non-English-language studies? If they did, they will mention it, because they know this earns them credit towards publication in respected, peer-reviewed journals! If they failed to do so, is there a danger their review has been seriously compromised? Hmm, I leave you to form your own opinion...

Which bibliographic databases were used? If you are not yet familiar with the nomenclature, consider whether more than one database was searched. Beware reviews that use only one database source: the field of speech and language therapy is so cross-disciplinary that it is impossible to predict which journals contain potentially useful articles. For example, my default setting for rapid literature searching is to search MEDLINE and PsycINFO simultaneously, and possibly ERIC if the question involves school-aged children. This produces some duplicates, but also many references unique to a single database.

Question 4: Did the reviewers assess the quality of the included studies?

The main consideration is whether a clear, pre-determined strategy was used to decide which studies were included. Look for a set of defined categories that together form the definition of quality the reviewers have adopted, plus a scoring system; there may be a table of the included studies showing the points awarded against each quality criterion. These sorts of tables often interfere with the readability of an article, but you should try not to skip them. They really are crucial to understanding the results of the review, and you may find that one or more primary studies are worth following up. It is also important that more than one assessor has been involved in rating and scoring the studies. This provides evidence that the quality system is objective and reliable enough to support the credibility of the results.



Question 5: How are the results presented and what is the main result?

Figure 1: A reminder about confidence intervals. Confidence intervals allow you to estimate the strength of the evidence and whether it is definitive (in other words, you don't need further studies to check the result). A single study gives you only one example of the difference between two measures, two groups and so on. If you repeated the same study several times, you would not get exactly the same result each time. You can't know what the real difference is, especially from one study. Calculating a 90 per cent confidence interval around your result allows you to say that there is a 90 per cent chance that the true result lies within this range. If an author is interpreting the confidence interval appropriately, you should see comments both about the extent to which their results support their original hypothesis and about whether any further studies need to be done. Confidence intervals which straddle zero suggest that there may be no real difference, or that your study used too few participants for you to detect the effect definitively.
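The arithmetic behind such an interval can be sketched in a few lines of Python. This is not part of the original article: the scores are invented for illustration, and a normal approximation (critical value z = 1.645 for a 90 per cent interval) is assumed.

```python
import statistics

# Invented pre- and post-intervention scores for eight hypothetical participants
pre  = [12, 15, 11, 14, 13, 10, 16, 12]
post = [16, 18, 14, 17, 15, 13, 19, 15]

diffs = [b - a for a, b in zip(pre, post)]        # change for each participant
mean_diff = statistics.mean(diffs)                # average change
se = statistics.stdev(diffs) / len(diffs) ** 0.5  # standard error of the mean

z = 1.645  # normal critical value for a 90 per cent confidence interval
low, high = mean_diff - z * se, mean_diff + z * se
print(f"mean change = {mean_diff:.2f}, 90% CI = ({low:.2f}, {high:.2f})")
# An interval that excludes zero is consistent with a real change;
# one that straddles zero suggests no effect, or too few participants.
```

Here the whole interval sits above zero, so on this (made-up) data we would not conclude "no difference"; with noisier scores or fewer participants the interval would widen and could straddle zero.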

The two components of this question are stated in this order for a reason: how results are expressed can have an important influence on what you perceive as the main result. You need to consider: whether the reviewers' interpretation of the numbers was sensible; how the results are expressed (for example, odds ratios, or means and confidence intervals (figure 1)); and how large and how meaningful this size of result is. Some systematic reviews provide an assessment of quality followed by a verbal synthesis in the form of one or more conclusions, with an indication of the strength of the current evidence for each. In terms of the main result, it can be instructive to try to sum up the bottom-line result of the review in one sentence; it does help when trying to communicate the gist of your appraisal to others. And your clinical bottom line will certainly be needed if your appraisal is to be combined with the appraisal of other evidence in order to produce a clinical guideline or a best practice standard, whether for your local context or for a wider audience.

Question 6: If the results of the studies have been combined, was it reasonable to do so?

Some systematic reviews go beyond qualitative synthesis and present a meta-analysis of the quantitative data from included studies. One crucial concept here is the notion of effect size. (Don't panic! I'm going to talk about numbers now, but stay with me...) Calculating an effect size is a method for quantifying the effectiveness of an intervention, allowing you to compare or combine the results of different studies. Numerical calculations are used to produce a number (a statistic!) so you can then compare like with like across different studies. You can think of it as similar to converting raw scores to standard scores in formal assessments: it allows you to compare a client's performance in different areas of functioning, for example receptive vocabulary vs. comprehension of sentences. The value of a Cohen's d or other statistic tells you how big a change has been found in the outcome measure for the intervention. Whether changes can be attributed wholly to the intervention is a moot point, but in general, the bigger the average change (the effect size), the more likely we are to believe it was caused by the intervention. A weak effect does not equate to no effect, but it may not show up conclusively in some study designs. To detect weak effects, you usually need lots of study participants. This may be where a meta-analysis comes into its own, as combining the data from lots of smaller studies of a relatively weak effect can provide much more definitive evidence that the intervention really does have an effect. You will come across different methods for analysing effect sizes. Percentage of non-overlapping data (PND) provides a means of translating the results of individual studies into a common currency so you can evaluate them side by side. It can be applied to research designs that are lower down the evidence hierarchy, such as single-subject designs (also known as n=1 studies). In figure 2, can you work out which intervention had the stronger effect?
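To make these two statistics concrete, here is a minimal Python sketch, not from the article itself: the scores are invented, Cohen's d is computed as the difference in means divided by a pooled standard deviation, and PND is computed in its common single-subject form, the share of post-intervention points that exceed the highest pre-intervention point.

```python
import statistics

def cohens_d(control, treated):
    """Standardised mean difference: difference in means over the pooled SD."""
    n1, n2 = len(control), len(treated)
    s1, s2 = statistics.variance(control), statistics.variance(treated)
    pooled_sd = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

def pnd(pre, post):
    """Percentage of post-intervention points above the highest pre-intervention point."""
    ceiling = max(pre)
    return 100.0 * sum(score > ceiling for score in post) / len(post)

# Invented outcome scores for two hypothetical interventions (cf. figure 2)
pre_a, post_a = [3, 5, 4, 6], [6, 7, 8, 7]    # intervention A: some overlap
pre_b, post_b = [3, 5, 4, 6], [8, 9, 10, 9]   # intervention B: no overlap

print(round(cohens_d(pre_a, post_a), 2))  # effect size for A
print(round(cohens_d(pre_b, post_b), 2))  # larger effect size for B
print(pnd(pre_a, post_a))                 # 75.0: one post score only ties the pre ceiling
print(pnd(pre_b, post_b))                 # 100.0: the stronger effect
```

Note how the two measures agree: intervention B, with no overlap between pre- and post-intervention scores, produces both the larger d and the larger PND.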
For both studies, there is an area of overlap (see arrows) where the relatively high pre-intervention scores of some participants are the same as those of the people with the lowest post-intervention scores. However, this area of overlap is much smaller for intervention B, which translates into a stronger effect size and, numerically, a larger percentage of non-overlapping data. For study B, we can be more confident that the changes in participants' performance were indeed treatment effects and/or that the design

ensured that other influences on performance were controlled. Calculation of percentage of non-overlapping data may be used to combine the results of small-scale studies for a systematic review with meta-analysis. Meta-analysis of more robust studies, such as RCTs, is more likely to be reported using a forest plot (figure 3) or blobbogram (see the logo of the Cochrane Collaboration). These provide a visual display of the effect sizes associated with the included studies, the confidence intervals of their results, and a summary effect size with its confidence interval. The convention is for them to include an identifier for each study on the left (in order of year of publication) and some scary statistics on the right, though if you can deal with them, you will find they answer the question about how results are expressed. Weighting is about how much each study contributed to the overall summary measure: the bigger the blob, the more influential the study. One of our adult acquired journal clubs appraised a review of treatments for dysphagia in neurological disorders (Ashford et al., 2009). We found the heavy weighting of a couple of large Logemann studies a concern, because the participants in the Logemann studies were skewed towards people with Parkinson's disease and dementia, with very small numbers of people with stroke; very different, we thought, from the Fife caseload profile.

Question 7: Can the results be applied to the local population?

Clinical recommendations may be offered by reviewers, but without an accumulation of robust, scientific evidence, these are often



Figure 2: Effect size. Pre- and post-intervention score distributions for intervention A and intervention B; scores in the overlapping area could be either from pre- or from post-testing.

Figure 3: Forest plot example (fictional), showing studies Black et al. (1999), Connor (2002), Drake et al. (2005), Elder et al. (2005) and Foukes (2009).

Key: measure of effect (and therefore weighting in the meta-analysis); confidence interval; line of no effect; summary measure of effect, whose lateral spread shows its confidence interval.

fairly circumspect. You need to address the usual considerations about potential differences between the population covered by the review and your own, and whether your local setting is different from that of the review to the extent that its results cannot reasonably be applied. You also need to consider whether the intervention is practical and acceptable to clients in your own setting. Question 8: Were all important outcomes considered?


You should try to think about whether the reviewers have considered the outcomes of the review from all angles, that is, from the point of view of clients, families and carers, and the wider community, as well as speech and language therapists and other professionals, service managers and policy makers. Consider whether any reported benefit outweighs any risk and/or additional cost. If this information is not reported, can it be filled in from elsewhere?

Question 9: Should policy or practice change as a result of the evidence contained in this review?

Clinical recommendations offered without an accumulation of robust scientific evidence deserve caution, so weigh any suggested change against the strength of the review's findings.

And finally...

A good example of a clinically helpful review, in my opinion, was one we appraised last year in an adult learning disability journal club. The study (van Oorsouw et al., 2009) posed a question the group felt was extremely important for them, regarding which aspects of staff training are related to improvements in staff behaviour. The authors included single-subject and small sample studies and found 55 studies that met their criteria, which provided relevant data from over 500 participants. Meta-analysis (using percentage of non-overlapping data) was applied to the data from all the participants. The results suggested that a combination of in-service training (using multiple techniques) with coaching-on-the-job (featuring verbal feedback) is the most powerful format. Even though these results did not really add to what the group already believed, it is important for us to have evidence to support what we are currently doing as well as information to help us break new ground. The journal club session helped the staff feel more confident in their practice and gave them ammunition for resisting pressure to undertake staff training that was unlikely to be effective. The study results were also of great interest to paediatric and adult acquired staff. These days pretty much every speech and language therapist has to do staff training, whether with health, education or social care staff, so this review also spoke to their concerns about how to design and deliver training effectively. Of course, what we really need to know is how to bring about long-term, sustained change in staff behaviour, but unfortunately this study did not speak to that question.

Jennifer Reid is a consultant speech and language therapist with NHS Fife.

Ashford, J., McCabe, D.M.A., Wheeler-Hegland, K., Frymark, T., Mullen, R., Musson, N., Schooling, T. & Smith Hammond, C. (2009) 'Evidence-based systematic review: Oropharyngeal dysphagia behavioral treatments. Part III - Impact of dysphagia treatments on populations with neurological disorders', Journal of Rehabilitation Research & Development 46(2), pp. 195-204.

Medical Research Council (2000) A Framework for Development and Evaluation of RCTs for Complex Interventions to Improve Health. Available at: http:// htm?d=MRC003372 (Accessed: 18 February 2011).

Public Health Research Unit (2006) Critical Appraisal Skills Programme. Available at: Pages/PHD/CASP.htm (Accessed: 18 February 2011).

Reid, J. (2010) 'Journal club: expert opinion', Speech & Language Therapy in Practice Autumn, pp. 17-21.

Sellars, C., Hughes, T. & Langthorne, P. (2005) 'Speech and language therapy for dysarthria due to non-progressive brain damage', Cochrane Database of Systematic Reviews Issue 3. Art. No.: CD002088. DOI: 10.1002/14651858.CD002088.pub2.

van Oorsouw, W.M.W.J., Embregts, P.J.C.M., Bosman, A.M.T. & Jahoda, A. (2009) 'Training staff serving clients with intellectual disabilities: a meta-analysis of aspects determining effectiveness', Research in Developmental Disabilities 30(3), pp. 503-511.

Critical appraisal for speech and language therapists (CASLT) Download the systematic review framework document from Use it yourself or with colleagues in a journal club, and let us know how you get on (email