

Journal club 5: observational studies
Jennifer Reid’s series aims to help you access the speech and language therapy literature, assess its credibility and decide how to act on your findings. Each instalment takes the mystery out of critically appraising a different type of journal article. Here, she looks at observational studies.



Research is fundamentally a quest for explanations. Explanations go beyond simple description in order to provide an account of causal relationships, for example, between events, human beliefs, behaviour, experiences and ill-health. It is really important to keep the notion of causality in mind as we try to get our heads round the observational designs used in health-related research. Causal reasoning, in a nutshell, requires us to:
• know what conditions preceded the phenomenon of interest,
• assess which of these antecedent conditions are candidates as causal agents, and then
• organise this knowledge into a plausible, causal chain of events.
There are a number of observational research designs, and not all provide robust evidence of causality. It is not enough to demonstrate that there is an association between two factors. If you find that children living in high flats have poorer health, this is not good evidence that living in a flat causes ill-health. The two factors may be related, with economic or social circumstances perhaps much better candidates for an underlying cause.

NHS Fife’s Dunfermline cluster Journal Club. Author Jen Reid has her back to the camera.

Experimental or observational?

When an article talks about an intervention, how do I know if this is an experimental study or an observational one? Group intervention studies, especially those using randomisation to groups, are considered superior to observational designs for answering causal questions about healthcare interventions, since there is better control of the effects of any unforeseen (confounding) factors. However, such an experimental design is not always practical or possible, especially where the participants’ common characteristic cannot be manipulated (such as having a genetic condition), or would not be ethical because of likely negative effects (for example, not talking to your baby).

Observational studies examine aspects of people’s past and/or present life in order to identify relevant information through observation rather than experimental manipulation. They do not offer a particular intervention and directly measure its effect. However, information may be collected about intervention(s) the participants have received. The goal of observational studies is usually to identify factors which may be causally related, and which may thus be incorporated into interventions which then produce better outcomes in the future. Sometimes people who have received a particular intervention are followed up and their outcomes compared to the outcomes of others who did not receive the intervention. An observational study appraisal tool is likely to be the one to choose for such a study. The exception is if the intervention was offered as a core part of the research design and participants were allocated or selected to receive one intervention or another according to some preset criteria.

Observational designs

There are a number of observational designs and some jargon to deal with.

1. Cohort study
A cohort study follows, over time, a group of people who have something in common. (The term cohort was also used to refer to a group of Roman soldiers, which could be a useful aide-mémoire.) The group may have a common characteristic, such as where the participants were born, how they were educated or an aspect of their health or wellbeing. Alternatively, they may have all been exposed to a risk or challenging circumstance of some kind, or have received a particular health intervention. The comparison group for cohort studies may be the general population from which the cohort is drawn, or another cohort of persons thought to be similar except for the common characteristic under investigation. Alternatively, subgroups within the cohort may be compared with each other. This is commonly the case in birth cohort studies, where all children born in particular years in one geographical area are studied. Results are analysed to detect a cohort effect. This means finding out whether membership of the cohort, and therefore having the common characteristic, appears to make a difference to the outcome. In research designed to investigate risks of adverse health outcomes, a cohort is identified before the appearance of the condition(s) under investigation. For example, Conti-Ramsden & Botting (2008), in a study of emotional health in adolescents, describe their young people with specific language impairment as, “originally recruited at 7 years of age as part of a wider study … The original



cohort of 242 children represented a random 50% sample of all children attending year 2 (age 7) in language units across England.” The emotional mental health of this cohort is the health outcome of interest. It is compared with a matched group of young people without a history of specific language impairment to explore whether there may be a greater risk of negative mental health outcomes for young people who have the condition. Cohort designs are particularly useful for studying developmental changes across the lifespan, such as to identify the influence of early circumstances, or the negative long-term effects of a condition, on life outcomes. An example might be language and literacy outcomes for children born very prematurely. However, they are expensive to do: outcomes may take a long time to occur, so you need to follow up the same group over a long period of time; it is hard to prevent loss of participants (attrition), which is bad for the integrity of your results; and, unless your cohort is very large indeed, it may be impossible to pick up enough people with a rare outcome to gain evidence for prognosis.

2. Case-controlled study
In a case-controlled (or case-control) study, on the other hand, people who have the outcome of interest (cases) are identified and matched with people who do not (controls). For example, a case-controlled design for Conti-Ramsden & Botting’s outcome of interest – emotional mental health in adolescence – might be to recruit participants with poor emotional mental health. They would then investigate their current language skills and/or their developmental language history in comparison with a matched group with good emotional mental health. Case-controlled studies may be the only practical design for researching rare conditions or outcomes, but on their own they provide much weaker evidence of a causal relationship because there is much more risk of systematic bias affecting the results. You need to ensure every participant is allocated correctly as a case or not, as any misallocation can profoundly influence the results. The measures used to determine who is a ‘case’ therefore need to be pretty bullet-proof. This can be particularly tricky with complex human behaviour such as communication or emotional mental health. At a recent conference I attended, speech and language therapists debated whether they would identify the same children as language-delayed as the team studying a large Australian preschool birth cohort. The study was using a cut-off of 1.25 standard deviations below the mean for their age on language testing. We concluded its ‘cases’ might include quite a few of our ‘non-cases’.

3. Cross-sectional survey
The third main observational design is the cross-sectional survey. A representative sample of the population of interest (clients, practitioners, relatives) is interviewed, examined or otherwise studied to gain information on a question, such as, “How many children entering primary school have poor vocabulary?” or, “What influences speech and language therapy intervention for adults with autism and learning disability?” or, “What do care staff in residential homes know about aphasia?” The data for cross-sectional studies are collected at a single point in time. However, the study may include retrospective information. An example would be, in a survey of knowledge and skills for making information accessible for people with learning disability, asking support staff whether they had ever received any formal training on making information accessible. Surveys can be relatively cheap and easy to do, but there are even more potential challenges to the integrity of the data, so it is not an appropriate design for answering causal questions. I have prepared a separate framework tool for surveys that use questionnaires, which is available at www.speechmag.com/Members/CASLT.

Critical appraisal for speech and language therapists (CASLT): Download the observational study and survey questionnaire frameworks from www.speechmag.com/Members/CASLT for your own use or with colleagues in a journal club.

For observational study results, there are ways to evaluate how robust the evidence is for inferring causality. Remember, just because you have established an association between two factors, this does not allow you to assume a causal relationship. In terms of the numbers, a correlation coefficient such as Pearson’s r or Spearman’s rho only indicates the presence or absence and direction of any association between the variables being measured. Hill’s (1965) criteria for causation (figure 1) were originally designed for epidemiological studies but have been widely quoted and so may pop up in authors’ discussion of the results of their observational studies.

Figure 1 Criteria for Causation, from Bradford Hill (1965)
• Temporal relationship: Cause always precedes the outcome.
• Strength: The stronger the association, the more likely it is that the relationship is causal. (NB Look at the significance of those correlation coefficients!)
• Consistency: The association is consistent across different studies.
• Dose-response relationship: An increasing amount of the proposed cause increases the outcome’s severity or risk of occurrence.
• Sense: The causal explanation is theoretically plausible and compatible with current knowledge.
• Alternate explanations: Other plausible explanations have been ruled out.
• Experiment: The outcome can be influenced by an appropriate intervention.
• Specificity: A single putative cause produces a specific effect. This one is probably less important for our purposes, especially given the multi-factorial nature of most of the behaviours speech and language therapists are dealing with.

Here is an appropriately cautious conclusion about causality from the Conti-Ramsden & Botting (2008) study: “Our data show a clear increased risk for this population as they near adulthood compared to peers, even when concurrent language and cognition are accounted for. This finding replicates other studies that have shown raised prevalence of psychiatric difficulties in those with communication impairments … or increased language impairment in children referred psychiatrically … However, the association has often been assumed to be causal in that either long-term language impairment may lead to (or exacerbate) wider difficulties or psychiatric impairment may constrain communication skill. Nonetheless, it needs to be noted that the majority of adolescents with SLI in our study did not appear to suffer from emotional problems” (p. 522).
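To make the association-versus-causation point concrete, here is a minimal sketch of how Pearson’s r and Spearman’s rho are calculated. The scores below are invented for illustration and are not from any study discussed in this article; both coefficients quantify only the strength and direction of an association, and say nothing about what caused what.

```python
# Illustration only: invented scores, not data from any real study.

def pearson_r(x, y):
    """Pearson's r: covariance scaled by the two standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Rank 1 = smallest value; assumes no tied values, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho is simply Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical language scores and emotional-health scores for six people
lang = [85, 92, 78, 101, 95, 88]
emot = [40, 47, 36, 55, 49, 41]

print(round(pearson_r(lang, emot), 2))    # ≈ 0.98, a strong positive association
print(spearman_rho(lang, emot))           # 1.0: the two rank orders are identical
```

A coefficient near +1 or -1 here tells us only that the two sets of scores move together; it cannot tell us whether language drives emotional health, the reverse, or whether some third factor drives both.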


The reporting of observational studies in peer-reviewed journals has been influenced by the STROBE statement (von Elm et al., 2008). Like the Bradford Hill criteria (figure 1), this was originally devised to improve reporting of epidemiological research but it has been extended to other areas. Although designed for authors, it may provide some guidance for readers too. Observational studies are perhaps less common in speech and language therapy literature, so I have found it helpful to have a tool that encapsulates all the main observational designs rather than trying to match separate tools to cohort, case-controlled and cross-sectional studies. I developed the following appraisal framework for speech and language therapists from relevant CASP tools (PHRU, 2006) and the STROBE statement (combined checklist).



Question 1: Did the study address a clearly focused issue?
Which population was studied? Which risk factors or outcomes were investigated? Did the study try to detect a beneficial or harmful effect? Is the underlying issue one of causation? Try formulating the researchers’ stated aims into a research question if they have not done so explicitly in the article. Is this question important for your clinical practice?

Question 2: Was the choice of design appropriate?
Is an observational design an appropriate way of answering the research question under the circumstances? Remember that a group intervention study is a more powerful design for demonstrating causality. Try to work out whether this is a cohort, case-controlled or cross-sectional study. For a cohort study, the participants should have been recruited before the outcome of interest has occurred, and the cohort should have something in common (though this can be a very general characteristic, such as being born in Scotland in 2005, or a more specific one like being a sibling of a child with autism spectrum disorder). For a case-controlled study, ‘cases’ and ‘controls’ are identified at the outset of the study, criteria for ‘caseness’ are crucial for quality control, and the outcome of interest should be rare or harmful. Cross-sectional surveys are probably the easiest to spot, since we meet them regularly in everyday life. For surveys, sampling methods which ensure that the participants adequately represent the population of interest are very important for quality control.

Question 3: Were participants recruited in an acceptable way?
In general, you are looking for selection bias which might compromise the extent to which the findings can be generalised (external validity). Were participants representative of a clearly defined and clinically relevant population? Appraise the eligibility (inclusion and exclusion) criteria, the sources and methods of selection of participants and how cohorts were followed up. The selection method should be systematic – explicit, reliable and replicable – especially for a case-controlled study, where it is crucial that there is no misallocation of cases. Scrutinise also the way that controls have been selected. Are they matched, population-based or randomly selected, and is the rationale justified? If controls were matched, were the matching criteria appropriate? Authors should also provide information to allow you to assess whether those who were invited to participate but declined could be different in any important way from the study participants. Potential controls are perhaps more likely to decline or ignore invitations to participate, so this may be even more important for this group. How many participants were there, was there a rationale for this and were the numbers sufficient to support generalisation of the findings? If the study is asking, “How many people have…”, you need to think whether the sampling is of newly identified cases (incidence) or of cases across the whole population (prevalence) (see figure 2), and which would provide a more appropriate answer to the research question.

Figure 2 Incidence or Prevalence? iNCidence is the number of New Cases in a given time span. Prevalence is the Proportion of cases in the Population.
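The mnemonic in figure 2 boils down to two simple calculations. The sketch below uses entirely invented numbers (the population size, case counts and one-year window are assumptions for illustration, not figures from any study):

```python
# Hypothetical numbers, for illustration only.
population = 10_000          # children in a notional area
existing_cases = 400         # children who have the condition at the survey date
new_cases_this_year = 50     # children newly identified during a one-year window

# Prevalence: the Proportion of cases in the Population at a point in time
prevalence = existing_cases / population            # 400/10000 = 0.04, i.e. 4%

# Incidence: New Cases arising in a given time span, among those at risk
at_risk = population - existing_cases               # exclude existing cases
incidence_rate = new_cases_this_year / at_risk      # 50/9600 ≈ 0.52% per year

print(f"prevalence: {prevalence:.1%}")
print(f"incidence: {incidence_rate:.2%} per year")
```

A one-off school-entry screening estimates prevalence; only a study that follows children over time and counts newly arising cases can estimate incidence, which is one reason the choice of sampling matters for the research question.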

Question 4: Were phenomena measured enough to minimise bias?
You should be given enough information to assess how well all the phenomena involved have been assessed or otherwise measured, both factors (cohort characteristics or caseness criteria) and outcomes. Are definitions clear enough? Were measurements subjective or objective, and do they measure what they are supposed to measure? Externally validated measures, like formal tests, need less supporting information than measures developed for the purposes of the study. Some outcomes may take a long time to occur, so was the timeframe of the study long enough to assess this accurately for all participants? Moreover, the participants who are lost to follow-up may have different outcomes from those who were available, so attrition rates need to be given and their potential impact discussed. A flowchart of recruitment and follow-up schedule, indicating attrition numbers, can be really helpful for long-term studies. As in intervention trials, the study method should minimise the possibility of performance bias by employing similar measurement methods for both cases and controls, and by blinding those undertaking the assessments to participants’ status wherever feasible.

Question 5: Has there been adequate attention to confounding?
Confounding is the influence of unforeseen factors. Check which factors have been considered and list any you think might be important that the authors seem to have overlooked. How, if at all, have the researchers taken account of the confounding factors in the design and/or analysis? Have a look in the data analysis section for evidence they have used statistical techniques such as modelling, regression or sensitivity analysis to make adjustments for confounding factors. Here is some relevant wording from the Conti-Ramsden & Botting (2008) article: “… all the analyses above comparing those with SLI and those with NLD [no language disorder] remained unchanged after controlling for gender…”

Question 6: What are the results of this study?
As for intervention studies, it helps to try to sum up the bottom-line result of the study in one sentence – this also helps to ensure you’ve got it straight enough in your own head to be able to communicate the gist of your appraisal to others. Consider whether the analysis appears appropriate to the design. Bear in mind that an observational study can only demonstrate associations: the presence or absence of an association, its strength (weak/strong correlation) and direction (positive = one goes up, so does the other; negative = one goes up, the other goes down). How are the results expressed? Begin by having a look at the descriptive statistics, such as the numbers or proportion of participants with a given outcome, or tables of average differences. Inferential statistics draw conclusions rather than simply describing. These may include analysis of tables of correlations or of measures of difference (between groups, as in a randomised controlled trial), such as analysis of variance (ANOVA), modelling, regression or sensitivity analysis. It might help to think of the study ‘variance’ as all the measured differences amongst the participants. You then use your statistical analysis to try to make sense of this variance. The more of the variance accounted for in the end, the better the evidence is that the study has captured the strongest – and therefore potentially causal – factors. You have already had a look for evidence that confounding factors were considered. Are the numerical results adjusted for confounding? Take into account the list you made of overlooked confounding factors and consider whether confounding could still explain an important part of the results. How did the authors evaluate the effect of individuals refusing to participate, and did they adjust the overall results accordingly? Did this make much difference? For the phenomena of interest, how large and how meaningful is this size of result? (Continue your practice in NOT glossing over the sections with the p-values and confidence intervals!) And, of course, do the results answer the study’s questions?
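To see why adjustment for confounding matters to the results, here is a sketch of its simplest form, stratification, which the regression and modelling techniques mentioned above generalise. All the numbers are invented: in this hypothetical dataset a crude comparison suggests that an exposure (say, early language delay) is associated with a poor outcome, but within each level of the confounder (here labelled ‘disadvantage’) the rates are identical, so the confounder accounts for the whole crude association.

```python
# Hypothetical illustration of confounding, not data from any real study.
# Each stratum: (n exposed, poor outcomes among exposed,
#                n unexposed, poor outcomes among unexposed)
strata = {
    "disadvantaged":     (80, 40, 20, 10),   # 50% vs 50% within this stratum
    "not disadvantaged": (20,  4, 80, 16),   # 20% vs 20% within this stratum
}

# Crude (unadjusted) comparison pools everyone together
exp_n = sum(s[0] for s in strata.values())        # 100 exposed
exp_poor = sum(s[1] for s in strata.values())     # 44 with poor outcome
unexp_n = sum(s[2] for s in strata.values())      # 100 unexposed
unexp_poor = sum(s[3] for s in strata.values())   # 26 with poor outcome
print(f"crude: {exp_poor/exp_n:.0%} vs {unexp_poor/unexp_n:.0%}")  # crude: 44% vs 26%

# Adjusted comparison: look within each level of the confounder
for name, (en, ep, un, up) in strata.items():
    print(f"{name}: {ep/en:.0%} vs {up/un:.0%}")  # identical rates within strata
```

The crude 44% versus 26% gap arises only because exposed participants are concentrated in the disadvantaged stratum, where poor outcomes are more common for everyone; once you compare like with like, the apparent effect of the exposure disappears. This is the logic behind wording such as “remained unchanged after controlling for gender”.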
Question 7: Are the findings plausible?
Do you believe the findings? As with any research, a big effect is hard to ignore, but this is only true if the study design and methods were of a high enough standard, so any flaws you have identified need to be borne in mind. The sorts of things to consider are whether the results could have been unduly influenced by bias, confounding, or even chance (especially if the number of participants was small). For studies drawing causal conclusions, run them through the causality criteria in figure 1.

Question 8: Do the results of this study fit with other available evidence?
Consider evidence from other studies, of all types, for consistency of findings. A well-conducted systematic review would be particularly helpful. You may need to conduct (or even commission) a literature review and appraise the quantity and quality of the available evidence before you are able to assess this. Bear in mind costs and benefits; the issue in question will have to be particularly important (specific, relevant, timely, and with resource implications) for your service before you decide to invest resources in a more comprehensive review of the evidence base in this area.

Question 9: Can the results be applied to the local population?
To what extent can we generalise these findings? As in other appraisal frameworks, you need to examine the detail of the study to determine whether there are important differences between the context of the research study and your own context. Do you think that the study participants and setting may be very different from your own caseload? Can you quantify the potential local benefits and/or harms?

Question 10: Should policy or practice change as a result of the evidence contained in this study?
As with the other appraisal tools, we should evaluate the study’s contribution to the evidence base for local service provision. Does this study have implications for my practice, for that of my colleagues or more widely? Is there a further question to be asked, or more evidence needed, before I can answer this question? Always bear in mind that an individual observational study rarely provides sufficiently robust evidence to recommend changes to clinical practice or decision-making. However, for certain questions, observational studies provide the only evidence we can access. Recommendations from observational studies are always stronger when supported by other evidence. In a local journal club report on the Conti-Ramsden & Botting (2008) study, the group concluded that the study added weight to a growing body of evidence showing a raised risk of negative mental health in young people with specific language impairment. However, since the majority of the group with specific language impairment did not have negative mental health symptoms, there is no simple, causal relationship between specific language impairment and mental health. There must be other factors at play. So, speech and language therapists need to be alert to the raised risk in young people with specific language impairment of anxiety and depression, either of which would have an impact on our clinical decision-making or on the success of our interventions. SLTP

Jennifer Reid is a consultant speech and language therapist with NHS Fife, email
Cartoons are by Fran,

References
Bradford Hill, A. (1965) ‘The Environment and Disease: Association or Causation?’, Proceedings of the Royal Society of Medicine 58, pp.295-300.
Conti-Ramsden, G. & Botting, N. (2008) ‘Emotional health in adolescents with and without a history of specific language impairment (SLI)’, Journal of Child Psychology and Psychiatry 49(5), pp.516-525.
Public Health Research Unit (2006) Critical Appraisal Skills Programme. Available at: http://www. (Accessed 29 July 2011).
von Elm, E., Altman, D.G., Egger, M., Pocock, S.J., Gøtzsche, P.C. & Vandenbroucke, J.P. (2008) ‘The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies’, Journal of Clinical Epidemiology 61(4), pp.344-349.

