January 03, 2010

Do younger pupils really ‘mimic the habits of obese children in older classes’? Answer - probably not!

A tweet posted by @GuardianEdu yesterday (2 January 2010) attracted my interest:

‘Younger pupils mimic habits of obese children in older classes http://bit.ly/7KUEez’

The link takes you to the full news article published today in The Observer (3 January 2010) that is based on research conducted by a team from Ontario, Canada. Here are some of the key extracts from the Observer article:

Children at schools where older students are obese or otherwise overweight are significantly more likely to suffer weight problems themselves, researchers report. For each one per cent increase in the prevalence of obese students aged 16 to 18 years, the odds of a student at 14 to 16 years old attending that school also being overweight increased significantly.

"It was the one risk factor that held true across every school we looked at," said Dr Scott Leatherdale, the chair of research at Cancer Care Ontario and lead investigator with the School Health Action, Planning and Evaluation System. "Schools that had a large number of obese younger students were disproportionately likely also to have a high percentage of overweight older students. The association was completely consistent."

"It could be that younger students look up to older students, and so emulate their sedentary behaviour and bad eating habits and do not judge the older children's body shape," he said. "Or it could be that the school doesn't encourage enough physical activity among its students, and the older students' weight issues are an indication of that."

The research from which these findings are taken is due to be published in the Journal of Youth and Adolescence. Fortunately, the journal article itself is available to download as an ‘online early’ article at:

Leatherdale, S. T. and Papadakis, S. ‘A multi-level examination of the association between older social models in the school environment and overweight and obesity among younger students’, Journal of Youth and Adolescence, DOI 10.1007/s10964-009-9491-z

In this instance, the news article does seem to broadly reflect the findings as reported in the journal article. As the authors conclude in the article:

‘[T]he junior students in our sample were more likely to be overweight or obese if they attended a school with a high prevalence of senior students who are obese. … For each 1% increase in the prevalence of obese senior students at a school, the odds of a junior student at that school being obese increased (OR 1.19, 95% CI 1.16-1.24, p < .001). … This finding is consistent with previous empirical research which suggests that characteristics of the school a student attends can have important impact on their weight status.’

Having decided that the prevalence of overweight and obese senior students represents a risk factor in terms of junior pupils being overweight or obese, the authors conclude by arguing for the need to use indicators like this to develop more targeted school-based intervention programmes.

The statistical analysis upon which this article is based need not overly concern us here. For those interested, the authors used multi-level binary logistic regression to consider what factors (pupil-level and school-level) were associated with the chances of a pupil being overweight and also obese. The analysis itself, involving a sample of 12,049 grades 9-12 pupils from 76 Ontario secondary schools, seems to be technically correct. However, and as is often the case, it is the interpretation of the results of this analysis that is problematic. In this case, you don’t need to be a statistician to be able to identify some of the problems relating to the interpretation of the results from this study. Here are some of the key ones:

First, the authors place a lot of emphasis on the importance of the school environment in influencing levels of obesity in pupils. And yet, when reading the “fine print” of the study, the authors report that schools could only account for 1.8% of the variation in levels of obesity between pupils. In other words, school-level factors themselves would seem to have only a marginal influence on whether children are obese or not (with 98.2% of the variation between children being associated with non-school based factors). The only reason why this finding is statistically significant is because of the huge sample size (n=12,049) where any difference, however minor, is likely to be statistically significant. This is therefore a good example of where findings may be statistically significant but not practically significant.
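To illustrate how sample size alone can turn a minor difference into a statistically significant one, here is a minimal sketch in Python. It is not the multi-level model the authors used; it is a simple pooled two-proportion z-test, and the obesity rates (6% and 7%) and group sizes are hypothetical figures chosen purely for illustration.

```python
import math


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value for the difference between two proportions.

    Uses a pooled z-test and assumes both groups have the same size.
    """
    pooled = (p1 + p2) / 2  # equal group sizes, so the pooled rate is the mean
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(p1 - p2) / se
    # Two-sided tail area of the standard normal distribution
    return math.erfc(z / math.sqrt(2))


# Hypothetical rates: the same one-point difference, tested at two sample sizes
for n in (600, 6000):
    print(n, round(two_proportion_p_value(0.06, 0.07, n), 3))
```

Under these assumed figures, the same one percentage point difference is nowhere near significant at the 5% level with 600 pupils per group, yet passes that threshold with 6,000 per group. The size of the difference has not changed at all; only our ability to detect it has.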

Second, the authors claim that with every one percentage point increase in the proportion of senior students who are obese, this increases the odds of junior pupils being obese by a factor of 1.19 (i.e. increasing their odds by 19%). This seems to be quite a notable relationship. However, quoting odds ratios like this can be misleading, especially when you’re dealing with only a small absolute number of children. In this case, the proportions of pupils in each school who were found to be obese were small, with only 6.5% of children categorised as obese on average within each school. In this instance, what seem to be large changes in the odds of children becoming obese will only reflect very small changes in the actual numbers of pupils (in this case, literally, a handful of pupils).
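A quick worked example shows how modest an odds ratio of 1.19 is in absolute terms when the starting prevalence is low. The sketch below converts a baseline probability to odds, applies the odds ratio, and converts back. Using the 6.5% school average as the baseline for junior pupils is my own simplifying assumption, as is the helper name `apply_odds_ratio`.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability obtained after multiplying the baseline odds."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


baseline = 0.065  # assumed baseline: average obesity prevalence per school
after = apply_odds_ratio(baseline, 1.19)  # odds ratio reported in the article

print(f"{baseline:.1%} -> {after:.1%}")
print(f"absolute change: {(after - baseline) * 100:.2f} percentage points")
```

On these assumptions the prevalence moves from 6.5% to roughly 7.6%, which is a little over one percentage point. In a year group of 100 junior pupils that is about one additional pupil.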

Third, and even putting the above points to one side, there is a more fundamental problem relating to inferring a causal relationship from a correlation. While it may be true that the proportions of junior pupils in a school who are obese are correlated with the proportions of senior pupils who are obese, we cannot assume that there is any direct (or, indeed, indirect) causal relationship between the two at all. Unfortunately, this does not stop the authors from hypothesising wildly about how junior pupils may be emulating their senior counterparts. This is just irresponsible and misleading. Moreover, and in this case, it is what also generated the headline in The Observer news article – a headline with absolutely no evidence to substantiate it at all.

Fourth, there is actually a much simpler – and far more plausible – explanation for this correlation that the authors remarkably do not seem to recognize. It is not surprising that the proportions of junior and senior pupils who are obese tend to be similar within individual schools given that they tend to come from the same neighbourhoods or catchment areas. In this sense, any between-school variations found are likely to be due to variations in the socio-demographic nature of the catchment areas of the schools rather than anything that the schools themselves are doing.
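The catchment-area explanation can be sketched with a small simulation. In the toy model below, each school has a single underlying level of risk set by its catchment area, and junior and senior pupils are drawn independently from it. Nothing in the code lets seniors influence juniors. The number of schools matches the study (76), but the range of risk, the 200 pupils per group and the random seed are all hypothetical choices of mine.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_schools=76, pupils_per_group=200, seed=1):
    """Correlate junior and senior obesity rates when only catchment matters."""
    rng = random.Random(seed)
    junior_rates, senior_rates = [], []
    for _ in range(n_schools):
        # Shared catchment-area risk; the only link between the two groups
        catchment_risk = rng.uniform(0.02, 0.12)
        juniors = sum(rng.random() < catchment_risk for _ in range(pupils_per_group))
        seniors = sum(rng.random() < catchment_risk for _ in range(pupils_per_group))
        junior_rates.append(juniors / pupils_per_group)
        senior_rates.append(seniors / pupils_per_group)
    return pearson(junior_rates, senior_rates)


print(round(simulate(), 2))
```

Even though no pupil in this model imitates anyone, the junior and senior obesity rates come out clearly positively correlated across schools, simply because both reflect the same neighbourhood.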

Fifth, and finally, the authors do not make any distinction between different types of school-level factors; in particular, those that are related to the school itself (the school ethos, the curriculum taught etc.) and those that are actually related to the wider neighbourhood in which a school is located and thus are not influenced by the school at all (i.e. socio-demographic characteristics – including levels of obesity – that may vary from school to school but that are not caused by the schools as such but simply reflect their different catchment areas).

This is an important distinction to make. In this case, the absence of such a distinction has led the authors to make misleading claims regarding the existing and direct influence of the school environment on levels of obesity. This is not to say that schools cannot play an important role in helping to promote healthier lifestyles and reduce levels of childhood obesity but this is a different issue. The authors are focusing here on attempting to identify the existing influences that schools have.

So, here is another example of where the basic point that should have been drilled into any undergraduate student taking a research methods course – that correlation does not necessarily imply causality – seems to have been lost. In this present case, we cannot simply blame the sensationalist and misleading reporting of a journalist. As we saw, the news article broadly reflected the findings from the journal article itself. It may be, therefore, that part of the problem here is that as the research is based on a rather advanced and complex form of statistical modeling then people are going to be less likely to feel qualified in assessing the validity of the claims being made and thus more likely just to take the key findings ‘on trust’. However, and as demonstrated above, this is a dangerous approach to take and can easily lead to important policy decisions being made on the basis of false evidence.

Ultimately, this is why the peer-review process is so important in, hopefully, helping to identify and remove some of the more blatant cases of poor quality research. With this in mind, the key question we are left with is how did this article, with the claims it is making, get through the peer-review process of the Journal of Youth and Adolescence seemingly unimpeded? Did none of the reviewers (or the editors) recognize and/or raise any of the concerns highlighted above?

December 16, 2009

Further analysis of GCSE results in England by DCSF. Good or bad news? Depends who’s reporting. ... And the news might not be news at all.

The DCSF (Department for Children, Schools and Families) in England released a new report yesterday providing more detailed analyses of GCSE attainment in England by pupil characteristics (for the report see: http://www.dcsf.gov.uk/rsgateway/DB/SFR/s000900/index.shtml). I haven’t had time to read it yet but three points emerge from news coverage of this report that are already worth commenting on.

The first is political spin. The figures seem to be good or bad depending on who you listen to. For the government, the figures are encouraging. A tweet from the @DCSF yesterday declared: “New GCSE figures: attainment gap between those on free school meals and their peers has narrowed”.

However, positive stories are never as newsworthy as negative ones and so here’s how @bbceducation reported the findings in their own tweet that came out within a few hours of the one from the DCSF: “Poor white teenage boys in England have slipped further behind other youngsters in their GCSE results”.

So, here’s the first lesson – never trust any press release or news story to give you the full picture! You will always need to look at the evidence for yourself. However, two further lessons emerge when you do begin looking at the evidence. In fact we need look no further than the statistics reported in the press releases/news items accompanying these tweets to begin to find these lessons.

Both tweets imply notable changes in performance and yet the actual changes reported (and remember the statistics in the press releases/news items are the ones that have been cherry-picked to back up their respective positions) are marginal. Take the finding on free school meals (FSM) for example. On the DCSF’s website, they report that the proportion of those pupils eligible for FSM gaining the expected level (five good passes at GCSE) rose by 3.4 percentage points over the last year. As they go on to claim, this is: “a faster improvement than the 3.1 percentage point rise for non-FSM pupils”.

BBC Education did little better in relation to their news story – this time choosing to emphasise the negative results. The gap between poor white boys (those eligible for FSM) and other white boys (those not eligible for FSM) widened from 29.8 percentage points to 31.6 percentage points. A whopping 1.8 percentage point increase!

So here are the two further lessons from this cursory review of tweets and press releases/news stories. The first is the need for all those involved to be much clearer in their headline reporting of the actual size of any effects found. Most people won’t even go beyond the headlines and will thus simply be left with the impression that either things are getting better (the DCSF line) or worse (the BBC line) for young people from disadvantaged backgrounds. And yet in both cases, the change is marginal. Unfortunately, marginal changes are no good to politicians or the media.

The second lesson is the need to step back and look at trends over time. We can expect minor fluctuations in statistics year-by-year, simply due to random variation in the make-up of any particular cohort of school pupils. Without further information we have no way of knowing whether these (very minor) changes reported do actually represent an underlying trend or are simply random fluctuations.
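As a rough guide to how much year-on-year movement chance alone could produce, the sketch below treats each year's cohort as an independent random draw from a fixed underlying pass rate and computes the standard error of the change between two years. This is a deliberate simplification. The pass rate (25%) and the cohort sizes are hypothetical placeholders, not figures from the DCSF report, and the real cohort sizes would need to be taken from the report itself.

```python
import math


def se_of_change(pass_rate, cohort_size):
    """Standard error, in percentage points, of the change in a pass rate
    between two independent cohorts of the same size and underlying rate."""
    variance_one_year = pass_rate * (1 - pass_rate) / cohort_size
    return 100 * math.sqrt(2 * variance_one_year)


# Hypothetical cohort sizes for a subgroup of pupils
for n in (2_000, 20_000, 200_000):
    print(f"cohort of {n:>7,}: +/- {se_of_change(0.25, n):.2f} percentage points")
```

The smaller the subgroup being reported on, the larger the swings that can arise with no underlying change at all, which is why a single year's movement needs to be read against the longer trend.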

So, the next thing I need to do – and what I’d advise everyone else to do as well – is to read the full report for myself; only then can we develop a more balanced view of what is going on and determine whether some or all of these findings are actually indicative of real trends at all or may just reflect random fluctuation.

December 06, 2009

Sneak preview of the contents page of the next issue of Effective Education (Vol. 1, No. 2, 2009)

The next issue of Effective Education (Volume 1, Issue 2) is now in press and due for publication before the end of this month (December 2009). For more details see: http://www.informaworld.com/effectiveeducation

EFFECTIVE EDUCATION

Issue 2, Volume 1, 2009

Unobserved but not unimportant: The effects of unmeasured variables on causal attributions, Robert Coe (University of Durham, UK)

The Effectiveness of the Success for All Reading Programme on Primary EAL pupils in Hong Kong, Alan Cheung (Johns Hopkins University, USA)

How First Year Students perceive the Fit between Secondary and University Education: the Effect of Teaching Approaches, M. Torenbeek, E.P.W.A. Jansen & W.H.A. Hofman (University of Groningen, The Netherlands)

The ‘Re-imagining’ of Evidence under New Labour: policy and practice in education in uncertain times, Robert Hulme (University of Chester, UK) and Moira Hulme (University of Glasgow, UK)

Quantitative measures of respect and social inclusion in children: Overview and recommendations, Colin G. Tredoux (University of Cape Town, South Africa), Noraini M. Noor (International Islamic University of Malaysia, Malaysia) and Lisa de Paulo (University of Cape Town, South Africa)

Research shows that using blogs, texting and social networking sites improves children’s literacy skills - or does it?

A news item appeared on the BBC website this week with the headline: ‘Children who use technology are “better writers”’ (see: http://news.bbc.co.uk/1/hi/technology/8392653.stm). The claim is based on a survey of 3,001 children aged 9-16 commissioned by the National Literacy Trust that explored their use of new communication technologies such as blogs, texting and social network sites. From the news item it appears that the main finding is that: ‘of the children who neither blogged nor used social network sites, 47% rated their writing as "good" or "very good", while 61% of the bloggers and 56% of the social networkers said the same.’

In response to these findings, Jonathan Douglas, Director of the National Literacy Trust, was quoted as saying: ‘Our research suggests a strong correlation between kids using technology and wider patterns of reading and writing. […] Engagement with online technology drives their enthusiasm for writing short stories, letters, song lyrics or diaries.’ Moreover, and in response to the claim that the use of blogging, texting and social networking sites damages literacy, he went on to state that: ‘Our research results are conclusive - the more forms of communications children use the stronger their core literary skills.’

Now, I’ve not had time to read the full report as yet but there are at least two problems with the conclusions being drawn above:

1. From the news item, it appears that the survey did not actually measure children’s literacy skills. Rather it focused simply on their own self-perceptions of how good their writing is. At the very best, therefore, all that can be claimed here is that the more that children use such communication technologies, the more that they are likely to have positive self-perceptions of their literacy skills.

2. While there may well be a ‘strong correlation’ between these two things, it is impossible without further evidence to make any claims about what may be causing what. It is certainly premature for the Director of the National Literacy Trust to conclude that they have ‘conclusive results’ showing that it is children’s use of such technologies that increases their core literacy skills. While this may be the case, there is also an equally plausible explanation: that children with greater literacy skills (or, in this case, a greater perception of their literacy skills) are then more likely to use literacy-based technologies such as blogging, texting and social network sites. Moreover, there may not be any direct relationship between the two at all. It may be, for example, that there is some other factor - for example a child’s socio-economic background - that has an influence on both literacy skills and use of technology. Thus, the more affluent a child’s background, the more likely they are to have higher literacy skills and also to have greater access to, and thus make greater use of, such technologies. It is not inconceivable, therefore, that literacy skills and technology use have no causal relationship with each other at all.

This second point – that correlation does not equal causality – is a fundamental one that students should have learnt from any basic research methodology course. The fact that a well-respected organization such as the National Literacy Trust can be confusing the two in their own research findings is a poor reflection on the state of educational research in the UK. Moreover, the fact that the BBC’s own ‘technology reporter’ can simply report such claims uncritically, as in this news item, does nothing to help improve the situation.

October 31, 2009

Governments, evidence and politics – some reflections on the UK Govt’s sacking of its chief scientific drugs adviser

There are plenty of examples from education of the UK government introducing major policy initiatives without any evidence to suggest whether they are going to work or not. Indeed, there are also examples where policies have been introduced despite overwhelming evidence to the contrary. However, the decision of the government to sack its chief scientific drugs adviser – Professor David Nutt – is perhaps the most stark recent example of the precarious place of evidence-based policy.

Professor Nutt’s position became untenable after he accused government ministers of "devaluing and distorting scientific evidence" regarding the misuse of illicit drugs after the government decided to reclassify cannabis from a Class C to a Class B drug against the advice of its Advisory Council on the Misuse of Drugs. Reacting to his sacking, and quoted in the Guardian (see: http://bit.ly/3CCi2W), Professor Nutt explained that the Prime Minister had ‘made up his mind’ to reclassify cannabis despite evidence to the contrary:

"Gordon Brown comes into office and, soon after that, he starts saying absurd things like cannabis is lethal... it has to be a class B drug. He has made his mind up."

"We went back, we looked at the evidence, we said, 'No, no, there is no extra evidence of harm, it's still a class C drug.' He said, 'Tough, it's going to be class B'. [...] He is the first Prime Minister, this is the first government, that has ever in the history of the Misuse of Drugs Act gone against the advice of its scientific panel."

"And then it did it again with ecstasy and I have to say it's not about [me] overstepping the line, it's about the government overstepping the line. They are making scientific decisions before they've even consulted with their experts."

There are two points worth drawing out from this example. The first, clearly, is the worrying trend of governments (in plural, let’s not just blame this on the present administration) to play fast and loose with evidence. The fact that the government can show such disregard for the available evidence, and for its own scientific advisers, is deeply worrying. Let’s be clear, governments have always used evidence selectively; happy to quote it and take the moral high ground when it fits in with their latest policy initiative and yet equally happy to blatantly ignore it when it doesn’t suit. Witness, for example, the government’s reaction just last week to the publication of the findings of the Cambridge Review of Primary Education (see: http://bit.ly/J0xN).

However, and here’s the second point, it is important that in our concern with this latest sacking, we don’t find ourselves occupying the equally untenable position whereby government policy is simply based on evidence with political influences excised completely. After all, policy-making is a complex and inherently political process where value judgements need to be made. The use of evidence is only one component of this process.

Take, for example, the issue of boys’ underachievement in school. Let’s assume we have strong and rigorous evidence that a particular classroom-based approach can significantly increase boys’ educational attainment scores. While we may have the evidence that this approach is effective (in terms of increasing boys’ attainment), there remain important and legitimate political considerations to address before we simply press ahead and roll out the approach across all schools.

For example, what is the effect of this approach on girls and their educational attainment? It could be that the approach is based upon ‘masculinising’ the curriculum and classroom to make education more appealing to boys. However, this may then alienate girls and thus adversely affect their attainment. Moreover, and in this case, we also need to ask what types of masculinity are being promoted for boys to engage with and aspire to. While such forms of masculinity may be proven to increase boys’ educational attainment, they may have adverse consequences for other aspects of their lives, including their socio-emotional development.

The point is that while governments need to make best use of evidence, they also need to act politically (and actually can’t avoid doing so). However, even political decisions need to be based on evidence rather than mere unsubstantiated belief. In the example above, we would want some evidence of how the particular approach to raising boys’ attainment was actually impacting upon girls. Similarly, we’d also want to ascertain whether there is evidence to support our concerns that the dominant forms of masculinity being promoted were adversely impacting on other aspects of the boys’ development.

Of course, good evaluative designs of educational interventions focus not only on the intended effects of the initiative in question but also on its potentially unintended (and possibly adverse) effects. In this sense, the design of an evaluation and the outcomes to be measured need to be informed by theoretical and political considerations. It is in this sense that while political decisions need to be informed by evidence, the creation of evidence also needs to be informed by political decisions. Politics and evidence are inherently related.

September 26, 2009

Walden University’s College of Education produces teachers who are more effective in improving pupils’ reading fluency. Really?

A glossy advertisement on the back of the latest issue of Educational Researcher (the official journal of the American Educational Research Association, AERA, no less) grabbed my attention. Apparently, and as the headline exclaims: “New study shows that students of Walden teachers make greater gains in reading fluency.”

The claim is based upon research commissioned by Walden University’s Richard W. Riley College of Education and Leadership that compared the effectiveness of teachers who had graduated with a Walden master's degree with that of teachers who had graduated with master's degrees elsewhere. As the glossy advert went on to explain:

“In a unique collaboration with Tacoma Public Schools in Tacoma, Washington, researchers compared the reading fluency of students taught by Walden Master’s-educated teachers with students taught by non-Walden Master’s-educated teachers. The study revealed that students of teachers who graduated from Walden’s Elementary Reading and Literacy programme had gains in reading fluency that were on average 4.8 words per minute, or 14%, greater than students of non-Walden Master’s-educated teachers.”

This is a huge claim. It is not surprising that Walden's College of Education chose to buy a glossy advert on the back of the prestigious AERA magazine to publicise it. What College wouldn’t want to let the world know that their master's degree is proven to be more effective than others? Students will clearly want to graduate from Walden given that a Walden degree is evidence that you are a more effective teacher. The advert encourages readers to visit their website at http://www.WaldenU.edu/tacoma for more information on the research. Fortunately, the full report of the research is also available to download from the website and can also be downloaded directly from here: http://bit.ly/2IZDLm

So, are the claims in the advertisement true? Well, the research that lies behind these findings is based on a relatively small sample (the main element of which compares the reading scores of children taught by just 35 graduates from Walden with those taught by 35 graduates of other programmes). However, the findings are statistically significant so we can be sufficiently confident that the differences between the two groups are unlikely to have occurred by chance. Moreover, the researchers use appropriate statistical techniques – hierarchical linear modelling – for analysing the data they have (nearly 4,000 pupils clustered in 70 classes).

Interestingly, the researchers are a little more cautious in their own interpretation of the findings. As they explain in the executive summary: “Limitations on the research design do not allow for a claim of causation between the completion of the Walden degree and teaching effectiveness. However, [the findings] ... provide suggestive evidence that the program may indeed improve the effectiveness of elementary literacy instruction” (p. 3).

Of course everything rests on these ‘limitations’ that, not surprisingly, fail to get a mention in the glossy advert and that do not seem to be considered by the researchers to be serious enough to stop them claiming that they have “suggestive evidence” that the Walden programme “is making teachers more effective at reading and language arts instruction” (p. 21). Well, here are the main limitations, taken directly from the research report (pp. 22-23):
 
  1. “While we were able to use matching to control for differences in teacher experience between the Walden and the control group samples, we did not have information on teachers’ credentials, prior education (i.e., bachelor’s degree institution and major field of study), or professional development/training experiences. It is plausible that any differences in student reading gains are not due to Walden’s M.S. in Education program, but due to systematic differences in these other factors between Walden teachers and the comparison group teachers.”
  2. The inference from the estimated effect is the difference in earning a Walden M.S. in Education degree with a specialization in Elementary Reading and Literacy relative to earning any other type of master’s degree (as represented in the control group). It is plausible that teachers who seek out specialized degrees in elementary literacy instruction are more likely to be successful at reading instruction than those who seek out degrees in other areas. In fact, they may pursue the degree because they have higher self-efficacy as it relates to literacy instruction. Consequently, the estimated effect of the Walden program may stem from this self-selection and the unobserved differences in reading instruction effectiveness between those who sought out the ERL program and those who did not.
  3. The samples were too small to control for “school effects” (i.e., the effects on student achievement that are common to all students within a given school). Therefore, it is possible that the difference in performance between Walden teachers and non-Walden teachers is due to the programs and policies used in the schools where they teach rather than to their own classroom instruction.
  4. “While we were able to control for some student demographic characteristics, there were a number of unobserved factors that might also explain these differences, for example students’ socioeconomic status or home circumstances.”

In relation to three of the four limitations (1, 3 and 4), these are significant but are to be expected from such a research design where it is simply not possible for students to be randomly assigned to the main and control groups. As the researchers quite rightly point out, the positive gains found among the pupils taught by Walden graduates could be due to a range of unidentified systematic differences between these graduates and their comparators. This is why the researchers quite rightly state that it is not possible to make “a claim of causation between the completion of the Walden degree and teaching effectiveness.” It is also why they present their research as “suggestive evidence”.

All of the above is quite reasonable and to be expected with a pragmatic evaluation of this type. However, it is the second limitation that is much more problematic and represents a fundamental flaw in the research design. Interestingly, it is hidden away in the body of the report and not mentioned at all either in the Executive Summary or the main Conclusions. Not surprisingly, it doesn’t feature at all in the glossy advertisement.

And yet, this second limitation completely undermines the validity of the claims being made. In essence, we’re not comparing “like with like” at all. Rather, we’re comparing teachers who have taken a master’s degree with a specialisation in elementary reading and literacy with teachers who have simply taken generic master’s degrees. There is thus no way of knowing whether the additional gains made in reading fluency among the pupils taught by the Walden graduates (which are actually fairly small by the way and not consistent across year groups) were due to the effectiveness of the Walden programme itself (i.e. compared to other specialist elementary reading and literacy master’s programmes) or simply to the Walden graduates having had more specialist training in elementary reading and literacy.

This is a crucial point. Remember that the headline in the glossy advert claimed that: “New study shows that students of Walden teachers make greater gains in reading fluency.” This is clearly misleading as it encourages the reader to believe that there is evidence that the Walden programme is more effective than other comparable specialist programmes. As it is, the study provides no evidence at all that Walden teachers are any more effective in producing gains in reading fluency than teachers with equivalent specialist qualifications from any other College.

September 19, 2009

Why does the UK Government, with £6m at its disposal, also find it so difficult to do a simple evaluation?

This week, the Home Office published the findings of the first phase of its £6 million evaluation of Blueprint, a multi-component school-based drug education programme targeted at secondary school children in Years 7 and 8. The reports are available at: http://bit.ly/22SLI

With such resources at its disposal one would expect a rigorous evaluation with some clear evidence of whether the programme is effective or not (initially in relation to children’s levels of drug awareness and, in the longer-term, their attitudes and behaviour). After all, undertaking an evaluation isn’t rocket science. You invite a number of schools to take part, you randomly split them into two groups – one that will deliver the programme and one that will act as a control/comparison group – and then you just collect some data from all the children before the programme starts and then again at the end. If the children in the programme schools have shown progress (in terms of awareness, attitudes and/or behaviour) above and beyond those in the control group then you have strong evidence that the programme has been effective.
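The random allocation step really is as simple as it sounds. Here is a minimal sketch of it, assuming a list of 50 participating schools. The school names, the seed and the function name `randomise_schools` are placeholders of my own, not anything taken from the Blueprint evaluation.

```python
import random


def randomise_schools(schools, seed=None):
    """Randomly split schools into a programme group and a control group."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(schools)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "programme": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


schools = [f"School {i:02d}" for i in range(1, 51)]  # hypothetical school names
groups = randomise_schools(schools, seed=2009)
print(len(groups["programme"]), len(groups["control"]))
```

Because chance alone decides which group each school lands in, any systematic differences between the two groups at the start are unlikely, and that is what allows differences at the end to be attributed to the programme.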

Unfortunately, the research team responsible for the evaluation of the Blueprint programme failed to follow even this simple design. They were advised to use 50 schools in order to generate sufficient data to detect any effects that might be associated with the programme. However, they felt that the use of such a sample size was “a very large step for an improvement in the limited UK evidence based” (p. 32) and thus, presumably, a step too far. This is just nonsense. Only this summer we (the Centre for Effective Education) published the results of a randomised controlled trial of a pupil mentoring scheme involving 50 schools and over 800 children (the full report is available from our website at: http://www.qub.ac.uk/cee). Moreover, we’re just writing up another trial involving 80 preschool settings and 1,500 3-4 year old children and their parents.

Instead, the research team referred to guidance from the Medical Research Council that, in the evaluation of complex interventions, a “cumulative approach” is required “to understanding how outcomes are achieved, moving from theory, to modelling, to an exploratory trial to a definitive trial” (p. 32). This is indeed an eminently sensible and pragmatic approach to take and one we have also adopted. Most recently we have completed an “efficacy test” of an early childhood programme in 10 preschool playgroups (5 delivering the pilot programme and 5 acting as a control group).

However, and curiously, the “exploratory trial” the research team chose to conduct for the Blueprint programme involved 30 schools: clearly too large for a proper exploratory trial and insufficient for a full-blown study. Unfortunately, the problems don’t stop here. Inexplicably, the research team decided to select only six of the 30 schools to act as a comparison (control) group and then decided not to select them randomly but to hand-pick them. As it turned out, the characteristics of these six comparison schools proved to be significantly different from those of the remaining 23 schools (one dropped out) delivering the programme and so they cannot now be used for any meaningful comparisons at all.

The catalogue of errors involved in this trial is well outlined by Ben Goldacre in the latest entry in his commendable “Bad Science” column in The Guardian, see: http://bit.ly/ECcq5. It is just astounding that the Home Office could have ended up with such a half-baked evaluation, especially given the amount of funding they set aside for this and the clear advice they were given as well as the expertise at their disposal (see Goldacre’s column for more details).

I have previously asked the question “why do some educational researchers find it so difficult to do a simple evaluation?” (see: http://bit.ly/6tfJ). Then, I used an example of a small evaluation conducted by a couple of educational researchers that was reported at the BERA Conference. That was bad enough, reflecting, as I argued, a more general lack of competence among sections of the British educational research community in conducting simple evaluations of the effectiveness of educational programmes and interventions. However, this present example is simply in a different league. What hope can we have for the future when even the New Labour government – the self-styled proponents of evidence-based policy – can’t undertake a simple evaluation for themselves?

 

September 07, 2009

Why do some educational researchers find it so difficult to do a simple evaluation?

Here's an example of an evaluation of an educational programme taken from a paper presented last week at the British Educational Research Association Annual Conference at a session I attended. To maintain anonymity I will keep the description of the study fairly vague. The point is not to be critical of the specific authors of the paper, for they are far from the only ones to adopt this type of approach, but to raise a more general point about the nature of educational research.

The paper described what was actually a very interesting educational initiative that attempted to motivate children through the use of a particular strategy. The presenters clearly knew their subject area and provided a convincing theoretical case for why the use of that strategy may help to motivate children. They also described a pilot scheme where this approach was trialled for a short period of time. However, the evaluation that was undertaken of the effectiveness of the strategy, and that the presenters then went on to report, was unfortunately one of the worst examples of an evaluation I have seen.

Part of the evaluation involved the teachers rating the levels of motivation of the 63 children who participated in the pilot scheme into four categories (‘very motivated’, ‘engaged’, ‘somewhat engaged’ and ‘negative’). The results were presented in a table, reproduced below exactly as it appeared in the paper, with the children being broken down by their entering grade (year group):


Entering Grade | Very Motivated | Engaged | Somewhat Engaged | Negative
------------------------------------------------------------------------
2              |       5        |    7    |        3         |    3
3 or 4         |       8        |   13    |        4         |    1
5 or 6         |       2        |    6    |        4         |    1
7+             |                |    1    |        1         |    4
------------------------------------------------------------------------
Total          |      15        |   27    |       12         |    9
 

The presenters interpreted these data as follows: “The teachers’ descriptions indicated that 15 of them were very motivated by [using the strategy], 27 were somewhat motivated, 12 were not very engaged, and 9 found it to be a negative experience. In general, in this population, students aged 8-11 years-old [i.e. those in entry grades 3 or 4] were more likely to be motivated by [the strategy] than younger or older students.”

Now, there are three main problems with this interpretation of the data that should be apparent to anyone who has done even an elementary course in educational research methods:

  1. There are no pre-test scores. How, therefore, can we tell whether the children’s levels of motivation have actually changed at all during the course of the pilot scheme?
     
  2. There’s no comparison or control group. Even if we had pre-test scores and we could see that the children’s motivations had increased over the course of the pilot scheme, how do we know that this improvement was down to them participating in the pilot scheme and not due to something else?
     
  3. As regards the claim that the use of the strategy was more effective for the middle band of children (i.e. those with entering grades 3-4), how do we know that the differences between the differing bands of children were due to the programme rather than just down to random variation?

As it happens, the query raised in the last point can be answered very quickly with the use of a simple statistical test (a Fisher’s exact test in this instance). In this case, and by combining the oldest two bands so that we are comparing the ‘3-4’ group with their younger counterparts (‘2’) and older counterparts (‘5+’), such a test gives us a significance level of p=0.275. What this tells us, in essence, is that even if there were actually no underlying age differences, random variation alone would produce differences at least as large as those in this present sample about 27.5% of the time. With odds like this, how can we have any confidence in these claims?
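For readers who want to see how an exact test on a table like this works, here is a minimal sketch. It assumes the usual extension of Fisher’s exact test to tables larger than 2x2 (often called the Freeman-Halton test): enumerate every table that shares the observed row and column totals, and add up the probabilities of those that are no more likely than the observed one. The function names are my own, the counts are read from the rows of the table above with the two oldest bands added together, and the sketch may not reproduce the quoted figure exactly since statistical packages differ in their settings.

```python
from math import factorial, prod


def compositions(total, caps):
    """Yield every way to split `total` across cells without exceeding `caps`."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def tables_with_margins(row_totals, col_totals):
    """Yield every table of counts with the given row and column totals."""
    if len(row_totals) == 1:
        # The last row is whatever is left over in each column.
        yield (tuple(col_totals),)
        return
    for row in compositions(row_totals[0], col_totals):
        remaining = [c - x for c, x in zip(col_totals, row)]
        for rest in tables_with_margins(row_totals[1:], remaining):
            yield (row,) + rest


def cell_factorial_product(table):
    """With fixed margins, a table's probability is proportional to 1 / this."""
    return prod(factorial(x) for row in table for x in row)


def fisher_exact_p(table):
    """Two-sided exact p-value for a small r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = cell_factorial_product(table)
    total_weight = extreme_weight = 0.0
    for candidate in tables_with_margins(row_totals, col_totals):
        product = cell_factorial_product(candidate)
        weight = 1.0 / product
        total_weight += weight
        if product >= observed:  # no more probable than the observed table
            extreme_weight += weight
    return extreme_weight / total_weight


# Rows: entering grade 2, 3 or 4, and 5+ (5 or 6 and 7+ added together).
# Columns: very motivated, engaged, somewhat engaged, negative.
observed = [
    [5, 7, 3, 3],
    [8, 13, 4, 1],
    [2, 7, 5, 5],
]

if __name__ == "__main__":
    # Plain-Python enumeration; expect this to take a few seconds.
    print(round(fisher_exact_p(observed), 3))
```

Brute-force enumeration is only practical for small tables like this one; for anything larger a statistics package is the sensible choice.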

The presenters attempted to justify their approach by arguing that it is difficult to isolate the effects of the strategy used and that it was not possible to organize and conduct a randomized controlled trial. However, such arguments are difficult to defend. In fact the present pilot scheme, which ran for just a few weeks, was ideally placed to have been evaluated using a small, pragmatic trial. For example, the children taking part could have been randomly organized into two groups, with one group participating in the scheme initially and the other group acting as a control but possibly getting to participate in the scheme at a later stage (i.e. being a ‘delayed control group’). This way, nobody loses out in the long run. Then, with the children organized into two groups, they just needed to have their motivations tested at the beginning of the pilot scheme and then again at the end. Et voilà: a pragmatic randomized trial that would provide strong evidence of whether this pilot scheme was being effective in increasing the motivation of the children taking part.
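To show how little is involved in the allocation step, here is a minimal sketch of randomly splitting the 63 children into an ‘immediate’ group and a ‘delayed control’ group. The function name, the group labels and the placeholder child identifiers are all assumptions made for illustration; nothing here comes from the paper itself.

```python
import random


def allocate_delayed_control(children, seed=None):
    """Randomly split participants into an immediate and a delayed group."""
    rng = random.Random(seed)  # a recorded seed makes the allocation auditable
    shuffled = list(children)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "immediate": shuffled[:half],  # take part in the scheme straight away
        "delayed": shuffled[half:],    # act as controls first, take part later
    }


# Placeholder identifiers standing in for the 63 children (hypothetical).
children = [f"child_{i}" for i in range(1, 64)]
groups = allocate_delayed_control(children, seed=2009)
print(len(groups["immediate"]), len(groups["delayed"]))
```

Because chance alone decides who goes first, any difference in motivation between the two groups at the end of the pilot cannot be put down to how the groups were chosen.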

So if randomized trials are so simple to organize and run then why do researchers still opt, with depressing frequency, for flawed evaluative designs like this? I have offered some possible answers to this question in my editorial for the first issue of the new journal Effective Education, which can be accessed free online at: http://www.informaworld.com/effectiveeducation

Whatever the reason, it is surely a telling indictment that studies like the one described here are still being produced when so much commitment has been expressed, and efforts made, to building research capacity in education. Teaching the basics of evaluative research designs should be a core element of all undergraduate and postgraduate research training. After all, doesn’t the question of whether an educational programme is effective or not represent one of the basic and fundamental questions that educational research should be seeking to answer? The fact that educational researchers are routinely failing to receive basic training in simple evaluative techniques is therefore indefensible.

CEE team win national prize for poster presentation

A research team from the Centre for Effective Education has won the prize for ‘best poster’ at the British Educational Research Association Annual Conference. The prize, sponsored by the CfBT Education Trust, was awarded to Professor Paul Connolly, Dr Emma Larkin and Dr Susan Kehoe for their poster reporting the findings of the evaluation they have recently completed of the effects of the children’s television series, Sesame Tree, on young children’s attitudes and awareness in Northern Ireland.

The BERA Conference is the largest annual gathering of educational researchers within the UK and this year attracted over 800 delegates at its meeting at the University of Manchester between 2-5 September. The prize was awarded during a packed plenary session and the poster was particularly commended for “excelling at communicating the findings of a complex research study in a clear and highly accessible way for policy makers and practitioners.”

Speaking of the prize, Professor Connolly said: “we were delighted to have received this prestigious award. Much of the credit for the poster is due to Emma and Susan who spent a lot of time planning very carefully how to present the findings.”

He went on to add: “This prize means a lot to us at the Centre for Effective Education where we pride ourselves on undertaking strong and scientifically-robust research but where we are also committed to ensuring that the findings are reported in an accessible and relevant way so that they contribute to policy and practice.”

The poster reported on two linked studies that were conducted during 2008 into the effects of Sesame Tree – the Northern Ireland version of the popular US-based Sesame Street – on the attitudes and awareness of 5-6 year olds. The first study comprised a cluster randomized controlled trial involving 20 primary schools and 440 children whereas the second study comprised a naturalistic longitudinal survey of a separate sample of 697 children from 37 primary schools selected randomly from across Northern Ireland.

The prize-winning poster will be on display shortly in the reception area of the School of Education (69-71 University Street). To download a copy of the handout associated with the poster please follow this link: http://www.paulconnolly.net/publications/pdf_files/SesameBeraPoster.pdf

September 01, 2009

CEE researchers to present five papers at BERA

Researchers from the Centre for Effective Education are due to present five papers at the British Educational Research Association Annual Conference to be held on 2-5 September at the University of Manchester. BERA is the largest gathering of educational researchers within the UK, attracting up to 1,000 delegates. The papers to be presented report the findings of four different studies that the Centre has been running over the last year:

  • “The effects of the children’s television series Sesame Tree on young children’s social attitudes and cultural awareness” Paul Connolly, Emma Larkin and Susan Kehoe (12.30-2.30pm Thursday and 2.00-3.00pm Friday 4 September, Poster Presentation, University Place Theatre Foyer – Level 1)
  • “A qualitative evaluation of a mentoring reading programme for 9-10 year olds in Northern Ireland” Oscar Odena, Sarah Miller and Susan Kehoe (4.30-6.00pm, Thursday 3 September, Session 4.17, room: University Place 3.205)
  • “Educational attainment, well being and economic disadvantage: a survey of primary school pupils in Northern Ireland” Sarah Miller, Laura Lundy and Lisa Maguire (9.00-10.30am, Friday 4 September, Session 5.19, Room: University Place 3.212)
  • “A need to belong: an epidemiological study of the experiences and needs of minority ethnic children in Northern Ireland” Liam O’Hare, Andy Biggart and Paul Connolly (3.00-4.30pm, Friday 4 September, Session 6.19, Room: Roscoe 3.4)
  • “The place of randomised controlled trials in educational research: a case study” Paul Connolly and Sarah Miller (9.00-10.30am, Saturday 5 September, Session 8.05, Room: Roscoe 3.2)

For more information on the BERA Conference, and to view the full programme, please visit: http://www.beraconference.co.uk/scipro.html

For more information on any of the papers listed above, please contact the lead author. Their contact details can be found on the Centre website at: http://www.qub.ac.uk/cee