
Why do some educational researchers find it so difficult to do a simple evaluation?

Here's an example of an evaluation of an educational programme taken from a paper presented last week at the British Educational Research Association Annual Conference at a session I attended. To maintain anonymity I will keep the description of the study fairly vague. The point is not to be critical of the specific authors of the paper, for they are far from the only ones to adopt this type of approach, but to raise a more general point about the nature of educational research.

The paper described what was actually a very interesting educational initiative that attempted to motivate children through the use of a particular strategy. The presenters clearly knew their subject area and made a convincing theoretical case for why that strategy might help to motivate children. They also described a pilot scheme in which the approach was trialled for a short period of time. However, the evaluation of the strategy's effectiveness that the presenters went on to report was, unfortunately, probably one of the worst examples of an evaluation I have seen.

Part of the evaluation involved the teachers rating the motivation of each of the 63 children who participated in the pilot scheme, using one of four categories (‘very motivated’, ‘engaged’, ‘somewhat engaged’ and ‘negative’). The results were presented in a table, reproduced below from the paper, with the children broken down by their entering grade (year group):


Entering Grade | Very Motivated | Engaged | Somewhat Engaged | Negative
-----------------------------------------------------------------------
2              |       5        |    7    |        3         |    3
3 or 4         |       8        |   13    |        4         |    1
5 or 6         |       2        |    6    |        4         |    1
7+             |                |    1    |        1         |    4
-----------------------------------------------------------------------
Total          |      15        |   27    |       12         |    9
 

The presenters interpreted these data as follows: “The teachers’ descriptions indicated that 15 of them were very motivated by [using the strategy], 27 were somewhat motivated, 12 were not very engaged, and 9 found it to be a negative experience. In general, in this population, students aged 8-11 years-old [i.e. those in entry grades 3 or 4] were more likely to be motivated by [the strategy] than younger or older students.”
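As a quick check on the figures quoted above, the table can be tabulated in a few lines of Python. This is only an illustrative sketch: the variable and function names are my own, the blank ‘very motivated’ cell for the 7+ band is assumed to be zero, and grouping ‘very motivated’ with ‘engaged’ as a single "positive" rating is an assumption made purely for the purposes of the example.

```python
# Teacher ratings by entering grade, copied from the table above.
# Column order: very motivated, engaged, somewhat engaged, negative.
ratings = {
    "2":      [5, 7, 3, 3],
    "3 or 4": [8, 13, 4, 1],
    "5 or 6": [2, 6, 4, 1],
    "7+":     [0, 1, 1, 4],  # assumption: the blank cell is read as 0
}

# Sum each rating category across the four bands.
column_totals = [sum(col) for col in zip(*ratings.values())]
n_children = sum(column_totals)


def positive_share(row):
    """Share of a band rated 'very motivated' or 'engaged' (illustrative grouping)."""
    return (row[0] + row[1]) / sum(row)


shares = {band: positive_share(row) for band, row in ratings.items()}

print("Column totals:", column_totals, "N =", n_children)
for band, row in ratings.items():
    print(f"Grade {band}: n={sum(row)}, positive share={shares[band]:.1%}")
```

The column totals come to 15, 27, 12 and 9, which matches the figures in the presenters' summary and accounts for all 63 children. Note how small the individual bands are (only six children in the 7+ band), which is exactly why the raw percentages cannot be taken at face value.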

Now, there are three main problems with this interpretation of the data that should be apparent to anyone who has done even an elementary course in educational research methods:

  1. There are no pre-test scores. How, therefore, can we tell whether the children’s levels of motivation have actually changed at all during the course of the pilot scheme?
     
  2. There’s no comparison or control group. Even if we had pre-test scores and we could see that the children’s motivations had increased over the course of the pilot scheme, how do we know that this improvement was down to them participating in the pilot scheme and not due to something else?
     
  3. As regards the claim that the use of the strategy was more effective for the middle band of children (i.e. those with entering grades 3-4), how do we know that the differences between the differing bands of children were due to the programme rather than just down to random variation?

As it happens, the query raised in the last point can be answered very quickly with a simple statistical test (a Fisher’s exact test in this instance). Collapsing the two oldest bands, so that we are comparing the ‘3-4’ group with their younger counterparts (‘2’) and older counterparts (‘5+’), such a test gives us a significance level of p=0.275. What this tells us, in essence, is that even if there were actually no underlying age differences at all, random variation alone would produce differences at least as large as those in this present sample about 27.5% of the time. With odds like this, how can we have any confidence in these claims?
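For readers who would like to try something similar without a statistics package, here is a minimal sketch of a related but different technique: a Monte Carlo permutation test on the Pearson chi-square statistic, using only the Python standard library. It is not the Fisher’s exact test reported above, only an approximation to the same question, so its p-value should not be expected to equal 0.275 exactly. The function names, the seed and the number of permutations are illustrative choices, and the blank cell in the table is again assumed to be zero.

```python
import random

# Counts from the table, with the two oldest bands (5 or 6, 7+) merged.
# Column order: very motivated, engaged, somewhat engaged, negative.
table = [
    [5, 7, 3, 3],    # entering grade 2
    [8, 13, 4, 1],   # entering grade 3 or 4
    [2, 7, 5, 5],    # entering grade 5+
]


def chi_square(rows):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    return stat


def permutation_p_value(rows, n_permutations=10_000, seed=0):
    """Estimate how often shuffled ratings look at least as uneven as the real ones."""
    rng = random.Random(seed)
    n_cols = len(rows[0])
    # One (band, rating) pair per child, reconstructed from the counts.
    bands = [i for i, row in enumerate(rows) for _ in range(sum(row))]
    ratings = [j for row in rows for j, count in enumerate(row) for _ in range(count)]
    observed_stat = chi_square(rows)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(ratings)  # break any link between band and rating
        shuffled = [[0] * n_cols for _ in rows]
        for band, rating in zip(bands, ratings):
            shuffled[band][rating] += 1
        if chi_square(shuffled) >= observed_stat - 1e-9:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


p_value = permutation_p_value(table)
print(f"chi-square = {chi_square(table):.2f}, permutation p = {p_value:.3f}")
```

Shuffling the ratings among the children simulates a world in which age band has no bearing on motivation at all. The proportion of shuffles that look at least as uneven as the real table is the estimated p-value.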

The presenters attempted to justify their approach by arguing that it is difficult to isolate the effects of the strategy used and that it was not possible to organize and conduct a randomized controlled trial. However, such arguments are difficult to defend. In fact the present pilot scheme, which ran for just a few weeks, was ideally placed to have been evaluated using a small, pragmatic trial. For example, the children taking part could have been randomly allocated to two groups, with one group participating in the scheme initially and the other group acting as a control but possibly getting to participate in the scheme at a later stage (i.e. being a ‘delayed control group’). This way, nobody loses out in the long run. Then, with the children allocated to two groups, they just needed to have their motivation tested at the beginning of the pilot scheme and then again at the end. Et voilà: a pragmatic randomized trial that would provide strong evidence of whether this pilot scheme was effective in increasing the motivation of the children taking part.
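To show just how little is involved, the design described above can be sketched in a few lines of Python. Everything here is hypothetical: the child IDs, the function names, the seed and the idea of storing pre-test and post-test motivation scores in dictionaries are all assumptions for illustration, and no real data are used.

```python
import random
from statistics import mean


def allocate(child_ids, seed=None):
    """Randomly split the children into intervention and delayed-control groups."""
    rng = random.Random(seed)
    shuffled = list(child_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": shuffled[:half],
        "delayed_control": shuffled[half:],
    }


def mean_gain(pre, post, ids):
    """Average change in motivation score (post-test minus pre-test) for a group."""
    return mean(post[child] - pre[child] for child in ids)


def effect_estimate(pre, post, groups):
    """Difference in mean gain: intervention group minus delayed-control group."""
    return (mean_gain(pre, post, groups["intervention"])
            - mean_gain(pre, post, groups["delayed_control"]))


# Hypothetical IDs 1..63, one per child in the pilot scheme.
groups = allocate(range(1, 64), seed=2024)
print(len(groups["intervention"]), "children start the scheme now;",
      len(groups["delayed_control"]), "act as controls and start later")
```

Once the pre-test and post-test scores have been collected, `effect_estimate(pre, post, groups)` gives the difference in average gain between the two groups, which would then need a significance test of its own before any claims were made.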

So if randomized trials are so simple to organize and run, then why do researchers still opt, with depressing frequency, for flawed evaluative designs like this? I have offered some possible answers to this question in my editorial for the first issue of the new journal Effective Education, which can be accessed free online at: http://www.informaworld.com/effectiveeducation. Whatever the reason, it is surely a telling indictment that studies like the one described here are still being produced when so much commitment has been expressed, and so much effort made, to build research capacity in education. Teaching the basics of evaluative research designs should be a core element of all undergraduate and postgraduate research training. After all, doesn’t the question of whether an educational programme is effective or not represent one of the basic and fundamental questions that educational research should be seeking to answer? The fact that educational researchers are routinely failing to receive basic training in simple evaluative techniques is therefore indefensible.
