Where they are feasible, randomised trials are generally the most reliable tool we have for finding out which of two interventions works better. We simply take a group of children, or schools (or patients, or people); we split them into two groups at random; we give one intervention to one group, and the other intervention to the other group; then we measure how each group is doing, to see whether one intervention achieved its intended outcome better than the other.
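The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`randomise`, `difference_in_means`), the fixed seed, and the idea of storing outcomes in a dictionary keyed by school are all invented here for the example, and a real trial would also need a proper statistical test of the difference.

```python
import random


def randomise(units, seed=0):
    """Split units into two arms of (nearly) equal size, purely at random."""
    rng = random.Random(seed)       # fixed seed so the allocation can be reproduced
    shuffled = list(units)          # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean(values):
    return sum(values) / len(values)


def difference_in_means(outcomes, arm_a, arm_b):
    """Average outcome in arm A minus average outcome in arm B."""
    return mean([outcomes[u] for u in arm_a]) - mean([outcomes[u] for u in arm_b])


# Hypothetical usage: ten schools, allocated before any outcome is measured.
schools = [f"school_{i}" for i in range(10)]
arm_a, arm_b = randomise(schools)
```

The important design choice is that the allocation depends on nothing except the random shuffle: nobody, including the people running the trial, gets to choose which school receives which intervention.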
This is how medicines are tested, and in most circumstances it would be regarded as dangerous for anyone to use a treatment today without ensuring that it had been shown to work well in a randomised trial. Trials are not only used in medicine, however, and it is common to find them being used in fields as diverse as web design, retail, government, and development work around the world.
For example, there was a longstanding debate about which of two competing models of “microfinance” schemes was better at getting people out of poverty in India, whilst ensuring that the money was paid back, so it could be re-used in other villages: a randomised trial compared the two models, and established which was better.
At the top of the page at Wikipedia, when they are having a funding drive, you can see the smiling face of Jimmy Wales, the founder, on a fundraising advert. He’s a fairly shy person, and didn’t want his face to be on these banners. But Wikipedia ran a randomised trial, assigning visitors to different adverts: some saw an advert with a child from the developing world (“she could have access to all of human knowledge if you donate…”); some saw an attractive young intern; some saw Jimmy Wales. The adverts with Wales got more clicks and more donations than the rest, so they were used universally.
It’s easy to imagine that there are ways around the inconvenience of randomly assigning people, or schools, to one intervention or another: surely, you might think, we could just look at the people who are already getting one intervention, or another, and simply monitor their outcomes to find out which is the best. But this approach suffers from a serious problem. If you don’t randomise, and just observe what’s happening in classrooms already, then the people getting different interventions might be very different from each other, in ways that are hard to measure.
For example, when you look across the country, children who are taught to read in one particularly strict and specific way at school may perform better on a reading test at age 7, but that doesn’t necessarily mean that the strict, specific reading method was responsible for their better performance. It may just be that schools with more affluent children, or fewer social problems, are more able to get away with using this (imaginary) strict reading method, and their pupils were always going to perform better on reading tests at age 7.
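A small simulation can make this concrete. Everything in the sketch below is made up for illustration: the numbers, the `simulate` function, and the assumption that test scores depend only on affluence. The imaginary strict reading method is given no effect at all, yet simply observing which schools already use it makes it look effective, while tossing a coin for each school does not.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate(n_schools=2000, seed=1):
    """Return (observational gap, randomised gap) in average reading score."""
    rng = random.Random(seed)

    # Assumption: each school has an affluence level, and scores depend on
    # affluence plus noise. The reading method itself adds nothing.
    affluence = [rng.gauss(0, 1) for _ in range(n_schools)]
    scores = [50 + 10 * a + rng.gauss(0, 5) for a in affluence]

    # Observational world: more affluent schools are more likely to have
    # adopted the strict method already.
    observed = [a + rng.gauss(0, 1) > 0 for a in affluence]

    # Randomised world: a coin toss decides, regardless of affluence.
    randomised = [rng.random() < 0.5 for _ in range(n_schools)]

    def gap(uses_method):
        treated = [s for s, u in zip(scores, uses_method) if u]
        control = [s for s, u in zip(scores, uses_method) if not u]
        return mean(treated) - mean(control)

    return gap(observed), gap(randomised)


observational_gap, randomised_gap = simulate()
print(f"observational gap: {observational_gap:.1f} points")
print(f"randomised gap:    {randomised_gap:.1f} points")
```

Under these assumptions the observational comparison reports a sizeable gap in favour of the method, while the randomised comparison reports a gap close to zero, which is the true effect built into the simulation.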
This is also a problem when you are rolling out a new policy, and hoping to find out whether it works better than what’s already in place. It is tempting to look at results before and after a new intervention is rolled out, but this can be very misleading, as other factors may have changed at the same time. For example, if you have a “back to work” scheme that is supposed to get people on benefits back into employment, it might get implemented across the country at a time when the economy is picking up anyway, so more people will be finding jobs, and you might be misled into believing that it was your “back to work” scheme that did the job (at best, you’ll be tangled up in some very complex and arbitrary mathematical modelling, trying to discount for the effects of the economy picking up).
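The same point can be shown with a toy simulation. The rates below, and the function names `employment_rate` and `simulate_rollout`, are invented for this sketch. The scheme is given no effect whatsoever, but the economy improves between the two measurements, so a before-and-after comparison credits the scheme with the economy's work. Comparing against a randomised control group measured at the same time does not.

```python
import random


def employment_rate(n_people, chance_of_work, rng):
    """Fraction of n_people who find work, each with the same chance."""
    return sum(rng.random() < chance_of_work for _ in range(n_people)) / n_people


def simulate_rollout(n_people=20000, seed=2):
    """Return (before/after difference, scheme-vs-control difference)."""
    rng = random.Random(seed)

    # Made-up rates: the economy picks up between the two periods,
    # and the scheme itself changes nothing.
    chance_before = 0.30
    chance_after = 0.40

    before = employment_rate(n_people, chance_before, rng)

    # After roll-out, half are randomised to the scheme and half to the
    # old arrangements. Both halves face the same improved economy.
    scheme = employment_rate(n_people // 2, chance_after, rng)
    control = employment_rate(n_people // 2, chance_after, rng)

    return scheme - before, scheme - control


before_after, versus_control = simulate_rollout()
print(f"before/after difference:   {before_after:+.3f}")
print(f"scheme vs control (same time): {versus_control:+.3f}")
```

Because the control group lives through the same economic upturn as the scheme group, the upturn cancels out of the comparison without any modelling of the economy at all.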
Sometimes people hope that running a pilot is a way around this, but this is also a mistake. Pilots are very informative about the practicalities of whether your new intervention can be implemented, but they can be very misleading on the benefits or harms, because the centres that participate in pilots are often different to the centres that don’t. For example, job centres participating in a “back to work” pilot might be less busy, or have more highly motivated staff: their clients were always going to do better, so a pilot in those centres will make the new jobs scheme look better than it really is. Similarly, running a pilot of a fashionable new educational intervention in schools that are already performing well might make the new idea look fantastic, when in reality, the good results have nothing to do with the new intervention.
This is why randomised trials are the best way to find out how well a new intervention works: because everyone is randomly allocated from the same pool, the pupils or schools getting a new intervention are, on average, comparable to the pupils and schools still getting the old one, so any difference in outcomes can be attributed to the intervention rather than to pre-existing differences between the groups.
This is an excerpt from Test Learn Adapt.