Development Experiments: Ethical? Feasible? Useful?

A new kind of development research in recent years involves experiments: there is a “treatment group” that gets an aid intervention (such as a de-worming drug for school children), and a “control group” that does not. People are assigned randomly to the two groups, so there is no systematic difference between the two groups except the treatment. The difference in outcomes (such as school attendance by those who get deworming vs. those who do not) is a rigorous estimate of the effect of treatment. These Randomized Controlled Trials (RCTs) have been advocated by leading development economists like Esther Duflo and Abhijit Banerjee at MIT and Michael Kremer at Harvard.
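
To make the design concrete, here is a minimal sketch of the logic in Python. The sample size, baseline attendance rate, and treatment effect are invented for illustration, not figures from any actual trial:

```python
import random
import statistics

# Illustrative only: a simulated deworming RCT with invented numbers.
random.seed(0)

children = list(range(1000))
random.shuffle(children)       # random assignment: no systematic
treatment = children[:500]     # difference between the two groups
control = children[500:]

def attended(treated):
    # Hypothetical attendance probability: 80% baseline, +5 points if treated.
    return random.random() < (0.85 if treated else 0.80)

treated_outcomes = [attended(True) for _ in treatment]
control_outcomes = [attended(False) for _ in control]

# The difference in mean outcomes estimates the effect of treatment.
effect = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(f"Estimated effect on attendance: {effect:+.3f}")
```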

Others have criticized RCTs. The most prominent critic is the widely respected dean of development research and current President of the American Economic Association, Angus Deaton of Princeton, who released his Keynes lecture on this topic earlier this year, “Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development.” Dani Rodrik and Lant Pritchett have also criticized RCTs.

To drastically oversimplify and boil down the debate, here are some criticisms and responses:

1. What are you doing experimenting on humans?

Denying something beneficial to some people for research purposes seems wildly unethical at first. RCT defenders point out that there is never enough money to treat everyone, and drawing lots is a widely recognized method for fairly allocating scarce resources. And when you find that something works, it can then be adopted and benefit everyone who participated in the experiment. The same issues arise in medical drug testing, where RCTs are generally accepted. Still, RCTs can cause hard feelings between treatment and control groups within a community or across communities. Given the concerns of this blog with the human dignity of the poor, researchers should at least be careful to communicate to the individuals involved what they are up to and always get their permission.

2. Can you really generalize from one small experiment to conclude that something “works”?

This is the single biggest concern about what RCTs teach us. If you find that using flipcharts in classrooms raises test scores in one experiment, does that mean that aid agencies should buy flipcharts for every school in the world? Context matters – the effect of flipcharts depends on the existing educational level of students and teachers, availability of other educational methods, and about a thousand other things. Plus, implementing something in an experimental setting is a lot easier than having it implemented well on a large scale. Defenders of RCTs say you can run many experiments in many different settings to validate that something “works.” Critics worry about the feeble incentives for academics to do replications, and say we have little idea how many or what kind of replications would be sufficient to establish that something “works.”
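
When replications do exist, the standard way to combine them is a meta-analysis. Below is a minimal sketch of fixed-effect, inverse-variance pooling; the settings, effect estimates, and standard errors are all invented for illustration:

```python
# A minimal sketch of inverse-variance (fixed-effect) pooling of replications.
# The settings, effect estimates, and standard errors are all invented.
studies = [
    ("rural Kenya", 0.12, 0.05),   # (setting, estimated effect, standard error)
    ("urban India", 0.03, 0.04),
    ("rural Peru", 0.08, 0.06),
]

weights = [1 / se**2 for _, _, se in studies]
pooled = sum(w * est for (_, est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5

print(f"Pooled effect: {pooled:.3f} (standard error {pooled_se:.3f})")
```

Note that pooling like this assumes a single true effect across settings, which is exactly the assumption the critics dispute; when context genuinely changes the effect, a random-effects model, or no pooling at all, is the more honest choice.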

3. Can you find out “what works” without a theory to guide you?

Critics say this is the real problem with issue #2. The dream of getting pure evidence without theory is usually unattainable. For example, you need a theory to guide you as to what determines the effect of flipcharts to have any hope of narrowing down the testing and replications to something manageable. The most useful RCT results are those that confirm or reject a theory of human behavior. For example, a general finding across many RCTs in Africa is that demand for free life-saving products collapses once you charge a price for them (even a low subsidized price). This refutes the theory that fully informed people are rationally purchasing low cost medical inputs to improve their health and working capacity. This would usefully lead to further testing of whether the problem is lack of information or the assumption of perfect rationality (the latter is increasingly questioned for rich as well as poor people).

4. Can RCTs be manipulated to get the “right” results?

Yes. One could search among many outcome variables and many slices of the sample for results. One could investigate in advance which settings were more likely to give good results. Of course, scientific ethics prohibit these practices, but they are difficult to enforce. These problems become more severe when the implementing agency has a stake in the outcome of the evaluation, as could happen with an agency whose program will receive more funding when the results are positive.
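
The first practice, searching among many outcome variables, is easy to demonstrate with a simulation. In the sketch below (all parameters are illustrative), the treatment has no true effect on anything, yet a nominal 5% test still flags roughly one "significant" outcome per experiment:

```python
import random

# Pure noise: no true treatment effect on any outcome. With 20 outcome
# variables and a nominal 5% test, we still expect about one "significant"
# finding per experiment. All parameters here are illustrative.
random.seed(1)

def noise_experiment(n_outcomes=20, per_group=100):
    hits = 0
    for _ in range(n_outcomes):
        treat = [random.gauss(0, 1) for _ in range(per_group)]
        ctrl = [random.gauss(0, 1) for _ in range(per_group)]
        diff = sum(treat) / per_group - sum(ctrl) / per_group
        se = (2 / per_group) ** 0.5   # both groups have SD 1 by construction
        if abs(diff / se) > 1.96:     # two-sided test at the 5% level
            hits += 1
    return hits

runs = [noise_experiment() for _ in range(500)]
print("Average 'significant' outcomes per experiment:", sum(runs) / len(runs))
print("Share of experiments with at least one:", sum(r > 0 for r in runs) / len(runs))
```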

5. Are RCTs limited to small questions?

Yes. Even if problems 1 through 4 are resolved, RCTs are infeasible for many of the big questions in development, like the economy-wide effects of good institutions or good macroeconomic policies. Some RCT proponents have (rather naively) claimed RCTs could revolutionize social policy, making it dramatically more effective; ironically, this claim itself cannot be tested with RCTs. That aside, embracing RCTs has led development researchers to lower their ambitions. This is probably a GOOD thing in foreign aid, where outsiders cannot hope to induce social transformation anyway and just finding some things that work for poor people is a reasonable outcome. But RCTs are usually less relevant for understanding overall economic development.

Overall, RCTs have contributed positive things to development research, but their proponents seem to oversell them and to be overly dogmatic about this kind of evidence being superior to all other kinds.

Comments

Alanna:

The one thing that makes me nervous about RCTs in development - and this is a quibble - is that in a normal project your goal would be to have the largest impact possible. Even if you are working in just one place, you try to have some impact everywhere. In a trial, though, you need to specifically avoid "leakage" of your impact. That strikes me as a distorting factor in intervention design.

Omair:

Great post - it was nice to see a discussion on the absence of theory in RCTs without emotional Deatonesque lamentations over the decline of price theory.

With regard to your point number 1 though, don't some RCTs guarantee all groups treatment, but delay it for the control groups long enough that they can study the effects? This seems like a much more equitable way of implementing an RCT, if affordable. I'd imagine local leaders and private individuals would also be more receptive to this.

Matt:

Great post - another thing which is related to what Alanna said:

Beyond the fact that they are extremely targeted interventions, RCTs will usually be carried out as carefully and efficiently as possible, in order to maximize the possible impact.

In ordinary life, when these interventions are actually implemented as policy, they can't possibly remain as efficient and targeted as an RCT. Would you expect, for example, Ugandan health authorities to be as effective at de-worming as J-PAL?

Mike:

A very nice summary of the debate. It seems to me, though, that with this post Prof. Easterly withdraws some of his (what seemed to be) enthusiastic support for RCTs as an invaluable aid tool in 'The White Man's Burden.'

Having arrived at this blog post via Twitter retweets, I was expecting this piece to be much more one-sided against the use of RCTs in development, but I think all the above points are valid concerns.

At the same time, using RCTs, or at least non-randomized control groups, does have a very practical role for aid agencies implementing new programmes.

If we believe a new intervention may have a positive effect (say, because it worked in another country) but we are not fully sure it will work, or how to fine-tune it to the current situation, we might well want to do a pilot project before advocating going countrywide. But how will we know if the positive effect we see from the pilot is actually due to our intervention and not due to some other change in the country? Here's where control groups come in handy to help disentangle these issues. (However, this clearly doesn't mean you can then generalize globally!)

Your issue of dignity and ethics is an important one - and often raised against RCTs. Just to point out that inflicting an untested intervention on an entire population without trying to collect information on counterfactuals isn't any more ethical than doing trials where some don't "benefit" from the new intervention.

April:

Great post. The way I see it, RCTs are fabulous for the confidence you can have in their results. However, from looking at the RCTs that have been done on health and (much less so) education interventions in developing countries I have this observation: many are less useful than they might be because they are often designed and the findings interpreted without the benefit of deep knowledge of the theory and empirical evidence about health and education systems in developing countries. Many of the RCT studies appear to be done by people or teams of people with more expertise in methodology than health or education systems. I think there would be much more benefit from RCTs if the teams who conduct the studies were consciously formed to bring in the (often) missing expertise. Paul Gertler's studies in health and education are exceptions - and they illustrate the value of bringing the two knowledge bases together. They very often try, and succeed, in answering policy-relevant questions.

Not to pick on Cohen and Dupas - but their net price paper http://www.brookings.edu/papers/2007/12_malaria_cohen.aspx
only looked at what happens to demand at different prices. There is much evidence to indicate that pricing also affects the supply response (and, over time, the density of private supply networks via crowding out). And to know the overall effect of a price change, immediately and over time, you'd need to look at how these effects come together.
See Mead Over's blog entry at the CGD Global Health blog on the topic:
http://blogs.cgdev.org/globalhealth/2009/06/zero-prices-are-special-for-providers-also-%E2%80%93-but-not-in-a-good-way.php

I think if they had known more about these factors ahead of time, they might have been able to design their trial to take them into account. At a minimum, they would not have misunderstood, and somewhat misrepresented, the implications for policy of their results.
Take-away message: for those of us who can influence study designs and the makeup of research teams, let's do what we can to bring together the people with expertise in RCTs and the people with health and education (and other) systems expertise. That way we will get the most value possible from these fine methodological tools.

Live:

Good post, really.

An RCT is not useful simply for seeing whether something works. As Nagel pointed out, the defining nature of science is to generate propositions that can be disproved. Without a theory that can generate a disprovable hypothesis, it's merely gathering data in the hope that a pattern emerges.

The more fruitful approach in development is the proof of concept project with continuous interventions and improvements.

That's the model we've seen with most successful start-ups. As they scale, the improvements continue, the effects are analyzed, and the knowledge gained from ever more effective interventions leads to actionable intelligence for getting from here to there.

Nancy:

The phase-in approach for the treatment is one way to ensure that everyone eventually gets the treatment. Refer to the India balsakhi RCT for a clever phase-in approach (balsakhis = teaching assistants/tutors), where all schools received the treatment (balsakhis) but in such a way that there were still control groups.
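
For readers unfamiliar with the design, here is a minimal sketch of how such a randomized phase-in can work; the twelve schools and three-year rollout are hypothetical, not the actual balsakhi schedule:

```python
import random

# Hypothetical phase-in: every school eventually gets the program, but the
# start year is randomized, so not-yet-treated schools serve as controls.
random.seed(2)

schools = [f"school_{i:02d}" for i in range(12)]
random.shuffle(schools)

n_waves = 3
waves = [schools[i::n_waves] for i in range(n_waves)]   # random cohorts

for year in range(1, n_waves + 1):
    treated = [s for wave in waves[:year] for s in wave]
    controls = [s for wave in waves[year:] for s in wave]
    print(f"Year {year}: {len(treated)} treated, {len(controls)} still controls")
```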

At the J-PAL training, they recommend doing RCTs for programs that are NOT gold-plated (i.e., that can be replicated) and that have been running for a while (i.e., not in the pilot stage), especially in cases where there are possible plans to scale up the program further. You are always going to run into scaling problems with programs, regardless of whether you use an RCT.

In terms of manipulating results, you can do that in any type of analysis, so this concern does not apply exclusively or specifically to RCTs.

Without even intentionally manipulating results, non-experimental methods can yield more misleading results than RCTs. If you run a simple multiple regression with 40 independent variables, chances are that 2 of those variables will be statistically significant just by luck of the draw (5% significance level), even if, in reality, they have no effect on the dependent variable and would not show up as significant in another sample.

I think the post (and several of the comments) misattributes to the researchers who run RCTs some of the enthusiasm and claims that actually come from the non-researchers who use them to advocate for certain solutions.

I've never read an RCT paper that didn't: 1) exhaustively discuss how the experiment came to an equitable and ethical method for randomization (and of course most academic research requires this to pass the human subjects approval panels), 2) exhaustively discuss the limited application of the results and the need for replication in more contexts, and 3) have a thorough grounding in theory.

In fact, if you have a policy agenda, talking to Duflo, Kremer, Karlan et al. can be maddening, since it's virtually impossible to get them to commit to a policy recommendation beyond the local context of an experiment they've run.

To paraphrase Don Marquis, neither RCTs nor many of the leading researchers are responsible for those who claim to believe in them.

In response to Matt specifically -- neither J-PAL, IPA, nor any of the other randomistas actually runs any interventions. The interventions are all run by NGOs as part of their programs. Sure, one NGO may be more effective than another at running a particular intervention, but that applies to scaling anything, and so it doesn't have any bearing on RCTs or their usefulness or validity.

Fer:

Ey, we almost all know RCTs are not a panacea, simply "hard evidence", which has been one of Easterly's core ideas, right? Forget panaceas!

But most comments on this post seem to me a bit unbalanced. Manipulation? Well, it's possible in RCTs, but the risk is higher in non-RCTs, don't you think? This blog has also covered Collier's data mining and Rodrik's industrial policy views. So maybe publication bias lies elsewhere than in RCTs...

And finally, let's take a view from macroeconomics: corruption is bad for development. Nice! Always? Which corruption? Where? How to attack it, and when? We can't translate anything about long-run growth determinants into short-run policy action with any cross-country study. So macroeconomics (with its weak, false, or misguided foundations currently debated in the crisis) is not that useful either. Where is the complementarity between the two branches? Maybe we will never know! Maybe Easterly should have warned the reader (just a bit!) with some of his own words: don't rely too much on economists (especially outsiders and planners); we don't know anything apart from a few things like macro management...

Bill Savedoff:

Why are we debating whether RCTs are a good idea or not outside of a specific question or context? It is like having a debate about pilot programs. Are pilot programs a good idea? Oh no (insert sarcasm here). Aren't pilot programs a bad idea? They are unethical (some people don't get them), irrelevant (who knows if they could ever be scaled up), and they so often lack any theoretical model to justify why they might work.

I'm astonished that Bill Easterly keeps ragging on RCTs this way when he did such a good job of pointing out the need to get away from the 'planning' mode and into the 'searching' mode of development. RCTs are one (and ONLY one) tool for making the searching process more systematic. What benefit do we get by denigrating a useful method?

Babur:

Points 2, 3, and 4 seem rather silly to me as effective criticisms of this kind of study. RCTs, presumably, are an idea drawn from the "hard" sciences and are a kind of minimum standard in medical sciences for gathering empirical evidence on the efficacy of an intervention. If you leveled these criticisms at, say, clinical trials in medicine, you'd be laughed at.

#2: Sure, there is difficulty in generalizing RCT results. There is difficulty in generalizing the result of any experiment. So? That just means that the bar for "convincing" evidence has to be set a little higher, with several RCTs showing similar results, and with RCTs performed in different environments with different modifying variables. This is the way it is done in the sciences - one experiment is not convincing, but may be promising, and after several experiments people begin to be convinced. As for "feeble incentives" for academics to do replications, well, that is the fault of the academics. If they take something at face value with only one experiment, there are no incentives. If no one believes them until it has been proven with large data sets and meta-analysis, then there are incentives.

#3: This is just absurd. The whole point of doing empirical research is to try to gather data to prove or undermine theoretical models. If an experiment provides evidence for some extant theoretical model, then it makes that model stronger. If the results defy existing theory, well, that means that the theory is inadequate, and the academic types can use the data to try to develop new models. Once again, this is done in the sciences regularly. You start out designing experiments based on some theory you subscribe to, or in order to fill a gap in some existing theory, and the results of the experiment guide the development of the theory, which in turn guides further experimentation, etc.

#4: Sure, that is true. But these are just poor RCTs. If people are accepting the results of experiments performed by people with clear conflicts of interest, that is a problem. But the problem is then with the established standards for accepting evidence, not with the use of RCTs. If RCTs are going to be used, one has to apply appropriate rigor in determining their validity and accepting the results as valid. If this is not happening, then that is a huge institutional flaw in the way development and aid is done, which needs to be corrected ASAP.

Min:

"Denying something beneficial to some people for research purposes seems wildly unethical at first."

Sampling techniques such as "play the winner" can minimize ethical concerns. As evidence accumulates that something is beneficial, subjects become more likely to receive it. The statistics are not as easy as in a simple treatment-vs.-control model, but statistics can handle it. :)
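
For the curious, here is a minimal sketch of a simplified randomized play-the-winner urn rule: each success adds a ball for the winning arm, so assignment probabilities drift toward whichever treatment is doing better. The success probabilities are invented:

```python
import random

# Simplified randomized play-the-winner urn. Each success adds a ball for
# the winning arm, so later subjects are more likely to be assigned to
# whichever arm is performing better. Success probabilities are invented.
random.seed(3)

TRUE_SUCCESS = {"treatment": 0.7, "control": 0.4}
urn = ["treatment", "control"]              # start with one ball per arm
assignments = {"treatment": 0, "control": 0}

for _ in range(200):
    arm = random.choice(urn)                # draw a ball to assign the subject
    assignments[arm] += 1
    if random.random() < TRUE_SUCCESS[arm]:
        urn.append(arm)                     # success: add a ball for that arm

print(assignments)                          # the better arm dominates over time
```

Classic versions of the rule also update the urn on failures; the point here is just that the allocation ratio adapts as evidence accumulates.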

"The most useful RCT results are those that confirm or reject a theory of human behavior. For example, a general finding across many RCTs in Africa is that demand for free life-saving products collapses once you charge a price for them (even a low subsidized price). This refutes the theory that fully informed people are rationally purchasing low cost medical inputs to improve their health and working capacity. This would usefully lead to further testing of whether the problem is lack of information or the assumption of perfect rationality (the latter is increasingly questioned for rich as well as poor people)."

Well, my first thought, as a layman, is that culture probably has something to do with that. Sharing is a much greater value in many African cultures than in ours. Charging money to save someone's life may be a rather strange concept in such cultures. And the "assumption of perfect rationality" is a misuse of the term, not to mention paternalistic in this context.

TGGP:

"Given the concerns of this blog with the human dignity of the poor, the researchers should..."

Why should the researchers be motivated by the concerns of this blog?

Personally, I think we need a hell of a lot more experimentation. Pull out all the stops and experiment away. There will be some short-term downsides, but the advances in knowledge outweigh them.
