This post has been updated to include more information about the evaluation work done by GAVI, the Vaccine Alliance.
It seems like a no-brainer. Before you spend big bucks on a massive effort to improve life for the world's poorest (say, distributing millions of free bed nets against malarial mosquitoes, or offering thousands of women microloans as small as $200 to start small businesses), you should run a smaller-scale test to make sure the idea actually works. After all, just because a project sounds good in theory doesn't mean it's going to pan out in practice.
For instance, what if giving out the bed nets for free makes people less likely to value them? Maybe you should charge a fee on the theory that while fewer people would get the nets, those who do will be the ones who see a need for them and will therefore take the trouble to actually use them.
And what if some totally different method would achieve better results for less money? For instance, maybe the key to lifting women's incomes isn't helping them start a small business but helping them land a salaried job.
Yet for decades, questions like these have been left unanswered. Instead, health and development aid for the world's poorest has largely been designed based on what seems reasonable, rather than what can be proved with hard evidence.
A New Movement
Since the early 2000s, however, a growing movement of social science researchers has been pushing policy-makers to do "impact evaluations" of their programs. That's a phrase used in the world of aid that means checking whether your program is achieving its ultimate objective: say, raising incomes or reducing disease.
In particular, these scientists have been arguing for the use of what they call the gold standard of proof: the "randomized controlled trial." In an RCT you randomly divide the people you're studying into at least two groups. One gets the intervention you want to test. The second, an otherwise identical "control group" of subjects, doesn't get the intervention. Then you compare the results for each group to see what difference, if any, the intervention made.
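For readers who want to see the mechanics, the two steps described above (random assignment, then a comparison of group results) can be sketched in a few lines of Python. This is a minimal illustration, not how any particular research group runs its trials: the household numbers, the simulated outcomes and the built-in effect of 10 are all made up, and the function names `assign_groups` and `estimated_effect` are hypothetical.

```python
import random
from statistics import mean


def assign_groups(participants, seed=0):
    """Randomly split participants into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(outcomes, treatment, control):
    """Average outcome in the treatment group minus the average in the control group."""
    return mean(outcomes[p] for p in treatment) - mean(outcomes[p] for p in control)


# Hypothetical study: 1,000 households, identified only by number.
households = range(1000)
treatment, control = assign_groups(households, seed=42)

# Simulated outcomes, invented for illustration: every household gets a noisy
# baseline score, and treated households get an extra 10 points on top.
rng = random.Random(7)
treated = set(treatment)
outcomes = {
    h: rng.gauss(100, 15) + (10 if h in treated else 0)
    for h in households
}

effect = estimated_effect(outcomes, treatment, control)
print(f"Estimated effect of the intervention: {effect:.1f}")
```

Because chance alone decides who lands in which group, the two groups should look alike on average in everything except the intervention, so the gap between their averages is a reasonable estimate of what the intervention did. A real evaluation would also report how uncertain that estimate is.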
Over the last decade there's been an explosion in the number of RCTs being done to measure health and anti-poverty efforts, and they've helped settle some major debates about what works and what doesn't. (As it turns out, offering bed nets for free, as opposed to at a price, appears to be extremely effective. On the other hand, while microloans may have all sorts of uses, the evidence suggests that lifting people's incomes over the long term is not one of them.)
The Worries Of 'Randomistas'
Despite these successes, the researchers who advocate this approach (they're sometimes called "randomistas") also worry that RCTs are still not being deployed frequently enough, and that even when they are done, policy-makers often fail to apply the lessons.
This sense of mixed progress was evident at a recent conference organized by the Center for Global Development, a Washington, D.C., think tank, where some of the most prominent randomistas gathered to take stock.
Just ten years ago one of the most active centers of RCT work was running about 70 impact evaluations worldwide. Today the number it has completed or currently has underway tops 800. That's according to Abhijit Banerjee, a professor of economics at MIT who helped found the center: the Abdul Latif Jameel Poverty Action Lab, or J-PAL. Started in 2003, J-PAL is a network of affiliated researchers at nearly 50 universities who set up RCTs in the fields of global health and poverty.
And when you include the work of groups beyond J-PAL, the number of impact evaluations of global health and poverty programs that are completed and published each year has risen steadily in the last decade from about 50 per year to 500 per year, said Emmanuel Jimenez. He's director of the International Initiative for Impact Evaluation, or 3ie, an NGO that maintains a searchable database of findings and has provided $83 million to fund studies since 2008.
Rachel Glennerster, Executive Director of J-PAL, credits the rise of RCTs not just to funding organizations like 3ie but also to other research nonprofits that conduct them. Today, she said, major anti-poverty players such as the World Bank and USAID (the main U.S. government agency responsible for development programs) have departments that use impact evaluations in one way or another.
"What encourages me is that we've built a whole kind of ecosystem of groups who are trying to move this forward," said Glennerster.
But like other randomistas, she also worried that the number of RCTs is still paltry compared to the number of development programs that governments, international organizations and NGOs are carrying out.
Even at the World Bank and USAID, only a small portion of projects are subject to impact evaluations, agreed Amanda Glassman, chief operating officer and senior fellow at the Center for Global Development. Every year, her group does an exhaustive review to identify large-scale health programs that made a big impact. Of about 250 that they looked through this past year, "only 50 used rigorous methods to establish the attributable impact. And none of the very largest programs in global health had done any impact evaluation" of the type she argues is needed. That includes two major international nonprofit organizations: the Global Fund to Fight AIDS, Tuberculosis and Malaria, and GAVI, the Vaccine Alliance.
This doesn't mean the health products that these health programs use — medicines or vaccines, for instance — haven't been proven effective through, say, medical trials or studies of what happens to the incidence of disease when you vaccinate a certain population, explained Glassman.
Officials at GAVI note that the organization also tracks the increase in vaccination rates and decline of diseases in areas where it works, using a number of official data sources. Measuring impact "is a major part of how the organization operates," says Hope Johnson, director of Monitoring and Evaluation for GAVI.
Glassman says that's not enough when "the challenge isn't just the biological effect of a pill or vaccine but how to get those pills or vaccines to those who need them." One question, for instance: Is it more effective to do an intensive one-week campaign in which health workers armed with vaccines fan out across a community than to provide routine vaccinations at health clinics?
Is Attention Being Paid?
Then there's the question of how much attention policy-makers are paying to the results of the RCTs that are being done. Banerjee noted that RCTs have at least already "fundamentally changed our understanding" of some key issues in aid: the limits of microloans as a tool for ending poverty, and the advisability of offering not just bed nets but all sorts of other preventive health products, like deworming pills and chlorine treatments for water, for free or at heavily subsidized prices.
But in many cases, the information generated by RCTs isn't used to improve aid. Jimenez, of 3ie, described an internal review done by the World Bank — where he used to work — which found that only about half of impact evaluations done on Bank projects were even cited in the final reports on those projects.
So why do some RCTs make an impact while others vanish without a trace? One important lesson: collaboration with local governments is critical. Researchers need to work more directly with the policy-makers who implement aid programs, said Jimenez.
Several speakers at the conference described successful experiences doing this: A team from J-PAL has worked with Indonesia's government to test and then roll out measures to curb corruption in a rice distribution program that serves 66 million people. And researchers from the nonprofit institute RTI have been helping the government of Kenya design new teaching techniques to improve reading and math skills in elementary schools.
To make these partnerships with policy-makers work, said Jimenez, researchers might sometimes need to put their personal career interests on the back burner. For instance, researchers often prefer not to publicize their results until they're ready for publication in a prominent journal. But that can take months. Instead, said Jimenez, researchers need to be "getting results out when the decision-makers need it."
J-PAL's Banerjee said that figuring out how to collaborate with governments is such a priority that J-PAL recently launched a whole branch dedicated to doing just that, called the Government Partnerships Initiative. Otherwise, he said, "a lot of good ideas don't get implemented. And I think that's really a tragedy."
SCOTT SIMON, HOST:
Here's an idea that sounds pretty obvious. Before you spend lots of money on an aid program, wouldn't you want to run a test to make certain the program actually works? If you want to help farmers, should you buy them seeds or just give them cash? If you're trying to protect people from malaria, should you hand out millions of bed nets? For years, the answers to those questions were usually taken on faith. NPR's Nurith Aizenman reports on a growing effort to change that.
NURITH AIZENMAN, BYLINE: The world has spent billions to help the planet's poorest people. But in a lot of situations, it wasn't clear how much of a difference that was making. Amanda Glassman is chief operating officer of the Center for Global Development, a Washington, D.C., think tank.
AMANDA GLASSMAN: In the best-case scenario, people were measuring what the situation was on the ground before a program happened and after a program happened. But the problem was you really didn't know whether - when things changed - it was because of the program or because of something else.
AIZENMAN: Like good weather or overall improvements to the economy. And Glassman says even those types of studies were rare.
GLASSMAN: In many cases, they weren't collecting any data at all on what the impact of the program was.
AIZENMAN: So program designers largely based their decisions on what seemed logical. Take distributing bed nets in malaria-prone areas. It became a major priority because lab studies showed the nets really do cut down on mosquitoes. The trouble?
GLASSMAN: What really matters is not just whether the technology - in this case, the bed net - works but whether people actually use it.
AIZENMAN: In fact, on this very point, there was a big debate. See, even though bed nets weren't that expensive, people weren't buying them. Some aid experts argued this meant you should hand out the nets for free. Others said maybe the reason people don't buy them is because they don't think they're that helpful. Abhijit Banerjee is an economist at MIT.
ABHIJIT BANERJEE: You could argue that, look, if they don't believe in them, and you give it to them, they won't use it.
AIZENMAN: To settle a question like this, says Banerjee, you need a kind of experiment that's the gold standard in science, a randomized, controlled trial. Essentially, you randomly assign the people you want to study to two groups. One gets access to the program or, quote, "treatment" you want to test. The other group does not.
BANERJEE: If I now observe a difference between them, then it's much more likely that that's because of the treatment.
AIZENMAN: In 2003 Banerjee co-founded a network of researchers at nearly 50 universities dedicated to doing work like this. And by the late 2000s, some groundbreaking results were coming in. On the bed nets, they found that families that had gotten nets for free were just as likely to use them. It turns out that extremely poor people are just really sensitive to price, even for lifesaving products. For aid groups, this experiment was a game changer.
BANERJEE: Many organizations were very, very reluctant to go to zero price. And I think the weight of this evidence has moved a lot of them.
AIZENMAN: A lot of programs now offer bed nets for free or highly subsidized prices. Findings like this have fueled a surge in randomized, controlled trials. Banerjee's network alone has done more than 800 of them. And everyone from governments to the World Bank does these studies now. But the Center for Global Development's Amanda Glassman says it's still a drop in the bucket.
GLASSMAN: We and some colleagues here have been looking at what share of the total aid portfolio is subject to these more rigorous methods. And it's such a small share. It's less than 5 percent still.
AIZENMAN: This includes major areas of U.S. spending.
GLASSMAN: For example, we have a huge HIV/AIDS program. It really matters whether the pills that we distribute are taken in the correct way.
AIZENMAN: Like, is it worth having a health worker actually watch patients take their pills, or is just giving people instructions enough?
GLASSMAN: There's just very few experiments that look at - what does that the best?
AIZENMAN: So, Glassman says, until these experiments take on a larger role in aid decisions, when it comes to helping the poor, mostly, we're still flying blind. Nurith Aizenman, NPR News. Transcript provided by NPR, Copyright NPR.