Evaluating program components without randomization: Your data is richer than you think

Because poverty rarely stems from a single constraint, development programs often combine multiple interventions into a single package. For instance, PROGRESA in Mexico integrated health services, nutritional supplements, and economic support to address multiple barriers to progress. Similarly, the UK Girls’ Education Challenge IGATE-T program in Zimbabwe (for which Limestone Analytics led the evaluation) bundled teacher professional development, youth leadership clubs, curriculum changes, learning materials, community engagement, and bicycles to holistically improve academic outcomes.

When such a program works, implementers and funders are often left uncertain about which of the myriad components drive the impact and which add little value. This can result in bloated programs — a “kitchen sink” approach to program design, where components are included simply because they might matter.

How can one isolate the value added by a single program component, such as the provision of bicycles as part of an education program?

The Evaluation Challenge

Ideally, this question would be built into the evaluation design from the start. Through a randomized trial, for example, or a rule-based distribution that allows comparison across eligibility thresholds. But development programs are rarely implemented under such clean conditions.

As is often the case with development programs, these questions arise after implementation — after bicycles have already been distributed and program rollout has followed general guidelines rather than a tightly designed evaluation plan.

In such cases, can we still credibly learn about the added value of bicycles? Our recent paper in the Journal of Development Effectiveness shows that the answer to this question is often “yes.”

We show how the uneven, often ad hoc, implementation of program components within and across locations can be used to identify the impact of individual components. This article explains how we used this approach to assess the value added of bicycles as part of the IGATE-T project.

Measuring the added value of bicycles

The Bicycles for Education and Empowerment Program (BEEP) distributed over 7,400 bicycles to students as part of the much larger multi-component IGATE-T initiative. Bicycles may reduce commute time, improving attendance, safety, and time spent studying. Not everyone received a bicycle, but allocation was not random. Rather, implementation was ad hoc, as program managers tried to distribute the bicycles based on need while facing the realities of imperfect logistics and fieldwork.

In Zimbabwe, the implementers targeted schools with longer and less safe commutes. Schools then typically provided bicycles to students based on the distance to school. This non-random allocation meant that bicycle recipients were inherently different from non-recipients, making it impossible to estimate the impact of bicycles by comparing recipients and non-recipients.

The analysis used optimal full matching (OFM) to construct a comparison group of students who were similar to bicycle recipients in observable characteristics but did not receive bicycles. Unlike traditional matching approaches that discard many observations, OFM assigns weights across all individuals in the dataset to improve balance between groups.

The constructed comparison group effectively had similar observable characteristics, from similar communities, with similar exposure to non-bicycle components of the program as the bicycle-receiving treatment group. Differences in outcomes between these matched groups can therefore be interpreted as plausibly reflecting the added impact of bicycles

Below, we discuss the results of the bicycle analysis.

Key Findings: What Do Bicycles Actually Do?

By using OFM to isolate the bicycle component from the rest of the IGATE-T program, we found distinct impacts on attendance and empowerment, but not on test scores.

Bicycles Boost Attendance — The results are consistent with the theory of change: reducing the cost of commuting increases school participation. Girls who received a bicycle missed 1.04 fewer school days per month than their matched peers. This effect was most pronounced for girls living very far (1+ hours) or moderately far (16–30 minutes) from school.
And Empowerment — Beyond education, mobility matters for agency. We found significantly higher (0.2 SD) Youth Leadership Index empowerment scores for bicycle recipients. This suggests that independent transportation gives girls a greater sense of control and voice.
No Automatic Leap in Learning — Unlike previous studies of standalone bicycle programs in India and Zambia, which found improvements in test scores, we found no evidence that bicycles improved literacy or numeracy over the one-year period.
Why the difference? In our context, those who did not receive bicycles were already benefiting from the larger IGATE-T education program (teacher training, learning resources). This suggests that while bicycles increase school attendance, the marginal gains in learning may be harder to achieve when baseline educational quality is already improving through other interventions. Contextual differences between Zimbabwe and the India and Zambia settings may also help explain the difference.

Lessons for Policy and Practice

The most important contribution of this analysis is not about bicycles themselves. Rather, it demonstrates how organizations can learn about the marginal value of individual program components, even when programs were not originally designed for rigorous evaluation.

Learning from existing program data — Development programs are often implemented unevenly or inconsistently across locations. Instead of viewing these implementation differences as a limitation, evaluators can potentially leverage them to learn which components generate measurable benefits. Approaches such as OFM enable the extraction of credible insights from administrative data collected during program delivery.

Disentangling bundled interventions — Many development programs combine multiple components, such as training, resources, community engagement, infrastructure, and more. When programs succeed, it can be difficult to determine which elements are essential and which add cost without improving outcomes. Methods like the one used here can help estimate the marginal impact of individual components within these complex program bundles.

Generating evidence when randomization is infeasible — Randomized evaluations are not always possible. Political constraints, ethical concerns, or operational realities often prevent strict experimental designs. Careful observational approaches can provide a practical alternative for generating evidence and guiding future program design.

When applied to the analysis of bicycles in a large education program, the method shows that bicycles appear to improve attendance and empowerment but not short-term learning outcomes. More broadly, the analysis illustrates that even when programs were not designed as randomized experiments, existing project data can still contain useful variation for evaluating the impact of individual program components. For organizations implementing complex interventions, this means that project data may be far richer than it initially appears.

Christopher Cotton is the Jarislowsky Deutsch Chair in Economic and Financial Policy at Queen’s University and a senior research and policy advisor at Limestone. Ardyn Nordstrom is an Assistant Professor in the School of Public Policy and Administration at Carleton University and a research associate at Limestone. Zachary Robb is a PhD candidate at Queen’s University and a research associate at Limestone.

Impact Blogs: Evaluating program components without randomization: Your data is richer than you think

Posted On

Author

Categories

More Blog Posts

Related Projects

Related Publications

Other Blogs Title

About

Services

News

Intranet

Contact

Get Social

Tweets