14 Mar 2024 · 7 min read
What Baseline, Midline, and Endline Studies Actually Measure — and Why the Sequence Matters
A baseline is not just a starting snapshot. Understanding the full measurement sequence — and what each wave is designed to answer — is the difference between an evaluation that informs decisions and one that creates noise.
Every donor-funded programme eventually faces the same question from its funder: did it work? The answer depends entirely on whether the right data were collected, at the right moments, using a design that makes comparison valid. Baseline, midline, and endline studies are the machinery that makes that answer possible — yet the terms are used loosely, and the purpose of each wave is frequently misunderstood.
What is a baseline study, and what is it actually measuring?
A baseline study establishes the state of your target population before a programme begins. It is not simply a count of beneficiaries or a description of service coverage — a strong baseline captures the distribution of the outcomes you intend to change. If your programme aims to improve children's learning outcomes, your baseline must measure literacy and numeracy in your treatment group (and, for counterfactual purposes, a comparable comparison group) before any intervention has occurred.
The baseline serves two purposes. First, it gives you a starting value against which to measure change. Second — and this is where baselines earn their cost — it allows you to detect selection effects: systematic differences between groups that, left unaddressed, would make your endline comparison misleading. A baseline that simply counts programme participants without a comparison frame is not a baseline; it is a registration exercise.
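One way to look for the selection effects described above is a baseline balance check: compare the treatment and comparison groups on the outcome before any intervention. The sketch below computes a standardised mean difference. It is a minimal illustration, not a description of any specific study's method: the function name and the scores are hypothetical, and a real check would also account for the survey design (weights, clustering).

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treatment, comparison):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treatment) + variance(comparison)) / 2)
    return (mean(treatment) - mean(comparison)) / pooled_sd


# Hypothetical baseline literacy scores (illustrative values, not real data)
treatment_scores = [42, 38, 51, 47, 40, 45, 39, 50]
comparison_scores = [41, 37, 49, 48, 38, 44, 40, 46]

smd = standardized_mean_difference(treatment_scores, comparison_scores)
print(f"Standardised mean difference at baseline: {smd:.2f}")
```

A value near zero suggests the groups start from a similar place on that outcome. How large a gap is tolerable is a judgement call that should be fixed in the analysis plan before the data are seen.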
In Devtplan's work on the RTI International Learning through Play programme (P3), the baseline needed to capture pre-programme cognitive and social-emotional outcomes across selected districts. This required a sampling design precise enough to support sub-district inference and a measurement protocol that could be replicated under midline and endline conditions: standardised instruments, trained enumerators, and controlled field procedures. The point is that the baseline locks in the measurement conditions; midline and endline repeat those conditions rather than inventing new ones.
What does a midline study add that a baseline does not?
A midline sits at the programme's halfway point — typically one to two years after baseline — and answers a different question from the endline. Where an endline asks "did the programme achieve its outcomes?", a midline asks "is the programme on track, and are there early signals that should prompt a course correction?"
Midlines are most valuable when programmes are adaptive — when findings can actually be used to change implementation before the final evaluation window closes. They are least valuable when programme staff receive the findings too late to act on them, or when the study design does not allow disaggregation fine enough to identify which components are or are not working.
A common midline failure is treating it as a mini-endline: reaching for impact conclusions before the programme has had sufficient exposure time. Midlines measure process and intermediate outcomes — shifts in knowledge, attitude, and short-cycle behaviour changes — not final impact. Framing midline findings correctly in reporting is a discipline that requires both statistical rigour and honest communication with the commissioning partner.
What makes an endline evaluation different from a final report?
An endline evaluation is the full measurement of outcomes against the baseline, using the same instruments, the same comparison frame, and the same analytical approach. The endline is where attribution claims are tested. "The programme improved learning outcomes by X" is only a valid statement if the endline design supports that claim — which means the same sampling frame, the same respondent eligibility criteria, and a comparison or control group that remained valid through programme implementation.
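To make the role of the comparison group concrete, the sketch below uses a difference-in-differences calculation: the change in the treatment group minus the change in the comparison group. The article does not name an estimator, so this is an assumption chosen for illustration. All values are hypothetical, and a real analysis would add standard errors that respect the clustered sample.

```python
from statistics import mean


def difference_in_differences(treat_base, treat_end, comp_base, comp_end):
    """Change in the treatment group minus change in the comparison group."""
    treat_change = mean(treat_end) - mean(treat_base)
    comp_change = mean(comp_end) - mean(comp_base)
    return treat_change - comp_change


# Hypothetical outcome scores at each wave (illustrative values only)
treat_base = [40, 44, 42, 46]
treat_end = [52, 55, 50, 55]
comp_base = [41, 43, 42, 46]
comp_end = [45, 48, 46, 49]

did = difference_in_differences(treat_base, treat_end, comp_base, comp_end)
print(f"Difference-in-differences estimate: {did}")
```

In this example the treatment group improves by 10 points and the comparison group by 4, so only 6 points of the change can plausibly be attributed to the programme. Without the comparison group, the full 10 points would be misread as impact.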
Endlines fail when the comparison group has been contaminated (e.g. control communities received a similar programme from a different actor), when attrition from the original sample is high enough to introduce bias, or when the analytical approach used at baseline was not documented precisely enough to replicate. These are not abstract methodological concerns — they are the reasons evaluation findings are sometimes disputed by funders.
Devtplan's endline evaluation of the BMZ PASEWAY programme (2022) — which targeted pathways for sustainable employment for women and youth — required reconstructing a comparison across multiple regions and verifying that the control group had not received comparable employment-support interventions from other actors during the programme period. That verification step is a standard part of a credible endline and is often underbudgeted by implementing partners.
What sampling approach should a baseline use?
The sampling approach for a baseline is determined by the level of inference required at endline. If your programme targets households in three regions and you want to make region-level comparisons at endline, your baseline sample must be powered for region-level inference from the outset. An underpowered baseline cannot be repaired later: enlarging the sample or changing the sampling frame between baseline and endline means the two waves no longer describe the same population, which undermines the comparison.
For most Ghana-based programmes, a stratified multi-stage cluster sample is the standard approach: administrative regions or districts as primary strata, communities or enumeration areas as primary sampling units, and households within communities as final units. The cluster effect (design effect, or deff) must be accounted for in the power calculation, or the study will be systematically underpowered. A design effect of 1.5 is a reasonable starting assumption for household surveys in Ghanaian communities; the actual deff should be estimated from the baseline data and used to recalculate midline and endline sample sizes.
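The effect of the design effect on sample size can be sketched as follows. The code uses the standard approximation deff = 1 + (m - 1) × ICC, where m is the number of households interviewed per cluster and ICC is the intra-cluster correlation, and a textbook two-group formula for comparing means. The effect size, cluster size, ICC, significance level, and power below are assumptions chosen for illustration, not recommendations for any particular study.

```python
from math import ceil
from statistics import NormalDist


def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def households_per_arm(effect_size, deff, alpha=0.05, power=0.80):
    """Households needed per arm to detect a standardised difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    n_simple_random = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return ceil(n_simple_random * deff)


# Assumed values: 11 households per cluster and ICC of 0.05 give deff = 1.5
deff = design_effect(cluster_size=11, icc=0.05)

n_ignoring_clusters = households_per_arm(effect_size=0.2, deff=1.0)
n_with_clusters = households_per_arm(effect_size=0.2, deff=deff)
print(f"deff = {deff:.2f}")
print(f"Per arm, ignoring clustering: {n_ignoring_clusters}")
print(f"Per arm, with clustering:     {n_with_clusters}")
```

Under these assumptions, ignoring clustering understates the required sample by roughly a third (about 390 versus about 590 households per arm). Once baseline data are in, the assumed ICC can be replaced with the estimated one and the midline and endline sample sizes recalculated.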
How do you ensure the same respondents are tracked across waves?
Panel designs (tracking the same individuals from baseline to endline) provide the strongest evidence of individual-level change but carry the highest attrition risk. Repeated cross-sectional designs (drawing a fresh sample from the same target population at each wave, without individual tracking; sometimes loosely called cohort designs) are more practical in high-mobility contexts and are the norm for household-level programme evaluation in Ghana. The choice should be made at baseline and documented, because it affects both the analytical approach and the statistical power required at each wave.
Attrition at midline and endline should be reported fully — not minimised. Differential attrition (where treatment and comparison groups drop out of the sample at different rates) is a threat to the validity of any impact estimate. Funders and implementing partners sometimes push evaluators to minimise discussion of attrition in reports; resisting that pressure is part of an evaluator's professional obligation.
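For a panel design, reporting attrition fully can start with a calculation as simple as the one sketched below: the share of baseline respondents lost in each group, and the gap between the two. The function names and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
def attrition_rate(n_baseline, n_followed_up):
    """Share of baseline respondents not re-interviewed at follow-up."""
    return 1 - n_followed_up / n_baseline


def differential_attrition(treat_base, treat_end, comp_base, comp_end):
    """Treatment attrition rate minus comparison attrition rate."""
    return attrition_rate(treat_base, treat_end) - attrition_rate(comp_base, comp_end)


# Hypothetical respondent counts (illustrative values only)
treat_rate = attrition_rate(600, 540)
comp_rate = attrition_rate(600, 480)
gap = differential_attrition(600, 540, 600, 480)

print(f"Treatment attrition:    {treat_rate:.1%}")
print(f"Comparison attrition:   {comp_rate:.1%}")
print(f"Differential attrition: {gap:+.1%}")
```

Both the overall rates and the gap belong in the report. A large gap is a prompt to check whether the respondents who dropped out differ from those who stayed on baseline characteristics.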
What are the most common baseline mistakes that undermine the endline?
The most costly baseline mistakes are design decisions that cannot be corrected later. These include: insufficient statistical power (too small a sample to detect a realistic effect size); an inadequate comparison frame (no control or comparison group, or one that cannot be maintained through the programme); poorly documented instruments that cannot be replicated; and field protocols that introduce systematic interviewer or respondent bias. Each of these mistakes is invisible at baseline — they only become apparent when the endline analysis cannot answer the questions the funder is asking.
A secondary category of mistakes is administrative: failing to archive baseline data in an accessible, documented format; losing enumerator training materials needed to standardise midline and endline field procedures; and transferring institutional knowledge so poorly that the midline team does not know what the baseline team actually measured. Data management standards from the baseline must be maintained across the entire evaluation sequence.
Frequently asked questions
- What is the difference between a baseline study and a needs assessment?
- A needs assessment identifies gaps or priorities before programme design. A baseline study measures the status of outcomes you intend to change, providing the starting value for your evaluation. A baseline requires a comparison frame and a sampling design that supports endline inference; a needs assessment does not.
- How long after a programme starts should the baseline be conducted?
- Ideally before any programme activities reach beneficiaries. In practice, late funding decisions sometimes mean the baseline must be conducted within the first few weeks of implementation. The key requirement is that the target population has not yet received the core programme intervention; a baseline conducted after significant exposure is compromised.
- Can a midline study replace a baseline if the programme started without one?
- No. A midline without a baseline can describe current status but cannot attribute change to the programme. If no baseline was conducted, the evaluation's scope must be honestly revised to remove impact claims, and any retrospective baseline approach (e.g. recall data) must be treated with significant caution.
- How much does a baseline/endline study typically cost for a Ghana-based programme?
- Costs vary significantly by sample size, number of regions, instrument complexity, and whether an independent evaluator or the implementing partner is conducting the study. A nationally representative household survey baseline covering three or more regions with a sample of 1,500–3,000 households typically requires a dedicated budget in the range of USD 40,000–120,000. Devtplan can scope costs to your specific programme design.