
Attrition isn't random: what panel surveys quietly lose

Panel surveys are valuable because they let you observe change at the household level — how a livelihood shock propagates, how a programme effect develops, how vulnerability evolves over a season or a decade. They are also biased in ways that cross-sectional designs are not, because attrition between rounds is rarely random.

The mechanism is straightforward to describe and uncomfortable to address. Households drop out of panel surveys for reasons that are correlated with the outcomes the survey is trying to measure. They migrate in response to drought, flood, or wage collapse. They dissolve under household conflict that is itself a downstream effect of the stress the panel is trying to capture. They become uncontactable when illness or death disrupts the residence pattern. They withdraw when programme contact diminishes and the survey loses its salience. The very households whose trajectories the panel was designed to capture are systematically over-represented among the missing.

The standard corrections handle the easy part of this problem. Inverse probability weighting on baseline observables — education of household head, asset index, district fixed effects, baseline value of the outcome — corrects for the variation in attrition that is associated with the variables you observed at baseline. This is useful and worth doing. It does not address the hard part, which is that the unobserved shock is what drove both attrition and the outcome of interest. The bias that survives weighting is the bias that matters most.
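The weighting arithmetic itself is simple. A minimal sketch with hypothetical retention probabilities (in practice these would be fitted by a logit of follow-up contact on the baseline observables listed above):

```python
def ipw_mean(retained):
    """Inverse-probability-weighted mean of a follow-up outcome.

    retained: list of (outcome, retention_probability) pairs for the
    households actually re-interviewed. Each respondent is weighted by
    1/p, standing in for households with a similar baseline profile
    that were lost to attrition.
    """
    weights = [1.0 / p for _, p in retained]
    return sum(y / p for y, p in retained) / sum(weights)


# Hypothetical panel: vulnerable households (low retention probability)
# have worse outcomes, so the unweighted mean is biased upward.
retained = [(12.0, 0.95), (10.0, 0.90), (7.0, 0.60), (5.0, 0.50)]

unweighted = sum(y for y, _ in retained) / len(retained)
print(f"unweighted mean: {unweighted:.2f}")           # 8.50
print(f"IPW mean:        {ipw_mean(retained):.2f}")   # 7.79
```

The correction moves the estimate in the right direction only insofar as the retention probabilities capture the attrition process; a shock unobserved at baseline leaves the fitted probabilities, and therefore the weights, unchanged.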

An illustrative pattern. In a livelihoods panel in a flood-affected eastern district, attrition between baseline and the second follow-up reached 22% — not unusually high for the context. The lost households had baseline characteristics that already predicted higher vulnerability: lower asset index, more migrant labour, fewer adult earners. IPW on those baseline characteristics narrowed the gap between the panel-balanced sample and the original baseline distribution. It did not narrow it to zero. The remaining gap, on diagnostic checks comparing balanced-panel respondents to administrative records of NREGA participation in the same villages, suggested the corrected sample still under-represented the most-affected households. The published estimate of programme effect was, almost certainly, an upper bound: the households most likely to show weak or negative effects were the ones still missing from the corrected sample. The report did not say so in those words.

Heckman selection correction can in principle handle non-random attrition under a stronger assumption: that there is at least one variable that affects the probability of being observed but not the outcome. The practical search for such an exclusion restriction is usually unsuccessful in development contexts. Anything that predicts whether you can find a household again — mobile phone access, residential stability, neighbour cooperation — tends also to predict the welfare and livelihood outcomes the panel measures. Heckman corrections are sometimes applied with implausible exclusion restrictions on the grounds that something is better than nothing. Often it is not.
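For concreteness, the correction term at the centre of the two-step estimator is the inverse Mills ratio, computed from the first-stage probit index. A minimal sketch of that term alone — the surrounding two-step machinery, and crucially a credible exclusion restriction, are assumed:

```python
import math

def inverse_mills(z: float) -> float:
    """lambda(z) = phi(z) / Phi(z), the selection-correction regressor.

    z is the fitted probit index from the first stage (the propensity
    to be observed at follow-up). In Heckman's two-step, lambda(z) is
    added as a regressor to the outcome equation for observed
    households; a significant coefficient on it signals selection on
    unobservables. Identification still hinges on the exclusion
    restriction discussed above: something in z that does not belong
    in the outcome equation.
    """
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    return phi / Phi

# Hard-to-find households (low z) get a large correction term;
# easy-to-find households get a small one.
print(round(inverse_mills(-1.0), 3))  # 1.525
print(round(inverse_mills(1.5), 3))   # 0.139
```

Note what the sketch makes visible: without an excluded variable, z is built from the same covariates as the outcome equation, and lambda(z) is close to a smooth function of regressors already in the model — the near-collinearity that makes implausible exclusion restrictions worse than nothing.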

Tracing protocols are where the leverage is. Mobile-first tracing — baseline collection of multiple phone numbers, monthly check-in SMS, follow-up calls before each round — reduces silent attrition meaningfully. Neighbour informants matter, especially for migration tracking; well-designed protocols collect contact information for two non-household relatives at baseline. Cross-referencing against administrative records (NREGA muster rolls, PDS dealer lists, school enrolment) can identify households that have moved within the district but lost their original locator. None of this is glamorous. All of it shifts the attrition curve in ways that no amount of statistical correction can match.

There is a political-economy dimension to attrition reporting that is worth naming. Panels with low attrition are easier to publish, win contract extensions, and produce headline numbers. Panels with high attrition are uncomfortable to discuss in funder meetings. The incentive is therefore to report the panel-balanced subsample as the main analytic sample — sometimes without making the selection step transparent. The headline number is then conditional on remaining observable, which is exactly the conditioning that the original research question was supposed to be unconditional on.

A more honest reporting standard would treat attrition above a threshold — we use 15% as a working rule; others set it higher — not as a problem to be corrected statistically but as a problem that changes how the analysis is framed. Above the threshold, the report should describe what is known about the attrited households (administrative cross-references, neighbour reports, partial information from incomplete follow-ups), report the analysis on the full baseline sample with attrited households assigned a range of plausible outcomes, and frame the headline estimate as conditional on observability. This is harder than it sounds; it is also the difference between a panel that produces credible inference and a panel that produces a clean number.
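One way to operationalise "a range of plausible outcomes" is a Manski-style worst-case bound: recompute the full-baseline-sample mean with the attrited households assigned the extremes of the plausible outcome range. A minimal sketch on hypothetical numbers:

```python
def attrition_bounds(observed, n_attrited, y_min, y_max):
    """Worst-case bounds on the mean outcome for the full baseline sample.

    observed:     outcomes for households retained through follow-up
    n_attrited:   baseline households lost to attrition
    y_min, y_max: plausible extremes of the outcome for the missing
    """
    n = len(observed) + n_attrited
    lower = (sum(observed) + n_attrited * y_min) / n
    upper = (sum(observed) + n_attrited * y_max) / n
    return lower, upper

# Hypothetical: 8 retained households averaging 10.0, 2 attrited
# (20% attrition), outcome plausibly anywhere in [0, 20] for the missing.
lo, hi = attrition_bounds([10.0] * 8, n_attrited=2, y_min=0.0, y_max=20.0)
print(f"[{lo:.1f}, {hi:.1f}]")  # [8.0, 12.0]
```

The width of the interval is the honest price of the attrition: it grows mechanically with the attrition rate, which is exactly the property that makes it uncomfortable to report and informative to read.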

One related caution. Panels that report low attrition often achieve it by selecting on stability at baseline — sampling only households with at least one earning member resident in the village for five years, for instance. The low attrition is then the consequence of having pre-selected the stable subset of the population, which is itself the selection problem moved earlier in the pipeline. This is rarely flagged in the published methodology. Comparing the eligible-sampling-frame composition to the achieved baseline sample is the diagnostic; if the report does not present this comparison, the reader has no way to detect the substitution.
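The frame-versus-sample diagnostic reduces to a table of share gaps by category. A hypothetical sketch, assuming counts by a mobility category are available for both the eligible frame and the achieved baseline sample (the category names here are illustrative):

```python
def share_gaps(frame_counts, sample_counts):
    """Percentage-point gap in category shares: achieved baseline sample
    minus eligible sampling frame. Large positive gaps on 'stable'
    categories are the signature of pre-selection on stability.
    """
    f_total = sum(frame_counts.values())
    s_total = sum(sample_counts.values())
    return {
        cat: 100.0 * (sample_counts.get(cat, 0) / s_total
                      - frame_counts[cat] / f_total)
        for cat in frame_counts
    }

# Hypothetical: the frame is 30% migrant-labour households, the
# achieved sample only 10% -- a gap the methodology should explain.
frame  = {"resident_5yr": 700, "migrant_labour": 300}
sample = {"resident_5yr": 450, "migrant_labour": 50}
print(share_gaps(frame, sample))  # resident_5yr +20pp, migrant_labour -20pp
```

A report that presents this table, even with uncomfortable numbers in it, gives the reader what the low-attrition headline alone cannot: a way to tell whether the panel's stability was achieved or assumed.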

Useful references: Wooldridge's Econometric Analysis of Cross-Section and Panel Data for the formal treatment of attrition; the IPA Goldilocks tracking working paper for practical tracing protocols; and the India Human Development Survey documentation, which is unusually transparent about attrition and tracing across rounds.