Pinpoint Ventures Advisory
Thinking Note · Methods and Evaluation Practice
ILLUSTRATIVE SAMPLE
District Prioritisation Data Practice 2024

What administrative data cannot tell you about district vulnerability — and what you have to go find out

Every composite vulnerability index is built on data that systematically undercounts the districts it most needs to measure accurately. The question is not whether to use administrative data, but how to be honest about what it misses and how to compensate.

The India Health and Climate Resilience Fund set out to prioritise districts based on evidence, not on political geography or programme convenience. That is the right instinct. But when you sit down to build a composite indicator framework from publicly available administrative data, you encounter a structural problem: the districts with the highest vulnerability are often the ones with the worst data quality.

This is not random noise. It is a pattern. HMIS completeness — the share of health facilities actually reporting into the national system — falls precisely in the districts where facility density is lowest, where staff vacancies are highest, and where state-level administrative oversight is thinnest. The same districts that score poorly on health system capacity are, by the logic of the system, also the ones least likely to have accurate data on that capacity. The composite score you produce will, in a quiet and consistent way, understate the vulnerability of the places you most need to reach.

A district that cannot count what is happening in its health facilities is not the same as a district where nothing much is happening. The absence of data is itself a finding — and often a more troubling one than any number you could put in its place.

For the IHCRF framework, we found that 22 of the 116 districts in our initial sample had HMIS completeness below 60%. Seven of those 22 were in the top quartile of our climate exposure index. In other words, some of the most climate-exposed districts were also the ones we had the least reliable information about. The composite score for those districts was technically producible, but it was built on shaky foundations.
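The arithmetic behind that finding is simple to reproduce. As a minimal sketch, the snippet below flags low-completeness districts and cross-tabulates them against climate-exposure quartiles. The file and column names are hypothetical, not the actual IHCRF dataset schema.

```python
import pandas as pd

# Hypothetical district-level table; names are illustrative only.
districts = pd.read_csv("district_indicators.csv")  # one row per district

# Flag districts below the 60% HMIS completeness threshold used in the
# classification rule described in the footnotes.
districts["low_completeness"] = districts["hmis_completeness"] < 0.60

# Assign climate-exposure quartiles (q4 = most exposed).
districts["exposure_quartile"] = pd.qcut(
    districts["climate_exposure_index"], 4, labels=["q1", "q2", "q3", "q4"]
)

# Cross-tabulate: how many of the most-exposed districts are also the
# ones we know least about?
print(pd.crosstab(districts["exposure_quartile"], districts["low_completeness"]))
```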

The specific failure modes

Three data gaps caused the most analytical trouble.

First, NFHS-5 point estimates for small districts. The National Family Health Survey is the closest thing India has to a nationally standardised household survey at district level, but for districts with populations below 300,000, the confidence intervals on key indicators (child stunting, institutional delivery, out-of-pocket health expenditure) are wide enough to make comparisons unreliable. We identified 17 such districts in our analysis. A district might appear to have a stunting rate of 42%, but the survey-weighted 95% confidence interval runs from 33% to 51%. That is not a point estimate; it is a range so wide it covers most of the cross-district distribution.
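A screening rule for this problem is straightforward to implement. The sketch below flags districts whose intervals are too wide to support cross-district ranking; the column names and the 15-point threshold are illustrative assumptions, not the NFHS-5 release format.

```python
import pandas as pd

# Hypothetical NFHS-5 extract; column names are illustrative.
nfhs = pd.read_csv("nfhs5_district_estimates.csv")

# Width of the 95% confidence interval, in percentage points.
nfhs["ci_width"] = nfhs["stunting_ci_upper"] - nfhs["stunting_ci_lower"]

# Flag estimates too imprecise to rank on. The 15-point threshold is an
# assumption for illustration; the example district above has an
# 18-point-wide interval (33% to 51%).
WIDE_CI = 15.0
unreliable = nfhs[nfhs["ci_width"] > WIDE_CI]
print(f"{len(unreliable)} districts flagged for wide confidence intervals")
```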

Second, the 2011 Census as a population denominator. All facility density indicators (beds per 10,000 population, facilities per 1,000 population) depend on a population figure, and the 2021 Census has not been conducted. For districts with substantial population growth since 2011, the stale denominator systematically overstates density, and therefore overstates capacity. For tribal districts in Jharkhand with significant in-migration from forest displacement, the error can be substantial.
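A worked example makes the direction of the bias concrete. The figures and growth rate below are placeholders, not data for any actual district; in practice the projection rate would come from state-level population projections or a proxy such as electoral-roll growth.

```python
# Simple geometric projection of the 2011 Census population.
def projected_population(pop_2011: float, annual_growth: float, year: int) -> float:
    """Project the 2011 Census population forward at a constant rate."""
    return pop_2011 * (1 + annual_growth) ** (year - 2011)

beds = 450
pop_2011 = 900_000
pop_2024 = projected_population(pop_2011, annual_growth=0.018, year=2024)

stale_density = beds / pop_2011 * 10_000      # ~5.0 beds per 10,000
adjusted_density = beds / pop_2024 * 10_000   # ~4.0 beds per 10,000

# The stale denominator overstates capacity by roughly a quarter here.
print(f"stale: {stale_density:.2f}, adjusted: {adjusted_density:.2f}")
```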

Third, and most consequential for a climate-health framework, administrative data does not capture what happens to health access during a climate event. HMIS tells you how many patients visited a facility in a given month. It does not tell you how many people wanted to seek care and could not reach the facility because the road was flooded, or because the breadwinner had migrated, or because the ASHA (Accredited Social Health Activist) was herself managing a flooded home. The demand that never materialised is invisible in facility-based records.

Data design implication

This is the primary justification for Module C of the household survey instrument: questions on climate event experience and its effects on health-seeking are not supplementary context. They are the measurement strategy for a phenomenon that cannot be measured from the supply side. The administrative data tells you what the system delivered. The household survey tells you what the system failed to deliver, and why.

What field instrument design can do

The response to these gaps is not to abandon administrative data — it is irreplaceable for facility-level indicators — but to design the primary data collection to specifically address what administrative sources cannot reach. This requires being precise about which gaps matter most for the investment decision, rather than trying to fill all of them.

For the IHCRF framework, three field-data compensations were built into the household survey design. The first is the health access under climate stress module, which asks directly whether climate events have affected the household's ability to reach a facility, and how. This captures the demand-suppression effect that HMIS data misses entirely.

The second is a structured enumerator observation protocol — the final question in Module E — that records the assessor's direct observation of household socioeconomic condition at the time of interview. Standardised enumerator observation is not a substitute for consumption data, but it provides a quality check on self-reported income brackets and a mechanism for flagging outliers for supervisory review.
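As a sketch of how such a flag might work, assuming hypothetical ordinal codes for the self-reported income bracket (1 = lowest) and the enumerator observation score (1 = visibly poorest) on comparable scales:

```python
def flag_for_review(reported_bracket: int, observed_score: int,
                    tolerance: int = 1) -> bool:
    """Flag a household when self-report and enumerator observation
    diverge by more than `tolerance` steps on the shared ordinal scale."""
    return abs(reported_bracket - observed_score) > tolerance

# A household reporting the top bracket but observed in visibly poor
# conditions (or the reverse) is routed to supervisory review.
assert flag_for_review(reported_bracket=5, observed_score=1)
assert not flag_for_review(reported_bracket=3, observed_score=2)
```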

The third is the skip pattern architecture itself. A questionnaire that fails to reach its later modules, whether because respondents drop off or because enumerators rush, will systematically miss the most vulnerable households, who are also the most time-constrained and least accustomed to formal interviews. Skip patterns that move households with little relevant exposure quickly through the instrument, while preserving depth for the subgroups that matter most, are not a design nicety; they are a data quality intervention.
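A minimal illustration of the routing logic, with hypothetical module and question names rather than the actual IHCRF instrument structure:

```python
def next_module(responses: dict) -> str:
    """Route the respondent to the next module based on a screener answer."""
    # Households reporting no climate event in the recall window skip the
    # detailed climate-access module entirely, keeping their burden low.
    if not responses.get("experienced_climate_event", False):
        return "module_D_health_expenditure"
    # Affected households get the full depth of the access module.
    return "module_C_climate_access"

print(next_module({"experienced_climate_event": True}))   # module_C_climate_access
print(next_module({"experienced_climate_event": False}))  # module_D_health_expenditure
```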

Transparency as method

The final consideration is epistemological. A vulnerability index that is presented without an honest account of its data limitations is not a neutral technical product. It is an argument dressed as a score. When a district ranking is used to allocate scarce resources — as the IHCRF composite score was — the people using it need to know where the numbers are firm and where they are not.

The sensitivity analysis we ran was not primarily about statistical robustness. It was about communication: showing funders and programme staff that the top-decile districts appear across most reasonable alternative specifications, and that the three districts that drop in and out depending on weight assumptions deserve specific scrutiny, not automated exclusion.
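One way to operationalise that communication is to re-run the composite under many plausible weight vectors and count how often each district lands in the top decile. The sketch below uses random Dirichlet weights over synthetic indicator scores; it illustrates the approach, not the actual IHCRF specification or thresholds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: rows are districts, columns are normalised
# sub-indicators of the composite (1 = worst).
n_districts, n_indicators = 116, 6
scores = rng.random((n_districts, n_indicators))

# Draw many plausible weight vectors instead of defending a single one.
n_specs = 1_000
top_decile_counts = np.zeros(n_districts)
for _ in range(n_specs):
    w = rng.dirichlet(np.ones(n_indicators))  # random weights summing to 1
    composite = scores @ w
    cutoff = np.quantile(composite, 0.9)
    top_decile_counts += composite >= cutoff

# Districts in the top decile under nearly all specifications are firm;
# districts that drop in and out deserve scrutiny, not automated exclusion.
share = top_decile_counts / n_specs
stable = np.where(share > 0.95)[0]
borderline = np.where((share > 0.25) & (share <= 0.95))[0]
print(f"{len(stable)} stable top-decile districts, {len(borderline)} borderline")
```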

Good evaluation practice, in the end, is not about finding the method that produces the most impressive-looking output. It is about being clear about what you know, what you are inferring, and what remains genuinely uncertain, and about building that clarity into every step of the instrument and the analysis.

Pinpoint Ventures Advisory provides MEL design, composite indicator frameworks, and field survey instruments for social sector programmes in India and South Asia. This thinking note is an illustrative sample of Pinpoint's analytical practice. Contact: pinpointventures.in
___
HMIS completeness analysis based on NHM dashboard data (FY2022–23). Districts classified as below 60% completeness where more than 40% of registered facilities show zero reporting months in a 12-month window.
NFHS-5 district estimates and confidence intervals sourced from IIPS district factsheets (2019–21). Confidence intervals computed by IIPS using Taylor series linearization.
The argument on demand suppression draws on evidence from the literature on conflict-affected health systems, where similar supply-side measurement gaps have been documented; see Kruk et al. (2015), "What is a resilient health system? Lessons from Ebola," The Lancet.
ILLUSTRATIVE SAMPLE — Pinpoint Ventures Advisory LLP — pinpointventures.in