What translation does to a survey instrument

Forward-back translation is treated as the standard procedure for adapting survey instruments across languages. The English instrument is translated into the target language by one bilingual translator; an independent translator translates it back into English; the two English versions are compared, discrepancies are reconciled, and the target-language instrument is considered validated. This procedure is required by most journal reviewers and by most institutional ethics committees. It catches a small fraction of the failures that actually matter.

Three failure modes survive forward-back translation almost untouched.

The first is register. A survey item translated into formal Hindi or formal Tamil or formal Bangla will read as alien to most respondents in informal rural contexts, even when it is technically accurate. Standardised written Indian languages developed largely through Sanskritised or Persianised registers, neither of which is the register most respondents speak in. Enumerators read the items aloud in this formal register, respondents puzzle through them, and the resulting variance is mostly noise about register comprehension rather than signal about the underlying construct. Forward-back translation does not detect register mismatch because both translators are working in the same formal register.

The second is examples that don't carry. The hypothetical scenarios that survey items embed — "if you needed to borrow Rs. 1000 quickly, who would you turn to?" — are not neutral. The amount Rs. 1000 means different things in urban Bengaluru and rural Khunti. The phrase "borrow quickly" presumes an emergency-credit relationship that is differently configured across regions, communities, and household structures. The set of possible answers is shaped by what the respondent can imagine doing, which is shaped by the example. Translating the words preserves the example. Adapting the instrument to the field context requires changing the example, which most translation protocols do not authorise.

The third is the policy idiom. Concepts like "empowerment," "wellbeing," "social capital," "agency," and "resilience" exist in development English with technical histories of usage and reasonably stable referents. They have no clean equivalents in most Indian languages, because the policy infrastructure that produced the English vocabulary did not produce a parallel vocabulary in Tamil or Odia or Assamese. Translators usually pick the closest available term, which carries different connotations from the original. The same issue runs in reverse for India-specific policy idioms — "BPL," "Aadhaar-seeded," "convergence," "saturation" — which exist in English-language policy text but mean little to respondents asked about them in their own language. The forward-back procedure routinely produces translations that read fine in English on the round-trip but produce systematically different response distributions in the field.

The corrective for all three failure modes is the cognitive interview. The protocol is straightforward: pause the respondent every few items and ask, in their own language, what they understood the question to be asking. The gaps between the question the instrument intended to ask and the question the respondent actually answered are usually visible within the first ten interviews. They are almost never visible in the back-translation document.

Bilingual interviewers are simultaneously a resource and a problem. Their fluency lets them detect the failures we have just described. It also lets them code-switch in the field in ways that smooth the failures over. An enumerator who notices that a respondent is confused by a Hindi item may, helpfully, paraphrase it into the local register or insert a clarifying example. The respondent's answer becomes more coherent. The instrument's standardisation degrades, often invisibly. Audio-recorded enumerator-respondent exchanges, sampled and reviewed, are the only reliable check on this drift. Most fieldwork budgets do not include them.
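The audio-review check described above amounts to drawing a reproducible random sample of recorded interviews for human review. A minimal sketch, assuming interviews are logged with IDs (the IDs, function name, and 5% rate here are all illustrative, not a recommendation):

```python
import random

def sample_for_audio_review(interview_ids, rate=0.05, seed=42):
    """Draw a reproducible random sample of interview IDs for audio review.

    A fixed seed means the sample can be re-drawn identically later,
    which matters if the review is audited.
    """
    rng = random.Random(seed)
    k = max(1, round(len(interview_ids) * rate))
    return sorted(rng.sample(interview_ids, k))

# 400 hypothetical interviews, 5% sampled for review
ids = [f"INT-{i:04d}" for i in range(1, 401)]
review_set = sample_for_audio_review(ids)
print(len(review_set))
```

The fixed seed is a deliberate choice: it lets the field team prove, after the fact, that the reviewed recordings were selected before anyone listened to them.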

One pattern we have seen often enough to mention: items on financial autonomy translated into Odia produce systematically different response distributions than the same items in Hindi, even when the respondent populations are otherwise comparable. Some of the difference is real cultural variation in financial decision-making norms. Some of it, on cognitive-interview review, is the translation of the autonomy concept itself, which lands differently in the two languages because the underlying lexicon for individual versus household decision-making is differently structured. We have not found a way to disentangle these contributions cleanly. We have found that pretending the difference is purely cultural variation, which is the standard interpretation, is wrong.
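Detecting the cross-language difference described above is straightforward; attributing it is not. A minimal sketch, using entirely made-up response counts for a 4-point item and scipy's standard chi-square test of independence:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts per response category (1..4) for the same item
# administered in Hindi and in Odia; numbers are illustrative only.
hindi = [120, 240, 310, 130]
odia  = [210, 260, 220, 110]

chi2, p, dof, expected = chi2_contingency([hindi, odia])
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.4g}")
```

A small p-value flags that the two distributions differ, but — exactly as the paragraph above argues — the test cannot separate real cultural variation from translation effects. That attribution requires cognitive-interview evidence, not more statistics on the same responses.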

The practical implication is uncomfortable. Instrument adaptation across languages is research, not translation. It requires cognitive interviews in each language, examples adapted to local context, and pilot data analysed for register effects. It cannot be done in two weeks by two translators. The budget implications are real. The methodological implications, if you skip them, are also real, but they hide as ordinary measurement error in the analysis — which means they are usually invisible to the funder reading the report.

If your survey runs in three languages and the analysis pools across them without item-level invariance testing, your headline numbers are likely to be artefacts of the language distribution as much as anything else.
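The pooling artefact is simple arithmetic. A minimal sketch with made-up per-language item means: hold the per-language means fixed and vary only the language mix, and the pooled headline number moves:

```python
# Hypothetical mean scores on the same item, by administration language.
# If the item is not measurement-invariant, these differ partly because
# of translation, not only because of the underlying construct.
means = {"hindi": 3.1, "tamil": 2.6, "odia": 2.2}

def pooled_mean(mix):
    """Pooled estimate given language shares summing to 1."""
    return sum(mix[lang] * means[lang] for lang in mix)

survey_a = {"hindi": 0.6, "tamil": 0.3, "odia": 0.1}
survey_b = {"hindi": 0.2, "tamil": 0.3, "odia": 0.5}

print(round(pooled_mean(survey_a), 2))  # 2.86
print(round(pooled_mean(survey_b), 2))  # 2.5
```

Nothing about the respondents changed between the two pooled figures; only the language distribution did. That is the sense in which an unpooled-and-untested headline number can be an artefact of sampling across languages.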

Useful references: the WHO's guidelines on translation and adaptation of instruments; the ITC guidelines on test translation; and Willis's primer on cognitive interviewing for the corrective procedure.