Abstract
Subjective well-being is central to economic, medical, and policy decision-making. We evaluate whether large language models (LLMs) can provide valid predictions of well-being across global populations. Using natural-language profiles from 64,000 individuals in 64 countries, we benchmark four leading LLMs against self-reports and statistical models. Unlike regressions, which estimate relationships from the survey data themselves, the LLMs draw only on individual characteristics (e.g., sociodemographic, attitudinal, and psychological factors) and on associations encoded during pretraining, never on the survey's subjective well-being responses. The models produced plausible patterns consistent with known correlates such as income and health, yet they systematically underperformed relative to the regressions and made their largest errors in underrepresented countries, reflecting biases rooted in global digital and economic inequality. A preregistered experiment showed that LLMs rely on surface-level linguistic associations rather than conceptual understanding, leading to predictable distortions in unfamiliar contexts. Injecting contextual information reduced, but did not eliminate, these biases. These findings demonstrate that while LLMs can simulate broad correlates of life satisfaction, they fail to capture its experiential and cultural depth. Accordingly, LLMs should not be used as substitutes for human self-reports of well-being; doing so would risk reinforcing inequality and undermining human agency.