Forecasting Mental Distress using Healthcare Claims Data

Taylor, Sara. Forecasting Mental Distress using Healthcare Claims Data. 2020. MIT, PhD dissertation.

Abstract

Depression rates in the US have recently reached record levels: 7.1% of US adults had at least one major depressive episode in 2015, and an estimated 7 million American adults aged 65 and older experience depression. Anxiety disorders are also on the rise, with a recent review estimating their prevalence at up to 25% of the general population.

This dissertation focuses on estimating and forecasting mental distress using data from electronic health records and insurance claims, with the aim of answering a fundamental question: Can we predict who will need mental health help before they need it? If these individuals can be identified, we can develop ways to quickly mobilize resources in response to any increase in symptoms, and develop methods to mitigate the effects of mental distress through ongoing baseline treatment.

Following a brief high-level review of the US healthcare system and its data sources, we use standardized survey scores stored in Electronic Health Records (EHRs) to define how mental distress is categorized in the more ubiquitous claims data. We achieve a Matthews correlation coefficient of 0.29 and an accuracy of 75% on a hold-out test set. These definitions then serve as the label of interest throughout the rest of the dissertation. We also describe a state-space-based generalized linear model for estimating the rate of healthcare events. The state-space models needed only a 16-day history to achieve accuracy similar to that of a static model using an 85-day history.
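The abstract does not give the model's equations. As a rough illustration only, the sketch below assumes one common state-space GLM form: daily event counts y_k are Poisson with rate exp(x_k), and the log-rate x_k follows a Gaussian random walk. The rate is estimated recursively with a one-step Gaussian-approximation filter of the kind used in point-process adaptive filtering. The function name `filter_event_rate`, the one-day time step, and the parameters `sigma2`, `x0`, and `v0` are hypothetical choices, not values from the dissertation.

```python
import math
from typing import List, Tuple


def filter_event_rate(
    counts: List[int],
    sigma2: float = 0.01,  # hypothetical random-walk variance of the log-rate
    x0: float = 0.0,       # hypothetical initial log-rate (1 event per day)
    v0: float = 1.0,       # hypothetical initial variance of the log-rate
) -> List[Tuple[float, float]]:
    """Return (estimated daily event rate, posterior log-rate variance) per day.

    Assumed model (illustrative, not the dissertation's exact specification):
        x_k = x_{k-1} + w_k,  w_k ~ N(0, sigma2)   (latent log-rate)
        y_k ~ Poisson(exp(x_k))                    (observed daily count)
    """
    x, v = x0, v0
    estimates = []
    for y in counts:
        # Predict: the random walk keeps the mean and inflates the variance.
        x_pred = x
        v_pred = v + sigma2
        lam_pred = math.exp(x_pred)

        # Update: Gaussian approximation of the Poisson log-likelihood around
        # the prediction. With a log link, the observed information is lam_pred.
        v = 1.0 / (1.0 / v_pred + lam_pred)
        x = x_pred + v * (y - lam_pred)

        estimates.append((math.exp(x), v))
    return estimates
```

Each update uses only the previous day's state and the new count, so the recursion summarizes the whole history without storing it. A fixed-window alternative, by contrast, must see every day in its window directly.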

Finally, we forecast distress using demographic information and healthcare event rate features. We report Matthews correlation coefficients, accuracy, and other metrics for forecasts 1, 3, 6, and 12 months ahead. On a hold-out test set, we achieved accuracies of 89%, 74%, 59%, and 47% for forecasting the presence of a distress event 1, 3, 6, and 12 months into the future, respectively, compared with accuracies of 78%, 63%, 49%, and 34% for a baseline static model. We also found that including the current distress label significantly improved the forecast for the next period.
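Accuracy alone can be misleading when distress events are rare, which is one reason to report the Matthews correlation coefficient (MCC) alongside it. The sketch below computes both metrics for binary labels from confusion-matrix counts. The function names are illustrative, and returning 0 when any marginal count is zero follows a common convention (scikit-learn's, for example) rather than anything stated in the dissertation.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 means 'distress'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), in [-1, 1]."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # A predictor (or label set) with only one class carries no correlation.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For instance, with 10 positives among 100 hypothetical cases, a model that always predicts "no distress" scores 0.90 accuracy but an MCC of 0, because it never identifies a single positive case.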

Related Content