Dissertation Title: Forecasting Mental Distress using Healthcare Claims Data
Depression rates in the US have recently reached their highest recorded levels: 7.1% of US adults had at least one major depressive episode in 2015, 14% of women had postpartum depression 4-6 weeks after giving birth, and an estimated 7 million American adults aged 65 and older experience depression. Anxiety disorders are also on the rise, with a recent review estimating a prevalence of 3.8%-25% in the general population. Unfortunately, there is no solid explanation for why some people develop these illnesses while others never do.
This dissertation focuses on estimating and forecasting mental distress using data from electronic health records and medical and pharmacy claims to answer a fundamental question: Can we predict who will need mental health help before they need it? If these individuals can be identified, we can develop ways to mobilize resources quickly in response to any increase in symptoms, as well as methods to mitigate the short- and long-term effects of mental distress through ongoing baseline treatments.
Following a brief high-level review of the US healthcare system and the benefits and limitations of various data sources, we use standardized survey scores stored in Electronic Health Records (EHRs) to define how mental distress is categorized in the more ubiquitous medical and pharmacy claims data. We achieve a Matthews correlation coefficient of 0.29 and an accuracy of 75% on a hold-out test set. These definitions are then used throughout the rest of the dissertation as the label of interest. We also describe a state-space based generalized linear model for estimating the rate of health care events, show how the model can be used to make inferences, and evaluate how well it performs on previously unseen individuals. We found that the state-space models needed only a 16-day history, compared to an 85-day history for a static model, to achieve similar accuracies.
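For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how a Matthews correlation coefficient and an accuracy are computed from binary labels. The function names and the toy labels are hypothetical and chosen only for illustration; they are not the dissertation's data, code, or results. The example also shows why the two numbers can differ noticeably when the positive class is rare.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that match the true label."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical toy labels (1 = distress, 0 = no distress), NOT real data.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
mcc = matthews_corrcoef(*counts)
acc = accuracy(*counts)
print(f"MCC = {mcc:.3f}, accuracy = {acc:.2f}")
```

In this toy case most labels are negative, so accuracy is dominated by the easy negatives, while the Matthews correlation coefficient also penalizes the missed positive and the false alarm.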
Finally, we forecast the distress labels using demographic information and health care event rate features. We report Matthews correlation coefficients, accuracy, and other metrics for predicting 1, 3, 6, and 12 months into the future.
Rosalind W. Picard, Sc.D., Professor of Media Arts and Sciences, MIT Media Lab
Dennis A. Ausiello, M.D., Director, Center for Assessment Technology and Continuous Health (CATCH), Massachusetts General Hospital
Isaac Samuel Kohane, M.D., Ph.D., Marion V. Nelson Professor of Biomedical Informatics, Harvard Medical School
Deneen Vojta, M.D., Executive Vice President, Research & Development, UnitedHealth Group