Asma Ghandeharioun Dissertation Defense

April 23, 2021
10:00am — 12:00pm ET

Dissertation Title: Towards Human-Centered Optimality Criteria


Despite the transformational success of machine learning across various applications, examples of deployed models failing to recognize and support human-centered (HC) criteria are abundant. In this thesis, I conceptualize the space of human-machine collaboration with respect to two components: interpretation of people by machines and interpretation of machines by people. I develop several tools that make improvements along these axes. First, I develop a pipeline that predicts clinician-rated depressive symptoms from real-world longitudinal data, outperforming several baselines. Second, I introduce a novel, model-agnostic, and dataset-agnostic method that approximates interactive human evaluation in open-domain dialog through self-play and correlates more strongly with human evaluations than other automated metrics commonly used today. While dialog quality evaluation metrics predominantly rely on word-level overlap or embedding-based distance computed on each turn of the conversation, I show the significance of taking into account the trajectory of the conversation and of using psychologically motivated proxies such as sentiment, semantics, and user engagement. Third, I demonstrate an uncertainty measurement technique that helps disambiguate annotator bias, data bias, and inter-rater disagreement, as an antidote to relying on the majority vote in multi-annotator settings. I show that this characterization also provides a proxy for model calibration, the degree to which the model’s predicted confidence matches the true likelihood of correctness. Finally, I present a novel method that allows humans to investigate the decision-making process of a predictor to gain better insight into how it works. The method jointly trains a generator, a discriminator, and a concept disentangler, allowing the human to ask "what-if" questions.
I evaluate it on several challenging synthetic and realistic datasets where previous methods fall short of satisfying desirable criteria for interpretability, and show that our method performs consistently well across all of them. Using simulated experiments, I discuss its applications for detecting potential biases of a classifier, investigating its alignment with expert domain knowledge, and identifying spurious artifacts that impact predictions. Together, these novel techniques and insights provide a more comprehensive interpretation of people by machines and more flexible tools for interpretation of machines by people, moving us closer to HC optimality.

Committee members:

Rosalind W. Picard, Sc.D.
Professor of Media Arts and Sciences
Massachusetts Institute of Technology 

David Sontag, Ph.D.
Associate Professor of Electrical Engineering and Computer Science
Massachusetts Institute of Technology 

Zachary Chase Lipton, Ph.D.
Assistant Professor of Operations Research and Machine Learning
Carnegie Mellon University
