Transfer Learning with Real-World Nonverbal Vocalizations from Minimally Speaking Individuals

Narain, J., Johnson, K., Quatieri, T., Picard, R., and Maes, P. “Transfer Learning with Real-World Vocalizations from Minimally Speaking Individuals”. Workshop on Interpretable ML in Healthcare at the International Conference on Machine Learning. July 2021.


We trained and evaluated several transfer learning approaches to classify the affect and communicative intent of nonverbal vocalizations from eight minimally speaking individuals (mv*). The datasets were recorded in real-world settings with in-the-moment labels provided by a close family member. We trained deep neural networks (DNNs) on six audio datasets (including our dataset of nonverbal vocalizations) and then fine-tuned the models to classify affect and intent for each individual. We also evaluated a zero-shot approach to arousal and valence regression using an acted dataset of nonverbal vocalizations that occur amidst typical speech. For two of the eight mv* communicators, fine-tuning improved model performance over fully personalized DNNs, and the arousal values inferred via zero-shot learning showed weak groupings. The limited success of the evaluated transfer learning approaches highlights the need for specialized datasets collected with mv* individuals.
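The fine-tuning step described above — pretraining a DNN on larger audio datasets, then adapting it per individual — can be sketched as follows. This is a minimal illustrative example, not the authors' actual pipeline: the network sizes, feature dimension (40, e.g. a mel-spectrogram summary), and label count (5 affect/intent classes) are assumptions, and the "pretrained" extractor is a stand-in.

```python
# Hedged sketch of per-individual fine-tuning (assumed architecture, not the paper's).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a feature extractor pretrained on large audio datasets.
feature_extractor = nn.Sequential(
    nn.Linear(40, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
)
# New classification head for one mv* communicator's label set (assumed 5 classes).
head = nn.Linear(16, 5)

# Freeze the pretrained layers; only the new head is updated during fine-tuning.
for p in feature_extractor.parameters():
    p.requires_grad = False

model = nn.Sequential(feature_extractor, head)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Toy personalized data: 64 vocalization feature vectors with caregiver labels.
x = torch.randn(64, 40)
y = torch.randint(0, 5, (64,))

frozen_before = [p.clone() for p in feature_extractor.parameters()]
initial_loss = loss_fn(model(x), y).item()

for _ in range(20):  # a few fine-tuning steps on the individual's data
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

final_loss = loss_fn(model(x), y).item()
# The frozen extractor is unchanged; only the head adapted to this individual.
assert all(torch.equal(a, b)
           for a, b in zip(frozen_before, feature_extractor.parameters()))
```

Freezing the shared extractor and training only a small head is one common way to fine-tune with the very limited per-person data available for mv* communicators; the paper's models may instead update all layers.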