Augmenting Natural Communication in Nonverbal Individuals with Autism

Johnson, K.T.* & Narain, J.*, Maes, P., Picard, R.W., "Augmenting Natural Communication in Nonverbal Individuals with Autism," International Society for Autism Research (INSAR), Seattle, Washington, May 2020. (*Co-first authors/equal contribution)



Despite technological and usability advances, some individuals with minimally verbal autism (mvASD) still struggle to convey affect and intent using current augmentative communication systems. In contrast, their non-speech vocalizations are often rich in affect and context and are accessible in almost any environment. Our system uses primary caregivers' unique knowledge of an individual's vocal sounds to label and train machine learning models in order to build holistic communication technology (see Figure 1).


This work involves the development of four major research outputs:

  1. Scalable data collection methods to capture naturalistic nonverbal communication, including intuitive in-the-moment labeling by primary caregivers
  2. Signal processing techniques to accurately characterize the real-world audio, including signal-label alignment and spectral classification
  3. Personalized machine learning methods to elucidate an individual’s unique utterances
  4. Interactive augmentative communication interfaces to increase agency and improve understanding
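As a minimal sketch of the signal-label alignment in output 2, in-the-moment caregiver labels (timestamped taps) might be mapped back onto windows of the continuous audio recording. The label names, window length, and latency offset below are illustrative assumptions, not the system's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Label:
    t: float      # caregiver's tap time, in seconds from recording start
    state: str    # e.g., "laughter" (illustrative label name)

def align_labels(labels, clip_len=3.0, latency=1.5):
    """Map each in-the-moment label to an audio window.

    Caregivers label *after* hearing a vocalization, so we assume the
    sound of interest ended roughly `latency` seconds before the tap
    and extract the preceding `clip_len`-second window.
    """
    segments = []
    for lab in labels:
        end = max(lab.t - latency, 0.0)
        start = max(end - clip_len, 0.0)
        segments.append((start, end, lab.state))
    return segments

# A tap at t=12 s yields the audio window from 7.5 s to 10.5 s.
segments = align_labels([Label(t=12.0, state="laughter")])
```

In practice the latency offset would itself need to be estimated per caregiver; treating it as a fixed constant is the simplifying assumption here.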


We first identified the needs of the community through interviews (n=5) and surveys (n=18) with mvASD individuals and their families. We then conducted an eight-month case study with an elementary-aged nonverbal child and his family. Through a highly participatory design process, we refined the data collection process, created an inexpensive wearable audio recording system, and developed and deployed an open-source Android app for primary caregivers to label communicative and affective exchanges in real time with minimal burden or interference. We collected over 13 hours of unprompted vocalizations from the child in his everyday environments, with more than 300 labeled instances. The labeled signals were then used to classify states of affect, interaction, or communicative intent using multiple machine learning methods.


Initial machine learning results using a multi-class support vector machine (SVM) with 6 researcher-labeled states produced a weighted F1-score of 0.67, suggesting these states could be differentiated from audio alone. Visual inspection of the audio waveforms highlighted the diversity and distinct characteristics of the individual's vocalizations, which varied in tone, pitch, and duration depending on the individual's emotional or physical state and intended communication. (Canonical examples from this individual are shown in Figure 2.) We then trained a deep learning model using a long short-term memory (LSTM) recurrent neural network (RNN) and a zero-shot transfer learning approach. This method employed a generic audio database to classify three categories of caregiver-labeled vocalizations: laughter, negative affect, and self-soothing sounds. We identified laughter and negative affect with 70% and 69% accuracy, respectively, but classification of the self-soothing sounds produced accuracy near chance.
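The SVM baseline can be sketched with scikit-learn. The features below are synthetic stand-ins (the abstract does not specify the acoustic features used, though MFCC-like per-clip vectors are a common choice), and the evaluation simply shows how a weighted F1-score over six classes is computed.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_states = 6                       # six researcher-labeled states
X = rng.normal(size=(300, 13))     # stand-in for per-clip acoustic features
y = rng.integers(0, n_states, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)     # multi-class handled one-vs-one internally
clf.fit(X_tr, y_tr)

# "Weighted" averages the per-class F1 scores by class support,
# matching the metric reported above (0.67 on the real data).
score = f1_score(y_te, clf.predict(X_te), average="weighted")
print(score)
```

On the random features here the score sits near chance; the point is only the pipeline shape, not the reported result.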


These results highlight both the need for specialized, naturalistic databases and novel computational methods, and the potential of such methods to enhance communicative and affective exchanges between mvASD individuals and the broader community. Future work includes 1) the development of a database of naturalistic mvASD vocalizations to engage the machine learning community, 2) data processing pipelines for signal accuracy and alignment, and 3) personalized, semi-supervised machine learning methods that leverage the sparsely labeled structure of the real-world data. Each of these steps is built on an iterative co-design process with autistic stakeholders, with the goal of increasing agency and enhancing dialogue between individuals with mvASD and the world.
