Cognitive Machines Group

Journal, Conference, and Workshop Papers

Soroush Vosoughi, Helen Zhou, and Deb Roy. (2015). Digital Stylometry: Linking Profiles Across Social Networks. Proceedings of the 7th International Conference on Social Informatics (SocInfo 2015). Beijing, China. pdf (344KB)

Soroush Vosoughi and Deb Roy. (2015). A Human-Machine Collaborative System for Identifying Rumors on Twitter. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM) Workshop on Event Analytics using Social Media Data (EASM). Atlantic City, New Jersey. pdf (476KB)

Soroush Vosoughi, Helen Zhou, and Deb Roy. (2015). Enhanced Twitter Sentiment Classification Using Contextual Information. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA). Lisbon, Portugal. pdf (1.5MB)

Brandon C. Roy, Soroush Vosoughi, and Deb Roy. (2014). Grounding language models in spatiotemporal context. Proceedings of Interspeech 2014. Singapore. pdf (500KB)

Soroush Vosoughi. (2014). Improving automatic speech recognition through head pose driven visual grounding. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). ACM, New York, NY, USA, 3235-3238. pdf (816KB)

Iris Chin, Soroush Vosoughi, Mathew Goodwin, Deb Roy and Letitia Naigles. (2014). How the Speechome Recorder can change our understanding of developmental trajectories. Poster presented at the 13th International Congress for the Study of Child Language. Amsterdam, The Netherlands. pdf (68KB)

Iris Chin, Soroush Vosoughi, Mathew Goodwin, Deb Roy and Letitia Naigles. (2013). Dense Data Collection Through the Speechome Recorder Better Reveals Developmental Trajectories. In the Extended Abstract of the International Meeting for Autism Research (IMFAR) 2013. San Sebastián, Spain. pdf (60KB)

Iris Chin, Soroush Vosoughi, Emily Potrzeba, Mathew Goodwin, Deb Roy and Letitia Naigles. (2013). Verb use in a child previously diagnosed with ASD: Dense recordings reveal typical and atypical development. In the Extended Abstract of the Biennial Meeting of the Society for Research in Child Development (SRCD) 2013. Seattle, Washington. pdf (116KB)

Soroush Vosoughi, Matthew S. Goodwin, Bill Washabaugh, and Deb Roy. (2012). A portable audio/video recorder for longitudinal study of child development. In Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI '12). ACM, New York, NY, USA, 193-200. pdf (1.5MB)

Jeff Orkin and Deb Roy. (2012). Understanding Speech in Interactive Narratives with Crowdsourced Data. Proceedings of the Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE). pdf (6885KB)

Soroush Vosoughi and Deb Roy. (2012). An Automatic Child-Directed Speech Detector for the Study of Child Language Development. Proceedings of Interspeech 2012. Portland, Oregon. pdf (1.5MB)

Brandon C. Roy, Michael C. Frank, and Deb Roy. (2012). Relating Activity Contexts to Early Word Learning in Dense Longitudinal Data. Proceedings of the 34th Annual Meeting of the Cognitive Science Society. Sapporo, Japan. (Correction to Figure 4, 5/18/13) pdf (912KB)

Soroush Vosoughi and Deb Roy. (2012). A longitudinal study of prosodic exaggeration in child-directed speech. Proceedings of the 6th International Conference on Speech Prosody. Shanghai, China. pdf (200KB)

Iris Chin, Devin Rubin, Andrea Tovar, Soroush Vosoughi, Michelle Cheng, Emily Potrzeba, Mathew Goodwin, Deb Roy, Letitia Naigles. (2012). Dense Recordings of Naturalistic Interactions Reveal both Typical and Atypical Speech in One Child with ASD. Proceedings of the International Meeting for Autism Research (IMFAR). Toronto, Canada. pdf (12KB)

Hilke Reckman, Jeff Orkin and Deb Roy. (2011). Extracting aspects of determiner meaning from dialogue in a virtual world environment. Proceedings of the International Conference on Computational Semantics (IWCS). Oxford, England. pdf (308KB)

Stefanie Tellex, Thomas Kollar, George Shaw, Nicholas Roy and Deb Roy. (2010). Grounding Spatial Language for Video Search. Proceedings of the Twelfth International Conference on Multimodal Interfaces (ICMI). Beijing, China. pdf (636KB) Best Student Paper Award

Philip DeCamp, George Shaw, Rony Kubat and Deb Roy. (2010). An Immersive System for Browsing and Visualizing Surveillance Video. Proceedings of ACM Multimedia 2010. Florence, Italy. pdf (3.5MB)

Albert Huang, Stefanie Tellex, Abraham Bachrach, Thomas Kollar, Deb Roy, and Nick Roy. (2010). Natural Language Command of an Autonomous Micro-Air Vehicle. Proceedings of the International Conference on Intelligent Robots and Systems (IROS). Taipei, Taiwan. pdf (1.4MB)

Meredith Meyer, Philip DeCamp, Bridgette Hard, Dare Baldwin and Deb Roy. (2010). Assessing Behavioral and Computational Approaches to Naturalistic Action Segmentation. Proceedings of the 32nd Annual Cognitive Science Conference. Portland, Oregon. pdf (388KB)

Brandon C. Roy*, Soroush Vosoughi*, and Deb Roy. (2010). Automatic Estimation of Transcription Accuracy and Difficulty. Proceedings of Interspeech 2010. Makuhari, Japan. pdf (1.7MB) *Equal Contribution

Hilke Reckman, Jeff Orkin and Deb Roy. (2010). Learning meanings of words and constructions, grounded in a virtual game. Proceedings of the 10th Conference on Natural Language Processing (KONVENS). Saarbrücken, Germany. pdf (276KB)

Soroush Vosoughi, Brandon C. Roy, Michael C. Frank, and Deb Roy. (2010). Contributions of Prosodic and Distributional Features of Caregivers' Speech in Early Word Learning. Proceedings of the 32nd Annual Cognitive Science Conference. Portland, Oregon. pdf (348KB)

Jeff Orkin, Tynan Smith, Hilke Reckman, and Deb Roy. (2010). Semi-Automatic Task Recognition for Interactive Narratives with EAT & RUN. Proceedings of the 3rd Intelligent Narrative Technologies Workshop at the 5th International Conference on Foundations of Digital Games (FDG), Monterey, CA. pdf (236KB)

Jeff Orkin and Deb Roy. (2010). Semi-Automated Dialogue Act Classification for Situated Social Agents in Games. Proceedings of the Agents for Games & Simulations Workshop at the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada. pdf (242KB)

Soroush Vosoughi, Brandon C. Roy, Michael C. Frank, and Deb Roy. (2010). Effects of Caregiver Prosody on Child Language Acquisition. Proceedings of the 5th International Conference on Speech Prosody. Chicago, IL. pdf (344KB)

Thomas Kollar, Stefanie Tellex, Deb Roy, and Nick Roy. (2010). Toward Understanding Natural Language Directions. Proceedings of Human Robot Interaction Conference 2010 (HRI-2010). Osaka, Japan. pdf (1.2MB)

Stefanie Tellex and Deb Roy. (2009). Grounding Spatial Prepositions for Video Search. Proceedings of the Eleventh International Conference on Multimodal Interfaces (ICMI-2009). Cambridge, MA. pdf (2.6MB)

Deb Roy. (2009). New Horizons in the Study of Child Language Acquisition. Proceedings of Interspeech 2009. Brighton, England. pdf (1.4MB)

Brandon C. Roy and Deb Roy. (2009). Fast transcription of unstructured audio recordings. Proceedings of Interspeech 2009. Brighton, England. pdf (276K)

Rony Kubat, Daniel Mirman and Deb Roy. (2009). Semantic context effects on color categorization. Proceedings of the 31st Annual Cognitive Science Society Meeting. pdf (392K)

Brandon C. Roy, Michael C. Frank and Deb Roy. (2009). Exploring word learning in a high-density longitudinal corpus. Proceedings of the 31st Annual Meeting of the Cognitive Science Society. pdf (820K)

Philip DeCamp and Deb Roy. (2009). A Human-Machine Collaborative Approach to Tracking Human Movement in Multi-Camera Video. Proceedings of the 2009 International Conference on Content-based Image and Video Retrieval (CIVR). pdf (1.0MB)

Stefanie Tellex and Deb Roy. (2009). Towards Surveillance Video Search by Natural Language Query. Proceedings of the ACM International Conference on Image and Video Retrieval. pdf (642K)

Jeff Orkin and Deb Roy. (2009). Automatic Learning and Generation of Social Behavior from Collective Human Gameplay. Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). pdf (377K)

Deb Roy. (2008). A Mechanistic Model of Three Facets of Meaning. Symbols, Embodiment, and Meaning, de Vega, Glenberg, and Graesser, eds. pdf (1.9MB)

Michael Fleischman and Deb Roy. (2008). Grounded Language Modeling for Automatic Speech Recognition of Sports Video. HLT/NAACL. Columbus, OH. pdf (212K)

Kai-yuh Hsiao, Stefanie Tellex, Soroush Vosoughi, Rony Kubat, and Deb Roy. (2008). Object Schemas for Grounding Language in a Responsive Robot. Connection Science 20, 4 (Dec. 2008), 253-276. pdf (1.6MB)

Kai-yuh Hsiao, Soroush Vosoughi, Stefanie Tellex, Rony Kubat, and Deb Roy. (2008). Object Schemas for Responsive Robotic Language Use. Proceedings of the 3rd ACM/IEEE International Conference on Human-Robot Interaction. pdf (269K)

Jeff Orkin and Deb Roy. (2007). The Restaurant Game: Learning Social Behavior and Language from Thousands of Players Online. Journal of Game Development, 3(1), 39-60. pdf (3.9MB)

Rony Kubat, Philip DeCamp, Brandon Roy, and Deb Roy. (2007). TotalRecall: Visualization and Semi-Automatic Annotation of Very Large Audio-Visual Corpora. Ninth International Conference on Multimodal Interfaces (ICMI 2007). pdf (491K)

Michael Fleischman and Deb Roy. (2007). Unsupervised Content-Based Indexing of Sports Video Retrieval. 9th ACM Workshop on Multimedia Information Retrieval (MIR). Augsburg, Germany. pdf (264K)

Michael Fleischman, Brandon Roy, and Deb Roy. (2007). Temporal Feature Induction for Baseball Highlight Classification. ACM Multimedia Conference. Augsburg, Germany. pdf (317K)

Peter Gorniak and Deb Roy. (2007). Situated Language Understanding as Filtering Perceived Affordances. Cognitive Science, 31(2), 197-231. pdf (1.7MB)

Michael Fleischman and Deb Roy. (2007). Situated Models of Meaning for Sports Video Retrieval. HLT/ACL 2007, Rochester, NY. pdf (293K)

Stefanie Tellex and Deb Roy. (2007). Grounding Language in Spatial Routines. AAAI 2007 Spring Symposia on Control Mechanisms for Spatial Knowledge Processing in Cognitive / Intelligent Systems, Stanford University, Palo Alto CA. pdf (116K)

Michael Levit and Deb Roy. (2007). Interpretation of Spatial Language in a Map Navigation Task. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 37(3), 667-679. pdf (386K)

Michael Fleischman, Philip DeCamp, and Deb Roy. (2006). Mining Temporal Patterns of Movement for Video Content Classification. Proceedings of the 8th ACM SIGMM International Workshop on Multimedia Information Retrieval. pdf (323K)

Deb Roy, Rupal Patel, Philip DeCamp, Rony Kubat, Michael Fleischman, Brandon Roy, Nikolaos Mavridis, Stefanie Tellex, Alexia Salata, Jethran Guinness, Michael Levit, Peter Gorniak. (2006). The Human Speechome Project. Proceedings of the 28th Annual Cognitive Science Conference. pdf (756K)

Peter Gorniak and Deb Roy. (2006). Perceived Affordances as a Substrate for Linguistic Concepts. Twenty-eighth Annual Meeting of the Cognitive Science Society, 6 pages. pdf (3,318K)

Peter Gorniak, Jeff Orkin, and Deb Roy. (2006). Speech, Space and Purpose: Situated Language Understanding in Computer Games. Twenty-eighth Annual Meeting of the Cognitive Science Society Workshop on Computer Games. pdf (313K)

Nikolaos Mavridis and Deb Roy. (2006). Grounded Situation Models for Robots: Where Words and Percepts Meet. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pdf (598K)

Dong Zhang, Daniel Gatica-Perez, Deb Roy, Samy Bengio. (2006). Modeling Interactions from Email Communication. IEEE International Conference on Multimedia & Expo (ICME). pdf (208K)

Stefanie Tellex and Deb Roy. (2006). Spatial Routines for a Simulated Speech-Controlled Vehicle. Proceedings of Human Robot Interaction Conference 2006 (HRI-2006). pdf (200K)

Philip DeCamp, Amber Frid-Jimenez, Jethran Guinness and Deb Roy. (2005). Gist Icons: Seeing Meaning in Large Bodies of Literature. IEEE Info Visualization 2005 Conference. pdf (1.5MB)

Kai-yuh Hsiao and Deb Roy. (2005). A Habit System for an Interactive Robot. AAAI Fall Symposium 2005: From Reactive to Anticipatory Cognitive Embodied Systems. pdf (981K)

Peter Gorniak and Deb Roy. (2005). Probabilistic Grounding of Situated Speech using Plan Recognition and Reference Resolution. Seventh International Conference on Multimodal Interfaces (ICMI 2005). Best Paper Award. pdf (312K)

Michael Fleischman and Deb Roy. (2005). Intentional Context in Situated Language Learning. Ninth Conference on Computational Natural Language Learning. pdf (224K)

Nick Mavridis and Deb Roy. (2005). Grounded Situation Models for Robots: Bridging Language, Perception, and Action. AAAI-05 Workshop on Modular Construction of Human-Like Intelligence. pdf (544K)

Deb Roy. (2005). Semiotic Schemas: A Framework for Grounding Language in Action and Perception. Artificial Intelligence, 167(1-2):170-205. pdf (1MB)

Michael Fleischman and Deb Roy. (2005). Why are verbs harder to learn than nouns? Initial insights from a computational model of situated word learning. 27th Annual Meeting of the Cognitive Science Society. pdf (584K)

Deb Roy. (2005). Grounding words in perception and action: computational insights. Trends in Cognitive Sciences, 9(8), 389-396. pdf (272K)

Kai-yuh Hsiao, Peter Gorniak, and Deb Roy. (2005). NetP: A Network API for Building Heterogeneous Modular Intelligent Systems. Proceedings of AAAI 2005 Workshop in Modular Construction of Human-Like Intelligence. pdf (667K)

Peter Gorniak and Deb Roy. (2005). Speaking with your Sidekick: Understanding Situated Speech in Computer Role Playing Games. Proceedings of Artificial Intelligence and Interactive Digital Entertainment, 2005. pdf (624K)

Deb Roy and Niloy Mukherjee. (2005). Towards Situated Speech Understanding: Visual Context Priming of Language Models. Computer Speech and Language, 19(2), pages 227-248. pdf (567K)

Joshua Juster and Deb Roy. (2004). Elvis: Situated Speech and Gesture Understanding for a Robotic Chandelier. Proc. Int. Conf. Multimodal Interfaces. pdf (372K)

Deb Roy, Yair Ghitza, Jeff Bartelma, and Charlie Kehoe. (2004). Visual Memory Augmentation: Using Eye Gaze as an Attention Filter. Proceedings of the IEEE International Symposium on Wearable Computers. pdf (8MB)

Deb Roy, Kai-Yuh Hsiao, and Nikolaos Mavridis. (2004). Mental Imagery for a Conversational Robot. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(3), 1374-1383. pdf (488K)

Peter Gorniak and Deb Roy. (2004). Grounded Semantic Composition for Visual Scenes, Journal of Artificial Intelligence Research, Volume 21, pages 429-470. pdf (1.2MB)

Peter Gorniak and Deb Roy. (2003). Augmenting User Interfaces with Adaptive Speech Commands. In Proceedings of the International Conference for Multimodal Interfaces. pdf (355K)

Peter Gorniak and Deb Roy. (2003). A Visually Grounded Natural Language Interface for Reference to Spatial Scenes. In Proceedings of the International Conference for Multimodal Interfaces. pdf (562K)

Brian Whitman, Deb Roy, Barry Vercoe. (2003). Learning Word Meanings and Descriptive Parameter Spaces from Music. In Proceedings of the HLT-NAACL03 workshop on Learning Word Meaning from Non-Linguistic Data. pdf (570K)

Kai-yuh Hsiao, Nikolaos Mavridis, Deb Roy. (2003). Coupling Perception and Simulation: Steps Towards Conversational Robotics. The Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. pdf (206K)

Deb Roy, Kai-yuh Hsiao, Nikolaos Mavridis. (2003). Conversational Robots: Building Blocks for Grounding Word Meanings. In Proceedings of the HLT-NAACL03 Workshop on Learning Word Meaning from Non-Linguistic Data. pdf (364K)

Deb Roy. (2003). Grounded Spoken Language Acquisition: Experiments in Word Learning. IEEE Transactions on Multimedia, 5(2): 197-209. pdf (1.1MB)

Deb Roy, Peter Gorniak, Niloy Mukherjee, and Josh Juster. (2002). A Trainable Spoken Language Understanding System for Visual Object Selection. In Proceedings of the International Conference of Spoken Language Processing. pdf (86K)

Deb Roy. (2002). A Trainable Visually-Grounded Spoken Language Generation System. In Proceedings of the International Conference of Spoken Language Processing. pdf (177K)

Deb Roy. (2002). Learning Words and Syntax for a Visual Description Task. Computer Speech and Language. pdf (513K)

Deb Roy. (2001/2002). Learning Visually Grounded Words and Syntax of Natural Spoken Language. Evolution of Communication. 4(1). pdf (829K)

Deb Roy and Alex Pentland. (2002). Learning Words from Sights and Sounds: A Computational Model. Cognitive Science, 26(1), 113-146. pdf (689K)

Ewa Dominowska, Deb Roy and Rupal Patel. (2002). An Adaptive Context-Sensitive Communication Aid. Proceedings for the 17th Annual International Conference "Technology and Persons with Disabilities".

Deb Roy. (2000). Integration of Speech and Vision using Mutual Information. Int. Conf. Acoustics, Speech and Signal Processing. pdf (626K)

Ph.D. Theses

Soroush Vosoughi. (2015). Automatic Detection and Verification of Rumors on Twitter. Ph.D. Thesis, Massachusetts Institute of Technology. pdf (4.2MB)

Brandon C. Roy. (2013). The Birth of a Word. Ph.D. Thesis, Massachusetts Institute of Technology. pdf (3.8MB)

Philip DeCamp. (2012). Data Visualization in the First Person. Ph.D. Thesis, Massachusetts Institute of Technology. pdf with videos (129MB), pdf (23.1MB)

Rony Kubat. (2012). Will They Buy? Ph.D. Thesis, Massachusetts Institute of Technology. pdf (3.3MB)

Stefanie Tellex. (2010). Natural Language and Spatial Reasoning. Ph.D. Thesis, Massachusetts Institute of Technology. pdf (8.2MB)

Michael Ben Fleischman. (2008). Grounding Language in Events. Ph.D. Thesis, Massachusetts Institute of Technology. pdf (1.8M)

Kai-yuh Hsiao. (2007). Embodied Object Schemas for Grounding Language Use. Ph.D. Thesis, Massachusetts Institute of Technology. pdf (7.6M)

Peter Gorniak. (2005). The Affordance-Based Concept. Ph.D. Thesis, Massachusetts Institute of Technology. pdf (5.8M)

Master's Theses

Matthew Miller. (2011). Semantic Spaces: Behavior, Language and Word Learning in the Human Speechome Corpus. M.Sc. in Media Arts and Sciences Thesis. pdf (81.9MB)

George Macaulay Shaw. (2011). A Taxonomy of Situated Language in Natural Contexts. M.Sc. in Media Arts and Sciences Thesis. pdf (54.9MB)

Sheng-Ying Pao. (2010). Connected Strangers: Manipulating Social Perceptions to Study Trust. M.Sc. in Media Arts and Sciences Thesis. pdf (2.8MB)

Kleovoulos Tsourides. (2010). Visually Grounded Virtual Accelerometers: A Longitudinal Video Investigation of Dyadic Bodily Dynamics around the time of Word Acquisition. M.Sc. in Media Arts and Sciences Thesis. pdf (10MB)

Soroush Vosoughi. (2010). Interactions of caregiver speech and early word learning in the Speechome Corpus: Computational Explorations. M.Sc. in Media Arts and Sciences Thesis. pdf (1.8MB)

Sophia Yuditskaya. (2010). Automatic Vocal Recognition of a Child's Perceived Emotional State within the Speechome Corpus. M.Sc. in Media Arts and Sciences Thesis. pdf (12.6MB)

Philip DeCamp. (2007). HeadLock: Wide-Range Head Pose Estimation for Low Resolution Video. M.Sc. in Media Arts and Sciences Thesis. pdf (24.4M)

Jeff Orkin. (2007). Learning Plan Networks in Conversational Video Games. M.Sc. in Media Arts and Sciences Thesis. pdf (6.9M)

Philipp Robbel. (2007). Exploiting Object Dynamics for Recognition and Control. M.Sc. in Media Arts and Sciences Thesis. pdf (6.1M)

Brandon Roy. (2007). Human-Machine Collaboration for Rapid Speech Transcription. M.Sc. in Media Arts and Sciences Thesis. pdf (13.1M)

Stefanie Tellex. (2006). Grounding Language in Spatial Routines. M.Sc. in Media Arts and Sciences Thesis. pdf (1.3M)

Andre Ribeiro. (2005). Graph Dynamics: Learning and Representation. M.Sc. in Media Arts and Sciences Thesis. pdf (1.1M)

Niloy Mukherjee. (2003). Spontaneous Speech Recognition Using Visual Context-Aware Language Models. M.Sc. in Media Arts and Sciences Thesis. pdf (1360K)

Sheel Sanjay Dhande. (2003). A Computational Model to Connect Gestalt Perception and Natural Language. M.Sc. in Media Arts and Sciences Thesis. pdf (736K)

M.Eng. Theses

Charles Kehoe. (2005). Indexical Grounding for a Mobile Robot. M.Eng. EECS Thesis. pdf (255K)

Jeffrey Bartelma. (2004). Flycatcher: Fusion of Gaze with Hierarchical Image Segmentation for Robust Object Detection. M.Eng. EECS Thesis. pdf (949K)

Joshua Juster. (2004). Speech and Gesture Understanding in a Homeostatic Control Framework for a Robotic Chandelier. M.Eng. EECS Thesis. pdf (539K)

Christopher Lucas. (2004). Patent Semantics: Analysis, Search, and Visualization of Large Text Corpora. M.Eng. EECS Thesis. pdf (918K)

Ewa Dominowska. (2002). A Communication Aid with Context-Aware Vocabulary Prediction. M.Eng. EECS Thesis. pdf (1.5M)

Norimasa Yoshida. (2002). Automatic Utterance Segmentation in Spontaneous Speech. M.Eng. EECS Thesis. pdf (591K)

Ben Yoder. (2001). Spontaneous Speech Recognition Using Hidden Markov Models. M.Eng. EECS Thesis. pdf (591K)

Miscellaneous

Naigles, L. R., Chin, I., Vosoughi, S., Goodwin, M., & Roy, D. (2011). Final Report for 3R01DC007428 - 04S1 Collaborative research supplement and infrastructure and research equipment for advancement of science. pdf (1.3M)

Kai-yuh Hsiao and Soroush Vosoughi. (2009). The Object Schema Model and Situational Context. pdf (155K)

Peter Gorniak. (2003). Meaning "I". General Exams Paper. pdf (2.8M)