TrackMarks: Semi-Automatic Video Annotation

This project attempts to address the practical problems involved with extracting behavioral information from large, multi-camera video corpora. Ultra-dense video recordings offer new possibilities for in-depth, quantitative analysis of human behavior, with applications ranging from child development research to determining how people are affected by different retail environments. Despite the growing sophistication of computer vision systems being developed for person tracking, gesture recognition, and object identification, these technologies remain error prone. Accurate video annotation still requires substantial human input. In order to analyze the hundreds of thousands of hours of video collected for the Human Speechome Project, we have developed a new software system for semi-automatically annotating longitudinal, multi-track video data. This system combines computer vision algorithms with a novel interface design to enable human annotators to generate and edit video annotations with speed and accuracy.