Yadid Ayzenberg Thesis Defense: Tributary, Interactive Distributed Analytics of Large Scale Sensor Time Series Data

Yadid Ayzenberg

State-of-the-art technology has made it possible to monitor various physiological signals for prolonged periods. Using wearable sensors, individuals can be monitored, and the sensor data can be collected and stored in digital format, transmitted to remote locations, and analyzed later. This technology may open the door to a multitude of exciting and innovative applications.

We could learn the effects of the environment and of our day-to-day choices on our physiology. Does the number of hours we sleep affect our mood the following day? Is our performance affected by the times at which we schedule our recreational activities? Does physical activity affect our quality of sleep? Do these choices have an impact on chronic conditions?

The proliferation of smartphones and wearable sensors is creating very large data sets that may contain useful information. Gartner claims that the Internet of Things installed base will grow to 26 billion units by 2020. However, the magnitude of the generated data also creates new challenges. Processing and analyzing these large data sets efficiently requires advanced computational tools: as more data are collected, processing becomes more computationally expensive and calls for novel algorithmic techniques and parallel architectures. Traditional analysis techniques do not scale adequately, and in many cases researchers are required to create customized environments.

This thesis explores and extends the affordances of warehouse-scale computing for interactivity and pliability of large-scale time series data sets. In the first part of the thesis, Ayzenberg describes a theoretical framework for distributed processing of time series data that is implementation-invariant and may be implemented on an existing distributed computation infrastructure. Next, Ayzenberg presents a detailed architecture and implementation of the theoretical framework, which was deployed on several clusters, as well as an in-depth analysis of the user-interface design considerations and the user experience design process.

In the second part of the thesis, Ayzenberg presents a system evaluation that consists of two parts. The first part is a quantitative characterization of the system performance in a variety of scenarios that included different dataset and cluster sizes. The second part contains the results of a user study: researchers were asked to use the system to analyze data that they had collected in their own studies and to participate in an ethnographic study on their experience.

This study reveals that distributed computing holds great potential for accelerating scientific research that uses large-scale sensor data sets, providing new ways to see patterns in large data sets and much faster analyses.

Host/Chair: Rosalind W. Picard


Andrew B. Lippman, John Roese
