Publication

Generating synthetic mobility data for a realistic population with RNNs to improve utility and privacy

Alex Berke

Alex Berke, Ronan Doorley, Kent Larson, Esteban Moro. 2022. Generating synthetic mobility data for a realistic population with RNNs to improve utility and privacy. In Proceedings of The 37th ACM/SIGAPP Symposium on Applied Computing (SAC ’22). 955-959. https://doi.org/10.1145/3477314.3507230

Abstract

Location data collected from mobile devices represent mobility behaviors at individual and societal levels. These data have important applications ranging from transportation planning to epidemic modeling. However, two issues must be overcome to best serve these use cases: the data often represent a limited sample of the population, and use of the data jeopardizes privacy.

To address these issues, we present and evaluate a system for generating synthetic mobility data using a deep recurrent neural network (RNN) which is trained on real location data. The system takes a population distribution as input and generates mobility traces for a corresponding synthetic population.

Related generative approaches have not solved the challenges of capturing both the patterns and variability in individuals' mobility behaviors over longer time periods, while also balancing the generation of realistic data with privacy. Our system leverages RNNs' ability to generate complex and novel sequences while retaining patterns from training data. The model also introduces randomness used to calibrate the variation between the synthetic and real data at the individual level. This serves both to capture variability in human mobility and to protect user privacy.
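
One common way to expose a tunable amount of randomness in a sequence generator is to sample each next step from a temperature-scaled softmax. The sketch below illustrates that idea only; the abstract does not say which mechanism the authors use, so the temperature parameter, the function names, and the fixed per-step scores (standing in for the outputs of a trained RNN) are all assumptions made for illustration.

```python
import math
import random

def sample_next_location(logits, temperature, rng):
    """Sample a location index from softmax(logits / temperature).

    Hypothetical helper: `logits` stands in for the scores a trained
    model might assign to each candidate location at one time step.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate_trace(step_logits, temperature, seed=0):
    """Generate one synthetic trace, one sampled location per time step.

    Simplification: a real RNN would condition each step's scores on the
    locations already generated; here the scores are fixed in advance.
    """
    rng = random.Random(seed)
    return [sample_next_location(logits, temperature, rng) for logits in step_logits]
```

In this sketch, a temperature near zero makes the trace follow the highest-scoring location at every step, while larger values spread the samples across locations, which is one way the variation between generated and source sequences could be tuned.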

Location-based services (LBS) data from more than 22,700 mobile devices were used in an experimental evaluation across utility and privacy metrics. We show that the generated mobility data retain the characteristics of the real data while varying from the real data at the individual level, and that the amount of this variation matches the variation within the real data.
