Project

Dynamic Traffic Prediction in Andorra: a Bayesian Network Approach

Ronan Doorley

Data Fusion for Dynamic Traffic Prediction

Traffic congestion has huge negative impacts on the productivity, health and personal lives of city dwellers. To manage this problem effectively, transportation engineers need to predict traffic congestion throughout the road network at all hours of the day. Traffic prediction typically relies on travel surveys, which are expensive and time-consuming and do not capture temporal variation in travel demand. Anonymised location data from mobile phones offer an alternative source which is passively collected, widely available and naturally captures temporal trends. However, these data carry biases of their own, so if we use them in transportation models we must take care to correct for those biases using more reliable data. As part of the City Science collaboration with Andorra, we used a Bayesian network to build a calibrated transportation model for the country based on geolocated telecoms data and validated using a small sample of traffic counts.

Research Topics
#data #networks #engineering

Methodology

Combined Travel Demand and Route Choice Models

Prediction of traffic in a road network requires the solution of two closely related problems: the Matrix Estimation (ME) problem estimates the demand for travel, and the Trip Assignment (TA) problem predicts how those trips are routed through the network. The TA problem takes the travel demand (the origin-destination, or O-D, matrix) as input and estimates the traffic volumes, whereas the ME problem estimates the O-D matrix, often using traffic volumes as inputs.
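The relationship between the two problems can be illustrated with a minimal sketch. It assumes the route choice is already fixed and summarised by a hypothetical proportion matrix P, where P[a, k] is the share of O-D pair k's trips that use link a. The toy network, the numbers and the function names are illustrative assumptions, and the least-squares inversion stands in for the ME step only as one of the standard options, not as the method used in this study.

```python
import numpy as np

# Hypothetical toy network: 3 links (rows) and 2 O-D pairs (columns).
# P[a, k] = assumed share of O-D pair k's trips that use link a.
P = np.array([
    [1.0, 0.0],
    [0.5, 1.0],
    [0.5, 0.0],
])

def assign_trips(od_flows, proportions):
    """TA direction: map O-D flows to link volumes for fixed route proportions."""
    return proportions @ od_flows

def estimate_od_least_squares(link_volumes, proportions):
    """ME direction: recover O-D flows from link volumes by least squares."""
    od, *_ = np.linalg.lstsq(proportions, link_volumes, rcond=None)
    return od

od_true = np.array([100.0, 40.0])          # illustrative demand per O-D pair
volumes = assign_trips(od_true, P)         # link volumes implied by that demand
od_recovered = estimate_od_least_squares(volumes, P)
```

In practice the proportions themselves depend on congestion, and therefore on the O-D matrix, which is why the two problems cannot simply be solved once in sequence.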

Trip demands are sometimes available directly from household surveys, but in many cases an up-to-date survey is not available. However, if we have some initial "guess" for the O-D matrix, we can often obtain a good estimate of the actual trip matrix by using traffic volume information to refine that guess. These initial guesses are usually based on an outdated or subjectively estimated O-D matrix. The available methods for solving the ME problem include least-squares methods, entropy-based methods and statistical methods. The approach used in the current study is inspired by Castillo's Bayesian Network approach (Castillo et al., 2008), which is described in more detail later.

If the TA and ME problems are treated sequentially, there can be inconsistencies between estimates of the O-D flows and the link flows. Therefore, these problems are often combined into a single problem and solved using bilevel or decomposition approaches. By iterating between solving these two problems (using any available method for each problem), we eventually converge on a solution which satisfies both problems.

In the current study, no survey data were available to generate the initial O-D matrix. Furthermore, Andorra has a population of 80,000 but attracts 8 million tourists per year. The highly irregular and seasonal travel patterns in Andorra therefore cannot be captured using static methods such as surveys. However, several months of high-resolution geolocated telecoms data were available, and these could be used to generate initial 'guesses' for the O-D matrix for any time period and any division of zones.

RNC Telecoms Data 

Through a partnership with Andorra Telecom, three months of Radio Network Controller (RNC) data were available. The RNC keeps track of the locations of devices as they move around the coverage area. Each time a connected device interacts with the network (call, text or cellular data), moves from one cell to another or goes unobserved for 90 minutes, a record is made of the subscriber ID, the timestamp, the coordinates of the device and the home network of the subscriber.

The initial O-D matrices could be extracted from the RNC data by first identifying stay-points (Li et al., 2008), mapping the stay-points to Traffic Analysis Zones (TAZs) and then scanning each device's sequence of TAZs for transitions from one location to another. For each transition found, a trip was added to the O-D matrix for the appropriate time period.
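The transition-counting step can be sketched as follows. This is a simplified illustration that assumes stay-points have already been detected and mapped to zones. The data layout, the zone labels, the use of decimal hours as timestamps and the choice to attribute each trip to the period of its departure stay-point are all assumptions made for the example, not details taken from the study.

```python
from collections import Counter

def count_trips(visits, period_of):
    """Count trips between consecutive, distinct zones for one device.

    visits: list of (timestamp, zone) stay-points sorted by time.
    period_of: function mapping a timestamp to a time-period label.
    Returns a Counter keyed by (period, origin_zone, destination_zone).
    """
    trips = Counter()
    for (t_prev, z_prev), (_, z_next) in zip(visits, visits[1:]):
        if z_prev != z_next:
            # Assumption: the trip belongs to the period of the origin stay-point.
            trips[(period_of(t_prev), z_prev, z_next)] += 1
    return trips

def build_od_matrix(devices, period_of):
    """Sum trip counts over all devices into one sparse O-D table."""
    total = Counter()
    for visits in devices.values():
        total.update(count_trips(visits, period_of))
    return total

# Hypothetical stay-points: timestamps in decimal hours, zones as labels.
devices = {
    "dev1": [(7.5, "A"), (8.2, "B"), (17.0, "B"), (17.6, "A")],
    "dev2": [(7.9, "A"), (8.4, "B")],
}
od = build_od_matrix(devices, period_of=lambda t: int(t))
```

Storing the result as a sparse counter keyed by period and zone pair keeps the sketch independent of any particular zoning system or time resolution.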

Bayesian Networks

A Bayesian Network is a directed acyclic graph in which each node in the set X represents a random variable (in this case the O-D flows and link traffic flows) and each edge represents the probabilistic relationship between two nodes. If we know the probabilistic relationships between the variables in X, and we have an initial probability distribution for each variable, then as soon as evidence becomes available about the value of any variable, it is straightforward to propagate this evidence to the remaining variables and update their probability distributions (Castillo et al., 2008).

Applying this framework to our current problem, we can treat each element of the O-D matrix and each road's traffic flow as a random variable. Also, for any given O-D matrix, we can solve the TA problem to obtain the probability of each trip being allocated to each road. This information can be represented by a (Gaussian) Bayesian Network. Then, each time we observe the traffic volume on any particular road, we can propagate this evidence throughout the model, updating our belief about all the flow variables.
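For a Gaussian network, evidence propagation reduces to conditioning a joint normal distribution on the observed variables. The sketch below shows that generic conditioning step on a toy model in which link flows are a linear function of O-D flows plus noise. The proportion matrix, prior means, variances and function name are hypothetical, and the model structure is a simplification rather than the exact network of Castillo et al. (2008).

```python
import numpy as np

def condition_gaussian(mean, cov, observed_idx, observed_values):
    """Condition a joint Gaussian on exact observations of some variables.

    Returns the indices of the unobserved variables together with their
    posterior mean vector and covariance matrix.
    """
    observed_idx = list(observed_idx)
    hidden_idx = [i for i in range(len(mean)) if i not in observed_idx]
    s_hh = cov[np.ix_(hidden_idx, hidden_idx)]
    s_ho = cov[np.ix_(hidden_idx, observed_idx)]
    s_oo = cov[np.ix_(observed_idx, observed_idx)]
    gain = np.linalg.solve(s_oo, s_ho.T).T           # equals s_ho @ inv(s_oo)
    residual = np.asarray(observed_values, dtype=float) - mean[observed_idx]
    post_mean = mean[hidden_idx] + gain @ residual
    post_cov = s_hh - gain @ s_ho.T
    return hidden_idx, post_mean, post_cov

# Hypothetical prior on two O-D flows (e.g. from telecoms data).
prior_od_mean = np.array([100.0, 40.0])
prior_od_cov = np.diag([400.0, 100.0])
# Hypothetical route proportions: P[a, k] = share of O-D pair k on link a.
P = np.array([[1.0, 0.0], [0.5, 1.0], [0.5, 0.0]])
noise_cov = 25.0 * np.eye(3)                          # assumed link-level noise

# Joint distribution over [od_0, od_1, link_0, link_1, link_2].
mean = np.concatenate([prior_od_mean, P @ prior_od_mean])
cross = prior_od_cov @ P.T
cov = np.block([[prior_od_cov, cross],
                [cross.T, P @ prior_od_cov @ P.T + noise_cov]])

# Evidence: a traffic count of 120 on link_0 (joint index 2).
hidden, post_mean, post_cov = condition_gaussian(mean, cov, [2], [120.0])
```

In this toy case the count on link_0 is higher than the prior expected, so the posterior mean of the first O-D flow rises and its variance shrinks, while the second O-D flow, which does not use link_0, keeps its prior mean.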

Solution and Evaluation

To find the solution to the combined problem, we iterate between predicting traffic based on the current estimate of the O-D matrix, and updating the O-D matrix based on the evidence of the observed traffic flows. The former task is carried out using the Method of Successive Averages (Sheffi, 1985) and the latter task uses the evidence propagation described in the previous section.
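The averaging idea behind the Method of Successive Averages can be sketched on a deliberately small case: one O-D pair served by parallel routes whose travel cost rises with flow. At each iteration all demand is loaded onto the currently cheapest route, and the flows are then moved towards that loading with a step of 1/n. The cost functions, demand and function name are hypothetical, and a real network would replace the cheapest-route lookup with a shortest-path search.

```python
def method_of_successive_averages(demand, cost_fns, iterations=500):
    """Approximate user-equilibrium flows on parallel routes with MSA.

    demand: total trips between the single O-D pair.
    cost_fns: one function per route, mapping route flow to travel cost.
    """
    n_routes = len(cost_fns)
    flows = [demand / n_routes] * n_routes
    for n in range(1, iterations + 1):
        costs = [cost(x) for cost, x in zip(cost_fns, flows)]
        best = costs.index(min(costs))
        # All-or-nothing loading: every trip takes the currently cheapest route.
        target = [demand if r == best else 0.0 for r in range(n_routes)]
        step = 1.0 / n
        flows = [x + step * (y - x) for x, y in zip(flows, target)]
    return flows

# Hypothetical linear cost functions (minutes as a function of route flow).
cost_fns = [lambda x: 10.0 + 0.01 * x, lambda x: 15.0 + 0.005 * x]
flows = method_of_successive_averages(1000.0, cost_fns)
```

For these cost functions the two routes have equal cost when about two thirds of the demand uses the first route, and the averaged flows settle close to that split. In the combined procedure, an assignment of this kind would alternate with the evidence-propagation update of the O-D matrix.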

In each time period, one quarter of the observed link flows are randomly selected and held back for testing. These observations are not used in the calibration and are instead used to test the predictive accuracy of the calibrated model. Across all periods, the model achieves a Mean Absolute Percentage Error (MAPE) of 24% on the test data. After the model is calibrated, we can visualise the demand between each pair of zones in Andorra and the traffic on every road in each time period.
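The hold-out evaluation can be sketched as below. The function names, the fixed random seed and the example counts are assumptions for illustration; the error measure is computed here as the mean of absolute percentage errors over the held-out links.

```python
import random

def split_holdout(link_ids, test_fraction=0.25, seed=0):
    """Randomly split observed links into calibration and test sets."""
    rng = random.Random(seed)
    shuffled = list(link_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def mape(observed, predicted):
    """Mean absolute percentage error, in percent, over paired flows."""
    errors = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical example: 8 counted links, 2 of which are held back.
calibration_links, test_links = split_holdout(range(8))
# Hypothetical observed and predicted flows on the held-out links.
error = mape([100.0, 200.0], [80.0, 250.0])
```

Because percentage errors divide by the observed flow, links with very low counts can dominate the average, which is worth keeping in mind when interpreting a single summary figure.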

References

Castillo, E., Menéndez, J.M. and Sánchez-Cambronero, S., 2008. Predicting traffic flow using Bayesian networks. Transportation Research Part B: Methodological, 42(5), pp. 482-509.

Li, Q., Zheng, Y., Xie, X., Chen, Y., Liu, W. and Ma, W.Y., 2008. Mining user similarity based on location history. In Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (p. 34). ACM.

Sheffi, Y., 1985. Urban Transportation Networks (Vol. 6). Prentice-Hall, Englewood Cliffs, NJ.