Project: Road Traffic Analysis using MIDAS data

Reference: SRT 7/2/2

Last update: 03/09/2009 14:54:37


The objectives of the project are:

To design and implement a data storage and retrieval system for road traffic count data capable of supporting statistical analysis of very large datasets;

To implement, study and extend journey time prediction methodologies to improve journey planning and encourage efficient uses of infrastructure;

To study a suite of statistical models including speed-flow relationships, patterns of vehicle usage and trends in road usage across multiple detector sites.


This proposal addresses the access to and handling of very large transport-related datasets, together with the application of such data repositories to the statistical analysis and presentation of quantitative information. It looks specifically at the MIDAS data archive collected by the Highways Agency: road traffic data sensed by loop detectors installed beneath the carriageway across parts of the UK strategic road network. Phase one of the project is to design and implement a data storage and retrieval system for the archive that will support computationally intensive statistical analysis. Phase two considers the specific and important question of journey time prediction which, if made available in real time, offers the potential of large benefits to travellers as well as influencing routing patterns in ways that reduce overall congestion. Phase three considers a further suite of statistical models that become feasible with efficient access to the MIDAS archive. These investigations will have an impact on studies of marginal social cost pricing (congestion pricing) of road usage, as well as on notions of effective capacity and bottleneck identification.


University of Cambridge, Computer Laboratory
William Gates Building, 15 JJ Thomson Avenue, Cambridge, CB3 0FD

Contract details

Cost to the Department: £66,460.00

Actual start date: 01 October 2005

Actual completion date: 02 October 2006


Road traffic analysis using MIDAS data: journey time prediction
Author: R J Gibbens and Y Saatchi
Publication date: 01/12/2006
Source: University of Cambridge Computer Laboratory Technical Report TR-676

Summary of results

  1. Phase one of the project involved a short study of the data formats used by MIDAS to record traffic count data and described a revised data format that includes explicit indexing. This revised format, based on the familiar ZIP file archiving tool, allowed the efficient random access to the data that high-throughput applications require.
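
The random-access property of a ZIP-based format can be illustrated with a minimal Python sketch. The member naming scheme (one member per detector site and day, such as `M25/4510A/2003-06-02.csv`), the site identifiers and the CSV layout are all hypothetical, chosen only for illustration; they are not the format defined in the report. The sketch shows only that the ZIP central directory acts as an explicit index, so a single member can be located and decompressed without reading the rest of the archive.

```python
import io
import zipfile


def build_archive(records):
    """Write each (member name, text) pair as a separately compressed member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in records.items():
            archive.writestr(name, text)
    buffer.seek(0)
    return buffer


def read_member(archive_file, name):
    """Look up one member in the central directory and decompress only it."""
    with zipfile.ZipFile(archive_file, "r") as archive:
        return archive.read(name).decode("utf-8")


# Hypothetical member names and contents (assumptions, not the MIDAS layout).
sample = {
    "M25/4500A/2003-06-02.csv": "minute,flow,speed\n0,12,98\n1,15,97\n",
    "M25/4500A/2003-06-03.csv": "minute,flow,speed\n0,10,101\n1,11,99\n",
    "M25/4510A/2003-06-02.csv": "minute,flow,speed\n0,20,91\n1,22,90\n",
}

archive_file = build_archive(sample)
one_day = read_member(archive_file, "M25/4510A/2003-06-02.csv")
print(one_day)
```

Because each member is compressed independently, the cost of reading one site-day in this sketch depends on the size of that member rather than on the size of the whole archive.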

    The project looked at the variability of journey times across days in three day categories: Mondays, midweek days and Fridays. Two estimators using real-time data were considered: a simple-to-implement regression-based method and a more computationally demanding k-nearest neighbour method. Our example scenario of UK data was taken from the M25 London orbital motorway during 2003, and the results were compared in terms of the root-mean-square prediction error. Where the variability was greatest (typically during rush-hour periods or periods of flow breakdown), the regression and nearest neighbour estimators reduced the prediction error substantially compared with a naive estimator constructed from the historical mean journey time. Only as the lag between the decision time and the journey start time increased beyond around 2 hours did the potential to improve upon the historical mean estimator diminish. Thus, there is considerable scope for prediction methods combined with access to real-time data to improve the accuracy of journey time estimates and, in so doing, to reduce the uncertainty in estimating the generalized cost of travel. The regression-based estimator has a particularly low computational overhead, in contrast to the nearest neighbour estimator, which makes it entirely suitable for an online implementation.
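
The comparison described above can be sketched in Python under strong simplifying assumptions. Here each day is reduced to a single predictor (the journey time observed at the decision time) and a single target (the journey time at the later start time). The data are synthetic, and the function names, the one-dimensional predictor and the choice of k = 5 are illustrative assumptions, not the estimators as specified in the report. The sketch is meant only to show how a historical mean, a linear regression and a k-nearest neighbour estimator can be scored against each other by root-mean-square error.

```python
import math
import random
from statistics import fmean


def rmse(predicted, actual):
    """Root-mean-square prediction error."""
    return math.sqrt(fmean([(p - a) ** 2 for p, a in zip(predicted, actual)]))


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def knn_predict(train_x, train_y, x, k=5):
    """Average the targets of the k training days whose predictor is closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return fmean([y for _, y in nearest])


# Synthetic data (an assumption for illustration): a day-level congestion
# effect in minutes drives both the current and the future journey time.
rng = random.Random(0)
days = []
for _ in range(200):
    congestion = rng.gammavariate(2.0, 3.0)
    current = 20.0 + congestion + rng.gauss(0.0, 1.0)       # observed at decision time
    future = 20.0 + 0.8 * congestion + rng.gauss(0.0, 1.0)  # journey time to predict
    days.append((current, future))

train, test = days[:150], days[150:]
train_x, train_y = [d[0] for d in train], [d[1] for d in train]
test_x, test_y = [d[0] for d in test], [d[1] for d in test]

intercept, slope = fit_line(train_x, train_y)
errors = {
    "historical mean": rmse([fmean(train_y)] * len(test_y), test_y),
    "regression": rmse([intercept + slope * x for x in test_x], test_y),
    "nearest neighbour": rmse([knn_predict(train_x, train_y, x) for x in test_x], test_y),
}
for name, value in errors.items():
    print(f"{name}: RMSE = {value:.2f} minutes")
```

The sketch also reflects the computational contrast noted above: once fitted, the regression estimator needs one multiplication and one addition per prediction, whereas the nearest neighbour estimator searches the stored training days for every prediction.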

    Finally, the project demonstrates the value both of preserving historical archives of transport-related datasets and of providing access to real-time measurements.