Earlier this year we hosted a student, Christopher Lunny, as part of the Data Lab MSc programme. The goal of Christopher's MSc project with us was to test USMART's real-time data performance using the Transport for London Countdown API. We also wanted to answer this question: How well do the buses adhere to their timetables?
The Transport for London Countdown API provides a stream of live bus arrival-time predictions for London, covering over 19,000 bus stops, over 700 bus routes and over 8,000 buses. There are approximately 130,000 bus arrival-time predictions at any given time and over 12 million updates to the model per day.
As part of the USMART platform, Christopher used an Apache Cassandra database for data storage, which allowed very fast writes across a scale-out architecture of commodity hardware with linear scalability. Apache Spark provided the engine for distributed data processing, with modules for machine learning, graph processing and real-time processing of streaming data.
Christopher went on to win Data Lab's Best Student award based on the work he did while he was at UrbanTide and we are very proud to have provided the space and support for him to work on such an achievement.
Bus Gap Analysis - then three come at once
Using the Transport for London data, Christopher wanted to answer this question: how well do the buses adhere to their timetables? Many of the buses are not timetabled to be at specific stops at specific times, especially during peak times. Instead, the timetables indicate expected time gaps between one bus and the next to arrive at a stop, and each stop can have a different waiting time even for the same bus route.
We analysed the performance of one of the bus routes during peak time to see how long the gaps were between successive bus arrivals at a particular stop. The findings indicate that in some cases passengers waited longer than the timetabled gap, and that platooning (several buses arriving close together) also occurred. However, the wider visualisation of the routes, using big data analytics, shows that arrival predictions at bus stops are generally accurate on the vast majority of routes.
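To make the gap analysis described above more concrete, here is a minimal Python sketch of the idea. It is not Christopher's actual Spark code: the function names, the arrival times, the 8-minute timetabled gap and the 2-minute bunching threshold are all made-up assumptions for illustration. It takes the arrival times observed at one stop, works out the gap between each bus and the next, and labels each gap as bunched, longer than timetabled, or fine.

```python
from datetime import datetime


def headway_gaps(arrivals):
    """Return the gaps, in minutes, between consecutive arrivals at one stop."""
    ordered = sorted(arrivals)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(ordered, ordered[1:])]


def classify_gaps(gaps, timetabled_gap_min, bunch_threshold_min=2.0):
    """Label each gap as 'bunched', 'long' or 'ok'.

    Both thresholds are illustrative assumptions, not Transport for London values.
    """
    labels = []
    for gap in gaps:
        if gap <= bunch_threshold_min:
            labels.append("bunched")   # buses arriving almost together
        elif gap > timetabled_gap_min:
            labels.append("long")      # wait exceeded the timetabled gap
        else:
            labels.append("ok")
    return labels


# Made-up arrival times at a single stop during the morning peak.
arrivals = [
    datetime(2016, 1, 1, 8, 0),
    datetime(2016, 1, 1, 8, 7),
    datetime(2016, 1, 1, 8, 22),
    datetime(2016, 1, 1, 8, 23),
    datetime(2016, 1, 1, 8, 24),
]

gaps = headway_gaps(arrivals)
labels = classify_gaps(gaps, timetabled_gap_min=8)

for gap, label in zip(gaps, labels):
    print(f"{gap:.0f} min gap: {label}")
```

In this invented example a 15-minute wait is followed by three buses within two minutes of each other, which is the "three come at once" pattern. At the scale of the full feed, the same per-stop calculation would be run as a distributed job rather than over a plain Python list.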
As well as Christopher picking up the best project award, we are delighted that Transport for London is interested in this data analysis and visualisation project. Watch this space for updates.