
Exploring Sydney Toll Road Data

I was asked by Emily Moylan to give a talk about data visualisation for a transport informatics unit, so I thought I would use a dataset that was relevant to them. As it turns out, all the toll data in Sydney is available for free download! How nice of them*. You can get it yourself here: https://nswtollroaddata.com/

Firstly, let’s jump into some numbers.

There were 2,120,731,776 toll gantry beeps between 2009 and 2019. These toll roads are the ED, CCT, M7, M2, LCT and M4. I missed the M5 because that data comes in a slightly different format - I will leave it as an exercise for the reader. The data is basically a time series in 15-minute increments for each gantry point, with the number of cars/trucks passing through that gantry in each time interval. That means we have 36,447,919 time series points for all roads for all time.
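To make the shape of the data concrete, here is a minimal sketch of how records like these could be summed per road. The record layout (road, gantry, vehicle class, interval start, count) and the sample values are made up for illustration; the actual column names in the download may well differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical records: (road, gantry, vehicle_class, interval_start, count).
# These values are invented; the real files may use a different layout.
start = datetime(2017, 11, 1, 0, 0)
records = [
    ("CCT", "E", "Car", start, 120),
    ("CCT", "E", "Truck", start, 8),
    ("CCT", "E", "Car", start + timedelta(minutes=15), 135),
    ("M2", "N", "Car", start, 310),
]


def total_by_road(rows):
    """Sum the 15-minute vehicle counts for each road."""
    totals = defaultdict(int)
    for road, _gantry, _vehicle_class, _interval_start, count in rows:
        totals[road] += count
    return dict(totals)


totals = total_by_road(records)
print(totals)
```

Swapping the grouping key from the road to the gantry or the vehicle class gives the other breakdowns used later in the post.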

Don’t know what a gantry is? In this context, it is the thing that goes beep and makes you question why you live in Sydney again.

The Cross City Tunnel - 1 month

Looking at all that data in one go might be tricky. Let’s just start with the Cross City Tunnel for the month of November 2017. Here is what the 1,257,705 cars and trucks moving through the CCT over that month look like:

Volume of vehicles in the Cross City Tunnel for November 2017.

Straight away you can see lots of unsurprising patterns emerge.

The ups and downs are literally night and day. Plus you can see each peak has a mini up-down, which is the morning and evening rush hour. Finally, there is an undulating wave passing over the month; that 7-day period is known as a week! Pretty interesting, but as I said, not surprising. This is pretty typical of all the data.

The Cross City Tunnel - 2009-2019

Now let’s look at all 135,230,287 trips on the CCT over our 10-year timespan of data (FYI, that is 26 vehicles per minute, and at around $5 a pop is worth about $676 million in this time frame, just from the CCT).
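Those back-of-the-envelope figures can be reproduced in a few lines. This is only a rough check: it assumes exactly 10 years of data and a flat $5 toll for every trip, which is a simplification of real toll pricing.

```python
TRIPS = 135_230_287   # CCT trips quoted above
YEARS = 10            # assumed span of the data
TOLL_DOLLARS = 5      # rough flat toll per trip (a simplification)

MINUTES_PER_YEAR = 365.25 * 24 * 60

vehicles_per_minute = TRIPS / (YEARS * MINUTES_PER_YEAR)
revenue_dollars = TRIPS * TOLL_DOLLARS

print(f"{vehicles_per_minute:.1f} vehicles per minute")  # 25.7
print(f"${revenue_dollars / 1e6:.0f} million")           # $676 million
```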

All tolls collected in the Cross City Tunnel toll road, coloured by gantry and vehicle type.

On this scale we lose the day-to-day patterns and even the weekly patterns, but we can see there are dips in the volume of vehicles around Christmas time every year. Best time of year to drive around Sydney toll roads! The different colours, representing the different entry/exit points on the CCT and the distinction between cars and trucks, don’t tell us much, except that there are a lot more cars than trucks. Other than that, they all follow the same pattern - people and goods are all moving in different directions but they are moving at similar times.

Anyway, I plotted this as a 3D stacked bar plot (which took about 1 hour and 40GB of RAM just to make the figure - I had to run it on our Artemis HPC to stop my computer from crashing) to make visualising the different gantry entry/exit points and car/truck use a bit clearer:

All tolls collected in the Cross City Tunnel toll road, broken down by gantry and vehicle type, in 3D.

To look at this a different way, one that is probably more informative but less pretty, you can just plot a histogram of the total number of cars/trucks:

All tolls collected in the Cross City Tunnel toll road, broken down by gantry (N/E/W) and vehicle type (Car/Truck), with total numbers for the 2009-2019 collection time.

All the roads 2009-2019

There are lots of different ways to explore this dataset and many more insights to learn from it. Rather than doing the same thing for each of the toll roads, I wanted to try and capture some kind of informative metric across Sydney, so I thought I would plot them all as an animated map and see which tolls are used the most:

Most of Sydney toll gantry points coloured according to total volume of use for each year. Sad times when the M4 turns back on in 2018 after not being a toll road for a long time. Note the scale is the total volume of vehicles for that year coloured between 1,000,000 and 10,000,000 vehicles.
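For the curious, a colour scale bounded like that can be sketched as follows. This is an assumption about how such a mapping might work, not the code behind the animation: it uses a linear scale and clips anything outside the 1,000,000 to 10,000,000 range, and the function name is made up for illustration.

```python
def colour_scale_position(volume, vmin=1_000_000, vmax=10_000_000):
    """Map a yearly vehicle volume to a 0..1 position on a linear colour scale.

    Volumes below vmin or above vmax are clipped to the ends of the scale.
    """
    clipped = max(vmin, min(vmax, volume))
    return (clipped - vmin) / (vmax - vmin)


# A gantry with 5.5 million vehicles in a year sits halfway along the scale.
print(colour_scale_position(5_500_000))
```

The returned value can then be passed to whatever colour map the plotting library provides.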

*According to the toll data website, this dataset was released in accordance “with Transurban’s obligations under an Undertaking accepted by the ACCC on 29 August 2018 under section 87B of the Competition and Consumer Act 2010”, basically so that future toll roads can be competitive when bidding for tenure. And it gives us a fun dataset to explore and interrogate to learn more about people in the city and data in general. You can access the Python Notebooks I used for exploration from here.

If you have done some cool data analysis or have done something with the M5 data let us know.