Junctional Traffic – Characterising and Rating Traffic Junctions
‘I love junctions.’ This inside joke from last week’s hackathon definitely has some truth in it. At the very well organised Smart Benutten Urban traffic Data Hack we looked at traffic junctions from a new and exciting perspective. I believe our approach has promise for further development. Therefore, I’ve chosen to document it more properly and to open myself up for collaboration; see near the bottom of this blog for details.
I will first introduce our approach of redefining the fundamental levels of a traffic junction. Next I will explain each of the fundamental levels in more detail, along with how we used data to describe them. Finally, I will showcase the power of combining data from the fundamental levels.
1 Redefining junctions
Let’s set some context. First of all, in this document we consider a junction to be a place where roads meet. Second, we primarily used data from cars. Even though the following approach can easily be extended to include other forms of traffic, we will only consider the traffic that is covered by the data sets we used (primarily cars).
We defined the three fundamentals of a junction as:
- the task
- the performance
- the components
To understand the task and the performance I want to take you through a mental exercise. Imagine your favourite junction. Where is it? How many roads meet? Is it busy? Now I want you to strip away all the physical aspects that make up that junction. All that remains is roads leading up to an empty field. You are left with a certain number of people entering this empty space from each road, and they all want to go in any of the available directions.
What we now define as the task of this junction is its job to get a number of people from each road to all the other roads over a certain time period. You can choose to describe a task in detail or in simplified form. We decided to describe the task of a standard crossing of two roads as the number of people going from any of the four directions to any of the other three directions per hour of the day. To illustrate with the image below: cars enter the question mark from A, B, C and D and want to continue their way in any of the other directions. The number of cars can vary, for example by time of day or location. This allows you to cluster junctions into categories of similar task.
Figure 1: Junction task
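To make the task definition concrete, here is a minimal sketch of how such a task profile could be stored. The arm labels follow the figure; the structure itself and the example count are my own illustration, not the format we used at the hackathon.

```python
from itertools import permutations

# Arm labels as in the junction task figure.
ARMS = ["A", "B", "C", "D"]

# Every ordered (origin, destination) pair with origin != destination.
MOVEMENTS = list(permutations(ARMS, 2))  # 4 arms * 3 exits = 12 movements


def empty_task(hours=range(24)):
    """Return a task profile: cars per movement per hour, all set to zero."""
    return {hour: {move: 0 for move in MOVEMENTS} for hour in hours}


task = empty_task()
# Made-up value: 120 cars going from A to C between 08:00 and 09:00.
task[8][("A", "C")] = 120
```

A junction with three or five arms would only need a different list of arms; the rest stays the same.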
To value the performance of a junction we thought safety and speed should be considered. We only look at speed for this example. By speed we mean the time required to pass the junction.
The components are the set of physical tools that we use to complete the task. Different tools can complete the same task, but they have a different performance. Examples:
- traffic lights versus roundabouts,
- combined lanes for directions or separate lanes,
- pedestrian/cycle bridge versus inclusion in the roundabout.
The three fundamental levels of a traffic junction allow us to compare (by performance) similar (by task) actual traffic junctions (sets of components).
By simple comparison we could:
- identify low performance junctions,
- recommend higher performing junction designs that perform the same task.
When we ‘map’ many junctions into this space we can generate a data set on which we can use machine learning to learn the relation between performance and the complex mix of components. This could then generate the optimal traffic junction for a given task.
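The simple comparison (before any machine learning) could be sketched as follows. The junction names, task categories and speeds are made up, and treating a task as a single category label is a simplification I assume here for illustration.

```python
from collections import defaultdict

# Made-up records: (junction, task category, average speed in km/h).
junctions = [
    ("J1", "busy four-arm", 18.0),
    ("J2", "busy four-arm", 27.5),
    ("J3", "quiet three-arm", 31.0),
    ("J4", "quiet three-arm", 22.0),
]


def compare_within_task(records):
    """Group junctions by task and report the best and worst performer."""
    by_task = defaultdict(list)
    for name, task, speed in records:
        by_task[task].append((speed, name))
    result = {}
    for task, entries in by_task.items():
        entries.sort()  # lowest average speed first
        result[task] = {"worst": entries[0][1], "best": entries[-1][1]}
    return result


comparison = compare_within_task(junctions)
```

The components of the best junction in a category are then the first candidates to look at for the worst junction in that category.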
1.1 Smart Benutten Hackathon focus
During the hackathon we focused on data from the junctions in the following map.
Figure 2: Junctions Zwolle
The data we used for the analysis are:
- Traffic light data for Thursday 19 May 2016 for junctions labelled B, E and K.
- Travel time data (by Bluetooth) for Thursday 12 May 2016 for all labelled junctions.
- Distances between junctions as calculated by us.
Unfortunately we had to combine two data sets that were recorded one week apart. Data for the same day does exist and should be used for a more accurate comparison; it just wasn’t readily available to us. This does not prevent us from successfully demonstrating the method.
2 Task
The task can be expressed in multiple ways. Which one you choose depends on your preference and on the data you have available.
It is important to understand that there may be a difference between the current task executed by the junction and the desired task you wish it would execute. This difference can be due to the limit on the task that the junction can execute (I believe better known as its capacity) or to the desire to rearrange the overall traffic flows in areas of the city. With the data we examine the current task of a junction.
2.1 Smart Benutten hackathon approach
We defined the (current) task as the number of cars crossing a junction in any of the available directions (i.e. the twelve directions in the junction task figure) per hour of the day. We chose this definition because the data for it was available and workable.
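We received hourly figures that were already pre-processed. If per-vehicle records were available, the aggregation into this task definition might look like the sketch below. The record layout (timestamp, origin arm, destination arm) is purely hypothetical, since we did not work with the raw format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical passage records: (timestamp, origin arm, destination arm).
passages = [
    ("2016-05-19 08:02:10", "A", "C"),
    ("2016-05-19 08:47:55", "A", "C"),
    ("2016-05-19 08:15:00", "B", "D"),
    ("2016-05-19 17:30:41", "A", "C"),
]


def hourly_task(records):
    """Count cars per (hour of day, origin, destination)."""
    counts = Counter()
    for stamp, origin, destination in records:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").hour
        counts[(hour, origin, destination)] += 1
    return counts


task_counts = hourly_task(passages)
```

Because a Counter returns zero for missing keys, hours without any recorded cars need no special handling.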
The data came from the traffic light data set. Even though the data set is rich and clear we were required to work with a very small subset of it for two reasons.
- We specifically wanted data for junctions that we could also generate performance data for, which in our case was limited to junctions surrounded by other junctions that are part of the travel time (Bluetooth) data set. More about that in the performance section.
- The raw data comes in an ‘unfriendly’ format that we could not easily access and work with at the hackathon.
The junctions that we analysed in Zwolle are the cross roads of:
## Source: local data frame [3 x 2]
##
##                   straat1                 straat2
##                    (fctr)                  (fctr)
## 1 Van Wevelinkhovenstraat Bisschop Willebrandlaan
## 2 Bisschop Willebrandlaan   Thomas à Kempisstraat
## 3   Wethouder Alferinkweg        Luttenbergstraat
They are labelled B, E and K in figure 2. For these junctions we had hourly figures for the number of cars travelling in each direction over the span of one day. This hourly data sufficiently describes the current task of a junction.
The data was pre-processed, and we assume that it is complete and correct. I assume that the raw data is more detailed, and I believe that a proper analysis requires access to this raw data, because it is unknown what assumptions and processing steps have been applied to it.
Traffic light data seems to be abundant. I believe it can be used to generate a current task profile for many junctions in Zwolle and the Netherlands.
2.2 Next steps
The task of a junction is dependent on the system as a whole. One would probably want to take a holistic approach and map several kinds of traffic (motorised, cycling and pedestrian). You’d also want to view areas as a whole, for example the areas between motorways and city centres. In these areas the current task of a junction will be determined by a mixture of its own components, its neighbouring junctions, and its position relative to where people are and where they want to go.
Further we want to explore:
- the extent of the traffic light data set,
- similar data availability for junctions without traffic lights,
- other data sets that can confirm the validity of the data.
As well as identifying the current task performed by all junctions, you’d also want to map desired tasks. For this we’d need a data set that shows the start and end locations of people’s journeys.
3 Performance
As mentioned before we focused only on the speed (average) through a junction. The performance indicator is harder to define than the task indicator due to the interconnectedness of junctions.
3.1 Smart Benutten hackathon approach
A rich data set is available from Bluetooth sensors placed at various junctions in Zwolle. They connect with cars and the processed data shows the average time to get from one junction to another junction. These averages are reported for every minute of the day.
We are not sure what percentage of cars leaves a data trace. Often at night there is no data available, but since there are no real traffic problems to solve that does not matter so much.
When working with the data we had several assumptions:
- The average travel time from A to C via B is the average travel time from A to B plus the average travel time from B to C. We are not sure how the data that we were given is pre-processed and therefore we don’t know if our assumptions hold. However, I’m confident that given the raw data we could manage the pre-processing such that our aims can be met.
- The average speed from A to C via B is a performance indicator of junction B. This assumption requires more in-depth analysis of the situation and the raw data measurements. Given the raw data we can again modify our assumptions to fit the actual data more accurately.
- The pre-processed data is correct and so are its assumptions. One of them likely is that the average speed calculated from cars with Bluetooth is representative of the average speed of all cars.
We averaged the hourly times of Thursday 12 May across junctions B, E and K. Take junction B as an example. We selected the eight averages for the sections A to B, B to A, C to B, B to C, E to B, B to E, L to B and B to L. Using these numbers we calculated the travel times for the twelve directions going across junction B: A to C, A to E, A to L, and similarly from C, E and L to each of the other directions. Then we divided the distance of each section by its average time to get an average speed.
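The calculation for junction B can be sketched as below. All travel times and distances are made up. I also assume here that the speed of a direction is the combined length of its two sections divided by their combined travel time, which is one possible reading of the last step.

```python
# Made-up average travel times (seconds) for one hour, per directed section.
section_time_s = {
    ("A", "B"): 60.0, ("B", "A"): 55.0,
    ("C", "B"): 40.0, ("B", "C"): 45.0,
    ("E", "B"): 90.0, ("B", "E"): 80.0,
    ("L", "B"): 30.0, ("B", "L"): 35.0,
}
# Made-up distances (metres) from each neighbour to junction B,
# assumed to be the same in both directions.
section_length_m = {"A": 500.0, "C": 400.0, "E": 800.0, "L": 300.0}


def crossing_speeds(times, lengths, centre="B"):
    """Average speed (km/h) for every direction across the centre junction."""
    speeds = {}
    for origin in lengths:
        for destination in lengths:
            if origin == destination:
                continue
            # Assumption from the text: the two section travel times add up.
            total_time = times[(origin, centre)] + times[(centre, destination)]
            total_distance = lengths[origin] + lengths[destination]
            speeds[(origin, destination)] = total_distance / total_time * 3.6
    return speeds


speeds = crossing_speeds(section_time_s, section_length_m)
```

With four neighbouring junctions this yields the twelve directions mentioned above, one speed per direction for each hour that is processed.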
3.2 The next steps
The travel time data is rich and I believe it has been collected for several years. However, it seems limited in comparison to the traffic light data, simply because the Bluetooth trackers are currently only installed on a limited number of junctions surrounding Zwolle’s centre. Possible next steps are:
- Expand the Bluetooth data gathering.
- Collaborate in the analysis with other cities that have Bluetooth data.
- Explore other data sets that we can use to express the performance of a junction.
3.2.1 Expand the Bluetooth data gathering
As the data could prove very useful it is worth exploring the cost and benefits of extending this data set. There are two main ways to extend the data.
- Increase the number of junctions with Bluetooth tracking
- Increase the number of cars with Bluetooth connection
Increasing the number of junctions would require an analysis of which locations to expand to. It could also be beneficial to look at the raw data and carefully analyse what information can be taken from it. I would also suggest considering the difference between placing the Bluetooth devices at junctions and placing them in between junctions.
A cheap way to increase the amount of data may be to give drivers Bluetooth devices for free. Data security might be maintained by not registering the devices to people and by making sure not to measure close to final destinations (i.e. do not measure at places where people can park their car), such that a person’s identity cannot be retrieved from analysing where they park.
3.2.2 Collaborate with other cities
Since the traffic flows within our small nation are pretty similar (I assume), it is worth exploring whether other cities have the same or similar data available.
When junctions are compared by task there is no downside to using data from other cities, and very much to gain. I would even go as far as to say that there is more to learn, because of a likely higher variation in design decisions.
3.2.3 Other data sets to express junction performance
There are two main reasons why I think it is worth exploring other data sets. One, the travel time (Bluetooth) data set is relatively limited and, two, a great deal of traffic data is collected that I’m not aware of.
I would suggest that the available data are explored for their potential to generate junction performance metrics.
4 Components
The building blocks of actual junctions: lanes, traffic lights, roundabouts, adaptive traffic control systems (such as control scenarios, in Dutch: regelscenarios), cycle bridges, priority implementations, distance to next junction, speed limits, and much more.
A list of components should be identified and a database of all components of interest should be created for all junctions that a task and performance metric is generated for.
When you plot the performance of junctions with a similar task you can start identifying the components of high performance junctions. You can then simply copy and paste the component layout of a high performing junction to junctions that have lower performance on the same task.
To take it another step: when the three data sets (task, performance and components) have grown to a substantial size, you can use machine learning techniques to learn how performance depends on the components and then generate optimised junctions from scratch.
5 Task performance plots (junction signatures)
You can plot the task versus the performance for a junction. This means you need to gather and combine the number of cars (task) and the average speed (performance) over fixed time intervals.
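Combining the two data sets for one direction of one junction could look like the sketch below. The hourly values are made up, and the straight-line least-squares fit is an assumption on my part about how a trend can be drawn through the points.

```python
# Made-up hourly values for one direction of one junction.
cars_per_hour = {7: 150, 8: 420, 9: 380, 12: 250, 17: 460}
speed_kmh = {7: 34.0, 8: 21.0, 9: 23.5, 12: 29.0, 17: 19.5, 23: 38.0}


def signature(task, performance):
    """Pair task and performance for the hours present in both data sets."""
    hours = sorted(set(task) & set(performance))
    return [(task[h], performance[h]) for h in hours]


def linear_trend(points):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


points = signature(cars_per_hour, speed_kmh)
slope, intercept = linear_trend(points)
```

Hours that are missing in either data set (such as 23:00 above, or the night hours without Bluetooth data) simply drop out of the signature.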
5.1 Smart Benutten hackathon approach
Please note that the data available for performance and task were not from the same day (12 versus 19 May) and covered only one day each (very little). The data is likely not representative of the real/average situation. The following plots only serve to illustrate the method.
A detailed plot of task (number of cars per hour) versus performance (average speed across junction) for junctions B, E and K is given in figure 3. The columns in this plot split the data into the three junctions. The colours represent directions; using the legend and the Zwolle junctions map in figure 2 you can determine which direction is which. The points represent the average performance and task for a specific hour of the day. The fitted lines show the performance versus task trend for a direction. The colour coding is arranged such that red, blue, green and grey show roads entering the junction from approximately North, East, South and West respectively, and such that light, medium and dark shades show left-turning, straight and right-turning traffic.
Figure 3: Task (number of cars per hour) versus performance (average speed across junction) for junctions B, E and K.
An interactive visualisation is available on Shiny. It shows the ‘signature’ of a single direction in three junctions appearing over the course of a day. It is essentially one direction from each of the three junctions in figure 3 on a single plot.
6 Next (for real)
There is a long way to go from this concept to the real implementation. The system (traffic junctions) we try to dissect is large, complicated and interconnected. However, the approach to assign a junction a task and performance based on data seems promising to me, and a valid reason to pursue this exploration.
I personally am in no position to continue this work on my own, but I am interested in continuing it in collaboration and in taking an active role in that collaboration. Please feel free to get in touch: email (san [at] nesware [dot] net), Twitter, or LinkedIn.
For any interested parties, I am currently available full time as a freelancer to work on this project.
7 Thank you
The Zwolle hackathon was a great event, and I want to thank all the people involved in the organisation and also the participants. In particular I want to thank Jos (Data Science Amsterdam) for hosting this wonderful event, Gijs (@gijsvanderkolk) for going the extra mile in getting the data required to showcase this concept, Patrick from Hanz (@bijhanz) for looking after our wellbeing, and my team mates (Sarah, Marno, Rob, and Pablo) for being awesome.