In the previous post we extracted Next Bus predictions from the Wmata API.
If that’s too much or uninteresting to you, start here. Our extracted data looks something like this:
Goal: determine how accurate Next Bus predictions are
Doing some data magic in R first will improve the performance of the d3 visualizations. It also allows for much more rapid prototyping, exploration, and analysis.
Step 1: Create a time series
Wmata conveniently provides us with a TripID for each unique bus trip. The only problem is that they're not actually unique.
The simplest fix I found was to create a unique identifier for bus trips by concatenating TripID and VehicleID.
There are certainly more complex ways that use a time dimension to determine unique trips, but this worked for me.
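As a minimal sketch of the concatenation, assuming the extract is loaded into a data frame with TripID and VehicleID columns: the rows below are invented for illustration, and `uid` is a hypothetical column name, not necessarily the one used in the original script.

```r
# Toy rows, invented for illustration; real rows come from the API extract.
df <- data.frame(
  TripID    = c("1001", "1001", "1001", "1002"),
  VehicleID = c("7100", "7100", "7215", "7100"),
  Minutes   = c(5, 4, 12, 9),
  stringsAsFactors = FALSE
)

# Concatenate TripID and VehicleID into one key (uid is a hypothetical name).
# The separator avoids collisions such as "12" + "34" versus "123" + "4".
df$uid <- paste(df$TripID, df$VehicleID, sep = "_")

unique(df$uid)
```

Here the same TripID served by two different vehicles ends up as two separate trips.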
Step 2: Flag arrivals and departures
Next Bus doesn’t actually tell us when a bus arrives. We need to determine this from the time series we collect.
This is one of the reasons I collect predictions from the API every 10 seconds.
There is probably a more computationally efficient method for doing this using vectorized functions, but this works fine.
This gives us something like this:
Assumption 1: When Next Bus says a bus is arriving (Minutes==0) multiple times, I take the latest prediction as the actual arrival time.
In other words, I take the last prediction where Minutes==0 before the bus disappears off your Next Bus app as the arrival time.
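Assumption 1 could be sketched roughly as follows. This is one possible vectorized take, not necessarily the code used for this post. It assumes each trip has a unique key (`uid`, a hypothetical name), that `time` is the timestamp of each prediction, and that a trip does not reappear after it drops out of the feed. The rows are invented for illustration.

```r
# Toy predictions, invented for illustration: trip "a" arrives, trip "b" never does.
base <- as.POSIXct("2015-01-01 08:00:00", tz = "UTC")
df <- data.frame(
  uid     = c("a", "a", "a", "a", "b", "b"),
  time    = base + c(0, 60, 70, 80, 0, 10),
  Minutes = c(1, 0, 0, 0, 7, 6),
  stringsAsFactors = FALSE
)

# Keep the timestamp only where the bus is reported as arriving (Minutes == 0).
zero_time <- ifelse(df$Minutes == 0, as.numeric(df$time), NA)

# Per trip, take the latest such timestamp; trips with no Minutes == 0 stay NA.
last_zero <- ave(zero_time, df$uid, FUN = function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
})

# Convert back from seconds since the epoch to a timestamp.
df$arrival <- as.POSIXct(last_zero, origin = "1970-01-01", tz = "UTC")
```

Every row of a trip carries the same arrival value, which makes the later error calculation a simple column operation.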
Step 2b: Filter
Assumption 2: Remove bus trips that never arrive.
There are some ghost buses out there. It’s impossible to determine the error in a prediction if you don’t know the outcome (true arrival time), so these trips are removed.
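A minimal sketch of the filter, assuming trips with no detected arrival carry NA in the arrival column. The rows and the `uid` column name are invented for illustration.

```r
# Toy rows, invented for illustration: trip "b" is a ghost bus with no arrival.
df <- data.frame(
  uid     = c("a", "a", "b", "b"),
  Minutes = c(1, 0, 7, 6),
  arrival = as.POSIXct(
    c("2015-01-01 08:01:20", "2015-01-01 08:01:20", NA, NA),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Keep only predictions that belong to a trip with a known arrival time.
df <- df[!is.na(df$arrival), ]
```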
Step 3: Calculate error in Next Bus predictions
Now that we know the arrival time for each bus trip (df$arrival), the prediction in minutes until arrival (df$Minutes),
and the time the prediction was made (df$time), we can calculate the prediction error (df$err) and the actual time until arrival (df$est).
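The calculation might look something like the sketch below. The sign convention is an assumption on my part: the error is taken as predicted minus actual, so a positive value means the bus showed up sooner than Next Bus said. The rows are invented for illustration.

```r
# Toy predictions for one trip, invented for illustration.
base <- as.POSIXct("2015-01-01 08:00:00", tz = "UTC")
df <- data.frame(
  time    = base + c(0, 120, 300),   # when each prediction was made
  Minutes = c(8, 4, 0),              # predicted minutes until arrival
  arrival = base + rep(360, 3)       # flagged arrival time of the trip
)

# Actual minutes until arrival at the moment each prediction was made.
df$est <- as.numeric(difftime(df$arrival, df$time, units = "mins"))

# Assumed convention: predicted minus actual minutes.
df$err <- df$Minutes - df$est
```

With these rows the first prediction said 8 minutes when the bus was really 6 minutes away, so its error is 2 under this convention.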
Step 4: Write out data to csv
Easy enough. This is the data that will feed the d3 visualizations built in the next post.
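For completeness, a minimal sketch of the export. The file name and the toy rows are placeholders, and the path points at a temporary directory so the example is safe to run anywhere.

```r
# Toy result rows, invented for illustration.
df <- data.frame(
  uid     = c("a", "a"),
  Minutes = c(8, 4),
  est     = c(6, 4),
  err     = c(2, 0),
  stringsAsFactors = FALSE
)

# Hypothetical output path; swap in wherever the d3 page reads its data from.
out_file <- file.path(tempdir(), "predictions.csv")

# row.names = FALSE keeps R's row index out of the file.
write.csv(df, out_file, row.names = FALSE)
```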