How accurate is Next Bus I: extracting data from API
Motivation: If you’re a bus-rider and public transportation enthusiast like myself, you probably use Next Bus – well I do anyway.
Next Bus is great (usually). It tells me exactly how many minutes I have until the bus rolls up to my bus stop… when it’s right.
Having missed my fair share of early buses and waiting for what seems like ages when Next Bus continuously predicts just a few more minutes,
I thought it would be an interesting and worthwhile problem to investigate… and also a good excuse to experiment with some APIs
and cool interactive visualizations using JavaScript and D3. So I set out to determine how accurate Next Bus predictions really are.
Disclaimer: I use R and Python on a regular basis for scripting and data analysis. I’ve worked with APIs before, but am by no means an expert.
This is my first real venture into JavaScript and D3, so this is much more of an experiment than an expert guide. Anyhow, this is what I did.
I live in DC, so I tapped the WMATA (Washington Metropolitan Area Transit Authority) API for my data.
I live in a house, which is near a bus stop, so I pulled predictions for my bus stop (selfish, I know) every 10 seconds for about a week.
First: Get API key: Pretty straightforward. You can register in a couple clicks here.
Second: Access data from API: I found a great library that made this pretty simple: python-wmata from bycoffe on GitHub.
With this Wmata class, here’s the fastest way to access the API:
which returns something like this when you print buspred:
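As a rough sketch of that kind of call: the snippet below assumes the Wmata client exposes a bus_prediction(stopid) method and hands back parsed JSON shaped like WMATA's bus prediction response (a StopName plus a list of Predictions). Both are assumptions, so check them against the library. FakeWmata, the stop ID, the vehicle IDs, and the direction text are made-up stand-ins so the example runs without an API key.

```python
# Hypothetical sketch: the method name and the response shape are assumptions.
def fetch_predictions(api, stopid):
    """Ask a Wmata-style client for next-bus predictions at one stop."""
    return api.bus_prediction(stopid)


class FakeWmata:
    """Stand-in client (not the real library) that returns canned data."""

    def bus_prediction(self, stopid):
        return {
            "StopName": "Example stop",
            "Predictions": [
                {"RouteID": "64", "VehicleID": "0001",
                 "Minutes": 19, "DirectionText": "North"},
                {"RouteID": "64", "VehicleID": "0002",
                 "Minutes": 41, "DirectionText": "North"},
            ],
        }


buspred = fetch_predictions(FakeWmata(), "0000000")
print(buspred)
```

With the real library you would pass a Wmata instance created with your API key instead of FakeWmata.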
Pretty fast! Here’s the first function I wrote to actually process the pulled data from the API. It’s pretty simple – it parses the JSON object
pulled from the API into an array with the useful bits of information we want to save.
Here’s what it does. It actually transforms the JSON into an array of arrays. Each array is a prediction for a different bus.
This happens for the same reason that when you check Next Bus on your phone, you see predictions for the next 2 or 3 buses en route towards your stop.
In this example one bus is 19 minutes away while another is 41 minutes away.
I probably could have skipped this step and transformed the JSON directly to a flat file (or database), but this made the most sense to me at the time
and it works.
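A minimal sketch of this kind of parser, under stated assumptions: the response is a dict with a Predictions list, and the fields worth saving are RouteID, VehicleID, Minutes, and DirectionText plus a timestamp. The name extract_pred, the field choice, and the sample values (other than the 19 and 41 minutes) are illustrative, not necessarily what extractPred does.

```python
from datetime import datetime


def extract_pred(buspred, now=None):
    """Flatten a prediction response into one row (a list) per bus."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    rows = []
    for pred in buspred.get("Predictions", []):
        rows.append([
            stamp,                      # when the prediction was pulled
            pred.get("RouteID"),        # which route the bus is on
            pred.get("VehicleID"),      # which physical bus it is
            pred.get("Minutes"),        # predicted minutes until arrival
            pred.get("DirectionText"),  # human-readable direction
        ])
    return rows


# Made-up sample response: two buses, 19 and 41 minutes away.
sample = {
    "StopName": "Example stop",
    "Predictions": [
        {"RouteID": "64", "VehicleID": "0001", "Minutes": 19, "DirectionText": "North"},
        {"RouteID": "64", "VehicleID": "0002", "Minutes": 41, "DirectionText": "North"},
    ],
}
rows = extract_pred(sample, now=datetime(2015, 1, 1, 8, 0, 0))
for row in rows:
    print(row)
```

Stamping every row with the pull time is what makes it possible to compare a prediction against when the bus actually showed up.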
write2text is the function that I initialize with the information I want to extract from the API and then let rip for an hour, a day, a week, or however long I want to save predictions for.
It’s basically a big wrapper around extractPred. It writes the predictions to a .txt file every 10 seconds, or however often you specify with the freq argument.
filename is the name of the output data file (a .txt file in my case) where results will be written
freq is the frequency in seconds that the function will make a call to the API
mins is the number of minutes the function will run for
stopid is the ID for the bus-stop you want to pull data for
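A polling loop along these lines could look like the sketch below. It keeps the four arguments described above and adds three hypothetical ones: fetch_rows (any callable that takes a stop ID and returns a list of rows), plus sleep and clock, which exist only so the loop can be exercised without really waiting. Writing the rows with the csv module is also an assumption about the output format.

```python
import csv
import time


def write_to_text(filename, freq, mins, stopid, fetch_rows,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_rows(stopid) every `freq` seconds for `mins` minutes,
    appending the returned rows to `filename`. Returns the poll count."""
    deadline = clock() + mins * 60
    polls = 0
    while clock() < deadline:
        try:
            rows = fetch_rows(stopid)
        except Exception as err:
            # A week-long run should survive one failed request.
            print("poll failed:", err)
            rows = []
        # Append after every poll so a crash loses at most one batch.
        with open(filename, "a", newline="") as fh:
            csv.writer(fh).writerows(rows)
        polls += 1
        sleep(freq)
    return polls


# Example call (commented out because it would run for a full hour):
# write_to_text("predictions.txt", freq=10, mins=60, stopid="0000000",
#               fetch_rows=my_fetch_function)
```

Opening the file in append mode on each pass is slower than holding it open, but it means the data collected so far is safely on disk if the laptop goes to sleep halfway through the week.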
Now we can start harvesting data. I used an old laptop and let it rip for a week by running the code below with mins=60*24*7.
Here’s what a snippet of the collected data (bus64_outputData.txt) looks like:
So now we’ve collected a lot of data from Next Bus. I got ~190,000 rows for one bus stop for just one week. So what do we do with it all?
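As a quick sanity check on that row count, here is the back-of-the-envelope arithmetic. It assumes a poll fires exactly every 10 seconds, ignoring the time each request takes, so the real number of polls would be somewhat lower.

```python
SECONDS_PER_WEEK = 60 * 60 * 24 * 7   # 604,800 seconds
POLL_INTERVAL = 10                    # seconds between API calls
ROWS_COLLECTED = 190_000              # approximate row count from the post

polls = SECONDS_PER_WEEK // POLL_INTERVAL
rows_per_poll = ROWS_COLLECTED / polls

print(polls)                    # 60480
print(round(rows_per_poll, 1))  # 3.1
```

Roughly three rows per poll lines up with Next Bus showing predictions for the next two or three buses at a time.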
Check out the next post.