Motivation
Recently I spent a considerable amount of time writing the postman_problems python library implementing solvers for the Chinese and Rural Postman Problems (CPP and RPP respectively). I wrote about my initial motivation for the project: finding the optimal route through a trail system in a state park here. Although I’ve still yet to run the 34 mile optimal trail route, I am pleased with the optimization procedure. However, I couldn’t help but feel that all those nights and weekends hobbying on this thing deserved a more satisfying visual than my static SVGs and hacky GIF. So to spice it up, I decided to solve the RPP on a graph derived from geodata and visualize on an interactive Leaflet map.
The Problem
In short, ride all 50 state named avenues in DC endtoend following the shortest route possible.
There happens to be an annual 50 states ride sponsored by our regional bike association, WABA, that takes riders to each of the 50^{†} state named avenues in DC. Each state’s avenue is touched, but not covered in full. This problem takes it a step further by instituting this requirement. Thus, it boils to the RPP where the required edges are state avenues (endtoend) and the optional edges are every other road within DC city limits.
For those unfamiliar with DC street naming convention, that can (and should) be remedied with a read through the history behind the street system here. Seriously, it’s an interesting read. Basically there are 50 state named avenues in DC ranging from 0.3 miles (Indiana Avenue) to 10 miles (Massachusetts Avenue) comprising 115 miles in total.
The Solution
The data is grabbed from Open Street Maps (OSM). Most of the post is spent wrangling the OSM geodata into shape for the RPP algorithm using NetworkX graphs. The final route (and intermediate steps) are visualized using Leaflet maps through mplleaflet which enable interactivity using tiles from Mapbox and CartoDB among others.
Note to readers: the rendering of these maps can work the browser pretty hard; allow a couple extra seconds for loading.
The Approach
Most of the heavy lifting leverages functions from the graph.py module in the postman_problems_examples repo. The majority of preRPP processing employs heuristics that simplify the computation such that this code can run in a reasonable amount of time. The parameters employed here, which I believe get pretty darn close to the optimal solution, run in about 50 minutes. By tweaking a couple parameters, accuracy can be sacrificed for time to get run time down to ~5 minutes on a standard 4 core laptop.
Verbose technical details about the guts of each step are omitted from this post for readability. However the interested reader can find these in the docstrings in graph.py.
The table of contents below provides the best highlevel summary of the approach. All code needed to reproduce this analysis is in the postman_problems_examples repo, including the jupyter notebook used to author this blog post and a conda environment.
^{†} While there are 50 roadways, there are technically only 48 state named avenues: Ohio Drive and California Street are the stubborn exceptions.
Table of Contents
 0: Get the data
 1: Load OSM to NetworkX
 2: Make Graph w State Avenues only
 3. Remove Redundant State Avenues
 4. Create Single Connected Component
 5. Solve CPP
 6. Solve RPP
0: Get the data
There are many ways to grab Open Street Map (OSM) data, since it’s, well, open. I grabbed the DC map from GeoFabrik here.
1: Load OSM to NetworkX
While some libraries like OSMnx provide an elegant interface to downloading, transforming and manipulating OSM data in NetworkX, I decided to start with the raw
data itself. I adopted an OSMtonx parser from a hodge podge of Gists (here and there) to read_osm
.
read_osm
creates a directed graph. However, for this analysis, we’ll use undirected graphs with the assumption that all roads are bidirectional on a bike one way or another.
<class 'networkx.classes.digraph.DiGraph'>
CPU times: user 46.6 s, sys: 2.1 s, total: 48.7 s
Wall time: 50.2 s
This is a pretty big graph, about 275k edges. It takes about a minute to load on my machine (Macbook w 4 cores)
275429
2: Make Graph w State Avenues only
Generate state avenue names
Most state avenues are written in the long form (ex. Connecticut Avenue Northwest). However, some, such as Florida Ave NW, are written in the short form. To be safe, we grab any permutation OSM could throw at us.
['Alabama Ave Southeast',
'Alabama Ave Southwest',
'Alabama Ave Northeast',
'Alabama Ave Northwest',
'Alabama Ave SE',
'Alabama Ave SW',
'Alabama Ave NE',
'Alabama Ave NW',
'Alabama Avenue Southeast',
'Alabama Avenue Southwest',
'Alabama Avenue Northeast',
'Alabama Avenue Northwest',
'Alabama Avenue SE',
'Alabama Avenue SW',
'Alabama Avenue NE',
'Alabama Avenue NW',
'Alaska Ave Southeast',
'Alaska Ave Southwest',
'Alaska Ave Northeast',
'Alaska Ave Northwest']
Create graph w state avenues only
This is a much smaller graph:
7148
But every state is represented:
50
Here they are by edge count:
Counter({'Alabama': 348,
'Alaska': 36,
'Arizona': 89,
'Arkansas': 64,
'California': 24,
'Colorado': 59,
'Connecticut': 283,
'Delaware': 66,
'Florida': 253,
'Georgia': 185,
'Hawaii': 84,
'Idaho': 50,
'Illinois': 37,
'Indiana': 29,
'Iowa': 30,
'Kansas': 74,
'Kentucky': 47,
'Louisiana': 37,
'Maine': 199,
'Maryland': 233,
'Massachusetts': 581,
'Michigan': 225,
'Minnesota': 214,
'Mississippi': 102,
'Missouri': 66,
'Montana': 74,
'Nebraska': 183,
'Nevada': 122,
'New Hampshire': 259,
'New Jersey': 170,
'New Mexico': 66,
'New York': 333,
'North Carolina': 87,
'North Dakota': 16,
'Ohio': 391,
'Oklahoma': 48,
'Oregon': 170,
'Pennsylvania': 433,
'Rhode Island': 282,
'South Carolina': 60,
'South Dakota': 124,
'Tennessee': 40,
'Texas': 123,
'Utah': 66,
'Vermont': 154,
'Virginia': 134,
'Washington': 58,
'West Virginia': 56,
'Wisconsin': 257,
'Wyoming': 27})
2.1 Viz state avenues
As long as your NetworkX graph has lat
and lon
node attributes, mplleaflet can be used to pretty effortlessly plot your NetworkX graph on an interactive map.
Here’s the map with all the state avenues…
You can even customize with your favorite tiles. For example:
mplleaflet.display(fig=ax.figure, tiles='stamen_wc')
…But there’s a wrinkle. Zoom in on bigger avenues, like New York or Rhode Island, and you’ll notice that there are two parallel edges representing each direction as a separate oneway road. This usually occurs when there are several lanes of traffic in each direction, or physical dividers between directions. Example below:
This is great for OSM and point A to B routing problems, but for the Rural Postman problem it imposes the requirement that each main avenue be cycled twice. We’re not into that.
Example: Rhode Island Ave (parallel edges) vs Florida Ave (single edge)
3. Remove Redundant State Avenues
As it turns out, removing these parallel (redundant) edges is a nontrivial problem to solve. My approach is the following:
 Build graph with oneway state avenue edges only.
 For each state avenue, create list of connected components that represent sequences of OSM ways in the same direction (broken up by intersections and turns).
 Compute distance between each node in a component to every other node in the other candidate components.
 Identify redundant components as those with the majority of their nodes below some threshold distance away from another component.
 Build graph without redundant edges.
3.1 Create state avenue graph with oneway edges only
The oneway avenues are plotted in red below. A brief look indicates that 8090% of the oneway avenues are parallel (redundant). A few, like Idaho Avenue NW and Ohio Drive SW, are single oneway roads with no accompanying parallel edge for us to remove.
NOTE: you’ll need to zoom in 34 levels to see the parallel edges.
Create connected components with oneway state avenues
There are 163 distinct components in the graph above.
163
3.2 Split connected components
Remove kinked nodes
However, we need to break some of these components up into smaller ones. Many components, like the one below, have bends or a connected cycle that contain both the parallel edges, where we only want one. My approach is to identify the nodes with sharp angles and remove them. I don’t know what the proper name for these is (you can read about angular resolution), but we’ll call them “kinked nodes.”
This will split the connected component below into two, allowing us to determine that one of them is redundant.
I borrow this code from jeromer
to calculate the compass bearing (0 to 360) of each edge. Wherever the the bearing difference between two adjacent edges is greater than
bearing_thresh
, we call the node shared by both edges a “kinked node.” A relative low bearing_thresh
of 60 appeared to work best after some experimentation.
After removing these “kinked nodes,” our list of components grows from 163 to 246:
246
Viz components without kinked nodes
Example: Here’s the Massachusetts Ave example from above after we remove kinked nodes:
Full map: Zoom in on the map below and you’ll see that we split up most of the obvious components that should be. There are a few corner cases that we miss, but I’d estimate we programmatically split about 95% of the components correctly.
3.3 & 3.4 Match connected components
Now that we’ve crafted the right components, we calculate how close (parallel) each component is to one another.
This is a relatively coarse approach, but performs surprisingly well:
1. Find closest nodes from candidate components to each node in each component (pseudo code below):
For each node N in component C:
For each C_cand in components with same street avenue as C:
Calculate closest node in C_cand to N.
2. Calculate overlap between components. Using the distances calculated in 1., we say that a node from component C
is matched to a component C_cand
if the distance is less than
thresh_distance
specified in calculate_component_overlap
. 75 meters seemed to work pretty well. Essentially we’re saying these nodes are close enough to be considered interchangeable.
3. Use the nodewise matching calculated in 2. to calculate which components are redundant. If thresh_pct
of nodes in component C
are close enough (within thresh_distance
) to nodes in
component C_cand
, we call C
redundant and discard it.
Viz redundant component solution
The map below visualizes the solution to the redundant parallel edges problem. There are some misses, but overall this simple approach works surprisingly well:
 red: redundant oneway edges to remove
 black: oneway edges to keep
 blue: all state avenues
3.5 Build graph without redundant edges
This is the essentially the graph with just black and blue edges from the map above.
After deduping the redundant edges, our connected component count drops from 246 to 96.
96
4. Create Single Connected Component
The strategy I employ for solving the Rural Postman Problem (RPP) in postman_problems is simple in that it reuses the machinery from the Chinese Postman Problem (CPP) solver here. However, it makes the strong assumption that the graph’s required edges form a single connected component. This is obviously not true for our state avenue graph asis, but it’s not too off. Although there are 96 components, there are only a couple more than a few hundred meters to the next closest component.
So we hack it a bit by adding required edges to the graph to make it a single connected component. The tricky part is choosing the edges that add as little distance as possible. This was the first computationally intensive step that required some clever tricks and approximations to ensure execution in a reasonable amount of time.
My approach:

Build graph with contracted edges only.

Calculate haversine distance between each possible pair of components.

Find minimum distance connectors: iterate through the data structure created in 2. to calculate shortest paths for top candidates based on haversine distance and add shortest connectors to graph. More details below.

Build single component graph.
4.1 Contract edges
Nodes with degree 2 are collapsed into an edge stretching from a deadend node (degree 1) or intersection (degree >= 3) to another. This achieves two things:
 Limits the number of distance calculations.
 Ensures that components are connected at logical points (dead ends and intersections) rather than arbitrary parts of a roadway. This will make for a more continuous route.
This significantly reduces the nodes needed for distances computations by a factor of > 15.
Number of nodes in contracted graph: 347
Number of nodes in original graph: 5962
4.2 Calculate haversine distance between components
The 345 nodes from the contracted edge graph translate to >100,000 possible node pairings. That means >100,000 distance calculations. While applying a shortest path algorithm over the graph would certainly be more exact, it is painfully slow compared to simple haversine distance. This is mainly due to the high number of nodes and edges in the DC OSM map (over 250k edges).
On my laptop I averaged about 4 shortest path calculations per second. Not too bad for a handful, but 115k would take about 7 hours. Haversine distance, by comparison, churns through 115k in a couple seconds.
116526
4.3 Find minimum distance connectors
This gets a bit tricky. Basically we iterate through the top (closest) candidate pairs of components and connect them iterationbyiteration with the shortest path edge. We use precalculated haversine distance to get in the right ballpark, then refine with true shortest path for the closest 20 candidates. This helps us avoid the scenario where we naively connect two nodes that are geographically close as the crow flies (haversine), but far away via available roads. Two nodes separated by highways or train tracks, for example.
We had 96 components to connect, so it makes sense that we have 95 connectors.
95
4.4 Build single component graph
We add about 12 miles with the 95 additional required edges. That’s not too bad: an average distance of 0.13 miles per each edge added.
12.341087779659958
So that leaves us with a single component of 125 miles of required edges to optimize a route through. That means the distance of deduped state avenues alone, without connectors (~112 miles) is just a couple miles away from what Wikipedia reports (115 miles).
125.06645388481837
Make graph with granular edges (filling in those that were contracted) connecting components:
4.5 Viz single connected component
Black edges represent the deduped state avenues. Red edges represent the 12 miles of connectors that create the single connected component.
5. Solve CPP
I don’t expect the Chinese Postman solution to be optimal since it only utilizes the required edges. However, I do expect it to execute quickly and serve as a benchmark for the Rural Postman solution. In the age of “deep learning,” I agree with Smerity, baselines need more love.
5.1 Create CPP edgelist
The cpp solver I wrote operates off an edgelist (text file). This feels a bit clunky here, but it works.
The first two columns are interpeted as the from and to nodes; everything else as edge attributes.
start_node  end_node  comp  name  connector  path  distance  required  

0  3788079770  649840206  0.0  Texas Avenue Southeast  NaN  [649840206, 649840209, 649840220, 30100500, 64...  1244.893849  1 
1  3788079770  49744479  NaN  NaN  True  [3788079770, 49751669, 4630443674, 49751671, 4...  674.458290  1 
2  49765126  49765287  1.0  Georgia Avenue Northwest  NaN  [49765126, 49765129, 49765130, 49765131, 49765...  559.251509  1 
5.2 CPP solver
Starting point
I fix the starting node for the solution to OSM node 49765113
which corresponds to (38.917002, 77.0364987): the intersection of New Hampshire Avenue NW, 16th St NW and U St NW… and also
the close to my house:
Solve
5.3: CPP results
The CPP solution covers roughly 390,000 meters, about 242 miles. The optimal CPP route doubles the required distance, doublebacking every edge on average… definitely not ideal.
OrderedDict([('distance_walked', 391523.723281576),
('distance_doublebacked', 190249.27638658223),
('distance_walked_once', 201274.44689499377),
('distance_walked_optional', 0),
('distance_walked_required', 391523.723281576),
('edges_walked', 688),
('edges_doublebacked', 337),
('edges_walked_once', 351),
('edges_walked_optional', 0),
('edges_walked_required', 688)])
6. Solve RPP
The RPP should improve the CPP solution as it considers optional edges that can drastically limit the amount of doublebacking.
We could add every possible edge that connects the required nodes, but it turns out that computation blows up quickly, and I’m not that patient. The get_shortest_paths_distances
is the bottleneck
applying dijkstra path length on all possible combinations. There are ~14k pairs to calculate shortest path for (4 per second) which would take almost one hour.
However, we can use some heuristics to speed this up dramatically without sacrificing too much.
6.1 Create RPP edgelist
Ideally optional edges will be relatively short, since they are, well, optional. It is unlikely that the RPP algorithm will find that leveraging an optional edge that stretches from one corner of the
graph to another will be efficient. Thus we constrain the set of optional edges presented to the RPP solver to include only those less than max_distance
.
I experimented with several thresholds. 3200 meters certainly took longer (~40 minutes), but yielded the best route results. I tried 4000m which ran for about 4 hours and returned a route with the same distance (160 miles) as the 3200m threshold.
CPU times: user 34min 38s, sys: 20.5 s, total: 34min 59s
Wall time: 36min 3s
Check how many optional edges are considered (0=optional, 1=required):
Counter({0: 16021, 1: 351})
6.2 RPP solver
Apply the RPP solver to the processed dataset.
CPU times: user 7min 25s, sys: 1.68 s, total: 7min 27s
Wall time: 7min 31s
6.3 RPP results
As expected, the RPP route is considerably shorter than the CPP solution. The ~242 mile CPP route is cut significantly to ~160 with the RPP approach.
~26,000m (~161 miles) in total with ~59,000m (37 miles) of doublebacking. Not bad… but probably a 2day ride.
161.58546891004886
OrderedDict([('distance_walked', 260045.958535698),
('distance_doublebacked', 58771.511640704295),
('distance_walked_once', 201274.4468949937),
('distance_walked_optional', 51891.485571184385),
('distance_walked_required', 208154.47296451364),
('edges_walked', 447),
('edges_doublebacked', 96),
('edges_walked_once', 351),
('edges_walked_optional', 57),
('edges_walked_required', 390)])
As seen below, filling the contracted edges back in with the granular nodes adds considerably to the edge count.
Number of edges in RPP circuit (with contracted edges): 447
Number of edges in RPP circuit (with granular edges): 9081
6.4 Viz RPP graph
Create RPP granular graph
Add the granular edges (that we contracted for computation) back to the graph.
Visualize RPP solution by edge type
 black: required state avenue edges
 red: required nonstate avenue edges added to form single component
 blue: optional nonstate avenue roads
Visualize RPP solution by edge walk count
Edge walks per color:
black: 1
magenta: 2
orange: 3
Edges walked more than once are also widened.
This solution feels pretty reasonable with surprisingly little doublebacking. After staring at this for several minutes, I could think of roads I’d prefer not to cycle on, but no obvious shorter paths.
6.5 Serialize RPP solution
CSV
Remember we contracted the edges in 4.1 for more efficient computation. However, when we visualize the solution, the more granular edges within the larger contracted ones are filled back in, so we can see the exact route to ride with all the bends and squiggles.
Geojson
Similarly, we create a geojson object of the RPP solution using the time
attribute to keep track of the route order. This data structure can be used for fancy js/d3 visualizations. Coming soon,
hopefully.