Technical Information

Implementation

En Route is written in Common Lisp. Common Lisp is a performant, very high-level language, and has proven itself to be effective in implementing pathfinding solutions in reasonable (programmer, CPU) time and building an application out of it. I chose Common Lisp for its pervasive expressiveness and flexibility—it's fun to work with!

Hunchentoot is the web server serving dynamically-generated content. Apache 2.2 sits in front of my Lisp process to handle everything else, and reverse-proxies dynamic requests to Lisp.

Street and bus GIS data (but not bus schedules) are stored in a PostGIS database. Postmodern is used to load that data at import-time, for fast RAM access.

CL-Graph and CL-Containers are used for representing temporary data during the import process. A set of simple home-grown graph data structures is used for the final street/bus network, rather than CL-Graph, (perhaps prematurely) in the name of performance.

Bordeaux-Threads is used for cross-platform threading/locking support. The En Route core should operate on a non-threaded Lisp.

CXML is used for dynamic XHTML document generation with the W3C DOM.

CL-PPCRE is used by the user-interface code for parsing.

I wrote (but haven't yet released) new libraries that implement:

It should hardly bear mention that I use SLIME for interactive, incremental Common Lisp development. I use SLIME from and edit with XEmacs.

I'm currently compiling and running En Route with SBCL for the web site. I occasionally do development in Clozure ML (OpenMCL), for its stricter slot-value type checking and to loosely keep an eye on En Route's Common Lisp conformance.

The Lisp image, which includes the code, street GIS data, bus GIS data and schedules, and various data indices, contains about 19 million live objects, consuming 600 to 700MB total.

En Route Spokane is hosted on x86_64 Linux, under a Xen domain (virtual machine) with 1328 MB RAM. (It shares this Xen domain with my workstation.) The hardware is an AMD Athlon64 3000+ (2GHz).

Data Modelling and Pathfinding

En Route uses E.W. Dijkstra's famous algorithm for finding the shortest path in a graph. Street crossings, bus stops, and points along a bus route are modelled as graph vertices. Street segments (from one crossing to another) are modelled as edges with a weight (or distance) value of 1½ times the amount of time they would take to walk down. Bus rides from points on a bus route are modelled as edges with a weight value of the time it takes to ride the bus from the first point to the second. The act of boarding a bus is modelled as an edge from the stop to the bus route vertex, with a weight value of the time it takes to wait for the next bus on that route.

An interesting challenge is that the time it takes to wait for the next bus at a stop varies—it depends on what time of day it is when you arrive at the stop, and it depends on the day of the week (because there's different schedules for weekdays, Saturdays, and Sundays/holidays). It turns out to be pretty simple to take time-of-day into account in Dijkstra's algorithm, and it usually gives you correct results.

Dijkstra's algorithm was designed for static graphs, and using it on a dynamic network in this way does lead to some errors. It actually solves the minimum-time variant of finding the shortest path on a network with time-dependent edge weights. There are some corner cases, though, where the answer it gives does indeed get you there the quickest, but elects that you walk instead of take a bus on some leg, even when you could have caught a bus and arrived at your destination at the same time.

Diagram of a time-dependent network

Consider the network above. Circle nodes (S, T) represent normal street vertices. Square nodes (1, 2, 3, 4) represent bus stop vertices. Triangle nodes are bus route vertices. Bidirectional, thick-headed arrows between two street or stop vertices are ordinary walking edges; the number is the walking time in minutes. One-way thick-headed arrows between two route vertices are bus-riding edges; the number again is the time. One-way, blue, thin-headed arrows from stop vertices to route vertices are bus-boarding edges; the time in brackets denotes when the next bus arrives.

We want to get from source S to target T. Watch what happens after we follow Dijkstra's algorithm; the tree below is the resulting shortest-path tree. (Familiarity with Dijkstra's algorithm will be helpful to follow along.)

Diagram of the shortest path tree of the above time-dependent network

Inverted nodes represent when a vertex has been expanded (all of them, in this case). The "d" value next to each vertex is the minimum distance from the source to that vertex, and the "t" value is the time we (conceptually) visited that vertex.

With our shortest-path tree, to get the shortest path, we need to start with the target T and follow in-edges (parent vertices) to find our way back to the source. The path we record as we do this is the shortest path. Here, our shortest path is [S, 4, 8, 9, T].

Do you see the problem? The result is that we are to walk all the way from S to 4 in order to catch the bus that starts at 7, passes through 8, and finishes at 9. The problem is that we could have caught the bus starting at 5 and finishing at 6 and still have time to catch the final bus. The reason that is not the path Dijkstra's algorithm chose is that 2 is cut off from 3: we can get to 3 with a lower cost via [4] than via [1, 5, 6, 2]. But for what we want, that doesn't matter—we still have to wait until 10:20 to catch the bus leaving 7. If we consider the path we want, [S, 1, 5, 6, 2, 3, 7, 8, 9, T], we'll see that the total cost from S to 8 should be 87, not 102. That's because the time at which we would be visiting 3 is 10:15, which lowers the weight of edge (3, 7) from 35 to 5.

The path I described is the minimum-cost path, but Dijkstra's algorithm gives us a minimum-time path. (Ours is also a minimum-time path, for this network.)

En Route has a workaround: after it has found a minimum-time path, it does another search from the last route vertex on the path to the starting point. If the result is better, it replaces everything up to that last route vertex with the result. This yields the minimum-cost path for the example network above and a few other corner cases I've found, but it's still not correct.

The algorithm we really want for En Route is one that will solve the minimum-cost shortest path problem. Research and findings on such algorithms for the way En Route models data (continuous time) is relatively new, and frankly I'm having a hard time with the mathematical notation used in these papers. Implementation of a correct algorithm for this problem is a goal of mine, but it can wait a little bit.

Data sources

The STA has graciously provided bus path and schedule information, and bus stop information. En Route Spokane would have been an infeasible project without this data.

En Route Spokane uses the following GIS datasets published by the City of Spokane:

Information for a great number of area businesses, listed in the Landmarks menus, was obtained from the Experience Spokane web site and the Pacific Northwest Inlander's venue listing.