Making a City Simulator


Things come together pretty seamlessly: I’m pretty into urbanism (with the intent to eventually put together a serious research project), I wanted (and had to) practice my OOP for my introductory comp sci class, and I find some kind of catharsis in seeing lots of little people move around (see 700+ hours on Cities: Skylines).

So I indulged my dorkiness and decided to create a basic simulation of an urban area’s transport network, with a few goals:

  • create a convincing hierarchy of objects and utilize data structures to satisfy my CS prof
  • see how many agents I could have with a stable simulation
  • watch the little guys move around on the screen

Rather quickly, I decided I wanted to model New York, because I have always been interested in exploring some of the really cool datasets that detail the city’s travel habits. The first problem became getting the correct geographical data on-screen. This turned out to be fairly straightforward thanks to node and line data I found online, which was easy to normalize to the coordinate formats used by the graphics “library” provided by my prof.

Next, I had to map each station node to its respective line, and get all the lines properly in order. This part was done manually using line station lists from the MTA’s website. So far, so good.

Now came the simulation part. There are three classes of agent representable on-screen: a Node (station), a Train, and a Citizen. Nodes are static, citizens move freely on the map (or on trains), and trains move on preset paths between stations. The train part is easy: each train just has to store an ordered array of stops for it to move between. For convenience, these are stored as Line objects, which are also able to reference the trains moving along them.

A point of failure here: the positions of trains on-screen are generated by linearly interpolating between node coordinates, not by following the line geometry visualized on-screen. This is because properly ordering the line geometry (given its unordered dataset) and assigning it to the correct Line objects in the simulation would have sucked, and simply using nearest-node pathfinding to move along the geometry resulted in visual glitches and inaccuracies. So the trains float a little. But otherwise, using a time increment that is updated every iteration of the main loop (and that can be adjusted to change simulation speed), trains either move between stops or wait at them, 24/7, just like the real thing.
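For the curious, the interpolation between node coordinates can be sketched roughly like this. It is a minimal illustration and not the project's actual code: the class name TrainPosition, the method names, and the idea of tracking a departure time and a travel time per segment are all assumptions made for the example.

```java
// Hypothetical sketch: names and parameters are illustrative, not taken from the project.
final class TrainPosition {

    /** Linear interpolation between a and b, with t clamped to [0, 1]. */
    static double lerp(double a, double b, double t) {
        double clamped = Math.max(0.0, Math.min(1.0, t));
        return a + (b - a) * clamped;
    }

    /**
     * Position of a train that left stop (x0, y0) at departTime and needs
     * travelTime simulation time units to reach stop (x1, y1).
     * Assumes travelTime > 0. Before departure the train sits at the first
     * stop; after arrival it sits at the second.
     */
    static double[] positionAt(double x0, double y0, double x1, double y1,
                               double departTime, double travelTime, double now) {
        double t = (now - departTime) / travelTime;
        return new double[] { lerp(x0, x1, t), lerp(y0, y1, t) };
    }
}
```

Since the train travels in a straight line between the two stops, it ignores any curve in the drawn track, which is exactly why the trains appear to float.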

The Node objects are more than just static blobs on-screen. Each Node stores its neighbors (generated via Line objects), as well as nearby walking transfers, used in the A* algorithm for citizen pathfinding. Walking transfers are generated efficiently by slicing the coordinate plane into a grid during initialization, and then only checking for neighboring nodes under a threshold distance within each node’s grid cell and the cells adjacent to it. This grid is also used for efficient nearest-node detection on the mouse pointer. None of this is strictly necessary with only ~500 nodes, but it is still nice to have.
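The grid trick can be sketched as a uniform spatial hash. This is an illustrative version only: SpatialGrid, insert, and within are hypothetical names, points are plain coordinate pairs rather than Node objects, and the cell size is assumed to be at least as large as the search radius so that checking a cell and its eight neighbors is enough.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a uniform grid; not the project's actual class.
final class SpatialGrid {
    private final double cellSize;
    private final Map<Long, List<double[]>> cells = new HashMap<>();

    SpatialGrid(double cellSize) {
        this.cellSize = cellSize;
    }

    /** Packs two int cell indices into one long map key. */
    private static long key(int cx, int cy) {
        return (((long) cx) << 32) ^ (cy & 0xffffffffL);
    }

    private int cell(double v) {
        return (int) Math.floor(v / cellSize);
    }

    void insert(double x, double y) {
        cells.computeIfAbsent(key(cell(x), cell(y)), k -> new ArrayList<>())
             .add(new double[] { x, y });
    }

    /**
     * Points within radius of (x, y), found by scanning only the query's cell
     * and its 8 neighbors. Complete only when radius <= cellSize. A point
     * stored at exactly (x, y) is included in the result.
     */
    List<double[]> within(double x, double y, double radius) {
        List<double[]> out = new ArrayList<>();
        int cx = cell(x), cy = cell(y);
        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                List<double[]> bucket = cells.get(key(cx + dx, cy + dy));
                if (bucket == null) continue;
                for (double[] p : bucket) {
                    double ex = p[0] - x, ey = p[1] - y;
                    if (ex * ex + ey * ey <= radius * radius) out.add(p);
                }
            }
        }
        return out;
    }
}
```

With the stations inserted once at startup, each walking-transfer or mouse lookup touches a handful of cells instead of every node.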

Image of system nodes

After all that, I had to get citizens on-screen. This is where the fun stuff happens. Both Node and Train objects are extensions of CitizenContainer, a self-explanatory class used to simplify the simulation logic and enable visuals (the number of stored citizens alters a Node or Train’s size on-screen). At first, each citizen was randomly assigned an origin and a destination Node, generated a path using A* (each node having stored its neighbors), and moved along that path until despawning at its destination. To move along paths, citizens either walk to a station, wait at a station, board a waiting train at a station, or remain on a train until exiting at the appropriate stop. Time delays are used for deboarding trains and transferring lines.

A system of distance-based weights is used to ensure realistic behavior, with citizens only walking short distances and strongly preferring the train (e.g. for trips like a 10-minute walk in Midtown that could be a 2-minute train ride). Penalties are applied for individual stops (incentivizing express lines) and line transfers, so that citizens will tend to stay on express lines for as long as possible, then move to local lines, and switch only when necessary. The implementation of transfer penalties was one of the most surprisingly easy aspects of the project: it only required a basic wrapper class inserted into the A* algorithm.

Overall, manual trip comparisons with Google Maps show pretty reasonable behavior, albeit with another weak point: citizens do not consider train timetables when pathfinding, so a train that is 500 simulation ticks away is as good as one that is about to arrive. In addition, trains are never late, and their speed, headways, and dwell time at stops are not perfectly proportional to real life (mostly for visual effect), which honestly comes close to undermining my claim that this is a “simulation” of New York.
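One way a wrapper class can carry transfer penalties through A* is to search over (station, line) pairs instead of bare stations, so the search knows which line a citizen arrived on. The sketch below is a guess at that general shape and not the project's implementation: TransferAwareSearch, Edge, State, and the penalty value are all hypothetical, stations are plain strings, and the heuristic is passed in by the caller (it should never overestimate the remaining cost). It uses records, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch; class and method names are illustrative only.
final class TransferAwareSearch {
    /** Directed edge: riding `line` to station `to` costs `cost`. */
    record Edge(String to, String line, double cost) {}

    /** Wrapper state: the same station reached on a different line is a different state. */
    record State(String station, String line) {}

    /** Queue entry: g is the cost so far, f is g plus the heuristic estimate. */
    private record Entry(State state, double g, double f) {}

    private final Map<String, List<Edge>> graph = new HashMap<>();
    private final double transferPenalty;

    TransferAwareSearch(double transferPenalty) {
        this.transferPenalty = transferPenalty;
    }

    void addEdge(String from, String to, String line, double cost) {
        graph.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, line, cost));
    }

    /** Cheapest cost from start to goal, or +Infinity if the goal is unreachable. */
    double cheapestCost(String start, String goal, ToDoubleFunction<String> heuristic) {
        Map<State, Double> best = new HashMap<>();
        PriorityQueue<Entry> open =
                new PriorityQueue<>((a, b) -> Double.compare(a.f(), b.f()));
        State origin = new State(start, null); // null line: not on any train yet
        best.put(origin, 0.0);
        open.add(new Entry(origin, 0.0, heuristic.applyAsDouble(start)));

        while (!open.isEmpty()) {
            Entry cur = open.poll();
            // Skip entries made obsolete by a cheaper route found later.
            if (cur.g() > best.getOrDefault(cur.state(), Double.POSITIVE_INFINITY)) continue;
            if (cur.state().station().equals(goal)) return cur.g();

            for (Edge e : graph.getOrDefault(cur.state().station(), List.of())) {
                // Changing lines mid-trip costs extra; boarding the first train does not.
                boolean transfer = cur.state().line() != null
                        && !cur.state().line().equals(e.line());
                double g = cur.g() + e.cost() + (transfer ? transferPenalty : 0.0);
                State next = new State(e.to(), e.line());
                if (g < best.getOrDefault(next, Double.POSITIVE_INFINITY)) {
                    best.put(next, g);
                    open.add(new Entry(next, g, g + heuristic.applyAsDouble(e.to())));
                }
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

Passing `s -> 0.0` as the heuristic turns this into plain Dijkstra. A straight-line distance estimate to the destination, scaled into the same units as the edge costs, would be the usual A* choice.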

Regardless, I then applied pre-pandemic ridership data to stops to properly weight citizen path selections. Citizens also spawn near stops, not on them, mostly for visual effect. Adding some extra controls to the drawing of agents on-screen, panning and zooming, simulation speed adjustments, line colors, and other basic quality-of-life changes made the simulation kind of fun to just sit back and watch. The last step was adding some element of interactivity: citizens can be spawned at the mouse (and will then pathfind from that location), and ridership statistics are shown for the stations nearest to the mouse.
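Weighting station choice by ridership comes down to weighted random selection. Here is a minimal sketch of one common approach, a cumulative-sum table with binary search. WeightedPicker and its methods are hypothetical names, and the weights would be whatever ridership figures get loaded for each stop.

```java
import java.util.Random;

// Hypothetical sketch; not the project's actual selection code.
final class WeightedPicker {
    private final double[] cumulative; // cumulative[i] = sum of weights[0..i]
    private final Random rng;

    WeightedPicker(double[] weights, Random rng) {
        this.cumulative = new double[weights.length];
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 0) throw new IllegalArgumentException("negative weight");
            sum += weights[i];
            cumulative[i] = sum;
        }
        if (sum <= 0) throw new IllegalArgumentException("weights must sum to > 0");
        this.rng = rng;
    }

    /** Maps a uniform draw u in [0, 1) to an index, proportionally to the weights. */
    int indexFor(double u) {
        double target = u * cumulative[cumulative.length - 1];
        int lo = 0, hi = cumulative.length - 1;
        // Binary search for the first index whose cumulative sum exceeds target.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > target) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    /** Picks a random index; heavier entries come up proportionally more often. */
    int pick() {
        return indexFor(rng.nextDouble());
    }
}
```

With made-up weights of 1, 3, and 6, the third entry is chosen about 60% of the time. A stop with zero weight is never picked.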

Animated GIF of running simulation

Overall, this project wasn’t crazy. It’s not crazy complicated. It’s not crazy well-optimized (although working on that and seeing how much I can boost performance is certainly a goal of mine). But it’s a pretty nice and Java-y OOP exercise that was a lot of fun to make, and I’m hoping it’s the first in a line of urban simulation models that will start to show some serious promise.

Check out the code here.