Migrating CitySim from Class Hierarchy Sim to ECS Sim

I've been using Daniel Greenheck's CitySim.js as a sandbox for experimenting with simulation architecture. Not to improve or comment on Daniel's design — his goals are unrelated to mine — but because it gives me a fully-functioning city sim with a Three.js presentation layer I can work with, and it's a stepping stone for my other city sim projects.

I've been making modifications to this project on and off for a while now. The motivation for writing about it now is that the transition from object-oriented to ECS thinking has been the most interesting (to me) of my experiments in this code base. But I should cover some ground before getting to the actual topic. In brief: I took Daniel's code, spent some effort separating the simulation from the presentation, moved the simulation to a Node.js server using the CQRS pattern, then ported that server from Node.js to C#, using Wolverine for the CQRS plumbing. This post is about my latest refactoring, migrating from a per-object, class-hierarchy simulation to an Entity Component System (ECS) architecture, and what that actually involved.

Getting to the Starting Line

Daniel's original code runs entirely in the browser. The simulation and the Three.js rendering are interleaved — building objects directly manipulate the scene, the scene reads from building objects. That's perfectly reasonable for a browser demo, but it makes the simulation hard to reason about in isolation and impossible to move off the main thread, let alone off-process.

My first refactor was purely structural: extract a clean boundary between simulation and presentation. The simulation runs on a 1-second tick and emits events. The presentation maintains its own read model and applies those events. The two sides never share object references. Commands (place building, bulldoze) flow from presentation to simulation; events (tile changed, stats changed) flow back. A full world snapshot fires once when a client subscribes, so the presentation can initialize itself without querying the sim world directly. This is the Command Query Responsibility Segregation (CQRS) pattern — nothing exotic, just discipline about which side owns reads and which side owns writes.
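To make the shape of that boundary concrete, here is a minimal C# sketch. The command and event names (place building, bulldoze, TileChanged) come from the description above, but every field, the Envelope record and the ToySimulation class are hypothetical stand-ins; the real wire format isn't shown in this post.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical command shapes: the post names the commands, not their fields.
public abstract record SimCommand { }
public sealed record PlaceBuilding(int X, int Y, string BuildingType) : SimCommand;
public sealed record Bulldoze(int X, int Y) : SimCommand;

// Hypothetical event and envelope shapes.
public sealed record TileChanged(int X, int Y, string? BuildingType);
public sealed record Envelope(string Type, object Payload);

public sealed class ToySimulation
{
    // Write-side state; it is never handed to the presentation.
    private readonly Dictionary<(int X, int Y), string> _buildings = new();
    private readonly List<string> _outbox = new();

    // Serialized events waiting to be pushed to subscribers (e.g. over a WebSocket).
    public IReadOnlyList<string> Outbox => _outbox;

    // Commands flow in and mutate simulation state...
    public void Handle(SimCommand command)
    {
        switch (command)
        {
            case PlaceBuilding p:
                _buildings[(p.X, p.Y)] = p.BuildingType;
                Publish(new TileChanged(p.X, p.Y, p.BuildingType));
                break;
            case Bulldoze b:
                // Bulldozing an empty tile changes nothing, so no event is published.
                if (_buildings.Remove((b.X, b.Y)))
                    Publish(new TileChanged(b.X, b.Y, null));
                break;
        }
    }

    // ...and only plain JSON events flow back out. Payload is declared as object,
    // so System.Text.Json serializes it using its runtime type.
    private void Publish(object evt) =>
        _outbox.Add(JsonSerializer.Serialize(new Envelope(evt.GetType().Name, evt)));
}
```

Because the presentation only ever sees the strings in Outbox, whatever sits behind Handle can be rewritten freely without the client noticing.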

Once that boundary existed I moved the simulation to a Node.js server. The presentation lives in the browser, the simulation runs in a Node.js process, and they talk over HTTP (commands) and WebSocket (events). The browser never imports from the sim layer. The events are plain JSON. The presentation doesn't know or care where they come from.

I then ported the simulation from Node.js to C#. The events remained the same, using the same JSON structure, wire format and WebSocket envelope. The browser client needed zero changes. The C# server implements the same tick logic, the same BFS power distribution, the same development state machine. It's just a different language running the same algorithm, publishing the same events.

The Object Simulation

The simulation at the dotnet_cqrs tag is object-oriented in a fairly classical sense. Each building type is a class: ResidentialZone, CommercialZone, IndustrialZone, Road, PowerPlant, PowerLine. Each building object owns its own state: a residential zone has a list of Citizen references, a development level, a max-residents count. The world is a 2D array of Tile objects, each tile optionally holding a building reference. Citizens hold direct object references to their residence and workplace. It's a graph of mutually-referencing objects.

Behavior is layered on top through a module composition pattern. BuildingManager holds singleton instances of DevelopmentModule, ResidentsModule, JobsModule, CommerceModule, and RoadAccessModule. Simulate runs once per tile and, based on the type of the building on that tile, calls Simulate on the appropriate modules with a reference to that building:

public void Simulate(World world, Tile tile)
{
    if (tile.Building is null) return;
    foreach (var m in Modules(tile.Building))
        m.Simulate(world, tile.Building);
    // ... update building status
}

private IEnumerable<SimModule> Modules(Building b) => b.Type switch
{
    BuildingType.Residential => [_roadAccess, _development, _residents],
    BuildingType.Commercial  => [_roadAccess, _development, _jobs, _commerce],
    BuildingType.Industrial  => [_roadAccess, _development, _jobs],
    BuildingType.PowerPlant  => [_roadAccess],
    _                        => [],
};

Modules switches on building type and returns the right combination. Residential gets road access, development, and residents. Commercial adds jobs and commerce. Roads get nothing. Adding a new building type means adding a class and a case in Modules. Adding a new behavior means writing a module and wiring it in.

The tick iterates over the full tile grid, calls Simulate() on each tile's building, then handles resident move-ins and the citizen update pass. Power distribution runs before all of that as a service — a BFS from each power plant through adjacent tiles. Anything that changed has its Updated flag set to true, and at the end of the tick those tiles become TileChanged events.

This is clean code. It models the domain directly — a zone has residents, a citizen has a residence — and it's easy to debug because you can inspect any object and see its full state. I'm not replacing it because it's bad. I'm replacing it because I want to see what the ECS version looks like, and because the ECS version has some properties that become more relevant as the simulation grows.

What ECS Actually Means

Entity Component System is one of those patterns that gets described in vague terms ("data-oriented!", "cache-friendly!", "decoupled!"), and it often gets misunderstood as a System that uses Entities with Components. That's certainly how I first thought of it.

The core idea: Entities are just IDs. Components are pure data structs attached to entities. And Systems query for entities that have a particular combination of components and process them in bulk. That last part is the real paradigm shift and easily overlooked. You don't operate on Entities, you operate on queries that match specific Components. This is where data-oriented comes from: You don't care what an entity represents, just what it consists of. If friend shaped, apply friend system.

A residential zone isn't an object of type ResidentialZone. It's an entity that has GridPosition, Development, PowerConsumer, PowerConductor, RoadAccessUser, and Residence components. An industrial zone has the same set minus Residence, plus Employer. A road has GridPosition, Road, and PowerConductor. The type is implied by the combination of components present.
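A deliberately naive sketch of that idea in plain C#, with no Arch involved: an entity is just an int, each component type has its own store, and a query matches on which component types are present. The component names mirror the ones above, but their definitions here are hypothetical, and boxing every component into a dictionary throws away exactly the memory layout Arch is built for; the point is only the query semantics.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical component definitions; the real ones carry more data.
public record struct GridPosition(int X, int Y);
public record struct Development(int Level);
public struct PowerConductor { }   // tag component: its presence is the data
public struct Residence { }
public struct Employer { }
public struct Road { }

public sealed class ToyWorld
{
    private int _nextId;
    // One store per component type: entity id -> boxed component value.
    private readonly Dictionary<Type, Dictionary<int, object>> _stores = new();

    public int Create(params object[] components)
    {
        var entity = _nextId++;
        foreach (var component in components)
        {
            if (!_stores.TryGetValue(component.GetType(), out var store))
                _stores[component.GetType()] = store = new Dictionary<int, object>();
            store[entity] = component;
        }
        return entity;
    }

    // Entities that have ALL of the requested component types, whatever they "are".
    public List<int> WithAll(params Type[] types)
    {
        if (types.Length == 0) return new List<int>();
        var stores = new List<Dictionary<int, object>>();
        foreach (var type in types)
        {
            // No entity has this component at all, so nothing can match.
            if (!_stores.TryGetValue(type, out var store)) return new List<int>();
            stores.Add(store);
        }
        return stores[0].Keys.Where(id => stores.All(s => s.ContainsKey(id))).ToList();
    }
}
```

Nothing in a query mentions a zone type: a road drops out of the Development query simply because it lacks the component.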

I'm using Arch as the ECS library for this. Arch organizes entities into archetypes — groups that share the same set of component types — and stores each archetype's components in tightly-packed arrays. When a system runs a query, it gets direct references into those arrays without heap allocation per entity or pointer chasing through an object graph.

Components as Value Types

To get the memory layout benefits Arch offers, components should be value types. C# record struct is the natural fit:

public record struct Development(
    int Level = 0,
    int AbandonmentCounter = 0,
    int ConstructionCounter = 0,
    DevelopmentState State = DevelopmentState.Undeveloped,
    int MaxLevel = 3);

public record struct PowerConsumer(int Required = 10, int Supplied = 0);

public record struct RoadAccessUser(bool HasRoadAccess);

Pure data without behavior or inheritance. The query lambdas take ref parameters that point directly into Arch's internal arrays — mutations are in-place.
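Because components are value types, how you reach them matters. This small sketch uses a plain array as a stand-in for one of Arch's packed component arrays (an assumption for illustration, not Arch's actual storage type) and a simplified PowerConsumer: writing through a ref mutates the stored element, while writing to a copy silently does nothing.

```csharp
using System;

// Simplified stand-in for the PowerConsumer component above.
public record struct PowerConsumer(int Required, int Supplied);

public static class InPlaceDemo
{
    // Wrong: consumers[i] is copied into a local, so the update never reaches the array.
    public static void SupplyCopies(PowerConsumer[] consumers, int available)
    {
        for (var i = 0; i < consumers.Length; i++)
        {
            var copy = consumers[i];
            copy.Supplied = Math.Min(copy.Required, available);
        }
    }

    // Right: `ref var` aliases each element of the span, so the write lands in storage.
    // This is the same effect the ref parameters of a query lambda give you.
    public static void SupplyInPlace(Span<PowerConsumer> consumers, int available)
    {
        foreach (ref var consumer in consumers)
            consumer.Supplied = Math.Min(consumer.Required, available);
    }
}
```

The same applies to helper methods: passing a component without ref hands the helper a copy, and its changes are lost.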

record struct parameter defaults are not zero-initialization defaults

A record struct primary constructor with default parameters looks authoritative:

public record struct Development(
    DevelopmentState State = DevelopmentState.Undeveloped, ...);

But unless you declare one yourself, a C# struct always gets an implicit parameterless constructor, separate from the primary constructor, that zero-initializes all fields. When you write new Development() with no arguments, the compiler binds to that parameterless constructor, not to the primary constructor with all its defaults applied. The same zeroed value is what you get from default(Development) or from a freshly allocated array of Development. The primary constructor's defaults only take effect when you call it explicitly with at least one argument, positional or named.

The practical consequence is that the true "default" of any record struct is determined entirely by the zero values of its field types: 0 for numerics, false for bool, null for references, and for enums the member whose underlying value is 0 (the first declared member, unless you assign explicit values).

This bit me when I first ran the refactored code. My Developments always got Abandoned immediately instead of starting out Undeveloped, and it took me way too long stepping through the debugger before I understood the root cause.
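The behavior is easy to reproduce in isolation. The post doesn't show the real DevelopmentState declaration, so this sketch assumes a hypothetical ordering with Abandoned as the first (zero) member, purely to show how a zeroed struct can land in a state nobody asked for.

```csharp
// Hypothetical ordering: Abandoned is listed first, so it is the zero value.
public enum DevelopmentState { Abandoned, Undeveloped, UnderConstruction, Developed }

public record struct Development(
    int Level = 0,
    DevelopmentState State = DevelopmentState.Undeveloped,
    int MaxLevel = 3);

public static class DefaultsDemo
{
    // Binds to the implicit parameterless constructor: every field is zeroed.
    public static Development ViaNew() => new Development();

    // The same zeroed value; a freshly allocated Development[] holds this too.
    public static Development ViaDefault() => default;

    // Binds to the primary constructor, so the declared defaults are applied.
    public static Development ViaNamedArgument() => new Development(Level: 0);
}
```

With this ordering, ViaNew() yields State = Abandoned and MaxLevel = 0, while ViaNamedArgument() yields Undeveloped and 3.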

For this reason I've adopted a pattern of never declaring defaults on the primary constructor, and instead providing a static Create method that delivers the contract the record struct syntax appeared to promise:

public record struct Development(
    int Level,
    int AbandonmentCounter,
    int ConstructionCounter,
    DevelopmentState State,
    int MaxLevel)
{
    public static Development Create(
        int level               = 0,
        int abandonmentCounter  = 0,
        int constructionCounter = 0,
        DevelopmentState state  = DevelopmentState.Undeveloped,
        int maxLevel            = 3)
        => new(level, abandonmentCounter, constructionCounter, state, maxLevel);
}

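Now I have one method that works with no arguments or any subset of them and delivers the expected struct. The catch is discipline: new Development() and default still produce the zeroed value, so every construction site has to go through Create.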

Cutting the Object Graph

The most significant structural change is eliminating the object graph. In the object-oriented version, citizen.Residence is a direct reference to a ResidentialZone instance. In ECS, both citizens and zones are entities, which is to say IDs. You can certainly create relationship components that store a referenced entity, but there is no automatic traversal, and this really only works from the child's side of a parent-child relationship: a citizen can point at its residence, but the residence can't easily point back at all of its residents. Since you can't query for a Parent component with a specific value, finding a zone's residents means iterating over every entity with a Parent component and filtering manually, which is likely not performant enough for how often the systems need these lookups.

My solution is external lookup tables:

public readonly Dictionary<Entity, HashSet<Entity>> ResidentsByResidence = new();
public readonly Dictionary<Entity, HashSet<Entity>> EmployeesByEmployer = new();

A residential zone entity maps to the set of citizen entities living in it. An employer entity maps to its workers. All systems receive my World instance as a parameter, which wraps both the Arch World and my lookup tables.

This is less elegant than zone.Residents: after the lookup, I still have to query the ECS by entity ID to get at the related entity's components. I don't love it, but I also haven't found a better answer that stays within the ECS model. Some ECS libraries, like flecs, provide relationships directly, including optimized traversal. There is also a Relationships add-on for Arch, but I haven't taken a closer look at it yet.
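A minimal sketch of keeping such a table consistent. EntityId stands in for Arch's Entity handle, and the reverse index (_residenceByCitizen) along with the MoveIn, MoveOut and Demolish helpers are hypothetical, not code from the project. The point is that every mutation has to update the table explicitly, because nothing in the ECS will do it for you.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Arch's Entity handle.
public readonly record struct EntityId(int Id);

public sealed class Relationships
{
    public readonly Dictionary<EntityId, HashSet<EntityId>> ResidentsByResidence = new();

    // Hypothetical reverse index so a citizen can find its home without a scan.
    private readonly Dictionary<EntityId, EntityId> _residenceByCitizen = new();

    public void MoveIn(EntityId citizen, EntityId residence)
    {
        MoveOut(citizen); // a citizen lives in at most one place
        if (!ResidentsByResidence.TryGetValue(residence, out var residents))
            ResidentsByResidence[residence] = residents = new HashSet<EntityId>();
        residents.Add(citizen);
        _residenceByCitizen[citizen] = residence;
    }

    public void MoveOut(EntityId citizen)
    {
        if (_residenceByCitizen.Remove(citizen, out var home))
            ResidentsByResidence[home].Remove(citizen);
    }

    // Bulldozing a residence must clean up both directions, or stale ids leak.
    public IReadOnlyCollection<EntityId> Demolish(EntityId residence)
    {
        if (!ResidentsByResidence.Remove(residence, out var evicted))
            return Array.Empty<EntityId>();
        foreach (var citizen in evicted)
            _residenceByCitizen.Remove(citizen);
        return evicted;
    }

    public bool TryGetResidence(EntityId citizen, out EntityId residence) =>
        _residenceByCitizen.TryGetValue(citizen, out residence);
}
```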

What the Systems Look Like

The DevelopmentSystem is a good example of the pattern. It declares a query description — entities that have GridPosition, Development, BuildingState, PowerConsumer, and RoadAccessUser — and the Run() method iterates over everything that matches:

public class DevelopmentSystem : SimSystem
{
    public DevelopmentSystem() : base(new QueryDescription()
        .WithAll<GridPosition, Development, BuildingState, PowerConsumer, RoadAccessUser>())
    { }

    public override void Run(World world)
    {
        world.Ecs.Query(in QueryDescription,
            (Entity entity, ref GridPosition position, ref Development development,
             ref BuildingState buildingState, ref PowerConsumer powerConsumer,
             ref RoadAccessUser roadAccessUser) =>
            {
                CheckAbandonmentCriteria(ref development, ref powerConsumer, ref roadAccessUser);

                switch (development.State)
                {
                    case DevelopmentState.Undeveloped:
                        if (MeetsDevelopmentCriteria(ref powerConsumer, ref roadAccessUser) &&
                            Random.Shared.NextDouble() < SimConfig.Modules.Development.RedevelopChance)
                        {
                            development.State = DevelopmentState.UnderConstruction;
                        }
                        break;
                    // ... UnderConstruction, Developed, Abandoned
                }
            }
        );
    }
}

The system doesn't know about ResidentialZone or CommercialZone. It processes everything with a Development component. If I add a new building type that develops, I attach a Development component and it participates automatically. I don't have to remember what modules I need to include for the new building type.

The Power BFS carries over almost directly. The logic is identical — interleaved BFS from each power plant, sharing load evenly, conductance limited to entities with a PowerConductor component. The difference is that the object-oriented version navigates the tile grid through world.Tiles[x][y], while the ECS version queries for power plants and consumers and then navigates by the GridPosition component:

foreach (var neighbor in Neighbors(world, position))
{
    var tile = world.GetTile(neighbor);
    if (tile is { } occupant && world.Ecs.Has<PowerConductor>(occupant))
        s.Frontier.Enqueue(neighbor);
}
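For readers who haven't seen the original power code, here is a self-contained sketch of one way to interleave a multi-source BFS. It works on plain dictionaries and sets keyed by a hypothetical GridPos instead of ECS queries, and the round-robin rule (each plant expands one tile per round until its capacity or frontier runs out) is an assumption about what "interleaved" means, not a transcription of the project's PowerSystem.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical position type standing in for the GridPosition component.
public readonly record struct GridPos(int X, int Y);

public sealed class PlantSearch
{
    public int Remaining;                                // capacity left to hand out
    public readonly Queue<GridPos> Frontier = new();
    public readonly HashSet<GridPos> Visited = new();    // per-plant visited set
}

public static class PowerBfs
{
    private static readonly (int Dx, int Dy)[] Offsets = { (1, 0), (-1, 0), (0, 1), (0, -1) };

    // plants: capacity per plant position. conductors: positions power can travel
    // through (consumers must conduct to be reached). required: demand per consumer.
    // Returns the power actually supplied to each consumer that received any.
    public static Dictionary<GridPos, int> Distribute(
        IReadOnlyDictionary<GridPos, int> plants,
        IReadOnlySet<GridPos> conductors,
        IReadOnlyDictionary<GridPos, int> required)
    {
        var supplied = new Dictionary<GridPos, int>();
        var searches = new List<PlantSearch>();
        foreach (var (pos, capacity) in plants)
        {
            var search = new PlantSearch { Remaining = capacity };
            search.Frontier.Enqueue(pos);
            search.Visited.Add(pos);
            searches.Add(search);
        }

        // Round-robin: every plant expands one tile per round, so nearby consumers
        // are shared between plants instead of the first plant claiming everything.
        var progressed = true;
        while (progressed)
        {
            progressed = false;
            foreach (var s in searches)
            {
                if (s.Remaining == 0 || s.Frontier.Count == 0) continue;
                progressed = true;
                var pos = s.Frontier.Dequeue();

                if (required.TryGetValue(pos, out var need))
                {
                    supplied.TryGetValue(pos, out var have);
                    var give = Math.Min(need - have, s.Remaining);
                    if (give > 0)
                    {
                        supplied[pos] = have + give;
                        s.Remaining -= give;
                    }
                }

                foreach (var (dx, dy) in Offsets)
                {
                    var next = new GridPos(pos.X + dx, pos.Y + dy);
                    // Power only travels through conducting tiles, each visited once per plant.
                    if (conductors.Contains(next) && s.Visited.Add(next))
                        s.Frontier.Enqueue(next);
                }
            }
        }
        return supplied;
    }
}
```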

One thing I had to add here is a CommandBuffer. Arch doesn't allow structural mutations during query iteration: you can't add or remove components while the query is running. Power supply changes themselves aren't structural (they're just field updates on an existing PowerConsumer), so they happen in place. Structural changes, on the other hand, get batched in a CommandBuffer and replayed after the query completes.
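The reason for deferring is the same one that bites any C# code that mutates a collection while enumerating it. This toy buffer is not Arch's CommandBuffer API; it only shows the record-then-play-back idea, using a List&lt;int&gt; as a stand-in for an archetype's entity array.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration only; this is not Arch's CommandBuffer.
public sealed class ToyCommandBuffer
{
    private readonly List<Action> _pending = new();

    // Record a structural change instead of applying it immediately.
    public void Record(Action change) => _pending.Add(change);

    // Apply every recorded change in order once iteration is over, then reset.
    public void Playback()
    {
        foreach (var change in _pending) change();
        _pending.Clear();
    }
}

public static class CommandBufferDemo
{
    // `entities` stands in for an archetype's entity array. Calling entities.Remove
    // inside the foreach would throw InvalidOperationException ("Collection was
    // modified"), so removals are recorded and replayed after the loop.
    public static void RemoveMatching(List<int> entities, Func<int, bool> shouldRemove)
    {
        var buffer = new ToyCommandBuffer();
        foreach (var entity in entities)
        {
            if (shouldRemove(entity))
                buffer.Record(() => entities.Remove(entity));
        }
        buffer.Playback();
    }
}
```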

This constraint is a big part of what makes ECS more amenable to multi-threading than a typical object-oriented game loop. Each iteration of a query touches only that entity's components, and nothing changes the storage structurally while it runs. That means many query loops can be split across threads, and systems whose component sets don't overlap can run side by side, without locks or other data change coordination.
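One way to turn that property into a schedule: have each system declare which component types it reads and which it writes, and only let systems share a batch when neither writes anything the other touches. This is a hypothetical scheduler, not something Arch or the project provides, and the read/write sets used in the test are guesses based on the queries described in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tag stand-ins for the real components.
public struct GridPosition { }
public struct PowerConsumer { }
public struct RoadAccessUser { }
public struct Development { }

// Hypothetical access declaration for a system.
public sealed record SystemAccess(string Name, Type[] Reads, Type[] Writes)
{
    private IEnumerable<Type> Touches => Reads.Concat(Writes);

    // Conflict if either system writes a component type the other reads or writes.
    public bool ConflictsWith(SystemAccess other) =>
        Writes.Intersect(other.Touches).Any() || other.Writes.Intersect(Touches).Any();
}

public static class Scheduler
{
    // Greedy batching that preserves tick order: a system joins the current batch
    // only if it conflicts with nothing already in it; otherwise it starts a new one.
    // Batches run one after another; systems inside a batch could run in parallel.
    public static List<List<SystemAccess>> Batch(IEnumerable<SystemAccess> tickOrder)
    {
        var batches = new List<List<SystemAccess>>();
        foreach (var system in tickOrder)
        {
            var current = batches.LastOrDefault();
            if (current is null || current.Any(system.ConflictsWith))
                batches.Add(current = new List<SystemAccess>());
            current.Add(system);
        }
        return batches;
    }
}
```

The World-level structures (the lookup dictionaries and the changed-tile set) would need to be declared as shared resources too, since systems write to them outside any component.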

Change Tracking

In the object-oriented version, change tracking is a boolean Updated flag on each building and tile. The ECS version could do this as a component, but in an archetype-based ECS, adding or removing an Updated marker component moves the entity to a different archetype, copying all of its components along the way. Doing that for every changed entity on every tick could create a lot of churn that offsets the performance advantages gained.

Instead, my World keeps a HashSet<GridPosition> of changed tiles. Systems call world.TileChanged(position) when they detect a meaningful state change:

private readonly HashSet<GridPosition> _changedTiles = [];

public void TileChanged(GridPosition position) => _changedTiles.Add(position);

public IEnumerable<GridPosition> ResetChangedTiles()
{
    var changes = _changedTiles.ToArray();
    _changedTiles.Clear();
    return changes;
}

At the end of the tick, the simulation drains _changedTiles to produce TileChanged events. Systems have to consciously signal that a position changed rather than setting a flag on the object that happens to live there. It's a slightly different mental model, but it is clean.

The Tick

In the object-oriented version, the tick is a function with a number of loops: loop over tiles, loop over citizens, collect events. In the ECS version it's a sequence of system runs:

_powerSystem.Run(_world);
_roadAccessSystem.Run(_world);
_developmentSystem.Run(_world);
_jobsSystem.Run(_world);
_residentsSystem.Run(_world);
_commerceSystem.Run(_world);
_vacancySystem.Run(_world);
_buildingStateSystem.Run(_world);
_homelessnessSystem.Run(_world);
_employmentSystem.Run(_world);

The ordering still matters — power before development, development before residents — but the tick defines that order of operations in one place as a list of system executions, which makes the ordering much more explicit. Each system is self-contained. Adding a new system is adding a line. Changing what a system processes is changing its query description.

The CQRS Boundary Didn't Move

What didn't change at all is the boundary between the simulation and the presentation. The events — WorldSnapshot, TileChanged, StatsChanged — are structurally identical to what the Node.js and the previous C# server produced. The wire format is the same. The JavaScript client connects, receives the same envelope format, and renders accordingly.

This is why I invested in the boundary before porting the simulation. I threw away the entire simulation internals and rebuilt them in a completely different paradigm, and the browser never noticed. I've always loved CQRS, but I often hesitate to apply it; too often it feels like Astronaut Architecture. I'd still been on the fence about whether the CQRS discipline was worth the upfront ceremony here. Having now done two full rewrites of the simulation side while the presentation stayed untouched, I'm confident this wasn't just me over-engineering.

What Got Better

Composition without inheritance. Adding a building type that develops but doesn't have residents means attaching Development and not attaching Residence. We don't have to worry about where it fits in the class hierarchy. Do we need to add new fields, or worse, nullable fields, to a base class or interface? Do we need to deal with multiple interface inheritance? Do we have to check all our modules to make sure the new class invokes the right ones? With systems, the new entity joins the processing automatically.

Explicit data ownership. Every piece of state lives in a named component. There's no question of whether the simulation state is on the building, the tile, or some module's private field. You look at the component list for an entity and you see everything. Coming from object orientation, it feels like an encapsulation violation, but that's just the instance mindset projecting onto entities. If anything, data is more strongly encapsulated, because each piece belongs to exactly one component.

Isolation. Each system declares exactly what it needs. DevelopmentSystem is genuinely unaware of residents or commerce. PowerSystem doesn't know development exists. The separation is structural, not just conventional.

What Got Harder

Cross-entity relationships. Replacing citizen.Residence with ResidentsByResidence[residenceEntity] adds indirection and makes the code less obviously connected to the domain. The object-oriented version reads like the domain model. The ECS version requires you to remember that relationships live in dictionaries on World.

Archetype discoverability. This model doesn't change the fact that there are certain shapes, and those shapes have names and meaning to us. There is such a thing as a residential building and a citizen, but what they are composed of is not easily discoverable. This continues to bother me. I'm experimenting with some compile-time contract concepts to ease that pain, although I can't yet tell whether that's just the object-oriented me clinging to the past. So I won't elaborate on that any further yet.

Ceremony. Defining fifteen record struct component types, with careful attention to separation of concerns and to the implicit ordering requirements between the systems that modify them, is a lot of boilerplate and mental accounting. The object-oriented version just needed a new class.

Where This Goes

The ECS simulation is running. The client connects, gets a world snapshot, places buildings, and watches zones develop. Everything appears to behave the same as in the Node.js and the previous .NET versions. The architecture is sound.

This project has always been a simplistic simulation model for illustration purposes, not really a basis for a full game. I will continue to use it as a sandbox for simulation concepts I want to try before applying them at the larger scale of my city simulation project.

If I go quiet here, check Mastodon — that's where my ECS experiments will likely surface first.