Archetypal ECS Considered Harmful?

The Inherent Mendacity of Benchmarks

If you’ve heard of ECS, you’ve probably heard that its primary benefit is in terms of performance. Maybe you’ve seen benchmarks showing a million entities on screen, processing at a blazing speed.

You should be asking yourself: what conditions make this level of performance possible?

I have long argued that the primary benefit of ECS is in terms of organization. Composition enables you to define sensibly repeatable structures. Once you’ve built up a toolbox of components and systems, you can find yourself building new kinds of game objects just by grabbing components off the shelf, without writing any new game logic. Even when you do have to build new behavior, it’s usually obvious where the logic goes. The benefit to your sanity is enormous. If ECS were demonstrably slower than traditional game architecture patterns like the actor model, I would probably still use it for this reason alone.

Everyone seems enamored with the idea that ECS will somehow magically make their game more performant. But performance has conditions and costs. The incredible benchmark results you see are obtained by a pattern called Archetypal ECS, and the benefits of this architecture are far from universal. Most real-world game designs simply do not conform to the ideal conditions that produce such ridiculous numbers. The developers of these libraries are not making actual games with these tools. In fact, nobody is.

Prioritizing performance metrics above all else has been profoundly detrimental to the adoption of ECS – high-performance designs are often cumbersome ones. What we need first and foremost are ergonomic ECS designs built on simple structures that make the programmer’s life easier. And as I am going to demonstrate, in many real-world use cases the performance tradeoffs actually favor the simple designs.

First, some theory…

The simple ECS approach: sparse storage

The sparse storage pattern of ECS is as follows:

  1. Entities are IDs.
  2. Components are stored per-type in contiguous arrays.
  3. The component storage maintains a lookup per component type from entity ID to storage index.

Checking if an entity has a component type is as simple as checking whether that component type’s storage contains the entity ID. Retrieving a component involves looking up the storage index using the entity ID. Adding a component is as simple as appending it to the end of the array and adding a new entry to the ID-to-index lookup. Removing one is nearly as simple: swap the last component into the vacated slot and update the lookup, so the array stays densely packed.
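
To make this concrete, here is a minimal sketch of sparse storage in C#. The names (SparseStorage, Set, Remove) and the plain int entity IDs are illustrative assumptions, not the API of any particular library:

using System.Collections.Generic;

// Dense arrays of component data, plus a per-type lookup from entity ID to index.
public sealed class SparseStorage<T> where T : struct
{
    T[] components = new T[16];                          // contiguous component data
    int[] entityByIndex = new int[16];                   // dense index -> entity ID
    readonly Dictionary<int, int> indexByEntity = new(); // entity ID -> dense index
    int count;

    public bool Has(int entity) => indexByEntity.ContainsKey(entity);

    public ref T Get(int entity) => ref components[indexByEntity[entity]];

    public void Set(int entity, in T component)
    {
        if (indexByEntity.TryGetValue(entity, out var index))
        {
            components[index] = component;
            return;
        }
        if (count == components.Length)
        {
            System.Array.Resize(ref components, count * 2);
            System.Array.Resize(ref entityByIndex, count * 2);
        }
        components[count] = component;
        entityByIndex[count] = entity;
        indexByEntity[entity] = count;
        count++;
    }

    public void Remove(int entity)
    {
        if (!indexByEntity.TryGetValue(entity, out var index)) return;
        // Swap-remove: the last component fills the hole so the array stays dense.
        var lastEntity = entityByIndex[count - 1];
        components[index] = components[count - 1];
        entityByIndex[index] = lastEntity;
        indexByEntity[lastEntity] = index;
        indexByEntity.Remove(entity);
        count--;
    }
}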

So what’s the problem? Let’s say you want to iterate over all the entities that have a Position and a Velocity component, and add Velocity to Position. When retrieving those two components, you have to perform effectively random accesses into two separate arrays. That’s going to cause cache misses, so this is obviously not optimal data access. Which brings us to…

The theoretical performance benefits of archetypal ECS

Archetypal ECS, also known as the dense storage strategy, operates on, you guessed it, archetypes. An archetype is defined as a grouping of entities that contain the same set of components. For example, every entity that has Position and Velocity components is in an archetype together. Every entity that has Position and Velocity and Acceleration components is in another archetype together. And so on and so forth.

The key optimization of archetypes is that they store information close together in memory. For a set of entities that all have the same component structure, all of their Position structs are in one array together, all of their Velocity structs are in one array together, and so on and so forth.

Consider our above case with Position and Velocity components. This is an ideal situation for cache locality to kick in - you are just adding each value in the Velocity array to the corresponding value in the Position array, with the index into both incrementing by 1. The processor can predict this access pattern perfectly; everything is sunshine and roses in hardware land.
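
As a sketch, the inner loop that archetypes make possible looks something like this (the Position and Velocity structs are assumptions for illustration):

public struct Position { public float X, Y; }
public struct Velocity { public float X, Y; }

public static class MotionIntegration
{
    // Both arrays belong to the same archetype, so index i refers to the
    // same entity in each. The access pattern is a perfectly predictable
    // linear walk, which is exactly what hardware prefetchers are built for.
    public static void Apply(Position[] positions, Velocity[] velocities, int count)
    {
        for (var i = 0; i < count; i++)
        {
            positions[i].X += velocities[i].X;
            positions[i].Y += velocities[i].Y;
        }
    }
}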

My project: Samurai Gunn 2

I am the lead programmer on Samurai Gunn 2, and we are using ECS to build the game. One major justification for switching to ECS was the need to support rollback netcode.

Rollback netcode has a few requirements to work well:

  1. Updates have to be deterministic: the same state and the same inputs should always produce the same next state.
  2. Updates have to be fast.
  3. Taking a state snapshot has to be fast.
  4. Reverting state via a snapshot has to be fast.

ECS is a great architecture for this. We don’t have to write individual save procedures for every gameplay-critical object - we can just copy data around directly, because that data is what constitutes the game state.
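
Here is a sketch of what that buys you, assuming gameplay-critical state lives in flat arrays of unmanaged structs. The names are illustrative, not the MoonTools.ECS API:

public static class RollbackSnapshot
{
    // Saving a snapshot is a block copy of each component array -
    // no per-object save procedures required.
    public static T[] Save<T>(T[] storage, int count) where T : unmanaged
    {
        var snapshot = new T[count];
        System.Array.Copy(storage, snapshot, count);
        return snapshot;
    }

    // Restoring is the same copy in reverse.
    public static void Restore<T>(T[] snapshot, T[] storage) where T : unmanaged
    {
        System.Array.Copy(snapshot, storage, snapshot.Length);
    }
}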

We are using a library I built called MoonTools.ECS. It uses sparse storage, and the main entity access pattern is through a concept called Filters. Filters define components that are included or excluded. Any time an entity’s component structure is altered, the relevant Filters are checked, and if the entity satisfies that Filter’s conditions, it is added to the Filter.
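
In use, a system built on Filters looks roughly like the sketch below. Treat the method names as illustrative of the pattern rather than exact MoonTools.ECS signatures:

using MoonTools.ECS;

// A system declares its component structure once, then iterates only
// the entities the Filter has tracked for it.
public class MotionSystem : MoonTools.ECS.System
{
    readonly Filter motionFilter;

    public MotionSystem(World world) : base(world)
    {
        // The library keeps this Filter up to date as entities
        // gain and lose components.
        motionFilter = FilterBuilder
            .Include<Position>()
            .Include<Velocity>()
            .Build();
    }

    public override void Update(System.TimeSpan delta)
    {
        foreach (var entity in motionFilter.Entities)
        {
            var position = Get<Position>(entity);
            var velocity = Get<Velocity>(entity);

            // Write back the integrated position.
            Set(entity, new Position { X = position.X + velocity.X, Y = position.Y + velocity.Y });
        }
    }
}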

When implementing rollback snapshots, the initial idea I hit upon was that entities that were gameplay critical could just have a Rollback component added to them. Then the snapshot would only save the state of those entities.

In practice, since other Filters would have rollback entities interleaved with non-rollback entities, I was having issues where entities would iterate in different orders after a rollback, which would lead to desynchronization. It was also slow - when copying the storages, I had to filter the components one by one.

I started thinking that archetype storage could help me speed this up and ensure correctness - anything with a rollback component would necessarily be in a separate archetype from anything without a rollback component, so their order would be preserved and the copies would be much faster. I had heard that archetypes had other benefits, like good iteration performance. It was a reasonable enough idea. So I started researching how other ECS libraries were built.

A survey of the ECS landscape

There are quite a few ECS libraries out there these days. Let’s examine a few of them.

Flecs

When you ask for ECS library recommendations, you’re pretty much always going to hear about Flecs.

Flecs introduced an extremely powerful innovation to ECS - the concept of an entity relationship. You can relate one entity to another via a relationship containing metadata. For example, a player entity can be related to a character entity via a Controls relationship. One entity could follow another via a Follow relationship. This allows you to conveniently express patterns that would not be possible with components alone. When I read about this idea I immediately borrowed it for my own ECS library, and it was like a missing link that let me get rid of all kinds of awkwardly structured patterns I had relied on before.
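
To give a sense of the shape of the idea, here is a hypothetical sketch of relating two entities with metadata carried on the relation itself. The Relate call and the Follow type are illustrative, not the exact Flecs or MoonTools.ECS API:

using MoonTools.ECS;

// Metadata carried on the relation itself, not on either entity.
public readonly record struct Follow(float Speed);

public static class RelationshipExample
{
    public static void Setup(World world)
    {
        var ghost = world.CreateEntity();
        var player = world.CreateEntity();

        // "ghost follows player, at speed 2.5" is a fact about the pair -
        // something a component on either entity alone can't express cleanly.
        world.Relate(ghost, player, new Follow(2.5f));
    }
}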

Beyond that, the main innovation of Flecs is that it’s structured like a database, to allow you to perform complex data queries on the structure of the world. It has all kinds of neat design tricks, like the fact that every component type is actually also an entity itself. This stuff is like catnip for a certain type of programmer. Leibniz himself would weep at its purity.

Flecs deeply integrates relations into its archetype structure, because specific entities are also type IDs. This is where things start to get a little crazy.

I’ll let Flecs author Sander Mertens speak for himself about this implementation:

A problem our current implementation has is that archetypes aren’t cleaned up when an entity used in a pair is deleted. For example, if we have an archetype with (ChildOf, my_parent), and we delete entity my_parent, the archetype won’t get cleaned up. This is problematic not just because it leaks memory, but also because entity ids are recycled, and my_parent could be reused for an entirely different purpose.

To solve this, we need some way to cleanup archetypes when entities used by archetypes are deleted. This doesn’t necessarily just apply to entities used in pairs, and can also apply to regular component entities.

This is where things get a bit more complicated. To cleanup archetypes, all references to the archetype must also be deleted. This means that it must be deleted from the hashmap that finds archetypes by component id vector, and, if the ECS implements it, all incoming and outgoing edges from the archetype must also be cleaned up. The archetype must also be unregistered from the component index. When a deleted archetype contains an entity that’s used by another set of archetypes, those archetypes also have to be cleaned up.

Additionally, query caches must be notified to delete all instances of the deleted archetype (instances- because wildcard queries can cause an archetype to get inserted multiple times). If up to this point a query cache was a simple vector of archetypes, a new data structure will have to be introduced for more efficient removal. Otherwise you’d get an O(AQN) operation, where A=the number of deleted archetypes, Q=the number of queries, and N=the number of archetypes per query cache.

Something that complicates cleanup is that cleaning up archetypes can create new archetypes. For example: when entity Apples is deleted, all entities in archetype Position, (Eats, Apples) need to be moved to archetype Position. It is not guaranteed that this archetype already exists, which means that archetype cleanup can cause archetype creation. Furthermore, this newly created archetype can in theory also contain an id that is about to be deleted. This is one of the bigger tasks, but essential for relationships as it guarantees our storage doesn’t have dangling references, and makes sure our relationship pairs don’t suddenly point to garbage entities.

Estimate: 8 weeks

If you’re like me, you’re probably scratching your head wondering how on earth this is ever supposed to be efficient. To summarize, any time an entity is deleted, you might have to:

  1. Delete any archetypes pertaining to that entity relation
  2. Clean up the archetype graph
  3. Clean up all related archetypes
  4. Clean up all query caches that reference the deleted archetype
  5. Potentially create a new archetype… which might contain data that is also going to be deleted

Any time I see a design like this, klaxons start blaring in my head. I value simplicity enormously. It’s not clear to me in the first place that providing such high-level query abstractions is beneficial to your ability to implement a game. The maintenance burden of this design is clearly enormous, and probably impractical. But at this point I figured I could still implement archetype storage over components without resorting to implementing entity relations in such an abstract way.

While I’m at it, let’s look at some of the other ECS offerings out there.

Svelto.ECS

Here is an example of a “simple” setup for Svelto.ECS:

public class SimpleContext
{
    //the group where the entity will be built in
    public static ExclusiveGroup group0 = new ExclusiveGroup();

    public SimpleContext()
    {
        var simpleSubmissionEntityViewScheduler = new SimpleEntitiesSubmissionScheduler();
        //Build Svelto Entities and Engines container, called EnginesRoot
        _enginesRoot = new EnginesRoot(simpleSubmissionEntityViewScheduler);

        var entityFactory   = _enginesRoot.GenerateEntityFactory();

        //Add an Engine to the enginesRoot to manage the SimpleEntities
        var behaviourForEntityClassEngine = new BehaviourForEntityClassEngine();
        _enginesRoot.AddEngine(behaviourForEntityClassEngine);

        //build a new Entity with ID 0 in group0
        entityFactory.BuildEntity<SimpleEntityDescriptor>(new EGID(0, group0));

        //submit the previously built entities to the Svelto database
        simpleSubmissionEntityViewScheduler.SubmitEntities();

        //as Svelto doesn't provide an engine ticking system, it's the user's responsibility to
        //update engines
        behaviourForEntityClassEngine.Update();
    }

    readonly EnginesRoot _enginesRoot;
}

Would you seriously want to make an entire project that looks like this? If you do, I recommend you go to the doctor and get some treatment for your case of Enterprise Software Brain. I actually cannot believe that someone unironically designed this in their free time. This shit makes Bjarne Stroustrup look like Antoine de Saint-Exupéry.

Unity DOTS

Much ado has been made about Unity’s Data Oriented Tech Stack for the past few years. One of the main selling points is their Job System, which allows for multithreaded updates. Let’s take a look at some code.

namespace ExampleCode.IJobEntitys
{
    [WithAll(typeof(Apple))]
    [WithNone(typeof(Banana))]
    [BurstCompile]
    public partial struct MyIJobEntity : IJobEntity
    {
        public EntityCommandBuffer.ParallelWriter Ecb;

        [BurstCompile]
        public void Execute([ChunkIndexInQuery] int chunkIndex, Entity entity, ref Foo foo, in Bar bar)
        {
            if (bar.Value < 0)
            {
                Ecb.RemoveComponent<Bar>(chunkIndex, entity);
            }

            foo = new Foo { };
        }
    }

    public partial struct MySystem : ISystem
    {
        [BurstCompile]
        public void OnUpdate(ref SystemState state)
        {
            var ecbSingleton = SystemAPI.GetSingleton<BeginSimulationEntityCommandBufferSystem.Singleton>();
            var ecb = ecbSingleton.CreateCommandBuffer(state.WorldUnmanaged);

            var job = new MyIJobEntity
            {
                Ecb = ecb.AsParallelWriter()
            };

            state.Dependency = job.Schedule(state.Dependency);
        }
    }
}

My eyes are glazing over already. Maybe just regular old system updates will be better.

namespace ExampleCode.Queries
{
    public partial struct MySystem : ISystem
    {
        [BurstCompile]
        public void OnUpdate(ref SystemState state)
        {
            EntityQuery myQuery = SystemAPI.QueryBuilder().WithAll<Foo, Bar, Apple>().WithNone<Banana>().Build();
            ComponentTypeHandle<Foo> fooHandle = SystemAPI.GetComponentTypeHandle<Foo>();
            ComponentTypeHandle<Bar> barHandle = SystemAPI.GetComponentTypeHandle<Bar>();
            EntityTypeHandle entityHandle = SystemAPI.GetEntityTypeHandle();

            NativeArray<ArchetypeChunk> chunks = myQuery.ToArchetypeChunkArray(Allocator.Temp);

            for (int i = 0, chunkCount = chunks.Length; i < chunkCount; i++)
            {
                ArchetypeChunk chunk = chunks[i];

                NativeArray<Foo> foos = chunk.GetNativeArray(ref fooHandle);
                NativeArray<Bar> bars = chunk.GetNativeArray(ref barHandle);
                NativeArray<Entity> entities = chunk.GetNativeArray(entityHandle);

                for (int j = 0, entityCount = chunk.Count; j < entityCount; j++)
                {
                    Entity entity = entities[j];
                    Foo foo = foos[j];
                    Bar bar = bars[j];

                    foos[j] = new Foo { };
                }
            }
        }
    }
}

You know, maybe that Job structure wasn’t so bad after all. Having to get chunk handles and array handles and write manual loops for them in every single system you ever write is definitely going to get irritating fast. Maybe it won’t feel that bad when compared to Unity randomly corrupting your asset database in the middle of your workday, or the company announcing that they’re going to charge you per-install fees.

DefaultECS

DefaultECS is another C# ECS library. It has some pretty robust features, like an analyzer to provide codegen and usage warnings.

public sealed class VelocitySystem : AEntitySetSystem<float>
{
    public VelocitySystem(World world, IParallelRunner runner)
        : base(world.GetEntities().With<Velocity>().With<Position>().AsSet(), runner)
    {
    }

    protected override void Update(float elapsedTime, in Entity entity)
    {
        ref Velocity velocity = ref entity.Get<Velocity>();
        ref Position position = ref entity.Get<Position>();

        Vector2 offset = velocity.Value * elapsedTime;

        position.Value.X += offset.X;
        position.Value.Y += offset.Y;
    }
}

This really doesn’t look half bad. I could probably use this and feel fine about it. (It is interesting to note that the library is clearly using a sparse storage pattern.)

There are a lot more of these kinds of libraries out there, but I’m not really interested in enumerating all of them. My point is that it’s pretty obvious which of these I would care to use on a day-to-day basis. Let’s move on.

Optimization is about identifying bottlenecks

Premature optimization is the root of all evil (or at least most of it) in programming.

– Donald Knuth

This is arguably the most misunderstood quote in the history of computing. Intellectually lazy programmers take it as an invitation to ignore optimization completely - but no self-respecting programmer actually believes that. Clear optimizations present themselves all the time when you are selecting data structures: if your program needs to check set membership frequently, then a hashset is obviously the correct choice over a linked list. That isn’t premature optimization – it’s just optimization. But here is the sentence that immediately precedes the famous quote:

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times.

In other words, Knuth is inviting us to ask: what does our program actually do? What are the real hot paths?

Are games as simple as a million entities with position and velocity? Are games just about adding numbers to each other in tight loops? Are we data scientists now? Are Excel spreadsheets the next hot gaming platform?

The realities of archetypal ECS

When an entity is created, it exists in the “empty” archetype. As components are added or removed, the entity’s component data is copied between archetypes. For example, when you add a Position component to an empty entity, it is moved into the archetype containing only Position. When you then add a Velocity component, the entity is moved into the archetype containing Position and Velocity, and its Position data is copied into that archetype’s storage. This network of archetypes, connected by component additions and removals, forms the archetype graph.

If you’re clever, you are probably already noticing a potential problem with this structure. What happens when you remove a component from an entity that has, say, 40 components on it? (This is by no means an unreasonable number for a complex enough game.) That’s right – you have to copy the remaining 39 components into another archetype’s storage. And if you immediately add another component after that? That’s right – you have to copy those 39 components again. Do this frequently enough and you are constantly churning data between different locations in memory.
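
Here is a simplified sketch of what one of those moves costs. The Archetype layout below is an illustrative assumption, not any particular library’s internals:

using System.Collections.Generic;

public sealed class Archetype
{
    // One "column" of boxed component data per component type on this
    // archetype. Real implementations use unboxed, type-erased arrays;
    // boxing just keeps this sketch short.
    public readonly Dictionary<System.Type, List<object>> Columns = new();
}

public static class ArchetypeMove
{
    // Moving an entity between archetypes copies every component the two
    // archetypes share. Remove 1 of 40 components and the other 39 get
    // copied; add one right after and they all get copied again.
    public static void MoveEntity(Archetype source, Archetype destination, int row)
    {
        foreach (var (type, column) in source.Columns)
        {
            if (destination.Columns.TryGetValue(type, out var destColumn))
            {
                destColumn.Add(column[row]);
            }
        }
        // (Removal of the source row is omitted for brevity.)
    }
}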

When you treat entities very dynamically, you cause an explosion of archetypes. In other words, the more fragmented your entity structures become, the less benefit you gain from the data locality of archetypes.

With this in mind, let’s go over some of the conditions that have to be satisfied for Archetypal ECS to give you those amazing benchmarks:

  1. You have lots of entities with extremely similar, if not identical, structure.
  2. The structure of most entities changes rarely or never.
  3. Entities can be updated without needing to reference separate data structures.

When you see a benchmark that says an ECS library is able to update 100k entities in 9 milliseconds, this does not imply that it can update 1 entity in 90 nanoseconds in all cases. That throughput is an optimization produced under very specific conditions.

Take the incredibly obvious example of collision detection, something almost every game is going to need. The naive approach is to just compare the position of every object to the position of every other object in the world. Even the fastest data access in the world isn’t going to help you make an n-squared algorithm have acceptable performance.

So you’re going to need a data structure like a spatial hash or an octree or whatever. All the benefits of your perfect data locality are now completely destroyed, because you need to maintain an external data structure to make efficient comparisons.
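
For concreteness, here is a minimal sketch of such an acceleration structure. Note that its lookups hop through a dictionary and scattered lists - exactly the kind of access pattern that archetype locality can’t help with. The cell size and int entity IDs are assumptions for illustration:

using System.Collections.Generic;

// Bucket entities by grid cell so collision candidates come from
// neighboring cells instead of an all-pairs scan.
public sealed class SpatialHash
{
    readonly float cellSize;
    readonly Dictionary<(int, int), List<int>> cells = new();

    public SpatialHash(float cellSize) { this.cellSize = cellSize; }

    (int, int) CellOf(float x, float y) =>
        ((int)System.MathF.Floor(x / cellSize), (int)System.MathF.Floor(y / cellSize));

    public void Insert(int entity, float x, float y)
    {
        var key = CellOf(x, y);
        if (!cells.TryGetValue(key, out var bucket))
        {
            cells[key] = bucket = new List<int>();
        }
        bucket.Add(entity);
    }

    // Candidates come from an entity's own cell and its 8 neighbors,
    // rather than from every other entity in the world.
    public IEnumerable<int> Nearby(float x, float y)
    {
        var (cx, cy) = CellOf(x, y);
        for (var dx = -1; dx <= 1; dx++)
        for (var dy = -1; dy <= 1; dy++)
        {
            if (cells.TryGetValue((cx + dx, cy + dy), out var bucket))
            {
                foreach (var entity in bucket) yield return entity;
            }
        }
    }
}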

One of the best features of ECS architecture for the kinds of games I work on is that the structure of an entity is modular - you can modify the behavior of an entity on the fly by simply adding and removing components. An architecture that discourages you from treating entities this way is a huge red flag.

A real-world case study

In Samurai Gunn 2, the primary gameplay entities are characters - there are at most 4 of them active at one time. Their structures vary between characters, because their specialized capabilities are defined as components, so they will almost always be placed into separate archetypes. There are bullets - at the absolute most there might be 20 of them on screen, in an extremely rare case - and they often have special properties that cause them to behave differently. There are sword slashes - again, a maximum of 4 on screen at once. You might see where I’m going with this.

You might say, well, what if you just design these entities so that they have the same component structure, but the state of the components internally vary? I would say: Why the hell would I want to do that? Why should I settle for reducing the expressiveness of my design? Isn’t the whole point of building games this way to elegantly compose behaviors? Why would I go out of my way to structure my entire architecture for optimal performance in an extreme case that will never be relevant to my game, when I can achieve more than acceptable performance in my actual use cases with a sparse storage pattern?

Bamboo is implemented as 16x16 tiles. Solid objects collide with bamboo, and the tiles can be destroyed by sword slashes or bullets. There might be around a hundred of these on screen in certain levels, so that’s approaching a scenario where cache locality might matter. However, bamboo doesn’t really need to update its state every frame unless something collides with it - which, as I mentioned above, is the result of a check against a collision acceleration structure and cannot be optimized directly in the ECS data structures.

Maybe you’re thinking: OK, archetypal ECS doesn’t do anything for Samurai Gunn, but there might be some game designs that could really benefit from this kind of data layout. Sure, of course. My argument was never that archetypal ECS is universally bad. All software architectures are about tradeoffs, and identifying which things you can trade off in the specific performance characteristics of your game. Cities Skylines 2 uses Unity DOTS and it apparently fixed the CPU bottleneck issues they were having in the first game. (Unfortunately the renderer seems incomplete and is causing serious GPU bottlenecks now.) A giant agent simulation is basically the exact use case that justifies archetype ECS. My argument is that a majority of games will never benefit from this kind of structure, and it can in fact be detrimental.

In case you think my objections to archetypal ECS are purely theoretical: after spending weeks reworking my entire ECS storage structure, snapshots were pretty fast - about 0.5ms, which is great. However, I was getting an absolutely blazing 5 frames per second running Samurai Gunn 2. The previous system hovered between 300-500FPS in debug builds. There were definitely optimizations I still could have made at that point - traversing the archetype graph on queries was slow, and I could have cached things, and so on and so forth - but I didn’t care anymore. The implementation complexity just wasn’t worth it when the old system had great performance with a much simpler design. I’m making dumb twitchy action games, so I’d rather just use a dumb twitchy architecture.

The main insight I was able to get from rebuilding the storage was that separating gameplay-critical state into its own World would avoid all of the inconsistency and speed issues we were having. This is entirely possible to do with my original sparse storage and Filter design. I wish it hadn’t taken 3 weeks of rebuilding storage structures to realize that. My critical mistake was forgetting that “good performance” does not exist in a vacuum.

Don’t take my word for it

After I finally concluded that redesigning my ECS library’s storage to be archetypal was a complete waste of time, I started to wonder what other studios were using in practice, and whether they had arrived at similar conclusions. The only major commercial project I can think of in recent years that definitely used ECS is Overwatch. I skimmed Blizzard’s GDC talk about ECS one more time and saw this snippet of code on a slide:

void PlayerConnectionSystem::Update(f32 timeStep)
{
    for (ConnectionComponent* c: ComponentItr<ConnectionComponent>(m_admin))
    {
        InputStreamComponent* is = c->Sibling<InputStreamComponent>();
        StatsComponent* stats = c->Sibling<StatsComponent>();

        ...
    }
}

This is obviously a sparse storage pattern – this system is iterating over each ConnectionComponent, and is able to obtain sibling components of specific types from that reference. So if I haven’t managed to convince you that sparse storage is perfectly acceptable for shipping a game, well, they shipped Overwatch using it.

Final thoughts

After rebuilding MoonTools.ECS to use archetypal storage and finding that the performance was totally abysmal, I found myself asking a very important question. Why was I listening to a bunch of people who have never made a game with their own tools? What was I thinking?

We live in an extremely strange time for programming. Libraries and tools proliferate, designed abstractly in vacuums, never having been used to actually make anything. But what insight could you possibly have about designing a tool when you’ve never actually needed it for a definite purpose?