Stop Making Threads

2025-06-03

A folk definition of insanity is to do the same thing over and over again and to expect the results to be different. By this definition, we in fact require that programmers of multithreaded systems be insane. Were they sane, they could not understand their programs. - Edward A. Lee, “The Problem with Threads”

A common question about SDL3 is: “How do I use multi-threading with the GPU API?”

In some sense, this is a reasonable question. One of the main motivations for the development of Vulkan and other modern graphics systems was that OpenGL and Direct3D11 contexts were not thread-safe at all. And more threads = more efficiency, right? That said, if you are asking such an open-ended question, you have a solution in search of a problem, and that makes it fundamentally the wrong question to ask. Threads are not free performance enhancers. They are expensive to start up and tear down, and they enormously increase the complexity of the code base and the amount of mental effort you have to expend to reason about data integrity. Threads need to be used carefully.

Since I’ve grabbed your attention with this provocative headline, I have to admit: I do use threads in my applications. But I have specific and considered reasons for doing so.

Before we get started: this advice is targeted at your average solo developer or small team shipping games with SDL. If you are a professional working on advanced cutting-edge techniques, this advice does not apply to you.

Basic principles of threading

A thread is an independent execution sequence within a single process. Almost all CPUs nowadays have multiple cores, so using threads can increase CPU utilization. Each thread maintains its own stack, but it shares the heap with all other threads in the process. On the one hand this makes communication between threads easy, because they all access the same heap memory. On the other hand, this introduces a new kind of bug: a race condition. Threads run in an unpredictable order relative to each other, and since they are accessing shared memory, that memory can be manipulated in unpredictable ways.

To address this, we can introduce locks, which prevent multiple threads from accessing the same memory at the same time. If two threads try to acquire a lock at the same time, one thread must wait for the other to release the lock. This can introduce the problem of deadlocks, which is when two threads are each holding on to a lock that the other needs to proceed. Both threads are stopped and the program locks up.
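A minimal sketch of the locking idea, assuming POSIX threads are available. The names `SharedCounter`, `increment_worker`, and `run_locked_counter` are made up for illustration. Several threads bump the same counter, and the mutex makes sure only one of them touches it at a time.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared state: all threads receive a pointer to the same counter. */
typedef struct {
    pthread_mutex_t lock;
    long value;
} SharedCounter;

typedef struct {
    SharedCounter *counter;
    int increments;
} WorkerArgs;

static void *increment_worker(void *arg) {
    WorkerArgs *args = arg;
    for (int i = 0; i < args->increments; i++) {
        /* Only one thread at a time may be between lock and unlock. */
        pthread_mutex_lock(&args->counter->lock);
        args->counter->value++;
        pthread_mutex_unlock(&args->counter->lock);
    }
    return NULL;
}

/* Runs `thread_count` workers (at most 16) and returns the final counter
   value, or -1 if the arguments are invalid or a thread failed to start. */
long run_locked_counter(int thread_count, int increments_per_thread) {
    enum { MAX_THREADS = 16 };
    if (thread_count < 1 || thread_count > MAX_THREADS) return -1;

    SharedCounter counter;
    counter.value = 0;
    pthread_mutex_init(&counter.lock, NULL);

    WorkerArgs args = { &counter, increments_per_thread };
    pthread_t threads[MAX_THREADS];
    int started = 0;
    for (; started < thread_count; started++) {
        if (pthread_create(&threads[started], NULL, increment_worker, &args) != 0) break;
    }
    for (int i = 0; i < started; i++) pthread_join(threads[i], NULL);

    long result = (started == thread_count) ? counter.value : -1;
    pthread_mutex_destroy(&counter.lock);
    return result;
}
```

Without the lock and unlock calls, the increments from different threads could interleave and the final value would be unpredictable. Note the cost as well: while one thread holds the lock, every other thread is waiting rather than working.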

When you introduce threads into your program, you need to be careful to avoid concurrency issues like race conditions and deadlocks. These issues can be subtle and extremely difficult to reproduce and debug.
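One common way to avoid the two-lock deadlock described above is to make every thread acquire locks in the same order. The sketch below assumes POSIX threads. `Inventory`, `move_gold`, and `run_opposing_moves` are hypothetical names, and ordering by address is just one possible convention.

```c
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    long gold;
} Inventory;

/* Always lock the inventory at the lower address first, so two threads
   can never each hold the lock the other one is waiting for. */
static void lock_both(Inventory *a, Inventory *b) {
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_both(Inventory *a, Inventory *b) {
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

void move_gold(Inventory *from, Inventory *to, long amount) {
    if (from == to) return;  /* locking the same mutex twice would hang */
    lock_both(from, to);
    from->gold -= amount;
    to->gold += amount;
    unlock_both(from, to);
}

typedef struct {
    Inventory *from;
    Inventory *to;
    int repeats;
} MoveArgs;

static void *move_worker(void *arg) {
    MoveArgs *move = arg;
    for (int i = 0; i < move->repeats; i++) move_gold(move->from, move->to, 1);
    return NULL;
}

/* Two threads move gold in opposite directions between two inventories that
   start with 1000 each. Returns the combined total afterward, or -1 if a
   thread could not be started. */
long run_opposing_moves(int repeats) {
    Inventory a, b;
    a.gold = 1000;
    b.gold = 1000;
    pthread_mutex_init(&a.lock, NULL);
    pthread_mutex_init(&b.lock, NULL);

    MoveArgs a_to_b = { &a, &b, repeats };
    MoveArgs b_to_a = { &b, &a, repeats };
    pthread_t t1, t2;
    int ok1 = pthread_create(&t1, NULL, move_worker, &a_to_b) == 0;
    int ok2 = pthread_create(&t2, NULL, move_worker, &b_to_a) == 0;
    if (ok1) pthread_join(t1, NULL);
    if (ok2) pthread_join(t2, NULL);

    long total = (ok1 && ok2) ? a.gold + b.gold : -1;
    pthread_mutex_destroy(&a.lock);
    pthread_mutex_destroy(&b.lock);
    return total;
}
```

If `move_gold` instead locked `from` first and `to` second, the two workers could each grab one lock and wait forever on the other. That kind of bug may only show up once in thousands of runs, which is what makes it so hard to reproduce.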

The GPU API provides certain specific thread-safety guarantees. The intended threading workflow is to acquire a command buffer, record commands into it, and submit it all on the same thread. As long as you stick to that workflow, you won’t have any inconsistencies using threads.

An obvious performance principle here is that threads only increase performance if they are actually doing something. If a thread is waiting for something to happen on another thread, it is not increasing the CPU utilization of the program. Keep this in mind as we continue.

Game Application Flow

This is the structure of a typical SDL main thread game loop:

1. Handle SDL events until the event queue is empty.
2. Run the game’s update logic.
3. Run the game’s rendering logic.
4. Sleep in some form, whether it’s frame pacing logic or waiting for vblank.
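The four steps above can be sketched as a plain C loop. This is a toy model rather than real SDL code: `DemoGame` and every function here are hypothetical stand-ins, and the counters exist only to make the order of operations visible.

```c
#include <stdbool.h>

/* Hypothetical stand-in for real game state. */
typedef struct {
    int queued_events;   /* models the event queue */
    int events_handled;
    int updates;
    int renders;
    int frames_paced;
} DemoGame;

/* Stand-in for an event poll: returns false once the queue is empty. */
static bool poll_event(DemoGame *game) {
    if (game->queued_events == 0) return false;
    game->queued_events--;
    game->events_handled++;
    return true;
}

static void update_game(DemoGame *game) { game->updates++; }
static void render_game(DemoGame *game) { game->renders++; }

/* Real code would sleep or wait for vblank here. */
static void pace_frame(DemoGame *game) { game->frames_paced++; }

/* Runs `frame_count` frames with `events_per_frame` events queued before each. */
DemoGame run_demo_loop(int frame_count, int events_per_frame) {
    DemoGame game = {0};
    for (int frame = 0; frame < frame_count; frame++) {
        game.queued_events = events_per_frame;  /* pretend input arrived */

        while (poll_event(&game)) {
            /* 1. drain the event queue */
        }
        update_game(&game);   /* 2. update logic */
        render_game(&game);   /* 3. rendering logic */
        pace_frame(&game);    /* 4. sleep / frame pacing */
    }
    return game;
}
```

Everything happens in a fixed order on one thread, so there is no shared state to protect and nothing to synchronize.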

This is pretty straightforward. Some have asked: What if I could squeeze out a few extra cycles by having a render loop on a render thread?

[Image: An image I wish I could show to everyone who wants to use threads in the renderer]

Don’t use a render thread

The truth is that this has always been a bad idea. I’ve lost track of how many reported issues have been chalked up to “someone called OpenGL on a thread”. And it remains just as bad an idea now, even though SDL GPU has certain thread safety guarantees.

I went into detail in one of my other posts about how the GPU is on a separate execution timeline from the CPU. In other words, the processing of GPU commands is already asynchronous, which immediately invalidates one of the primary benefits of using a thread.

In SDL GPU and APIs like it, this asynchronicity is very explicit. You insert commands into a command buffer on the CPU, and submit that command buffer to the GPU when you are ready for those commands to begin executing. In the early days of Vulkan, there were worries that command buffer recording would be relatively expensive. These fears have not been borne out whatsoever. Command recording is cheap, and you don’t have to worry about the overhead. Amdahl’s law, named after computer architect Gene Amdahl, is often summarized as:

“The overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used.”

Command recording is cheap, and most of the actual work is asynchronous, so recording commands on a thread does not significantly improve overall performance. This alone should be reason enough for you to not use a render thread. If you still need convincing, here are a few more…
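To put a rough number on that, Amdahl’s law can be written as speedup = 1 / ((1 - p) + p / s), where p is the fraction of the run time being improved and s is how much faster that part becomes. Here is a small sketch. The 5% figure in the usage note is an assumed number for illustration, not a measurement.

```c
/* Amdahl's law: overall speedup when `improved_fraction` of the total run
   time (0.0 to 1.0) is made `improvement_factor` times faster. */
double amdahl_speedup(double improved_fraction, double improvement_factor) {
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / improvement_factor);
}
```

If command recording took 5% of your CPU frame time and you somehow made it four times faster with threads, `amdahl_speedup(0.05, 4.0)` comes out to roughly 1.04, a gain of about 4%. Even an infinitely fast recorder could not do better than 1 / 0.95, or about 1.05, and that is before counting any synchronization overhead.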

Presumably, your render process depends on your game logic. If your render loop is on a thread, we already have a problem, because your render thread is just going to be sitting there doing absolutely nothing until your game logic is done executing. Worst of all, the complexity of your program has now increased, because it is harder to reason about a multi-threaded process than a single-threaded process.

Let’s say your renderer needs to upload some data to buffers every frame, and you’d like to do it as early as possible in the frame. This is incredibly easy to express with command buffers: you simply acquire a command buffer, set up the copy pass, submit the command buffer, and continue on your way. You can do this at any time, even outside of your rendering logic. Since the command buffer execution is asynchronous, you have increased the throughput of your program, and you didn’t even have to use a thread! There is absolutely no reason to do any kind of complex threading logic here.

There also seems to be a notion that recording multiple command buffers on parallel threads speeds up the render process compared with recording commands on one thread. This is absolutely false. There is almost no practical situation where you will benefit from doing this because command recording tends to depend on the overall state of the renderer anyway. General threading overhead and the amount of synchronization required make this pointless. If you are having issues hitting your framerate target, you should do performance analysis of your frame on both the CPU and GPU side (all GPU hardware vendors provide GPU performance analysis tools) to find out what is taking the time. Odds are that command recording and submission are not the culprit.

The final reason is that window presentation is extremely thread-sensitive. You must acquire the swapchain texture on the same thread that created the window, and you also must create the window on the main thread. There are good reasons for this - SDL needs to handle synchronization between the swapchain and window state during resize and other changes. Obviously this limits threading options pretty significantly. You could do some harebrained scheme like acquiring the swapchain on a different thread from the rest of your rendering, but why bother? It doesn’t actually accomplish anything because you have to synchronize the threads to record and submit the commands in the correct order anyway. All you will have accomplished is making your program significantly more complex for no benefit.

There is one place in command submission where a thread might be useful, and that’s because certain Vulkan implementations allow the driver to block on present calls. In practice I haven’t seen this being a problem. However, this is completely opaque to you anyway as the client using SDL because we don’t expose a presentation command. We may eventually implement something that handles this automatically, and you shouldn’t try to work around it client-side either way.

You might be considering using threads to reduce input latency. The GPU API already has mechanisms for this, like calling SDL_SetGPUAllowedFramesInFlight with a value of 1 and calling SDL_WaitForGPUSwapchain as late as possible before polling events. You should definitely try that before resorting to threads.

I want to be clear that I’m not actually saying to never use a thread in your renderer - I just want to stress that if you’re going to use one, you should have an actual specific problem that the thread solves. I’ve already explained why putting your whole render loop on a thread is a bad idea. Let’s talk about some of those potential good use cases for threads.

When To Use Threads

A good rule of thumb for the question “Should I do this on a thread?” is: no. That said, there are certainly some valid use cases for threads.

One excellent use case for threads is continual background processing tasks. In my C# game framework MoonWorks, compressed audio and video decoding occurs on threads. These tasks are computationally non-trivial and the results can be buffered ahead of time, so performing them on a thread is ideal. While a streaming audio voice is active, the thread checks how many audio buffers are enqueued on the voice and decodes a new buffer if the count is below a certain threshold. This ensures that there are no skips in the audio even if a frame takes longer than usual to process, and it frees up the main thread to do other things.
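The buffering scheme described above can be modeled in a few dozen lines of C with POSIX threads. This is a toy sketch and not MoonWorks code: `StreamingVoice`, `decode_next_buffer`, and the threshold of 4 are all assumptions made for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

enum { TARGET_QUEUED_BUFFERS = 4 };  /* assumed threshold */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;   /* signaled when the queue or `active` changes */
    int queued_buffers;       /* decoded buffers waiting to be played */
    int total_decoded;
    bool active;
} StreamingVoice;

/* Hypothetical stand-in for the expensive decode step. */
static void decode_next_buffer(void) { }

static void *decode_thread_main(void *arg) {
    StreamingVoice *voice = arg;
    pthread_mutex_lock(&voice->lock);
    while (voice->active) {
        if (voice->queued_buffers < TARGET_QUEUED_BUFFERS) {
            pthread_mutex_unlock(&voice->lock);
            decode_next_buffer();               /* heavy work runs outside the lock */
            pthread_mutex_lock(&voice->lock);
            voice->queued_buffers++;
            voice->total_decoded++;
            pthread_cond_broadcast(&voice->changed);
        } else {
            /* Queue is full enough: sleep until a buffer is consumed. */
            pthread_cond_wait(&voice->changed, &voice->lock);
        }
    }
    pthread_mutex_unlock(&voice->lock);
    return NULL;
}

/* Plays `buffers_to_play` buffers while the decoder keeps the queue topped
   up. Returns how many buffers were decoded in total, or -1 on failure.
   A real audio callback would not block like this consumer does; blocking
   just keeps the demo deterministic. */
int run_streaming_demo(int buffers_to_play) {
    StreamingVoice voice;
    pthread_mutex_init(&voice.lock, NULL);
    pthread_cond_init(&voice.changed, NULL);
    voice.queued_buffers = 0;
    voice.total_decoded = 0;
    voice.active = true;

    pthread_t decoder;
    int total = -1;
    if (pthread_create(&decoder, NULL, decode_thread_main, &voice) == 0) {
        pthread_mutex_lock(&voice.lock);
        for (int played = 0; played < buffers_to_play; played++) {
            while (voice.queued_buffers == 0)
                pthread_cond_wait(&voice.changed, &voice.lock);
            voice.queued_buffers--;                 /* "play" one buffer */
            pthread_cond_broadcast(&voice.changed);
        }
        voice.active = false;                       /* voice stopped */
        pthread_cond_broadcast(&voice.changed);
        pthread_mutex_unlock(&voice.lock);

        pthread_join(decoder, NULL);
        total = voice.total_decoded;
    }
    pthread_cond_destroy(&voice.changed);
    pthread_mutex_destroy(&voice.lock);
    return total;
}
```

The decoder never runs more than `TARGET_QUEUED_BUFFERS` ahead of playback, and it sleeps on the condition variable instead of spinning while the queue is full, so the thread only costs CPU time when there is real work to do.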

Threads are useful for reading assets from the disk without blocking the main thread. You could stream in an asset on-demand by performing the read and upload to GPU on a thread. You could also do what I do in my game and just load everything on a thread when the game starts up. I have a C# example of this sort of workflow here. With all that said, we seem to be approaching a world where non-blocking I/O APIs exist that don’t need to use threads at all, so once those are widely available it would definitely be superior to avoid using threads here. (SDL actually already has an AsyncIO API that tries to use native async APIs and falls back to a thread implementation if they are unavailable, but the API doesn’t work with the equally useful Storage API just yet.)

Expensive operations that can tolerate latency are another valid use case for threads. Maybe you have to perform some kind of complex mesh update and it’s fine for the results to appear a few frames late. Go ahead and put that operation on a thread.
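One simple shape for this is sketched below, assuming POSIX threads and C11 atomics: start the work on a thread and have the main loop check a flag once per frame. `MeshJob` and its functions are hypothetical, and the summation loop is only a placeholder for the real mesh work.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    pthread_t thread;
    atomic_bool done;   /* set by the worker after `result` is written */
    int input;
    long result;        /* main thread reads this only after seeing `done` */
} MeshJob;

static void *mesh_job_main(void *arg) {
    MeshJob *job = arg;
    long sum = 0;
    for (int i = 1; i <= job->input; i++) sum += i;  /* placeholder for expensive work */
    job->result = sum;
    atomic_store(&job->done, true);  /* publishes `result` to the main thread */
    return NULL;
}

/* Kicks off the job. Returns false if the thread could not be created. */
bool mesh_job_start(MeshJob *job, int input) {
    job->input = input;
    job->result = 0;
    atomic_init(&job->done, false);
    return pthread_create(&job->thread, NULL, mesh_job_main, job) == 0;
}

/* Call once per frame until it returns true; never blocks on unfinished work.
   Do not call again for the same job after it has returned true. */
bool mesh_job_poll(MeshJob *job, long *out_result) {
    if (!atomic_load(&job->done)) return false;
    pthread_join(job->thread, NULL);  /* returns quickly: the worker is finished */
    *out_result = job->result;
    return true;
}
```

The main thread keeps rendering with the old mesh on every frame where `mesh_job_poll` returns false, and swaps in the new result on the frame where it returns true. The only shared state is one flag and one result, which keeps the reasoning about data integrity manageable.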

Another good use case is for asynchronous operations that do not need to communicate back to the main thread. For example, say you wanted to write the contents of a texture to a PNG file. This is one of those cases where SDL GPU having certain thread safety guarantees is quite nice. You must insert an SDL_DownloadFromGPUTexture command, call SDL_SubmitGPUCommandBufferAndAcquireFence, and then wait on the fence to make sure that the data is ready before reading it out of a transfer buffer. Blocking the main thread with the SDL_WaitForGPUFences call is unnecessary. You could instead fire off a C# Task that waits on the command buffer fence and then reads the buffer and writes to disk. Tasks execute using a thread pool, so this is a very efficient structure. If you’re not using C#, I recommend using some kind of pre-made thread pool structure to manage the overhead of setting up and tearing down threads.
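The shape of that workflow can be modeled without any GPU at all. In the sketch below, `FakeFence` is a hypothetical stand-in for a GPU fence, built from a mutex and a condition variable, and the job writes raw bytes rather than an encoded PNG. It uses a dedicated POSIX thread for brevity where a real program would more likely hand the job to a thread pool.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a GPU fence: signaled once the download is done. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signaled;
} FakeFence;

void fake_fence_init(FakeFence *fence) {
    pthread_mutex_init(&fence->lock, NULL);
    pthread_cond_init(&fence->cond, NULL);
    fence->signaled = false;
}

void fake_fence_signal(FakeFence *fence) {
    pthread_mutex_lock(&fence->lock);
    fence->signaled = true;
    pthread_cond_broadcast(&fence->cond);
    pthread_mutex_unlock(&fence->lock);
}

static void fake_fence_wait(FakeFence *fence) {
    pthread_mutex_lock(&fence->lock);
    while (!fence->signaled) pthread_cond_wait(&fence->cond, &fence->lock);
    pthread_mutex_unlock(&fence->lock);
}

typedef struct {
    FakeFence *fence;
    const unsigned char *pixels;  /* stand-in for the mapped transfer buffer */
    size_t size;
    FILE *out;
    size_t written;               /* read by the caller only after joining */
} SaveJob;

static void *save_job_main(void *arg) {
    SaveJob *job = arg;
    fake_fence_wait(job->fence);  /* blocks this worker, not the main thread */
    job->written = fwrite(job->pixels, 1, job->size, job->out);
    fflush(job->out);
    return NULL;
}

/* Starts the save on its own thread. The caller keeps `job` alive and joins
   `thread` at some convenient point, for example at shutdown. */
bool save_job_start(SaveJob *job, pthread_t *thread) {
    return pthread_create(thread, NULL, save_job_main, job) == 0;
}
```

The main thread submits the work, starts the job, and moves on with the frame. All of the waiting happens on the worker, and nothing has to be communicated back while the game is running.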

I’d like to mention one other very interesting use of threads in id Tech 7.

Fun fact: Doom Eternal does not have a main or render thread. It’s all jobs with one worker thread per core.

It’s easy to see why this structure would be excellent for a large team working on a large game. Job systems are a great way of expressing data dependencies, and this helps manage the complexity of reasoning about threads while obtaining nice performance boosts from higher CPU utilization. That said, if you’re a solo developer I don’t think you should run off and structure your game around a job system. A simpler architecture is more than enough for simpler games.

Seriously, do not use a render thread

I’m sure some masochists will see this article and decide to just use a render thread anyway.

[Image: That sign can’t stop me because I can’t read!]

For the rest of you, I hope I have saved you from creating a completely unnecessary technical mess you’ll have to maintain for years to come.

To reiterate: You should not automatically start using threads in your application. Make sure you’re solving a specific problem and that threads are the right solution. “Threads are cool” is not a good reason to use them.