Musings on cross-platform graphics engine architectures – Part 1

Welcome to my blog series on graphics engine architecture!

The motivation for writing this comes from a project I’ve been working on in my spare time for the past couple of years. Part of that project is a graphics engine I built from the ground up, set up to support a modern AAA game rendering pipeline. During its development I’ve kept a couple of core design goals and pillars in mind, which have significantly influenced the architecture of the engine. I’d like to share those with you over the course of this series, together with bits and pieces of insight on how to build an API which stays true to these goals.

Because there’s a lot to talk about here I’d like to break this topic down into a few posts, with each entry building upon the last to create a cohesive idea of what I feel good graphics engine design looks like today. Here’s a breakdown:

  • Part 1 – The core engine design goals and pillars (today’s post)
  • Part 2 – Efficient multi-threaded command recording and submission
  • Part 3 – Efficiently working with D3D12 (or Vulkan) concepts
  • Part 4 – Random thoughts that don’t fit in any of the above (actual title TBD)

Part 1 – Core design goals and pillars

Coming up with design goals for a piece of software is not something you do on the spot. They’re usually a set of ideas which form out of experience working on various codebases, taking inspiration from what works and what doesn’t in any given situation. It’s also an exercise in figuring out what you want to do exactly with your software. There’s always some library which does one particular thing better than you do, so don’t try to be the best in every category from the start! Pick your battles wisely, and make sure you’re confident you can win them. Improving on things is always possible through iteration.

The engine I wanted to build had to be a straightforward, lean, no-nonsense piece of software, which could be maintained by just a handful of people (preferably even just one person!). It had to run on modern desktop platforms and APIs, and should try to make the best use of those platforms and APIs as possible. I know this is a relatively large ask, but I felt like it could be done. In planning out and writing this engine, I’ve come up with the following design goals.

Embrace ‘difficult’ concepts introduced in modern graphics APIs

I’m mostly familiar with the DirectX family of APIs, so I’ll explain what I mean by this in DX terms.

In all of the DX12 codebases I’ve worked in, there seems to be a common struggle surrounding concepts such as root signatures, descriptors, pipeline state objects, barriers, fences, command allocators and efficient use of the command list paradigm (to just name a few!). This very often seems to stem from having an existing, often complex, architecture in place built around DX9 or DX11, which somehow needs to be transitioned over to a DX12 back-end. Carrying this legacy around doesn’t do it any favors! Major architectural restructuring might be needed to get any type of benefit out of your new back-end, and this can be incredibly hard to pull off if you’re working in an established live codebase. It’s not just building a plane while you’re flying it; it’s taking the plane apart mid-air and building a new, different plane out of the parts.

Starting a project without this legacy poses an interesting opportunity to embrace these ‘difficult’ concepts and elevate them to core concepts within your API. As an example this could mean that we might not reason about individual shader programs anymore, but think in terms of pipeline states instead. Maybe a root signature is something you now explicitly declare in your data, and instead of binding to shader registers you reason directly about root parameters (or you just roll with bindless!).
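To make the pipeline-state idea a bit more concrete, here’s a minimal sketch of what treating pipeline states as first-class value types could look like. All of the names here (`PipelineStateDesc`, `PipelineCache`, and so on) are illustrative stand-ins, not real D3D12 API calls: the point is that the engine reasons about complete, hashable pipeline descriptions rather than individual shader programs.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical sketch: a pipeline state described as a plain value type,
// hashed and cached so the engine reasons about whole pipeline states
// rather than individual shader programs.
struct PipelineStateDesc {
    std::string vertexShader;   // identifier of a compiled shader blob
    std::string pixelShader;
    uint32_t    rasterizerFlags = 0;
    uint32_t    blendFlags      = 0;

    bool operator==(const PipelineStateDesc& o) const {
        return vertexShader == o.vertexShader && pixelShader == o.pixelShader &&
               rasterizerFlags == o.rasterizerFlags && blendFlags == o.blendFlags;
    }
};

struct PipelineStateDescHash {
    size_t operator()(const PipelineStateDesc& d) const {
        size_t h = std::hash<std::string>{}(d.vertexShader);
        h ^= std::hash<std::string>{}(d.pixelShader) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= std::hash<uint32_t>{}(d.rasterizerFlags) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= std::hash<uint32_t>{}(d.blendFlags) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// Opaque handle the rest of the engine passes around.
using PipelineHandle = uint32_t;

class PipelineCache {
public:
    // Returns an existing pipeline if an identical description was seen
    // before; otherwise "creates" one. In a real engine this is where the
    // expensive driver-side compilation would happen, once, up front.
    PipelineHandle getOrCreate(const PipelineStateDesc& desc) {
        auto it = cache_.find(desc);
        if (it != cache_.end()) return it->second;
        PipelineHandle handle = nextHandle_++;
        cache_.emplace(desc, handle);
        return handle;
    }
    size_t size() const { return cache_.size(); }
private:
    std::unordered_map<PipelineStateDesc, PipelineHandle, PipelineStateDescHash> cache_;
    PipelineHandle nextHandle_ = 0;
};
```

Because the full state is one immutable description, requesting the same combination twice hands back the same handle, and there’s no hidden driver work at draw time.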

The point is to not just use these concepts, but to fully embrace them and let them work for you. All of them were designed to give you some form of advantage over legacy APIs, so don’t try to fight them if you can avoid it.

We’ll be digging into some of the intricacies of this in Part 3.

Make job-based multi-threaded execution a first class citizen

Having some kind of scalable multi-threading solution is a must these days if you want to get the most out of your CPU. Since we’re starting with a clean slate, why not build our engine entirely around this concept? Luckily, modern graphics APIs allow you to do some form of command list recording across multiple threads (deferred contexts in DX11 technically did as well, but we all know how that ended). Embracing this concept in your architecture from the start can make writing scalable multi-threaded graphics code just as straightforward as traditional single-threaded code (if you have to explicitly use critical sections in your high-level rendering code, you’re doing something wrong!). That’s essentially the holy grail of authoring multi-threaded code, and it’s absolutely achievable with some careful thought and planning. I was also lucky to have help from some really talented people who did a lot of the heavy lifting in writing a fast and sleek job system for another part of the project, which definitely helps too.
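The “no critical sections in high-level code” idea can be sketched in a few lines. The trick is ownership: each job records into its own command list, so nothing is shared while recording, and submission happens in a predetermined order on one thread. The `Command` and `CommandList` types below are illustrative mocks, not a real graphics API.

```cpp
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch of free-threaded command recording: every job gets
// its own command list, so no synchronization is needed in the high-level
// rendering code.
struct Command {
    std::string name;  // e.g. "DrawIndexed", "Dispatch"
};

// One list per job; never shared between threads while recording.
struct CommandList {
    std::vector<Command> commands;
    void record(std::string name) { commands.push_back({std::move(name)}); }
};

// Records N command lists in parallel, then "submits" them in a stable,
// predetermined order on the calling thread. Because submission order is
// decided up front, parallel recording never changes the frame's output.
std::vector<Command> recordAndSubmit(
    const std::vector<std::function<void(CommandList&)>>& jobs) {
    std::vector<CommandList> lists(jobs.size());
    std::vector<std::thread> workers;
    for (size_t i = 0; i < jobs.size(); ++i) {
        // Each thread owns lists[i] exclusively: no locks required.
        workers.emplace_back([&, i] { jobs[i](lists[i]); });
    }
    for (auto& w : workers) w.join();

    // Submit in job order, regardless of which thread finished first.
    std::vector<Command> submitted;
    for (const auto& list : lists)
        submitted.insert(submitted.end(), list.commands.begin(), list.commands.end());
    return submitted;
}
```

Note that the only synchronization point is the `join()`; the recording lambdas themselves are plain single-threaded code.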

For me the importance lies in completely free-threaded command recording and submission (we’ll discuss this in Part 2) at the lower level, and having a construct at a higher level in which we can write modular graphics code which can easily be distributed among different threads/jobs, and which can easily reason about dependencies between these jobs. For this I drew a lot of inspiration from Yuriy O’Donnell’s excellent talk on frame graphs from GDC 2017. I’ll carve out a little section in Part 3 as well to discuss how I approached a simple and lean multi-threaded render graph implementation.
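The core of the render-graph idea can be shown in a small sketch: passes declare which resources they read and write, and the graph derives an execution order from those declarations instead of relying on hand-maintained ordering. This is my own simplified illustration of the concept, not Yuriy O’Donnell’s actual implementation, and all names here are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical render-graph sketch: execution order is derived from
// declared resource reads/writes.
using ResourceHandle = uint32_t;

struct RenderPass {
    std::string name;
    std::vector<ResourceHandle> reads;
    std::vector<ResourceHandle> writes;
};

// Orders passes so that every writer of a resource runs before its readers.
// A real implementation would also cull unused passes, alias transient
// resources, and place barriers; this only derives ordering.
std::vector<std::string> compileExecutionOrder(const std::vector<RenderPass>& passes) {
    // Map each resource to the index of the pass that writes it.
    std::unordered_map<ResourceHandle, size_t> writerOf;
    for (size_t i = 0; i < passes.size(); ++i)
        for (ResourceHandle r : passes[i].writes) writerOf[r] = i;

    std::vector<std::string> order;
    std::vector<bool> visited(passes.size(), false);

    // Depth-first emission: a pass is emitted only after the passes it
    // reads from. Assumes the declared graph is acyclic.
    std::function<void(size_t)> emit = [&](size_t i) {
        if (visited[i]) return;
        visited[i] = true;
        for (ResourceHandle r : passes[i].reads) {
            auto it = writerOf.find(r);
            if (it != writerOf.end()) emit(it->second);
        }
        order.push_back(passes[i].name);
    };
    for (size_t i = 0; i < passes.size(); ++i) emit(i);
    return order;
}
```

Even if you register passes in an arbitrary order, the compiled order respects the data dependencies, which is exactly what makes distributing those passes across jobs safe.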

Define a clear, small API, but be open towards extension

I very much believe in focusing on a minimal viable API and making sure that that minimal set of functionality is well thought out and well tested. Most of these decisions stem from working on a decent variety of codebases at different scales, and learning from what they did well or where they frustrated me. Just like any other complex software library, graphics engines tend to get complicated pretty quickly. Pursuing an architecture which follows the open/closed principle (“open for extension, closed for modification”) seems like a good step to avoid the urge to over-engineer things.

When talking to coworkers about graphics engine design, I always like to use the video player analogy. A video player doesn’t and shouldn’t care about the actual contents of the video you’re playing. All it should care about is building a series of frames as fast as possible out of the data it’s being given. In addition to that it might do some filtering, scaling, interpolation, etc., all based on that basic data you feed into it. The same is very much true for the core aspects of a rendering engine. Sure, sometimes you’ll want to have some functionality in which you read back some data your engine has generated for later use (e.g. updating particles or animations in a compute shader, using the last frame’s HDR buffer for screen space reflections, etc.), but the core function is still to render and display a sequence of images based on data you feed into it. At its lowest level, a graphics engine shouldn’t ever know about what you’re doing with it at a higher level, so try to avoid leaking those things into your engine design. To me, this is absolutely paramount to keeping an engine small, maintainable and efficient.

Another aspect of this pillar is to design towards a solution which makes integrating new platforms as painless as possible, without creating compromises in the form of heavy-handed abstraction layers or over-generalizations. It’s important to accept that future-proofing is almost never a viable option, and that platform A will always be different from platform B. It’s tempting to design one general API which tries to cover functionality for all your target platforms or graphics APIs, but in my experience this tends to lead to an over-complicated system which tries to make decisions about things it has absolutely no business making. This means that I might expose just a small core set of functionality which I know all target platforms share, combined with extensions to that functionality on a per-platform basis. It’s totally fine to throw an #ifdef (PLATFORM) or #ifdef (GRAPHICS_API) in your higher level graphics code, if that means you’ll keep the underlying engine small, simple and performant. A small and simple engine will eventually also be much easier to refactor in the future once your core functionality inevitably needs to change. A big inspiration for this was Mike Acton’s famous “Three Big Lies” blog post. If you haven’t read it yet, I highly suggest you do.
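Here’s a tiny sketch of what “small shared core plus per-platform extension” could look like in practice. The `PLATFORM_SUPPORTS_MESH_SHADERS` macro and the `Device` methods are purely illustrative; the point is that the platform-specific path lives behind a plain `#ifdef` in higher-level code instead of behind a generalized abstraction.

```cpp
#include <string>

// Illustrative feature macro; in a real build system this would come from
// the per-platform configuration, not be hardcoded here.
#define PLATFORM_SUPPORTS_MESH_SHADERS 1

struct Device {
    // Core functionality: every target platform can do this.
    std::string drawIndexed() { return "drawIndexed"; }

#if PLATFORM_SUPPORTS_MESH_SHADERS
    // Extension: only compiled in on platforms that support it. The core
    // Device interface stays small; extensions live alongside it.
    std::string dispatchMesh() { return "dispatchMesh"; }
#endif
};

// Higher-level code picks the best path with a plain #ifdef instead of a
// generalized abstraction that pretends all platforms are the same.
std::string renderGeometry(Device& dev) {
#if PLATFORM_SUPPORTS_MESH_SHADERS
    return dev.dispatchMesh();
#else
    return dev.drawIndexed();
#endif
}
```

Because the extension simply doesn’t exist on platforms that lack it, misuse is a compile error rather than a runtime surprise, and the shared core never grows to accommodate the union of all platforms.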

Don’t be afraid to fail, but try to fail quickly if you do

I know. This one’s a cliché, but it’s true! If you put in the effort to keep your code lean, mean and easily testable, you’ll be able to adapt designs quickly. If some system doesn’t work the way you want it to, it’s not a huge challenge to refactor it into something that does. This is just universally true for any type of software.

What success or failure means is obviously up to you. For me, performance and usability come to mind. I often try to get other people to use some new systems I’ve come up with as fast as possible to see whether they’re approachable or not. It’s easy to become blind to the flaws of a system you’ve architected and implemented, so get it into the hands of a critical eye as soon as possible. The same goes for performance (although your critical eye can be your profiling tool of choice in this case).



I’ll leave it here for this entry, as it’s getting quite long already. Feel free to leave a comment, or ping me on Twitter (@BelgianRenderer, my DMs are open).

Thanks for reading, and see you in part 2!
