Blog

# Debugging D3D12 driver crash

Wed 12 Sep 2018

New-generation, explicit graphics APIs (Vulkan and DirectX 12) are more efficient and involve less CPU overhead. Part of it is that they don't check most errors. In old APIs (Direct3D 9, OpenGL) every function call was validated internally and returned a success or failure code, while a driver crash indicated a bug in driver code. New APIs, on the other hand, rely on the developer doing the right thing. Of course, some functions still return an error code (especially ones that allocate memory or create some resource), but those that record commands into a command list just return void. If you do something illegal, you can expect undefined behavior. You can use Validation Layers / Debug Layer to do some checks, but otherwise everything may work fine on some GPUs, you may get an incorrect result, or you may experience a driver crash or timeout (called "TDR"). The good thing is that (contrary to old Windows XP) a crash inside the graphics driver doesn't cause a "blue screen of death" or machine restart. The system just restarts the graphics hardware and driver, while your program receives the DXGI_ERROR_DEVICE_REMOVED code from one of the functions like IDXGISwapChain::Present. Unfortunately, you then don't know which specific draw call or other command caused the crash.

NVIDIA proposed a solution for that: the NVIDIA Aftermath library. It lets you (among other things) record commands that write custom "marker" data to a buffer that survives a driver crash, so you can later read it and see which command was successfully executed last. Unfortunately, this library works only with NVIDIA graphics cards.

Some time ago I showed a portable solution for Vulkan in my post: "Debugging Vulkan driver crash - equivalent of NVIDIA Aftermath". Now I'd like to present a solution for Direct3D 12. It turns out that this API also provides a standardized way to achieve this, in the form of the method ID3D12GraphicsCommandList2::WriteBufferImmediate. One caveat: this new version of the interface requires:

- User to have at least Windows 10 Fall Creators Update.
- Developer to have Windows SDK in version at least for Windows 10 Fall Creators Update.
- Developer to use Visual Studio 2017 - required by this version of Windows SDK.

I created a simple library that implements all the required logic behind an easy interface, which I called D3d12AfterCrash. You can find all the details and instructions for how to use it in the file "D3d12AfterCrash.h".

I guess it would be better to allocate the buffer using the WinAPI function VirtualAlloc(NULL, bufferSize, MEM_COMMIT, PAGE_READWRITE), then call ID3D12Device3::OpenExistingHeapFromAddress and ID3D12Device::CreatePlacedResource, but my simple way of just calling ID3D12Device::CreateCommittedResource seems to work: the buffer survives a driver crash and preserves its content. I checked it on an AMD as well as an NVIDIA card.
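The marker idea itself can be illustrated without any GPU API. The following is only a CPU-side simulation under stated assumptions: it makes no D3D12 calls, it is not code from D3d12AfterCrash, and every name in it (SimulatedCommand, ExecuteWithMarkers, markerBuffer) is hypothetical. It shows why a marker written before each command identifies the command that was running when execution stopped.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// CPU-only simulation of the marker technique; no D3D12 calls are made.
// All names here are hypothetical and exist only for this illustration.
struct SimulatedCommand
{
    uint32_t marker;            // value written just before the command runs
    std::function<bool()> run;  // returns false to simulate a crash
};

// Executes commands in order. Before each one, its marker is written to
// markerBuffer, which stands in for the buffer that survives the crash.
// Returns true if all commands completed.
bool ExecuteWithMarkers(const std::vector<SimulatedCommand>& commands,
                        uint32_t& markerBuffer)
{
    for (const SimulatedCommand& cmd : commands)
    {
        markerBuffer = cmd.marker;
        if (!cmd.run())
            return false; // "device removed": remaining commands never run
    }
    return true;
}
```

After a simulated failure, the value left in markerBuffer is the marker of the command that started but did not finish, which is the information missing when only DXGI_ERROR_DEVICE_REMOVED is reported.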

#directx #graphics #libraries #productions

# Macro with current function name - __func__ vs __FUNCTION__

Tue 11 Sep 2018

Today, while programming in C++, I wanted to write an assert-like macro that would throw an exception when given condition is not satisfied. I wanted to include as much information as possible in the message string. I know that condition expression, which is argument of my macro, can be turned into a string by using # preprocessor operator.

Next, I searched for a way to also obtain name of current function. At first, I found __func__, as described here (C++11) and here (C99). Unfortunately, following code fails to compile:

#define CHECK(cond) if(!(cond)) { \
    throw std::runtime_error("ERROR: Condition " #cond " in function " __func__); }

void ProcessData()
{
    CHECK(itemCount > 0); // Compilation error!
    // (...)
}

This is because this identifier is not a string literal. It is actually an implicit local variable static const char __func__[] = "...", so it cannot be concatenated with the adjacent string literals.

Then I recalled that Visual Studio defines __FUNCTION__ macro, as custom Microsoft extension. See documentation here. This one works as I expected - it can be concatenated with other strings, because it's a string literal. Following macro definition fixes the problem:

#define CHECK(cond) if(!(cond)) \
    { throw std::runtime_error("ERROR: Condition " #cond " in function " __FUNCTION__); }

When itemCount is 0, exception is thrown and ex.what() returns following string:

ERROR: Condition itemCount > 0 in function ProcessData
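The macro above relies on __FUNCTION__ being a string literal, which holds for Visual Studio. GCC and Clang treat __FUNCTION__ like __func__ (a variable), so there the same macro fails to compile. A possible portable variant, shown here only as a sketch, builds the message at run time with std::string instead of literal concatenation. The names CHECK_PORTABLE and the itemCount parameter of ProcessData are assumptions made for this example.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical portable variant: the condition text is still concatenated
// as a literal, but the function name is appended at run time, so it works
// whether __func__ is a variable or a literal.
#define CHECK_PORTABLE(cond) \
    do { \
        if (!(cond)) { \
            throw std::runtime_error( \
                std::string("ERROR: Condition " #cond " in function ") + __func__); \
        } \
    } while (false)

// Example function; the parameter is added here only to make it testable.
int ProcessData(int itemCount)
{
    CHECK_PORTABLE(itemCount > 0);
    return itemCount;
}
```

The cost is one std::string construction on the failure path only, which is usually negligible next to throwing the exception itself.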

Well... For any experienced C++ developer, it should be no surprise that C++ standard committee comes up with solutions that are far from being useful in practice :)

#c++

# Operations on power of two numbers

Sun 09 Sep 2018

Numbers that are powers of two (i.e. 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 and so on...) are especially important in programming, due to the way computers work - they operate on binary representation. Sometimes there is a need to ensure that certain number is power of two. For example, it might be important for size and alignment of some memory blocks. This property simplifies operations on such quantities - they can be manipulated using bitwise operations instead of arithmetic ones.
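As an illustration of bitwise operations replacing arithmetic ones, here is a minimal sketch of alignment helpers. The function names are hypothetical, and all three assume that alignment is a nonzero power of two; for other values the results are meaningless.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration. Each assumes that `alignment`
// is a nonzero power of two.

// Remainder of value / alignment, computed without division.
inline uint32_t ModPow2(uint32_t value, uint32_t alignment)
{
    return value & (alignment - 1);
}

// Rounds value up to the nearest multiple of alignment.
inline uint32_t AlignUp(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Rounds value down to the nearest multiple of alignment.
inline uint32_t AlignDown(uint32_t value, uint32_t alignment)
{
    return value & ~(alignment - 1);
}
```

For example, with an alignment of 16, the value 100 has remainder 4, aligns down to 96 and up to 112.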

In this post I'd like to present efficient algorithms for 3 common operations on power-of-2 numbers, in C++. I do it just to gather them in one place, because they can be easily found in many other places all around the Internet. These operations can be implemented using other algorithms as well. The most obvious implementation would involve a loop over bits, but that would give O(n) time complexity relative to the number of bits in the operand type. The following algorithms use clever bit tricks to be more efficient. They have constant or logarithmic time and they don't use any flow control.

1. Check if a number is a power of two. Examples:

IsPow2(0)   == true (!!)
IsPow2(1)   == true
IsPow2(2)   == true
IsPow2(3)   == false
IsPow2(4)   == true
IsPow2(123) == false
IsPow2(128) == true
IsPow2(129) == false

This one I know off the top of my head. The trick here is based on an observation that a number is a power of two when its binary representation has exactly one bit set, e.g. 128 = 0b10000000. If you decrement it, all less significant bits become set: 127 = 0b01111111. Bitwise AND checks if these two numbers have no bits set in common. Note that zero also passes this test, which is why IsPow2(0) returns true in the examples above.

template <typename T> bool IsPow2(T x)
{
    return (x & (x-1)) == 0;
}
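If zero should not count as a power of two, one extra comparison is enough. This is a sketch with a hypothetical name (IsPow2NonZero), intended for unsigned integer types.

```cpp
// Hypothetical variant that rejects zero. Intended for unsigned integer
// types; the && short-circuits, so x - 1 is only evaluated for nonzero x.
template <typename T> bool IsPow2NonZero(T x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```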

2. Find the smallest power of two greater than or equal to a given number. Examples:

NextPow2(0)   == 0
NextPow2(1)   == 1
NextPow2(2)   == 2
NextPow2(3)   == 4
NextPow2(4)   == 4
NextPow2(123) == 128
NextPow2(128) == 128
NextPow2(129) == 256

This one I had in my library for a long time.

uint32_t NextPow2(uint32_t v)
{
    v--;
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16;
    v++;
    return v;
}
uint64_t NextPow2(uint64_t v)
{
    v--;
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16; v |= v >> 32;
    v++;
    return v;
}
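To see what the bit trick computes, it can be compared with a straightforward loop. The sketch below repeats the 32-bit NextPow2 from above and adds a hypothetical reference function, NextPow2Loop, written to reproduce two edge cases of the bit trick: an input of 0 gives 0, and an input above 2^31, whose next power of two does not fit in 32 bits, wraps around to 0.

```cpp
#include <cstdint>

// The 32-bit bit-trick version from the post, repeated to keep this
// example self-contained.
uint32_t NextPow2(uint32_t v)
{
    v--;
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16;
    v++;
    return v;
}

// Hypothetical loop-based reference, O(n) in the number of bits.
uint32_t NextPow2Loop(uint32_t v)
{
    if (v == 0)
        return 0;                 // matches the bit trick's result for zero
    uint32_t p = 1;
    while (p < v)
    {
        if (p == 0x80000000u)
            return 0;             // result does not fit; the bit trick wraps to 0
        p <<= 1;
    }
    return p;
}
```

Code that must handle inputs above 2^31 (or 2^63 for the 64-bit version) needs to check for the zero result explicitly.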

3. Find the largest power of two less than or equal to a given number. Examples:

PrevPow2(0)   == 0
PrevPow2(1)   == 1
PrevPow2(2)   == 2
PrevPow2(3)   == 2
PrevPow2(4)   == 4
PrevPow2(123) == 64
PrevPow2(128) == 128
PrevPow2(129) == 128

I needed this one just recently and it took me a while to find it on Google. Finally, I found it in this post on Stack Overflow.

uint32_t PrevPow2(uint32_t v)
{
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16;
    v = v ^ (v >> 1);
    return v;
}
uint64_t PrevPow2(uint64_t v)
{
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16; v |= v >> 32;
    v = v ^ (v >> 1);
    return v;
}
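PrevPow2 works in two steps, which can be made visible by splitting the 32-bit version into two helpers. The names SmearRight and KeepHighestBit are hypothetical and used only for this sketch. The first step copies the highest set bit into every lower position; the second clears everything below that highest bit.

```cpp
#include <cstdint>

// Step 1: copy the highest set bit into every lower position,
// e.g. 123 = 0b01111011 becomes 127 = 0b01111111.
inline uint32_t SmearRight(uint32_t v)
{
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8;
    v |= v >> 16;
    return v;
}

// Step 2: XOR with the value shifted right by one cancels all bits
// except the highest one, e.g. 127 ^ 63 == 64.
inline uint32_t KeepHighestBit(uint32_t smeared)
{
    return smeared ^ (smeared >> 1);
}
```

NextPow2 uses the same smearing step, but decrements first and increments afterwards, so that exact powers of two map to themselves.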

Update 2018-09-10: As I've been notified on Twitter, C++20 is also getting such functions, in the standard header <bit>.

#math #c++ #algorithms

# Iteration time is everything

Thu 06 Sep 2018

I still remember Demobit 2018 in February in Bratislava, Slovakia. During this demoscene party, one of the talks was given by Matt Swoboda "Smash", author of Notch. Notch is a program that allows creating audio-visual content, like demos or interactive visual shows accompanying concerts, in a visual way - by connecting blocks, somewhat like blueprints in Unreal Engine. (The name is not to be confused with the nickname of the author of Minecraft.) See also Number one / Another one by CNDC/Fairlight - the latest demo made in it.

During his talk, Smash referred to music production. He said that musicians couldn't imagine working without a possibility to instantly hear the effect of changes they make to their project. He said that graphics artists deserve same level of interactivity - WYSIWYG, instant feedback, without a need for a lengthy "build" or "render". That's why Notch was created. Then I thought: What about programmers? Don't they deserve it too? Shorter iteration times mean better work efficiency and higher quality of the result. Meanwhile, a programmer sometimes has to wait minutes or even hours to be able to test a change in his code, no matter how small it is. I think it's a big problem.

This is exactly what I like about development of desktop Windows applications and games: they can usually be built, run, and tested locally within a few seconds. The same applies to games made in Unity and Unreal Engine - the developer can usually hit the "Play" button and quickly test his gameplay. It is often not the case with development for smaller devices (like mobile or embedded) or larger ones (like servers/cloud).

I think that iteration time - time after which we can observe effects of our changes - is critical for developers' work efficiency, as well as their well-being. We programmers should demand better tools. All of us - including low-level C and C++ programmers. Currently we are in a good position in the job market, so we can choose companies and projects to work on. Let's use it and vote with our feet. Decision makers and architects of software/hardware platforms may think that developers are smart, so they can work efficiently even in harsh conditions. They forget that wasting developers' precious time means wasting a lot of money, not to mention their frustration. Creating better tools is an investment that will pay off.

Now, whenever I get a job offer for a developer position, I ask two simple questions:

1. What is the typical iteration time, from the moment when I change something in the code, through compilation, deployment, application launch and loading, until I can observe the effect of my change? If the answer is: "Usually it's just a matter of few seconds. Files you changed are recompiled, then launching the app takes few seconds and that's it." - that's fine. But if the answer is more like: "Well, the whole project needs to be rebuilt. You don't do it locally. You shelve your changes in Perforce so that build server picks it and makes the build. The build is then deployed to the target device, which then needs to reboot and load your app. It takes 15-20 minutes." - then it's a NOPE for me.

2. How do you debug the application? Can you make experiments by setting up breakpoints and watching variables in a convenient way? If the answer is: "Yes, we have a debugger nicely integrated with Visual Studio/WinDBG/Eclipse/other IDE and we debug whenever we see a problem." - that's fine. But when I hear: "Well, command-line GDB should work with this environment, but to be honest, it's so hard to set up that no one uses it here. We just put debug console prints in the code and recompile it whenever we want to make a debug experiment." - then that's a red light for me.

#career #tools #philosophy

# Vulkan Memory Allocator 2.1.0

Tue 28 Aug 2018

Yesterday I merged to the "master" branch the changes in the code of Vulkan Memory Allocator that I've been working on for the past few months. I consider it a major milestone, so I marked it as version 2.1.0-beta.1. There are many new features, including:

- Added linear allocation algorithm, accessible for custom pools, that can be used as free-at-once, stack, double stack, or ring buffer.
- Added feature to record sequence of calls to the library to a file and replay it using dedicated application.
- Improved support for non-coherent memory.
- Improved debug features related to detecting incorrect mapped memory usage.
- Changed format of JSON dump to include more information and allow better coloring in VmaDumpVis.

The release also includes many smaller bug fixes, improvements and additions. Everything is tested and documented. Yet I call it "beta" version, to encourage you to test it in your project and send me your feedback.

#vulkan #libraries #productions #graphics

# str_view - null-termination-aware string-view class for C++

Sun 19 Aug 2018

tl;dr I've written a small library, which I called "str_view - null-termination-aware string-view class for C++". You can find code and documentation on GitHub - sawickiap/str_view. Read on to see full story behind it...

Let me disclose my controversial beliefs: I like C++ STL. I think that any programming language needs to provide some built-in strings and containers to be called modern and suitable for developing large programs. But of course I'm aware that careless use of classes like std::list or std::map makes program very slow due to large number of dynamic allocations.

What I value the most is RAII - the concept that memory is automatically freed whenever an object referenced by value is destroyed. That's why I use std::unique_ptr all over the place in my personal code. Whenever I create and own an array, I use std::vector, but when I just pass it to some other code for reading, I pass raw pointer and number of elements - myVec.data() and myVec.size(). Similarly, whenever I own and build a string, I use std::string (or rather std::wstring - I like Unicode), but when I pass it somewhere for reading, I use raw pointer.

There are multiple ways a string can be passed. One is pointer to first character and number of characters. Another one is pointer to first character and pointer to the next after last character - a pair of iterators, also called range. These two can be trivially converted between each other. Out of these, I prefer pointer + length, because I think that number of characters is slightly more often needed than pointer past the end.

But there is another way of passing strings common in C and C++ programs - just one pointer to a string that needs to be null-terminated. I think that null-terminated strings are one of the worst and the most stupid inventions in computer science. Not only does it limit the set of characters available to be used in string content by excluding '\0', but it also makes calculation of string length an O(n) operation. It also creates opportunity for security bugs. Still we have to deal with it because that's the format that most libraries expect.

I came up with an idea for a class that would encapsulate a reference to an externally-owned, immutable string, or a piece thereof. Objects of such a class could be used to pass strings to library functions instead of e.g. a pointer to a null-terminated string or a pair of iterators. They can then be queried for length(), indexed to access individual characters etc., as well as asked for a null-terminated copy using the c_str() method - similar to std::string.

Code like this already exists, e.g. C++17 introduces class std::string_view. But my implementation has a twist that I'm quite happy with, which made me call my class "null-termination-aware". My str_view class not only remembers pointer and length of the referred string, but also the way it was created to avoid unnecessary operations and lazily evaluate those that are requested.

If it was created from a null-terminated string:
- c_str() trivially returns pointer to the original string.
- Length is unknown and it is calculated upon first call to length().
On the other hand, if it was created from a string that is not null-terminated:
- Length is explicitly known, so length() trivially returns it.
- c_str() creates a local, null-terminated copy of the string upon first call.
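
The behavior listed above can be sketched in a few lines. This is a simplified illustration written for this post's description, not the actual str_view implementation. The class name MiniStrView and its members are hypothetical, and it ignores wide characters, thread safety, and substrings.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified illustration of a null-termination-aware view.
// It is NOT the real str_view class; see the library for the actual API.
class MiniStrView
{
public:
    // Created from a null-terminated string: length is not computed yet.
    explicit MiniStrView(const char* nullTerminated)
        : m_Begin(nullTerminated), m_Length(0),
          m_LengthKnown(false), m_NullTerminated(true) {}

    // Created from pointer + length: the range may lack a terminator.
    MiniStrView(const char* begin, std::size_t length)
        : m_Begin(begin), m_Length(length),
          m_LengthKnown(true), m_NullTerminated(false) {}

    std::size_t length() const
    {
        if (!m_LengthKnown)
        {
            m_Length = std::strlen(m_Begin); // computed on first call only
            m_LengthKnown = true;
        }
        return m_Length;
    }

    const char* c_str() const
    {
        if (m_NullTerminated)
            return m_Begin; // the original string can be returned directly
        if (!m_CopyValid)
        {
            m_Copy.assign(m_Begin, m_Length); // local null-terminated copy
            m_CopyValid = true;
        }
        return m_Copy.c_str();
    }

private:
    const char* m_Begin;
    mutable std::size_t m_Length;
    mutable bool m_LengthKnown;
    bool m_NullTerminated;
    mutable std::string m_Copy;
    mutable bool m_CopyValid = false;
};
```

The mutable members are what allow length() and c_str() to stay const while caching their results on first use.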

If you consider such class useful in your C++ code, see GitHub - sawickiap/str_view project for code (it's just a single header file), documentation, and extensive set of tests. I share this code for free, on MIT license. Feel free to contact me if you find any bugs or have any suggestions regarding this library.

#productions #libraries #c++

# Porting your engine to Vulkan or DX12 - video from my talk

Fri 06 Jul 2018

Organizers of the Digital Dragons conference published a video recording of my talk "Porting your engine to Vulkan or DX12".

PowerPoint slides are also available for download here: Porting your engine to Vulkan or DX12 - GPUOpen.

#vulkan #directx #teaching #events #video

# Vulkan layers don't work? Look at registry.

Fri 06 Jul 2018

If you program using Vulkan on Windows and you see that Vulkan layers stopped working (especially after updating the graphics driver or Vulkan SDK), the first thing to try is to uninstall Vulkan SDK and install it again. Also restart your computer, to be sure that environment variables are updated. But sometimes it doesn't help or it even makes things worse.

If you are still not able to successfully find and enable VK_LAYER_LUNARG_standard_validation in your code, or you try to enable VK_LAYER_LUNARG_api_dump using an environment variable but can't see it working, or have a problem with any other layer, the first thing you can try is to issue the vulkaninfo console command. If you see some errors about "JSON", it clearly indicates that there is a problem with the configuration of Vulkan in your system, regarding paths to layer DLLs and their JSON descriptions.

Either way, the thing I recommend is to launch regedit (Registry Editor) and check values in following keys:

HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ExplicitLayers
HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ImplicitLayers

They should contain paths to valid JSON files that describe installed Vulkan layer DLLs. "ExplicitLayers" is the list of layers that can be manually enabled either programmatically via VkInstanceCreateInfo::ppEnabledLayerNames, or via the environment variable VK_INSTANCE_LAYERS. "ImplicitLayers" is the list of layers loaded automatically by each Vulkan app (e.g. one added by Steam or RenderDoc). For further details see the article: "Vulkan Validation and Debugging Layers".

It looks like the installer of Vulkan SDK can mess up these registry entries. Yesterday, after uninstalling the old SDK and installing a new one, I found entries from both versions there. Today, my colleague found these registry keys completely empty. So whenever you have problems with Vulkan layers on Windows, take a look at the registry.

Update 2018-08-27: There is Issue #38 on GitHub - Vulkan-Ecosystem about this.

#vulkan
