Budgeting, measuring and debugging video memory usage is essential for the successful release of game titles on Windows. As a developer, this can be efficiently achieved with the help of Microsoft®’s Windows® Performance Analyzer (WPA) tool and with a general understanding of how video resources are managed by the operating system.
Windows Performance Analyzer is part of the Windows Performance Toolkit, which is part of the Windows 10 SDK. When installing the Windows 10 SDK, make sure the Windows Performance Toolkit box is checked. The typical install path is C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\. That folder contains a perfcore.ini file, which needs to be edited to enable the GPU Segment Usage graph, the one needed for video memory profiling. Perfcore.ini contains a list of .dll files, and perf_dx.dll needs to be added to that list.
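The perfcore.ini edit can also be scripted. The following is a minimal sketch, not an official procedure: it assumes the file lists one .dll name per line (the exact layout is not described here, so check the file first), the helper name ensure_dll_listed is hypothetical, and writing under Program Files typically needs an elevated prompt.

```python
from pathlib import Path


def ensure_dll_listed(ini_path, dll_name="perf_dx.dll"):
    """Append dll_name to the file unless a line already names it.

    Assumption: the file holds one DLL name per line.
    Returns True if the file was modified, False otherwise.
    """
    path = Path(ini_path)
    text = path.read_text()
    already_listed = any(
        line.strip().lower() == dll_name.lower() for line in text.splitlines()
    )
    if already_listed:
        return False
    # Make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(text + separator + dll_name + "\n")
    return True
```

Calling the function a second time leaves the file unchanged, so it is safe to re-run after an SDK update.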
By default, a log.cmd script for triggering captures is included with the installation. When started, it captures activity for the entire machine, so all processes are included. Administrator privileges are required to run this script. As an example, typical steps are:
Profiling outputs a large amount of data, so it's best to have WPA installed on an SSD (since that's also the output folder). It also helps to limit capture time so that the Merged.etl files stay under 1GB. The Merged.etl files can be opened for analysis with either WPA or GPUView (or both at the same time).
WPA is the default app associated with the .etl file extension, so double clicking a Merged.etl capture in a file browser is enough to open it. Once the capture finishes loading, there should be a Graph Explorer tab on the left of the WPA window, containing categories of graphs. If Perfcore.ini has been set up correctly, there will be a GPU Segment Usage graph under the Video drop down. Double click on it to open an Analysis Window, which can be maximized.
The default layout of the ‘GPU Segment Usage’ chart.
Under the graph there is a table containing video allocations. Each table line corresponds to either a video allocation or a dropdown under which allocations are grouped. Each table column corresponds to an allocation attribute. The columns are split into areas separated by colored table lines.
To the left of the yellow line are the attributes by which the allocations are grouped, in order of grouping from left to right. By default, allocations are grouped first by Segment Type, second by Adapter and third by Segment Id.
To the right of the blue line are the size and legend color columns. The size column shows individual allocation sizes for allocation lines and the sum of all allocations belonging to a dropdown for the dropdown lines. The legend column displays the associated graph color of a given grouping in a solid box. If only the box outline is displayed, that means that particular grouping isn’t plotted on the graph. Clicking on the color box under the legend column will toggle the graph visibility for individual groupings.
In between the yellow and blue lines are the regular data columns, displaying the attributes of individual allocations. These columns display no information for allocation groupings.
The default grouping is rarely the most useful one, but it is very easy to change. Typically a more useful grouping would be to group first by Adapter, second by Segment Type and third by Process. This can be done by dragging the columns into their proper place with the mouse. The Segment Type column can be removed by right clicking on the table header and unchecking the appropriate box.
Since the groupings are so easy to reconfigure, it helps to do so often when analyzing, in order to navigate the table easily. Another useful tip is Filter To Selection. For example, if it is apparent that processes other than the profiled app contribute insignificantly to GPU segment usage, change the groupings to Adapter, Process and Segment Id, expand the Adapter dropdown for the desired adapter, then right click on the target process and select Filter To Selection. This will hide every allocation that is not associated with that particular adapter and process.
Once groupings are reconfigured, the Legend column typically needs to be set up again before the graphs display anything meaningful.
In this Civilization VI capture, customized grouping allows listing all the evicted resources belonging to the target app. There is an individual allocation selected (blue background line) and WPA conveniently highlights the allocation’s lifetime on the graph’s timeline.
For AMD, there are three GPU segments:
In addition, WPA lists evicted resources under segment ID -1. One reason resources get evicted is that they need to be in certain segments, but there is no more space left in those segments. For instance, various restrictions could require some resources to be in Local Visible when used. If too many such resources are competing for space, the video memory manager will evict some of them to free up space. This can result in thrashing patterns as the evicted resources need to be moved back in. Resources getting copied around between segments manifest as stutters and spikes in your app.
The evicted segment is the only one that is identifiable just by segment ID, as it’s always -1. The actual GPU segments are not easy to match to segment IDs, so the first step when analyzing GPU segment usage with WPA is always to identify which segment ID in a profile matches which GPU segment.
The total size of all allocations on a particular segment at a single moment in time, particularly at the peak of usage, is the key piece of information needed to quickly identify the segments. However, the Size column displays the sum of all allocations across the entire time interval visible in the graph (which by default is the life of the entire profile), so that figure is not useful for understanding peak capacity. The graph holds the information of interest, as it charts the total allocation size across time. So to begin, for each segment ID, toggle the Legend column colors so that only one segment is charted at a time, and read the size information for that segment from the graph. If a segment sees a large variation in total capacity used, it’s the peak usage that is most interesting for identifying the segment.
Only the ‘Local Invisible’ segment is charted for this profile of a Civilization VI loading screen, by configuring the ‘Legend’ column. The segment usage increases monotonically as more graphics resources are loaded, so the usage peak can be found at the end of the timeline.
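The difference between the Size column (a sum over the whole visible interval) and the peak read from the graph can be illustrated with a small sketch. It assumes the allocations of one segment have been exported as (start time, end time, size) tuples; the function name and the sample numbers are made up for illustration.

```python
def peak_usage(allocations):
    """Return (peak_bytes, time_of_peak) for one segment.

    allocations: iterable of (start_time, end_time, size_bytes) tuples.
    """
    events = []
    for start, end, size in allocations:
        events.append((start, size))   # allocation becomes resident
        events.append((end, -size))    # allocation leaves the segment
    # Sort by time; at equal times, frees (negative deltas) come first.
    events.sort(key=lambda event: (event[0], event[1]))

    current = 0
    peak = 0
    peak_time = None
    for time, delta in events:
        current += delta
        if current > peak:
            peak, peak_time = current, time
    return peak, peak_time


# Illustrative data only: three allocations with overlapping lifetimes.
sample = [(0, 10, 100), (2, 5, 50), (5, 8, 70)]
print(sum(size for _, _, size in sample))  # what a plain sum would report
print(peak_usage(sample))                  # what the graph peak corresponds to
```

In this sample the plain sum is larger than the peak, because the second allocation is freed before the third one is created.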
Rough steps for identifying the segments are:
Once the segments have been identified, the obvious thing to determine is whether any oversubscription is present. Oversubscription in either Local Visible or Local Invisible has two negative effects: resources can end up in a suboptimal segment (for example, a texture could end up being used from System instead of a local segment, causing performance issues), and thrashing can cause stutters and spikes as the resources get copied around.
To check for stutters due to oversubscription, look at the Evicted segment and sort the allocations by the Start Time column. A number of allocations should already have been evicted when profiling began, so the first lines in the table should display 0 as the Start Time. All the other evicted allocations were evicted during the profiling itself. Evaluate the size and frequency of these evictions to determine if they could be an issue. If these evictions are large or happen often, look at the other segments and match the Start Time in the Evicted segment with an End Time in a source segment to find out where those resources were evicted from. Some resources can start as evicted and then get transferred into a segment, completely unrelated to oversubscription. These typically reside in Evicted for very brief periods of time, so they can be recognized by a very small delta between End Time and Start Time (roughly around 0.001ms).
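On exported data, the same triage can be sketched in a few lines. This is only an illustration under stated assumptions: the rows are dictionaries with hypothetical keys start_ms, end_ms and size, the 0.01ms threshold is an arbitrary value chosen to sit a little above the roughly 0.001ms mentioned above, and matching on size in addition to time is an extra heuristic rather than something WPA does.

```python
TRANSIENT_MS = 0.01  # assumed cutoff, slightly above the ~0.001ms residency


def classify_eviction(row, transient_ms=TRANSIENT_MS):
    """Label one row of the Evicted segment."""
    if row["start_ms"] == 0:
        return "evicted_before_capture"
    if row["end_ms"] - row["start_ms"] <= transient_ms:
        return "transient"  # starts evicted, then moves straight into a segment
    return "evicted_during_capture"


def find_sources(eviction, other_segment_rows, tolerance_ms=TRANSIENT_MS):
    """Rows from other segments whose End Time lines up with this eviction."""
    return [
        row
        for row in other_segment_rows
        if abs(row["end_ms"] - eviction["start_ms"]) <= tolerance_ms
        and row["size"] == eviction["size"]
    ]


# Illustrative data only.
evicted = [
    {"start_ms": 0.0, "end_ms": 500.0, "size": 4096},
    {"start_ms": 120.0, "end_ms": 120.001, "size": 65536},
    {"start_ms": 300.0, "end_ms": 900.0, "size": 1048576},
]
local = [{"segment": 1, "start_ms": 10.0, "end_ms": 300.0, "size": 1048576}]

for row in evicted:
    print(classify_eviction(row), find_sources(row, local))
```

Only the rows labelled evicted_during_capture are candidates for oversubscription stutter; the others can usually be ignored.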
To check for resources ending up in suboptimal segments (like a resource that would be preferred for Local Invisible ending up in System), check if the peak usage of either of the local segments is close to capacity. If so, for Local Invisible the easiest way to confirm oversubscription is to install a GPU with more VRAM than your app budget. For example, if you’re worried about oversubscription on a 4GB Fury X, profile on an 8GB Radeon RX 480 instead. Compare the peak usage of the System and Local Visible segments across the two cards. If the 4GB card uses more System and/or Local Visible (and is also close to capacity on Local Invisible), Local Invisible is oversubscribed on the 4GB card. If a card with more VRAM is not available, or if it’s Local Visible that’s close to capacity, compare the target segment’s graph with the other segments and look for correlated usage changes between segments (try to determine whether resources from the target segment are being moved to other segments or evicted).
Finally, check the contents of the System and Evicted segments. If total usage of these segments is large, that alone is cause for concern, since it implies oversubscription, leaked resources, rarely used resources or other issues with the way video memory is managed. If usage is large, the resources in those segments should be identified.
The common remedy for video memory oversubscription and thrashing is to optimize the app to use less video memory. For this to be possible, it is necessary to be able to break down video memory usage into categories for budgeting, and also to be able to identify the WPA allocations (understand what app resources they are associated with).
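The two-card comparison boils down to a simple rule, sketched below. The dictionaries of peak bytes per segment, the capacity argument and the 90% "close to capacity" threshold are all assumptions made for the example; pick values that match your own profiles.

```python
GIB = 1024 ** 3


def local_invisible_oversubscribed(small_card, large_card, small_capacity,
                                   near_capacity=0.9):
    """Compare peak usage per segment between two cards.

    small_card / large_card: dicts mapping segment name to peak bytes.
    small_capacity: Local Invisible capacity of the smaller card, in bytes.
    near_capacity: assumed fraction that counts as "close to capacity".
    """
    close_to_full = small_card["Local Invisible"] >= near_capacity * small_capacity
    spills_over = (
        small_card["System"] > large_card["System"]
        or small_card["Local Visible"] > large_card["Local Visible"]
    )
    return close_to_full and spills_over


# Illustrative peaks only, not measurements.
card_4gb = {"Local Invisible": int(3.7 * GIB), "Local Visible": 200 << 20, "System": 900 << 20}
card_8gb = {"Local Invisible": int(4.3 * GIB), "Local Visible": 200 << 20, "System": 300 << 20}
print(local_invisible_oversubscribed(card_4gb, card_8gb, small_capacity=4 * GIB))
```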
For identifying categories of resources, exporting all the allocations to a spreadsheet editor helps. In order to do so, group all the allocations needing export into a single dropdown in WPA. It’s generally useful to export all the allocations belonging to a process, so in WPA the columns can be configured such that only the Process column is left of the yellow line. To actually export, select all the allocations under the target process (shift-click), right click on them and select Copy Selection. The cells can then be pasted in a spreadsheet editor.
To get a good overview and rough breakdown of resources, create a pivot table that groups resources first by Height, then by Width. Have the pivot table count the resources in each category and also sum the resource sizes. This pivot table can be sorted by the sum of resource sizes to quickly show which resource categories are the most significant.
This pivot table buckets all of the app’s graphics resources in ‘Local Invisible’ by dimension. The largest total usage in this case is caused by resources of the same dimensions as the screen resolution; this could be optimized by using fewer than the 6 screen-sized RTs, if possible. Non-power-of-two texture usage also indicates that at least 100MB can be easily saved by aligning those textures to power-of-two sizes.
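For readers who prefer a script to a spreadsheet, the same pivot can be sketched as follows. The input format (a list of dictionaries with height, width and size keys) is an assumption about how the copied rows were parsed, and the sample values are invented.

```python
from collections import defaultdict


def pivot_by_dimensions(rows):
    """Group resources by (height, width); return rows sorted by total size.

    Each result row is (height, width, resource_count, total_bytes).
    """
    buckets = defaultdict(lambda: [0, 0])  # key -> [count, total bytes]
    for row in rows:
        key = (row["height"], row["width"])
        buckets[key][0] += 1
        buckets[key][1] += row["size"]
    table = [(h, w, count, total) for (h, w), (count, total) in buckets.items()]
    table.sort(key=lambda entry: entry[3], reverse=True)
    return table


# Illustrative rows only.
rows = [
    {"height": 1080, "width": 1920, "size": 8_294_400},
    {"height": 1080, "width": 1920, "size": 8_294_400},
    {"height": 512, "width": 512, "size": 1_048_576},
    {"height": 300, "width": 500, "size": 786_432},
]
for height, width, count, total in pivot_by_dimensions(rows):
    print(f"{height:>5} x {width:<5} count={count:<3} total={total / 2**20:.1f} MiB")
```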
Chances are, the most relevant categories are:
To better understand video memory usage, it’s useful to create additional pivot tables or use additional filtering (by segment, format, flags, etc.). One useful example is a pivot table of only small resources. For DirectX 11, these can be all resources smaller than the suballocation size (usually 32KB). For DirectX 12, an arbitrary size can be chosen (or multiple sizes for additional pivot tables). Small resources should be insignificant in total size in DirectX 11, since most of them should be automatically suballocated by the driver. For DirectX 12 it is up to the app to suballocate most small resources, so this is a good way to check that the profiled allocations match expected app behavior.
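A small-resource check along these lines could look like the sketch below. The 32KB default mirrors the usual DirectX 11 suballocation size mentioned above; the row format and function name are assumptions for illustration.

```python
SUBALLOCATION_BYTES = 32 * 1024  # usual DirectX 11 suballocation size


def small_resource_summary(rows, threshold=SUBALLOCATION_BYTES):
    """Return (count, total_bytes, share_of_all_bytes) for small resources."""
    small = [row for row in rows if row["size"] < threshold]
    small_total = sum(row["size"] for row in small)
    total = sum(row["size"] for row in rows)
    share = small_total / total if total else 0.0
    return len(small), small_total, share


# Illustrative sizes only.
rows = [{"size": 4096}, {"size": 16384}, {"size": 32768}, {"size": 8388608}]
count, small_total, share = small_resource_summary(rows)
print(f"{count} small resources, {small_total} bytes, {share:.2%} of the total")
```

If the share is not close to zero in a DirectX 12 title, that suggests small resources are being created as individual allocations rather than suballocated by the app.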
Textures other than non-power-of-two ones may incur padding as well, especially on higher end video cards where the wide memory buses require larger alignment to deliver maximum performance. This extra padding is one of the largest differences between typical discrete GPUs and integrated or other unified memory solutions, where the bus is not so wide. Keep an eye out for allocation sizes that are significantly larger than the image data. The spreadsheet table can be set up to calculate the expected size and compare this with the actual size. If there is significant waste, options for reducing this overhead include combining small textures into larger atlases and adjusting the bind flags or texture format to give the driver options to switch to smaller padding sizes (for example, a rendertarget or depth buffer is more likely to incur padding overhead than a texture). Formats with smaller texel sizes generally require less padding than those with larger sizes, with the exception of the BC formats (4bpp BC formats are equivalent to 64-bit texels, and 8bpp BC formats are equivalent to 128-bit texels).
WPA has many features and use cases, which can make the tool difficult to approach. This post provides an easier introduction to this essential tool, allowing readers to build on this knowledge and explore further on their own. I hope this has improved the understanding of both the hardware and driver model and that this knowledge will be put to good use by developers in delivering high performance graphics.
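The expected-versus-actual comparison can be sketched outside the spreadsheet too. The example below only covers a single mip level of a 2D texture, and the two small format tables are an illustrative subset rather than a complete list; it treats BC formats as 4x4 blocks of 8 or 16 bytes, which matches the 64-bit and 128-bit equivalence described above.

```python
# Illustrative subset of formats; extend as needed.
BYTES_PER_TEXEL = {"R8G8B8A8_UNORM": 4, "R16G16B16A16_FLOAT": 8}
BYTES_PER_BLOCK = {"BC1_UNORM": 8, "BC7_UNORM": 16}  # one block = 4x4 texels


def expected_bytes(width, height, fmt):
    """Unpadded size of one mip level of a 2D texture."""
    if fmt in BYTES_PER_BLOCK:
        blocks_wide = (width + 3) // 4   # round up to whole blocks
        blocks_high = (height + 3) // 4
        return blocks_wide * blocks_high * BYTES_PER_BLOCK[fmt]
    return width * height * BYTES_PER_TEXEL[fmt]


def padding_overhead(actual_bytes, width, height, fmt):
    """Fraction by which the allocation exceeds the raw image data."""
    expected = expected_bytes(width, height, fmt)
    return (actual_bytes - expected) / expected


# Illustrative allocation size only: a 9 MiB allocation for a 1080p RGBA8 target.
overhead = padding_overhead(9 * 1024 * 1024, 1920, 1080, "R8G8B8A8_UNORM")
print(f"padding overhead: {overhead:.1%}")
```

Resources with mip chains, arrays or MSAA need the expected size extended accordingly before the comparison is meaningful.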