Virtual desktop infrastructure systems and cloud gaming are gaining popularity thanks to ever-improving internet infrastructure. This gives more flexibility to users, as software can now be provided as a service that can be used from virtually anywhere. However, users expect the same experience they are used to from working on a workstation or playing on a gaming console. This requires minimizing the latency between the server the software runs on and the client. That latency has two components: the time for communication over the internet, and the time the software needs to respond to the user's input and generate the output that is then transmitted back.
To minimize the size of the stream that has to be sent back to the user, the images can be compressed on the server into an encoded stream. Software encoding on the CPU, however, can take a long time, and it takes CPU time away from other tasks, lowering overall performance. A better solution is to use the encoding capabilities of a GPU, if the server is equipped with one. To make this process easy to implement in your own applications, AMD developed the RapidFire SDK.
The RapidFire SDK provides a software interface to capture and encode input images entirely on the GPU, then copy the encoded result into system memory where it can be used for further processing on the CPU. Remote and virtual desktop applications have to capture the entire screen, while cloud gaming applications have to encode the render target they render into directly. RapidFire covers both use cases by letting you select a display or desktop as input, or register render targets you created yourself. The captured images are then processed by the encoders integrated into RapidFire, which can produce an H.264-encoded video stream, a difference map indicating the regions of the image whose content changed, or simply the original image for further processing. The H.264 encoder offers a wide variety of parameters that can be used to configure the output stream dynamically to meet the requirements of your application.
To get a better understanding of how to use RapidFire, we will now look at examples for both use cases. In the first example we capture the entire display and the mouse cursor, then encode the captured display with the difference encoder, which represents a common scenario for virtual desktop applications. In the second example we capture a render target created by the application with DirectX® 11 and encode it with the AMF encoder, which produces an H.264 video stream. This configuration best demonstrates how to use RapidFire for cloud gaming applications.
For the sample code provided in this post, error checking was left out to make the code easier to read. However, applications using RapidFire should check the RFStatus values returned by the RapidFire function calls. More information about which values can be returned and what they mean can be found in the RapidFire documentation. In addition to the return values, a log file with further information is generated for each RapidFire session the application creates.
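In a real application, a small helper can centralize that status checking. The sketch below is only illustrative: it stands in for the SDK's RFStatus type with a plain integer and assumes RF_STATUS_OK is the success value; the wrapper pattern, not the stand-in definitions, is the point.

```cpp
#include <sstream>
#include <stdexcept>

// Stand-in for the SDK's status type. RF_STATUS_OK is assumed here to be
// the success value; everything else is treated as an error code.
using RFStatus = int;
constexpr RFStatus RF_STATUS_OK = 0;

// Throws with the call site attached if a RapidFire call did not succeed.
void rfCheck(RFStatus status, const char* call)
{
    if (status != RF_STATUS_OK)
    {
        std::ostringstream msg;
        msg << call << " failed with RFStatus " << status;
        throw std::runtime_error(msg.str());
    }
}
```

A call site would then read `rfCheck(rfCreateEncodeSession(&rfSession, sessionProperties), "rfCreateEncodeSession");`, which keeps the happy path as terse as the samples in this post.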
If we want to capture a display, we have to tell RapidFire which one to capture. To get the display IDs, the user can, for example, enumerate the displays and then pass the IDs to RapidFire. For our sample we assume the display IDs have already been queried; the MultiDesktopCapture sample in the RapidFire repository on GitHub demonstrates in more detail how this can be done.
Once we have the display IDs for the displays we want to capture, we can handle each display in a separate thread. The following sample code shows the basic setup to handle one of the displays, and also how to capture the mouse cursor. As the mouse cursor shape and position are the same for all displays, the cursor only has to be captured once, in one thread.
void MouseCursorCapturingThread(const RFEncodeSession& rfSession);

void DesktopCapturingThread(unsigned int displayId, unsigned int streamWidth, unsigned int streamHeight,
                            unsigned int diffMapBlockWidth, unsigned int diffMapBlockHeight)
{
    RFProperties sessionProperties[] = { RF_ENCODER,           RF_DIFFERENCE,
                                         RF_DESKTOP_DSP_ID,    displayId,
                                         RF_ASYNC_SOURCE_COPY, 1,
                                         RF_MOUSE_DATA,        1,
                                         0 };

    RFEncodeSession rfSession = NULL;
    rfCreateEncodeSession(&rfSession, sessionProperties);

    RFProperties encoderProperties[] = { RF_ENCODER_FORMAT,       RF_RGBA8,
                                         RF_DIFF_ENCODER_BLOCK_S, diffMapBlockWidth,
                                         RF_DIFF_ENCODER_BLOCK_T, diffMapBlockHeight,
                                         0 };

    rfCreateEncoder2(rfSession, streamWidth, streamHeight, encoderProperties);

    RFProperties diffMapWidth = 0, diffMapHeight = 0;
    rfGetEncodeParameter(rfSession, RF_ENCODER_OUTPUT_WIDTH, &diffMapWidth);
    rfGetEncodeParameter(rfSession, RF_ENCODER_OUTPUT_HEIGHT, &diffMapHeight);

    /* Set up application specific resources for handling the RapidFire outputs */

    std::thread mouseCursorCapturingThread(MouseCursorCapturingThread, rfSession);

    void*        pDiffMap        = nullptr;
    unsigned int diffMapSize     = 0;
    void*        pSourceFrame    = nullptr;
    unsigned int sourceFrameSize = 0;

    // Submit the first frame so one frame is always in flight.
    rfEncodeFrame(rfSession, 0);

    while (/* More frames to process */)
    {
        rfEncodeFrame(rfSession, 0);

        // The following two calls query the result
        // for the frame of the previous iteration.
        rfGetSourceFrame(rfSession, &sourceFrameSize, &pSourceFrame);
        rfGetEncodedFrame(rfSession, &diffMapSize, &pDiffMap);

        /* Application specific handling of RapidFire outputs */
    }

    rfReleaseEvent(rfSession, RFMouseShapeNotification);
    mouseCursorCapturingThread.join();

    rfDeleteEncodeSession(&rfSession);
}

void MouseCursorCapturingThread(const RFEncodeSession& rfSession)
{
    RFMouseData mouseData = {};
    RFStatus    rfStatus  = RF_STATUS_OK;

    while (rfStatus != RF_STATUS_MOUSEGRAB_NO_CHANGE)
    {
        rfStatus = rfGetMouseData(rfSession, 1, &mouseData);

        /* Application specific handling of mouse cursor data */
    }
}
The first thing we have to do is define the properties of the RapidFire session. This is done by creating an array of RFProperties pairs, where the first entry of each pair selects the property to set and the second entry the value to set it to. In this example we choose the RF_DIFFERENCE encoder and specify the ID of the display we want to capture. The RF_ASYNC_SOURCE_COPY property is set so that RapidFire copies the encoded results from the GPU into system memory asynchronously. This helps ensure that the results of the encoding, or the source images, can be returned as fast as possible when queried. As we also want to capture the mouse cursor later, we enable this functionality for the session by setting the RF_MOUSE_DATA property to one. The array of properties is then passed to rfCreateEncodeSession, which creates the RapidFire session we are going to use to capture the display. Note that we have to create one RapidFire session per display we want to capture.
Next we set up the encoder by filling another array of RFProperties. RF_ENCODER_FORMAT sets the input format for the encoder: the captured image is first converted into this format before it is handed to the encoder, and we can later query the source image in that format. In this example we use an uncompressed RGBA format that the difference encoder can handle. Since the difference encoder generates a difference map for the captured image, we also have to set the size of the region that each entry in the difference map represents, via the RF_DIFF_ENCODER_BLOCK_S/T properties. We can then create the encoder for the RapidFire session by calling rfCreateEncoder2 with the desired stream width, stream height and encoder properties.
After the encoder is created, we can query the encode parameters we need in order to handle the encoder output later in our application. For the difference encoder we have to query the width and the height of the difference map. As long as we do not change the encoder output dimensions, these values stay unchanged. Before starting the frame capturing loop, we should also set up all other resources the application needs to process the outputs from RapidFire.
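Those queried dimensions are what we size our own buffers against. Assuming the map has one entry per block, with partially covered blocks still getting an entry (a ceiling division), the relationship can be sanity-checked as below; the authoritative values should still come from rfGetEncodeParameter.

```cpp
// Expected difference-map extent for a given stream and block size, assuming
// one map entry per (possibly partial) block, i.e. a ceiling division.
// This is a sanity check only, not a replacement for querying the session.
unsigned int ExpectedDiffMapExtent(unsigned int streamExtent, unsigned int blockExtent)
{
    return (streamExtent + blockExtent - 1) / blockExtent;
}
```

For a 1920x1080 stream with 128x128 blocks, this predicts a 15x9 difference map, since the last row of blocks only partially covers the image.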
Before starting the frame capturing and encoding process, we create a separate thread that handles the mouse cursor capturing. This thread periodically calls rfGetMouseData in blocking mode, so each call returns only once the mouse cursor has changed, or once the call is released by the user from the thread handling the RapidFire session. A release by the user means the function returned without a cursor shape change, so once that happens we can leave the loop and terminate the thread.
We now have everything set up to start the frame capturing and encoding process. Before we enter the loop, however, we call rfEncodeFrame once. By then calling rfEncodeFrame and rfGetEncodedFrame once per iteration of the loop, we ensure that there is always one frame being processed asynchronously on the GPU while we process the output of the previous frame.
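That priming call is what keeps the two-deep pipeline full. The self-contained simulation below (no RapidFire involved) mimics the pattern with a simple FIFO standing in for the session's result queue: after priming, each iteration submits frame N and fetches frame N-1.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy stand-in for the session's internal result queue (illustration only).
struct PipelinedEncoder
{
    std::deque<std::string> inFlight;

    void submit(const std::string& frame) { inFlight.push_back(frame); } // like rfEncodeFrame

    std::string fetch()                                                  // like rfGetEncodedFrame
    {
        std::string result = inFlight.front();
        inFlight.pop_front();
        return result;
    }
};

std::vector<std::string> RunCaptureLoop(const std::vector<std::string>& frames)
{
    PipelinedEncoder         enc;
    std::vector<std::string> results;

    enc.submit(frames[0]);              // prime: one frame always in flight
    for (std::size_t i = 1; i < frames.size(); ++i)
    {
        enc.submit(frames[i]);          // kick off frame i
        results.push_back(enc.fetch()); // collect frame i - 1
    }
    results.push_back(enc.fetch());     // drain the final frame
    return results;
}
```

The drain step at the end is specific to this toy; the capture loop in the listing above runs until there are no more frames to process.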
Calling rfEncodeFrame captures the image currently shown on the display and starts the encoding process for it on the GPU. Because we set the RF_ASYNC_SOURCE_COPY property for the session, the result of the encoding and the source image are also copied into system memory asynchronously. Next we call rfGetSourceFrame, which returns a pointer to a buffer in system memory containing the image that was used as input for the encoding. It is important to call this function before rfGetEncodedFrame, because that call removes the encoded result and the source image from the queue storing the results in RapidFire. Once we have queried the source frame and the corresponding difference map, we can use them to create the data we send back to the client. To send only the regions of the captured image that changed, we read the difference map and transmit just those regions for which a 1 is stored in the map.
When we are finished with capturing the display, we have to clean up the RapidFire resources we created. First we call rfReleaseEvent with RFMouseShapeNotification, so that the blocking rfGetMouseData call inside the mouse cursor capturing thread is released, and then wait until that thread has terminated. Finally, we release all resources allocated by the RapidFire session by calling rfDeleteEncodeSession.
In the second sample we look at how to capture frames with RapidFire that have been rendered into a render target with a graphics API, and encode them with the AMF encoder. This sample uses DirectX® 11, but you can also use DirectX® 9 or OpenGL®.
void RenderTargetCapturingThread(ID3D11Device* device, ID3D11Texture2D* renderTargets[2],
                                 unsigned int renderTargetWidth, unsigned int renderTargetHeight,
                                 unsigned int streamWidth, unsigned int streamHeight)
{
    RFProperties sessionProperties[] = { RF_ENCODER,      RF_AMF,
                                         RF_D3D11_DEVICE, device,
                                         0 };

    RFEncodeSession rfSession = NULL;
    rfCreateEncodeSession(&rfSession, sessionProperties);

    rfCreateEncoder(rfSession, streamWidth, streamHeight, RF_PRESET_BALANCED);

    unsigned int rfRenderTargetIndices[] = { 0, 0 };
    rfRegisterRenderTarget(rfSession, renderTargets[0], renderTargetWidth,
                           renderTargetHeight, &rfRenderTargetIndices[0]);
    rfRegisterRenderTarget(rfSession, renderTargets[1], renderTargetWidth,
                           renderTargetHeight, &rfRenderTargetIndices[1]);

    unsigned int renderTargetIndex = 0;

    void*        pEncodedFrame    = nullptr;
    unsigned int encodedFrameSize = 0;

    /* Set up application specific resources for rendering and render first frame */

    rfEncodeFrame(rfSession, rfRenderTargetIndices[renderTargetIndex]);

    while (/* More frames to process */)
    {
        /* Render into the render target and synchronize between threads */

        // Ping-pong between the two registered render targets: encode the one
        // that was just rendered while the other one is rendered into next.
        renderTargetIndex = 1 - renderTargetIndex;
        rfEncodeFrame(rfSession, rfRenderTargetIndices[renderTargetIndex]);
        rfGetEncodedFrame(rfSession, &encodedFrameSize, &pEncodedFrame);

        /* Application specific handling of RapidFire outputs */
    }

    rfDeleteEncodeSession(&rfSession);
}
First we have to set up the RapidFire session properties again. For this sample we want to use the AMF encoder that will create an H.264 encoded stream so we are setting the RF_ENCODER
property to RF_AMF
. Additionally, we have to pass the DirectX® 11 device to the RapidFire session by setting the RF_D3D11_DEVICE
property to the pointer for the DirectX® 11 device. RapidFire also supports the DirectX® 9(Ex) API for which we would have to set the RF_D3D9(EX)_DEVICE
property to the pointer for the DirectX® 9(Ex) device. For the OpenGL® API we would have to set the RF_GL_DEVICE_CTX
property to the device context that was used to create the OpenGL® context and the RF_GL_GRAPHICS_CTX
property to the OpenGL® context.
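As a sketch, the session-property array for an OpenGL® source would follow the same pattern as the DirectX® 11 one above, using the property names just mentioned. The variable names hDC and hGLRC are hypothetical placeholders for the application's device context and OpenGL® rendering context:

```cpp
// Hypothetical session setup for an OpenGL® source; hDC and hGLRC are the
// application's device context and OpenGL® rendering context, respectively.
RFProperties sessionProperties[] = { RF_ENCODER,         RF_AMF,
                                     RF_GL_DEVICE_CTX,   hDC,
                                     RF_GL_GRAPHICS_CTX, hGLRC,
                                     0 };
rfCreateEncodeSession(&rfSession, sessionProperties);
```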
For the encoder creation we are using the RF_PRESET_BALANCED preset that is provided by RapidFire for simplicity's sake. Three different presets are available, covering different use cases, and they will get you started quickly. If you want to set the AMF encoder properties yourself, you can create an RFProperties array and set the encoding parameters directly. Further information about the encoding parameters used by the presets, and about valid configurations of the encoding parameters, can be found in the RapidFire documentation.
Before we start capturing the render targets with RapidFire, we have to register them with the RapidFire session that we created. This is done by passing the pointers or handles to the render targets, together with their dimensions, to the function rfRegisterRenderTarget. The function returns an index that the RapidFire session uses to distinguish between the different render targets that were registered.
Now we can start rendering frames into the render targets and encoding the results with RapidFire. When doing so, it is important that we pass the correct indices to the rfEncodeFrame function and synchronize the application's rendering with the calls that start encoding in RapidFire. This way we always render into one render target while encoding the other, without overlapping the two processes, which could otherwise lead to corruption in the encoded stream. In this sample we do this by setting the index to that of the render target we just rendered into, while also synchronizing the rendering calls of our application with the encoding calls in the RapidFire thread.
Furthermore, we have to process the output stream from RapidFire. Since it is stored in system memory, it can be processed entirely on the CPU without interfering with any work that is done on the GPU. This allows us to process the result for a render target while the application is already rendering into it again. What we end up with is a three-stage pipeline that runs all stages, two on the GPU and one on the CPU, in parallel.
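The double-buffered overlap can be illustrated with a small stand-alone simulation, independent of any RapidFire or GPU code: in every step one buffer is being rendered into while the other is being encoded, and the two stages must never touch the same buffer in the same step. This is a hypothetical model of the index handling, not RapidFire code.

```cpp
#include <vector>

// One step of the double-buffered pipeline: records which buffer index the
// "render" stage and the "encode" stage use in a given iteration.
struct PipelineStep
{
    int renderIndex;
    int encodeIndex;
};

// Simulate the index handling from the capture loop above: after the first
// frame has been rendered and submitted for encoding, every iteration
// renders into the free buffer while the previously filled one is encoded.
std::vector<PipelineStep> SimulatePipeline(int frameCount)
{
    std::vector<PipelineStep> steps;
    int encodeIndex = 0;   // buffer submitted to the encoder (first frame)
    for (int frame = 1; frame < frameCount; ++frame)
    {
        int renderIndex = 1 - encodeIndex;      // render into the free buffer
        steps.push_back({ renderIndex, encodeIndex });
        encodeIndex = renderIndex;              // submit the fresh buffer next
    }
    return steps;
}
```

Walking the returned steps shows the two indices strictly alternating, which is exactly the property that keeps rendering and encoding from overlapping on one buffer.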
As both use case examples show, the work involved in getting started with RapidFire is fairly low. Of course, we have only covered the basics in this post, and in both cases further optimizations are possible, for example by starting the encoding in a separate thread from the one that queries and processes the output from RapidFire. This approach removes the dependency between these two processes and can lead to an even lower latency. If you want to find out more about how you can use RapidFire for different scenarios, or want more details on how to use it in your implementation, check out the sample projects that are available alongside the documentation in the RapidFire repository on GitHub.