Virtual desktop infrastructure systems and cloud gaming are gaining popularity thanks to ever-improving internet infrastructure. This gives more flexibility to users, as software can now be provided as a service that can be used from virtually anywhere. However, users expect the same experience they are used to from working on a workstation or playing on a gaming console. This requires minimizing the latency between the server the software runs on and the client. That latency has two components: the time for communication over the internet, and the time the software needs to respond to the user's input and generate the output that is then transmitted back.
To minimize the size of the stream that has to be sent back to the user, the images can be compressed on the server into an encoded stream. Software encoding on the CPU, however, can take a long time, and it takes CPU time away from other tasks, lowering overall performance. A better solution is to use the encoding capabilities of a GPU, if the server is equipped with one. To make this process easy to implement in your own applications, AMD developed the RapidFire SDK.
The RapidFire SDK provides a software interface to capture and encode input images entirely on the GPU, then copy the encoded result into system memory where it can be used for further processing on the CPU. Remote and virtual desktop applications have to capture the entire screen, while cloud gaming applications have to encode the render target they render into directly. RapidFire covers both use cases by letting you select a display or desktop as input, or register render targets you created yourself. The captured images are then processed by the encoders integrated into RapidFire, which can produce an H.264-encoded video stream, a difference map indicating the regions of the image whose content changed, or simply the original image for further processing. The H.264 encoder offers a wide variety of parameters that can be used to configure the output stream dynamically to meet the requirements of your application.
To get a better understanding of how to use RapidFire, we will now look at examples for both use cases. In the first example we capture the entire display and the mouse cursor, then encode the captured display with the difference encoder, which represents a common scenario for virtual desktop applications. In the second example we capture a render target created by the application with DirectX® 11 and encode it with the AMF encoder, which produces an H.264 video stream. This configuration best demonstrates how to use RapidFire for cloud gaming applications.
For the sample code provided in this post, error checking was left out to make the code easier to read. However, applications using RapidFire should check the RFStatus values returned by the RapidFire function calls. More information about which values can be returned and what they mean can be found in the RapidFire documentation. In addition to the return values, a log file with further information is generated for each RapidFire session the application creates.
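In a real application, a small helper can centralize that status checking. The sketch below is only illustrative: it stands in for the SDK's RFStatus type with a plain integer and assumes RF_STATUS_OK is the success value; the wrapper pattern, not the stand-in definitions, is the point.

```cpp
#include <sstream>
#include <stdexcept>

// Stand-in for the SDK's status type. RF_STATUS_OK is assumed here to be
// the success value; everything else is treated as an error code.
using RFStatus = int;
constexpr RFStatus RF_STATUS_OK = 0;

// Throws with the call site attached if a RapidFire call did not succeed.
void rfCheck(RFStatus status, const char* call)
{
    if (status != RF_STATUS_OK)
    {
        std::ostringstream msg;
        msg << call << " failed with RFStatus " << status;
        throw std::runtime_error(msg.str());
    }
}
```

A call site would then read `rfCheck(rfCreateEncodeSession(&rfSession, sessionProperties), "rfCreateEncodeSession");`, which keeps the happy path as terse as the samples in this post.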
If we want to capture a display, we have to tell RapidFire which one to capture. To get the display IDs, the user can, for example, enumerate the displays and then pass the IDs to RapidFire. For our sample we assume the display IDs have already been queried; the MultiDesktopCapture sample in the RapidFire repository on GitHub demonstrates in more detail how this can be done.
Once we have the display IDs for the displays we want to capture, we can handle each display in a separate thread. The following sample code shows the basic setup to handle one of the displays, and also how to capture the mouse cursor. As the mouse cursor shape and position are the same for all displays, the cursor only has to be captured once, in one thread.
void MouseCursorCapturingThread(const RFEncodeSession& rfSession);

void DesktopCapturingThread(unsigned int displayId, unsigned int streamWidth, unsigned int streamHeight,
                            unsigned int diffMapBlockWidth, unsigned int diffMapBlockHeight)
{
    RFProperties sessionProperties[] = { RF_ENCODER,           RF_DIFFERENCE,
                                         RF_DESKTOP_DSP_ID,    displayId,
                                         RF_ASYNC_SOURCE_COPY, 1,
                                         RF_MOUSE_DATA,        1,
                                         0 };

    RFEncodeSession rfSession = NULL;
    rfCreateEncodeSession(&rfSession, sessionProperties);

    RFProperties encoderProperties[] = { RF_ENCODER_FORMAT,       RF_RGBA8,
                                         RF_DIFF_ENCODER_BLOCK_S, diffMapBlockWidth,
                                         RF_DIFF_ENCODER_BLOCK_T, diffMapBlockHeight,
                                         0 };

    rfCreateEncoder2(rfSession, streamWidth, streamHeight, encoderProperties);

    RFProperties diffMapWidth = 0, diffMapHeight = 0;
    rfGetEncodeParameter(rfSession, RF_ENCODER_OUTPUT_WIDTH, &diffMapWidth);
    rfGetEncodeParameter(rfSession, RF_ENCODER_OUTPUT_HEIGHT, &diffMapHeight);

    /* Set up application specific resources for handling the RapidFire outputs */

    std::thread mouseCursorCapturingThread(MouseCursorCapturingThread, rfSession);

    void*        pDiffMap        = nullptr;
    unsigned int diffMapSize     = 0;
    void*        pSourceFrame    = nullptr;
    unsigned int sourceFrameSize = 0;

    // Submit the first frame so one frame is always in flight.
    rfEncodeFrame(rfSession, 0);

    while (/* More frames to process */)
    {
        rfEncodeFrame(rfSession, 0);

        // The following two calls query the result
        // for the frame of the previous iteration.
        rfGetSourceFrame(rfSession, &sourceFrameSize, &pSourceFrame);
        rfGetEncodedFrame(rfSession, &diffMapSize, &pDiffMap);

        /* Application specific handling of RapidFire outputs */
    }

    rfReleaseEvent(rfSession, RFMouseShapeNotification);
    mouseCursorCapturingThread.join();

    rfDeleteEncodeSession(&rfSession);
}

void MouseCursorCapturingThread(const RFEncodeSession& rfSession)
{
    RFMouseData mouseData = {};
    RFStatus    rfStatus  = RF_STATUS_OK;

    while (rfStatus != RF_STATUS_MOUSEGRAB_NO_CHANGE)
    {
        rfStatus = rfGetMouseData(rfSession, 1, &mouseData);

        /* Application specific handling of mouse cursor data */
    }
}
The first thing we have to do is define the properties of the RapidFire session. This is done by creating an array of RFProperties pairs, where the first entry of each pair selects the property to set and the second entry the value to set it to. In this example we choose the RF_DIFFERENCE encoder and specify the ID of the display we want to capture. The RF_ASYNC_SOURCE_COPY property is set so that RapidFire copies the encoded results from the GPU into system memory asynchronously. This helps ensure that the results of the encoding, or the source images, can be returned as fast as possible when queried. As we also want to capture the mouse cursor later, we enable this functionality for the session by setting the RF_MOUSE_DATA property to one. The array of properties is then passed to rfCreateEncodeSession, which creates the RapidFire session we are going to use to capture the display. Note that we have to create one RapidFire session per display we want to capture.
Next we set up the encoder by filling another array of RFProperties. RF_ENCODER_FORMAT sets the input format for the encoder: the captured image is first converted into this format before it is handed to the encoder, and we can later query the source image in that format. In this example we use an uncompressed RGBA format that the difference encoder can handle. Since the difference encoder generates a difference map for the captured image, we also have to set the size of the region that each entry in the difference map represents, via the RF_DIFF_ENCODER_BLOCK_S/T properties. We can then create the encoder for the RapidFire session by calling rfCreateEncoder2 with the desired stream width, stream height and encoder properties.
After the encoder is created, we can query the encode parameters we need in order to handle the encoder output later in our application. For the difference encoder we have to query the width and the height of the difference map. As long as we do not change the encoder output dimensions, these values stay unchanged. Before starting the frame capturing loop, we should also set up all other resources the application needs to process the outputs from RapidFire.
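Those queried dimensions are what we size our own buffers against. Assuming the map has one entry per block, with partially covered blocks still getting an entry (a ceiling division), the relationship can be sanity-checked as below; the authoritative values should still come from rfGetEncodeParameter.

```cpp
// Expected difference-map extent for a given stream and block size, assuming
// one map entry per (possibly partial) block, i.e. a ceiling division.
// This is a sanity check only, not a replacement for querying the session.
unsigned int ExpectedDiffMapExtent(unsigned int streamExtent, unsigned int blockExtent)
{
    return (streamExtent + blockExtent - 1) / blockExtent;
}
```

For a 1920x1080 stream with 128x128 blocks, this predicts a 15x9 difference map, since the last row of blocks only partially covers the image.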
Before starting the frame capturing and encoding process, we create a separate thread that handles the mouse cursor capturing. This thread periodically calls rfGetMouseData in blocking mode, so each call returns only once the mouse cursor has changed, or once the call is released by the user from the thread handling the RapidFire session. A release by the user means the function returned without a cursor shape change, so once that happens we can leave the loop and terminate the thread.
We now have everything set up to start the frame capturing and encoding process. Before we enter the loop, however, we call rfEncodeFrame once. By then calling rfEncodeFrame and rfGetEncodedFrame once per iteration of the loop, we ensure that there is always one frame being processed asynchronously on the GPU while we process the output of the previous frame.
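That priming call is what keeps the two-deep pipeline full. The self-contained simulation below (no RapidFire involved) mimics the pattern with a simple FIFO standing in for the session's result queue: after priming, each iteration submits frame N and fetches frame N-1.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy stand-in for the session's internal result queue (illustration only).
struct PipelinedEncoder
{
    std::deque<std::string> inFlight;

    void submit(const std::string& frame) { inFlight.push_back(frame); } // like rfEncodeFrame

    std::string fetch()                                                  // like rfGetEncodedFrame
    {
        std::string result = inFlight.front();
        inFlight.pop_front();
        return result;
    }
};

std::vector<std::string> RunCaptureLoop(const std::vector<std::string>& frames)
{
    PipelinedEncoder         enc;
    std::vector<std::string> results;

    enc.submit(frames[0]);              // prime: one frame always in flight
    for (std::size_t i = 1; i < frames.size(); ++i)
    {
        enc.submit(frames[i]);          // kick off frame i
        results.push_back(enc.fetch()); // collect frame i - 1
    }
    results.push_back(enc.fetch());     // drain the final frame
    return results;
}
```

The drain step at the end is specific to this toy; the capture loop in the listing above runs until there are no more frames to process.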
Calling rfEncodeFrame captures the image currently shown on the display and starts the encoding process for it on the GPU. Because we set the RF_ASYNC_SOURCE_COPY property for the session, the result of the encoding and the source image are also copied into system memory asynchronously. Next we call rfGetSourceFrame, which returns a pointer to a buffer in system memory containing the image that was used as input for the encoding. It is important to call this function before rfGetEncodedFrame, because that call removes the encoded result and the source image from the queue storing the results in RapidFire. Once we have queried the source frame and the corresponding difference map, we can use them to create the data we send back to the client. To send only the regions of the captured image that changed, we read the difference map and transmit just those regions for which a 1 is stored in the map.
When we are finished with capturing the display, we have to clean up the RapidFire resources we created. First we call rfReleaseEvent with RFMouseShapeNotification, so that the blocking rfGetMouseData call inside the mouse cursor capturing thread is released, and then wait until that thread has terminated. Finally, we release all resources allocated by the RapidFire session by calling rfDeleteEncodeSession.
In the second sample we look at how to capture frames with RapidFire that have been rendered into a render target with a graphics API, and encode them with the AMF encoder. This sample uses DirectX® 11, but you can also use DirectX® 9 or OpenGL®.
void RenderTargetCapturingThread(ID3D11Device* device, ID3D11Texture2D* renderTargets[2],
                                 unsigned int renderTargetWidth, unsigned int renderTargetHeight,
                                 unsigned int streamWidth, unsigned int streamHeight)
{
    RFProperties sessionProperties[] = { RF_ENCODER,      RF_AMF,
                                         RF_D3D11_DEVICE, device,
                                         0 };

    RFEncodeSession rfSession = NULL;
    rfCreateEncodeSession(&rfSession, sessionProperties);

    rfCreateEncoder(rfSession, streamWidth, streamHeight, RF_PRESET_BALANCED);

    unsigned int rfRenderTargetIndices[] = { 0, 0 };
    rfRegisterRenderTarget(rfSession, renderTargets[0], renderTargetWidth,
                           renderTargetHeight, &rfRenderTargetIndices[0]);
    rfRegisterRenderTarget(rfSession, renderTargets[1], renderTargetWidth,
                           renderTargetHeight, &rfRenderTargetIndices[1]);

    unsigned int renderTargetIndex = 0;

    void*        pEncodedFrame    = nullptr;
    unsigned int encodedFrameSize = 0;

    /* Set up application specific resources for rendering and render first frame */

    rfEncodeFrame(rfSession, rfRenderTargetIndices[renderTargetIndex]);

    while (/* More frames to process */)
    {
        /* Render into the render target and synchronize between threads */

        // Ping-pong between the two registered render targets: encode the one
        // that was just rendered while the other one is rendered into next.
        renderTargetIndex = 1 - renderTargetIndex;
        rfEncodeFrame(rfSession, rfRenderTargetIndices[renderTargetIndex]);
        rfGetEncodedFrame(rfSession, &encodedFrameSize, &pEncodedFrame);

        /* Application specific handling of RapidFire outputs */
    }

    rfDeleteEncodeSession(&rfSession);
}
First we have to set up the RapidFire session properties again. For this sample we want to use the AMF encoder that will create an H.264 encoded stream so we are setting the RF_ENCODER
property to RF_AMF
. Additionally, we have to pass the DirectX® 11 device to the RapidFire session by setting the RF_D3D11_DEVICE
property to the pointer for the DirectX® 11 device. RapidFire also supports the DirectX® 9(Ex) API for which we would have to set the RF_D3D9(EX)_DEVICE
property to the pointer for the DirectX® 9(Ex) device. For the OpenGL® API we would have to set the RF_GL_DEVICE_CTX
property to the device context that was used to create the OpenGL® context and the RF_GL_GRAPHICS_CTX
property to the OpenGL® context.
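As a sketch, the session-property array for an OpenGL® source would follow the same pattern as the DirectX® 11 one above, using the property names just mentioned. The variable names hDC and hGLRC are hypothetical placeholders for the application's device context and OpenGL® rendering context:

```cpp
// Hypothetical session setup for an OpenGL® source; hDC and hGLRC are the
// application's device context and OpenGL® rendering context, respectively.
RFProperties sessionProperties[] = { RF_ENCODER,         RF_AMF,
                                     RF_GL_DEVICE_CTX,   hDC,
                                     RF_GL_GRAPHICS_CTX, hGLRC,
                                     0 };
rfCreateEncodeSession(&rfSession, sessionProperties);
```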
For the encoder creation we are using the RF_PRESET_BALANCED preset that is provided by RapidFire for simplicity's sake. Three different presets are available, covering different use cases, and they will get you started quickly. If you want to set the AMF encoder properties yourself, you can create an RFProperties array and set the encoding parameters directly. Further information about the encoding parameters used by the presets, and about valid configurations of the encoding parameters, can be found in the RapidFire documentation.
Before we start capturing the render targets with RapidFire, we have to register them with the RapidFire session that we created. This is done by passing the pointers or handles to the render targets, together with their dimensions, to the function rfRegisterRenderTarget. The function returns an index that the RapidFire session uses to distinguish between the different render targets that were registered.
Now we can start rendering frames into the render targets and encoding the results with RapidFire. When doing so, it is important that we pass the correct indices to the rfEncodeFrame function and synchronize the application's rendering with the calls that start encoding in RapidFire. This way we always render into one render target while encoding the other, without overlapping the two processes, which could otherwise lead to corruption in the encoded stream. In this sample we do this by setting the index to that of the render target we just rendered into, while also synchronizing the rendering calls of our application with the encoding calls in the RapidFire thread.
Furthermore, we have to process the output stream from RapidFire. Since it is stored in system memory, it can be processed entirely on the CPU without interfering with any work that is done on the GPU. This allows us to process the result for a render target while the application is already rendering into it again. What we end up with is a three-stage pipeline that runs all stages, two on the GPU and one on the CPU, in parallel.
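The double-buffered overlap can be illustrated with a small stand-alone simulation, independent of any RapidFire or GPU code: in every step one buffer is being rendered into while the other is being encoded, and the two stages must never touch the same buffer in the same step. This is a hypothetical model of the index handling, not RapidFire code.

```cpp
#include <vector>

// One step of the double-buffered pipeline: records which buffer index the
// "render" stage and the "encode" stage use in a given iteration.
struct PipelineStep
{
    int renderIndex;
    int encodeIndex;
};

// Simulate the index handling from the capture loop above: after the first
// frame has been rendered and submitted for encoding, every iteration
// renders into the free buffer while the previously filled one is encoded.
std::vector<PipelineStep> SimulatePipeline(int frameCount)
{
    std::vector<PipelineStep> steps;
    int encodeIndex = 0;   // buffer submitted to the encoder (first frame)
    for (int frame = 1; frame < frameCount; ++frame)
    {
        int renderIndex = 1 - encodeIndex;      // render into the free buffer
        steps.push_back({ renderIndex, encodeIndex });
        encodeIndex = renderIndex;              // submit the fresh buffer next
    }
    return steps;
}
```

Walking the returned steps shows the two indices strictly alternating, which is exactly the property that keeps rendering and encoding from overlapping on one buffer.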
As both use case examples show, the work involved in getting started with RapidFire is fairly low. Of course, we have only covered the basics in this post, and in both cases further optimizations are possible, for example by starting the encoding in a separate thread from the one that queries and processes the output from RapidFire. This approach removes the dependency between these two processes and can lead to an even lower latency. If you want to find out more about how you can use RapidFire for different scenarios, or want more details on how to use it in your implementation, check out the sample projects that are available alongside the documentation in the RapidFire repository on GitHub.