Shared Shader Caching#

Shader caching is a performance optimization technique used in GPU rendering to store the results of shader compilation for reuse in future sessions. Shaders are programs that run on the GPU to handle graphical tasks like lighting, shading, and other visual effects. These shaders must be compiled from source code into executable instructions before they can be run on the GPU, and this process can be time-consuming. Shader caching addresses this issue by storing the compiled shaders, significantly reducing the need for recompilation and improving performance.

Why Shader Caching is Needed#

The shader compilation process can introduce significant delays in applications that rely on GPU rendering, such as gaming, virtual environments, or 3D modeling. When shaders are not cached, they must be compiled every time the application runs, which leads to longer startup times and latency, especially in complex or resource-heavy scenes. By caching compiled shaders, applications can avoid repeated compilation, allowing for faster startup and a smoother user experience.

This is particularly important in distributed or containerized environments, where multiple sessions may be run on different GPU nodes. Without shader caching, every node might need to recompile the same shaders, leading to unnecessary delays and inefficient resource use. A shader cache minimizes these issues by storing compiled shaders for reuse across different nodes or sessions.

Shader Caching in Omniverse RTX Renderer#

Before the Omniverse RTX Renderer can render a USD scene, all shaders must be compiled by the GPU driver running on the GPU Worker Node. This process can introduce significant latency from the moment an Omniverse Kit application is launched until the user begins receiving the WebRTC stream.

Local Environments: In a local environment, the NVIDIA GPU driver caches compiled shaders in a directory on the local disk. The first time a shader is used, it incurs the full cost of compilation; on subsequent uses, the compiled shader is loaded from the cache, significantly reducing startup time.

Containerized Environments: In containerized environments, however, the default behavior is different. Compiled shaders are written to the container’s virtual filesystem, which is not persistent across sessions. This means the compiled shaders are lost once the container is terminated. Moreover, subsequent invocations might run on different GPU worker nodes, which further complicates shader reuse. As a result, users may experience repeated shader compilation delays, leading to inefficient performance, especially when switching between dynamically allocated GPU nodes.

Solution: Shared Shader Cache#

To address the challenges in distributed environments, a shared shader cache can be implemented. This approach leverages NVIDIA technology to store compiled shaders in a key-value store, such as memcached, which is a fast, distributed, in-memory caching system. This setup allows multiple GPU nodes to share access to the same compiled shaders, avoiding redundant compilation and significantly reducing the time it takes for users to start streaming.
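The idea can be sketched as a cache-aside pattern over a key-value store. The snippet below is only an illustration of that pattern, not the actual NVIDIA wrapper: a plain dictionary stands in for a memcached client (a real client such as pymemcache exposes the same `get`/`set` interface over the network), and `compile_shader` stands in for the driver's compiler.

```python
import hashlib

class FakeMemcached:
    """Stand-in for a networked memcached client with get/set semantics."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

compile_count = 0  # tracks how often the "expensive" compile actually runs

def compile_shader(source: str) -> bytes:
    """Stand-in for the driver's expensive shader compilation."""
    global compile_count
    compile_count += 1
    return b"BINARY:" + source.encode()

def get_compiled_shader(cache, source: str) -> bytes:
    # Key on a hash of the shader source; a real cache would also fold in
    # the driver version and GPU model, since compiled binaries are
    # specific to both.
    key = "shader:" + hashlib.sha256(source.encode()).hexdigest()
    binary = cache.get(key)
    if binary is None:                  # cache miss: compile once, then share
        binary = compile_shader(source)
        cache.set(key, binary)
    return binary

cache = FakeMemcached()
get_compiled_shader(cache, "void main() {}")  # first session: compiles
get_compiled_shader(cache, "void main() {}")  # later session: served from cache
print(compile_count)  # -> 1
```

Because every node talks to the same store, a shader compiled once by any worker is immediately available to all others, which is what removes the per-node recompilation cost.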

How to Set Up a Shared Shader Cache#

To deploy a shared shader cache, perform the following steps:

  1. Start a memcached instance in the cluster. memcached serves as the key-value store for caching compiled shaders, enabling fast retrieval across different nodes.

  2. Configure Omniverse Kit for shared shader caching. This is accomplished by setting the AUTO_LOAD_DRIVER_CACHE_WRAPPER environment variable to the URL of the memcached instance when initializing the Omniverse Kit application. Once configured, the application pulls compiled shaders from the shared cache, speeding up the start of streaming and minimizing compilation delays across sessions.
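In a Kubernetes cluster, the two steps above might look roughly like the manifest below. Everything here is illustrative: the resource names, image tag, and memcached sizing flags are assumptions, and the exact value format expected by AUTO_LOAD_DRIVER_CACHE_WRAPPER should be taken from the Omniverse release documentation.

```yaml
# Step 1 (sketch): a memcached Deployment and Service for the shader cache.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shader-cache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: shader-cache
  template:
    metadata:
      labels:
        app: shader-cache
    spec:
      containers:
        - name: memcached
          image: memcached:1.6        # assumed image tag
          # -m: memory budget in MB; -I: max item size, raised because
          # compiled shader blobs can exceed memcached's 1 MB default.
          args: ["-m", "2048", "-I", "32m"]
          ports:
            - containerPort: 11211
---
apiVersion: v1
kind: Service
metadata:
  name: shader-cache
spec:
  selector:
    app: shader-cache
  ports:
    - port: 11211
---
# Step 2 (sketch): in the Kit application's container spec, point the
# driver-cache wrapper at the memcached Service (value format assumed):
#
# env:
#   - name: AUTO_LOAD_DRIVER_CACHE_WRAPPER
#     value: "shader-cache:11211"
```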

By implementing a shared shader cache, you can significantly reduce the start-up latency in distributed or containerized environments, improving the overall performance of GPU-accelerated applications.