Render Farm Dispatching

Introduction

Render Farm Dispatching allows developers to build a distributed task system using Kit’s microservices (Kit services). It uses the abstractions the services framework provides to turn a regular Kit service into a batchable workload.

It is infrastructure agnostic and built so that it can run on bare-metal servers and VMs as well as with more advanced scheduling on platforms such as Kubernetes. It is also task agnostic, meaning it can do anything from running an instance of Create and producing a render, to generating USD scenes for machine learning, running ffmpeg to generate a video from a set of frames, or running Docker workloads and Kubernetes jobs.

Farm Kit Overview Video

Use Case Example

As a developer, you have written an extension that can render a USD stage (omni.kit.capture). You can trigger this from inside Kit, but while it renders your machine is busy and you can’t use it. There is a spare machine next to you, so you wrap the omni.kit.capture extension in a service API with Kit services. You log into that machine and set up Create and the omni.services.render service.

When ready, you use a service request (simple HTTP would work here) to trigger your render. You are now rendering remotely. Setting up Kit again and again is tedious, so you use the Agent provided by Render Farm Dispatching and configure it once so that it can start up an instance of Create. Now, instead of targeting the render service directly, you target the Agent and ask it to spin up an instance of Create and then trigger the render of your USD stage.

This is still a two-step process, so you use the Queue provided by Render Farm Dispatching. Instead of targeting the Agent, you target the Queue and configure the Agent to fetch tasks from the Queue. When ready, you submit your task to the Queue; the Agent receives the render request, spawns an instance of Create, and runs the render.

Now you have 10 stages to render, or one large stage, and there are 10 more machines free. You run the agent on all 10 machines and can now run 10 renders in parallel. If the hardware runs out, you can scale out to cloud VMs, Kubernetes, Lambda functions, etc.

Your service has not changed; all you have done is plug a kit of services together that takes care of distributing that workload across the machines. Now you decide that you want to render a QuickTime movie from the frames that were rendered. You implement another function on your render service and tell the Queue that the task is not a render task but a generate-QuickTime task. You can instantly distribute this to one of the machines or, since your machine is not rendering, run that bit locally as a service.

This is what Render Farm Dispatching was designed to do. Its ultimate goal is to provide abstractions and tooling, using the same principles as Kit services, that make it extremely easy to take what were simple local Kit-based workloads and run them at scale using today’s and tomorrow’s standards and infrastructure.

Architecture

Render Farm Dispatching is made up of two main runtime components: the Queue and the Agent. Both are built from several Kit services.

This allows the services of each component either to run within the same process or to be scaled horizontally across many processes, making it suitable for small ad hoc use cases as well as the much larger workflows described here.

Users submit tasks to the Queue; the Agent is responsible for fetching the tasks it knows how to execute and running them.

Tasks

As mentioned in the introduction, tasks can technically be anything, but they are most powerful when combined with Kit services. They are defined as a tuple of (process, service function). For example, (create, render.run) will spin up Create and then invoke the render.run service inside that instance of Create.

The simplest way is to define them in the app’s kit file, but more advanced options can be implemented, for example reading this list from an API.

Task Definition

[settings.exts."omni.services.farm.agent.operator".apps.create]
type = "kit-task"
name = "create"
command = "/opt/omniverse/create/create-rc.7/omni.create.headless.sh"
args = [
    "--enable", "omni.services.render",
]
task_function = "render.run"
no_window = false
env = {}
log_to_stdout = true

The kit-task type indicates to the operator that it needs to invoke a function after it starts the process.
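To make the relationship between these fields concrete, here is a minimal Python sketch of how an operator might turn such a definition into a process launch. It is not the Farm implementation: `LaunchPlan` and `build_launch_plan` are hypothetical names, the dictionary simply mirrors the kit file settings shown above, and the rule that only the kit-task type triggers a follow-up call is an assumption drawn from the sentence above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchPlan:
    """Hypothetical description of what to start and what to call afterwards."""
    argv: list                    # full command line: executable followed by its arguments
    task_function: Optional[str]  # service function to invoke once the process is up


def build_launch_plan(definition: dict) -> LaunchPlan:
    """Assemble a command line from a task definition (sketch, not the real operator)."""
    argv = [definition["command"], *definition.get("args", [])]
    # Assumption: only the "kit-task" type needs a follow-up function call.
    needs_call = definition.get("type") == "kit-task"
    task_function = definition.get("task_function") if needs_call else None
    return LaunchPlan(argv=argv, task_function=task_function)


# Mirrors the "create" definition from the kit file.
create_definition = {
    "type": "kit-task",
    "name": "create",
    "command": "/opt/omniverse/create/create-rc.7/omni.create.headless.sh",
    "args": ["--enable", "omni.services.render"],
    "task_function": "render.run",
}

plan = build_launch_plan(create_definition)
print(plan.argv)
print(plan.task_function)
```

Keeping the command line and the follow-up function in one plan object reflects the two-step nature of a kit-task: first start the process, then call into it.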

Other Types

base = "omni.services.farm.agent.operator.processes.BaseProcess"
kit = "omni.services.farm.agent.operator.processes.KitProcess"
kit-task = "omni.services.farm.agent.operator.processes.KitTask"

Here, base can be any process as long as the software is installed. kit expects a Kit experience to be run; the user is meant to manage the life cycle of Kit (i.e. shut it down when it is done).
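Each type name maps to a dotted Python class path. A registry of this shape is commonly resolved by splitting the path into a module and a class name. The sketch below shows only that split and does not import anything, so it runs without the Farm extensions installed. `PROCESS_TYPES` and `split_class_path` are illustrative names and are not part of the Farm API.

```python
# Type names and class paths as listed above.
PROCESS_TYPES = {
    "base": "omni.services.farm.agent.operator.processes.BaseProcess",
    "kit": "omni.services.farm.agent.operator.processes.KitProcess",
    "kit-task": "omni.services.farm.agent.operator.processes.KitTask",
}


def split_class_path(type_name: str) -> tuple:
    """Return (module path, class name) for a registered type name."""
    if type_name not in PROCESS_TYPES:
        raise KeyError(f"unknown process type: {type_name!r}")
    # Everything before the last dot is the module, the rest is the class.
    module_path, _, class_name = PROCESS_TYPES[type_name].rpartition(".")
    return module_path, class_name


print(split_class_path("kit-task"))
```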

To maintain security as much as possible, only predefined tasks can be run.
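One way to picture this restriction is an allowlist check: a request names a task, and anything not found among the configured definitions is rejected before a process is started. The following is a minimal sketch under that assumption. `ALLOWED_TASKS`, `UnknownTaskError`, and `resolve_task` are hypothetical names.

```python
class UnknownTaskError(Exception):
    """Raised when a request names a task that has not been predefined."""


# Hypothetical allowlist: task name -> (process, service function) tuple.
ALLOWED_TASKS = {
    "create": ("create", "render.run"),
}


def resolve_task(name: str) -> tuple:
    """Return the (process, service function) tuple for a predefined task."""
    try:
        return ALLOWED_TASKS[name]
    except KeyError:
        # Refuse anything that is not configured instead of running it blindly.
        raise UnknownTaskError(f"task {name!r} is not predefined") from None


print(resolve_task("create"))
```

Because the request carries only a name, a caller cannot supply an arbitrary command line; the command always comes from the configuration on the agent's side.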

Queue

The queue is where tasks are submitted. It is made up of several services.

One service manages the agents. Agents periodically check in with the agent service as a heartbeat. There is a back channel here so that the queue can give instructions to the agents (e.g. cancel a running task, update the agent, eject the agent).
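The heartbeat and back channel can be pictured as a single exchange: the agent checks in, and the response carries any instructions that are waiting for it. The sketch below is an in-memory illustration of that idea only. `AgentRegistry`, its method names, and the 30 second timeout are assumptions and do not describe the actual service.

```python
class AgentRegistry:
    """Hypothetical in-memory tracker for agent heartbeats and pending instructions."""

    def __init__(self, timeout_seconds: float = 30.0):
        self.timeout_seconds = timeout_seconds  # assumed value, not a Farm default
        self._last_seen = {}     # agent id -> timestamp of the last heartbeat
        self._instructions = {}  # agent id -> instructions waiting for that agent

    def send_instruction(self, agent_id: str, instruction: str) -> None:
        """Queue an instruction (for example "cancel") for delivery on the next heartbeat."""
        self._instructions.setdefault(agent_id, []).append(instruction)

    def heartbeat(self, agent_id: str, now: float) -> list:
        """Record a check-in and return any instructions waiting for the agent."""
        self._last_seen[agent_id] = now
        return self._instructions.pop(agent_id, [])

    def alive_agents(self, now: float) -> list:
        """List agents whose last heartbeat is within the timeout."""
        return sorted(
            agent for agent, seen in self._last_seen.items()
            if now - seen <= self.timeout_seconds
        )


registry = AgentRegistry(timeout_seconds=30.0)
registry.heartbeat("agent-1", now=0.0)
registry.send_instruction("agent-1", "cancel")
print(registry.heartbeat("agent-1", now=10.0))  # instructions are delivered on check-in
print(registry.alive_agents(now=100.0))         # agent-1 last checked in 90 seconds ago
```

Timestamps are passed in explicitly rather than read from the clock so that the behaviour is easy to follow and test.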

One service manages tasks: task submission, updating task statuses, and handing out work to agents that come to fetch it.
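These three responsibilities can be sketched with a small in-memory structure. This is an illustration rather than the actual task service: `TaskQueue`, its methods, and the status strings are hypothetical, and the matching rule (an agent only receives task types it declares support for) is an assumption based on the agent fetching the tasks it knows how to execute.

```python
class TaskQueue:
    """Hypothetical in-memory task service: submit, hand out, update status."""

    def __init__(self):
        self._pending = []  # task ids waiting for an agent, oldest first
        self._tasks = {}    # task id -> {"type": ..., "status": ...}
        self._next_id = 1

    def submit(self, task_type: str) -> int:
        """Register a new task and return its id."""
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = {"type": task_type, "status": "submitted"}
        self._pending.append(task_id)
        return task_id

    def fetch(self, supported_types: set):
        """Hand out the oldest pending task the agent can execute, or None."""
        for index, task_id in enumerate(self._pending):
            if self._tasks[task_id]["type"] in supported_types:
                del self._pending[index]  # safe: we return immediately afterwards
                self._tasks[task_id]["status"] = "running"
                return task_id
        return None

    def update_status(self, task_id: int, status: str) -> None:
        """Record a new status reported for a task."""
        self._tasks[task_id]["status"] = status

    def status(self, task_id: int) -> str:
        """Return the current status of a task."""
        return self._tasks[task_id]["status"]


queue = TaskQueue()
render_id = queue.submit("render")
fetched = queue.fetch({"render"})
queue.update_status(fetched, "finished")
print(render_id, queue.status(render_id))
```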

The final service is a logs and metrics endpoint. These are task-specific logs and metrics (rather than machine-level metrics), so that the status and progress of a task can be updated. It was implemented as a separate service because it is likely to see the heaviest traffic and as such will probably be the first one to require more than a single instance.

Agent

The agent sits on a host or in Kubernetes and is responsible for launching a process and the task. As mentioned in the example in the introduction, the agent can be run by itself. It does not need a queue, but the queue can be used to automate the execution.

At its core the agent is made up of two services: one handles the spawning and stopping of a process, the other manages the tasks.

The implementation of it can be found here:

omni.services.farm.agent.controller

Render Farm Dispatching Diagram

../_images/ext_render-farm-dispatching_agent-chart.png

Other Use Cases

The agent has been built in such a way that it does not need a queue. It is just a service sitting on a host and, depending on the extensions enabled, it can follow a pull pattern, where it goes and fetches work, or a push pattern, where a request is made to it directly.
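The difference between the two patterns is only in who initiates the exchange. The following sketch illustrates this with a hypothetical `Agent` class: `handle_request` stands for the push case and `poll` for the pull case. Neither method name comes from the Farm extensions, and a plain list stands in for the queue.

```python
class Agent:
    """Hypothetical agent that accepts work either by push or by pull."""

    def __init__(self, name: str):
        self.name = name
        self.executed = []  # record of tasks this agent has run

    def run(self, task: str) -> str:
        """Stand-in for launching a process and running the task."""
        self.executed.append(task)
        return f"{self.name} ran {task}"

    def handle_request(self, task: str) -> str:
        """Push pattern: a caller sends the task straight to the agent."""
        return self.run(task)

    def poll(self, pending: list):
        """Pull pattern: the agent takes the next task from a shared list, if any."""
        if not pending:
            return None
        return self.run(pending.pop(0))


agent = Agent("agent-1")
print(agent.handle_request("render"))   # push: no queue involved
print(agent.poll(["quicktime"]))        # pull: the agent fetches the work itself
```

In both cases the same `run` method does the work, which mirrors the point that the agent itself does not change when a queue is added.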

The latter is why at the core of the agent we use an operator pattern. This is so that it can integrate with the Kubernetes Operator framework for example.

This will allow it to stand up much more advanced infrastructure while maintaining the abstractions necessary to replicate this on non-Kubernetes infrastructure as well.

We will be using the agent setup for StreamKit, which will allow users to stand up on-demand streaming infrastructure to run a remote instance of Kit and interact with it via a streaming protocol.

Getting Started

Two apps are required. Both are available in the public Kit registry (this registry is available in the latest versions of Create, for example).

Farm Queue

Both will need to be installed. This can be done via the extension manager. Additional ways are being developed (prebuilt ISOs, Docker images, Ansible playbooks). The apps are also available in the following repositories:

Queue: https://gitlab-master.nvidia.com/jeenbergen/omni.services.farm.management.tasks/-/blob/master/source/apps/farm.queue.kit

Agent: https://gitlab-master.nvidia.com/jeenbergen/omni.services.farm.agent.controller/-/blob/master/source/apps/farm.agent.kit

Farm Queue Setup Video

Windows PowerShell

<install location of kit>\kit.exe <path to farm queue>\farm.queue.kit

Linux Shell

<install location of kit>/kit <path to farm queue>/farm.queue.kit

Default Port Used: 8011

Farm Agent

Farm Agent Setup Video

Open the farm.agent.kit file and add/change the various tasks you want the agent to be able to run.

A default setup for Create to run the render service is defined in there. Make sure to change the path to the actual location where Create is installed.

Change exts."omni.services.farm.agent.controller".manager_url to point to the machine the queue is running on. For example, if the queue and agent run on the same host, change the URL to: http://localhost:8011/farm/management

Save the file and run:

Windows PowerShell

<install location of kit>\kit.exe <path to farm agent>\farm.agent.kit

Linux Shell

<install location of kit>/kit <path to farm agent>/farm.agent.kit

For both the farm queue and the agent, no UI will pop up. They run as basic shell applications.

If the queue is running on your local machine, you can browse to the link below. In the list of agents you should see one agent.

http://localhost:8011/farm/management/ui/status

Replace localhost with the name or IP address of the host if the queue is running on another computer.
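The two addresses used in this section differ only in host and path, so they can be derived from one another. The helper functions below are hypothetical conveniences, not part of Farm. They use the default port and the paths stated above, and the IP address in the usage lines is an arbitrary example.

```python
DEFAULT_QUEUE_PORT = 8011  # default port stated above


def manager_url(host: str = "localhost", port: int = DEFAULT_QUEUE_PORT) -> str:
    """Value for the agent's manager_url setting."""
    return f"http://{host}:{port}/farm/management"


def status_page_url(host: str = "localhost", port: int = DEFAULT_QUEUE_PORT) -> str:
    """Address of the queue's status page listing the connected agents."""
    return f"{manager_url(host, port)}/ui/status"


print(manager_url())
print(status_page_url())
print(status_page_url("192.168.1.20"))  # queue running on another computer
```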

Render Farm Dispatcher

Submitting tasks to a render farm from Create can be done via the Render Farm Dispatcher.

The Render Farm Dispatcher can be found in the extension registry. If it is not installed or enabled, click the install button and then enable the extension. If you plan on using it constantly, you can enable auto load.

../_images/ext_render-farm-dispatching_extension-window.png

Loading the menu

The Render Farm Dispatcher, once enabled, can be found in the rendering menu under Render Farm Dispatcher.

../_images/ext_render-farm-dispatching_menu.png

Submitting a task

The Render Farm Dispatcher options are similar to those of the Movie Maker extension. They are automatically populated with the current settings of your stage and the selected camera. All of this can, however, be changed.

The render farm specific fields are under Farm Settings.

../_images/ext_render-farm-dispatching_properties.png

Farm Instance: Select the farm to submit to. Depending on where you are running, the available farms might change. (Additional farms can be added via the settings.)

Start delay: There is currently no event to catch when all textures and MDL have finished loading. This setting specifies a delay (in seconds) to wait after the assets have been loaded, to give Create time to load all textures.

Task comment: Any comment you’d like to add to the task.

One other important setting is the output path. It defaults to a Nucleus path, but it is important that all nodes attached to the farm have access to the output location.

Note

When selecting different renderer presets, the options change. See the RTX documentation.

../_images/ext_render-farm-dispatching_properties-realtime.png

Once all settings are set, click the dispatch button to submit your render.