Deploying Omniverse Farm on headless systems


1. Introduction

This guide walks through installing Omniverse Farm so that it can run headlessly.

This procedure is suitable for deployments across a few nodes. For anything larger, we recommend an orchestration tool such as Ansible to manage the nodes. If bare metal or VMs are not a hard requirement, we recommend running OV Farm in a Kubernetes environment, which allows for better control and scalability.

This deployment is similar to one done via the Launcher and shares its limitations: by default, scale and redundancy are limited.

The end of this guide describes how to add scalability and persistence by deploying a SQL database and a Redis instance.

2. Installation

A. Queue installation

To automate deployment of Farm Queue on Linux, it may be convenient to install it in a headless manner.

Prerequisites include Ubuntu Server 20.04 or greater, with an Internet connection in order to download the necessary additional software and packages.

Note

Other Linux distributions should also be compatible with Omniverse Farm Queue, although only Ubuntu 20.04 is officially supported for production use.

  1. Start by installing the required software dependencies:

    $ sudo apt-get install -y --no-install-recommends \
            curl \
            libatomic1 \
            libxi6 \
            libxrandr2 \
            libxt6 \
            libegl1 \
            libglu1-mesa \
            libgomp1 \
            libsm6 \
            unzip
    
  2. Create the /opt/ove folder on the server that will run Farm Queue, then upload the farm_queue_install.sh script to that folder:

    farm_queue_install.sh
    #!/bin/bash

    #
    # Note: Specific package versions can be retrieved from the Omniverse Launcher.
    #

    # Install the Omniverse Farm Queue package, containing the core Queue capabilities:
    mkdir -p ov-farm-queue
    cd ov-farm-queue
    pwd
    curl https://d4i3qtqj3r0z5.cloudfront.net/farm-queue-launcher%40105.1.0%2B105.1.x.174.c6feac39.teamcity.linux-x86_64.release.zip > farm-queue-launcher.zip
    unzip farm-queue-launcher.zip
    rm farm-queue-launcher.zip

    # patch the 105.1.0 headless release that shipped referencing old modules that cause errors
    sed -i'.backup' 's/\t"omni\.services\.farm\.management\.tasks-0\.19\.3"/\t"omni\.services\.farm\.management\.tasks-0\.19\.4"/g' apps/omni.farm.queue.headless.kit
    sed -i'.backup' 's/\t"omni\.services\.farm\.facilities\.store\.db-0\.11\.2"/\t"omni\.services\.farm\.facilities\.store\.db-0\.11\.3"/g' apps/omni.farm.queue.headless.kit

    # Install the Kit SDK package, containing the set of features and extensions shared by Omniverse applications:
    mkdir kit
    cd kit
    pwd
    curl https://d4i3qtqj3r0z5.cloudfront.net/kit-sdk-launcher@105.1%2Bmaster.120930.709ebe37.tc.linux-x86_64.release.zip > kit-sdk-launcher.zip
    unzip kit-sdk-launcher.zip
    rm kit-sdk-launcher.zip

    cd ..

    # Create a boilerplate launch script for the Queue:
    cat << 'EOF' > queue.sh
    #!/bin/bash

    BASEDIR=$(dirname "$0")
    exec $BASEDIR/kit/kit $BASEDIR/apps/omni.farm.queue.headless.kit \
        --ext-folder $BASEDIR/exts-farm-queue \
        --/exts/omni.services.farm.management.tasks/dbs/task-persistence/connection_string=sqlite:///$BASEDIR//task-management.db
    EOF

    chmod +x queue.sh
    
  3. Change the permission of the farm_queue_install.sh script in order to make it executable:

    $ chmod +x farm_queue_install.sh
    
  4. Run the script from within the /opt/ove folder as a non-root user:

    $ ./farm_queue_install.sh
    
  5. Once the files are downloaded and extracted, you will have a folder named /opt/ove/ov-farm-queue.

  6. Ensure that /opt/ove/ov-farm-queue and all files within it are owned by a non-root user, then launch the Queue from that folder:

    $ ./queue.sh &
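
Before launching, it can be useful to confirm that the dependencies from step 1 are actually present. The following is a minimal sketch, not part of the official installer: the `missing_packages` helper is a hypothetical name, and it assumes a Debian-based system where `dpkg` is available.

```shell
#!/bin/bash
# Hypothetical helper: print every package from the argument list that
# dpkg does not report as fully installed.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        # "install ok installed" is the status dpkg prints for an installed package.
        if ! dpkg -s "$pkg" 2>/dev/null | grep -q '^Status: install ok installed'; then
            echo "$pkg"
        fi
    done
}

# Package names copied from step 1 of the installation.
missing_packages curl libatomic1 libxi6 libxrandr2 libxt6 libegl1 \
    libglu1-mesa libgomp1 libsm6 unzip
```

Any package name printed by the final command still needs to be installed with apt-get; no output means all ten packages were found.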
    

To confirm that the installation completed successfully, query the Queue's health-status API endpoint with curl and check that it returns "OK":

$ curl http://localhost:8222/status

A successful response looks similar to the following. It indicates that all required Omniverse Extensions were loaded, that configuration options were successfully applied, and that the Queue is ready to receive and dispatch tasks:

[user@machine ov-farm-queue] $ curl http://localhost:8222/status
"OK"
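
Because the Queue is started in the background, the status endpoint may not answer immediately. The sketch below polls it until it returns "OK" or a retry budget is exhausted. The function name `wait_for_queue`, the retry count, and the delay are illustrative assumptions; only the URL and the expected `"OK"` body come from the steps above.

```shell
#!/bin/bash
# Hypothetical helper: poll the Queue's /status endpoint until it answers "OK".
# Arguments (all optional): URL, number of attempts, delay in seconds.
wait_for_queue() {
    local url="${1:-http://localhost:8222/status}"
    local attempts="${2:-30}"
    local delay="${3:-2}"
    local i=1
    local response
    while [ "$i" -le "$attempts" ]; do
        # --silent hides the progress meter; --max-time bounds each request.
        response=$(curl --silent --max-time 5 "$url" || true)
        # The endpoint returns the JSON string "OK", quotes included.
        if [ "$response" = '"OK"' ]; then
            echo "Queue is ready after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Queue did not report OK after $attempts attempt(s)" >&2
    return 1
}
```

Calling `wait_for_queue` with no arguments checks the default local endpoint; its exit status can gate later automation steps, such as starting Agents.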

B. Agent installation

To automate deployment of Farm Agents on Linux, and scale the compute capabilities to multiple machines, it may be convenient to install Agents in a headless manner.

Prerequisites include Ubuntu Server 20.04 or greater, with an Internet connection in order to download the necessary additional software and packages.

Note

Other Linux distributions should also be compatible with Omniverse Farm Agent, although only Ubuntu 20.04 is officially supported for production use.

  1. Start by installing the required software dependencies:

    $ sudo apt-get install -y --no-install-recommends \
            curl \
            libatomic1 \
            libxi6 \
            libxrandr2 \
            libxt6 \
            libegl1 \
            libglu1-mesa \
            libgomp1 \
            libsm6 \
            unzip
    
  2. Create the /opt/ove folder on the server that will run Farm Agent, then upload the farm_agent_install.sh script to that folder:

    farm_agent_install.sh
    #!/bin/bash

    #
    # Note: Specific package versions can be retrieved from the Omniverse Launcher.
    #

    # Install the Omniverse Farm Agent package, containing the core Agent capabilities:
    mkdir -p ov-farm-agent
    cd ov-farm-agent
    pwd
    curl https://d4i3qtqj3r0z5.cloudfront.net/farm-agent-launcher@105.1.0%2Bmaster.267.63d5b393.tc.linux-x86_64.release.zip > farm-agent-launcher.zip
    unzip farm-agent-launcher.zip
    rm farm-agent-launcher.zip

    # Install the Kit SDK package, containing the set of features and extensions shared by Omniverse applications:
    mkdir kit
    cd kit
    pwd
    curl https://d4i3qtqj3r0z5.cloudfront.net/kit-sdk-launcher@105.1%2Bmaster.120930.709ebe37.tc.linux-x86_64.release.zip > kit-sdk-launcher.zip
    unzip kit-sdk-launcher.zip
    rm kit-sdk-launcher.zip

    # Install the Multiview Batch package:
    cd ..
    mkdir -p jobs/multiview-batch
    cd jobs/multiview-batch
    pwd
    curl https://d4i3qtqj3r0z5.cloudfront.net/farm-job-multiview-batch-render%40105.1.0%2Bmain.136.1ebfd569.tc.linux-x86_64.release.zip > farm-job-multiview-batch.zip
    unzip farm-job-multiview-batch.zip
    rm farm-job-multiview-batch.zip

    # Install the "create-render" package, containing the job definition for the rendering task:
    cd ../..
    mkdir -p jobs/create-render
    cd jobs/create-render
    pwd
    curl https://d4i3qtqj3r0z5.cloudfront.net/farm-job-create-render@105.1.0%2Bmain.205.7755a0d5.tc.linux-x86_64.release.zip > farm-job-create-render.zip
    unzip farm-job-create-render.zip
    rm farm-job-create-render.zip

    cd ../..

    # Create a boilerplate launch script for the Agent:
    cat << 'EOF' > agent.sh
    #!/bin/bash

    BASEDIR=$(dirname "$0")
    exec $BASEDIR/kit/kit $BASEDIR/apps/omni.farm.agent.headless.kit \
        --ext-folder $BASEDIR/exts-farm-agent \
        --/exts/omni.services.farm.agent.operator/job_store_args/job_directories/0=$BASEDIR/jobs/* \
        --/exts/omni.services.farm.agent.operator/manager_host=http://<QUEUE IP>:<QUEUE PORT> \
        --/exts/omni.services.farm.agent.controller/agents_service_host=http://<QUEUE IP>:<QUEUE PORT> \
        --/exts/omni.services.farm.agent.controller/tasks_service_host=http://<QUEUE IP>:<QUEUE PORT>
    EOF

    chmod +x agent.sh
    
  3. Change the permission of the farm_agent_install.sh script in order to make it executable:

    $ chmod +x farm_agent_install.sh
    
  4. Run the script from within the /opt/ove folder as a non-root user:

    $ ./farm_agent_install.sh
    
  5. Once the files are downloaded and extracted, you will have a folder named /opt/ove/ov-farm-agent.

  6. Ensure that /opt/ove/ov-farm-agent and all files within it are owned by a non-root user.

  7. In agent.sh, set the Farm Agent Controller and Operator addresses to the Farm Queue server address and port (the default Farm Queue port is 8222 unless it was explicitly changed):

    agent.sh
    # [...]
    
    --/exts/omni.services.farm.agent.operator/manager_host=http://<QUEUE IP>:8222
    --/exts/omni.services.farm.agent.controller/agents_service_host=http://<QUEUE IP>:8222
    --/exts/omni.services.farm.agent.controller/tasks_service_host=http://<QUEUE IP>:8222
    
  8. Once the Controller and Operator addresses are configured, launch the Agent:

    $ ./agent.sh &
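
When many Agents are deployed, editing agent.sh by hand on each machine becomes tedious. The sketch below fills in the `<QUEUE IP>` and `<QUEUE PORT>` placeholders that the install script writes into agent.sh (step 7). The function name `set_queue_address` is hypothetical, and the sketch assumes the placeholders still appear exactly as generated.

```shell
#!/bin/bash
# Hypothetical helper: replace the <QUEUE IP> and <QUEUE PORT> placeholders
# in an agent.sh file with real values.
set_queue_address() {
    local file="$1"
    local queue_ip="$2"
    local queue_port="${3:-8222}"   # 8222 is the default Farm Queue port
    # -i.bak edits the file in place and keeps the original as <file>.bak.
    sed -i.bak \
        -e "s|<QUEUE IP>|${queue_ip}|g" \
        -e "s|<QUEUE PORT>|${queue_port}|g" \
        "$file"
}
```

For example, `set_queue_address /opt/ove/ov-farm-agent/agent.sh 10.0.0.5` would point all three service hosts at a Queue on 10.0.0.5 (an example address) using the default port.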
    

Updating Job Definitions For Manually Installed Builds

By default, the Agent’s create-render job definition will use the default Composer build that is installed and managed by the Omniverse Launcher.

If you install Composer builds manually instead of managing them with the Omniverse Launcher, you must update the create-render job definition to point to the Composer startup script. The same applies to any other manually installed application that would normally be managed through the Omniverse Launcher.

To have the agent use your manually installed Composer build, modify the [job.create-render] section of the agent Kit file and replace the command = "launcher:///create" line with one that points to the absolute path of your Composer startup script:

jobs/create-render/job.omni.farm.render.kit
command = "/absolute-path-to-my-manually-installed-composer/startup.sh"
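
The same edit can be scripted for unattended installs. This is a minimal sketch under the assumption that the Kit file contains the line `command = "launcher:///create"` exactly as quoted above; the function name `set_render_command` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: point the create-render job at a manually installed
# Composer startup script instead of the launcher:// link.
set_render_command() {
    local kit_file="$1"
    local startup_script="$2"   # absolute path; must not contain '|' or '&'
    # Keep the 'command = ' prefix (captured as \1) and swap only the quoted value.
    sed -i.bak \
        -e "s|^\([[:space:]]*command[[:space:]]*=[[:space:]]*\)\"launcher:///create\"|\1\"${startup_script}\"|" \
        "$kit_file"
}
```

A call such as `set_render_command jobs/create-render/job.omni.farm.render.kit /absolute-path-to-my-manually-installed-composer/startup.sh` leaves a `.bak` copy of the original file next to the edited one.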

If you manage Composer builds manually and do not update the create-render job definition, an error resembling the following is reported:

Farm Agent Launcher link resolve error
[Error] [omni.services.farm.facilities.jobs.store.directory] Failed to resolve launch settings for create-render. Make sure the launcher is running and that the requested app is installed. Error: Cannot connect to host localhost:33480 ssl:default [Connect call failed ('127.0.0.1', 33480)]

Output Log

When executing the Agent, the console will display an output similar to the following to indicate it is running successfully:

[user@machine ov-farm-agent] $ ./agent.sh
[Info] [carb] Logging to file: /home/user/.nvidia-omniverse/logs/Kit/omni.farm.agent.headless/102.1/kit_20220502_132007.log
[0.346s] [ext: omni.kit.pipapi-0.0.0] startup
[0.360s] [ext: omni.services.pip_archive-0.3.0] startup
. . .
[1.464s] [ext: omni.farm.agent.headless-102.1.0] startup
[1.574s] app ready

If an Agent is running without proper acceleration, an error similar to the following is displayed:

[1.964s] [ext: omni.farm.agent.headless-102.1.0] startup
2022-04-29 20:57:11 [2,049ms] [Error] [omni.services.farm.facilities.agent.capacity.managers.base] Failed to load capacities for omni.services.farm.facilities.agent.capacity.GPU: NVML Shared Library Not Found
[2.075s] app ready
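
The "NVML Shared Library Not Found" message refers to the NVIDIA Management Library, which ships with the NVIDIA driver as `libnvidia-ml.so`. The sketch below is one possible pre-flight check before starting an Agent. It assumes `ldconfig` is on the PATH (as it normally is on Ubuntu); the function name `has_nvml` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the dynamic linker cache lists the NVML library.
has_nvml() {
    # 'ldconfig -p' prints the shared libraries known to the linker cache.
    ldconfig -p 2>/dev/null | grep -q 'libnvidia-ml\.so'
}

if has_nvml; then
    echo "NVML library found"
else
    echo "NVML library not found: check that the NVIDIA driver is installed" >&2
fi
```

A missing library usually means the driver is not installed or not visible to the current environment, so this check only narrows down the cause rather than fixing it.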

See Linux Troubleshooting for help with any installation issues.

3. Scaling

A. SQL database

By default, OV Farm installed in this configuration uses a SQLite database. Performance and scalability degrade as many agents connect and request work. SQLite also prevents multiple instances of the various services from being active at the same time.

It is possible to use a different SQL database instead, such as MariaDB, running remotely or on the same host, for better performance.

The database user account needs create permissions in order to create the database and its tables.

To switch the database connection string to a MariaDB-based SQL instance, add the following value to the omni.queue.headless.kit file (installed above as apps/omni.farm.queue.headless.kit):

omni.queue.headless.kit
[settings]
# Avoids shader cache compilation at startup/RTX requirements.
exts."omni.kit.renderer.core".compatibilityMode = true
exts."omni.kit.async_engine".event_loop_windows = "ProactorEventLoop"
exts."omni.services.transport.server.http".allow_port_range = false
exts."omni.services.transport.server.http".port = 8222
exts."omni.services.farm.management.tasks".dbs.task-persistence.connection_string="mysql://<username>:<password>@<host>:<port>/<db_name>"

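Then remove the SQLite connection_string argument (the last line of the listing below) from the queue.sh script, along with the trailing backslash on the line before it: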

queue.sh
#!/bin/bash

BASEDIR=$(dirname "$0")
exec $BASEDIR/kit/kit $BASEDIR/apps/omni.farm.queue.headless.kit \
    --ext-folder $BASEDIR/exts-farm-queue \
    --/exts/omni.services.farm.management.tasks/dbs/task-persistence/connection_string=sqlite:///$BASEDIR//task-management.db
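
Instead of editing queue.sh by hand, the launcher can be regenerated without the SQLite argument, so that the connection string from the Kit file takes effect. This is a sketch mirroring the heredoc used by the install script; the function name `write_queue_launcher` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write a queue.sh launcher that omits the SQLite
# connection_string override.
write_queue_launcher() {
    local target="$1"
    # The quoted 'EOF' keeps $BASEDIR and $0 literal in the generated file.
    cat << 'EOF' > "$target"
#!/bin/bash

BASEDIR=$(dirname "$0")
exec $BASEDIR/kit/kit $BASEDIR/apps/omni.farm.queue.headless.kit \
    --ext-folder $BASEDIR/exts-farm-queue
EOF
    chmod +x "$target"
}
```

For example, `write_queue_launcher /opt/ove/ov-farm-queue/queue.sh` overwrites the existing launcher, so keep a copy first if it contains local changes.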

B. Redis

B.1 Agent services

By default, the agent store is held in memory, which means only a single instance of the agent service is supported. The in-memory agent store can be swapped for a Redis-backed one.

To change the agent's backend, add the following values to the omni.queue.headless.kit file (installed above as apps/omni.farm.queue.headless.kit) as part of the installation steps:

Be sure to replace <host> and <port> with the host and port of the Redis instance.

The connection_string format documentation is available here

omni.queue.headless.kit
[settings]
# Avoids shader cache compilation at startup/RTX requirements.
exts."omni.kit.renderer.core".compatibilityMode = true
exts."omni.kit.async_engine".event_loop_windows = "ProactorEventLoop"
exts."omni.services.transport.server.http".allow_port_range = false
exts."omni.services.transport.server.http".port = 8222
exts."omni.services.farm.management.agents".manager_class = "omni.services.farm.management.agents.managers.redis.RedisAgentManager"

[settings.exts."omni.services.farm.management.agents".manager_args]
connection_string="redis://<host>:<port>"
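
Before restarting the Queue with this configuration, it may help to confirm that the Redis instance is reachable from the Queue host. The sketch below assumes the `redis-cli` client is installed; the function name `redis_is_up` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the Redis server at host:port answers PING.
redis_is_up() {
    local host="$1"
    local port="${2:-6379}"   # 6379 is the default Redis port
    # A healthy Redis server replies to PING with PONG.
    [ "$(redis-cli -h "$host" -p "$port" ping 2>/dev/null)" = "PONG" ]
}
```

The function can be used as a guard, for example `redis_is_up <host> <port> && ./queue.sh &`, with the same host and port that appear in the connection_string.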