Creating Job Definitions

About

When creating a new job to be distributed on Omniverse Farm, one of the first steps you may wish to take is creating a job definition for it.

Omniverse Farm job definitions act as the point of entry for the work to be executed, and provide information about the requirements and dependencies necessary for their operation. Using this information, the services bundled in Farm Agents can select the next task they are able to execute when querying the Farm Queue about awaiting tasks.

In the following sections, we will look at job definitions in greater detail, so you will have the information you need to start creating your own distributed jobs, whether they are implemented as system executables or as Omniverse services.

Job definition schema: System Executables

Job definitions are nothing more than KIT files you should already be familiar with if you have previously created an extension for an Omniverse application. If you have not yet had the opportunity to get acquainted with the development of extensions, you may be interested in looking at some of the resources available on that topic to get started.

Let’s start with a simple example printing a mandatory “Hello Omniverse!” message, in order to provide an overview of what we will be describing in greater detail:

minimal-job-definition.kit
# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Minimal Omniverse Farm job definition"
description = "A simple job definition for an Omniverse Farm job, printing a welcoming message."
category = "jobs"
version = "1.0.0"
authors = ["Omniverse Team"]
keywords = ["job"]

# Schema for the job definition of a system command or executable:
[job.hello-omniverse]
# Type of the job. Using "base" makes it possible to run executable files:
job_type = "base"
# User-friendly display name for the job:
name = "simple-hello-omniverse-job"
# The command or application that will be executed by the job:
command = "echo"
# Arguments to supply to the command specified above:
args = ["Hello Omniverse!"]
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true

As you may have noticed, we have included some comments and annotations in the file. For more details about the job definition properties, refer to the Job Definition Schema Reference.

As a best practice, we encourage you to provide documentation in the job definition, as it acts as the entry point for the work that will be executed. This not only makes it easier to maintain your work over time, but also makes it easier to share it with others so they can reuse the service you created, and build even larger workflows thanks to the fruit of your labor.

Why use a KIT file for job definitions?

You may be wondering why we use KIT files to define jobs when JSON or YAML could have supported similar features.

The main motivation is that KIT files offer a number of additional features available to Omniverse applications, such as token expansion and platform-specific options. Additionally, we strongly believe that creating services for Omniverse Farm should not be a different experience from the standard development of extensions and features for Omniverse applications, so that authors can reuse the work they have already done on the platform and scale it up.

Another driving factor for using this format is that it allows you to package your job, along with its dependencies, so its bundle can be hosted in a location accessible to Farm Agents, or shared for others to reuse thanks to the built-in capabilities of KIT files.

Example: Token expansion

One of the benefits of using the familiar KIT file format is that the command property of the job definition schema supports the standard token resolution available in Omniverse applications. This means we could set command to my-app${exe_ext}, so that the ${exe_ext} token resolves the executable name to my-app.exe on Windows and my-app on Linux.
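As a minimal sketch of the token described above, the fragment below declares a job whose command relies on ${exe_ext}. The job name my-app-job and the my-app executable are hypothetical placeholders used only for illustration.

```toml
# Hypothetical job definition launching an executable named "my-app":
[job.my-app-job]
job_type = "base"
name = "my-app-job"
# `${exe_ext}` is resolved on the host running the job, so this becomes
# "my-app.exe" on Windows and "my-app" on Linux:
command = "my-app${exe_ext}"
log_to_stdout = true
```

The same definition can then be shared by Agents running on either platform without maintaining two copies of the file.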

Example: Platform-specific configuration

Another feature of the KIT file format is that it offers flexibility in terms of platform-specific options. On top of the token expansion mentioned previously, the format offers the ability to augment or filter definitions based on a number of features available to the host, making it possible to support Linux or Windows hosts with minimal configuration.

As an illustrative example of this capability, consider the simple job definition we crafted earlier. Using the KIT feature for appending elements to a list, we could easily change the greeting message depending on the host where the job will be executed:

# Continuation from the job definition declared earlier:
[job.hello-omniverse]
# [...]
# Arguments to supply to the `echo` command:
args = ["Hello Omniverse!"]

# Add a Linux-specific item to the list of `args` that will be supplied to the `echo` command:
[job.hello-omniverse.args."filter:platform"."linux-x86_64"]
"++" = [" from Linux"]

# Add a Windows-specific item to the list of `args` that will be supplied to the `echo` command:
[job.hello-omniverse.args."filter:platform"."windows-x86_64"]
"++" = [" from Windows"]

While a simplistic example for demonstration purposes, you could envision using this ability for any platform-specific configuration for your job, such as:

  • Declaring environment variables

  • Enabling/disabling extensions

  • Setting default task arguments

  • etc.
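As one possible illustration of the first item, the sketch below declares a platform-specific environment variable for the earlier job. It assumes that platform filters can be applied to the env dictionary in the same way they are applied to args above, which you should verify against your Farm version. The variable name GREETING_SUFFIX is a hypothetical placeholder.

```toml
# Continuation from the job definition declared earlier:
[job.hello-omniverse]
# [...]

# Assumption: the platform filter applies to the `env` dictionary the same way
# it applies to `args`. `GREETING_SUFFIX` is a made-up variable name.
[job.hello-omniverse.env."filter:platform"."linux-x86_64"]
GREETING_SUFFIX = "from Linux"

[job.hello-omniverse.env."filter:platform"."windows-x86_64"]
GREETING_SUFFIX = "from Windows"
```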

Note

Now that you have a better understanding of the structure of job definitions for Omniverse Farm, head over to the Omniverse Farm examples page for practical guides to using the feature for production.

Uploading Job Definitions

Omniverse Farm can be deployed as a standalone install (refer to the Farm Queue install and Farm Agent install guides) or in a Kubernetes environment via Helm.

Note

In Kubernetes, there is additional support for supplying Kubernetes-specific properties under the capacity_requirements setting. Refer to the Job Definition Schema Reference for more details.

Standalone deployments

When Omniverse Farm is deployed as standalone services, job definitions can be specified in the configured job directories of the Agents (typically <farmAgentInstallDir>/jobs/*) or by utilizing the provided Job definition wizard (see Integrating Job with Omniverse Agent and Queue).

Kubernetes deployments

For Kubernetes deployments, cluster access is required.

You will first need to retrieve the API key from the Job’s service config map and utilize a custom job_definition_upload.py script to upload the job definitions.

To retrieve the Job’s API key, issue the following command:

kubectl get configmap omniverse-farm-jobs -o yaml -n <<farm namespace>> | grep api_key | head -n 1

The API key is unique per Farm instance and must be kept private.

The job_definition_upload.py script can be retrieved from NGC.

Before using the script, install its two Python dependencies (requests and toml):

pip install requests
pip install toml

You are now ready to upload job definitions:

python job_definition_upload.py <Job Definition Filepath> --farm-url=<Omniverse Farm URL> --api-key=<API Key>

Here’s a quick usage example:

python /opt/scripts/job_definition_upload.py /home/foobar/df.kit --farm-url=http://my-awesome-farm.com --api-key="123shh-s3cr3t"

The job definition may take up to about 1 minute to propagate to the various services in the cluster.

Note

To get a list of job definitions currently in Farm, the /queue/management/jobs/load endpoint can be utilized.

Job definition schema: Omniverse Services

Now that you know how to define a simple job definition and launch a command on the system, let’s see how to launch an Omniverse application to start building larger workflows.

In this example, we will go one step beyond our earlier example and introduce a few additional properties of the job definition to let you create more complex workflows:

omniverse-application-job-definition.kit
# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Sample Omniverse job definition"
description = "Sample job definition showcasing how to launch Omniverse applications."
version = "1.0.0"
authors = ["Omniverse Team"]
category = "jobs"
keywords = ["job"]

# Schema for the job definition of an Omniverse application:
[job.sample-omniverse-application-job]
# Type of the job. Using "kit-service" makes it possible to execute services exposed by Omniverse applications:
job_type = "kit-service"
# User-friendly display name for the job:
name = "sample-omniverse-application-job"
# Resolve and launch the "Create" application from the Omniverse Launcher (note the 3 slashes in the URL):
command = "launcher:///create"
# List of command-line arguments to provide to the Omniverse application when launched:
args = [
    # Make sure the Omniverse application can be closed when the processing of the job has completed, and that the
    # notification asking if the USD stage from the active session should be saved prior to closing does not prevent
    # the application from shutting down:
    "--/app/file/ignoreUnsavedOnExit=true",
    # Make sure the Omniverse application is active upon launch, and that notification prompts asking for user
    # input do not prevent the session from being interactive:
    "--/app/extensions/excluded/0='omni.kit.window.privacy'",
    # Add any additional setting required by the Omniverse application, or your own extensions:
    # [...]
]
# Path of the service endpoint where to route arguments in order to start the processing of the job:
task_function = "sample-processing-extension.run"
# Flag indicating whether to execute the Omniverse application in headless mode (i.e. the equivalent of supplying it
# with the `--no-window` command-line option):
headless = true
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true

# Supply a list of folders where extensions on which the job depends can be found:
[settings.app.exts.folders]
"++" = [
    "${job}/exts-sample-omniverse-application-job",
    # ...
]

# List of extensions on which the job depends in order to execute:
[dependencies]
"omni.services.farm.agent.runner" = {}
# ...

# When running the job, enable the following extensions:
[settings.app.exts]
enabled = [
    # Extension exposing a "run" endpoint, which will receive the arguments of the task as payload, and start the
    # job process:
    "sample-processing-extension",
    # ...
]

Fundamentally, jobs implemented as Omniverse application services declare a set of extensions which should be enabled by the application, and the path to the endpoint that one of them exposes in order to fulfill the task.

The process should be familiar to you if you have already created an Omniverse extension, as it follows the typical development workflow. For clarity, here are a few details about the example above, where we are:

  1. Providing configuration options to the Omniverse application, so it can launch in a state that will allow it to perform the work it will receive.

  2. Specifying the location of the extension(s) that we expect the Omniverse application to load for us.

  3. Enabling any extension we require from the Omniverse application, along with the one that will act as the entrypoint for incoming requests to kickstart the execution of the task.

This entrypoint extension is expected to expose an endpoint at the location defined by the task_function option of the schema. This endpoint, implemented using the Service stack, will be called by the Agent tasked with performing a job, which will supply the endpoint with any information it needs in order to execute the work.

Note

For concrete examples of how arguments can be supplied to the endpoint service, head over to the Omniverse Farm examples page.

A few additional notes about the layout of this job definition for Omniverse services:

  • We used Omniverse Create from the Launcher for demonstration purposes in this sample. However, you are free to use any application available on the Launcher by supplying its unique identifier to the command property of the job definition. For example, you could use isaac_sim to target workflows based on Isaac Sim.

  • For convenience, the headless flag can be set to false during development as a way of inspecting the operations performed by the service and seeing their progress. Once deployed in a production context, running the application in headless mode makes it both more performant and easier to scale, as batch workflows typically do not require a user interface to perform actions, which makes an entire desktop environment optional.
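As a small sketch of the development setup mentioned in the last note, the fragment below overrides the headless property of the sample job so that the application window stays visible. It reuses the job name from the example above, and the rest of the definition is elided.

```toml
# Continuation from the Omniverse application job definition declared earlier:
[job.sample-omniverse-application-job]
# [...]
# Development only: keep the application window visible to observe the job's progress.
# Switch this back to `true` before deploying to production.
headless = false
```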

Schema Reference

For reference, the following is a brief list of properties available for job definitions:

Property

Type

Description

job_type

string

Type of the job, can be either base or kit-service.

name

string

User-friendly name uniquely identifying the job.

task_function

string

Module to execute when the job type is kit-service.

command

string

Application or command to be executed by the job.

working_directory

string

Directory where the command should be executed.

success_return_codes

Array<int>

List of return codes from the command that should be considered as successful executions.

args

Array<string>

List of arguments to supply to the command, identical for all instances of the job.

allowed_args

Dict<string,Dict>

Dictionary of arguments which may be unique to each execution of a job, including default values. Arguments can be defined as:

[job.sample-job.allowed_args]
source      = { arg = "--source",      default = "" }
destination = { arg = "--destination", default = "" }
ratio       = { arg = "--ratio",       default = "0.5" }

env

Dict<string,string>

Dictionary of environment variables to supply to the command.

extension_paths

Array<string>

List of extension paths.

log_to_stdout

boolean

Flag indicating whether to capture information from stdout and stderr in the task’s logs.

headless

boolean

Flag indicating whether the application should be run in headless mode.

active

boolean

Flag indicating whether the task is enabled.

container

string

Image location of a Docker container to execute.

capacity_requirements

Dict<string,any>

See Capacity Requirements Schema Reference below.
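To show how several of the properties listed above might fit together, here is a sketch of a job definition for a system executable. The convert-asset command, its --verbose flag, the working directory, and the SAMPLE_LOG_LEVEL variable are hypothetical placeholders; only the property names and types come from the table.

```toml
# Hypothetical job wrapping an imaginary `convert-asset` command-line tool:
[job.sample-job]
job_type = "base"
name = "sample-job"
command = "convert-asset"
# Directory where the command should be executed (placeholder path):
working_directory = "/tmp/sample-job"
# Treat both 0 and 2 as successful return codes:
success_return_codes = [0, 2]
# Arguments shared by every instance of the job:
args = ["--verbose"]
log_to_stdout = true
# Flag indicating whether the task is enabled:
active = true

# Environment variables supplied to the command:
[job.sample-job.env]
SAMPLE_LOG_LEVEL = "info"

# Arguments which may differ for each execution, with their default values:
[job.sample-job.allowed_args]
source      = { arg = "--source",      default = "" }
destination = { arg = "--destination", default = "" }
```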

Capacity Requirements Schema Reference (Kubernetes)

The following contains a list of capacity_requirements properties available if deployed within a Kubernetes environment.

The following properties are specific to the container-v1-core and podspec-v1-core from Kubernetes version 1.24.

Two special properties, container_spec_field_overrides and pod_spec_field_overrides, are provided for specifying fields that may be added in future Kubernetes specs.

Container Core Properties

Property

Type

Description

container_spec_field_overrides

Dict<string,any>

Special property that does not apply to any particular Kubernetes field. Instead, it can be used to inject fields that may be added in future Kubernetes releases.

[job.sample-job.capacity_requirements.container_spec_field_overrides]
futureKubernetesContainerCoreField = "foobar"

env

Array<Dict<string,any>>

List of environment variables to set in the job’s container.

[[job.sample-job.capacity_requirements.env]]
name = "foo"
value = "bar"

env_from

Array<Dict<string,any>>

List of sources from which to populate environment variables in the job’s container.

[[job.sample-job.capacity_requirements.envFrom]]
[job.sample-job.capacity_requirements.envFrom.configMapRef]
name = "sample-config"

image_pull_policy

string

The image pull policy for the job’s container image.

[job.sample-job.capacity_requirements]
image_pull_policy = "Always"

lifecycle

Dict<string,any>

Specify the lifecycle handlers of the job’s container.

[job.sample-job.capacity_requirements.lifecycle.postStart.exec]
command = [
 "/bin/sh",
 "-c",
 "echo Hello from the postStart handler > /usr/share/message"
]

[job.sample-job.capacity_requirements.lifecycle.preStop.exec]
command = [
 "/bin/sh",
 "-c",
 "sleep 1"
]

liveness_probe

Dict<string,any>

Specify the job’s container pod liveness probe.

[job.sample-job.capacity_requirements.liveness_probe]
  [job.sample-job.capacity_requirements.liveness_probe.httpGet]
  path = "/status"
  port = "http"

ports

Array<Dict<string,any>>

Specify the ports exposed by the job’s container.

[[job.sample-job.capacity_requirements.ports]]
name = "http"
containerPort = 80
protocol = "TCP"

resource_limits

Dict<string,any>

Specify the resource limits of the job’s container. Refer to the Kubernetes documentation on resource units for acceptable units.

[job.sample-job.capacity_requirements.resource_limits]
cpu = 1
memory = "4096Mi"
"nvidia.com/gpu" = 1

readiness_probe

Dict<string,any>

Specify the job’s container pod readiness probe.

[job.sample-job.capacity_requirements.readiness_probe]
  [job.sample-job.capacity_requirements.readiness_probe.httpGet]
  path = "/status"
  port = "http"

security_context

Dict<string,any>

Specify the security context of the job’s container.

[job.sample-job.capacity_requirements.security_context]
runAsUser = 2000
allowPrivilegeEscalation = false

startup_probe

Dict<string,any>

Specify the job’s container pod startup probe.

[job.sample-job.capacity_requirements.startup_probe]
  [job.sample-job.capacity_requirements.startup_probe.httpGet]
  path = "/status"
  port = "http"

stdin

boolean

Control whether the job’s container should allocate a buffer for stdin in the container runtime.

[job.sample-job.capacity_requirements]
stdin = true

stdin_once

boolean

Control whether the job’s container runtime should close the stdin channel after it has been opened by a single attach.

[job.sample-job.capacity_requirements]
stdin_once = false

termination_message_path

string

Path in the container’s filesystem where the file receiving the container’s termination message is mounted.

[job.sample-job.capacity_requirements]
termination_message_path = "/dev/termination-log"

termination_message_policy

string

Indicate how the termination message should be populated.

[job.sample-job.capacity_requirements]
termination_message_policy = "File"

tty

boolean

Control whether the job’s container should allocate a TTY for itself. This also requires stdin to be true.

[job.sample-job.capacity_requirements]
tty = true

volume_devices

Array<Dict<string,any>>

Specify the volume devices (raw block devices) of the job’s container.

[[job.sample-job.capacity_requirements.volume_devices]]
devicePath = "/myrawblockdevice"
name = "blockDevicePvc"

volume_mounts

Array<Dict<string,any>>

Specify the job’s container pod volume mounts.

[[job.sample-job.capacity_requirements.volume_mounts]]
mountPath = "/root/.provider/"
name = "creds"

Pod Spec Properties

Property

Type

Description

pod_spec_field_overrides

Dict<string,any>

Special property that does not apply to any particular Kubernetes field. Instead, it can be used to inject fields that may be added in future Kubernetes releases.

[job.sample-job.capacity_requirements.pod_spec_field_overrides]
futureKubernetesPodSpecField = "foobar"

active_deadline_seconds

integer

Duration in seconds the pod may be active on the node, relative to StartTime, before the system actively tries to mark it as failed and kill the associated containers.

[job.sample-job.capacity_requirements]
active_deadline_seconds = 30

affinity

Dict<string,any>

Specify the job’s container pod affinity.

[[job.sample-job.capacity_requirements.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms]]
[[job.sample-job.capacity_requirements.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions]]
key = "name"
operator = "In"
values = [ "app-worker-node" ]

[[job.sample-job.capacity_requirements.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution]]
weight = 1

[[job.sample-job.capacity_requirements.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution.preference.matchExpressions]]
key = "type"
operator = "In"
values = [ "app-01" ]

automount_service_account_token

boolean

Indicate whether a service account token should be automatically mounted.

[job.sample-job.capacity_requirements]
automount_service_account_token = false

dns_config

Dict<string,any>

Specifies the DNS parameters of the job’s pod.

[job.sample-job.capacity_requirements.dnsConfig]
nameservers = [ "1.2.3.4" ]
searches = [ "ns1.svc.cluster-domain.example", "my.dns.search.suffix" ]

[[job.sample-job.capacity_requirements.dnsConfig.options]]
name = "ndots"
value = "2"

[[job.sample-job.capacity_requirements.dnsConfig.options]]
name = "edns0"

dns_policy

string

Set the DNS policy for the job’s pod.

[job.sample-job.capacity_requirements]
dns_policy = "ClusterFirst"

enable_service_links

boolean

Indicates whether information about services should be injected into the pod’s environment variables.

[job.sample-job.capacity_requirements]
enable_service_links = true

ephemeral_containers

Array<Dict<string,any>>

List of ephemeral containers run in the job’s pod.

host_aliases

Array<Dict<string,any>>

List of hosts and IPs that will be injected into the pod’s hosts file if specified. This is only valid for non-hostNetwork pods.

host_IPC

boolean

Use the host’s IPC namespace.

host_network

boolean

Indicate whether host networking is requested for the job’s pod.

host_PID

boolean

Use the host’s PID namespace.

hostname

string

Specifies the hostname of the pod.

image_pull_secrets

Array<Dict<string,any>>

List of references to secrets in the same namespace to use for pulling any of the images.

[[job.sample-job.capacity_requirements.imagePullSecrets]]
name = "registry-secret"

init_containers

Array<Dict<string,any>>

List of initialization containers.

node_name

string

Request to schedule this pod onto a specific node.

node_selector

Dict<string,string>

Selector which must be true for the pod to fit on a node.

[job.sample-job.capacity_requirements.node_selector]
"beta.kubernetes.io/instance-type" = "worker"
"beta.kubernetes.io/os" = "linux"

os

Dict<string,string>

Specifies the OS of the containers in the pod.

overhead

Dict<string,any>

Resource overhead associated with running a pod for a given RuntimeClass.

preemption_policy

string

Policy for preempting pods with lower priority.

priority

string

Priority value of the pod.

priority_class_name

string

Name of the priority class indicating the pod’s priority.

readiness_gates

Array<Dict<string,any>>

Pod’s readiness gates.

runtime_class_name

string

Set the pod’s runtime class name.

scheduler_name

string

Specific scheduler used to dispatch the pod.

pod_security_context

Dict<string,any>

Specify the pod-level security context of the job’s pod.

[job.sample-job.capacity_requirements.pod_security_context]
runAsUser = 1000

service_account

string

Set the pod’s service account.

service_account_name

string

Name of the service account to use to run this pod.

set_hostname_as_FQDN

boolean

Indicate whether the pod’s hostname should be configured as the pod’s FQDN.

share_process_namespace

boolean

Share a single process namespace between all of the containers in a pod.

subdomain

string

Specify the pod’s subdomain.

termination_grace_period_seconds

integer

Duration in seconds the pod needs to terminate gracefully.

tolerations

Array<Dict<string,any>>

Specify the job’s container pod tolerations.

[[job.sample-job.capacity_requirements.tolerations]]
key = "key1"
operator = "Equal"
value = "value1"
effect = "NoSchedule"

topology_spread_constraints

Array<Dict<string,any>>

Constraints describing how the pod should be spread across topology domains. Refer to the Kubernetes documentation for details.

volumes

Array<Dict<string,any>>

Specify the volumes of the job’s pod. Refer to the Kubernetes volumes documentation for more examples and valid fields. The following is an example of mounting a config map.

[[job.sample-job.capacity_requirements.volumes]]
name = "creds"
  [job.sample-job.capacity_requirements.volumes.configMap]
  name = "credentials-cm"
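To illustrate how the container and pod properties above might be combined, the following sketch describes a containerized job requesting a GPU. It only reuses properties and example values listed in this reference. The job name sample-gpu-job and the image location are hypothetical placeholders, and whether a given combination is accepted depends on your cluster.

```toml
# Hypothetical containerized job requesting one GPU:
[job.sample-gpu-job]
job_type = "base"
name = "sample-gpu-job"
command = "nvidia-smi"
log_to_stdout = true
# Placeholder image location; replace it with an image available to your cluster:
container = "registry.example.com/sample/gpu-job:1.0.0"

# Container and pod settings that are plain values:
[job.sample-gpu-job.capacity_requirements]
image_pull_policy = "Always"
active_deadline_seconds = 600

# Resource limits for the job's container:
[job.sample-gpu-job.capacity_requirements.resource_limits]
cpu = 1
memory = "4096Mi"
"nvidia.com/gpu" = 1

# Only schedule the pod on Linux nodes:
[job.sample-gpu-job.capacity_requirements.node_selector]
"beta.kubernetes.io/os" = "linux"

# Allow scheduling on nodes carrying a matching taint:
[[job.sample-gpu-job.capacity_requirements.tolerations]]
key = "key1"
operator = "Equal"
value = "value1"
effect = "NoSchedule"
```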