Creating Job Definitions#
About#
When creating a new job to be distributed on Omniverse Farm, one of the first steps you may wish to take is creating a job definition for it.
Omniverse Farm job definitions act as the point of entry for the work to be executed, and provide information about the requirements and dependencies necessary for their operations. Using this information, the services bundled in Farm Agents are then able to select the next task they can execute when querying the Farm Queue about awaiting tasks.
In the following sections, we will look at job definitions in greater detail, so you will have the information you need to start creating your own distributed jobs, whether they are implemented as system executables or as Omniverse services.
Job definition schema: System Executables#
Job definitions are nothing more than KIT files you should already be familiar with if you have previously created an extension for an Omniverse application. If you have not yet had the opportunity to get acquainted with the development of extensions, you may be interested in looking at some of the resources available on that topic to get started.
Let’s start with a simple example printing a mandatory “Hello Omniverse!” message, in order to provide an overview of what we will be describing in greater detail:
# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Minimal Omniverse Farm job definition"
description = "A simple job definition for an Omniverse Farm job, printing a welcoming message."
category = "jobs"
version = "1.0.0"
authors = ["Omniverse Team"]
keywords = ["job"]

# Schema for the job definition of a system command or executable:
[job.hello-omniverse]
# Type of the job. Using "base" makes it possible to run executable files:
job_type = "base"
# User-friendly display name for the job:
name = "simple-hello-omniverse-job"
# The command or application that will be executed by the job:
command = "echo"
# Arguments to supply to the command specified above:
args = ["Hello Omniverse!"]
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true
As you may have noticed, we have included some comments and annotations in the file. For more details about the job definition properties, refer to the Job Definition Schema Reference.
As a best practice, we encourage you to provide documentation in the job definition, as it acts as the entry point for the work that will be executed. This not only makes it easier to maintain your work over time, but also makes it easier to share it with others so they can reuse the service you created, and build even larger workflows thanks to the fruit of your labor.
Why use a KIT file for job definitions?#
You may be wondering why use KIT files to define jobs, if JSON or YAML could have supported similar features.
The main motivation is that KIT files offer a number of additional features available to Omniverse applications, such as token expansion and platform-specific options. Additionally, we strongly believe that creating services for Omniverse Farm should not be a different experience from the standard development of extensions and features for Omniverse applications, allowing authors to reuse the work they have already done on the platform and scale it to multiply its efficiency.
Another driving factor for using this format is that it allows you to package your job, along with its dependencies, so its bundle can be hosted in a location accessible to Farm Agents, or shared for others to reuse thanks to the built-in capabilities of KIT files.
Example: Token expansion#
One of the benefits of using the familiar KIT file format is that the command property of the job definition schema supports the standard token resolution of Omniverse applications. This means we could use the ${exe_ext} token in a command such as my-app${exe_ext}, so that the executable file extension resolves to my-app.exe on Windows and my-app on Linux.
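To illustrate the outcome of this resolution, the following sketch mimics it in plain Python. It is not how Omniverse applications implement token resolution: the `exe_ext` and `expand_command` helpers are hypothetical, and only the `${exe_ext}` token is handled.

```python
import sys


def exe_ext(platform_name: str = sys.platform) -> str:
    """Executable suffix for a platform: ".exe" on Windows, empty elsewhere."""
    return ".exe" if platform_name.startswith("win") else ""


def expand_command(command: str, platform_name: str = sys.platform) -> str:
    """Replace the ${exe_ext} token in a command string (illustration only)."""
    return command.replace("${exe_ext}", exe_ext(platform_name))


if __name__ == "__main__":
    # Platform names follow the values of Python's sys.platform.
    print(expand_command("my-app${exe_ext}", "win32"))
    print(expand_command("my-app${exe_ext}", "linux"))
```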
Example: Platform-specific configuration#
Another feature of the KIT file format is that it offers flexibility in terms of platform-specific options. On top of the token expansion mentioned previously, the format offers the ability to augment or filter definitions based on a number of features available to the host, making it possible to support Linux or Windows hosts with minimal configuration.
As an illustrative example of this capability, consider the simple job definition we crafted earlier. Using the KIT feature of appending elements to a list, we could easily change the greeting message depending on the host where the job will be executed:
# Continuation from the job definition declared earlier:
[job.hello-omniverse]
# [...]
# Arguments to supply to the `echo` command:
args = ["Hello Omniverse!"]

# Add a Linux-specific item to the list of `args` that will be supplied to the `echo` command:
[job.hello-omniverse.args."filter:platform"."linux-x86_64"]
"++" = [" from Linux"]

# Add a Windows-specific item to the list of `args` that will be supplied to the `echo` command:
[job.hello-omniverse.args."filter:platform"."windows-x86_64"]
"++" = [" from Windows"]
While this is a simplistic example for demonstration purposes, you could envision using this ability for any platform-specific configuration for your job, such as:
Declaring environment variables
Enabling/disabling extensions
Setting default task arguments
etc.
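To make the effect of the platform filters in the greeting example concrete, the following sketch emulates the resulting list of arguments in plain Python. It only reproduces the outcome described above (platform-specific items appended to the base list); the `resolve_args` helper is hypothetical and does not reflect how KIT evaluates filters internally.

```python
# Base arguments, as declared in the job definition.
BASE_ARGS = ["Hello Omniverse!"]

# Platform-specific items, mirroring the "++" entries of the example above.
PLATFORM_APPENDS = {
    "linux-x86_64": [" from Linux"],
    "windows-x86_64": [" from Windows"],
}


def resolve_args(base_args, platform_appends, platform):
    """Return base_args followed by the items appended for the given platform."""
    return list(base_args) + list(platform_appends.get(platform, []))


if __name__ == "__main__":
    for platform in ("linux-x86_64", "windows-x86_64"):
        print(platform, resolve_args(BASE_ARGS, PLATFORM_APPENDS, platform))
```

A platform without a matching filter simply receives the base arguments unchanged.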
Note
Now that you have a better understanding of the structure of job definitions for Omniverse Farm, head over to the Omniverse Farm examples page for practical guides to using the feature for production.
Uploading Job Definitions#
Omniverse Farm can be deployed as a standalone install (refer to the Farm Queue install and Farm Agent install guides) or in a Kubernetes environment via Helm.
Note
In Kubernetes, there is additional support for supplying Kubernetes-specific properties under the capacity_requirements setting; refer to the Job Definition Schema Reference for more details.
Standalone deployments#
When Omniverse Farm is deployed as standalone services, job definitions can be specified in the configured job directories of the Agents (typically <farmAgentInstallDir>/jobs/*) or by utilizing the provided Job definition wizard (see Integrating Job with Omniverse Agent and Queue).
Kubernetes deployments#
For Kubernetes deployments, cluster access is required.
You will first need to retrieve the API key from the Job’s service config map, and then utilize a custom job_definition_upload.py script to upload the job definitions.
To retrieve the Job’s API key, issue the following command:
kubectl get configmap omniverse-farm-jobs -o yaml -n <farm namespace> | grep api_key | head -n 1
The API key is unique per Farm instance and must be kept private.
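If you prefer to script this step, the same lookup can be sketched in Python. This is a hedged illustration: the `first_api_key_line` and `fetch_api_key_line` helpers are hypothetical, the sketch assumes `kubectl` is installed and configured for the cluster, and it returns the whole matching line because the exact layout of the config map is not described here.

```python
import subprocess


def first_api_key_line(configmap_yaml: str):
    """Return the first line mentioning api_key, similar to `grep api_key | head -n 1`.

    Surrounding whitespace is removed; None is returned when nothing matches.
    """
    for line in configmap_yaml.splitlines():
        if "api_key" in line:
            return line.strip()
    return None


def fetch_api_key_line(namespace: str):
    """Run the kubectl command shown above and filter its output."""
    result = subprocess.run(
        ["kubectl", "get", "configmap", "omniverse-farm-jobs",
         "-o", "yaml", "-n", namespace],
        check=True,
        capture_output=True,
        text=True,
    )
    return first_api_key_line(result.stdout)
```

As with the shell command, treat the returned value as a secret and avoid writing it to logs.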
The job_definition_upload.py script can be retrieved from NGC.
Before using the script, two Python dependencies are required (requests and toml):
pip install requests
pip install toml
You are now ready to upload job definitions:
python job_definition_upload.py <Job Definition Filepath> --farm-url=<Omniverse Farm URL> --api-key=<API Key>
Here’s a quick usage example:
python /opt/scripts/job_definition_upload.py /home/foobar/df.kit --farm-url=http://my-awesome-farm.com --api-key="123shh-s3cr3t"
The job definition may take up to about 1 minute to propagate to the various services in the cluster.
Note
To get a list of job definitions currently in Farm, the /queue/management/jobs/load endpoint can be utilized.
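As a rough sketch of how this endpoint could be queried from Python, the example below builds the URL from the Farm address and issues a request with the standard library. Several points are assumptions rather than documented behavior: that the endpoint answers a plain GET request, that it requires no additional headers, and that it returns a JSON body. The helper names are hypothetical.

```python
import json
import urllib.request

# Endpoint path taken from the note above.
JOBS_ENDPOINT = "/queue/management/jobs/load"


def job_definitions_url(farm_url: str) -> str:
    """Join the Farm base URL and the endpoint path without doubling slashes."""
    return farm_url.rstrip("/") + JOBS_ENDPOINT


def list_job_definitions(farm_url: str, timeout: float = 10.0):
    """Fetch the job definitions known to Farm.

    Assumption: the endpoint accepts an unauthenticated GET and returns JSON.
    """
    with urllib.request.urlopen(job_definitions_url(farm_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(job_definitions_url("http://my-awesome-farm.com"))
```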
Job definition schema: Omniverse Services#
Now that you know how to define a simple job definition and launch a command on the system, let’s see how to launch an Omniverse application to start building larger workflows.
In this example, we will go one step beyond our earlier example and introduce a few additional properties of the job definition to let you create more complex workflows:
# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Sample Omniverse job definition"
description = "Sample job definition showcasing how to launch Omniverse applications."
version = "1.0.0"
authors = ["Omniverse Team"]
category = "jobs"
keywords = ["job"]

# Schema for the job definition of an Omniverse application:
[job.sample-omniverse-application-job]
# Type of the job. Using "kit-service" makes it possible to execute services exposed by Omniverse applications:
job_type = "kit-service"
# User-friendly display name for the job:
name = "sample-omniverse-application-job"
# Resolve and launch the "Create" application from the Omniverse Launcher (note the 3 slashes in the URL):
command = "launcher:///create"
# List of command-line arguments to provide to the Omniverse application when launched:
args = [
    # Make sure the Omniverse application can be closed when the processing of the job has completed, and that the
    # notification asking if the USD stage from the active session should be saved prior to closing does not prevent
    # the application from shutting down:
    "--/app/file/ignoreUnsavedOnExit=true",
    # Make sure the Omniverse application is active upon launch, and that notification prompts asking for User input
    # do not prevent the session from being interactive:
    "--/app/extensions/excluded/0='omni.kit.window.privacy'",
    # Add any additional setting required by the Omniverse application, or your own extensions:
    # [...]
]
# Path of the service endpoint where to route arguments in order to start the processing of the job:
task_function = "sample-processing-extension.run"
# Flag indicating whether to execute the Omniverse application in headless mode (i.e. the equivalent of supplying it
# with the `--no-window` command-line option):
headless = true
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true

# Supply a list of folders where extensions on which the job depends can be found:
[settings.app.exts.folders]
"++" = [
    "${job}/exts-sample-omniverse-application-job",
    # ...
]

# List of extensions on which the job depends in order to execute:
[dependencies]
"omni.services.farm.agent.runner" = {}
# ...

# When running the job, enable the following extensions:
[settings.app.exts]
enabled = [
    # Extension exposing a "run" endpoint, which will receive the arguments of the task as payload, and start the
    # job process:
    "sample-processing-extension",
    # ...
]
Fundamentally, jobs implemented as Omniverse application services declare a set of extensions which should be enabled by the application, and the path to the endpoint that one of them exposes in order to fulfill the task.
The process should be familiar to you if you have already created an Omniverse extension, as it follows the typical development workflow. For clarity, here are a few details about the example above, in which we are:
Providing configuration options to the Omniverse application, so it can launch in a state that will allow it to perform the work it will receive.
Specifying the location of the extension(s) that we expect the Omniverse application to load for us.
Enabling any extension we require from the Omniverse application, along with the one that will act as the entrypoint for incoming requests to kickstart the execution of the task.
This entrypoint extension is expected to expose an endpoint at the location defined by the task_function option of the schema. This endpoint, implemented using the Service stack, will be called by the Agent tasked with performing a job, which will supply the endpoint with any information it needs in order to execute the work.
Note
For concrete examples of how arguments can be supplied to the endpoint service, head over to the Omniverse Farm examples page.
A few additional notes about the layout of this job definition for Omniverse services:
We used Omniverse USD Composer from the Launcher for demonstration purposes in this sample; however, you are free to use any application available on the Launcher by supplying its unique identifier to the command property of the job definition. For example, you could use isaac_sim to target workflows based on Isaac Sim.
For convenience, the headless flag can be used during development as a way of inspecting the operations performed by the service, in order to see their progress. Once deployed in a production context, running the application in headless mode makes it both more performant and easier to scale, as batch workflows typically do not require a user interface to perform actions, which makes an entire desktop environment optional.
Schema Reference#
For reference, the following is a brief list of properties available for job definitions:
| Property | Type | Description |
|---|---|---|
| job_type | | Type of the job, can be either base or kit-service. |
| name | | User-friendly name uniquely identifying the job. |
| task_function | | Module to execute when specifying a kit-service job. |
| command | | Application or command to be executed by the job. |
| | | Directory where the [...] |
| | | List of return codes from the [...] |
| args | | List of arguments to supply to the command. |
| allowed_args | | Dictionary of arguments which may be unique to each execution of a job, including default values. An example definition follows the table. |
| | | Dictionary of environment variables to supply to the [...] |
| | | List of extension paths. |
| log_to_stdout | | Flag indicating whether to capture information from stdout and stderr for the job’s logs. |
| headless | | Flag indicating whether the application should be run in headless mode. |
| | | Flag indicating whether the task is enabled. |
| | | Image location of a Docker container to execute. |
| capacity_requirements | | See Capacity Requirements Schema Reference below. |

Example of an allowed_args definition:
[job.sample-job.allowed_args]
source = { arg = "--source", default = "" }
destination = { arg = "--destination", default = "" }
ratio = { arg = "--ratio", default = "0.5" }
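The allowed_args entry above pairs each argument name with a command-line flag and a default value. The sketch below shows one plausible way such a declaration could be combined with the values supplied for a given task. It is an illustration under stated assumptions: how the Agent actually merges defaults and task values, and whether flags and values are passed as separate items, is not specified here, and the `build_arguments` helper is hypothetical.

```python
# Python equivalent of the allowed_args example from the table above.
ALLOWED_ARGS = {
    "source": {"arg": "--source", "default": ""},
    "destination": {"arg": "--destination", "default": ""},
    "ratio": {"arg": "--ratio", "default": "0.5"},
}


def build_arguments(allowed_args, task_values):
    """Turn per-task values into a flat argument list, falling back to defaults.

    Values for names that were not declared in allowed_args are rejected.
    """
    unknown = sorted(set(task_values) - set(allowed_args))
    if unknown:
        raise ValueError(f"arguments not declared in allowed_args: {unknown}")
    arguments = []
    for name, spec in allowed_args.items():
        value = task_values.get(name, spec["default"])
        arguments += [spec["arg"], str(value)]
    return arguments


if __name__ == "__main__":
    print(build_arguments(ALLOWED_ARGS, {"source": "a.usd", "ratio": "0.25"}))
```

Rejecting undeclared names keeps each task limited to the arguments the job author chose to expose, which is the intent suggested by the property name.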
Capacity Requirements Schema Reference (Kubernetes)#
The following contains a list of capacity_requirements properties available if deployed within a Kubernetes environment.
The following properties are specific to the container-v1-core and podspec-v1-core from Kubernetes version 1.24.
Two special properties are provided, container_spec_field_overrides and pod_spec_field_overrides, for specifying fields that may come in future Kubernetes specs.
Container Core Properties#
| Property | Type | Description |
|---|---|---|
| container_spec_field_overrides | | Special property that does not apply to any particular Kubernetes field; instead, it can be used to inject fields that may be added in future Kubernetes releases. |
| env | | List of environment variables to set in the job’s container pod (Kubernetes: env). |
| envFrom | | List of sources from which to populate environment variables in the job’s container pod (Kubernetes: env from). |
| image_pull_policy | | The image pull policy for the job’s container image (Kubernetes: image pull policy). |
| lifecycle | | Specify the job’s container lifecycle (Kubernetes: lifecycle). |
| liveness_probe | | Specify the job’s container pod liveness probe. |
| ports | | Specify the job’s container pod container ports. |
| resource_limits | | Specify the job’s container pod resource limits. Refer to resource units for acceptable units. |
| readiness_probe | | Specify the job’s container pod readiness probe. |
| security_context | | Specify the job’s container pod security context. |
| startup_probe | | Specify the job’s container pod startup probe. |
| stdin | | Control whether the job’s container should allocate a buffer for stdin in the container runtime (Kubernetes: stdin). |
| stdin_once | | Control whether the job’s container runtime should close the stdin channel after it has been opened by a single attach (Kubernetes: stdin once). |
| termination_message_path | | Path at which the file to which the container’s termination message will be written is mounted into the container’s filesystem (Kubernetes: termination message path). |
| termination_message_policy | | Indicate how the termination message should be populated (Kubernetes: termination message policy). |
| tty | | Control whether the job’s container should allocate a TTY for itself; also requires stdin to be true (Kubernetes: tty). |
| volume_devices | | Specify the job’s container pod volume devices. |
| volume_mounts | | Specify the job’s container pod volume mounts. |

Example for container_spec_field_overrides:
[job.sample-job.capacity_requirements.container_spec_field_overrides]
futureKubernetesContainerCoreField = "foobar"

Example for env:
[[job.sample-job.capacity_requirements.env]]
name = "foo"
value = "bar"

Example for envFrom:
[[job.sample-job.capacity_requirements.envFrom]]
[job.sample-job.capacity_requirements.envFrom.configMapRef]
name = "sample-config"

Example for image_pull_policy:
[job.sample-job.capacity_requirements]
image_pull_policy = "Always"

Example for lifecycle:
[job.sample-job.capacity_requirements.lifecycle.postStart.exec]
command = [
    "/bin/sh",
    "-c",
    "echo Hello from the postStart handler > /usr/share/message"
]
[job.sample-job.capacity_requirements.lifecycle.preStop.exec]
command = [
    "/bin/sh",
    "-c",
    "sleep 1"
]

Example for liveness_probe:
[job.sample-job.capacity_requirements.liveness_probe]
[job.sample-job.capacity_requirements.liveness_probe.httpGet]
path = "/status"
port = "http"

Example for ports:
[[job.sample-job.capacity_requirements.ports]]
name = "http"
containerPort = 80
protocol = "TCP"

Example for resource_limits:
[job.sample-job.capacity_requirements.resource_limits]
cpu = 1
memory = "4096Mi"
"nvidia.com/gpu" = 1

Example for readiness_probe:
[job.sample-job.capacity_requirements.readiness_probe]
[job.sample-job.capacity_requirements.readiness_probe.httpGet]
path = "/status"
port = "http"

Example for security_context:
[job.sample-job.capacity_requirements.security_context]
runAsUser = 2000
allowPrivilegeEscalation = false

Example for startup_probe:
[job.sample-job.capacity_requirements.startup_probe]
[job.sample-job.capacity_requirements.startup_probe.httpGet]
path = "/status"
port = "http"

Example for stdin:
[job.sample-job.capacity_requirements]
stdin = true

Example for stdin_once:
[job.sample-job.capacity_requirements]
stdin_once = false

Example for termination_message_path:
[job.sample-job.capacity_requirements]
termination_message_path = "/dev/termination-log"

Example for termination_message_policy:
[job.sample-job.capacity_requirements]
termination_message_policy = "File"

Example for tty:
[job.sample-job.capacity_requirements]
tty = true

Example for volume_devices:
[[job.sample-job.capacity_requirements.volume_devices]]
devicePath = "/myrawblockdevice"
name = "blockDevicePvc"

Example for volume_mounts:
[[job.sample-job.capacity_requirements.volume_mounts]]
mountPath = "/root/.provider/"
name = "creds"
Pod Spec Properties#
| Property | Type | Description |
|---|---|---|
| pod_spec_field_overrides | | Special property that does not apply to any particular Kubernetes field; instead, it can be used to inject fields that may be added in future Kubernetes releases. |
| active_deadline_seconds | | Duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers (Kubernetes: active deadline seconds). |
| affinity | | Specify the job’s container pod affinity. |
| | | Indicate whether a service account token should be automatically mounted (Kubernetes: automount service account token). |
| dnsConfig | | Specifies the DNS parameters of the job’s container pod (Kubernetes: dns config). |
| dns_policy | | Set the DNS policy for the job’s container pod (Kubernetes: dns policy). |
| enable_service_links | | Indicates whether information about services should be injected into the pod’s environment variables (Kubernetes: enable service links). |
| | | List of ephemeral containers run in the job’s container pod (Kubernetes: ephemeral containers). |
| | | List of hosts and IPs that will be injected into the pod’s hosts file if specified. This is only valid for non-hostNetwork pods (Kubernetes: host aliases). |
| | | Use the host’s IPC namespace (Kubernetes: host IPC). |
| | | Host networking requested for the job’s container pod (Kubernetes: host network). |
| | | Use the host’s PID namespace (Kubernetes: host PID). |
| | | Specifies the hostname of the pod (Kubernetes: hostname). |
| imagePullSecrets | | List of references to secrets in the same namespace to use for pulling any of the images (Kubernetes: image pull secrets). |
| | | List of initialization containers (Kubernetes: init containers). |
| | | Node name is a request to schedule this pod onto a specific node (Kubernetes: node name). |
| node_selector | | Selector which must be true for the pod to fit on a node (Kubernetes: node selector). |
| | | Specifies the OS of the containers in the pod (Kubernetes: os). |
| | | Overhead represents the resource overhead associated with running a pod for a given RuntimeClass (Kubernetes: overhead). |
| | | Policy for preempting pods with lower priority (Kubernetes: preemption policy). |
| | | Priority value (Kubernetes: priority). |
| | | Indicate the pod’s priority (Kubernetes: priority class name). |
| | | Pod’s readiness gates. |
| | | Set the pod’s runtime class name. |
| | | Specific scheduler to dispatch the pod (Kubernetes: scheduler name). |
| pod_security_context | | Specify the job’s container pod security context (Kubernetes: pod security context). |
| | | Set the pod’s service account. |
| | | Name of the service account to use to run this pod (Kubernetes: service account name). |
| | | The pod’s hostname will be configured as the pod’s FQDN (Kubernetes: set hostname as FQDN). |
| | | Share a single process namespace between all of the containers in a pod (Kubernetes: share process namespace). |
| | | Specify the pod’s subdomain. |
| | | Duration in seconds the pod needs to terminate gracefully (Kubernetes: termination grace period seconds). |
| tolerations | | Specify the job’s container pod tolerations. |
| | | Topology domain constraints (see details in the Kubernetes reference). |
| volumes | | Specify the job’s container pod volumes. Refer to volumes for more examples and valid fields. |

Example for pod_spec_field_overrides:
[job.sample-job.capacity_requirements.pod_spec_field_overrides]
futureKubernetesPodSpecField = "foobar"

Example for active_deadline_seconds:
[job.sample-job.capacity_requirements]
active_deadline_seconds = 30

Example for affinity:
[[job.sample-job.capacity_requirements.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms]]
[[job.sample-job.capacity_requirements.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions]]
key = "name"
operator = "In"
values = [ "worker-node" ]
[[job.sample-job.capacity_requirements.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution]]
weight = 1
[[job.sample-job.capacity_requirements.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution.preference.matchExpressions]]
key = "type"
operator = "In"
values = [ "01" ]

Example for dnsConfig:
[job.sample-job.capacity_requirements.dnsConfig]
nameservers = [ "1.2.3.4" ]
searches = [ "ns1.svc.cluster-domain.example", "my.dns.search.suffix" ]
[[job.sample-job.capacity_requirements.dnsConfig.options]]
name = "ndots"
value = "2"
[[job.sample-job.capacity_requirements.dnsConfig.options]]
name = "edns0"

Example for dns_policy:
[job.sample-job.capacity_requirements]
dns_policy = "ClusterFirst"

Example for enable_service_links:
[job.sample-job.capacity_requirements]
enable_service_links = true

Example for imagePullSecrets:
[[job.sample-job.capacity_requirements.imagePullSecrets]]
name = "registry-secret"

Example for node_selector:
[job.sample-job.capacity_requirements.node_selector]
"beta.kubernetes.io/instance-type" = "worker"
"beta.kubernetes.io/os" = "linux"

Example for pod_security_context:
[job.sample-job.capacity_requirements.pod_security_context]
runAsUser = 1000

Example for tolerations:
[[job.sample-job.capacity_requirements.tolerations]]
key = "key1"
operator = "Equal"
value = "value1"
effect = "NoSchedule"

Example for volumes (mounting a config map):
[[job.sample-job.capacity_requirements.volumes]]
name = "creds"
[job.sample-job.capacity_requirements.volumes.configMap]
name = "credentials-cm"