Writing a computeVectorized function

Instead of writing a compute function in your node, you can choose to write a computeVectorized function. When using instancing, this function will receive a batch of instances to compute instead of a single one.

Indexing

The data associated with a given instance can be retrieved by providing an omni::graph::core::InstanceIndex. The data of the following instances can be retrieved either by providing an incremented omni::graph::core::InstanceIndex or by directly indexing the returned pointer, because the data of subsequent instances in the provided batch is guaranteed to be contiguous.

This indexing is always relative to the “current active instance”. When the framework calls a node's compute function, or its computeVectorized function with a batch of one or more instances, it records the current active instance in the context: the first (or only) instance of the provided batch. In that context, any ABI function that takes an omni::graph::core::InstanceIndex applies the provided index as an offset to the current active instance. For example, providing omni::graph::core::InstanceIndex{0} to such an ABI function returns the data for the first instance in the batch.

Outside of a compute function (for example during authoring actions), the “current active instance” is not an instance at all but the “authoring graph”, the template used to create all instances. The authoring graph can be seen as a data structure shared by all instances, in the same way as the static members of a C++ class. In short: inside compute, indexing is relative to the current active instance; outside of compute, the current active instance is the authoring graph and indexing is absolute.

Some predefined constants let you target precisely what you want, depending on the context in which the ABI is called:

  • omni::graph::core::kAccordingToContextIndex: targets the current active instance. In the context of compute/computeVectorized, this is equivalent to omni::graph::core::InstanceIndex{0}.

  • omni::graph::core::kAuthoringGraphIndex: targets the “authoring graph”, the template shared by all instances. Outside of compute, this is equivalent to omni::graph::core::kAccordingToContextIndex.
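To make the relative-indexing rule concrete, here is a minimal, self-contained model that does not use the OmniGraph API at all. ToyContext, getData, activeInstance and sameElement are hypothetical names invented for this sketch. It only borrows two facts from the text above: an index is applied as an offset to the current active instance, and the data of a batch is contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-instance attribute storage; NOT the OmniGraph API.
struct ToyContext
{
    std::vector<float> values;  // one float per instance, stored contiguously
    std::size_t activeInstance; // the "current active instance" chosen by the framework

    // Mimics an ABI call that takes an instance index: the index is an
    // offset relative to the current active instance, not an absolute position.
    const float* getData(std::size_t relativeIndex) const
    {
        assert(activeInstance + relativeIndex < values.size());
        return values.data() + activeInstance + relativeIndex;
    }
};

// Returns true when the two ways of reaching instance `offset` of the batch agree:
// asking with an incremented index, or indexing the pointer of the first instance.
inline bool sameElement(const ToyContext& ctx, std::size_t offset)
{
    const float* first = ctx.getData(0);         // first instance of the batch
    const float* viaIndex = ctx.getData(offset); // incremented index
    return viaIndex == first + offset;           // direct pointer indexing
}
```

With activeInstance set to 2, getData(0) in this model returns the third stored value, which mirrors how index 0 designates the first instance of the batch rather than the first instance overall.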

When using OGN, there is another layer of indexing on top of the context indexing: the entire OGN database can be shifted to the next instance in a batch by simply calling db.moveToNextInstance(). The instance it currently points to can be retrieved by calling db.getCurrentInstanceIndex(). All attributes accessed through the database are the ones that belong to the instance the database points to.

Writing an ABI computeVectorized

If you are not using OGN, you can implement the following static method in your node, in place of the regular compute:

size_t computeVectorized(GraphContextObj const& contextObj, NodeObj const& nodeObj, size_t count)

When used in an instantiated graph, the framework may call this function with a full batch of instances, in which case the count parameter will tell how many instances are expected to be computed. In all other situations, count will be equal to 1.

There are 3 ways to access and iterate over the attribute data of all the provided instances. Each one provides a different access pattern that returns the same data, and they can be freely mixed. Which one to use depends on the operation your node performs and the most natural way to express it. For this example, let’s consider a passthrough node, whose only purpose is to pass some data through as-is. While such a node is useless, it allows us to focus on the subject at hand: accessing the instanced data.

Here is how its regular compute function could be implemented:

static bool compute(GraphContextObj const& contextObj, NodeObj const& nodeObj)
{
    NodeContextHandle nodeHandle = nodeObj.nodeContextHandle;

    auto inputValueAttr = getAttributeR(contextObj, nodeHandle, Token("inputs:value"), kAccordingToContextIndex);
    const float* inputValue = getDataR<float>(contextObj, inputValueAttr);

    auto outputValueAttr = getAttributeW(contextObj, nodeHandle, Token("outputs:value"), kAccordingToContextIndex);
    float* outputValue = getDataW<float>(contextObj, outputValueAttr);

    if (inputValue && outputValue)
    {
        *outputValue = *inputValue;
        return true;
    }

    return false;
}

Now let’s look at the vectorized versions.

Method #1: Access individual attribute data with an instance offset

static size_t computeVectorized(GraphContextObj const& contextObj, NodeObj const& nodeObj, size_t count)
{
    GraphContextObj const* contexts = nullptr;
    NodeObj const* nodes = nullptr;

    // When using auto instancing, similar graphs can get merged together and computed vectorized.
    // In that case, each instance represents a different node in a different graph.
    // Accessing the data through either the provided node or the actual auto-instance node works
    // properly, but any other ABI call requiring the node must be given the proper node. While not
    // necessary in this context, use the proper auto-instance node to demonstrate how it is done.
    size_t handleCount = nodeObj.iNode->getAutoInstances(nodeObj, contexts, nodes);

    auto nodeHandle = [&](InstanceIndex index) -> NodeContextHandle
    { return nodes[handleCount == 1 ? 0 : index.index].nodeContextHandle; };

    auto context = [&](InstanceIndex index) -> GraphContextObj
    { return contexts[handleCount == 1 ? 0 : index.index]; };

    size_t ret = 0;
    const float* inputValue{ nullptr };
    float* outputValue{ nullptr };
    auto inToken = Token("inputs:value");
    auto outToken = Token("outputs:value");

    for (InstanceIndex idx{ 0 }; idx < InstanceIndex{ count }; ++idx)
    {
        auto inputValueAttr = getAttributeR(context(idx), nodeHandle(idx), inToken, idx);
        inputValue = getDataR<float>(context(idx), inputValueAttr);

        auto outputValueAttr = getAttributeW(context(idx), nodeHandle(idx), outToken, idx);
        outputValue = getDataW<float>(context(idx), outputValueAttr);

        if (inputValue && outputValue)
        {
            *outputValue = *inputValue;
            ++ret;
        }
    }

    return ret;
}

As shown here, each instance's attribute data can be accessed by passing an additional offset (relative to the current active instance) to the attribute data accessor.

Method #2: Mutate the attribute data handles

static size_t computeVectorized(GraphContextObj const& contextObj, NodeObj const& nodeObj, size_t count)
{
    NodeContextHandle nodeHandle = nodeObj.nodeContextHandle;

    size_t ret = 0;
    const float* inputValue{ nullptr };
    float* outputValue{ nullptr };
    auto inputValueAttr = getAttributeR(contextObj, nodeHandle, Token("inputs:value"), kAccordingToContextIndex);
    auto outputValueAttr = getAttributeW(contextObj, nodeHandle, Token("outputs:value"), kAccordingToContextIndex);

    while (count--)
    {
        inputValue = getDataR<float>(contextObj, inputValueAttr);
        outputValue = getDataW<float>(contextObj, outputValueAttr);

        if (inputValue && outputValue)
        {
            *outputValue = *inputValue;
            ++ret;
        }
        inputValueAttr = contextObj.iAttributeData->moveToAnotherInstanceR(contextObj, inputValueAttr, 1);
        outputValueAttr = contextObj.iAttributeData->moveToAnotherInstanceW(contextObj, outputValueAttr, 1);
    }

    return ret;
}

In this example, the attribute data handles are retrieved just once and then moved to the next instance on each loop iteration.

Method #3: Retrieve the raw data

static size_t computeVectorized(GraphContextObj const& contextObj, NodeObj const& nodeObj, size_t count)
{
    NodeContextHandle nodeHandle = nodeObj.nodeContextHandle;

    // Pointers to the data of the first instance in the batch
    auto inputValueAttr = getAttributeR(contextObj, nodeHandle, Token("inputs:value"), kAccordingToContextIndex);
    const float* inputValue = getDataR<float>(contextObj, inputValueAttr);

    auto outputValueAttr = getAttributeW(contextObj, nodeHandle, Token("outputs:value"), kAccordingToContextIndex);
    float* outputValue = getDataW<float>(contextObj, outputValueAttr);

    if (inputValue && outputValue)
    {
        // The data of all instances in the batch is contiguous, so one copy covers them all
        // (memcpy is declared in <cstring>)
        memcpy(outputValue, inputValue, count * sizeof(float));
        return count;
    }

    return 0;
}

In this example, we work on and index the raw data directly, since all the data for the provided instances is guaranteed to be contiguous. This is the fastest and preferred method.

Writing an OGN computeVectorized

When using the OGN framework, you can implement the following static method instead of a regular compute:

size_t computeVectorized(OgnMyNodeDatabase& db, size_t count)

In the same way as the ABI version, this function receives the number of instances expected to be computed. There are also 3 ways to access and iterate over the attribute data of all the provided instances when using OGN. As with the ABI version, each one provides a different access pattern that returns the same data, and they can be freely mixed. If for any reason you need to use an ABI function that requires the instance index, be sure to pass db.getCurrentInstanceIndex() for this argument.

Let’s consider the same passthrough node, but let’s look at how it could be implemented using the OGN wrappers.

Here is what its OGN definition could look like:

{
    "TutorialVectorizedPassThrough": {
        "version": 1,
        "description": "Simple passthrough node that copy its input to its output in a vectorized way",
        "categories": "tutorials",
        "uiName": "Tutorial Node: Vectorized Passthrough",
        "inputs":{
            "value":{
                "type": "float",
                "description": "input value"
            }
        },
        "outputs": {
            "value": {
                "type": "float",
                "description": "output value"
            }
        },
        "tests" : [
            { "inputs:value": 1, "outputs:value": 1 },
            { "inputs:value": 2, "outputs:value": 2 },
            { "inputs:value": 3, "outputs:value": 3 },
            { "inputs:value": 4, "outputs:value": 4 }
        ]
    }
}

Here is how its regular compute function could be implemented:

static bool compute(OgnTutorialVectorizedPassthroughDatabase& db)
{
    db.outputs.value() = db.inputs.value();
    return true;
}

Now let’s look at the vectorized versions.

Method #1: shift the entire database to the next instance

static size_t computeVectorized(OgnTutorialVectorizedPassthroughDatabase& db, size_t count)
{
    for (size_t idx = 0; idx < count; ++idx)
    {
        db.outputs.value() = db.inputs.value();
        db.moveToNextInstance();
    }
    return count;
}

As shown here, calling db.moveToNextInstance() lets us keep the core of the node's compute unchanged. This is the default loop that the OGN framework runs for you if you don’t implement computeVectorized in your node:

// Pseudo code of the loop performed by the framework for OGN nodes that don't implement a computeVectorized
for(size_t idx = 0; idx < count; ++idx)
{
    OgnNode::compute(db);
    db.moveToNextInstance();
}
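The pseudo code above can be turned into a small runnable model. This is only a sketch: ToyDatabase, input(), output() and runBatch are hypothetical stand-ins and not the generated OGN database or the framework's real loop. Counting the successful compute calls is also an assumption of this sketch, chosen to match the size_t return value of computeVectorized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generated OGN database; NOT the real class.
struct ToyDatabase
{
    std::vector<float> inputs;  // inputs:value for every instance in the batch
    std::vector<float> outputs; // outputs:value for every instance in the batch
    std::size_t current;        // instance the database currently points to

    float input() const { return inputs[current]; }
    float& output() { return outputs[current]; }
    void moveToNextInstance() { ++current; }
};

// Single-instance compute, written without any knowledge of batching.
inline bool compute(ToyDatabase& db)
{
    db.output() = db.input();
    return true;
}

// Models the default loop: compute the pointed-to instance, then shift the database.
inline std::size_t runBatch(ToyDatabase& db, std::size_t count)
{
    assert(db.inputs.size() == db.outputs.size());
    assert(db.current + count <= db.inputs.size());
    std::size_t succeeded = 0;
    for (std::size_t idx = 0; idx < count; ++idx)
    {
        if (compute(db))
            ++succeeded;
        db.moveToNextInstance();
    }
    return succeeded;
}
```

The point of the model is that compute never sees an index: shifting the database is what makes the same single-instance code land on the next instance's data.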

Method #2: Access individual attribute data with an instance offset

static size_t computeVectorized(OgnTutorialVectorizedPassthroughDatabase& db, size_t count)
{
    for (size_t idx = 0; idx < count; ++idx)
        db.outputs.value(idx) = db.inputs.value(idx);
    return count;
}

In this case, the database keeps pointing to the current active instance, and each instance's attribute data is accessed by passing an additional offset (relative to the database) to the attribute data accessor operator().
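The difference from shifting the database can be illustrated with a toy attribute whose accessor takes an offset. ToyAttribute and copyWithOffsets are hypothetical names for this sketch; the real accessor is generated by OGN and is only imitated here in shape (an operator() taking an offset that is added to the instance the database points to).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one attribute of a generated OGN database; NOT the real class.
struct ToyAttribute
{
    std::vector<float> storage; // one value per instance, contiguous
    std::size_t current;        // instance the owning database points to

    // Imitates the shape of operator()(offset): the offset is added to `current`.
    float& operator()(std::size_t offset = 0)
    {
        assert(current + offset < storage.size());
        return storage[current + offset];
    }
};

// Copies `count` instances using offsets only; `current` is never modified.
inline std::size_t copyWithOffsets(ToyAttribute& in, ToyAttribute& out, std::size_t count)
{
    for (std::size_t idx = 0; idx < count; ++idx)
        out(idx) = in(idx);
    return count;
}
```

After the loop, current is unchanged in both attributes, so any later access without an offset still refers to the first instance of the batch.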

Method #3: Retrieve the full range in a single call

static size_t computeVectorized(OgnTutorialVectorizedPassthroughDatabase& db, size_t count)
{
    // Ranges covering the data of all instances in the batch
    auto spanIn = db.inputs.value.vectorized(count);
    auto spanOut = db.outputs.value.vectorized(count);

    // Copy no more bytes than the smaller range holds
    // (memcpy is declared in <cstring>, std::min in <algorithm>)
    memcpy(spanOut.data(), spanIn.data(), std::min(spanIn.size_bytes(), spanOut.size_bytes()));

    return count;
}

In this example, the vectorized() accessor on the attribute returns the full range of data for all instances. This is the preferred method when using advanced instructions (such as SIMD) or low-level functions (such as memcpy), as it gives direct access to the raw data.
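A passthrough reduces to a single memcpy, but the same contiguous ranges serve any element-wise operation. The sketch below assumes the two pointers are the data() pointers of ranges obtained as in Method #3; clampBatch, lo and hi are hypothetical and not part of OGN.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of OGN: clamps every instance's value to [lo, hi].
// `in` and `out` stand for the first elements of the input and output ranges of the batch.
inline std::size_t clampBatch(const float* in, float* out, std::size_t count, float lo, float hi)
{
    assert(lo <= hi);
    if (!in || !out)
        return 0; // nothing was computed

    // A plain loop over contiguous floats with no per-instance API call;
    // this form is a candidate for compiler auto-vectorization.
    for (std::size_t i = 0; i < count; ++i)
        out[i] = std::min(std::max(in[i], lo), hi);

    return count;
}
```

Inside a computeVectorized, the return value of such a helper could be returned directly as the number of instances computed.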