Performance Optimization Tips

Implementing a computeVectorized function

When using instancing, the framework stores the data for all instances of a given graph in contiguous arrays, one per attribute. This layout provides data locality, which reduces cache misses, and it also makes it possible to write the compute function in a vectorized way. In that case, the node writer receives a full batch of instances at once. This allows several instances to be processed at the same time (for example with SIMD instructions), lets computations that are invariant across instances be factored out of the loop, and allows tight, efficient loops around the core of the actual computation. This technique can bring large performance gains and makes each additional instance very cheap to compute. Details on how to write a computeVectorized function can be found in Writing a computeVectorized function.
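The difference between a per-instance compute and a batched one can be sketched in plain C++. This is only an illustration of the idea, not the framework's API: the names computeOne and computeBatch, their signatures, and the choice of an angle shared by all instances are assumptions made for the example. The real entry point and data accessors are described in Writing a computeVectorized function.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical per-instance compute: the caller invokes it once per
// instance, so std::cos(angle) is re-evaluated on every call.
void computeOne(float angle, float in, float& out)
{
    out = in * std::cos(angle) + 1.0f;
}

// Hypothetical batched compute: `in` and `out` point to contiguous
// per-attribute arrays holding `count` instances. `angle` is assumed
// to be the same for every instance in the batch.
std::size_t computeBatch(float angle, const float* in, float* out, std::size_t count)
{
    // Invariant work is hoisted out of the loop and done once per batch.
    const float scale = std::cos(angle);

    // A tight loop over contiguous data, which a compiler can usually
    // auto-vectorize with SIMD instructions.
    for (std::size_t i = 0; i < count; ++i)
    {
        out[i] = in[i] * scale + 1.0f;
    }

    // Report how many instances were processed.
    return count;
}
```

In the batched version, the cost of adding one more instance is a single extra loop iteration, while the cosine is evaluated once no matter how many instances are in the batch.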