This section contains a quick and somewhat contrived demonstration that illustrates a number of basic Enoki features. The remainder of the documentation contains a more systematic introduction of this functionality.
Consider the function
which converts a linear color value into one that can be displayed on a modern display following the sRGB standard—this is known as gamma correction. A standard C++ implementation of this function might look as follows:
float srgb_gamma(float x) {
if (x <= 0.0031308f)
return x * 12.92f;
else
return std::pow(x * 1.055f, 1.f / 2.4f) - 0.055f;
}
A Enoki implementation of the same computation first replaces float
by a
generic Value
type. The second difference is that scalar conditionals are
replaced by generalized expressions involving masks.
template <typename Value> Value srgb_gamma(Value x) {
return enoki::select(
x <= 0.0031308f,
x * 12.92f,
enoki::pow(x * 1.055f, 1.f / 2.4f) - 0.055f
);
}
The simple generalization from normal C++ code to an Enoki function template
enables a number of interesting applications. For instance, the function
automatically extends to cases where Value
is a color type with three
components, in which case the arithmetic operations recursively thread through
the array.
using Color3f = enoki::Array<float, 3>;
Color3f input = /* ... */;
Color3f output = srgb_gamma(input);
Arrays can be nested arbitrarily: the following snippet declares a 16-wide
FloatP
“packet” type (hence the “P” suffix) and uses it to construct a
a new type storing 16 colors that will all be processed in parallel.
using FloatP = enoki::Array<float, 16>;
using Color3fP = enoki::Array<FloatP, 3>;
Color3fP input = /* ... */;
Color3fP output = srgb_gamma(input);
If this code is compiled on a machine supporting the SSE4.2, AVX, AVX2, or AVX512 instructions set extensions, these vector instructions will be leveraged to carry out the computation more efficiently.
Vectorization is not restricted to the CPU—for instance, the following type declarations create a special array that is resident in GPU memory. In this mode of operation, Enoki relies on an internal just-in-time compiler to generate efficient CUDA kernels on the fly.
using FloatC = enoki::CUDAArray<float>;
using Color3fC = enoki::Array<FloatC, 3>;
Color3fC input = /* ... */;
Color3fC output = srgb_gamma(input);
Enoki’s CUDAArray<T>
type applies an important optimization that leads to
significantly improved performance: in contrast to the previous examples, the
function call srgb_gamma(input)
now merely records the sequence of
computations that is needed to determine the value of output
but does not
yet execute it.
Eventually, this evaluation can no longer be postponed (e.g. when we try to
access or print the array contents). At this point, Enoki’s JIT backend
compiles and executes a kernel that contains all queued computations using
NVIDIA’s PTX intermediate representation. All of this happens automatically: in
particular, no CUDA-specific rewrite (e.g. to nvcc
compatible kernels) of
the program is necessary!
Enoki can also apply transparent forward or reverse-mode automatic
differentiation to a program using a special enoki::DiffArray<T>
array that
wraps a number type or another Enoki array T
.
For instance, the following example computes the gradient of a loss function that measures L2 distance from a given gamma-corrected color value. Both primal and gradient-related computations involve GPU-resident arrays, and the resulting computation is queued up as in the previously example using Enoki’s just-in-time compiler.
using FloatC = enoki::CUDAArray<float>;
using FloatD = enoki::DiffArray<FloatC>;
using Color3fD = enoki::Array<FloatD, 3>;
Color3fD input = /* ... */;
enoki::set_requires_gradient(input);
Color3fD output = srgb_gamma(input);
FloatD loss = enoki::norm(output - Color3fD(.1f, .2f, .3f));
enoki::backward(loss);
std::cout << enoki::gradient(input) << std::endl;
All Enoki functions also accept non-array arguments, hence the original scalar implementation remains available:
float input = /* ... */;
float output = srgb_gamma(input);
Modern C++ systems often strive to provide fine-grained Python bindings to facilitate rapid prototyping and interoperability with other software. Enoki is designed to work with the widely used pybind11 library (itself based on template metaprogramming) to facilitate this. Exposing an Enoki function on the Python side is usually a 1-liner, even for the “fancy” GPU+autodiff variants, as in the following example:
/// Create python bindings with 2 overloads (here, 'm' is a py::module)
m.def("srgb_gamma", &srgb_gamma<float>);
m.def("srgb_gamma", &srgb_gamma<Color3fD>);
In summary: Enoki, along with a generalized template implementation of a computation enables several powerful transformations:
There are many missing pieces that weren’t discussed in this basic overview: how to handle more complex control flow, types and data structures, virtual method calls, and so on. The remainder of this documentation provides a more systematic overview of these topics.