Warning
This section is a work in progress.
The function signatures provided in this section are simplified to aid readability. For instance, the array addition operator
hides the fact that
The function supports mixed arguments with builtin broadcasting. For instance, the following is legal
Array<float, 4> x = ...;
Array<Array<double, 8>, 4> y = ...;
auto z = x + y;
The entries of x
are broadcasted and promoted in the above example, and
the type of z
is thus Array<Array<double, 8>, 4>
. See the section on
broadcasting rules for details.
The operator uses SFINAE (std::enable_if
) so that it only becomes active
when x
or y
are Enoki arrays.
The operator replicates the C++ typing rules. Recall that adding two
float&
references in standard C++ produces a float
temporary (i.e.
not a reference) holding the result of the operation.
Analogously, z
is of type Array<float, 4>
in the example below.
Array<float&, 4> x = ...;
Array<float&, 4> y = ...;
auto z = x + y;
Writing out these type transformation rules in every function definition would make for tedious reading, hence the simplifications. If in doubt, please look at the source code of Enoki.
ENOKI_VERSION_MAJOR
¶Integer value denoting the major version of the Enoki release.
ENOKI_VERSION_MINOR
¶Integer value denoting the minor version of the Enoki release.
ENOKI_VERSION_PATCH
¶Integer value denoting the patch version of the Enoki release.
ENOKI_VERSION
¶Enoki version string (e.g. "0.1.2"
).
ENOKI_LIKELY
(condition)¶Signals that the branch is almost always taken, which can be used for improved code layout if supported by the compiler. An example is shown below:
if (ENOKI_LIKELY(x > 0)) {
/// ....
}
ENOKI_UNLIKELY
(condition)¶Signals that the branch is rarely taken analogous to
ENOKI_LIKELY()
.
ENOKI_UNROLL
¶Cross-platform mechanism for asking the compiler to unroll a loop. The
macro should be placed before the for
statement.
ENOKI_NOUNROLL
¶Cross-platform mechanism for asking the compiler to never unroll a loop
analogous to ENOKI_UNROLL()
.
ENOKI_INLINE
¶Cross-platform mechanism for asking the compiler to always inline a function. The macro should be placed in front of the function declaration.
ENOKI_INLINE void foo() { ... }
ENOKI_NOINLINE
¶Cross-platform mechanism for asking the compiler to never inline a
function analogous to ENOKI_INLINE()
.
has_avx512dq
¶Specifies whether AVX512DQ instructions are available on the target architecture.
has_avx512vl
¶Specifies whether AVX512VL instructions are available on the target architecture.
has_avx512bw
¶Specifies whether AVX512BW instructions are available on the target architecture.
has_avx512cd
¶Specifies whether AVX512CD instructions are available on the target architecture.
has_avx512pf
¶Specifies whether AVX512PF instructions are available on the target architecture.
has_avx512er
¶Specifies whether AVX512ER instructions are available on the target architecture.
has_avx512vpopcntdq
¶Specifies whether AVX512VPOPCNTDQ instructions are available on the target architecture.
has_avx512f
¶Specifies whether AVX512F instructions are available on the target architecture.
has_avx2
¶Specifies whether AVX2 instructions are available on the target architecture.
has_avx
¶Specifies whether AVX instructions are available on the target architecture.
has_fma
¶Specifies whether fused multiply-add instructions are available on the target architecture (ARM & x86).
has_f16c
¶Specifies whether F16C instructions are available on the target architecture.
has_sse42
¶Specifies whether SSE 4.2 instructions are available on the target architecture.
has_x86_32
¶Specifies whether the target architecture is x86, 32 bit.
has_x86_64
¶Specifies whether the target architecture is x86, 64 bit.
has_arm_neon
¶Specifies whether ARM NEON instructions are available on the target architecture.
has_arm_32
¶Specifies whether the target architecture is a 32-bit ARM processor (armv7).
has_arm_64
¶Specifies whether the target architecture is aarch64 (armv8+).
max_packet_size
¶Denotes the maximal packet size (in bytes) that can be mapped to native vector registers. It is equal to 64 if AVX512 is present, 32 if AVX is present, and 16 for machines with SSE 4.2 or ARM NEON.
array_default_size
¶Denotes the default size of Enoki arrays. Equal to max_packet_size / 4
.
template <typename Value, size_t Size = array_default_size>
Array
: StaticArrayImpl<Value, Size, Array<Value, Size>>¶The default Enoki array class – a generic container that stores a
fixed-size array of an arbitrary data type similar to the standard template
library class std::array
. The main distinction between the two is that
enoki::Array
forwards all arithmetic operations (and other
standard mathematical functions) to the contained elements.
It has several template parameters:
typename Value
: the type of individual array entries.size_t Size
: the number of packed array entries.This class is just a small wrapper that instantiates
enoki::StaticArrayImpl
using the Curiously Recurring Template
Pattern (CRTP). The latter provides the actual machinery that is needed to
evaluate array expressions. See Defining custom array types for details.
template <typename Value, size_t Size = array_default_size>
Packet
: StaticArrayImpl<Value, Size, Array<Value, Size>>¶The Packet
type is identical to enoki::Array
except for
its broadcasting behavior.
Value
, size_t Size
, typename Derived
>StaticArrayImpl
¶This base class provides the core implementation of an Enoki array. It
cannot be instantiated directly and is used via the Curiously Recurring
Template Pattern (CRTP). See Array
and Defining custom array types
for details on how to create custom array types.
StaticArrayImpl
()¶Create an unitialized array. Floating point arrays are initialized
using std::numeric_limits<Value>::quiet_NaN()
when the application
is compiled in debug mode.
Args
>StaticArrayImpl
(Args... args)¶Initialize the individual array entries with args
(where
sizeof...(args) == Size
).
Value2
, typename Derived2
>StaticArrayImpl
(const StaticArrayImpl<Value2, Size, Derived2> &other)¶Initialize the array with the contents of another given array that potentially has a different underlying type. Enoki will perform a vectorized type conversion if this is supported by the target processor.
size
() const¶Returns the size of the array.
operator[]
(size_t index) const¶Return a reference to an array element (const version). When the
application is compiled in debug mode, the function performs a range
check and throws std::out_of_range
in case of an out-of-range
access. This behavior can be disabled by defining
ENOKI_DISABLE_RANGE_CHECK
.
operator[]
(size_t index)¶Return a reference to an array element. When the application is
compiled in debug mode, the function performs a range check and throws
std::out_of_range
in case of an out-of-range access. This behavior
can be disabled by defining ENOKI_DISABLE_RANGE_CHECK
.
coeff
(size_t index) const¶Just like operator[]()
, but without the range check (const
version).
coeff
(size_t index)¶Just like operator[]()
, but without the range check.
alloc
(size_t size)¶Allocates size
bytes of memory that are sufficiently aligned so that
any Enoki array can be safely stored at the returned address.
Array
>load
(const void *mem, mask_t<Array> mask = true)¶Loads an array of type Array
from the memory address mem
(which is
assumed to be aligned on a multiple of alignof(Array)
bytes). No loads
are performed for entries whose mask bit is false
—instead, these
entries are initialized with zero.
Warning
Performing an aligned load from an unaligned memory address will cause a general protection fault that immediately terminates the application.
Note
When the mask
parameter is specified, the function implements a
masked load, which is fairly slow on machines without the AVX512
instruction set.
Array
>load_unaligned
(const void *mem, mask_t<Array> mask = true)¶Loads an array of type Array
from the memory address mem
(which is
not required to be aligned). No loads are performed for entries whose mask
bit is false
—instead, these entries are initialized with zero.
Note
When the mask
parameter is specified, the function implements a
masked load, which is fairly slow on machines without the AVX512
instruction set.
Array
>store
(const void *mem, Array array, mask_t<Array> mask = true)¶Stores an array of type Array
at the memory address mem
(which is
assumed to be aligned on a multiple of alignof(Array)
bytes). No stores
are performed for entries whose mask bit is false
.
Warning
Performing an aligned storefrom an unaligned memory address will cause a general protection fault that immediately terminates the application.
Note
When the mask
parameter is specified, the function implements a
masked store, which is fairly slow on machines without the AVX512
instruction set.
Array
>store_unaligned
(const void *mem, Array array, mask_t<Array> mask = true)¶Stores an array of type Array
at the memory address mem
(which is
not required to be aligned). No stores are performed for entries whose mask
bit is false
.
Note
When the mask
parameter is specified, the function implements a
masked store, which is fairly slow on machines without the AVX512
instruction set.
Array
, size_t Stride
= sizeof(scalar_t<Array>), typename Index
>gather
(const void *mem, Index index, mask_t<Array> mask = true)¶Loads an array of type Array
using a masked gather operation. This is
equivalent to the following scalar loop (which is mapped to efficient
hardware instructions if supported by the target hardware).
Array result;
for (size_t i = 0; i < Array::Size; ++i)
if (mask[i])
result[i] = ((Value *) mem)[index[i]];
else
result[i] = Value(0);
The index
parameter must be a 32 or 64 bit integer array having the
same number of entries. It will be interpreted as a signed array regardless
of whether the provided array is signed or unsigned.
The default value of the Stride
parameter indicates that the data at
mem
uses a packed memory layout (i.e. a stride value of
sizeof(Value)
); other values override this behavior.
Stride
= 0, typename Array
, typename Index
>scatter
(const void *mem, Array array, Index index, mask_t<Array> mask = true)¶Stores an array of type Array
using a scatter operation. This is
equivalent to the following scalar loop (which is mapped to efficient
hardware instructions if supported by the target hardware).
for (size_t i = 0; i < Array::Size; ++i)
if (mask[i])
((Value *) mem)[index[i]] = array[i];
The index
parameter must be a 32 or 64 bit integer array having the
same number of entries. It will be interpreted as a signed array regardless
of whether the provided array is signed or unsigned.
The default value of the Stride
parameter indicates that the data at
mem
uses a packed memory layout (i.e. a stride value of
sizeof(Value)
); other values override this behavior.
Array
, bool Write
= false, size_t Level
= 2, size_t Stride
= sizeof(scalar_t<Array>), typename Index
>prefetch
(const void *mem, Index index, mask_t<Array> mask = true)¶Pre-fetches an array of type Array
into the L1 or L2 cache (as
indicated via the Level
template parameter) to reduce the latency of a
future gather or scatter operation. If Write = true
, the
the associated cache line should be acquired for write access (i.e. a
scatter rather than a gather operation).
The index
parameter must be a 32 or 64 bit integer array having the
same number of entries. It will be interpreted as a signed array regardless
of whether the provided array is signed or unsigned.
If provided, the mask parameter specifies which of the pre-fetches should actually be performed.
The default value of the Stride
parameter indicates that the data at
mem
uses a packed memory layout (i.e. a stride value of
sizeof(Value)
); other values override this behavior.
Array
, typename Index
>scatter_add
(const void *mem, Array array, Index index, mask_t<Array> mask = true)¶Performs a scatter-add operation that is equivalent to the following scalar loop (mapped to efficient hardware instructions if supported by the target hardware).
for (size_t i = 0; i < Array::Size; ++i)
if (mask[i])
((Value *) mem)[index[i]] += array[i];
The implementation avoids conflicts in case multiple indices refer to the same entry.
Output
, typename Input
, typename Mask
>compress
(Output output, Input input, Mask mask)¶Tightly packs the input values selected by a provided mask and writes them
to output
, which must be a pointer or a structure of pointers. See the
advanced topics section with regards to usage. The
function returns count(mask)
and also advances the pointer by this
amount.
Array
, typename Index
, typename Mask
, typename Func
, typename ...Args
>transform
(scalar_t<Array> *mem, Index index, Mask mask, Func func, Args&&... args)¶Transforms referenced entries at mem
by the function func
while
avoiding potential conflicts. The variadic template arguments args
are
forwarded to the function. The pseudocode for this operation is
for (size_t i = 0; i < Array::Size; ++i) {
if (mask[i])
func(mem[index], args...);
}
See the section on the histogram problem and conflict detection on how to use this function.
Note
To efficiently perform the transformation at the hardware level, the
Index
and Array
types should occupy the same size. The
implementation ensures that this is the case by performing an explicit
cast of the index parameter to int_array_t<Array>
.
DArray
>empty
(size_t size)¶Allocates and returns a dynamic array of type DArray
of size size
.
The array contents are uninitialized.
Array
>zero
()¶Returns a static array filled with zeros. This is analogous to writing
Array(0)
but makes it more explicit to the compiler that a specific
efficient instruction sequence should be used for zero-initialization.
DArray
>zero
(size_t size)¶Allocates and returns a dynamic array of type DArray
that is filled
with zeros.
Array
>arange
()¶Return an array initialized with an index sequence, i.e. 0, 1, .., Array::Size-1
.
DArray
>arange
(size_t size)¶Allocates and returns a dynamic array of type DArray
that is filled an
index sequence 0..size-1
.
Array
>linspace
(scalar_t<Array> min, scalar_t<Array> max)¶Return an array initialized with linear linearly spaced entries including
the endpoints min
and max
.
DArray
>linspace
(scalar_t<DArray> min, scalar_t<DArray> max, size_t size)¶Allocates and returns a dynamic array initialized with size
linear
linearly spaced entries including the endpoints min
and max
.
DArray
>meshgrid
(const DArray &x, const DArray &y)¶Creates a 2D coordinate array containing all pairs of entries from the
x
and y
arrays. Analogous to the meshgrid
function in NumPy.
using FloatP = Array<float>;
using FloatX = DynamicArray<FloatP>;
auto x = linspace<FloatX>(0.f, 1.f, 4);
auto y = linspace<FloatX>(2.f, 3.f, 4);
Array<FloatX, 2> grid = meshgrid(x, y);
std::cout << grid << std::endl;
/* Prints:
[[0, 2],
[0.333333, 2],
[0.666667, 2],
[1, 2],
[0, 2.33333],
[0.333333, 2.33333],
[0.666667, 2.33333],
[1, 2.33333],
[0, 2.66667],
[0.333333, 2.66667],
[0.666667, 2.66667],
[1, 2.66667],
[0, 3],
[0.333333, 3],
[0.666667, 3],
[1, 3]]
*/
Predicate
, typename Index
>binary_search
(scalar_t<Index> start, scalar_t<Index> end, Predicate pred)¶Perform binary search over a range given a predicate pred
, which
monotonically decreases over this range (i.e. max one true
-> false
transition).
Given a (scalar) start
and end
index of a range, this function
evaluates a predicate floor(log2(end-start) + 1)
times with index
values on the interval [start, end] (inclusive) to find the first index
that no longer satisfies it. Note that the template parameter Index
is
automatically inferred from the supplied predicate. Specifically, the
predicate takes an index or an index vector of type Index
as input
argument. In the vectorized case, each vector lane can thus evaluate a
different predicate. When pred
is false
for all entries, the
function returns start
, and when it is true
for all cases, it
returns end
.
The following code example shows a typical use case: array
contains a
sorted list of floating point numbers, and the goal is to map floating
point entries of x
to the first index j
such that array[j] >= x
(and all of this of course in parallel for each vector element).
std::vector<float> array = ...;
Float32P x = ...;
UInt32P j = binary_search(
0,
array.size() - 1,
[&](UInt32P index) {
return gather(array.data(), index) < x;
}
);
Array
>operator*=
(Array &x, Array y)¶Compound assignment operator for multiplication. Concerning Array
types
with non-commutative multiplication: the operation expands to x = x * y;
.
Array
>mulhi
(Array x, Array y)¶Returns the high part of an integer multiplication. For 32-bit scalar input, this is e.g. equivalent to the following expression
(int32_t) (((int64_t) x * (int64_t) y) >> 32);
Array
>operator/
(Array x, Array y)¶Binary division operator. A special overload to multiply by the reciprocal when the second argument is a scalar.
Integer division is handled specially, see Vectorized integer division by constants for details.
Array
>operator||
(Array x, Array y)¶Binary logical OR operator (identical to operator|
, as no
short-circuiting is supported in operator overloads).
Array
>operator&&
(Array x, Array y)¶Binary logical AND operator. (identical to operator&
, as no
short-circuiting is supported in operator overloads).
Array
>andnot
(Array x, Array y)¶Binary logical AND NOT operator. (identical to x & ~y
).
Array
>operator<=
(Array x, Array y)¶Less-than-or-equal comparison operator.
Array
>operator>
(Array x, Array y)¶Greater-than comparison operator.
Array
>operator>=
(Array x, Array y)¶Greater-than-or-equal comparison operator.
Array
>neq
(Array x, Array y)¶Inequality operator (vertical operation).
Imm
, typename Array
>rol
(Array x)¶Left shift with rotation by an immediate amount Imm
.
Imm
, typename Array
>ror
(Array x)¶Right shift with rotation by an immediate amount Imm
.
Imm
, typename Array
>ror_array
(Array x)¶Rotate the entire array by Imm
entries towards the right, i.e.
coeff[0]
becomes coeff[Imm]
, etc.
Imm
, typename Array
>rol_array
(Array x)¶Rotate the entire array by Imm
entries towards the left, i.e.
coeff[Imm]
becomes coeff[0]
, etc.
Target
, typename Source
>reinterpret_array
(Source x)¶Reinterprets the bit-level representation of an array (e.g. from
uint32_t
to float
). See the section on reinterpreting array
contents for further details.
Array
>rcp
(Array x)¶Computes the reciprocal \(\frac{1}{x}\). A slightly less accurate (but
more efficient) implementation is used when approximate mode is enabled for
Array
. Relies on AVX512ER instructions if available.
Array
>rsqrt
(Array x)¶Computes the reciprocal square root \(\frac{1}{\sqrt{x}}\). A slightly
less accurate (but more efficient) implementation is used when approximate
mode is enabled for Array
. Relies on AVX512ER instructions if available.
Array
>abs
(Array x)¶Computes the absolute value \(|x|\) (analogous to std::abs
).
Array
>max
(Array x, Array y)¶Returns the maximum of \(x\) and \(y\) (analogous to std::max
).
Array
>min
(Array x, Array y)¶Returns the minimum of \(x\) and \(y\) (analogous to std::min
).
Array
>sign
(Array x)¶Computes the signum function \(\begin{cases}1,&\mathrm{if}\ x\ge 0\\-1,&\mathrm{otherwise}\end{cases}\)
Analogous to std::copysign(1.f, x)
.
Array
>copysign
(Array x, Array y)¶Copies the sign of the array y
to x
(analogous to std::copysign
).
Array
>mulsign_neg
(Array x, Array y)¶Efficiently multiplies x
by the sign of -y
.
Array
>sqrt
(Array x)¶Computes the square root of \(x\) (analogous to std::sqrt
).
Array
>cbrt
(Array x)¶Computes the cube root of \(x\) (analogous to std::cbrt
).
Array
>hypot
(Array x, Array y)¶Computes \(\sqrt{x^2+y^2}\) while avoiding overflow and underflow.
Array
>ceil
(Array x)¶Computes the ceiling of \(x\) (analogous to std::ceil
).
Array
>floor
(Array x)¶Computes the floor of \(x\) (analogous to std::floor
).
Target
, typename Array
>ceil2int
(Array x)¶Computes the ceiling of \(x\) and converts the result to an integer. If
supported by the hardware, the combined operation is more efficient than
the analogous expression Target(ceil(x))
.
Target
, typename Array
>floor2int
(Array x)¶Computes the floor of \(x\) and converts the result to an integer. If
supported by the hardware, the combined operation is more efficient than
the analogous expression Target(floor(x))
.
Array
>round
(Array x)¶Rounds \(x\) to the nearest integer using Banker’s rounding for half-way values.
Note
This is analogous to std::rint
, not std::round
.
Array
>fmod
(Array x, Array y)¶Computes the floating-point remainder of the division operation x/y
Array
>fmadd
(Array x, Array y, Array z)¶Performs a fused multiply-add operation if supported by the target
hardware. Otherwise, the operation is emulated using conventional
multiplication and addition (i.e. x * y + z
).
Array
>fnmadd
(Array x, Array y, Array z)¶Performs a fused negative multiply-add operation if supported by the target
hardware. Otherwise, the operation is emulated using conventional
multiplication and addition (i.e. -x * y + z
).
Array
>fmsub
(Array x, Array y, Array z)¶Performs a fused multiply-subtract operation if supported by the target
hardware. Otherwise, the operation is emulated using conventional
multiplication and subtraction (i.e. x * y - z
).
Array
>fnmsub
(Array x, Array y, Array z)¶Performs a fused negative multiply-subtract operation if supported by the
target hardware. Otherwise, the operation is emulated using conventional
multiplication and subtraction (i.e. -x * y - z
).
Array
>fmaddsub
(Array x, Array y, Array z)¶Performs a fused multiply-add and multiply-subtract operation for alternating elements. The pseudocode for this operation is
Array result;
for (size_t i = 0; i < Array::Size; ++i) {
if (i % 2 == 0)
result[i] = x[i] * y[i] - c[i];
else
result[i] = x[i] * y[i] + c[i];
}
Array
>fmsubadd
(Array x, Array y, Array z)¶Performs a fused multiply-add and multiply-subtract operation for alternating elements. The pseudocode for this operation is
Array result;
for (size_t i = 0; i < Array::Size; ++i) {
if (i % 2 == 0)
result[i] = x[i] * y[i] + c[i];
else
result[i] = x[i] * y[i] - c[i];
}
Array
>ldexp
(Array x, Array n)¶Multiplies \(x\) by \(2^n\). Analogous to std::ldexp
except
that n
is a floating point argument.
Array
>frexp
(Array x)¶Breaks the floating-point number \(x\) into a normalized fraction and
power of 2. Analogous to std::frexp
except that both return values are
floating point values.
Array
>operator==
(Array x, Array y)¶Equality operator.
Warning
Following the principle of least surprise,
enoki::operator==()
is a horizontal operations that returns a
boolean value; a vertical alternatives named eq()
is also
available. The following pair of operations is equivalent:
bool b1 = (f1 == f2);
bool b2 = all(eq(f1, f2));
Array
>operator!=
(Array x, Array y)¶Warning
Following the principle of least surprise,
enoki::operator!=()
is a horizontal operations that returns a
boolean value; a vertical alternatives named neq()
is also
available. The following pair of operations is equivalent:
bool b1 = (f1 != f2);
bool b2 = any(neq(f1, f2));
Array
>reverse
(Array value)¶Returns the input array with all components reversed, i.e.
value[Array::Size-1], .., value[0]
For multidimensional arrays, the outermost dimension of Array
are
reversed. Note that this operation is currently not efficiently vectorized
on 1D CPU arrays (though GPU and/or multi-dimensional arrays are fine).
Array
>psum
(Array value)¶Computes the inclusive prefix sum of the components of value
, i.e.
value[0], value[0] + value[1], .., value[0] + .. + value[Array::Size-1];
For multidimensional arrays, the horizontal reduction is performed over the
outermost dimension of Array
. Note that this operation is currently
not efficiently vectorized on 1D CPU arrays (though GPU and/or
multi-dimensional arrays are fine).
Array
>hsum
(Array value)¶Efficiently computes the horizontal sum of the components of value
, i.e.
value[0] + .. + value[Array::Size-1];
For 1D arrays, hsum()
returns a scalar result. For multidimensional
arrays, the horizontal reduction is performed over the outermost dimension
of Array
, and the result is of type value_t<Array>
.
Array
>hsum_inner
(Array value)¶Analogous to hsum()
, exept that the horizontal reduction is
performed over the innermost dimension of Array
(which is only
relevant in the case of a multidimensional input array).
Array
>hsum_nested
(Array value)¶Recursive version of hsum()
, which nests through all dimensions
and always returns a scalar.
Array
>hprod
(Array value)¶Efficiently computes the horizontal product of the components of value
, i.e.
value[0] * .. * value[Array::Size-1];
For 1D arrays, hprod()
returns a scalar result. For multidimensional
arrays, the horizontal reduction is performed over the outermost dimension
of Array
, and the result is of type value_t<Array>
.
Array
>hprod_inner
(Array value)¶Analogous to hprod()
, exept that the horizontal reduction is
performed over the innermost dimension of Array
(which is only
relevant in the case of a multidimensional input array).
Array
>hprod_nested
(Array value)¶Recursive version of hprod()
, which nests through all dimensions
and always returns a scalar.
Array
>hmax
(Array value)¶Efficiently computes the horizontal maximum of the components of value
, i.e.
max(value[0], max(value[1], ...))
For 1D arrays, hmax()
returns a scalar result. For multidimensional
arrays, the horizontal reduction is performed over the outermost dimension
of Array
, and the result is of type value_t<Array>
.
Array
>hmax_inner
(Array value)¶Analogous to hmax()
, exept that the horizontal reduction is
performed over the innermost dimension of Array
(which is only
relevant in the case of a multidimensional input array).
Array
>hmax_nested
(Array value)¶Recursive version of hmax()
, which nests through all dimensions
and always returns a scalar.
Array
>hmin
(Array value)¶Efficiently computes the horizontal minimum of the components of value
, i.e.
min(value[0], min(value[1], ...))
For 1D arrays, hmin()
returns a scalar result. For multidimensional
arrays, the horizontal reduction is performed over the outermost dimension
of Array
, and the result is of type value_t<Array>
.
Array
>hmin_inner
(Array value)¶Analogous to hmin()
, exept that the horizontal reduction is
performed over the innermost dimension of Array
(which is only
relevant in the case of a multidimensional input array).
Array
>hmin_nested
(Array value)¶Recursive version of hmin()
, which nests through all dimensions
and always returns a scalar.
Mask
>all
(Mask value)¶Efficiently computes the horizontal AND (i.e. logical conjunction) of the
components of the mask value
, i.e.
value[0] & ... & value[Size-1]
For 1D arrays, all()
returns a boolean result. For multidimensional
arrays, the horizontal reduction is performed over the outermost dimension
of Mask
, and the result is of type mask_t<value_t<Mask>>
.
Mask
>all_inner
(Mask value)¶Analogous to all()
, exept that the horizontal reduction is
performed over the innermost dimension of Mask
(which is only
relevant in the case of a multidimensional input array).
Mask
>all_nested
(Mask value)¶Recursive version of all()
, which nests through all dimensions
and always returns a boolean value.
Mask
, bool Default
>all_or
(Mask value)¶This function calls returns the Default
template argument when Mask
is
a GPU array. Otherwise, it falls back all()
. See the section on
horizontal operations on the GPU for
details.
Mask
>any
(Mask value)¶Efficiently computes the horizontal OR (i.e. logical disjunction) of the
components of the mask value
, i.e.
value[0] | ... | value[Size-1]
For 1D arrays, any()
returns a boolean result. For multidimensional
arrays, the horizontal reduction is performed over the outermost dimension
of Mask
, and the result is of type mask_t<value_t<Mask>>
.
Mask
>any_inner
(Mask value)¶Analogous to any()
, exept that the horizontal reduction is
performed over the innermost dimension of Mask
(which is only
relevant in the case of a multidimensional input array).
Mask
>any_nested
(Mask value)¶Recursive version of any()
, which nests through all dimensions
and always returns a boolean value.
Mask
, bool Default
>any_or
(Mask value)¶This function calls returns the Default
template argument when Mask
is
a GPU array. Otherwise, it falls back any()
. See the section on
horizontal operations on the GPU for
details.
Mask
>none
(Mask value)¶Efficiently computes the negated horizontal OR of the components of the
mask value
, i.e.
~(value[0] | ... | value[Size-1])
For 1D arrays, none()
returns a boolean result. For multidimensional
arrays, the horizontal reduction is performed over the outermost dimension
of Mask
, and the result is of type mask_t<value_t<Mask>>
.
Mask
>none_inner
(Mask value)¶Analogous to none()
, exept that the horizontal reduction is
performed over the innermost dimension of Mask
(which is only
relevant in the case of a multidimensional input array).
Mask
>none_nested
(Mask value)¶Recursive version of none()
, which nests through all dimensions
and always returns a boolean value.
Mask
, bool Default
>none_or
(Mask value)¶This function calls returns the Default
template argument when Mask
is
a GPU array. Otherwise, it falls back none()
. See the section on
horizontal operations on the GPU for
details.
Mask
>count
(Mask value)¶Efficiently computes the number of components whose mask bits are turned on, i.e.
(value[0] ? 1 : 0) + ... (value[Size - 1] ? 1 : 0)
For 1D arrays, count()
returns a result of type size_t
. For
multidimensional arrays, the horizontal reduction is performed over the
outermost dimension of Mask
, and the result is of type
size_array_t<value_t<Mask>>
.
Mask
>count_inner
(Mask value)¶Analogous to count()
, exept that the horizontal reduction is
performed over the innermost dimension of Mask
(which is only
relevant in the case of a multidimensional input array).
Mask
>count_nested
(Mask value)¶Recursive version of count()
, which nests through all dimensions
and always returns a boolean value.
Array
>dot
(Array value1, Array value2)¶Efficiently computes the dot products of value1
and value2
, i.e.:
value1[0]*value2[0] + .. + value1[Array::Size-1]*value2[Array::Size-1];
The return value is of type value_t<Array>
, which is a scalar (e.g.
float
) for ordinary inputs and an array for nested array inputs.
Array
>norm
(Array value)¶Computes the 2-norm of the input array. The return value is of type
value_t<Array>
, which is a scalar (e.g. float
) for ordinary inputs
and an array for nested array inputs.
Most approximations of transcendental functions are based on routines in the CEPHES math library. The table below provides some statistics on their absolute and relative error.
The CEPHES approximations are only used when approximate mode is enabled; otherwise, the functions below will invoke the corresponding non-vectorized standard C library routines.
Note
The forward trigonometric functions (sin, cos, tan) are optimized for low error on the domain \(|x| < 8192\) and don’t perform as well beyond this range.
Function | Tested domain | Abs. error (mean) | Abs. error (max) | Rel. error (mean) | Rel. error (max) |
---|---|---|---|---|---|
\(\mathrm{sin}()\) | \(-8192 < x < 8192\) | \(1.2 \cdot 10^{-8}\) | \(1.2 \cdot 10^{-7}\) | \(1.9 \cdot 10^{-8}\,(0.25\,\mathrm{ulp})\) | \(1.8 \cdot 10^{-6}\,(19\,\mathrm{ulp})\) |
\(\mathrm{cos}()\) | \(-8192 < x < 8192\) | \(1.2 \cdot 10^{-8}\) | \(1.2 \cdot 10^{-7}\) | \(1.9 \cdot 10^{-8}\,(0.25\,\mathrm{ulp})\) | \(3.1 \cdot 10^{-6}\,(47\,\mathrm{ulp})\) |
\(\mathrm{tan}()\) | \(-8192 < x < 8192\) | \(4.7 \cdot 10^{-6}\) | \(8.1 \cdot 10^{-1}\) | \(3.4 \cdot 10^{-8}\,(0.42\,\mathrm{ulp})\) | \(3.1 \cdot 10^{-6}\,(30\,\mathrm{ulp})\) |
\(\mathrm{cot}()\) | \(-8192 < x < 8192\) | \(2.6 \cdot 10^{-6}\) | \(0.11 \cdot 10^{1}\) | \(3.5 \cdot 10^{-8}\,(0.42\,\mathrm{ulp})\) | \(3.1 \cdot 10^{-6}\,(47\,\mathrm{ulp})\) |
\(\mathrm{asin}()\) | \(-1 < x < 1\) | \(2.3 \cdot 10^{-8}\) | \(1.2 \cdot 10^{-7}\) | \(2.9 \cdot 10^{-8}\,(0.33\,\mathrm{ulp})\) | \(2.3 \cdot 10^{-7}\,(2\,\mathrm{ulp})\) |
\(\mathrm{acos}()\) | \(-1 < x < 1\) | \(4.7 \cdot 10^{-8}\) | \(2.4 \cdot 10^{-7}\) | \(2.9 \cdot 10^{-8}\,(0.33\,\mathrm{ulp})\) | \(1.2 \cdot 10^{-7}\,(1\,\mathrm{ulp})\) |
\(\mathrm{atan}()\) | \(-1 < x < 1\) | \(1.8 \cdot 10^{-7}\) | \(6 \cdot 10^{-7}\) | \(4.2 \cdot 10^{-7}\,(4.9\,\mathrm{ulp})\) | \(8.2 \cdot 10^{-7}\,(12\,\mathrm{ulp})\) |
\(\mathrm{sinh}()\) | \(-10 < x < 10\) | \(2.6 \cdot 10^{-5}\) | \(2 \cdot 10^{-3}\) | \(2.8 \cdot 10^{-8}\,(0.34\,\mathrm{ulp})\) | \(2.7 \cdot 10^{-7}\,(3\,\mathrm{ulp})\) |
\(\mathrm{cosh}()\) | \(-10 < x < 10\) | \(2.9 \cdot 10^{-5}\) | \(2 \cdot 10^{-3}\) | \(2.9 \cdot 10^{-8}\,(0.35\,\mathrm{ulp})\) | \(2.5 \cdot 10^{-7}\,(4\,\mathrm{ulp})\) |
\(\mathrm{tanh}()\) | \(-10 < x < 10\) | \(4.8 \cdot 10^{-8}\) | \(4.2 \cdot 10^{-7}\) | \(5 \cdot 10^{-8}\,(0.76\,\mathrm{ulp})\) | \(5 \cdot 10^{-7}\,(7\,\mathrm{ulp})\) |
\(\mathrm{csch}()\) | \(-10 < x < 10\) | \(5.7 \cdot 10^{-8}\) | \(7.8 \cdot 10^{-3}\) | \(4.4 \cdot 10^{-8}\,(0.54\,\mathrm{ulp})\) | \(3.1 \cdot 10^{-7}\,(5\,\mathrm{ulp})\) |
\(\mathrm{sech}()\) | \(-10 < x < 10\) | \(6.7 \cdot 10^{-9}\) | \(1.8 \cdot 10^{-7}\) | \(4.3 \cdot 10^{-8}\,(0.54\,\mathrm{ulp})\) | \(3.2 \cdot 10^{-7}\,(4\,\mathrm{ulp})\) |
\(\mathrm{coth}()\) | \(-10 < x < 10\) | \(1.2 \cdot 10^{-7}\) | \(7.8 \cdot 10^{-3}\) | \(6.9 \cdot 10^{-8}\,(0.61\,\mathrm{ulp})\) | \(5.7 \cdot 10^{-7}\,(8\,\mathrm{ulp})\) |
\(\mathrm{asinh}()\) | \(-30 < x < 30\) | \(2.8 \cdot 10^{-8}\) | \(4.8 \cdot 10^{-7}\) | \(1 \cdot 10^{-8}\,(0.13\,\mathrm{ulp})\) | \(1.7 \cdot 10^{-7}\,(2\,\mathrm{ulp})\) |
\(\mathrm{acosh}()\) | \(1 < x < 10\) | \(2.9 \cdot 10^{-8}\) | \(2.4 \cdot 10^{-7}\) | \(1.5 \cdot 10^{-8}\,(0.18\,\mathrm{ulp})\) | \(2.4 \cdot 10^{-7}\,(3\,\mathrm{ulp})\) |
\(\mathrm{atanh}()\) | \(-1 < x < 1\) | \(9.9 \cdot 10^{-9}\) | \(2.4 \cdot 10^{-7}\) | \(1.5 \cdot 10^{-8}\,(0.18\,\mathrm{ulp})\) | \(1.2 \cdot 10^{-7}\,(1\,\mathrm{ulp})\) |
\(\mathrm{exp}()\) | \(-20 < x < 30\) | \(0.72 \cdot 10^{4}\) | \(0.1 \cdot 10^{7}\) | \(2.4 \cdot 10^{-8}\,(0.27\,\mathrm{ulp})\) | \(1.2 \cdot 10^{-7}\,(1\,\mathrm{ulp})\) |
\(\mathrm{log}()\) | \(10^{-20} < x < 2\cdot 10^{30}\) | \(9.6 \cdot 10^{-9}\) | \(7.6 \cdot 10^{-6}\) | \(1.4 \cdot 10^{-10}\,(0.0013\,\mathrm{ulp})\) | \(1.2 \cdot 10^{-7}\,(1\,\mathrm{ulp})\) |
\(\mathrm{erf}()\) | \(-1 < x < 1\) | \(3.2 \cdot 10^{-8}\) | \(1.8 \cdot 10^{-7}\) | \(6.4 \cdot 10^{-8}\,(0.78\,\mathrm{ulp})\) | \(3.3 \cdot 10^{-7}\,(4\,\mathrm{ulp})\) |
\(\mathrm{erfc}()\) | \(-1 < x < 1\) | \(3.4 \cdot 10^{-8}\) | \(2.4 \cdot 10^{-7}\) | \(6.4 \cdot 10^{-8}\,(0.79\,\mathrm{ulp})\) | \(1 \cdot 10^{-6}\,(11\,\mathrm{ulp})\) |
Function | Tested domain | Abs. error (mean) | Abs. error (max) | Rel. error (mean) | Rel. error (max) |
---|---|---|---|---|---|
\(\mathrm{sin}()\) | \(-8192 < x < 8192\) | \(2.2 \cdot 10^{-17}\) | \(2.2 \cdot 10^{-16}\) | \(3.6 \cdot 10^{-17}\,(0.25\,\mathrm{ulp})\) | \(3.1 \cdot 10^{-16}\,(2\,\mathrm{ulp})\) |
\(\mathrm{cos}()\) | \(-8192 < x < 8192\) | \(2.2 \cdot 10^{-17}\) | \(2.2 \cdot 10^{-16}\) | \(3.6 \cdot 10^{-17}\,(0.25\,\mathrm{ulp})\) | \(3 \cdot 10^{-16}\,(2\,\mathrm{ulp})\) |
\(\mathrm{tan}()\) | \(-8192 < x < 8192\) | \(6.8 \cdot 10^{-16}\) | \(1.2 \cdot 10^{-10}\) | \(5.4 \cdot 10^{-17}\,(0.35\,\mathrm{ulp})\) | \(4.1 \cdot 10^{-16}\,(3\,\mathrm{ulp})\) |
\(\mathrm{cot}()\) | \(-8192 < x < 8192\) | \(4.9 \cdot 10^{-16}\) | \(1.2 \cdot 10^{-10}\) | \(5.5 \cdot 10^{-17}\,(0.36\,\mathrm{ulp})\) | \(4.4 \cdot 10^{-16}\,(3\,\mathrm{ulp})\) |
\(\mathrm{asin}()\) | \(-1 < x < 1\) | \(1.3 \cdot 10^{-17}\) | \(2.2 \cdot 10^{-16}\) | \(1.5 \cdot 10^{-17}\,(0.098\,\mathrm{ulp})\) | \(2.2 \cdot 10^{-16}\,(1\,\mathrm{ulp})\) |
\(\mathrm{acos}()\) | \(-1 < x < 1\) | \(5.4 \cdot 10^{-17}\) | \(4.4 \cdot 10^{-16}\) | \(3.5 \cdot 10^{-17}\,(0.23\,\mathrm{ulp})\) | \(2.2 \cdot 10^{-16}\,(1\,\mathrm{ulp})\) |
\(\mathrm{atan}()\) | \(-1 < x < 1\) | \(4.3 \cdot 10^{-17}\) | \(3.3 \cdot 10^{-16}\) | \(1 \cdot 10^{-16}\,(0.65\,\mathrm{ulp})\) | \(7.1 \cdot 10^{-16}\,(5\,\mathrm{ulp})\) |
\(\mathrm{sinh}()\) | \(-10 < x < 10\) | \(3.1 \cdot 10^{-14}\) | \(1.8 \cdot 10^{-12}\) | \(3.3 \cdot 10^{-17}\,(0.22\,\mathrm{ulp})\) | \(4.3 \cdot 10^{-16}\,(2\,\mathrm{ulp})\) |
\(\mathrm{cosh}()\) | \(-10 < x < 10\) | \(2.2 \cdot 10^{-14}\) | \(1.8 \cdot 10^{-12}\) | \(2 \cdot 10^{-17}\,(0.13\,\mathrm{ulp})\) | \(2.9 \cdot 10^{-16}\,(2\,\mathrm{ulp})\) |
\(\mathrm{tanh}()\) | \(-10 < x < 10\) | \(5.6 \cdot 10^{-17}\) | \(3.3 \cdot 10^{-16}\) | \(6.1 \cdot 10^{-17}\,(0.52\,\mathrm{ulp})\) | \(5.5 \cdot 10^{-16}\,(3\,\mathrm{ulp})\) |
\(\mathrm{csch}()\) | \(-10 < x < 10\) | \(4.5 \cdot 10^{-17}\) | \(1.8 \cdot 10^{-12}\) | \(3.3 \cdot 10^{-17}\,(0.21\,\mathrm{ulp})\) | \(5.1 \cdot 10^{-16}\,(4\,\mathrm{ulp})\) |
\(\mathrm{sech}()\) | \(-10 < x < 10\) | \(3 \cdot 10^{-18}\) | \(2.2 \cdot 10^{-16}\) | \(2 \cdot 10^{-17}\,(0.13\,\mathrm{ulp})\) | \(4.3 \cdot 10^{-16}\,(2\,\mathrm{ulp})\) |
\(\mathrm{coth}()\) | \(-10 < x < 10\) | \(1.2 \cdot 10^{-16}\) | \(3.6 \cdot 10^{-12}\) | \(6.2 \cdot 10^{-17}\,(0.3\,\mathrm{ulp})\) | \(6.7 \cdot 10^{-16}\,(5\,\mathrm{ulp})\) |
\(\mathrm{asinh}()\) | \(-30 < x < 30\) | \(5.1 \cdot 10^{-17}\) | \(8.9 \cdot 10^{-16}\) | \(1.9 \cdot 10^{-17}\,(0.13\,\mathrm{ulp})\) | \(4.4 \cdot 10^{-16}\,(2\,\mathrm{ulp})\) |
\(\mathrm{acosh}()\) | \(1 < x < 10\) | \(4.9 \cdot 10^{-17}\) | \(4.4 \cdot 10^{-16}\) | \(2.6 \cdot 10^{-17}\,(0.17\,\mathrm{ulp})\) | \(6.6 \cdot 10^{-16}\,(5\,\mathrm{ulp})\) |
\(\mathrm{atanh}()\) | \(-1 < x < 1\) | \(1.8 \cdot 10^{-17}\) | \(4.4 \cdot 10^{-16}\) | \(3.2 \cdot 10^{-17}\,(0.21\,\mathrm{ulp})\) | \(3 \cdot 10^{-16}\,(2\,\mathrm{ulp})\) |
\(\mathrm{exp}()\) | \(-20 < x < 30\) | \(4.7 \cdot 10^{-6}\) | \(2 \cdot 10^{-3}\) | \(2.5 \cdot 10^{-17}\,(0.16\,\mathrm{ulp})\) | \(3.3 \cdot 10^{-16}\,(2\,\mathrm{ulp})\) |
\(\mathrm{log}()\) | \(10^{-20} < x < 2\cdot 10^{30}\) | \(1.9 \cdot 10^{-17}\) | \(1.4 \cdot 10^{-14}\) | \(2.7 \cdot 10^{-19}\,(0.0013\,\mathrm{ulp})\) | \(2.2 \cdot 10^{-16}\,(1\,\mathrm{ulp})\) |
\(\mathrm{erf}()\) | \(-1 < x < 1\) | \(4.7 \cdot 10^{-17}\) | \(4.4 \cdot 10^{-16}\) | \(9.6 \cdot 10^{-17}\,(0.63\,\mathrm{ulp})\) | \(5.9 \cdot 10^{-16}\,(5\,\mathrm{ulp})\) |
\(\mathrm{erfc}()\) | \(-1 < x < 1\) | \(4.8 \cdot 10^{-17}\) | \(4.4 \cdot 10^{-16}\) | \(9.6 \cdot 10^{-17}\,(0.64\,\mathrm{ulp})\) | \(2.5 \cdot 10^{-15}\,(16\,\mathrm{ulp})\) |
Array
>sin
(Array x)¶Sine function approximation based on the CEPHES library.
Array
>cos
(Array x)¶Cosine function approximation based on the CEPHES library.
Array
>sincos
(Array x)¶Simultaneous sine and cosine function approximation based on the CEPHES library.
Array
>tan
(Array x)¶Tangent function approximation based on the CEPHES library.
Array
>csc
(Array x)¶Cosecant convenience function implemented as rcp(sin(x))
.
Array
>sec
(Array x)¶Cosecant convenience function implemented as rcp(cos(x))
.
Array
>cot
(Array x)¶Cotangent convenience function implemented as rcp(tan(x))
.
Array
>asin
(Array x)¶Arcsine function approximation based on the CEPHES library.
Array
>acos
(Array x)¶Arccosine function approximation based on the CEPHES library.
Array
>sinh
(Array x)¶Hyperbolic sine function approximation based on the CEPHES library.
Array
>cosh
(Array x)¶Hyperbolic cosine function approximation based on the CEPHES library.
Array
>sincosh
(Array x)¶Simultaneous hyperbolic sine and cosine function approximation based on the CEPHES library.
Array
>tanh
(Array x)¶Hyperbolic tangent function approximation based on the CEPHES library.
Array
>csch
(Array x)¶Hyperbolic cosecant convenience function implemented as rcp(sinh(x))
.
Array
>coth
(Array x)¶Hyperbolic cotangent convenience function implemented as rcp(tanh(x))
.
Array
>asinh
(Array x)¶Hyperbolic arcsine function approximation based on the CEPHES library.
Array
>exp
(Array x)¶Natural exponential function approximation based on the CEPHES library. Relies on AVX512ER instructions if available.
Array
>safe_sqrt
(Array x)¶Computes sqrt(max(0, x))
to avoid issues with negative inputs
(e.g. due to roundoff error in a prior calculation).
Array
>safe_rsqrt
(Array x)¶Computes rsqrt(max(0, x))
to avoid issues with negative inputs
(e.g. due to roundoff error in a prior calculation).
The following special functions require including the header
enoki/special.h
.
Array
>erf
(Array x)¶Evaluates the error function defined as
Requires a real-valued input array x
.
Array
>erfi
(Array x)¶Evaluates the imaginary error function defined as
Requires a real-valued input array x
.
Array
>i0e
(Array x)¶Evaluates the exponentially scaled modified Bessel function of order zero defined as
where
Array
>gamma
(Array x)¶Evaluates the Gamma function defined as
Array
>lgamma
(Array x)¶Evaluates the natural logarithm of the Gamma function.
Array
>dawson
(Array x)¶Evaluates Dawson’s integral defined as
Array
>ellint_1
(Array phi, Array k)¶Evaluates the incomplete elliptic integral of the first kind
Array
>comp_ellint_1
(Array k)¶Evaluates the complete elliptic integral of the first kind
Array
>ellint_2
(Array phi, Array k)¶Evaluates the incomplete elliptic integral of the second kind
Array
>comp_ellint_2
(Array k)¶Evaluates the complete elliptic integral of the second kind
Array
>isnan
(Array x)¶Checks for NaN values and returns a mask, analogous to std::isnan
.
Array
>isinf
(Array x)¶Checks for infinite values and returns a mask, analogous to std::isinf
.
Array
>isfinite
(Array x)¶Checks for finite values and returns a mask, analogous to std::isfinite
.
Array
>isdenormal
(Array x)¶Checks for denormalized values and returns a mask.
Array
>deg_to_rad
(Array array)¶Convenience function which multiplies the input array by \(\frac{\pi}{180}\).
Array
>rad_to_deg
(Array array)¶Convenience function which multiplies the input array by \(\frac{180}{\pi}\).
Array
>prev_float
(Array array)¶Return the prev representable floating point value for each element of
array
analogous to std::nextafter(array, -∞)
. Special values
(infinities & not-a-number values) are returned unchanged.
Array
>next_float
(Array array)¶Return the next representable floating point value for each element of
array
analogous to std::nextafter(array, ∞)
. Special values
(infinities & not-a-number values) are returned unchanged.
Array
>tzcnt
(Array array)¶Return the number of trailing zero bits (assumes that Array
is an integer array).
Array
>lzcnt
(Array array)¶Return the number of leading zero bits (assumes that Array
is an integer array).
Array
>popcnt
(Array array)¶Return the number nonzero bits (assumes that Array
is an integer array).
Array
>log2i
(Array array)¶Return the floor of the base-two logarithm (assumes that Array
is an integer array).
Index
>range
(scalar_t<Index> size)¶Returns an iterable, which generates linearly increasing index vectors from
0
to size-1
. This function is meant to be used with the C++11
range-based for loop:
for (auto pair : range<Index>(1000)) {
Index index = pair.first;
mask_t<Index> mask = pair.second;
// ...
}
The mask specifies which index vector entries are active: unless the number of interations is exactly divisible by the packet size, the last loop iteration will generally have several disabled entries.
The implementation also supports iteration of multidimensional arrays
using UInt32P = Packet<uint32_t>;
using Vector3uP = Array<UInt32P, 3>;
for (auto pair : range<Vector3uP>(10, 20, 30)) {
Vector3uP index = pair.first;
mask_t<Index> mask = pair.second;
// ...
}
which will generate indices (0, 0, 0)
, (0, 0, 1)
, etc. As before, the
last loop iteration will generally have several disabled entries.
set_flush_denormals
(bool value)¶Arithmetic involving denormalized floating point numbers triggers a slow microcode handler on most current architectures, which leads to severe performance penalties. This function can be used to specify whether denormalized floating point values are simply flushed to zero, which sidesteps the performance issues.
Enoki also provides a tiny a RAII wrapper named scoped_flush_denormals
which sets (and later resets) this parameter.
flush_denormals
()¶Returns the denormals are flushed to zero (see set_flush_denormals()
).
Index
, typename Array
>shuffle
(Array a)¶Shuffles the contents of an array given indices specified as a template parameter. The pseudocode for this operation is
Array out;
for (size_t i = 0; i<Array::Size; ++i)
out[i] = a[Index[i]];
return out;
Array
, typename Indices
>shuffle
(Array a, Indices ind)¶Shuffles the contents of an array given indices specified via an integer array. The pseudocode for this operation is
Array out;
for (size_t i = 0; i<Array::Size; ++i)
out[i] = a[ind[i]];
return out;
Array1
, typename Array2
>concat
(Array1 a1, Array2 a2)¶Concatenates the contents of two arrays a1
and a2
. The pseudocode
for this operation is
Array<value_t<Array1>, Array1::Size + Array2::Size> out;
for (size_t i = 0; i<Array1::Size; ++i)
out[i] = a1[i];
for (size_t i = 0; i<Array2::Size; ++i)
out[i + Array1::Size] = a2[i];
return out;
Array
>low
(Array a)¶Returns the low part of the input array a
. The length of the low part
is defined as the largest power of two that is smaller than
Array::Size
. For power-of-two sized input, this function simply returns
the low half.
Array
>high
(Array a)¶Returns the high part of the input array a
. The length of the high part
is equal to Array::Size
minus the size of the low part. For
power-of-two sized input, this function simply returns the high half.
Size
, typename Array
>head
(Array a)¶Returns a new array containing the leading Size
elements of a
.
Size
, typename Array
>tail
(Array a)¶Returns a new array containing the trailing Size
elements of a
.
Outer
, typename Inner
>full
(const Inner &inner)¶Given an array type Outer
and a value of type Inner
, this function
returns a new composite type of shape [<shape of Outer>, <shape of
Inner>]
that is filled with the value of the inner
argument.
In the simplest case, this can be used to create a (potentially nested) array that is filled with constant values.
using Vector4f = Array<float, 4>;
using MyMatrix = Array<Vector4f, 4>;
MyMatrix result = full<MyMatrix>(10.f);
std::cout << result << std::endl;
/* Prints:
[[10, 10, 10, 10],
[10, 10, 10, 10],
[10, 10, 10, 10],
[10, 10, 10, 10]]
*/
Another use case entails replicating an array over the trailing dimensions of a new array:
result = full<Vector4f>(Vector4f(1, 2, 3, 4))
std::cout << result << std::endl;
/* Prints:
[[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]]
*/
Note how this is different from the default broadcasting behavior of
arrays. In this case, Vector4f
and MyMatrix
have the same size
in the leading dimension, which would replicate the vector over that
axis instead:
result = MyMatrix(Vector4f(1, 2, 3, 4));
std::cout << result << std::endl;
/* Prints:
[[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]]
*/
DArray
>packet
(DArray &&a, size_t i)¶Extracts the \(i\)-th packet from a dynamic array or data structure. See the chapter on dynamic arrays on how to use this function.
DArray
>packets
(const DArray &a)¶Return the number of packets stored in the given dynamic array or data structure.
DArray
>slice
(DArray &&a, size_t i)¶Extracts the \(i\)-th slice from a dynamic array or data structure. See the chapter on dynamic arrays on how to use this function.
DArray
>slices
(const DArray &a)¶Return the number of packets stored in the given dynamic array or data structure.
DArray
>set_slices
(DArray &a, size_t size)¶Resize the given dynamic array or data structure so that there is space for
size
slices. When reducing the size of the array, any memory allocated
so far is kept. Since there’s no exponential allocation mechanism, it is not
recommended to call set_slices
repeatedly to append elements.
Unlike std::vector::resize()
, previous values are not preserved when
enlarging the array.
The following type traits are available to query the properties of arrays at compile time.
The enoki::replace_scalar_t
type trait and various aliases construct arrays
matching a certain layout, but with different-flavored data. This is often
helpful when defining custom data structures or function inputs. See the
section on custom data structures for an example
usage.
Array
, typename Scalar
>replace_scalar_t
¶Replaces the scalar type underlying an array. For instance,
replace_scalar_t<Array<Array<float, 16>, 32>, int>
is equal to Array<Array<int,
16>, 32>
.
The type trait also works for scalar arguments. Pointers and reference
arguments are copied—for instance, replace_scalar_t<const float *, int>
is
equal to const int *
.
Array
>uint32_array_t
= replace_scalar_t<Array, uint32_t>¶Create a 32-bit unsigned integer array matching the layout of Array
.
Array
>int32_array_t
= replace_scalar_t<Array, int32_t>¶Create a 32-bit signed integer array matching the layout of Array
.
Array
>uint64_array_t
= replace_scalar_t<Array, uint64_t>¶Create a 64-bit unsigned integer array matching the layout of Array
.
Array
>int64_array_t
= replace_scalar_t<Array, int64_t>¶Create a 64-bit signed integer array matching the layout of Array
.
Array
>int_array_t
¶Create a signed integer array (with the same number of bits per entry as
the input) matching the layout of Array
.
Array
>uint_array_t
¶Create an unsigned integer array (with the same number of bits per entry as
the input) matching the layout of Array
.
Array
>float16_array_t
= replace_scalar_t<Array, half>¶Create a half precision array matching the layout of Array
.
Array
>float32_array_t
= replace_scalar_t<Array, float>¶Create a single precision array matching the layout of Array
.
Array
>float64_array_t
= replace_scalar_t<Array, double>¶Create a double precision array matching the layout of Array
.
Array
>float_array_t
¶Create a floating point array (with the same number of bits per entry as
the input) matching the layout of Array
.
Array
>bool_array_t
= replace_scalar_t<Array, bool>¶Create a boolean array matching the layout of Array
.
Array
>size_array_t
= replace_scalar_t<Array, size_t>¶Create a size_t
-valued array matching the layout of Array
.
Array
>ssize_array_t
= replace_scalar_t<Array, ssize_t>¶Create a ssize_t
-valued array matching the layout of Array
.
The following section discusses helper types that can be used to selectively enable or disable template functions for Enoki arrays, e.g. like so:
template <typename Value, enable_if_array_t<Value> = 0>
void f(Value value) {
/* Invoked if 'Value' is an Enoki array */
}
template <typename Value, enable_if_not_array_t<Value> = 0>
void f(Value value) {
/* Invoked if 'Value' is *not* an Enoki array */
}
T
>is_array
¶value
¶Equal to true
iff T
is a static or dynamic Enoki array type.
T
>is_mask
¶value
¶Equal to true
iff T
is a static or dynamic Enoki mask type.
T
>is_static_array
¶value
¶Equal to true
iff T
is a static Enoki array type.
T
>is_dynamic_array
¶value
¶Equal to true
iff T
is a dynamic Enoki array type.
T
>enable_if_dynamic_array_t
= std::enable_if_t<is_dynamic_array_v<T>, int>¶SFINAE alias to selectively enable a class or function definition for dynamic Enoki array types.
T
>enable_if_not_dynamic_array_t
= std::enable_if_t<!is_dynamic_array_v<T>, int>¶SFINAE alias to selectively enable a class or function definition for dynamic Enoki array types.
T
>is_dynamic
¶value
¶Equal to true
iff T
(which could be a nested Enoki array) contains
a dynamic array at any level.
This is different from is_dynamic_array
, which only cares
about the outermost level – for instance, given static array T
containing a nested dynamic array, is_dynamic_array_v<T> ==
false
, while is_dynamic_v<T> == true
.