# LoopRecipes.jl

`LoopRecipes.LoopRecipes`

`LoopRecipes.prefetching`

`LoopRecipes.simdeachindex`

`LoopRecipes.simdpairs`

`LoopRecipes.simdstored`

`LoopRecipes.unroll`

`LoopRecipes.LoopRecipes`

— Module**LoopRecipes: composable loops**

LoopRecipes.jl provides several constructs for high-performance loops based on the extended `foldl`

protocol of Transducers.jl.

See more in the documentation.

**API summary**

`unroll(factor, xs)`

: Unroll an array`xs`

by a given`factor`

.`prefetching(xs)`

: Prefetch each element in`xs`

. It works when`xs`

is a nested data structure (e.g., vector of vectors, vector of strings).`simdeachindex([width,] xs)`

: Iterate over indices of`xs`

using`SIMD.VecRange`

. It also takes care of the remainder loop.`simdpairs([width,] xs)`

: Iterate over index-value pairs of`xs`

using`SIMD.VecRange`

and`SIMD.Vec`

. It also takes care of the remainder loop.`simdstored([width,] xs)`

: For a sparse array`xs`

, iterate over stored index-value pairs of`xs`

using`SIMD.Vec`

. It also takes care of the remainder loop.

`LoopRecipes.prefetching`

— Method`prefetching(xs)`

Prefetch each boxed element in `xs`

. It can be used when `xs`

is a nested data structure (e.g., vector of vectors, vector of strings). Do nothing when the element of `xs`

is not boxed.

**Examples**

```
julia> using LoopRecipes
julia> sum(sum, prefetching([[1], [2, 3], [4, 5, 6]]))
21
julia> using FLoops
julia> @floop begin
acc = 0
for x in prefetching([[1], [2, 3], [4, 5, 6]])
acc += sum(x)
end
acc
end
21
```

`LoopRecipes.simdeachindex`

— Method`simdeachindex([width,] xs)`

Return a foldable that iterates over indices of `xs`

using `SIMD.VecRange`

and/or integer.

`width`

is an integer or a `Val`

of integer that specifies the SIMD width.

**Examples**

```
julia> using LoopRecipes
julia> foreach(simdeachindex(4, ones(10))) do i
@show i
end;
i = VecRange{4}(1)
i = VecRange{4}(5)
i = 9
i = 10
```

`LoopRecipes.simdpairs`

— Method`simdpairs([width,] xs)`

Return a foldable that iterates over index-value pairs of `xs`

using `SIMD.VecRange`

and `SIMD.Vec`

for the main part.

`width`

is an integer or a `Val`

of integer that specifies the SIMD width.

See also `simdeachindex`

.

**Examples**

```
julia> using LoopRecipes
julia> foreach(simdpairs(4, collect(100:100:1000))) do (i, v)
@show i v
end;
i = VecRange{4}(1)
v = <4 x Int64>[100, 200, 300, 400]
i = VecRange{4}(5)
v = <4 x Int64>[500, 600, 700, 800]
i = 9
v = 900
i = 10
v = 1000
```

Thanks to `setindex!`

overload on `VecRange`

, it is straightforward to use it for implementing a simple mapping.

```
julia> function double!(ys, xs)
@assert axes(ys) == axes(xs)
foreach(simdpairs(xs)) do (i, x)
@inbounds ys[i] = 2x
end
return ys
end;
julia> double!(zeros(5), ones(5))
5-element Array{Float64,1}:
2.0
2.0
2.0
2.0
2.0
```

When using `simdpairs`

for reduction, the accumulator `acc`

should be properly reduced depending on the type of `v`

. This is because the same loop body (aka `op`

or `rf`

) is used for all stages of the iteration.

```
julia> using SIMD # for Vec
julia> foldl(simdpairs(collect(1:10)); init = 0) do acc, (_, v)
x = 2 * v
(v isa Vec ? acc : sum(acc)) + x
end
110
```

Here is another example for demonstrating how `v isa Vec`

works:

```
julia> foldl(simdpairs(4, collect(10:24)); init = 0) do acc, (i, v)
@show first(i), acc, v
(v isa Vec ? acc : sum(acc)) + v
end
(first(i), acc, v) = (1, 0, <4 x Int64>[10, 11, 12, 13])
(first(i), acc, v) = (5, <4 x Int64>[10, 11, 12, 13], <4 x Int64>[14, 15, 16, 17])
(first(i), acc, v) = (9, <4 x Int64>[24, 26, 28, 30], <4 x Int64>[18, 19, 20, 21])
(first(i), acc, v) = (13, <4 x Int64>[42, 45, 48, 51], 22)
(first(i), acc, v) = (14, 208, 23)
(first(i), acc, v) = (15, 231, 24)
255
```

Observe that:

(1) When at the first iteration (`first(i) == 1`

), `acc`

is `0`

(as specified by `init = 0`

). This is broadcast to a `Vec`

because `v`

is a `Vec`

. See that `acc`

in the second iteration (`first(i) == 5`

) is a `Vec`

(`<4 x Int64>[10, 11, 12, 13]`

).

(2) At the second and third iterations, both `acc`

and `v`

are `Vec`

, yielding an `acc::Vec`

for the next iteration.

(3) At the iteration `first(i) == 13`

, `v`

is not `Vec`

(i.e., we are in the reminder loop). Thus, `acc`

(`<4 x Int64>[42, 45, 48, 51]`

) is reduced a scalar before adding `v`

(`22`

). See that `acc`

in the next iteration is a scalar (`208`

).

(4) Final two iterations deals with scalar `acc`

and `v`

. Note that we do not need a special code since `sum(::Number)`

is an identity function.

Since `simdeachindex`

and thus `simdpairs`

uses `Transducers.__foldl__`

instead of `Base.iterate`

to implement the iteration, these four stages are all properly type-stabilized.

These may look complicated but the rule is simple: the returned value of the reducing function (i.e., accumulation result) should have the same "shape" as the input value `v`

.

`LoopRecipes.simdstored`

— Method`simdstored([width,] xs)`

Return a foldable that iterates over stored index-value pairs of `xs`

using `SIMD.Vec`

for the main part.

`width`

is an integer or a `Val`

of integer that specifies the SIMD width.

**Examples**

For dense arrays, `simdstored`

is identical to `simdpairs`

:

```
julia> using LoopRecipes
julia> foreach(simdstored(4, collect(1:10))) do (i, v)
@show i v
end;
i = VecRange{4}(1)
v = <4 x Int64>[1, 2, 3, 4]
i = VecRange{4}(5)
v = <4 x Int64>[5, 6, 7, 8]
i = 9
v = 9
i = 10
v = 10
```

For parse arrays, `simdstored`

iterates over only stored index-value pairs:

```
julia> using SparseArrays
julia> xs = SparseVector(10, [1, 3, 4, 7, 8], [1, 2, 3, 4, 5]);
julia> foreach(simdstored(4, xs)) do (i, v)
@show i v
end;
i = <4 x Int64>[1, 3, 4, 7]
v = <4 x Int64>[1, 2, 3, 4]
i = 8
v = 5
```

Like `simdpairs`

, the accumulator `acc`

should be properly reduced depending on the type of `v`

:

```
julia> using SIMD # for Vec
julia> foldl(simdstored(xs); init = 0) do acc, (_, v)
x = 2 * v
(v isa Vec ? acc : sum(acc)) + x
end
30
```

Sparse-dense dot product:

```
julia> function simddot(xs::SparseVector, ys)
init = zero(eltype(xs)) * zero(eltype(ys))
foldl(simdstored(xs); init = init) do acc, (i, x)
Base.@_inline_meta
(x isa Vec ? acc : sum(acc)) + @inbounds x * ys[i]
end
end;
julia> simddot(xs, [1:10;])
87
```

Identical function written using FLoops.jl:

```
julia> using FLoops
julia> function simddot′(xs::SparseVector, ys)
@floop begin
acc = zero(eltype(xs)) * zero(eltype(ys))
for (i, x) in simdstored(xs)
acc = (x isa Vec ? acc : sum(acc)) + @inbounds x * ys[i]
end
acc
end
end;
julia> simddot′(xs, [1:10;])
87
```

`LoopRecipes.unroll`

— Function```
unroll(factor, xs)
unroll(factor, indexstyle, xs)
```

Unroll an array `xs`

by a given `factor`

(an integer or a `Val`

of integer).

**Examples**

```
julia> using LoopRecipes
julia> sum(unroll(4, 1:10))
55
julia> sum(unroll(Val(4), 1:10)) # equivalent
55
julia> using FLoops
julia> @floop begin
acc = 0
for x in unroll(4, 1:10)
acc += x
end
acc
end
55
```