LoopRecipes.jl
LoopRecipes.LoopRecipesLoopRecipes.prefetchingLoopRecipes.simdeachindexLoopRecipes.simdpairsLoopRecipes.simdstoredLoopRecipes.unroll
LoopRecipes.LoopRecipes — ModuleLoopRecipes: composable loops
LoopRecipes.jl provides several constructs for high-performance loops based on the extended foldl protocol of Transducers.jl.
See more in the documentation.
API summary
unroll(factor, xs): Unroll an arrayxsby a givenfactor.prefetching(xs): Prefetch each element inxs. It works whenxsis a nested data structure (e.g., vector of vectors, vector of strings).simdeachindex([width,] xs): Iterate over indices ofxsusingSIMD.VecRange. It also takes care of the remainder loop.simdpairs([width,] xs): Iterate over index-value pairs ofxsusingSIMD.VecRangeandSIMD.Vec. It also takes care of the remainder loop.simdstored([width,] xs): For a sparse arrayxs, iterate over stored index-value pairs ofxsusingSIMD.Vec. It also takes care of the remainder loop.
LoopRecipes.prefetching — Methodprefetching(xs)Prefetch each boxed element in xs. It can be used when xs is a nested data structure (e.g., vector of vectors, vector of strings). Do nothing when the element of xs is not boxed.
Examples
julia> using LoopRecipes
julia> sum(sum, prefetching([[1], [2, 3], [4, 5, 6]]))
21
julia> using FLoops
julia> @floop begin
acc = 0
for x in prefetching([[1], [2, 3], [4, 5, 6]])
acc += sum(x)
end
acc
end
21LoopRecipes.simdeachindex — Methodsimdeachindex([width,] xs)Return a foldable that iterates over indices of xs using SIMD.VecRange and/or integer.
width is an integer or a Val of integer that specifies the SIMD width.
Examples
julia> using LoopRecipes
julia> foreach(simdeachindex(4, ones(10))) do i
@show i
end;
i = VecRange{4}(1)
i = VecRange{4}(5)
i = 9
i = 10LoopRecipes.simdpairs — Methodsimdpairs([width,] xs)Return a foldable that iterates over index-value pairs of xs using SIMD.VecRange and SIMD.Vec for the main part.
width is an integer or a Val of integer that specifies the SIMD width.
See also simdeachindex.
Examples
julia> using LoopRecipes
julia> foreach(simdpairs(4, collect(100:100:1000))) do (i, v)
@show i v
end;
i = VecRange{4}(1)
v = <4 x Int64>[100, 200, 300, 400]
i = VecRange{4}(5)
v = <4 x Int64>[500, 600, 700, 800]
i = 9
v = 900
i = 10
v = 1000Thanks to setindex! overload on VecRange, it is straightforward to use it for implementing a simple mapping.
julia> function double!(ys, xs)
@assert axes(ys) == axes(xs)
foreach(simdpairs(xs)) do (i, x)
@inbounds ys[i] = 2x
end
return ys
end;
julia> double!(zeros(5), ones(5))
5-element Array{Float64,1}:
2.0
2.0
2.0
2.0
2.0When using simdpairs for reduction, the accumulator acc should be properly reduced depending on the type of v. This is because the same loop body (aka op or rf) is used for all stages of the iteration.
julia> using SIMD # for Vec
julia> foldl(simdpairs(collect(1:10)); init = 0) do acc, (_, v)
x = 2 * v
(v isa Vec ? acc : sum(acc)) + x
end
110Here is another example for demonstrating how v isa Vec works:
julia> foldl(simdpairs(4, collect(10:24)); init = 0) do acc, (i, v)
@show first(i), acc, v
(v isa Vec ? acc : sum(acc)) + v
end
(first(i), acc, v) = (1, 0, <4 x Int64>[10, 11, 12, 13])
(first(i), acc, v) = (5, <4 x Int64>[10, 11, 12, 13], <4 x Int64>[14, 15, 16, 17])
(first(i), acc, v) = (9, <4 x Int64>[24, 26, 28, 30], <4 x Int64>[18, 19, 20, 21])
(first(i), acc, v) = (13, <4 x Int64>[42, 45, 48, 51], 22)
(first(i), acc, v) = (14, 208, 23)
(first(i), acc, v) = (15, 231, 24)
255Observe that:
(1) When at the first iteration (first(i) == 1), acc is 0 (as specified by init = 0). This is broadcast to a Vec because v is a Vec. See that acc in the second iteration (first(i) == 5) is a Vec (<4 x Int64>[10, 11, 12, 13]).
(2) At the second and third iterations, both acc and v are Vec, yielding an acc::Vec for the next iteration.
(3) At the iteration first(i) == 13, v is not Vec (i.e., we are in the reminder loop). Thus, acc (<4 x Int64>[42, 45, 48, 51]) is reduced a scalar before adding v (22). See that acc in the next iteration is a scalar (208).
(4) Final two iterations deals with scalar acc and v. Note that we do not need a special code since sum(::Number) is an identity function.
Since simdeachindex and thus simdpairs uses Transducers.__foldl__ instead of Base.iterate to implement the iteration, these four stages are all properly type-stabilized.
These may look complicated but the rule is simple: the returned value of the reducing function (i.e., accumulation result) should have the same "shape" as the input value v.
LoopRecipes.simdstored — Methodsimdstored([width,] xs)Return a foldable that iterates over stored index-value pairs of xs using SIMD.Vec for the main part.
width is an integer or a Val of integer that specifies the SIMD width.
Examples
For dense arrays, simdstored is identical to simdpairs:
julia> using LoopRecipes
julia> foreach(simdstored(4, collect(1:10))) do (i, v)
@show i v
end;
i = VecRange{4}(1)
v = <4 x Int64>[1, 2, 3, 4]
i = VecRange{4}(5)
v = <4 x Int64>[5, 6, 7, 8]
i = 9
v = 9
i = 10
v = 10For parse arrays, simdstored iterates over only stored index-value pairs:
julia> using SparseArrays
julia> xs = SparseVector(10, [1, 3, 4, 7, 8], [1, 2, 3, 4, 5]);
julia> foreach(simdstored(4, xs)) do (i, v)
@show i v
end;
i = <4 x Int64>[1, 3, 4, 7]
v = <4 x Int64>[1, 2, 3, 4]
i = 8
v = 5Like simdpairs, the accumulator acc should be properly reduced depending on the type of v:
julia> using SIMD # for Vec
julia> foldl(simdstored(xs); init = 0) do acc, (_, v)
x = 2 * v
(v isa Vec ? acc : sum(acc)) + x
end
30Sparse-dense dot product:
julia> function simddot(xs::SparseVector, ys)
init = zero(eltype(xs)) * zero(eltype(ys))
foldl(simdstored(xs); init = init) do acc, (i, x)
Base.@_inline_meta
(x isa Vec ? acc : sum(acc)) + @inbounds x * ys[i]
end
end;
julia> simddot(xs, [1:10;])
87Identical function written using FLoops.jl:
julia> using FLoops
julia> function simddot′(xs::SparseVector, ys)
@floop begin
acc = zero(eltype(xs)) * zero(eltype(ys))
for (i, x) in simdstored(xs)
acc = (x isa Vec ? acc : sum(acc)) + @inbounds x * ys[i]
end
acc
end
end;
julia> simddot′(xs, [1:10;])
87LoopRecipes.unroll — Functionunroll(factor, xs)
unroll(factor, indexstyle, xs)Unroll an array xs by a given factor (an integer or a Val of integer).
Examples
julia> using LoopRecipes
julia> sum(unroll(4, 1:10))
55
julia> sum(unroll(Val(4), 1:10)) # equivalent
55
julia> using FLoops
julia> @floop begin
acc = 0
for x in unroll(4, 1:10)
acc += x
end
acc
end
55