FoldsThreads.jl

FoldsThreads.FoldsThreadsModule

FoldsThreads: Extra threaded executors for JuliaFolds/*.jl

Dev GitHub Actions

FoldsThreads.jl provides extra thread-based executors usable with various JuliaFolds/*.jl packages such as Transducers.jl and FLoops.jl.

                                  Executors
                           ,----------------------.
     Algorithms            |    FoldsThreads.jl    |         Data structures
,------------------.       |-----------------------|       ,-----------------.
|  FLoops,         |       |  ThreadedEx*          |       |  Array,         |
|  Folds,          |       |  WorkStealingEx,      |       |  Tables,        |
|  Transducers,    |  ---  |  DepthFirstEx,        |  ---  |  FGenerators,   |
|  OnlineStats,    |       |  TaskPoolEx,          |       |  Dict,          |
|  DataTools, ...  '       |  NondeterministicEx,  |       |  Set, ...       |
`------------------'       |  ...                  |       `-----------------'
                           `-----------------------'

(* ThreadedEx is the default executor provided by Transducers.jl)

  • WorkStealingEx implements work stealing (continuation stealing). Useful for load-balancing.
  • DepthFirstEx implements depth-first scheduling. Useful for findfirst-type computations.
  • TaskPoolEx: Task pool executor. Useful for fine execution control (e.g., back pressure and "background" threads).
  • NondeterministicEx: An executor for parallelizing computations with non-parallelizable iterators.
source
FoldsThreads.DepthFirstExType
DepthFirstEx(; [simd,] [basesize])

Depth-first scheduling for parallel execution. Useful for findfirst-type of computation.

Examples

julia> using FoldsThreads
       using Folds

julia> Folds.sum(i -> gcd(i, 42), 1:1000_000, DepthFirstEx())
4642844

Extended help

DepthFirstEx schedules chunks of size roughly equal to basesize in the order that each chunk appears in the input collection. However, the base case computation does not wait for all the tasks to be scheduled. This approach performs better than a more naive approach where the all tasks are scheduled at once before the reduction starts. DepthFirstEx is useful for reductions that can terminate early (e.g., findfirst, @floop with break).

More examples

julia> using FoldsThreads
       using FLoops

julia> @floop DepthFirstEx() for x in 1:1000_000
           y = gcd(x, 42)
           @reduce(acc += y)
       end
       acc
4642844

Keyword Arguments

  • basesize: The size of base case.
  • simd: false, true, :ivdep, or Val of one of them. If true/:ivdep, the inner-most loop of each base case is annotated by @simd/@simd ivdep. Use a plain loop if false (default).
source
FoldsThreads.NonThreadedExType
NonThreadedEx(; [simd,] [basesize])

Single-threaded executor with call tree similar to divide-and-conquer type of threaded executors. It is useful for debugging, testing, and benchmarking.

source
FoldsThreads.NondeterministicExType
NondeterministicEx(; [simd,] [basesize,] [ntasks,])

Pipelined batched reduction for parallelizing computations with non-parallelizable collections (e.g., Channel, Iterators.Stateful).

This is a simple wrapper of NondeterministicThreading transducer. Use NondeterministicThreading directly for explicit control on what transducers are parallelized.

Examples

julia> using FoldsThreads
       using Folds

julia> partially_parallelizable(seq) = (gcd(y, 42) for x in seq for y in 1:10000x);

julia> Folds.sum(partially_parallelizable(Iterators.Stateful(1:100)), NondeterministicEx())
234462500

Extended help

In the above example, we can run gcd(y, 42) (mapping), for y in 1:10000x (flattening), and + for sum (reduction) in parallel even though the iteration of Iterators.Stateful(1:100) is not parallelizable. Note that, as indicated in the example, the computation per each iteration of the non-parallelizable collection should be very CPU-intensive in order for NondeterministicEx to show any kind of performance benefits.

Same computation using FLoops.jl:

julia> using FoldsThreads
       using FLoops

julia> @floop NondeterministicEx() for x in Iterators.Stateful(1:100)
           for y in 1:10000x
               z = gcd(y, 42)
               @reduce(acc += z)
           end
       end
       acc
234462500

Notes

"Nondeterministic" in the name indicates that the result of the reduction is not deterministic (i.e., schedule-dependent) if the reducing function is only approximately associative (e.g., + on floats). For computations (e.g., Folds.collect) that uses strictly associative operations (e.g., "vcat"), the result does not depend on the particular scheduling decision of Julia runtime. To be more specific, this executor uses the scheduling that does not produce deterministic divide-and-conquer "task" graph. Instead, the shape of the graph is determined by the particular timing of each computation at run-time.

Keyword Arguments

  • basesize: The size of base case.
  • ntasks: The number of tasks used by this executor.
  • simd: false, true, :ivdep, or Val of one of them. If true/:ivdep, the inner-most loop of each base case is annotated by @simd/@simd ivdep. Use a plain loop if false (default).
source
FoldsThreads.TaskPoolExType
TaskPoolEx(; [simd,] [basesize,] [ntasks,] [background,])

Executor using pooled tasks for reduction. Useful for reductions with I/O and managing back pressure. With background = true, it can also be used to isolate throughput-oriented tasks and use the primary thread for latency-oriented tasks.

Examples

julia> using FoldsThreads
       using Folds

julia> Folds.sum(i -> gcd(i, 42), 1:1000_000, TaskPoolEx())
4642844

Extended help

Worker tasks are pooled (for each executor) so that the number of Julia Tasks used for a reduction can be much smaller than input_length ÷ basesize. This strategy is used mainly for limiting resource (e.g., memory) required by the reduction than for load-balancing. WorkStealingEx performs better for load-balancing of compute-intensive reductions.

NOTE: This executor is inspired by ThreadPools.jl. The hack for assigning a task to a dedicated thread is stolen from ThreadPools.jl.

Warning

It is highly discouraged to use this executor in Julia packages; especially those that are used as libraries rather than end-user applications. This is because the whole purpose of this executor is to prevent Julia runtime from doing the right thing for managing tasks. Ideally, the library user should be able to pass an executor as an argument so that your library function can be used with any executors including TaskPoolEx.

More examples

julia> using FoldsThreads
       using FLoops

julia> @floop TaskPoolEx() for x in 1:1000_000
           y = gcd(x, 42)
           @reduce(acc += y)
       end
       acc
4642844

Keyword Arguments

  • background = false: If background == true, do not run tasks on threadid() == 1.
  • ntasks: The number of tasks to be used.
  • basesize: The size of base case.
  • simd: false, true, :ivdep, or Val of one of them. If true/:ivdep, the inner-most loop of each base case is annotated by @simd/@simd ivdep. Use a plain loop if false (default).
source
FoldsThreads.WorkStealingExType
WorkStealingEx(; [simd,] [basesize])

Work-stealing scheduling for parallel execution. Useful for load-balancing.

Examples

julia> using FoldsThreads
       using Folds

julia> Folds.sum(i -> gcd(i, 42), 1:1000_000, WorkStealingEx())
4642844

Extended help

WorkStealingEx implements work stealing scheduler (in particular, continuation stealing) for Transducers.jl and other JuliaFolds/*.jl packages. Worker tasks are cached and re-used so that the number of Julia Tasks used for a reduction can be much smaller than input_length ÷ basesize. This has a positive impact on computations that require load-balancing since this does not incur the overhead of spawning tasks.

NOTE: WorkStealingEx is more complex and experimental than the default multi-thread executor ThreadedEx.

More examples

julia> using FoldsThreads
       using FLoops

julia> @floop WorkStealingEx() for x in 1:1000_000
           y = gcd(x, 42)
           @reduce(acc += y)
       end
       acc
4642844

Keyword Arguments

  • basesize: The size of base case.
  • simd: false, true, :ivdep, or Val of one of them. If true/:ivdep, the inner-most loop of each base case is annotated by @simd/@simd ivdep. Use a plain loop if false (default).
source