SplittablesBase.jl

SplittablesBaseModule

SplittablesBase: a simple API for parallel computation on collections

Dev GitHub Actions

SplittablesBase.jl defines a simple set of APIs:

  • halve(collection): splitting given collection roughly in half.
  • amount(collection): an "approximation" of length.

These are the basis of parallel algorithms that can be derived from reduce. Custom containers can support many parallel algorithms by simply defining these functions.

SplittablesTesting.jl provides simple test utility functions SplittablesTesting.test_ordered(examples) and SplittablesTesting.test_unordered(examples) that run some automatable tests with each example container in examples.

See more in the documentation.

Supported collections

halve methods for following collections in Base are implemented in SplittablesBase.jl:

  • AbstractArray
  • AbstractString
  • Dict
  • keys(::Dict)
  • values(::Dict)
  • Set
  • Tuple
  • NamedTuple
  • zip
  • pairs
  • Base.Generator
  • Iterators.filter
  • Iterators.flatten
  • Iterators.partition
  • Iterators.product
  • Iterators.enumerate
  • Iterators.reverse
  • skipmissing

Packages using SplittablesBase.jl

See also

source
SplittablesBase.halveFunction
SplittablesBase.halve(collection) -> (left, right)

Split collection (roughly) in half.

Examples

julia> using SplittablesBase: halve

julia> halve([1, 2, 3, 4])
([1, 2], [3, 4])

Implementation

Implementations of halve on custom collections must satisfy the following laws.

(1) If the original collection is ordered, concatenating the sub-collections returned by halve must create a collection that is equivalent to the original collection. More precisely,

isequal(
    vec(collect(collection)),
    vcat(vec(collect(left)), vec(collect(right))),
)

must hold.

Similar relationship must hold for unordered collections; i.e., taking union of left and right collections as multiset must create a collection that is equivalent to the original collection as a multiset:

using StatsBase: countmap
isequal(
    countmap(collect(collection)),
    merge(+, countmap(collect(left)), countmap(collect(right))),
)

(2) halve must eventually shorten the collection. More precisely, the following function must terminate:

function recursive_halve(collection)
    length(collection) <= 1 && return
    left, right = halve(collection)
    recursive_halve(left)
    recursive_halve(right)
end

Furthermore, whenever implementable with cheap operations, length(left) should be close to length(collection) ÷ 2 as much as possible.

source
SplittablesBase.amountFunction
SplittablesBase.amount(collection) :: Int

Return the number of elements in collection or rough "estimate" of it.

Examples

julia> using SplittablesBase: amount

julia> amount([1, 2, 3, 4])
4

julia> amount("aえ𝑖∅υ")
12

julia> length("aえ𝑖∅υ")  # != `amount`
5

Note that amount on strings is not equal to length because the latter cannot be computed in O(1) time.

Implementation

Implementations of amount on a collection must satisfy the following laws.

(1) Any empty collection must have zero amount.

(2) Any operation that increments length on collection must increments amount.

Ideally, the time-complexity of amount should be O(1).

source
SplittablesTesting.test_orderedFunction
SplittablesTesting.test_ordered(examples)
SplittablesTesting.test_unordered(examples)

Run interface tests on each test case in examples.

examples is an iterator where each element is either:

  1. A container to be tested.

  2. A NamedTuple with following keys

    • :label: A label used for Test.@testcase.
    • :data: A container to be tested.

Examples

julia> using SplittablesBase

julia> SplittablesBase.Testing.test_ordered([
           (label = "First Test", data = 1:5),
           (label = "Second Test", data = (a = 1, b = 2, c = 3)),
           zip(1:3, 4:6),
       ]);
Test Summary: | Pass  Total
First Test    |    2      2
Test Summary: | Pass  Total
Second Test   |    2      2
Test Summary: | Pass  Total
3             |    2      2