Transducers for Julia

Transducers are transformations of "sequence" of input that can be composed very efficiently. The interface used by transducers naturally describes a wide range of processes that is expressible as a succession of steps. Furthermore, transducers can be defined without specifying the details of the input and output (collections, streams, channels, etc.) and therefore achieves a full reusability. Transducers are introduced by Rich Hickey, the creator of the Clojure language. His Strange Loop talk is a great introduction to the idea of transducers.

Transducers.jl is an implementation of the transducers in Julia. Aiming to satisfy high-performance needs of Julia users, Transducers.jl uses a formulation that is pure [pure] and aiding type-stability.

Installation

]add Transducers

Examples

If you are familiar with iterators (see also Base.Iterators and IterTools.jl) it would look very familiar to you:

julia> using Transducers

julia> collect(Map(x -> 2x), 1:3)  # double each element
3-element Array{Int64,1}:
 2
 4
 6

julia> collect(Filter(iseven), 1:6)  # collect only evens
3-element Array{Int64,1}:
 2
 4
 6

julia> collect(MapCat(x -> 1:x), 1:3)  # concatenate mapped results
6-element Array{Int64,1}:
 1
 1
 2
 1
 2
 3

Transducers can be composed (without, unlike iterators, referring to the input):

julia> xf = Filter(iseven) |> Map(x -> 2x)
       collect(xf, 1:6)
3-element Array{Int64,1}:
  4
  8
 12

An efficient way to use transducers is combination with mapfoldl. This computation is done without creating any intermediate lazy object and compiles to a single loop:

julia> mapfoldl(xf, +, 1:6)
24

Difference to iterators

How mapfoldl and foldl are used illustrates the difference between iterators and transducers. Implementation of the above computation in iterator would be:

f(x) = 2x
imap = Base.Iterators.Generator  # like `map`, but returns an iterator
mapfoldl(f, +, filter(iseven, input), init=0)
foldl(+, imap(f, filter(iseven, input)))  # equivalent
#        ______________________________
#        composition occurs at input part

Compare it to how transducers are used:

mapfoldl(Filter(iseven) |> Map(f), +, input, init=0)
#        ________________________
#        composition occurs at computation part

Although this is just a syntactic difference, it is reflected in the actual code generated by those two frameworks. The code for iterator would be lowered to:

function map_filter_iterators(xs, init)
    ret = iterate(xs)
    ret === nothing && return
    acc = init
    @goto filter
    local state, x
    while true
        while true                                    # input
            ret = iterate(xs, state)                  #
            ret === nothing && return acc             #
            @label filter                             #
            x, state = ret                            #
            iseven(x) && break             # filter   :
        end                                #          :
        y = 2x              # imap         :          :
        acc += y    # +     :              :          :
    end             # :     :              :          :
    #                 + <-- imap <-------- filter <-- input
    return acc
end

Notice that the iteration of input is the inner most block, followed by filter, imap, and then finally +. Iterators are described as pull-based; an outer iterator (say imap) has to "pull" an item from the inner iterator (filter in above example). It is reflected in the lowered code above.

On the other hand, the code using transducers is lowered to:

function map_filter_transducers(xs, init)
    acc = init
    #              input -> Filter --> Map --> +
    for x in xs  # input    :          :       :
        if iseven(x)  #     Filter     :       :
            y = 2x    #                Map     :
            acc += y  #                        +
        end
    end
    return acc
end

xs = [6, 8, 1, 4, 5, 6, 6, 7, 9, 9, 7, 8, 6, 8, 2, 5, 2, 4, 3, 7]
@assert map_filter_iterators(xs, 0) == map_filter_transducers(xs, 0)

Notice that the iteration of input is at the outer most block while + is in the inner most block. Transducers passed to mapfoldl appears in the block between them in the order they are composed. An outer transducer (say Filter) "pushes" arbitrary number of items to the inner transducer (Map in above example). Note that Filter can choose to not push an item (i.e., push zero item) when the predicate returns false. This push-based nature of the transducers allows the generation of very natural and efficient code. To put it another way, the transducers and transducible processes own the loop.

As a consequence, computations requiring to expand an item into a sequence can be processed efficiently. Consider the following example:

julia> xf = Map(x -> 1:x) |> Filter(iseven ∘ sum) |> Cat()
       mapfoldl(xf, *, 1:10)
29262643200

This is lowered to a nested for loops:

function map_filter_cat_transducers(xs, init)
    acc = init
    for x in xs
        y1 = 1:x                # Map
        if iseven(sum(y1))      # Filter
            for y2 in y1        # Cat
                acc *= y2       # *
            end
        end
    end
    return acc
end

@assert mapfoldl(xf, *, 1:10) == map_filter_cat_transducers(1:10, 1)

It is not straightforward to implement an iterator like Cat that can output more than one items at a time. Such an iterator has to track the state of the inner (y1 in above) and outer (xs in above) iterators and conditionally invoke the outer iterator once the inner iterator terminates. This generates a complicated code and the compiler would have hard time optimizing it.

List of transducers

Here is the list of pre-defined transducers:

Transducer	Summary
`Cat()`	Concatenate/flatten nested iterators.
`Count([start[, step]])`	Generate a sequence `start`, `start + step`, `start + step + step`, and so on.
`Dedupe()`	De-duplicate consecutive items.
`Drop(n)`	Drop first `n` items.
`DropLast(n)`	Drop last `n` items.
`DropWhile(pred)`	Drop items while `pred` returns `true` consecutively. It becomes a no-op after `pred` returns a `false`.
`Enumerate([start[, step]])`	Transducer variant of `Base.enumerate`. The `start` and `step` arguments are optional and have the same meaning as in `Count`.
`Filter(pred)`	Skip items for which `pred` is evaluated to `false`.
`FlagFirst()`	Output `(isfirst, input)` where `isfirst::Bool` is `true` only for the first iteration and `input` is the original input.
`Interpose(sep)`	Interleave input items with a `sep`.
`Iterated(f, init[, T::Type])`	Generate a sequence `init`, `f(init)`, `f(f(init))`, `f(f(f(init)))`, and so on.
`Keep(f)`	Pass non-`nothing` output of `f` to the inner reducing step.
`Map(f)`	Apply unary function `f` to each input and pass the result to the inner reducing step.
`MapCat(f)`	Concatenate output of `f` which is expected to return an iterable.
`MapSplat(f)`	Like `Map(f)` but calls `f(input...)` for each `input` and then pass the result to the inner reducing step.
`NotA(T)`	Skip items of type `T`. Unlike `Filter(!ismissing)`, downstream transducers can have a correct type information for `NotA(Missing)`.
`OfType(T)`	Include only items of type `T`.
`Partition(size, step = size, flush = false)`	Sliding window of width `size` and interval `step`.
`PartitionBy(f)`	Group input sequence into chunks in which `f` returns a same value consecutively.
`Replace(assoc)`	Replace each input with the value in the associative container `assoc` (e.g., a dictionary, array, string) if it matches with a key/index. Otherwise output the input as-is.
`Scan(f, [init])`	Accumulate input with binary function `f` and pass the accumulated result so far to the inner reduction step.
`ScanEmit(f, init[, onlast])`	Accumulate input `x` with a function `f` with the call signature `(u, x) -> (y, u)` and pass the result `y` to the inner reduction step.
`Take(n)`	Take `n` items from the input sequence.
`TakeLast(n)`	Take last `n` items from the input sequence.
`TakeNth(n)`	Output every `n` item to the inner reducing step.
`TakeWhile(pred)`	Take items while `pred` returns `true`. Abort the reduction when `pred` returns `false` for the first time.
`Unique()`	Pass only unseen item to the inner reducing step.
`Zip(xforms...)`	Zip outputs of transducers `xforms` in a tuple and pass it to the inner reduction step.

Glossary

mapfoldl(xf, step, input, init=...)
#   |    |   |     |
#   |    |   |     `-- reducible
#   |    |   |
#   |    |   `-- "bottom" (inner most) reducing function
#   |    |
#   |    `-- transducer
#   |
#   `-- transducible process

Reducing function or Reducing step (function): A reducing function combines result-so-far with the input. It in a narrow sense is a "callable" op of the signature op(::X, ::Y) :: X (or op(::X, ::X) :: X in case for mapreduce) or schematically:
\[(\text{result-so-far}, \text{input}) \mapsto \text{result-so-far}\]
It is the function that can be passed to the classic (non-Transducers.jl) Base.foldl or Base.reduce. It is sometimes referred to as a step or op. In Transducers.jl, next(rf, ::X, ::Y) :: X is used instead of a "bare" callable. Furthermore, a reducing function in a loose sense also includes other interfaces such as start(rf, ::X) and complete(rf, ::X).
Transducer: A transducer transforms a reducing function into a new reducing function. It is sometimes referred to as a xf or xform. A transducer can be composed of many sub-transducers; the syntax in Transducers.jl is xf = xf₁ |> xf₂ |> ... |> xfₙ. The composed transducers are applied to the "bottom" reducing function from right to left, i.e., schematically, a new reducing function $\mathrm{rf}$ is obtained from the "bottom" reducing function $\mathrm{step}$ by
\[\mathrm{rf} = \mathrm{xf}_1(\mathrm{xf}_2(...(\mathrm{xf}_{n}(\mathrm{step}))))\]
Given a composition xf₁ |> xf₂, transducer xf₂ is said to be the inner transducer of xf₁ |> xf₂. Likewise, $\mathrm{xf}_2(\mathrm{rf}')$ is an inner reducing function of $\mathrm{xf}_1(\mathrm{xf}_2(\mathrm{rf}'))$.
Reducible collection (or just Reducible): Any object that can be passed to mapfoldl and alike is reducible. A reducible collection knows how to apply reducing function to its elements. Iterators are automatically reducible as this is the canonical fallback implementation.
Transducible process: A function that can reduce reducible collections using transducers is a transducible process. Examples are mapfoldl and mapreduce. Find more in Transducible processes.

Links

[pure]

...although not pure in the strong sense as Base.@pure.