Useful patterns

using Transducers

Flattening nested objects using MapCat

Simple MapCat usage

Consider a vector of "objects" (here just NamedTuples) which in turn contain a vector of objects:

nested_objects = [
    (a = 1,  b = [(c = 2,  d = 3),  (c = 4,  d = 5)]),
    (a = 10, b = [(c = 20, d = 30), (c = 40, d = 50)]),
];

We can flatten this into a table by using Map inside MapCat:

using TypedTables
astable(xs) = copy(Table, xs)  # using TypedTables for a nice display

table1 = nested_objects |> MapCat() do x
    x.b |> Map() do b  # not MapCat
        (a = x.a, b...)
    end
end |> astable
Table with 3 columns and 4 rows:
     a   c   d
   ┌───────────
 1 │ 1   2   3
 2 │ 1   4   5
 3 │ 10  20  30
 4 │ 10  40  50

(Note that the transducer used inside MapCat is Map, not MapCat: each inner item b becomes exactly one row, so there is nothing to concatenate at that level.)
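For intuition, the same rows can be produced with Base Julia alone. The sketch below only illustrates the "map each item to a collection, then concatenate" idea using Iterators.flatten; it is not how Transducers.jl implements MapCat, and rows_base is a name introduced here for illustration.

```julia
nested_objects = [
    (a = 1,  b = [(c = 2,  d = 3),  (c = 4,  d = 5)]),
    (a = 10, b = [(c = 20, d = 30), (c = 40, d = 50)]),
]

# Outer step: map each x to a collection of rows (one row per item of x.b).
# Iterators.flatten then concatenates those collections into one stream.
rows_base = collect(Iterators.flatten(
    [(a = x.a, b...) for b in x.b] for x in nested_objects
))

@assert length(rows_base) == 4
@assert rows_base[1] == (a = 1, c = 2, d = 3)
@assert rows_base[4] == (a = 10, c = 40, d = 50)
```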

Nested MapCat

This pattern can handle more nested objects:

more_nested_objects = [
    (a = 1,  b = [(c = 2,  d = [(e = 3,  f = 4),  (e = 4,  f = 5)]),
                  (c = 6,  d = [])]),
    (a = 10, b = [(c = 20, d = [(e = 30, f = 40), (e = 40, f = 50)])]),
];

This is done by nesting MapCat, except at the innermost level, which uses Map since there is nothing to concatenate:

table3 =
    more_nested_objects |> MapCat() do x
        x.b |> MapCat() do b
            b.d |> Map() do d
                (a = x.a, c = b.c, d...)
            end
        end
    end |> astable
Table with 4 columns and 4 rows:
     a   c   e   f
   ┌───────────────
 1 │ 1   2   3   4
 2 │ 1   2   4   5
 3 │ 10  20  30  40
 4 │ 10  20  40  50

Comparison with iterator comprehension

As a comparison, here is how to do it with an iterator comprehension:

rows = (
    (a = x.a, c = b.c, d...)
    for x in more_nested_objects
    for b in x.b
    for d in b.d
)
@assert Table(collect(rows)) == table3

For simple flattening and mapping, an iterator comprehension like the one above is perhaps the simplest solution.

Note that Transducers.jl works well with iterator comprehensions. Transducers.jl-specific entry points like foldxl convert iterator comprehensions to transducers internally. eduction can be used to do this conversion explicitly:

@assert astable(eduction(rows)) == table3
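As a smaller illustration of the same point, the sketch below folds over a filtered comprehension with foldxl and compares it with the pipeline written out using Filter and Map. The name squares_of_odds is introduced here for illustration only.

```julia
using Transducers

squares_of_odds = (x^2 for x in 1:4 if isodd(x))

# Fold directly over the comprehension.
@assert foldxl(+, squares_of_odds) == 10  # 1^2 + 3^2

# The same pipeline written with explicit transducers.
@assert collect(1:4 |> Filter(isodd) |> Map(x -> x^2)) == [1, 9]

# Explicit conversion of the comprehension with eduction.
@assert collect(eduction(squares_of_odds)) == [1, 9]
```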

Complex MapCat example

For more complex processing that requires intermediate variables, the iterator comprehension does not work well. Fortunately, it is easy to use intermediate variables with transducers:

more_nested_objects |>
    MapCat() do x
        a2 = x.a * 2
        x.b |> MapCat() do b
            a2_plus_c = a2 + b.c
            b.d |> Map() do d
                c_plus_e = b.c + d.e
                c_plus_f = b.c + d.f
                (a2_plus_c = a2_plus_c, c_plus_e = c_plus_e, c_plus_f = c_plus_f)
            end
        end
    end |>
    astable
Table with 3 columns and 4 rows:
     a2_plus_c  c_plus_e  c_plus_f
   ┌──────────────────────────────
 1 │ 4          5         6
 2 │ 4          6         7
 3 │ 40         50        60
 4 │ 40         60        70

MapCat with zip

Note also that MapCat can be combined with arbitrary iterator combinators such as zip:

[(a = 1:3, b = 'x':'z'), (a = 1:4, b = 'i':'l')] |>
    MapCat() do x
        zip(x.a, x.b)
    end |>
    MapSplat((a, b) -> (a = a, b = b)) |>
    astable
Table with 2 columns and 7 rows:
     a  b
   ┌─────
 1 │ 1  x
 2 │ 2  y
 3 │ 3  z
 4 │ 1  i
 5 │ 2  j
 6 │ 3  k
 7 │ 4  l

MapCat with Iterators.product

The same works with Iterators.product:

[(a = 1:3, b = 'x':'z'), (a = 1:4, b = 'i':'l')] |>
    MapCat() do x
        Iterators.product(x.a, x.b)
    end |>
    Enumerate() |>
    Filter(x -> x[1] % 5 == 0) |>  # include only every fifth item
    MapSplat((n, (a, b)) -> (n = n, a = a, b = b)) |>
    astable
Table with 3 columns and 5 rows:
     n   a  b
   ┌─────────
 1 │ 5   2  y
 2 │ 10  1  i
 3 │ 15  2  j
 4 │ 20  3  k
 5 │ 25  4  l

"Missing value" handling with KeepSomething

Transducers.jl has generic filtering transducers such as Filter as well as type-based ones such as NotA and OfType. These transducers can be used to filter out "missing values" represented as missing or nothing.

KeepSomething is a transducer that is useful for working with Union{Nothing,Some{T}}. It filters out nothing and yields the remaining items after applying something to them.

[nothing, 1, Some(nothing), 2, 3] |> KeepSomething(identity) |> collect
4-element Vector{Union{Nothing, Int64}}:
1
nothing
2
3
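The behavior shown above can be mimicked in Base Julia. The sketch below is an assumption-laden illustration of the semantics (apply f, drop nothing, unwrap with something), not the library's implementation; keep_something is a hypothetical helper introduced here.

```julia
# Hypothetical helper mirroring the behavior described above:
# apply f, drop `nothing`, and unwrap everything else with `something`.
keep_something(f, xs) =
    [something(y) for y in (f(x) for x in xs) if y !== nothing]

result = keep_something(identity, [nothing, 1, Some(nothing), 2, 3])

# The bare `nothing` is dropped, while `Some(nothing)` survives as `nothing`.
@assert isequal(result, [1, nothing, 2, 3])
```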

Thus, KeepSomething works well with any tool that operates on Union{Nothing,Some{T}}. Here is an example of using it with Maybe.jl. Consider a vector of heterogeneous dictionaries with varying sets of keys:

heterogeneous_objects = [
    Dict(:a => 1, :b => Dict(:c => 2)),
    Dict(:a => 1),                          # missing key
    Dict(:a => 1, :b => Dict()),            # missing key
    Dict(:b => Dict(:c => 2)),              # missing key
    Dict(:a => 10, :b => Dict(:ccc => 20)), # alternative key name
];

Using @something and @? macros from Maybe.jl, we can convert this to a regular table quite easily:

using Maybe
using Maybe: @something

heterogeneous_objects |>
    KeepSomething() do x
        c = @something {       # (1)
            @? x[:b][:c];      # (2)
            @? x[:b][:ccc];    # (3)
            return;            # (4)
        }
        @? (a = x[:a], c = c)  # (5)
    end |>
    astable
Table with 2 columns and 2 rows:
     a   c
   ┌───────
 1 │ 1   2
 2 │ 10  20

In this example, for each dictionary x, the body of the do block works as follows:

• (1) Try to extract the item c.
• (2) First, try to get it from x[:b][:c].
• (3) If x[:b][:c] doesn't exist, try x[:b][:ccc] next.
• (4) If both x[:b][:c] and x[:b][:ccc] do not exist, return nothing. KeepSomething will filter out this entry.
• (5) Try to extract the item a from x[:a].
    • If this does not exist, the whole expression wrapped by @? evaluates to nothing. This, in turn, will be filtered out by KeepSomething.
    • If x[:a] exists, @? (a = x[:a], c = c) evaluates to Some((a = value_of_a, c = value_of_c)). The Some wrapper is unwrapped by something called by KeepSomething.
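To make steps (1) to (5) concrete, here is the same control flow written with Base Julia's get instead of the Maybe.jl macros. This is only a sketch: extract_row is a hypothetical helper, and unlike the macro version it cannot tell a key whose stored value is nothing from a missing key.

```julia
heterogeneous_objects = [
    Dict(:a => 1, :b => Dict(:c => 2)),
    Dict(:a => 1),
    Dict(:a => 1, :b => Dict()),
    Dict(:b => Dict(:c => 2)),
    Dict(:a => 10, :b => Dict(:ccc => 20)),
]

function extract_row(x)
    b = get(x, :b, nothing)
    b === nothing && return nothing
    c = get(b, :c, nothing)                       # (2) try x[:b][:c]
    c === nothing && (c = get(b, :ccc, nothing))  # (3) fall back to x[:b][:ccc]
    c === nothing && return nothing               # (4) neither key exists
    a = get(x, :a, nothing)                       # (5) try x[:a]
    a === nothing && return nothing
    return Some((a = a, c = c))
end

# Drop `nothing` and unwrap `Some`, as KeepSomething is described to do.
rows = [something(r) for r in (extract_row(x) for x in heterogeneous_objects)
        if r !== nothing]

@assert rows == [(a = 1, c = 2), (a = 10, c = 20)]
```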

Multiple outputs

Usually, reducers like sum and collect have one output. However, we can use TeeRF and related combinators to "fan out" input items to multiple outputs.

Multiple output vectors

Here is an example of creating two output vectors of integers and symbols in one go:

ints, symbols =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(TeeRF(
        OfType(Int)'(push!!),    # push integers to a vector
        OfType(Symbol)'(push!!), # push symbols to a vector
    ))
([1, 3, 4], [:two, :five])

Here, we use TeeRF(rf₁, rf₂, ..., rfₙ) to fan out input items to multiple reducing functions. To build each reducing function, we apply the OfType transducer as a reducing function transformation, xf'(rf).
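The fan-out idea itself can be sketched with Base Julia's foldl. In the sketch below, tee is a hypothetical stand-in that feeds each input to every reducing function and keeps one accumulator per function; it is a conceptual model only and not how TeeRF is implemented.

```julia
# Hypothetical fan-out combinator: the accumulator is a tuple with one
# slot per reducing function, and every input x is passed to all of them.
tee(rfs...) = (accs, x) -> map((rf, acc) -> rf(acc, x), rfs, accs)

total_and_max = foldl(tee(+, max), [3, 1, 4]; init = (0, typemin(Int)))

@assert total_and_max == (8, 4)
```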

Handling empty results

Note that a fold with push!! throws when the input is empty. To obtain an empty vector when the input is empty or everything is filtered out, we need to specify init. MicroCollections.jl provides a set of collections that are useful as init. Here, we can use EmptyVector:

using MicroCollections

ints, strings =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(TeeRF(
        OfType(Int)'(push!!),    # push integers to a vector
        OfType(String)'(push!!), # push strings to a vector (but there is no string)
    ); init = EmptyVector())
([1, 3, 4], Union{}[])

Composed transducers with TeeRF

Each reducing function passed to TeeRF can use arbitrarily complex transducers. Here is an example of filtering in symbols and then mapping them to strings:

ints, strings =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(
        TeeRF(
            OfType(Int)'(push!!),
            opcompose(OfType(Symbol), Map(String))'(push!!),  # filter _then_ map
        );
        init = EmptyVector(),
    )
([1, 3, 4], ["two", "five"])

Nested TeeRF

Each reducing function passed to TeeRF can itself be composed using TeeRF (or other reducing function combinators, e.g., ProductRF). Here is an example of computing the extrema of the integers:

(imax, imin), strings =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(
        TeeRF(
            OfType(Int)'(TeeRF(max, min)),  # extrema on integers, as (max, min)
            opcompose(OfType(Symbol), Map(String))'(push!!),  # filter _then_ map
        );
        init = ((typemin(Int), typemax(Int)), EmptyVector()),
    )
((4, 1), ["two", "five"])

When input is a tuple: ProductRF

ProductRF is like TeeRF but it expects that the input is already a tuple:

ints, io =
    [(1:3, 'x':'z'), nothing, (1:4, 'i':'l')] |>
    NotA(Nothing) |>
    foldxl(
        ProductRF(
            opcompose(Cat(), Filter(isodd))'(push!!),    # process 1:3 etc.
            Cat()'((io, char) -> (write(io, char); io)), # process 'x':'z' etc.
        );
        init = (EmptyVector(), IOBuffer()),
    );
String(take!(io))
"xyzijkl"
ints
4-element Vector{Int64}:
1
3
1
3
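The difference from TeeRF is that each reducing function sees only its own component of the input tuple. A minimal Base-only model of that idea is shown below; product_rf is a hypothetical helper introduced for illustration and is not the library's ProductRF.

```julia
# Hypothetical combinator: the i-th reducing function combines the i-th
# accumulator with the i-th component of each input tuple.
product_rf(rfs...) =
    (accs, xs) -> map((rf, acc, x) -> rf(acc, x), rfs, accs, xs)

sum_and_prod = foldl(product_rf(+, *), [(1, 2), (3, 4)]; init = (0, 1))

@assert sum_and_prod == (4, 8)  # (0 + 1 + 3, 1 * 2 * 4)
```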

When input is a row: DataTools.oncol

oncol from DataTools.jl is like ProductRF, but it acts on NamedTuples (as well as any Setfield.jl-compatible, possibly nested, objects).

using DataTools
foldxl(oncol(a = +, b = *), [(a = 1, b = 2), (a = 3, b = 4)])
(a = 4, b = 8)
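For flat NamedTuple rows, the column-wise behavior in this example can be modeled in Base Julia by mapping over NamedTuples with matching field names. rowwise below is a hypothetical helper for illustration only; it does not cover the nested, Setfield.jl-based cases that oncol supports.

```julia
# Hypothetical helper: `rfs` holds one reducing function per column, and
# map over NamedTuples with identical field names keeps those names.
rowwise(rfs::NamedTuple) =
    (acc, row) -> map((rf, a, x) -> rf(a, x), rfs, acc, row)

totals = foldl(rowwise((a = +, b = *)), [(a = 1, b = 2), (a = 3, b = 4)])

@assert totals == (a = 4, b = 8)
```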