Useful patterns
This page includes some useful patterns using Transducers.jl.
using Transducers
Flattening nested objects using MapCat
Simple MapCat usage
Consider a vector of "objects" (here just NamedTuples) which in turn contain a vector of objects:
nested_objects = [
    (a = 1, b = [(c = 2, d = 3), (c = 4, d = 5)]),
    (a = 10, b = [(c = 20, d = 30), (c = 40, d = 50)]),
];
We can flatten this into a table by using Map inside MapCat:
using TypedTables
astable(xs) = copy(Table, xs) # using `TypedTables` for a nice display
table1 = nested_objects |> MapCat() do x
    x.b |> Map() do b  # not `MapCat`
        (a = x.a, b...)
    end
end |> astable
Table with 3 columns and 4 rows:
     a   c   d
   ┌───────────
 1 │ 1   2   3
 2 │ 1   4   5
 3 │ 10  20  30
 4 │ 10  40  50
(Note that the transducer used inside MapCat is Map, not MapCat.)
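To see what MapCat does on its own, the following minimal sketch maps each outer object to its inner vector and concatenates the results. The variable names objs and inner are introduced only for this illustration.

```julia
using Transducers

# Illustrative data with the same shape as `nested_objects`
objs = [
    (a = 1, b = [(c = 2, d = 3), (c = 4, d = 5)]),
    (a = 10, b = [(c = 20, d = 30), (c = 40, d = 50)]),
]

# `MapCat(f)` applies `f` to each item and concatenates the resulting collections
inner = objs |> MapCat(x -> x.b) |> collect

@assert inner == [(c = 2, d = 3), (c = 4, d = 5), (c = 20, d = 30), (c = 40, d = 50)]
```

The outer field a is lost here, which is why the pattern above builds the full row inside the function passed to MapCat.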
Nested MapCat
This pattern can handle more nested objects:
more_nested_objects = [
    (a = 1, b = [(c = 2, d = [(e = 3, f = 4), (e = 4, f = 5)]),
                 (c = 6, d = [])]),
    (a = 10, b = [(c = 20, d = [(e = 30, f = 40), (e = 40, f = 50)])]),
];
We can flatten them by nesting MapCat (except for the innermost processing, which uses Map since there is nothing to concatenate):
table3 =
    more_nested_objects |> MapCat() do x
        x.b |> MapCat() do b
            b.d |> Map() do d
                (a = x.a, c = b.c, d...)
            end
        end
    end |> astable
Table with 4 columns and 4 rows:
     a   c   e   f
   ┌───────────────
 1 │ 1   2   3   4
 2 │ 1   2   4   5
 3 │ 10  20  30  40
 4 │ 10  20  40  50
Comparison with iterator comprehension
As a comparison, here is how to do it with an iterator comprehension:
rows = (
    (a = x.a, c = b.c, d...)
    for x in more_nested_objects
    for b in x.b
    for d in b.d
)
@assert Table(collect(rows)) == table3
For simple flattening and mapping, an iterator comprehension as above is perhaps the simplest solution.
Note that Transducers.jl works well with iterator comprehensions. Transducers.jl-specific entry points like foldxl convert iterator comprehensions to transducers internally. eduction can be used to do this conversion explicitly:
@assert astable(eduction(rows)) == table3
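As a small illustration of passing a comprehension directly to foldxl, here is a sketch that computes the same sum twice, once from a comprehension and once from explicit transducers. The variable names are made up for this example.

```julia
using Transducers

# Sum of squares of the odd numbers in 1:10, written as an iterator comprehension
total_comprehension = foldxl(+, (x^2 for x in 1:10 if isodd(x)))

# The same computation written with explicit transducers
total_transducers = foldxl(+, 1:10 |> Filter(isodd) |> Map(x -> x^2))

@assert total_comprehension == total_transducers == 165
```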
Complex MapCat example
For more complex processing that requires intermediate variables, the iterator comprehension does not work well. Fortunately, it is easy to use intermediate variables with transducers:
more_nested_objects |>
    MapCat() do x
        a2 = x.a * 2
        x.b |> MapCat() do b
            a2_plus_c = a2 + b.c
            b.d |> Map() do d
                c_plus_e = b.c + d.e
                c_plus_f = b.c + d.f
                (a2_plus_c = a2_plus_c, c_plus_e = c_plus_e, c_plus_f = c_plus_f)
            end
        end
    end |>
    astable
Table with 3 columns and 4 rows:
     a2_plus_c  c_plus_e  c_plus_f
   ┌──────────────────────────────
 1 │ 4          5         6
 2 │ 4          6         7
 3 │ 40         50        60
 4 │ 40         60        70
MapCat with zip
Note also that MapCat can be combined with arbitrary iterator combinators such as zip:
[(a = 1:3, b = 'x':'z'), (a = 1:4, b = 'i':'l')] |>
    MapCat() do x
        zip(x.a, x.b)
    end |>
    MapSplat((a, b) -> (a = a, b = b)) |>
    astable
Table with 2 columns and 7 rows:
     a  b
   ┌─────
 1 │ 1  x
 2 │ 2  y
 3 │ 3  z
 4 │ 1  i
 5 │ 2  j
 6 │ 3  k
 7 │ 4  l
MapCat with Iterators.product
... and Iterators.product:
[(a = 1:3, b = 'x':'z'), (a = 1:4, b = 'i':'l')] |>
    MapCat() do x
        Iterators.product(x.a, x.b)
    end |>
    Enumerate() |>
    Filter(x -> x[1] % 5 == 0) |>  # include only every fifth item
    MapSplat((n, (a, b)) -> (n = n, a = a, b = b)) |>
    astable
Table with 3 columns and 5 rows:
     n   a  b
   ┌─────────
 1 │ 5   2  y
 2 │ 10  1  i
 3 │ 15  2  j
 4 │ 20  3  k
 5 │ 25  4  l
"Missing value" handling with KeepSomething
Transducers.jl has generic filtering transducers such as Filter as well as type-based filtering transducers such as NotA and OfType. These transducers can be used to filter out "missing values" represented as missing or nothing.
KeepSomething is a transducer that is useful for working on Union{Nothing,Some{T}}. It filters out nothing and yields items after applying something.
[nothing, 1, Some(nothing), 2, 3] |> KeepSomething(identity) |> collect
4-element Vector{Union{Nothing, Int64}}:
 1
 nothing
 2
 3
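For intuition, the behavior of KeepSomething(identity) on this input can be mimicked with Base alone. This is only a sketch of the semantics described above (drop nothing, then unwrap with something), not how Transducers.jl implements it. The names xs and kept are illustrative.

```julia
xs = [nothing, 1, Some(nothing), 2, 3]

# Drop `nothing`, then call `something`: `Some(v)` is unwrapped to `v`,
# while plain values such as `1` pass through unchanged
kept = [something(x) for x in xs if x !== nothing]

@assert kept == [1, nothing, 2, 3]
```

Note that Some(nothing) survives as nothing in the output, since only a bare nothing is treated as "missing".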
Thus, KeepSomething works well with any tools that operate on Union{Nothing,Some{T}}. Here is an example of using it with Maybe.jl. Consider a vector of heterogeneous dictionaries with varying sets of keys:
heterogeneous_objects = [
    Dict(:a => 1, :b => Dict(:c => 2)),
    Dict(:a => 1),  # missing key
    Dict(:a => 1, :b => Dict()),  # missing key
    Dict(:b => Dict(:c => 2)),  # missing key
    Dict(:a => 10, :b => Dict(:ccc => 20)),  # alternative key name
];
Using the @something and @? macros from Maybe.jl, we can convert this to a regular table quite easily:
using Maybe
using Maybe: @something
heterogeneous_objects |>
    KeepSomething() do x
        c = @something {        # (1)
            @? x[:b][:c];       # (2)
            @? x[:b][:ccc];     # (3)
            return;             # (4)
        }
        @? (a = x[:a], c = c)   # (5)
    end |>
    astable
Table with 2 columns and 2 rows:
     a   c
   ┌───────
 1 │ 1   2
 2 │ 10  20
In this example, for each dictionary x, the body of the do block works as follows:
- (1) Try to extract the item c.
  - (2) First, try to get it from x[:b][:c].
  - (3) If x[:b][:c] doesn't exist, try x[:b][:ccc] next.
  - (4) If both x[:b][:c] and x[:b][:ccc] do not exist, return nothing. KeepSomething will filter out this entry.
- (5) Try to extract the item a from x[:a].
  - If this does not exist, the whole expression wrapped by @? evaluates to nothing. This, in turn, will be filtered out by KeepSomething.
  - If x[:a] exists, @? (a = x[:a], c = c) evaluates to Some((a = value_of_a, c = value_of_c)). The Some wrapper is unwrapped by something called by KeepSomething.
For more information, see the tutorial in Maybe.jl documentation.
Multiple outputs
Usually, reducers like sum and collect have one output. However, we can use TeeRF etc. to "fan out" input items to multiple outputs.
Multiple output vectors
Here is an example of creating two output vectors of integers and symbols in one go:
ints, symbols =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(TeeRF(
        OfType(Int)'(push!!),     # push integers to a vector
        OfType(Symbol)'(push!!),  # push symbols to a vector
    ))
([1, 3, 4], [:two, :five])
Here, we use TeeRF(rf₁, rf₂, ..., rfₙ) to fan out input items to multiple reducing functions. To compose each reducing function, we use the OfType transducer as a reducing function transformation xf'(rf).
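A smaller sketch of the same two ingredients may help: TeeRF with plain reducing functions, and TeeRF with reducing functions built through xf'(rf). The input vector and variable names are illustrative only.

```julia
using Transducers

# Each input item is passed to both `min` and `max`; the result is a tuple
lo, hi = foldxl(TeeRF(min, max), [5, 2, 6, 8, 3])
@assert (lo, hi) == (2, 8)

# `xf'(rf)` wraps `rf` so that it only sees items that pass through `xf`
min_odd, max_even =
    foldxl(TeeRF(Filter(isodd)'(min), Filter(iseven)'(max)), [5, 2, 6, 8, 3])
@assert (min_odd, max_even) == (3, 8)
```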
Handling empty results
Note that fold with push!! throws when the input is empty. To obtain an empty vector when the input is empty or all filtered out, we need to specify init. MicroCollections.jl includes a library of collections useful as init. Here, we can use EmptyVector:
using MicroCollections
ints, strings =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(TeeRF(
        OfType(Int)'(push!!),     # push integers to a vector
        OfType(String)'(push!!),  # push strings to a vector (but there is no string)
    ); init = EmptyVector())
([1, 3, 4], Union{}[])
Composed transducers with TeeRF
Each reducing function passed to TeeRF can use arbitrarily complex transducers. Here is an example of keeping only the symbols and then mapping them to strings:
ints, strings =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(
        TeeRF(
            OfType(Int)'(push!!),
            opcompose(OfType(Symbol), Map(String))'(push!!),  # filter _then_ map
        );
        init = EmptyVector(),
    )
([1, 3, 4], ["two", "five"])
Nested TeeRF
Each reducing function passed to TeeRF can itself be composed using TeeRF (or other reducing function combinators; e.g., ProductRF). Here is an example of computing extrema on integers:
(imax, imin), strings =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(
        TeeRF(
            OfType(Int)'(TeeRF(max, min)),  # extrema on integers, in (max, min) order
            opcompose(OfType(Symbol), Map(String))'(push!!),  # filter _then_ map
        );
        init = ((typemin(Int), typemax(Int)), EmptyVector()),
    )
((4, 1), ["two", "five"])
When input is a tuple: ProductRF
ProductRF is like TeeRF but it expects that the input is already a tuple:
ints, io =
    [(1:3, 'x':'z'), nothing, (1:4, 'i':'l')] |>
    NotA(Nothing) |>
    foldxl(
        ProductRF(
            opcompose(Cat(), Filter(isodd))'(push!!),     # process 1:3 etc.
            Cat()'((io, char) -> (write(io, char); io)),  # process 'x':'z' etc.
        );
        init = (EmptyVector(), IOBuffer()),
    );
String(take!(io))
"xyzijkl"
ints
4-element Vector{Int64}:
 1
 3
 1
 3
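Stripped of the inner transducers, the core behavior of ProductRF can be sketched as follows: the first component of each input tuple goes to the first reducing function, the second component to the second. The input data and the names s and p are made up for this illustration, and the sketch assumes ProductRF can be folded without an explicit init for + and *.

```julia
using Transducers

# The first component of each tuple is folded with `+`, the second with `*`
s, p = foldxl(ProductRF(+, *), [(1, 2), (3, 4)])

@assert (s, p) == (4, 8)  # 1 + 3 and 2 * 4
```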
When input is a row: DataTools.oncol
oncol from DataTools.jl is like ProductRF but acts on NamedTuple (as well as any Setfield.jl-compatible possibly nested objects).
using DataTools
foldxl(oncol(a = +, b = *), [(a = 1, b = 2), (a = 3, b = 4)])
(a = 4, b = 8)
This page was generated using Literate.jl.