LazyGroupBy.jl
LazyGroupBy.LazyGroupBy
Base.all
Base.any
Base.collect
Base.count
Base.extrema
Base.findall
Base.findfirst
Base.findlast
Base.foldl
Base.keys
Base.length
Base.map
Base.mapfoldl
Base.maximum
Base.minimum
Base.pairs
Base.prod
Base.sum
Base.view
LazyGroupBy.grouped
Statistics.mean
Statistics.std
Statistics.var
Transducers.dcollect
Transducers.foldxd
Transducers.foldxt
Transducers.tcollect
LazyGroupBy.LazyGroupBy
— Module
LazyGroupBy: lazy, parallelizable and composable group-by operations
LazyGroupBy.jl exports a single API, grouped. It can be used to run group-by operations using the dot-call syntax:
reducer.(..., grouped(key, collection), ...)
where reducer runs on each group (thus, grouped(key, collection) can be considered a collection of key-value pairs with a Dictionaries.jl-like broadcasting rule). Roughly speaking, grouped(key, collection) is equivalent to Dict(k_1 => [v_11, v_12, ...], k_2 => [v_21, v_22, ...], ...) where k_i is the value of key(v_ij) for each v_ij in collection, and each call of reducer is evaluated with a group "vector" [v_i1, v_i2, ...].
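The equivalence described above can be sketched in plain Julia, without LazyGroupBy.jl. The helper name eager_groups is hypothetical and only illustrates the semantics; unlike grouped, it materializes every group vector up front.

```julia
# Hypothetical helper (not part of LazyGroupBy.jl): eagerly build the
# Dict that grouped(key, collection) is roughly equivalent to.
function eager_groups(key, collection)
    T = eltype(collection)
    groups = Dict{Any,Vector{T}}()
    for v in collection
        # Create the group vector on first use, then append the item.
        push!(get!(() -> T[], groups, key(v)), v)
    end
    return groups
end

groups = eager_groups(isodd, 1:7)

# Applying a reducer to each group "vector" mirrors reducer.(grouped(...)).
lengths = Dict(k => length(vs) for (k, vs) in groups)
```

Here groups[true] holds the odd items and lengths maps each key to its group size, which corresponds to the collect and length examples that follow.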
For example:
julia> using LazyGroupBy
julia> collect.(grouped(isodd, 1:7))
Transducers.GroupByViewDict{Bool,Array{Int64,1},…} with 2 entries:
false => [2, 4, 6]
true => [1, 3, 5, 7]
julia> length.(grouped(isodd, 1:7))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 3
true => 4
julia> keys.(grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Array{Int64,1},…} with 2 entries:
false => [1, 7, 9]
true => [2, 3, 4, 5, 6, 8, 10]
julia> foldl.(tuple, grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Any,…} with 2 entries:
false => ((0, 4), 0)
true => ((((((7, 3), 1), 5), 9), 3), 5)
julia> foldl.(tuple, grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]); init = -1)
Transducers.GroupByViewDict{Bool,Tuple{Any,Int64},…} with 2 entries:
false => (((-1, 0), 4), 0)
true => (((((((-1, 7), 3), 1), 5), 9), 3), 5)
julia> extrema_rf((min1, max1), (min2, max2)) = (min(min1, min2), max(max1, max2));
julia> mapfoldl.(x -> (x, x), extrema_rf, grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Tuple{Int64,Int64},…} with 2 entries:
false => (0, 4)
true => (1, 9)
The following generic and standard reducers are supported:
collect.(grouped(...)) → DICT{Key,Vector{...}}
view.(grouped(_, array)) → DICT{Key,SubArray}
map.(f, grouped(...))
length.(grouped(...)) → DICT{Key,Int}
count.([f,] grouped(...)) → DICT{Key,Int}
sum.([f,] grouped(...)) → DICT{Key,Number}
prod.([f,] grouped(...)) → DICT{Key,Number}
any.(f, grouped(...)) → DICT{Key,Bool}
all.(f, grouped(...)) → DICT{Key,Bool}
minimum.([f,] grouped(...))
maximum.([f,] grouped(...))
extrema.([f,] grouped(...))
keys.(grouped(_, collection)) → DICT{Key,Vector{keytype(collection)}}
pairs.(grouped(_, collection)) → DICT{Key,DICT{keytype(collection),valtype(collection)}}
findfirst.(f, grouped(_, array)) → DICT{Key,keytype(array)}
findlast.(f, grouped(_, array)) → DICT{Key,keytype(array)}
findall.(f, grouped(_, array)) → DICT{Key,Vector{keytype(array)}}
foldl.(op, grouped(...); [init])
mapfoldl.(f, op, grouped(...); [init])
where DICT{K,V} above is shorthand for AbstractDict{<:K,<:V} and Key is the type of the values returned by the key function passed to grouped.
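The reason the shorthand uses <: bounds can be checked directly in Julia. The snippet below is a small illustration using only Base types; the variable names are chosen for this example.

```julia
# Dict type parameters are invariant, so a concrete result type such as
# Dict{Bool,Int} is not a subtype of AbstractDict{Bool,Number} ...
invariant = Dict{Bool,Int} <: AbstractDict{Bool,Number}

# ... but it does match the bounded form that DICT{Bool,Number} abbreviates.
bounded = Dict{Bool,Int} <: AbstractDict{<:Bool,<:Number}
```

Here invariant is false and bounded is true, so a return type written as DICT{Key,Number} covers results whose values are Int, Float64, and so on.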
For more complex tasks, Transducers.jl and OnlineStats.jl can also be used:
foldl.(op, xf, grouped(...); [init])
foldxl.(op, [xf,] grouped(...); [init])
foldxt.(op, [xf,] grouped(...); [init]) (multi-threaded)
foldxd.(op, [xf,] grouped(...); [init]) (distributed)
collect.(xf, grouped(...))
tcollect.(xf, grouped(...)) (multi-threaded version of collect)
dcollect.(xf, grouped(...)) (distributed version of collect)
where xf::Transducer is initialized for each group individually and op is either a two-argument function or an OnlineStat object (e.g., OnlineStats.Mean).
Caveats
The dot-call syntax is used to define a "domain-specific language" (DSL), and its semantics differ from the standard semantics of broadcasting on arrays. In particular, reducer.(..., grouped(key, collection), ...) may not actually call reducer. Rather, it is pattern-matched and dispatched to an alternative definition based on Transducers.jl.
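One general way to build such a DSL is to add a method to Broadcast.broadcasted, since Julia lowers f.(args...) to Broadcast.materialize(Broadcast.broadcasted(f, args...)). The sketch below uses a hypothetical ToyGrouped type and handles only length; it illustrates the technique under that assumption and is not a description of LazyGroupBy.jl's actual source.

```julia
# Hypothetical stand-in for the object returned by grouped(key, collection).
struct ToyGrouped{K,C}
    key::K
    collection::C
end

# Intercept length.(::ToyGrouped) at the broadcasting layer. The function
# length itself is never called on a group vector; the dot call is
# dispatched to this definition, which counts items per key in one pass.
function Base.Broadcast.broadcasted(::typeof(length), g::ToyGrouped)
    counts = Dict{Any,Int}()
    for v in g.collection
        k = g.key(v)
        counts[k] = get(counts, k, 0) + 1
    end
    return counts
end

counts = length.(ToyGrouped(isodd, 1:7))
```

Because the method returns a plain Dict rather than a lazy Broadcasted object, Broadcast.materialize passes it through unchanged, and counts maps false to 3 and true to 4.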
Implementation
LazyGroupBy.jl is implemented as a direct transformation to foldl/foldxt/foldxd and GroupBy from Transducers.jl. Consider
foldl.(rf, xf, grouped(key, collection); init = init)
This is simply translated to
foldl(right, GroupBy(key, xf, rf, init), collection)
Other reducers like sum and collect are implemented in terms of the above transformation.
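The effect of this transformation, a single pass that keeps one accumulator per key, can be approximated without Transducers.jl. In the sketch below, fold_by_key is a hypothetical name, and the code ignores the transducer xf and any parallel execution.

```julia
# Hypothetical helper: fold each group with rf in one pass over the
# collection, storing only the running accumulator for every key.
function fold_by_key(rf, key, collection; init)
    acc = Dict{Any,Any}()
    for v in collection
        k = key(v)
        # Start from init the first time a key is seen.
        acc[k] = rf(get(acc, k, init), v)
    end
    return acc
end

sums = fold_by_key(+, isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]; init = 0)
```

No group vector is ever built here, which is the property that lets reducers such as sum and length avoid materializing the groups.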
Base.all
— Function
all.(f, grouped(key, array))
Examples
julia> using LazyGroupBy
julia> xs = [0, 7, 3];
julia> gs = all.(<(1), grouped(isodd, xs))
Transducers.GroupByViewDict{Bool,Bool,…} with 2 entries:
false => true
true => false
Base.any
— Function
any.(f, grouped(key, array))
Examples
julia> using LazyGroupBy
julia> xs = [0, 7, 3];
julia> gs = any.(>(5), grouped(isodd, xs))
Transducers.GroupByViewDict{Bool,Bool,…} with 2 entries:
false => false
true => true
Base.collect
— Function
collect.([xf,] grouped(key, collection))
Collect each group as a Vector.
The first optional argument xf is a transducer.
Example
julia> using LazyGroupBy
julia> collect.(grouped(isodd, [0, 7, 3]))
Transducers.GroupByViewDict{Bool,Array{Int64,1},…} with 2 entries:
false => [0]
true => [7, 3]
Base.count
— Function
count.([f,] grouped(key, collection))
Count the number of items in each group for which f evaluates to true.
Example
julia> using LazyGroupBy
julia> count.(<(5), grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 3
true => 3
Base.extrema
— Function
extrema.([f,] grouped(key, collection); [init])
Examples
julia> using LazyGroupBy
julia> xs = [0, 7, 2, 3];
julia> extrema.(grouped(isodd, xs))
Transducers.GroupByViewDict{Bool,Tuple{Int64,Int64},…} with 2 entries:
false => (0, 2)
true => (3, 7)
Base.findall
— Function
findall.(f, grouped(key, array))
Examples
julia> using LazyGroupBy
julia> xs = [0, 7, 2, 3];
julia> gs = findall.(>(1), grouped(isodd, xs))
Transducers.GroupByViewDict{Bool,Array{Int64,1},…} with 2 entries:
false => [3]
true => [2, 4]
julia> xs[gs[false]]
1-element Array{Int64,1}:
2
julia> xs[gs[true]]
2-element Array{Int64,1}:
7
3
Base.findfirst
— Function
findfirst.(f, grouped(key, array))
Examples
julia> using LazyGroupBy
julia> xs = [0, 7, 2, 3];
julia> gs = findfirst.(>(1), grouped(isodd, xs))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 3
true => 2
julia> xs[gs[false]]
2
julia> xs[gs[true]]
7
Base.findlast
— Function
findlast.(f, grouped(key, array))
Examples
julia> using LazyGroupBy
julia> xs = [0, 7, 2, 3];
julia> gs = findlast.(<(5), grouped(isodd, xs))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 3
true => 4
julia> xs[gs[false]]
2
julia> xs[gs[true]]
3
Base.foldl
— Function
foldl.(op, [xf,] grouped(key, collection); [init])
foldl.(os::OnlineStat, [xf,] grouped(key, collection); [init])
The first argument is either a reducing step function or an OnlineStat. The second optional argument xf is a transducer.
Examples
julia> using LazyGroupBy
julia> foldl.(tuple, grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Any,…} with 2 entries:
false => ((0, 4), 0)
true => ((((((7, 3), 1), 5), 9), 3), 5)
julia> using OnlineStats
julia> foldl.(Ref(Mean()), grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Mean{Float64,EqualWeight},…} with 2 entries:
false => Mean: n=3 | value=1.33333
true => Mean: n=7 | value=4.71429
Base.keys
— Function
keys.(grouped(key, indexable))
Return a dictionary whose values are vectors of keys into the indexable input collection.
Example
julia> using LazyGroupBy
julia> keys.(grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Array{Int64,1},…} with 2 entries:
false => [1, 7, 9]
true => [2, 3, 4, 5, 6, 8, 10]
julia> keys.(grouped(isodd, Dict(zip('a':'e', 1:5))))
Transducers.GroupByViewDict{Bool,Array{Char,1},…} with 2 entries:
false => ['d', 'b']
true => ['a', 'c', 'e']
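What keys.(grouped(...)) computes for an array can be reproduced with a plain loop over index-value pairs. This is a Base-only sketch of the semantics, and index_groups is a name chosen for this example.

```julia
xs = [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]

# Group the indices (keys) of xs by the parity of the value they point to.
index_groups = Dict{Bool,Vector{Int}}()
for (i, x) in pairs(xs)
    # get! inserts an empty vector the first time a group key appears.
    push!(get!(index_groups, isodd(x), Int[]), i)
end
```

The resulting vectors can be used to index back into xs, for example xs[index_groups[false]] selects the even items.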
Base.length
— Function
length.(grouped(key, collection))
Count the number of items in each group. This is defined as count.(_ -> true, grouped(key, collection)) rather than by materializing each group vector.
Example
julia> using LazyGroupBy
julia> length.(grouped(isodd, 1:7))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 3
true => 4
Base.map
— Function
map.(f, grouped(key, collection))
Like collect.(grouped(key, collection)), but processes each item with f.
Examples
julia> using LazyGroupBy
julia> map.(string, grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Array{String,1},…} with 2 entries:
false => ["0", "4", "0"]
true => ["7", "3", "1", "5", "9", "3", "5"]
Base.mapfoldl
— Function
mapfoldl.(f, op, grouped(key, collection); [init])
Examples
julia> using LazyGroupBy
julia> extrema_rf((min1, max1), (min2, max2)) = (min(min1, min2), max(max1, max2));
julia> mapfoldl.(x -> (x, x), extrema_rf, grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Tuple{Int64,Int64},…} with 2 entries:
false => (0, 4)
true => (1, 9)
Base.maximum
— Function
maximum.([f,] grouped(key, collection); [init])
Examples
julia> using LazyGroupBy
julia> maximum.(grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 4
true => 9
Base.minimum
— Function
minimum.([f,] grouped(key, collection); [init])
Examples
julia> using LazyGroupBy
julia> minimum.(grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 0
true => 1
Base.pairs
— Function
pairs.(grouped(key, indexable))
Return a dictionary whose values are dictionaries mapping keys of the indexable input collection to the corresponding values.
Example
julia> using LazyGroupBy
julia> pairs.(grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Dict{Int64,Int64},…} with 2 entries:
false => Dict(7=>4,9=>0,1=>0)
true => Dict(4=>1,10=>5,2=>7,3=>3,5=>5,8=>3,6=>9)
julia> pairs.(grouped(isodd, Dict(zip('a':'e', 1:5))))
Transducers.GroupByViewDict{Bool,Dict{Char,Int64},…} with 2 entries:
false => Dict('d'=>4,'b'=>2)
true => Dict('a'=>1,'c'=>3,'e'=>5)
Base.prod
— Function
prod.([f,] grouped(key, collection); [init])
Examples
julia> using LazyGroupBy
julia> prod.(grouped(isodd, [7, 3, 1, 5, 9, 4, 3, 5]))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 4
true => 14175
Base.sum
— Function
sum.([f,] grouped(key, collection); [init])
Examples
julia> using LazyGroupBy
julia> sum.(grouped(isodd, [7, 3, 1, 5, 9, 4, 3, 5]))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 4
true => 33
Base.view
— Function
view.(grouped(key, array))
Like collect.(grouped(key, array)), but returns a mutable view into the input array.
Examples
julia> using LazyGroupBy
julia> xs = [0, 7, 3];
julia> gs = view.(grouped(isodd, xs))
Dict{Bool,SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}} with 2 entries:
false => [0]
true => [7, 3]
julia> gs[false][end] = 111;
julia> xs
3-element Array{Int64,1}:
111
7
3
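The printed type above, a SubArray indexed by an Array of Int64, suggests that each group is a view over the indices belonging to that group. Under that assumption, the behavior of a single group can be reproduced with Base functions alone; the variable names are chosen for this example.

```julia
xs = [0, 7, 3]

# Indices of the odd items; a view over them aliases the original array.
odd_indices = findall(isodd, xs)
odd_view = view(xs, odd_indices)

# Writing through the view mutates xs itself.
odd_view[end] = 30
```

After the assignment, xs is [0, 7, 30], mirroring how gs[false][end] = 111 changes xs in the example above.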
LazyGroupBy.grouped
— Method
grouped(key, collection)
Create a lazy associative (dict-like) object grouped by a function key. The actual per-group reduction is initiated by a dot call (broadcasting) of a "reducer" such as foldl or reduce.
Examples
julia> using LazyGroupBy
julia> length.(grouped(isodd, 1:7))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 3
true => 4
Statistics.mean
— Function
mean.([f,] grouped(key, collection))
Compute the mean of each group.
Example
julia> using LazyGroupBy, Statistics
julia> mean.(grouped(isodd, 1:7))
Dict{Bool,Float64} with 2 entries:
false => 4.0
true => 4.0
Statistics.std
— Function
std.([f,] grouped(key, collection))
Compute the standard deviation of each group.
Example
julia> using LazyGroupBy, Statistics
julia> std.(grouped(isodd, 1:10))
Dict{Bool,Float64} with 2 entries:
false => 3.16228
true => 3.16228
Statistics.var
— Function
var.([f,] grouped(key, collection))
Compute the variance of each group.
Example
julia> using LazyGroupBy, Statistics
julia> var.(grouped(isodd, 1:10))
Dict{Bool,Float64} with 2 entries:
false => 10.0
true => 10.0
Transducers.dcollect
— Function
dcollect.([xf,] grouped(key, collection))
Collect each group as a Vector using Distributed.jl.
The first optional argument xf is a transducer.
Example
julia> using LazyGroupBy, Transducers
julia> dcollect.(grouped(isodd, [0, 7, 3]))
Transducers.GroupByViewDict{Bool,Array{Int64,1},…} with 2 entries:
false => [0]
true => [7, 3]
Transducers.foldxd
— Function
foldxd.(op, [xf,] grouped(key, collection); [init])
foldxd.(os::OnlineStat, [xf,] grouped(key, collection); [init])
The first argument is either a reducing step function or an OnlineStat. The second optional argument xf is a transducer.
Examples
julia> using LazyGroupBy, Transducers
julia> foldxd.(+, grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 4
true => 33
Transducers.foldxt
— Function
foldxt.(op, [xf,] grouped(key, collection); [init])
foldxt.(os::OnlineStat, [xf,] grouped(key, collection); [init])
The first argument is either a reducing step function or an OnlineStat. The second optional argument xf is a transducer.
Examples
julia> using LazyGroupBy, Transducers
julia> foldxt.(max, grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Int64,…} with 2 entries:
false => 4
true => 9
julia> using OnlineStats
julia> foldxt.(Ref(Mean()), grouped(isodd, [0, 7, 3, 1, 5, 9, 4, 3, 0, 5]))
Transducers.GroupByViewDict{Bool,Mean{Float64,EqualWeight},…} with 2 entries:
false => Mean: n=3 | value=1.33333
true => Mean: n=7 | value=4.71429
An example of computing the minimum, the maximum, and the number of items of each group in one go:
julia> table = ((k = gcd(v, 42), v = v) for v in 1:100);
julia> collect(Iterators.take(table, 5)) # preview
5-element Array{NamedTuple{(:k, :v),Tuple{Int64,Int64}},1}:
(k = 1, v = 1)
(k = 2, v = 2)
(k = 3, v = 3)
(k = 2, v = 4)
(k = 1, v = 5)
julia> counter = reducingfunction(Map(_ -> 1), +);
julia> foldxt.(TeeRF(min, max, counter), Map(x -> x.v), grouped(x -> x.k, table))
Transducers.GroupByViewDict{Int64,Tuple{Int64,Int64,Int64},…} with 8 entries:
7 => (7, 91, 5)
14 => (14, 98, 5)
42 => (42, 84, 2)
2 => (2, 100, 29)
3 => (3, 99, 15)
21 => (21, 63, 2)
6 => (6, 96, 14)
1 => (1, 97, 28)
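For comparison, the same per-group summary can be computed in a single pass with a plain loop. This Base-only sketch illustrates the result only; it does not reflect how TeeRF works internally, and it runs sequentially rather than on multiple threads.

```julia
# Per key: (minimum, maximum, count), updated together for every item.
stats = Dict{Int,Tuple{Int,Int,Int}}()
for v in 1:100
    k = gcd(v, 42)
    # For a new key, seed min and max with v and the count with 0.
    lo, hi, n = get(stats, k, (v, v, 0))
    stats[k] = (min(lo, v), max(hi, v), n + 1)
end
```

Each item updates all three statistics at once, so the input is traversed only one time, as in the TeeRF(min, max, counter) example.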
Transducers.tcollect
— Function
tcollect.([xf,] grouped(key, collection))
Collect each group as a Vector using multiple threads. See also collect.(grouped(key, collection)).
The first optional argument xf is a transducer.
Example
julia> using LazyGroupBy, Transducers
julia> tcollect.(grouped(isodd, [0, 7, 3]))
Transducers.GroupByViewDict{Bool,Array{Int64,1},…} with 2 entries:
false => [0]
true => [7, 3]