DataTools
DataTools.DataTools
DataTools.averaging
DataTools.firstitem
DataTools.firstitems
DataTools.inc1
DataTools.lastitem
DataTools.lastitems
DataTools.meanvar
DataTools.modifying
DataTools.nitems
DataTools.oncol
DataTools.rightif
DataTools.DataTools
Module
DataTools: manipulating flat tables and nested data structures using Transducers.jl
julia> using DataTools: oncol, modifying, averaging
julia> using Transducers: Filter
julia> data = [(a = 1, b = 7), (a = 2, b = 3), (a = 3, b = 4)];
julia> rf = oncol(a = +, b = averaging);
julia> foldl(rf, Filter(x -> isodd(x.a)), data)
(a = 4, b = 5.5)
julia> map(modifying(a = string), data)
3-element Array{NamedTuple{(:a, :b),Tuple{String,Int64}},1}:
(a = "1", b = 7)
(a = "2", b = 3)
(a = "3", b = 4)
julia> reduce(modifying(a = +), data)
(a = 6, b = 7)
julia> using Accessors: @optic
julia> data = [(a = ((b = 1,), 2),), (a = ((b = 3,), 4),)];
julia> map(modifying(@optic(_.a[1].b) => x -> 10x), data)
2-element Array{NamedTuple{(:a,),Tuple{Tuple{NamedTuple{(:b,),Tuple{Int64}},Int64}}},1}:
(a = ((b = 10,), 2),)
(a = ((b = 30,), 4),)
DataTools.averaging
Function
averaging
A reducing function for averaging elements.
Examples
julia> using DataTools, Transducers
julia> foldl(averaging, Filter(isodd), 1:10)
5.0
julia> rf = oncol(a = averaging, b = averaging);
julia> foldl(rf, Map(identity), [(a = 1, b = 2), (a = 2, b = 3)])
(a = 1.5, b = 2.5)
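For intuition, a running average can be written as a plain reducing step in Base Julia. The sketch below is not the DataTools implementation; `mean_step` and its `(mean, count)` accumulator are hypothetical names used only for illustration.

```julia
# Hypothetical running-mean step; the accumulator is a (mean, count) tuple.
mean_step((m, n), x) = (m + (x - m) / (n + 1), n + 1)

# Average the odd numbers in 1:10 without materializing them first.
m, n = foldl(mean_step, Iterators.filter(isodd, 1:10); init = (0.0, 0))
```

Here `m` is the mean of the odd numbers and `n` is how many were seen, which mirrors the first `averaging` example above.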
DataTools.firstitem
Function
firstitem(xs)
Get the first item of xs. Consume xs if necessary.
Examples
julia> using DataTools, Transducers
julia> firstitem(3:7)
3
julia> 3:7 |> Map(x -> x + 1) |> Filter(isodd) |> firstitem
5
DataTools.firstitems
Function
firstitems(xs, n::Integer)
firstitems(n::Integer) -> xs -> firstitems(xs, n)
Get the first n items of xs. Consume xs if necessary.
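The two signatures describe a direct form and a curried form that is convenient with `|>`. A minimal Base-only sketch of that pattern follows; `take_first` is a hypothetical stand-in, and the container type that `firstitems` itself returns is not stated here, so returning a `Vector` is an assumption.

```julia
# Hypothetical stand-in for the two-method pattern of firstitems.
take_first(xs, n::Integer) = collect(Iterators.take(xs, n))
take_first(n::Integer) = xs -> take_first(xs, n)  # curried form for use with |>

a = take_first(3:7, 2)      # direct call
b = 3:7 |> take_first(2)    # same call written as a pipeline
```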
DataTools.inc1
Method
inc1(n, _) -> n + 1
A reducing function for counting elements. It increments the first argument by one.
Examples
julia> using DataTools, Transducers
julia> inc1(10, :ignored)
11
julia> inc1(Init(inc1), :ignored)
1
julia> foldl(inc1, Map(identity), 'a':2:'e')
3
julia> foldl(TeeRF(+, inc1), Map(identity), 1:2:10) # sum and count
(25, 5)
julia> rf = oncol(:a => (+) => :sum, :a => inc1 => :count);
julia> foldl(rf, Map(identity), [(a = 1, b = 2), (a = 2, b = 3)])
(sum = 3, count = 2)
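The sum-and-count pattern can also be spelled out with a plain `foldl` in Base Julia, which may help when reading the `TeeRF` example. This is only a sketch: `count_step` and `sum_and_count` are hypothetical names, and the explicit `init` replaces the initial-value handling that Transducers.jl provides.

```julia
# Same shape as inc1(n, _) -> n + 1: ignore the item, bump the counter.
count_step(n, _) = n + 1

# Pair a summing step with the counting step on a (sum, count) accumulator.
sum_and_count((s, n), x) = (s + x, count_step(n, x))

total, cnt = foldl(sum_and_count, 1:2:10; init = (0, 0))
```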
DataTools.lastitem
Function
lastitem(xs)
Get the last item of xs. Consume xs if necessary.
Examples
julia> using DataTools, Transducers
julia> lastitem(3:7)
7
julia> 3:7 |> Map(x -> x + 1) |> Filter(isodd) |> lastitem
7
DataTools.lastitems
Function
lastitems(xs, n::Integer)
lastitems(n::Integer) -> xs -> lastitems(xs, n)
Get the last n items of xs. Consume xs if necessary.
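"Consume xs if necessary" matters most for the last items, because a lazy iterator has to be traversed to its end. One way to do that with bounded memory is a small buffer, sketched below in Base Julia. `keep_last` is a hypothetical helper, not the DataTools implementation, and its `Vector{Any}` result is an assumption made for simplicity.

```julia
# Hypothetical helper: keep only the most recent n items while iterating.
function keep_last(xs, n::Integer)
    buf = Any[]
    for x in xs
        push!(buf, x)
        length(buf) > n && popfirst!(buf)  # drop the oldest item
    end
    return buf
end
keep_last(n::Integer) = xs -> keep_last(xs, n)  # curried form

tail2 = keep_last(3:7, 2)
tail1 = (x + 1 for x in 3:7 if isodd(x + 1)) |> keep_last(1)
```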
DataTools.meanvar
Function
meanvar
A reducing function for computing the mean and variance.
Examples
julia> using DataTools, Transducers, Statistics
julia> acc = foldl(meanvar, Filter(isodd), 1:96)
MeanVarState(mean=48.0, var=784.0, count=48)
julia> acc.mean, mean(acc)
(48.0, 48.0)
julia> acc.var, var(acc), var(acc, corrected = false)
(784.0, 784.0, 767.6666666666666)
julia> acc.std, std(acc)
(28.0, 28.0)
julia> acc.count
48
julia> m, v, c = acc; # destructuring works
julia> Tuple(acc) # (mean, var, count)
(48.0, 784.0, 48)
julia> NamedTuple(acc)
(mean = 48.0, var = 784.0, count = 48)
julia> rf = oncol(a = meanvar, b = meanvar);
julia> foldl(rf, Map(identity), [(a = 1, b = 2), (a = 2, b = 3)])
(a = MeanVarState(mean=1.5, var=0.5, count=2), b = MeanVarState(mean=2.5, var=0.5, count=2))
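A single-pass mean and variance can be computed with Welford's update, sketched here in Base Julia. Whether `meanvar` uses this exact recurrence is an assumption; `welford_step` and its `(mean, M2, count)` accumulator are hypothetical names for illustration only.

```julia
# Hypothetical state: (mean, M2, count), where M2 is the running sum of
# squared deviations from the current mean.
function welford_step((m, m2, n), x)
    n += 1
    d = x - m
    m += d / n
    m2 += d * (x - m)  # uses the updated mean
    return (m, m2, n)
end

m, m2, n = foldl(welford_step, Iterators.filter(isodd, 1:96); init = (0.0, 0.0, 0))
v = m2 / (n - 1)  # corrected (sample) variance
```

Up to floating-point rounding, `m` and `v` should agree with `acc.mean` and `acc.var` in the first example above.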
DataTools.modifying
Function
modifying(; $property₁ = f₁, ..., $propertyₙ = fₙ) -> g::Function
modifying(lens₁ => f₁, ..., lensₙ => fₙ) -> g::Function
Create a function that runs function fᵢ on the locations specified by propertyᵢ or lensᵢ.
The keyword-only method modifying(; a = f₁, b = f₂) is equivalent to modifying(@optic(_.a) => f₁, @optic(_.b) => f₂).
The unary method g(x) is equivalent to
x = modify(f₁, x, lens₁)
x = modify(f₂, x, lens₂)
...
x = modify(fₙ, x, lensₙ)
The binary method g(x, y) is equivalent to
x = set(x, lens₁, f₁(lens₁(x), lens₁(y)))
x = set(x, lens₂, f₂(lens₂(x), lens₂(y)))
...
x = set(x, lensₙ, fₙ(lensₙ(x), lensₙ(y)))
Note that the locations that are not specified by the lenses keep the values as in x. This is similar to how mergewith behaves.
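For the keyword-only case on NamedTuples, the unary and binary behavior described above can be approximated in Base Julia without lenses. The sketch below is an assumption-laden illustration, not the library source: `modifying_sketch` is a hypothetical name and it handles only top-level properties.

```julia
# Hypothetical Base-only analogue of modifying(; a = f, ...) for NamedTuples.
function modifying_sketch(; fs...)
    spec = values(fs)  # e.g. (a = string,)
    cols = keys(spec)
    # Unary: apply each f to the matching field; other fields are kept.
    g(x) = merge(x, map((f, v) -> f(v), spec, NamedTuple{cols}(x)))
    # Binary: combine matching fields of x and y; other fields come from x.
    g(x, y) = merge(x, map((f, l, r) -> f(l, r), spec,
                           NamedTuple{cols}(x), NamedTuple{cols}(y)))
    return g
end

rows = [(a = 1, b = 2), (a = 3, b = 4)]
mapped = map(modifying_sketch(a = string), rows)
reduced = reduce(modifying_sketch(a = +), rows)
```

In `reduced`, field `b` keeps the value from the left argument, which is the behavior the note above describes.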
Examples
julia> using DataTools
julia> map(modifying(a = string), [(a = 1, b = 2), (a = 3, b = 4)])
2-element Array{NamedTuple{(:a, :b),Tuple{String,Int64}},1}:
(a = "1", b = 2)
(a = "3", b = 4)
julia> reduce(modifying(a = +), [(a = 1, b = 2), (a = 3, b = 4)])
(a = 4, b = 2)
julia> using Accessors
julia> map(modifying(@optic(_.a[1].b) => x -> 10x),
[(a = ((b = 1,), 2),), (a = ((b = 3,), 4),)])
2-element Array{NamedTuple{(:a,),Tuple{Tuple{NamedTuple{(:b,),Tuple{Int64}},Int64}}},1}:
(a = ((b = 10,), 2),)
(a = ((b = 30,), 4),)
DataTools.nitems
Function
nitems(xs) -> n::Integer
Count the number of items in xs. Consume xs if necessary.
Examples
julia> using DataTools, Transducers
julia> nitems(1:10)
10
julia> 1:10 |> Filter(isodd) |> Map(inv) |> nitems
5
DataTools.oncol
Function
oncol(iname₁ => spec₁, ..., inameₙ => specₙ) -> f::Function
oncol(; $iname₁ = spec₁, ..., $inameₙ = specₙ) -> f::Function
Combine functions that each work on a column into a function that works on an entire row.
It constructs a reducing step function acting on a table row where specᵢ is either a reducing step function or a Pair of a reducing step function and an output column name.
It also defines a unary function when specᵢ is either a unary function or a Pair of a unary function and an output column name.
This function is inspired by the "Pair notation" in DataFrames.jl (see also the "Split-apply-combine" section of the DataFrames.jl documentation and DataFrames.select).
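The idea of lifting per-column reducers to a row reducer can be sketched in Base Julia for the keyword form. This is a simplified illustration under stated assumptions: `rowreducer` is a hypothetical name, it supports neither `Pair` specs nor output renaming, and it is not how `oncol` is implemented.

```julia
# Hypothetical Base-only analogue of oncol(; a = f, b = g) for NamedTuple rows.
function rowreducer(; specs...)
    fs = values(specs)  # e.g. (a = +, b = *)
    cols = keys(fs)
    # Apply each column's function to the matching fields of acc and row.
    return (acc, row) -> map((f, a, x) -> f(a, x), fs,
                             NamedTuple{cols}(acc), NamedTuple{cols}(row))
end

rf_sketch = rowreducer(a = +, b = *)
rows = [(a = 1, b = 2), (a = 3, b = 4)]
result = foldl(rf_sketch, rows)
```

Unlike the `modifying` family, the result here contains only the columns named in the spec.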
Examples
julia> using DataTools, Transducers
julia> rf = oncol(a = +, b = *);
julia> foldl(rf, Map(identity), [(a = 1, b = 2), (a = 3, b = 4)])
(a = 4, b = 8)
julia> rf((a = 1, b = 2), (a = 3, b = 4))
(a = 4, b = 8)
julia> rf = oncol(:a => (+) => :sum, :a => max => :max);
julia> foldl(rf, Map(identity), [(a = 1,), (a = 2,)])
(sum = 3, max = 2)
julia> rf((sum = 1, max = 1), (a = 2,))
(sum = 3, max = 2)
julia> rf = oncol(:a => min, :a => max);
julia> foldl(rf, Map(identity), [(a = 2,), (a = 1,)])
(a_min = 1, a_max = 2)
julia> rf((a_min = 2, a_max = 2), (a = 1,))
(a_min = 1, a_max = 2)
julia> foldl(rf, Map(x -> (a = x,)), [5, 2, 6, 8, 3])
(a_min = 2, a_max = 8)
oncol also defines a unary function:
julia> f = oncol(a = string);
julia> f((a = 1, b = 2))
(a = "1",)
Note that oncol does not verify the arity of input functions. If the input functions have unary and binary methods, oncol is callable with both arities:
julia> f((a = 1, b = 2), (a = 3, b = 4))
(a = "13",)
DataTools.rightif
Function
rightif(predicate, [focus = identity]) -> op::Function
Return a binary function that keeps the first argument unless predicate evaluates to true, in which case it returns the second argument.
This is equivalent to
(l, r) -> predicate(focus(l), focus(r)) ? r : l
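The equivalence above can be tried directly in Base Julia. The sketch below restates it as a hypothetical helper, `rightif_sketch`, and folds over materialized arrays rather than a transducer pipeline; it is not the library source.

```julia
# Hypothetical restatement of the equivalence; `focus` defaults to identity.
rightif_sketch(predicate, focus = identity) =
    (l, r) -> predicate(focus(l), focus(r)) ? r : l

rows = [(k = gcd(x, 42), v = x) for x in 1:100]

maxk = foldl(rightif_sketch(<), [r.k for r in rows])   # maximum key
firstmax = foldl(rightif_sketch(<, r -> r.k), rows)    # strict <  keeps the first maximum
lastmax = foldl(rightif_sketch(<=, r -> r.k), rows)    # <= moves right on ties
```

The choice between a strict and a non-strict predicate is what decides whether ties resolve to the first or the last matching row.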
Examples
julia> using DataTools, Transducers
julia> table = 1:100 |> Map(x -> (k = gcd(x, 42), v = x));
julia> table |> Take(5) |> collect # preview
5-element Array{NamedTuple{(:k, :v),Tuple{Int64,Int64}},1}:
(k = 1, v = 1)
(k = 2, v = 2)
(k = 3, v = 3)
(k = 2, v = 4)
(k = 1, v = 5)
julia> foldl(rightif(<), Map(x -> x.k), table) # maximum
42
julia> foldl(rightif(>), Map(x -> x.k), table) # minimum
1
julia> foldl(rightif(<, x -> x.k), table) # first maximum
(k = 42, v = 42)
julia> foldl(rightif(<=, x -> x.k), table) # last maximum
(k = 42, v = 84)
julia> foldl(rightif(>, x -> x.k), table) # first minimum
(k = 1, v = 1)
julia> foldl(rightif(>=, x -> x.k), table) # last minimum
(k = 1, v = 97)
julia> table |> Scan(rightif(<, x -> x.k)) |> Take(5) |> collect
5-element Array{NamedTuple{(:k, :v),Tuple{Int64,Int64}},1}:
(k = 1, v = 1)
(k = 2, v = 2)
(k = 3, v = 3)
(k = 3, v = 3)
(k = 3, v = 3)