DataTools

DataTools.DataTools (Module)

DataTools: manipulating flat tables and nested data structures using Transducers.jl


julia> using DataTools: oncol, modifying, averaging

julia> using Transducers: Filter

julia> data = [(a = 1, b = 7), (a = 2, b = 3), (a = 3, b = 4)];

julia> rf = oncol(a = +, b = averaging);

julia> foldl(rf, Filter(x -> isodd(x.a)), data)
(a = 4, b = 5.5)

julia> map(modifying(a = string), data)
3-element Array{NamedTuple{(:a, :b),Tuple{String,Int64}},1}:
 (a = "1", b = 7)
 (a = "2", b = 3)
 (a = "3", b = 4)

julia> reduce(modifying(a = +), data)
(a = 6, b = 7)

julia> using Accessors: @optic

julia> data = [(a = ((b = 1,), 2),), (a = ((b = 3,), 4),)];

julia> map(modifying(@optic(_.a[1].b) => x -> 10x), data)
2-element Array{NamedTuple{(:a,),Tuple{Tuple{NamedTuple{(:b,),Tuple{Int64}},Int64}}},1}:
 (a = ((b = 10,), 2),)
 (a = ((b = 30,), 4),)
DataTools.averaging (Function)
averaging

A reducing function for averaging elements.

Examples

julia> using DataTools
       using Transducers

julia> foldl(averaging, Filter(isodd), 1:10)
5.0

julia> rf = oncol(a = averaging, b = averaging);

julia> foldl(rf, Map(identity), [(a = 1, b = 2), (a = 2, b = 3)])
(a = 1.5, b = 2.5)
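The following is a minimal sketch in plain Base Julia of one way an averaging reducing function can work. It is not DataTools' implementation: the accumulator is assumed to be a (sum, count) pair, the mean is computed only in a final step, and the names avg_step, avg_combine, and avg_complete are hypothetical.

```julia
# Hypothetical sketch (not DataTools' implementation): an averaging
# reduction whose accumulator is a (sum, count) pair.
avg_step((s, n), x) = (s + x, n + 1)

# Two partial accumulators can be merged, e.g. for a parallel reduce.
avg_combine((s1, n1), (s2, n2)) = (s1 + s2, n1 + n2)

# The mean is computed once, after all elements have been folded in.
avg_complete((s, n)) = s / n

acc = foldl(avg_step, Iterators.filter(isodd, 1:10); init = (0, 0))
mean_odd = avg_complete(acc)  # 25 / 5 == 5.0

left = foldl(avg_step, 1:4; init = (0, 0))
right = foldl(avg_step, 5:10; init = (0, 0))
mean_all = avg_complete(avg_combine(left, right))  # 55 / 10 == 5.5
```

Keeping the count next to the sum is what makes partial results mergeable; averaging two partial means directly would weight them incorrectly when the partitions differ in size.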
DataTools.firstitem (Function)
firstitem(xs)

Get the first item of xs. Consume xs if necessary.

Examples

julia> using DataTools, Transducers

julia> firstitem(3:7)
3

julia> 3:7 |> Map(x -> x + 1) |> Filter(isodd) |> firstitem
5
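As a rough illustration of "consume xs if necessary", here is a Base-only sketch. The name firstitem_sketch is hypothetical, and throwing on empty input is an assumption rather than documented DataTools behavior. It advances the iterator a single step, so a lazy pipeline is evaluated only up to its first surviving element.

```julia
# Hypothetical sketch: take only the first element of any iterable.
function firstitem_sketch(xs)
    y = iterate(xs)  # advances the iterator a single step
    y === nothing && throw(ArgumentError("xs must be non-empty"))
    return first(y)  # y is an (item, state) tuple
end

firstitem_sketch(3:7)  # 3
# Lazy Base equivalent of 3:7 |> Map(x -> x + 1) |> Filter(isodd):
firstitem_sketch(x + 1 for x in 3:7 if isodd(x + 1))  # 5
```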
DataTools.firstitems (Function)
firstitems(xs, n::Integer)
firstitems(n::Integer) -> xs -> firstitems(xs, n)

Get the first n items of xs. Consume xs if necessary.
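A Base-only sketch of both methods might look as follows. firstitems_sketch is a hypothetical name, and returning a Vector is an assumption, since the return type of firstitems is not documented here. The one-argument method returns a closure so it can sit at the end of a |> pipeline.

```julia
# Hypothetical sketch: collect at most n leading items, stopping early.
firstitems_sketch(xs, n::Integer) = collect(Iterators.take(xs, n))

# Curried form for use in pipelines: xs |> firstitems_sketch(n)
firstitems_sketch(n::Integer) = xs -> firstitems_sketch(xs, n)

firstitems_sketch(3:7, 2)     # [3, 4]
3:7 |> firstitems_sketch(3)   # [3, 4, 5]
```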

DataTools.inc1 (Method)
inc1(n, _) -> n + 1

A reducing function for counting elements. It increments the first argument by one.

Examples

julia> using DataTools
       using Transducers

julia> inc1(10, :ignored)
11

julia> inc1(Init(inc1), :ignored)
1

julia> foldl(inc1, Map(identity), 'a':2:'e')
3

julia> foldl(TeeRF(+, inc1), Map(identity), 1:2:10)  # sum and count
(25, 5)

julia> rf = oncol(:a => (+) => :sum, :a => inc1 => :count);

julia> foldl(rf, Map(identity), [(a = 1, b = 2), (a = 2, b = 3)])
(sum = 3, count = 2)
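With Base.foldl, the same counting pattern needs an explicit init = 0 to start from. The sketch below uses a hypothetical inc1_sketch with the same definition as inc1, and it mimics the TeeRF(+, inc1) sum-and-count example with a tuple accumulator instead of TeeRF.

```julia
# Hypothetical stand-in for inc1: ignore the element, add one to the count.
inc1_sketch(n, _) = n + 1

# Base.foldl needs an explicit starting count.
count_chars = foldl(inc1_sketch, 'a':2:'e'; init = 0)  # 'a', 'c', 'e' -> 3

# Sum and count in one pass, using a (sum, count) tuple as the accumulator.
sum_and_count = foldl(((s, n), x) -> (s + x, inc1_sketch(n, x)), 1:2:10; init = (0, 0))
```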
DataTools.lastitem (Function)
lastitem(xs)

Get the last item of xs. Consume xs if necessary.

Examples

julia> using DataTools, Transducers

julia> lastitem(3:7)
7

julia> 3:7 |> Map(x -> x + 1) |> Filter(isodd) |> lastitem
7
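Unlike firstitem, getting the last item of a general iterator requires walking all of it. A minimal Base-only sketch (lastitem_sketch is hypothetical) expresses this as a fold that always keeps the right-hand element. It does not special-case arrays or ranges, and for empty input Base.foldl throws an error, which may differ from what lastitem does.

```julia
# Hypothetical sketch: fold over xs, always keeping the newer (right) element.
lastitem_sketch(xs) = foldl((_, x) -> x, xs)

lastitem_sketch(3:7)  # 7
# Lazy Base equivalent of 3:7 |> Map(x -> x + 1) |> Filter(isodd):
lastitem_sketch(x + 1 for x in 3:7 if isodd(x + 1))  # 7
```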
DataTools.lastitems (Function)
lastitems(xs, n::Integer)
lastitems(n::Integer) -> xs -> lastitems(xs, n)

Get the last n items of xs. Consume xs if necessary.
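A Base-only sketch that consumes xs once while keeping only a sliding window of the n most recent items might look like this. lastitems_sketch is a hypothetical name, and returning a Vector is an assumption.

```julia
# Hypothetical sketch: keep a window of the n most recent items.
function lastitems_sketch(xs, n::Integer)
    n >= 0 || throw(ArgumentError("n must be non-negative"))
    buf = eltype(xs)[]
    for x in xs
        push!(buf, x)
        length(buf) > n && popfirst!(buf)  # drop the oldest item
    end
    return buf
end

# Curried form for use in pipelines: xs |> lastitems_sketch(n)
lastitems_sketch(n::Integer) = xs -> lastitems_sketch(xs, n)

lastitems_sketch(3:7, 2)     # [6, 7]
3:7 |> lastitems_sketch(3)   # [5, 6, 7]
```

The buffer never holds more than n + 1 items, so memory use is bounded by n rather than by the length of xs.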

DataTools.meanvar (Function)
meanvar

A reducing function for computing the mean and variance.

Examples

julia> using DataTools, Transducers, Statistics

julia> acc = foldl(meanvar, Filter(isodd), 1:96)
MeanVarState(mean=48.0, var=784.0, count=48)

julia> acc.mean, mean(acc)
(48.0, 48.0)

julia> acc.var, var(acc), var(acc, corrected = false)
(784.0, 784.0, 767.6666666666666)

julia> acc.std, std(acc)
(28.0, 28.0)

julia> acc.count
48

julia> m, v, c = acc;  # destructuring works

julia> Tuple(acc)  # (mean, var, count)
(48.0, 784.0, 48)

julia> NamedTuple(acc)
(mean = 48.0, var = 784.0, count = 48)

julia> rf = oncol(a = meanvar, b = meanvar);

julia> foldl(rf, Map(identity), [(a = 1, b = 2), (a = 2, b = 3)])
(a = MeanVarState(mean=1.5, var=0.5, count=2), b = MeanVarState(mean=2.5, var=0.5, count=2))
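One common way to build a single-pass mean and variance reduction is Welford's online algorithm, which updates the mean and the sum of squared deviations one element at a time. The sketch below is plain Base Julia and only an illustration: whether meanvar uses this exact update is an assumption, and MVState, mv_step, and mv_var are hypothetical names. Results are compared with ≈ because incremental updates round differently from a two-pass computation.

```julia
# Hypothetical sketch of an online mean/variance accumulator (Welford's method).
struct MVState
    mean::Float64
    m2::Float64   # running sum of squared deviations from the mean
    count::Int
end

MVState() = MVState(0.0, 0.0, 0)

function mv_step(s::MVState, x)
    n = s.count + 1
    delta = x - s.mean
    mean = s.mean + delta / n
    m2 = s.m2 + delta * (x - mean)  # uses both the old and the new mean
    return MVState(mean, m2, n)
end

# Corrected (sample) variance divides by count - 1; uncorrected divides by count.
mv_var(s::MVState; corrected::Bool = true) = s.m2 / (corrected ? s.count - 1 : s.count)

acc = foldl(mv_step, Iterators.filter(isodd, 1:96); init = MVState())
# acc.mean ≈ 48.0, mv_var(acc) ≈ 784.0, acc.count == 48
```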
DataTools.modifying (Function)
modifying(; $property₁ = f₁, ..., $propertyₙ = fₙ) -> g::Function
modifying(lens₁ => f₁, ..., lensₙ => fₙ) -> g::Function

Create a function that runs function fᵢ on the locations specified by propertyᵢ or lensᵢ.

The keyword-only method modifying(; a = f₁, b = f₂) is equivalent to modifying(@optic(_.a) => f₁, @optic(_.b) => f₂).

The unary method g(x) is equivalent to

x = modify(f₁, x, lens₁)
x = modify(f₂, x, lens₂)
...
x = modify(fₙ, x, lensₙ)

The binary method g(x, y) is equivalent to

x = set(x, lens₁, f₁(lens₁(x), lens₁(y)))
x = set(x, lens₂, f₂(lens₂(x), lens₂(y)))
...
x = set(x, lensₙ, fₙ(lensₙ(x), lensₙ(y)))

Note that the locations that are not specified by the lenses keep the values as in x. This is similar to how mergewith behaves.

Examples

julia> using DataTools

julia> map(modifying(a = string), [(a = 1, b = 2), (a = 3, b = 4)])
2-element Array{NamedTuple{(:a, :b),Tuple{String,Int64}},1}:
 (a = "1", b = 2)
 (a = "3", b = 4)

julia> reduce(modifying(a = +), [(a = 1, b = 2), (a = 3, b = 4)])
(a = 4, b = 2)

julia> using Accessors

julia> map(modifying(@optic(_.a[1].b) => x -> 10x),
           [(a = ((b = 1,), 2),), (a = ((b = 3,), 4),)])
2-element Array{NamedTuple{(:a,),Tuple{Tuple{NamedTuple{(:b,),Tuple{Int64}},Int64}}},1}:
 (a = ((b = 10,), 2),)
 (a = ((b = 30,), 4),)
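For NamedTuples alone, the keyword form can be sketched in Base Julia without lenses. ModifyingSketch and modifying_sketch are hypothetical names; the real modifying also accepts Accessors optics, which this sketch does not. merge keeps every field of x that is not listed, matching the note about unspecified locations.

```julia
# Hypothetical sketch of modifying's keyword form, restricted to NamedTuples.
struct ModifyingSketch{S<:NamedTuple}
    specs::S  # property name => function, e.g. (a = string,)
end

modifying_sketch(; fs...) = ModifyingSketch(values(fs))

# Unary: apply fᵢ to x.propertyᵢ; fields not listed keep their values from x.
function (m::ModifyingSketch)(x::NamedTuple)
    ks = keys(m.specs)
    return merge(x, NamedTuple{ks}(map(k -> m.specs[k](x[k]), ks)))
end

# Binary: combine x.propertyᵢ and y.propertyᵢ with fᵢ; other fields come from x.
function (m::ModifyingSketch)(x::NamedTuple, y::NamedTuple)
    ks = keys(m.specs)
    return merge(x, NamedTuple{ks}(map(k -> m.specs[k](x[k], y[k]), ks)))
end

map(modifying_sketch(a = string), [(a = 1, b = 2), (a = 3, b = 4)])
reduce(modifying_sketch(a = +), [(a = 1, b = 2), (a = 3, b = 4)])  # (a = 4, b = 2)
```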
DataTools.nitems (Function)
nitems(xs) -> n::Integer

Count the number of items in xs. Consume xs if necessary.

Examples

julia> using DataTools, Transducers

julia> nitems(1:10)
10

julia> 1:10 |> Filter(isodd) |> Map(inv) |> nitems
5
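A Base-only sketch that calls length when the iterator reports a known size and otherwise counts by consuming the iterator, in the spirit of inc1, is shown below. nitems_sketch and _nitems are hypothetical names, and whether nitems takes such a shortcut is an assumption. An infinite iterator would make the fallback loop forever.

```julia
# Hypothetical sketch: dispatch on Base.IteratorSize to pick a strategy.
nitems_sketch(xs) = _nitems(Base.IteratorSize(xs), xs)

# Size is known up front (e.g. ranges, arrays): no need to iterate.
_nitems(::Union{Base.HasLength, Base.HasShape}, xs) = length(xs)

# Size unknown (e.g. filtered iterators): count by consuming xs.
_nitems(::Base.IteratorSize, xs) = foldl((n, _) -> n + 1, xs; init = 0)

nitems_sketch(1:10)                                # 10
nitems_sketch(inv(x) for x in 1:10 if isodd(x))    # 5
```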
DataTools.oncol (Function)
oncol(iname₁ => spec₁, ..., inameₙ => specₙ) -> f::Function
oncol(; $iname₁ = spec₁, ..., $inameₙ = specₙ) -> f::Function

Combine functions that work on a column and create a function that works on an entire row.

It constructs a reducing step function acting on a table row where specᵢ is either a reducing step function or a Pair of a reducing step function and an output column name.

It also defines a unary function when specᵢ is either a unary function or a Pair of a unary function and an output column name.

This function is inspired by the "Pair notation" in DataFrames.jl (see also the Split-apply-combine section of the DataFrames.jl manual and DataFrames.select).

Examples

julia> using DataTools
       using Transducers

julia> rf = oncol(a = +, b = *);

julia> foldl(rf, Map(identity), [(a = 1, b = 2), (a = 3, b = 4)])
(a = 4, b = 8)

julia> rf((a = 1, b = 2), (a = 3, b = 4))
(a = 4, b = 8)

julia> rf = oncol(:a => (+) => :sum, :a => max => :max);

julia> foldl(rf, Map(identity), [(a = 1,), (a = 2,)])
(sum = 3, max = 2)

julia> rf((sum = 1, max = 1), (a = 2,))
(sum = 3, max = 2)

julia> rf = oncol(:a => min, :a => max);

julia> foldl(rf, Map(identity), [(a = 2,), (a = 1,)])
(a_min = 1, a_max = 2)

julia> rf((a_min = 2, a_max = 2), (a = 1,))
(a_min = 1, a_max = 2)

julia> foldl(rf, Map(x -> (a = x,)), [5, 2, 6, 8, 3])
(a_min = 2, a_max = 8)

oncol also defines a unary function:

julia> f = oncol(a = string);

julia> f((a = 1, b = 2))
(a = "1",)

Note that oncol does not verify the arity of input functions. If the input functions have unary and binary methods, oncol is callable with both arities:

julia> f((a = 1, b = 2), (a = 3, b = 4))
(a = "13",)
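The Pair form of the reducing function can be sketched in Base Julia as below. oncol_pairs_sketch is a hypothetical name. It requires an explicit output name for every column, so it does not reproduce the automatic a_min/a_max naming, and Base.foldl needs an explicit init where the Transducers examples do not.

```julia
# Hypothetical sketch of oncol's Pair form:
#   input column => (reducing step function => output column)
function oncol_pairs_sketch(specs::Pair...)
    outcols = map(s -> last(last(s)), specs)  # e.g. (:sum, :max)
    return function (acc::NamedTuple, row::NamedTuple)
        vals = map(specs) do s
            incol = first(s)
            f, outcol = first(last(s)), last(last(s))
            f(acc[outcol], row[incol])  # combine accumulator and row cell
        end
        return NamedTuple{outcols}(vals)
    end
end

rf = oncol_pairs_sketch(:a => (+) => :sum, :a => max => :max)
foldl(rf, [(a = 1,), (a = 2,)]; init = (sum = 0, max = typemin(Int)))  # (sum = 3, max = 2)
rf((sum = 1, max = 1), (a = 2,))                                       # (sum = 3, max = 2)
```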
DataTools.rightif (Function)
rightif(predicate, [focus = identity]) -> op::Function

Return a binary function that returns its first argument l unless predicate(focus(l), focus(r)) evaluates to true, in which case it returns its second argument r.

This is equivalent to

(l, r) -> predicate(focus(l), focus(r)) ? r : l

Examples

julia> using DataTools, Transducers

julia> table = 1:100 |> Map(x -> (k = gcd(x, 42), v = x));

julia> table |> Take(5) |> collect  # preview
5-element Array{NamedTuple{(:k, :v),Tuple{Int64,Int64}},1}:
 (k = 1, v = 1)
 (k = 2, v = 2)
 (k = 3, v = 3)
 (k = 2, v = 4)
 (k = 1, v = 5)

julia> foldl(rightif(<), Map(x -> x.k), table)  # maximum
42

julia> foldl(rightif(>), Map(x -> x.k), table)  # minimum
1

julia> foldl(rightif(<, x -> x.k), table)   # first maximum
(k = 42, v = 42)

julia> foldl(rightif(<=, x -> x.k), table)  # last maximum
(k = 42, v = 84)

julia> foldl(rightif(>, x -> x.k), table)   # first minimum
(k = 1, v = 1)

julia> foldl(rightif(>=, x -> x.k), table)  # last minimum
(k = 1, v = 97)

julia> table |> Scan(rightif(<, x -> x.k)) |> Take(5) |> collect
5-element Array{NamedTuple{(:k, :v),Tuple{Int64,Int64}},1}:
 (k = 1, v = 1)
 (k = 2, v = 2)
 (k = 3, v = 3)
 (k = 3, v = 3)
 (k = 3, v = 3)
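Because rightif is specified by the closure given above, it can be reproduced directly in Base Julia. In this sketch, rightif_sketch is a hypothetical stand-in, the table is materialized as a Vector with a comprehension instead of a lazy Transducers pipeline, and Base.accumulate plays the role of Scan.

```julia
# Hypothetical stand-in for rightif, written from its documented equivalent.
rightif_sketch(predicate, focus = identity) =
    (l, r) -> predicate(focus(l), focus(r)) ? r : l

table = [(k = gcd(x, 42), v = x) for x in 1:100]

foldl(rightif_sketch(<), (row.k for row in table))   # 42, the maximum k
foldl(rightif_sketch(<, x -> x.k), table)            # (k = 42, v = 42), first maximum
foldl(rightif_sketch(<=, x -> x.k), table)           # (k = 42, v = 84), last maximum
foldl(rightif_sketch(>=, x -> x.k), table)           # (k = 1, v = 97), last minimum

# Running "first maximum", like Scan(rightif(<, x -> x.k)):
accumulate(rightif_sketch(<, x -> x.k), table)[1:5]
```

The strict predicate (<) only replaces the accumulator on a strictly larger key, so ties keep the earlier row; the non-strict one (<=) lets later ties win.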