A cell rule pack is a rule pack that defines a set of rules for cells, i.e. functions that convert cells of interest to logical values. It should return a data frame with the following properties:
- The number of rows equals the number of rows of the checked cells.
- Column names should be treated as the concatenation of 'column name of checked cell' + 'separator' + 'rule name'.
- Values indicate whether the cell follows the rule.
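The required output shape can be sketched in base R. `apply_cell_rules()` is a hypothetical helper, not part of ruler, and the `"._."` separator is an assumption chosen for illustration. The sketch only shows that the result keeps one row per checked row, holds logical values, and names each column as column + separator + rule.

```r
# Hypothetical helper (not part of ruler): applies each rule to each
# selected column and names the outputs "<column><sep><rule>".
apply_cell_rules <- function(df, rules, cols = names(df), sep = "._.") {
  # Keep all rows of `df` but none of its columns.
  out <- df[, 0, drop = FALSE]
  for (col in cols) {
    for (rule_name in names(rules)) {
      out[[paste0(col, sep, rule_name)]] <- rules[[rule_name]](df[[col]])
    }
  }
  out
}

res <- apply_cell_rules(
  mtcars,
  rules = list(is_pos = function(x) x > 0, is_big = function(x) x > 200),
  cols = c("disp", "qsec")
)
nrow(res)   # 32, one row per row of mtcars
names(res)  # "disp._.is_pos" "disp._.is_big" "qsec._.is_pos" "qsec._.is_big"
```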
This format is inspired by the scoped variants of transmute().
The most common way to define a cell pack is to create a functional sequence containing one of:
- transmute_all(.funs = rules(...))
- transmute_if(.predicate, .funs = rules(...))
- transmute_at(.vars, .funs = rules(...))
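To make the difference between the three scoped forms concrete, here is a base-R analogy of how each one chooses the columns that rules are applied to. It does not call dplyr; the example data, `check_cols()` and the `"._.is_neg"` naming are hypothetical.

```r
# Example data: two numeric columns and one character column.
df <- data.frame(x = c(-1, 2), label = c("p", "q"), y = c(3, -4),
                 stringsAsFactors = FALSE)
is_neg <- function(x) x < 0

# Column selection, by analogy with the three scoped verbs.
cols_all <- names(df)                                      # like transmute_all()
cols_if  <- names(df)[vapply(df, is.numeric, logical(1))]  # like transmute_if(is.numeric, ...)
cols_at  <- "y"                                            # like transmute_at("y", ...)

# Hypothetical helper: apply the rule to the chosen columns and name
# each output "<column>._.is_neg".
check_cols <- function(cols) {
  res <- lapply(df[cols], is_neg)
  names(res) <- paste0(cols, "._.is_neg")
  as.data.frame(res)
}

check_cols(cols_if)  # x._.is_neg: TRUE FALSE; y._.is_neg: FALSE TRUE
check_cols(cols_at)  # a single column, y._.is_neg
```

The predicate form is handy when a rule such as `is_neg` only makes sense for numeric columns.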
Note that (as of dplyr version 0.7.4) when only one column is transmuted, the output names don't have the required structure: the 'column name of checked cell' part is missing, which results (after exposure) in an empty string in the var column of the validation report. The current way of dealing with this is to name the input column (see examples).
Using rules() to create a list of functions for scoped dplyr "mutating" verbs (such as summarise_all() and transmute_all()) is recommended because:
- It is a convenient way to ensure consistent naming of rules without naming them manually.
- It adds a common prefix to all rule names. This makes it possible to define the separator as the prefix surrounded by any number of non-alphanumeric characters.
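Because every rule name carries the same prefix, an output column name can be split back into its checked column and its rule. A minimal sketch, assuming names of the form `<column>_._.<rule>` (a `"._."` prefix joined to the column name with `_`); both that form and `split_rule_names()` are hypothetical and not ruler's actual implementation. A name with no column part yields an empty `var`, which matches the one-column edge case described earlier.

```r
# Hypothetical parser (not ruler's code): split names of the form
# "<column><sep><rule>", where the separator is the rule prefix surrounded
# by any number of non-alphanumeric characters.
split_rule_names <- function(x, prefix = "._.") {
  # \\Q...\\E makes PCRE treat the prefix literally; the lazy (.*?) keeps
  # the column part as short as possible.
  pattern <- paste0("^(.*?)[^[:alnum:]]*\\Q", prefix, "\\E[^[:alnum:]]*(.*)$")
  data.frame(
    var  = sub(pattern, "\\1", x, perl = TRUE),
    rule = sub(pattern, "\\2", x, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

out <- split_rule_names(c("disp_._.z_score", "._.improper_is_neg", "vs_._.proper_is_neg"))
out$var   # "disp" ""  "vs"
out$rule  # "z_score" "improper_is_neg" "proper_is_neg"
```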
Note that during exposure, packs are applied to a keyed object with an id key. So they can rearrange rows as long as this is done with functions supported by keyholder: rows will be tracked and recognized as in the original data frame of interest.
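The idea behind id-based row tracking can be illustrated without keyholder: store each row's original position as an id, let the pack reorder rows, then map results back through that id. This base-R sketch is only an analogy; the `.id` column and `pack()` function are hypothetical and do not reflect keyholder's API.

```r
df <- data.frame(x = c(5, -2, 3))
df$.id <- seq_len(nrow(df))  # remember each row's original position

# Hypothetical pack that reorders rows (sorts by x) before checking a rule.
pack <- function(d) {
  d <- d[order(d$x), ]
  data.frame(.id = d$.id, x_is_neg = d$x < 0)
}

res <- pack(df)
res$.id  # 2 3 1: rows are no longer in the original order

# Map the results back to the original rows via the id.
res[match(df$.id, res$.id), "x_is_neg"]  # FALSE TRUE FALSE
```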
library(ruler)
library(magrittr)

cell_outlier_rules <- . %>% dplyr::transmute_at(
c("disp", "qsec"),
rules(z_score = abs(. - mean(.)) / sd(.) > 1)
)
cell_packs(outlier = cell_outlier_rules)
#> $outlier
#> A Cell rule pack:
#> Functional sequence with the following components:
#>
#> 1. dplyr::transmute_at(., c("disp", "qsec"), rules(z_score = abs(. - mean(.))/sd(.) > 1))
#>
#> Use 'functions' to extract the individual functions.
#>
# Dealing with one column edge case
improper_pack <- . %>% dplyr::transmute_at(
dplyr::vars(vs),
rules(improper_is_neg = . < 0)
)
proper_pack <- . %>% dplyr::transmute_at(
dplyr::vars(vs = vs),
rules(proper_is_neg = . < 0)
)
mtcars[1:2, ] %>%
expose(cell_packs(improper_pack, proper_pack)) %>%
get_report()
#> Tidy data validation report:
#> # A tibble: 4 × 5
#> pack rule var id value
#> <chr> <chr> <chr> <int> <lgl>
#> 1 cell_pack__1 improper_is_neg "" 1 FALSE
#> 2 cell_pack__1 improper_is_neg "" 2 FALSE
#> 3 cell_pack__2 proper_is_neg "vs" 1 FALSE
#> 4 cell_pack__2 proper_is_neg "vs" 2 FALSE