A column rule pack is a rule pack that defines a set of rules for columns as a whole, i.e. functions that convert columns of interest to logical values. It should return a data frame with the following properties:
The number of rows equals one.
Column names should be treated as a concatenation of 'check column name' + 'separator' + 'rule name'.
Values indicate whether the column as a whole follows the rule.
This format is inspired by dplyr's scoped variants of summarise() applied to non-grouped data.
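As a rough illustration of this output shape, the base R sketch below builds such a one-row data frame by hand. The toy data, the rule names, and the `_._.` separator are assumptions made for this sketch. In practice the data frame would come from a dplyr summarise call rather than a loop.

```r
# Toy input; column names and values are made up for this sketch.
input <- data.frame(x = c(6, 7, 8), y = c(1, 2, 3))

# Hypothetical column rules: each maps a whole column to one logical value.
column_rules <- list(
  mean_gt_5 = function(col) mean(col) > 5,
  sd_lt_10 = function(col) sd(col) < 10
)

# Assumed separator, chosen only for illustration.
separator <- "_._."

# Apply every rule to every column and collect the results in a named list.
results <- list()
for (col_name in names(input)) {
  for (rule_name in names(column_rules)) {
    out_name <- paste0(col_name, separator, rule_name)
    results[[out_name]] <- column_rules[[rule_name]](input[[col_name]])
  }
}

# One row, with one logical column per (column, rule) pair.
pack_output <- as.data.frame(results, check.names = FALSE)
```

Here `pack_output` has a single row and four columns, each named as 'check column name' + 'separator' + 'rule name'.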
The most common way to define a column pack is by creating a functional sequence with no grouping that ends with one of:
summarise_all(.funs = rules(...)).
summarise_if(.predicate, .funs = rules(...)).
summarise_at(.vars, .funs = rules(...)).
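A minimal sketch of the summarise_all() form, assuming the ruler and dplyr packages are installed. The pack name `all_column_rules` and the rule name `no_na` are hypothetical choices for this illustration.

```r
library(ruler)

# Hypothetical column pack: every column is checked for missing values.
# The functional sequence has no grouping and ends with summarise_all().
all_column_rules <- . %>% dplyr::summarise_all(
  rules(no_na = !anyNA(.))
)

# The pack can then be exposed like any other column pack.
report <- mtcars %>%
  expose(col_packs(all_column_rules)) %>%
  get_report()
```

Because mtcars has more than one column, the summarised output keeps the 'check column name' part in every column name.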
Note that (as of dplyr version 0.7.4) when only one column is summarised, the names of the output do not have the necessary structure. The 'check column name' is missing, which results (after exposure) in an empty string in the var column of the validation report. The current way of dealing with this is to name the input column (see examples).
Using rules() to create a list of functions for scoped dplyr "mutating" verbs (such as summarise_all() and transmute_all()) is recommended because:
It is a convenient way to ensure consistent naming of rules without naming them manually.
It adds a common prefix to all rule names. This helps in defining the separator as the prefix surrounded by any number of non-alphanumeric characters.
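To make the separator idea concrete, the base R sketch below splits summarised column names into 'check column name' and 'rule name'. The prefix `._.` and the example column names are assumptions made for this illustration and are not taken from the text above. The sketch does not show how the package itself parses names.

```r
# Assumed rule-name prefix; verify it against your installed version.
# Separator pattern: the prefix surrounded by any number of
# non-alphanumeric characters.
separator_pattern <- "[^[:alnum:]]*\\._\\.[^[:alnum:]]*"

# Hypothetical names of a summarised data frame.
col_names <- c("mpg_._.rule__1", "vs_._.is_binary")

# Split each name at the first match of the separator pattern.
parts <- regmatches(
  col_names,
  regexpr(separator_pattern, col_names),
  invert = TRUE
)

# First piece is the checked column, second piece is the rule name.
check_cols <- unname(vapply(parts, function(p) p[1], character(1)))
rule_names <- unname(vapply(parts, function(p) p[2], character(1)))
```

With these inputs, `check_cols` holds "mpg" and "vs", and `rule_names` holds "rule__1" and "is_binary". Underscores inside a rule name are left alone because only the first match is used for splitting.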
# Validating present columns
library(ruler)

numeric_column_rules <- . %>% dplyr::summarise_if(
  is.numeric,
  rules(mean(.) > 5, sd(.) < 10)
)
character_column_rules <- . %>% dplyr::summarise_if(
  is.character,
  rules(. %in% letters[1:4])
)
col_packs(
  num_col = numeric_column_rules,
  chr_col = character_column_rules
)
#> $num_col
#> A Column rule pack:
#> Functional sequence with the following components:
#>
#> 1. dplyr::summarise_if(., is.numeric, rules(mean(.) > 5, sd(.) < 10))
#>
#> Use 'functions' to extract the individual functions.
#>
#> $chr_col
#> A Column rule pack:
#> Functional sequence with the following components:
#>
#> 1. dplyr::summarise_if(., is.character, rules(. %in% letters[1:4]))
#>
#> Use 'functions' to extract the individual functions.
#>
# Dealing with one column edge case
# Unnamed input column: the 'check column name' is lost in the output
improper_pack <- . %>% dplyr::summarise_at(
  dplyr::vars(vs),
  rules(improper_is_chr = is.character)
)
# Named input column: the 'check column name' is preserved
proper_pack <- . %>% dplyr::summarise_at(
  dplyr::vars(vs = vs),
  rules(proper_is_chr = is.character)
)
mtcars %>%
  expose(col_packs(improper_pack, proper_pack)) %>%
  get_report()
#> Tidy data validation report:
#> # A tibble: 2 × 5
#>   pack        rule            var      id value
#>   <chr>       <chr>           <chr> <int> <lgl>
#> 1 col_pack__1 improper_is_chr ""        0 FALSE
#> 2 col_pack__2 proper_is_chr   "vs"      0 FALSE