Function for applying rule packs to data.
expose(.tbl, ..., .rule_sep = inside_punct("\\._\\."),
  .remove_obeyers = TRUE, .guess = TRUE)
.tbl: Data frame of interest.
...: Rule packs. They can be in pure form or inside a list (at any depth).
.rule_sep: Regular expression used as separator between column and rule names in col packs and cell packs.
.remove_obeyers: Whether to remove elements which obey rules from the report.
.guess: Whether to guess the type of a rule pack that has no supported type (see Details).
A .tbl with a possibly added 'exposure' attribute containing the resulting exposure. If .tbl already contains an 'exposure' attribute, the new result is bound to it.
expose() applies all supplied rule packs to data, creates an exposure object based on the results and stores it in the attribute 'exposure'. It is guaranteed that .tbl is not modified in any other way, so that expose() can be used inside a pipe.
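The attribute-only behaviour described above can be checked with a short sketch. The pack names has_rows and few_cols and the rules inside them are hypothetical, chosen only for illustration; the sketch assumes the ruler and dplyr packages are installed.

```r
library(ruler)
library(dplyr)

# Hypothetical rule packs, named here only for illustration
has_rows <- data_packs(has_rows = . %>% summarise(nrow_pos = nrow(.) > 0))
few_cols <- data_packs(few_cols = . %>% summarise(ncol_low = ncol(.) < 20))

exposed <- mtcars %>% expose(has_rows)

# The exposure is stored in the 'exposure' attribute
has_exposure <- !is.null(attr(exposed, "exposure"))

# Apart from attributes, the data should equal the input
same_data <- isTRUE(all.equal(exposed, mtcars, check.attributes = FALSE))

# A second expose() in the same pipe adds to the existing exposure
packs_info <- mtcars %>%
  expose(has_rows) %>%
  expose(few_cols) %>%
  get_packs_info()
```

Here packs_info is expected to list both packs, since the second call binds its result to the exposure created by the first.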
It is a good idea to name all rule packs: explicitly in ... (if they are supplied outside a list) or during creation with the respective rule pack function. A missing name is imputed based on the possibly existing exposure attribute in .tbl and the supplied rule packs. Imputation is similar to the one in rules() but is applied to every pack type separately.
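As a rough illustration of the naming advice, the sketch below compares a pack named at creation with two unnamed packs exposed one after another. It assumes ruler and dplyr are installed and that get_packs_info() returns a table with a name column; the exact imputed names are not asserted here.

```r
library(ruler)
library(dplyr)

my_rule_pack <- . %>% summarise(nrow_neg = nrow(.) < 0)

# Name given at creation time
named_info <- mtcars %>%
  expose(data_packs(my_data_pack_1 = my_rule_pack)) %>%
  get_packs_info()

# No names given: they are imputed, taking the existing exposure into account
imputed_info <- mtcars %>%
  expose(data_packs(my_rule_pack)) %>%
  expose(data_packs(my_rule_pack)) %>%
  get_packs_info()

named_info$name
imputed_info$name
```

Explicit names keep reports stable when packs are added or reordered, which imputed names cannot promise.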
The default value for .rule_sep is a regular expression matching the characters ._. surrounded by non-alphanumeric characters. It is picked to work smoothly with dplyr's scoped verbs when rules() is used instead of a pure list. In most cases it shouldn't be changed, but if it is, it should align with .prefix in rules().
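To see why the separator matters, the following sketch applies a scoped verb with rules() and inspects the resulting column names. The rule name mean_pos is hypothetical; the sketch assumes ruler and dplyr are installed and does not assert the exact names produced.

```r
library(ruler)
library(dplyr)

# Column pack output built with a scoped verb and rules(); each resulting
# column name should combine the original column name and the rule name
col_pack_output <- mtcars %>%
  summarise_all(rules(mean_pos = mean(.) > 0))
names(col_pack_output)

# The default separator, as given in the usage above
sep <- inside_punct("\\._\\.")

# Check that the separator is found in every output column name
sep_found <- grepl(sep, names(col_pack_output))
```

If .prefix in rules() is changed, the same check with a matching .rule_sep is a quick way to confirm that the two still agree.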
To work properly in some edge cases, one should specify pack types with the appropriate function. However, with .guess equal to TRUE, expose() will guess the pack type based on the pack's output after applying it to .tbl. It uses the following features:
Presence of non-logical columns: if present, the guess is group pack. Grouping columns are guessed to be all non-logical columns. This works incorrectly if some grouping column is logical: it will be guessed as the result of applying a rule. Note that on most occasions this edge case will produce an error saying that grouping columns define non-unique levels.
Combination of whether the number of rows equals 1 (n_rows_one) and the presence of .rule_sep in all column names (all_contain_sep). Guesses are:
Data pack if n_rows_one == TRUE and all_contain_sep == FALSE.
Column pack if n_rows_one == TRUE and all_contain_sep == TRUE.
Row pack if n_rows_one == FALSE and all_contain_sep == FALSE. This works incorrectly if the output has a single row which is checked: it will be guessed as data pack.
Cell pack if n_rows_one == FALSE and all_contain_sep == TRUE. This works incorrectly if the output has a single row in which cells are checked: it will be guessed as column pack.
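The guessing features listed above can be written down as a small decision function. This is only a base R sketch of the described rules, not ruler's internal code: the helper name guess_pack_type_sketch and its default regular expression (an approximation of "._. surrounded by non-alphanumeric characters") are assumptions made for illustration.

```r
# Hypothetical helper: a sketch of the described guessing rules,
# NOT the implementation used inside expose()
guess_pack_type_sketch <- function(pack_output,
                                   rule_sep = "[^[:alnum:]]*\\._\\.[^[:alnum:]]*") {
  is_lgl_col <- vapply(pack_output, is.logical, logical(1))

  # Feature 1: any non-logical column leads to the guess "group pack"
  if (!all(is_lgl_col)) {
    return("group_pack")
  }

  # Feature 2: single row? separator present in every column name?
  n_rows_one <- nrow(pack_output) == 1
  all_contain_sep <- all(grepl(rule_sep, names(pack_output)))

  if (n_rows_one && !all_contain_sep) return("data_pack")
  if (n_rows_one && all_contain_sep) return("col_pack")
  if (!n_rows_one && !all_contain_sep) return("row_pack")
  "cell_pack"
}

one_row <- data.frame(`mpg._.is_pos` = TRUE, check.names = FALSE)
two_rows <- data.frame(`mpg._.is_pos` = c(TRUE, FALSE), check.names = FALSE)

guess_pack_type_sketch(data.frame(nrow_neg = FALSE))            # "data_pack"
guess_pack_type_sketch(one_row)                                 # "col_pack"
guess_pack_type_sketch(data.frame(is_pos = c(TRUE, FALSE)))     # "row_pack"
guess_pack_type_sketch(two_rows)                                # "cell_pack"
guess_pack_type_sketch(data.frame(am = c(0, 1), n_low = c(TRUE, FALSE)))  # "group_pack"
```

The sketch also makes the edge cases visible: a row pack applied to a one-row table produces the same shape as a data pack, so no rule based on shape alone can tell them apart.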
my_rule_pack <- . %>% dplyr::summarise(nrow_neg = nrow(.) < 0)
my_data_packs <- data_packs(my_data_pack_1 = my_rule_pack)
# These pipes give identical results
mtcars %>%
  expose(my_data_packs) %>%
  get_report()
#> Tidy data validation report:
#> # A tibble: 1 × 5
#>   pack           rule     var      id value
#>   <chr>          <chr>    <chr> <int> <lgl>
#> 1 my_data_pack_1 nrow_neg .all      0 FALSE
mtcars %>%
  expose(my_data_pack_1 = my_rule_pack) %>%
  get_report()
#> Tidy data validation report:
#> # A tibble: 1 × 5
#>   pack           rule     var      id value
#>   <chr>          <chr>    <chr> <int> <lgl>
#> 1 my_data_pack_1 nrow_neg .all      0 FALSE
# This throws an error because no pack type is specified for my_rule_pack
if (FALSE) {
  mtcars %>% expose(my_data_pack_1 = my_rule_pack, .guess = FALSE)
}
# Edge cases that argue against relying on '.guess = TRUE' in robust code
group_rule_pack <- . %>%
  dplyr::mutate(vs_one = vs == 1) %>%
  dplyr::group_by(vs_one, am) %>%
  dplyr::summarise(n_low = dplyr::n() > 10)
group_rule_pack_dummy <- . %>%
  dplyr::mutate(vs_one = vs == 1) %>%
  dplyr::group_by(mpg, vs_one, wt) %>%
  dplyr::summarise(n_low = dplyr::n() > 10)
row_rule_pack <- . %>% dplyr::transmute(neg_row_sum = rowSums(.) < 0)
cell_rule_pack <- . %>% dplyr::transmute_all(rules(neg_value = . < 0))
# Only column 'am' is guessed as grouping which defines non-unique levels.
if (FALSE) {
  mtcars %>%
    expose(group_rule_pack, .remove_obeyers = FALSE, .guess = TRUE) %>%
    get_report()
}
# Values in `var` should contain combinations of three grouping columns but
# column 'vs_one' is guessed as a rule. No error is thrown because the guessed
# grouping columns define unique levels.
mtcars %>%
  expose(group_rule_pack_dummy, .remove_obeyers = FALSE, .guess = TRUE) %>%
  get_report()
#> Tidy data validation report:
#> # A tibble: 64 × 5
#>    pack          rule   var           id value
#>    <chr>         <chr>  <chr>      <int> <lgl>
#>  1 group_pack__1 vs_one 10.4.5.25      0 FALSE
#>  2 group_pack__1 vs_one 10.4.5.424     0 FALSE
#>  3 group_pack__1 vs_one 13.3.3.84      0 FALSE
#>  4 group_pack__1 vs_one 14.3.3.57      0 FALSE
#>  5 group_pack__1 vs_one 14.7.5.345     0 FALSE
#>  6 group_pack__1 vs_one 15.3.57        0 FALSE
#>  7 group_pack__1 vs_one 15.2.3.435     0 FALSE
#>  8 group_pack__1 vs_one 15.2.3.78      0 FALSE
#>  9 group_pack__1 vs_one 15.5.3.52      0 FALSE
#> 10 group_pack__1 vs_one 15.8.3.17      0 FALSE
#> # ℹ 54 more rows
# Column 'id' in the results should contain 1 and not 0.
mtcars %>%
  dplyr::slice(1) %>%
  expose(row_rule_pack) %>%
  get_report()
#> Tidy data validation report:
#> # A tibble: 1 × 5
#>   pack         rule        var      id value
#>   <chr>        <chr>       <chr> <int> <lgl>
#> 1 data_pack__1 neg_row_sum .all      0 FALSE
mtcars %>%
  dplyr::slice(1) %>%
  expose(cell_rule_pack) %>%
  get_report()
#> Tidy data validation report:
#> # A tibble: 11 × 5
#>    pack        rule      var      id value
#>    <chr>       <chr>     <chr> <int> <lgl>
#>  1 col_pack__1 neg_value mpg       0 FALSE
#>  2 col_pack__1 neg_value cyl       0 FALSE
#>  3 col_pack__1 neg_value disp      0 FALSE
#>  4 col_pack__1 neg_value hp        0 FALSE
#>  5 col_pack__1 neg_value drat      0 FALSE
#>  6 col_pack__1 neg_value wt        0 FALSE
#>  7 col_pack__1 neg_value qsec      0 FALSE
#>  8 col_pack__1 neg_value vs        0 FALSE
#>  9 col_pack__1 neg_value am        0 FALSE
#> 10 col_pack__1 neg_value gear      0 FALSE
#> 11 col_pack__1 neg_value carb      0 FALSE
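The edge cases above come from guessing, so a possible remedy is to declare pack types explicitly and switch guessing off. The sketch below assumes ruler and dplyr are installed and that group_packs() accepts the grouping columns through .group_vars; the list name typed_packs is hypothetical, and no output is shown because it was not run here.

```r
library(ruler)
library(dplyr)

group_rule_pack <- . %>%
  mutate(vs_one = vs == 1) %>%
  group_by(vs_one, am) %>%
  summarise(n_low = n() > 10)
row_rule_pack <- . %>% transmute(neg_row_sum = rowSums(.) < 0)
cell_rule_pack <- . %>% transmute_all(rules(neg_value = . < 0))

# Declare the type of every pack so that nothing has to be guessed
typed_packs <- list(
  group_packs(group_rule_pack, .group_vars = c("vs_one", "am")),
  row_packs(row_rule_pack),
  cell_packs(cell_rule_pack)
)

# Same one-row input as in the edge cases, but with guessing disabled
report <- mtcars %>%
  slice(1) %>%
  expose(typed_packs, .remove_obeyers = FALSE, .guess = FALSE) %>%
  get_report()
report
```

With types declared, the logical grouping column vs_one and the one-row input should no longer be misread, at the cost of a little more code when the packs are created.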