Functions for converting between long pair-value data (a data frame with columns for pair identifiers and a value column) and a matrix.
long_to_mat(tbl, row_key, col_key, value = NULL, fill = NULL, silent = FALSE)
mat_to_long(mat, row_key, col_key, value, drop = FALSE)
tbl | Data frame with pair-value data. |
---|---|
row_key | String name of column for first key in pair. |
col_key | String name of column for second key in pair. |
value | String name of column for value (or NULL to use the first non-key column). |
fill | Value to fill for missing pairs. |
silent | Use TRUE to suppress the message about the guessed value column. |
mat | Matrix with pair-value data. |
drop | Use TRUE to drop rows with missing values. |
long_to_mat() returns a matrix with selected values where row names indicate the first key in a pair and column names indicate the second.
mat_to_long() returns a tibble with three columns: one for the first key in a pair, one for the second, and one for the value.
Pair-value data is commonly used to describe pairs of objects. A pair is described by two keys (usually integer or character), and its value is an object of arbitrary nature.
In long format there are at least three columns: one for the first key in a pair, one for the second key, and one for the value (there might be more). In matrix format, pair-value data is represented as a matrix of values whose row names are the character representation of the first key and whose column names are that of the second key.
long_to_mat() works as follows:
Pair identifiers are taken from the columns named row_key (to be used as row names) and col_key (to be used as column names). Unique identifiers (and future dimension names) are determined with levels2(). This makes it possible to target the function at a specific set of pairs by using factor columns. Note that NAs are treated as a single unknown key and placed last (for non-factor columns).
Values are taken from the column named value. Note that if value has length 0 (typically NULL), long_to_mat() takes the first non-key column. If there is no such column, it uses a vector of dummy values (NAs or fill values). In both cases a message is given if silent = FALSE.
Output is a matrix with the described row and column names. The value of the pair "key_1" and "key_2" is stored at the intersection of row "key_1" and column "key_2". Note that for duplicated pairs the value from the first occurrence is taken.
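The steps above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: sketch_long_to_mat() is a hypothetical name, plain levels() and sort(unique()) stand in for levels2(), guessing of the value column is omitted, and keys are assumed to contain no NAs.

```r
# Hypothetical helper illustrating the described steps (not the real function).
sketch_long_to_mat <- function(tbl, row_key, col_key, value, fill = NA) {
  # Factor columns supply their levels; other columns use sorted unique values.
  key_levels <- function(x) {
    if (is.factor(x)) levels(x) else as.character(sort(unique(x)))
  }
  row_names <- key_levels(tbl[[row_key]])
  col_names <- key_levels(tbl[[col_key]])

  # Start from a matrix holding `fill` for every possible pair.
  res <- matrix(
    fill,
    nrow = length(row_names), ncol = length(col_names),
    dimnames = list(row_names, col_names)
  )

  # Keep only the first occurrence of each (row key, column key) pair.
  keys <- data.frame(
    row = as.character(tbl[[row_key]]),
    col = as.character(tbl[[col_key]]),
    stringsAsFactors = FALSE
  )
  first <- !duplicated(keys)

  # A two-column character matrix indexes the result by its dimnames.
  res[cbind(keys$row[first], keys$col[first])] <- tbl[[value]][first]
  res
}

dup_data <- data.frame(
  key_1 = c("a", "a", "b", "a"),
  key_2 = c("c", "d", "c", "c"),
  val = c(1L, 2L, 3L, 10L),
  stringsAsFactors = FALSE
)
# The duplicated pair ("a", "c") keeps its first value.
sketch_mat <- sketch_long_to_mat(dup_data, "key_1", "key_2", "val")

# Factor keys fix the set and order of row names, including unused levels.
dup_data$key_1 <- factor(dup_data$key_1, levels = c("b", "a", "z"))
sketch_mat_fct <- sketch_long_to_mat(dup_data, "key_1", "key_2", "val", fill = 0L)
```

In this sketch the factor version gets a row for the unused level "z" that holds only the fill value, which is how a factor column can target a specific set of pairs.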
mat_to_long() essentially performs the inverse operation of long_to_mat(), but pair identifiers are always character. If drop = TRUE, it drops rows whose values (but not keys) are missing.
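The matrix-to-long direction can be sketched in base R. This is a minimal illustration, not the package's implementation: sketch_mat_to_long() is a hypothetical name, and it returns a plain data frame rather than a tibble.

```r
# Hypothetical helper illustrating the described behaviour (not the real function).
sketch_mat_to_long <- function(mat, row_key, col_key, value, drop = FALSE) {
  # Walk the matrix row by row: each row name repeats once per column.
  res <- data.frame(
    rep(rownames(mat), each = ncol(mat)),
    rep(colnames(mat), times = nrow(mat)),
    as.vector(t(mat)),
    stringsAsFactors = FALSE
  )
  names(res) <- c(row_key, col_key, value)

  # Optionally drop rows whose value (not key) is missing.
  if (drop) {
    res <- res[!is.na(res[[value]]), , drop = FALSE]
  }
  res
}

small_mat <- matrix(
  c(1L, 3L, 2L, NA),
  nrow = 2,
  dimnames = list(c("a", "b"), c("c", "d"))
)
long_all <- sketch_mat_to_long(small_mat, "k1", "k2", "val")
long_drop <- sketch_mat_to_long(small_mat, "k1", "k2", "val", drop = TRUE)
```

Here the keys come from the dimension names, so they are character regardless of the type of the original identifiers.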
long_data <- data.frame(
key_1 = c("a", "a", "b"),
key_2 = c("c", "d", "c"),
val = 1:3,
stringsAsFactors = FALSE
)
mat_data <- long_data %>% long_to_mat("key_1", "key_2", "val")
print(mat_data)
#>   c  d
#> a 1  2
#> b 3 NA
# Converts to tibble
mat_data %>% mat_to_long("new_key_1", "new_key_2", "new_val")
#> # A tibble: 4 x 3
#>   new_key_1 new_key_2 new_val
#>   <chr>     <chr>       <int>
#> 1 a         c               1
#> 2 a         d               2
#> 3 b         c               3
#> 4 b         d              NA
# Drops rows with missing values
mat_data %>% mat_to_long("new_key_1", "new_key_2", "new_val", drop = TRUE)
#> # A tibble: 3 x 3
#>   new_key_1 new_key_2 new_val
#>   <chr>     <chr>       <int>
#> 1 a         c               1
#> 2 a         d               2
#> 3 b         c               3