Convert between long pair-value data and matrix

Functions for converting between long pair-value data (a data frame with columns for the pair identifiers and a column for the value) and a matrix.

long_to_mat(tbl, row_key, col_key, value = NULL, fill = NULL, silent = FALSE)

mat_to_long(mat, row_key, col_key, value, drop = FALSE)

Arguments

tbl	Data frame with pair-value data.
row_key	String name of column for first key in pair.
col_key	String name of column for second key in pair.
value	String name of column for value (or `NULL` for `long_to_mat()`).
fill	Value to fill for missing pairs.
silent	Use `TRUE` to omit message about guessed value column (see Details).
mat	Matrix with pair-value data.
drop	Use `TRUE` to drop rows with missing values (see Details).

Value

long_to_mat() returns a matrix of the selected values in which row names indicate the first key in a pair and column names indicate the second.

mat_to_long() returns a tibble with three columns: one for the first key in a pair, one for the second key, and one for the value.

Details

Pair-value data is commonly used to describe pairs of objects. A pair is identified by two keys (usually integer or character), and its value can be an object of arbitrary nature.

In long format there are at least three columns: one for the first key in a pair, one for the second key, and one for the value (there may be more). In matrix format, pair-value data is represented as a matrix of values whose row names are the character representation of the first key and whose column names are the character representation of the second key.

long_to_mat() works as follows:

Pair identifiers are taken from the columns named row_key (used as row names) and col_key (used as column names). Unique identifiers (and thus the future dimension names) are determined with levels2(). This makes it possible to target the function at a specific set of pairs by using factor columns. Note that NAs are treated as a single unknown key and are placed last (for non-factor columns).
Values are taken from the column named value. Note that if value has length 0 (typically NULL), long_to_mat() takes the first non-key column. If there is no such column, it uses a vector of dummy values (NAs or the fill value). In both cases a message is given if silent = FALSE.
The output is a matrix with the row and column names described above. The value of the pair "key_1" and "key_2" is stored at the intersection of row "key_1" and column "key_2". Note that for duplicated pairs the value from the first occurrence is taken.
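
The steps above can be illustrated with a rough base-R sketch. It is not the comperes implementation: sketch_long_to_mat() and the long_dup data are hypothetical names introduced here, value guessing and levels2() are left out, and the sketch assumes keys without NAs and derives dimension names with sort(unique()) rather than factor levels.

```r
# Hypothetical helper (not part of comperes): simplified long -> matrix
# conversion that keeps the first occurrence of duplicated pairs.
sketch_long_to_mat <- function(tbl, row_key, col_key, value, fill = NA) {
  rows <- as.character(tbl[[row_key]])
  cols <- as.character(tbl[[col_key]])

  # Assumption: dimension names are the sorted unique keys (no NA handling)
  row_names <- sort(unique(rows))
  col_names <- sort(unique(cols))

  # Start from a matrix filled with `fill` for pairs that never occur
  res <- matrix(
    fill,
    nrow = length(row_names), ncol = length(col_names),
    dimnames = list(row_names, col_names)
  )

  # Keep only the first occurrence of every (row key, column key) pair
  keep <- !duplicated(data.frame(rows, cols))

  # A two-column character matrix indexes cells by row and column name
  res[cbind(rows[keep], cols[keep])] <- tbl[[value]][keep]
  res
}

long_dup <- data.frame(
  key_1 = c("a", "a", "b", "a"),
  key_2 = c("c", "d", "c", "c"),  # the pair ("a", "c") occurs twice
  val = 1:4,
  stringsAsFactors = FALSE
)

mat_sketch <- sketch_long_to_mat(long_dup, "key_1", "key_2", "val")
print(mat_sketch)
#>   c  d
#> a 1  2
#> b 3 NA
```

The duplicated pair ("a", "c") keeps the value 1 from its first occurrence, and the pair ("b", "d"), which never occurs, receives the fill value.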

mat_to_long() essentially performs the inverse of long_to_mat(), except that pair identifiers are always character. If drop = TRUE, it drops rows in which the value (but not the keys) is missing.
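
The inverse direction can be sketched in base R as well. Again, this is only an illustration under stated assumptions: sketch_mat_to_long() and mat_example are hypothetical names, the sketch returns a plain data frame rather than a tibble, and it assumes the matrix has both row and column names.

```r
# Hypothetical helper (not part of comperes): simplified matrix -> long
# conversion, going through the matrix row by row.
sketch_mat_to_long <- function(mat, row_key, col_key, value, drop = FALSE) {
  # expand.grid() varies its first argument fastest, so listing the column
  # names first yields row-major order: (a, c), (a, d), (b, c), (b, d)
  grid <- expand.grid(
    col = colnames(mat), row = rownames(mat),
    stringsAsFactors = FALSE
  )

  # t(mat) read column by column equals mat read row by row
  res <- data.frame(
    grid$row, grid$col, as.vector(t(mat)),
    stringsAsFactors = FALSE
  )
  names(res) <- c(row_key, col_key, value)

  # Drop rows whose value (not key) is missing
  if (drop) {
    res <- res[!is.na(res[[value]]), , drop = FALSE]
  }
  res
}

mat_example <- matrix(
  c(1L, 3L, 2L, NA),
  nrow = 2,
  dimnames = list(c("a", "b"), c("c", "d"))
)

long_all <- sketch_mat_to_long(mat_example, "new_key_1", "new_key_2", "new_val")
long_drop <- sketch_mat_to_long(
  mat_example, "new_key_1", "new_key_2", "new_val", drop = TRUE
)
```

Here long_all has one row per matrix cell (four rows), while long_drop omits the row for the pair ("b", "d") because its value is NA.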

Examples

# Load comperes for long_to_mat() and mat_to_long()
library(comperes)

long_data <- data.frame(
  key_1 = c("a", "a", "b"),
  key_2 = c("c", "d", "c"),
  val = 1:3,
  stringsAsFactors = FALSE
)

mat_data <- long_data %>% long_to_mat("key_1", "key_2", "val")
print(mat_data)
#>   c  d
#> a 1  2
#> b 3 NA

# Converts to tibble
mat_data %>% mat_to_long("new_key_1", "new_key_2", "new_val")
#> # A tibble: 4 x 3
#>   new_key_1 new_key_2 new_val
#>   <chr>     <chr>       <int>
#> 1 a         c               1
#> 2 a         d               2
#> 3 b         c               3
#> 4 b         d              NA

# Drops rows with missing values
mat_data %>% mat_to_long("new_key_1", "new_key_2", "new_val", drop = TRUE)
#> # A tibble: 3 x 3
#>   new_key_1 new_key_2 new_val
#>   <chr>     <chr>       <int>
#> 1 a         c               1
#> 2 a         d               2
#> 3 b         c               3