Functions for ordering a set of pdqr-functions supplied in a list. This can be useful for comparative statistical inference across several groups of data.
summ_order(f_list, method = "compare", decreasing = FALSE)
summ_sort(f_list, method = "compare", decreasing = FALSE)
summ_rank(f_list, method = "compare")
f_list | List of pdqr-functions. |
---|---|
method | Method to be used for ordering. Should be one of "compare", "mean", "median", "mode", "midrange". |
decreasing | If `TRUE`, ordering is done in decreasing order. |
summ_order() works essentially like order(). It returns an integer vector representing a permutation which rearranges f_list in desired order.
summ_sort() returns a sorted (in desired order) variant of f_list.
summ_rank() returns a numeric vector representing ranks of f_list elements: 1 for the "smallest", length(f_list) for the "biggest".
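As a minimal sketch of how the three functions relate, the snippet below builds an assumed input (three uniform distributions with disjoint supports, chosen here only for illustration) and checks that indexing by the permutation from summ_order() gives the same element order as summ_sort().

```r
library(pdqr)

# Assumed example input: three uniform distributions with disjoint supports
f_list <- list(
  a = as_d(dunif),                    # support [0, 1]
  b = as_d(dunif, min = 1, max = 2),  # support [1, 2]
  c = as_d(dunif, min = -1, max = 0)  # support [-1, 0]
)

ord <- summ_order(f_list)  # permutation that rearranges `f_list` in increasing order
rnk <- summ_rank(f_list)   # rank of every element of `f_list`

# Indexing by the permutation should give the same element order as summ_sort()
names(f_list[ord])
names(summ_sort(f_list))
```

With `decreasing = TRUE` the same comparison can be repeated for summ_order() and summ_sort().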
Ties for all methods are handled so as to preserve the original order.
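A small illustration of the tie rule, assuming an input in which the first and third elements are the same pdqr-function (so their means are exactly equal). Only method "mean" is shown here.

```r
library(pdqr)

f <- as_d(dunif)                     # mean is about 0.5
g <- as_d(dunif, min = -1, max = 0)  # mean is about -0.5

# First and third elements are identical, so they tie on their mean
tied_list <- list(f, g, f)

# The tied elements should keep their original relative order (1 before 3)
tie_order <- summ_order(tied_list, method = "mean")
tie_order
```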
Method "compare" uses the following ordering relation: pdqr-function f is greater than g if and only if P(f >= g) > 0.5, or in code summ_prob_true(f >= g) > 0.5 (see pdqr methods for "Ops" group generic family for more details on comparing pdqr-functions). This method orders input based on this relation and the order() function. Notes:
This relation doesn't define a strict ordering because it is not transitive: there can be pdqr-functions f, g, and h for which f is greater than g, g is greater than h, and h is greater than f (whereas transitivity would require f to be greater than h). If not addressed, this might make the output depend on the order of the input. It is solved by first preordering f_list based on method "mean" and then calling order().
Because comparing two pdqr-functions can be time consuming, this method becomes rather slow as the number of f_list elements grows.
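The relation behind method "compare" can be inspected directly. The sketch below reuses the three densities from the non-transitivity example on this page and fills a matrix of pairwise comparisons. The helper `is_greater()` is hypothetical (it is not part of pdqr) and simply wraps the documented check summ_prob_true(f >= g) > 0.5. It also shows why the method slows down: every pair of elements needs its own comparison.

```r
library(pdqr)

# Same three densities as in the non-transitivity example of this page
non_trans_list <- list(
  new_d(data.frame(x = c(0.39, 0.44, 0.46), y = c(17, 14, 0)), "continuous"),
  new_d(data.frame(x = c(0.05, 0.3, 0.70), y = c(4, 0, 4)), "continuous"),
  new_d(data.frame(x = c(0.03, 0.40, 0.80), y = c(1, 1, 1)), "continuous")
)

# Hypothetical helper (not part of pdqr): is `f` "greater" than `g`?
is_greater <- function(f, g) {
  summ_prob_true(f >= g) > 0.5
}

n <- length(non_trans_list)
# greater[i, j] is TRUE when element i is "greater" than element j
greater <- matrix(FALSE, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      greater[i, j] <- is_greater(non_trans_list[[i]], non_trans_list[[j]])
    }
  }
}
greater
```

A cycle in this matrix (for example `greater[3, 2]`, `greater[2, 1]`, and `greater[1, 3]` all being TRUE) is what the first note describes.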
Methods "mean", "median", "mode", and "midrange" are based on summ_center(): ordering of f_list is defined as ordering of corresponding measures of distribution's center.
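As a rough sketch of what this means in practice (an illustration of the documented behavior, not the package's internal code), one can compute the centers with summ_center() and order them with base order(). The input list of normal distributions is an assumption made for this example.

```r
library(pdqr)

# Assumed example input: normal distributions with different centers
f_list <- list(
  as_d(dnorm, mean = 1),
  as_d(dnorm, mean = -1),
  as_d(dnorm, mean = 0)
)

# Center of every distribution, here measured by the median
centers <- vapply(f_list, summ_center, numeric(1), method = "median")

# Ordering the centers by hand should agree with the matching summ_order() method
manual_order <- order(centers)
pdqr_order <- summ_order(f_list, method = "median")
manual_order
pdqr_order
```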
Other summary functions: summ_center(), summ_classmetric(), summ_distance(), summ_entropy(), summ_hdr(), summ_interval(), summ_moment(), summ_prob_true(), summ_pval(), summ_quantile(), summ_roc(), summ_separation(), summ_spread()
#> [1] 3 1 2
summ_sort(f_list)
#> $c
#> Density function of continuous type
#> Support: [-1, 0] (10000 intervals)
#>
#> $a
#> Density function of continuous type
#> Support: [0, 1] (10000 intervals)
#>
#> $b
#> Density function of continuous type
#> Support: [1, 2] (10000 intervals)
#>
summ_rank(f_list)
#> a b c
#> 2 3 1
# Different methods might give different results on some elaborate pdqr-functions
# Methods "compare" and "mean" are not equivalent
non_mean_list <- list(
  new_d(data.frame(x = c(0.56, 0.815), y = c(1, 1)), "continuous"),
  new_d(data.frame(x = 0:1, y = c(0, 1)), "continuous")
)
summ_order(non_mean_list, method = "compare")
#> [1] 1 2
summ_order(non_mean_list, method = "mean")
#> [1] 2 1
# Methods powered by `summ_center()` are not equivalent
m <- c(0, 0.2, 0.1)
s <- c(1.1, 1.2, 1.3)
dlnorm_list <- lapply(seq_along(m), function(i) {
  as_d(dlnorm, meanlog = m[i], sdlog = s[i])
})
summ_order(dlnorm_list, method = "mean")
#> [1] 1 2 3
summ_order(dlnorm_list, method = "median")
#> [1] 1 3 2
summ_order(dlnorm_list, method = "mode")
#> [1] 3 2 1
# Method "compare" handles inherent non-transitivity. Here the third element is
# "greater" than the second (`P(f >= g) > 0.5`), the second is "greater" than
# the first, and the first is "greater" than the third.
non_trans_list <- list(
  new_d(data.frame(x = c(0.39, 0.44, 0.46), y = c(17, 14, 0)), "continuous"),
  new_d(data.frame(x = c(0.05, 0.3, 0.70), y = c(4, 0, 4)), "continuous"),
  new_d(data.frame(x = c(0.03, 0.40, 0.80), y = c(1, 1, 1)), "continuous")
)
summ_sort(non_trans_list)
#> [[1]]
#> Density function of continuous type
#> Support: [0.05, 0.7] (2 intervals)
#>
#> [[2]]
#> Density function of continuous type
#> Support: [0.03, 0.8] (2 intervals)
#>
#> [[3]]
#> Density function of continuous type
#> Support: [0.39, 0.46] (2 intervals)
#>