summ_hdr() computes a Highest Density Region (HDR) of a pdqr-function for a supplied level: a union of closed intervals whose total probability is not less than level and inside which the probability/density at every point is not less than some threshold. The threshold is the largest one for which the HDR still has total probability not less than level. Equivalently, the HDR is the set of intervals with the smallest total width among all sets with total probability not less than level.

summ_hdr(f, level = 0.95)

Arguments

f

A pdqr-function representing distribution.

level

A desired lower bound for a total probability of an output set of intervals.

Value

A data frame in which each row represents one closed interval of the HDR, with the following columns:

  • left <dbl> : Left end of intervals.

  • right <dbl> : Right end of intervals.

Details

The general algorithm of summ_hdr() consists of two steps:

  1. Find the "target height". This is a value of probability/density that divides the support into two sets: one where probability/density is not less than the target height (this is the desired HDR) and another where it is strictly less. The first set should also have total probability not less than level.

  2. Form an HDR as a set of closed intervals.

If f has "discrete" type, the target height is computed by going through "x" values of "x_tbl" metadata in order of decreasing probability until their total probability is not less than level. The probability of the last "x" value reached is the target height. After that, all "x" values with probability not less than the target height are considered to form an HDR. Output is formed as a set of closed intervals (i.e. both edges included) that contain all HDR "x" elements and no other "x" values.
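The "discrete" procedure above can be sketched in base R as follows. This is only an illustration under stated assumptions, not pdqr's actual implementation: hdr_discrete_sketch() is a hypothetical helper, it takes plain x and prob vectors instead of a pdqr-function, and the small numerical tolerance is an added assumption to guard against floating-point error in the cumulative sum.

```r
# Hypothetical helper (not part of pdqr): base-R sketch of the "discrete" case
hdr_discrete_sketch <- function(x, prob, level) {
  # Work with "x" values in increasing order
  ord <- order(x)
  x <- x[ord]
  prob <- prob[ord]

  # Step 1: target height. Accumulate probabilities in decreasing order
  # until their total is not less than `level` (tolerance is an assumption)
  prob_sorted <- sort(prob, decreasing = TRUE)
  n_needed <- which(cumsum(prob_sorted) >= level - 1e-12)[1]
  height <- prob_sorted[n_needed]

  # Step 2: group consecutive "x" values with probability not less than
  # target height into closed intervals
  in_hdr <- prob >= height
  runs <- rle(in_hdr)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1

  data.frame(left = x[starts[runs$values]], right = x[ends[runs$values]])
}

x <- 1:4
prob <- c(0.4, 0.2, 0.3, 0.1)
hdr_05 <- hdr_discrete_sketch(x, prob, level = 0.5) # intervals [1, 1] and [3, 3]
hdr_09 <- hdr_discrete_sketch(x, prob, level = 0.9) # interval [1, 3]
```

With the distribution used in the Examples section, this sketch gives the same intervals as summ_hdr() for levels 0.5 and 0.9.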

If f has "continuous" type, the target height is estimated as the (1 - level) quantile of the distribution of Y = d_f(X), where d_f is the d-function corresponding to f (as_d(f) in other words) and X is a random variable represented by f. Essentially, Y has the distribution of f's density values, and its (1 - level) quantile is the target height. After that, the HDR is formed as a set of intervals with positive width (if level is more than 0, see Notes) inside which density is not less than the target height.
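The idea behind the "continuous" case can be illustrated with a rough base-R sketch for the standard normal distribution. It is not how pdqr computes the result: the sample-based quantile estimate, the sample size, the seed, and the grid over [-6, 6] are all assumptions made for illustration.

```r
# Sketch under assumptions: X is standard normal, level is 0.95
set.seed(101)
level <- 0.95

# Step 1: target height as the (1 - level) quantile of Y = d_f(X),
# estimated here from a random sample of X
x_smpl <- rnorm(1e5)
y_smpl <- dnorm(x_smpl)
height <- quantile(y_smpl, probs = 1 - level, names = FALSE)

# Step 2: on a fine grid, find runs of points where density is not less
# than target height and report their edges as intervals
grid <- seq(-6, 6, length.out = 12001)
in_hdr <- dnorm(grid) >= height
runs <- rle(in_hdr)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

hdr <- data.frame(
  left = grid[starts[runs$values]],
  right = grid[ends[runs$values]]
)
hdr
```

The result should be a single interval close to the familiar (-1.96, 1.96), up to sampling and grid error.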

Notes:

  • If level is 0, output has one interval of zero width at point of global mode.

  • If level is 1, output has one interval equal to support.

  • Computation of the target height in the "continuous" case is approximate. In some extreme cases (for example, winsorized distributions) this can lead to an HDR whose total probability is only approximately equal to level and can even be slightly lower than it.

  • If the d-function has "plateaus" (consecutive values with equal probability/density) at the computed target height, total probability of the HDR can be considerably bigger than level (see examples). However, this is consistent with the HDR definition, as density values should be not less than the target height and total probability should be not less than level.

See also

region_*() family of functions for working with output HDR.

summ_interval() for computing a single-interval summary of a distribution.

Other summary functions: summ_center(), summ_classmetric(), summ_distance(), summ_entropy(), summ_interval(), summ_moment(), summ_order(), summ_prob_true(), summ_pval(), summ_quantile(), summ_roc(), summ_separation(), summ_spread()

Examples

# "discrete" functions
d_dis <- new_d(data.frame(x = 1:4, prob = c(0.4, 0.2, 0.3, 0.1)), "discrete")
summ_hdr(d_dis, 0.3)
#>   left right
#> 1    1     1
summ_hdr(d_dis, 0.5)
#>   left right
#> 1    1     1
#> 2    3     3
summ_hdr(d_dis, 0.9)
#>   left right
#> 1    1     3

## Zero width interval at global mode
summ_hdr(d_dis, 0)
#>   left right
#> 1    1     1

# "continuous" functions
d_norm <- as_d(dnorm)
summ_hdr(d_norm, 0.95)
#>        left    right
#> 1 -1.960312 1.960312

## Zero width interval at global mode
summ_hdr(d_norm, 0)
#>            left         right
#> 1 -2.904343e-12 -2.904343e-12

# Works well with mixture distributions
d_mix <- form_mix(list(as_d(dnorm), as_d(dnorm, mean = 5)))
summ_hdr(d_mix, 0.95)
#>        left    right
#> 1 -1.943712 1.980277
#> 2  3.019723 6.943712

# Plateaus
d_unif <- as_d(dunif)
## Returns all support because of density "plateau"
summ_hdr(d_unif, 0.1)
#>   left right
#> 1    0     1

# Draw HDR
plot(d_mix)
region_draw(summ_hdr(d_mix, 0.95))