summ_hdr() computes a Highest Density Region (HDR) of some pdqr-function for a supplied level. An HDR is a union of (closed) intervals whose total probability is not less than level and such that probability/density at any point inside it is not less than some threshold. That threshold should be the maximum one for which the HDR still has total probability not less than level. An HDR is also the set of intervals with the lowest total width among all sets with total probability not less than level.
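The minimal-width property can be illustrated numerically for a standard normal distribution in base R: among intervals holding exactly 95% probability, the narrowest is the symmetric one around the mode. This only illustrates the definition and is not how summ_hdr() computes anything; the grid of left-tail probabilities is an arbitrary choice.

```r
# Compare the widths of intervals [qnorm(a), qnorm(a + 0.95)], all of which
# carry exactly 95% probability under N(0, 1)
level <- 0.95
a <- seq(0.001, 0.049, by = 0.001)   # left-tail probabilities
widths <- qnorm(a + level) - qnorm(a)

# The narrowest interval leaves equal tails (a = 0.025), which is the
# 95% HDR of N(0, 1): approximately [-1.96, 1.96]
best_a <- a[which.min(widths)]
hdr_normal <- c(left = qnorm(best_a), right = qnorm(best_a + level))
print(hdr_normal)
```

For a skewed or multimodal distribution the narrowest set is generally not an equal-tailed interval, which is the point of computing an HDR instead.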
summ_hdr(f, level = 0.95)
f : A pdqr-function representing distribution.
level : A desired lower bound for a total probability of an output set of intervals.
A data frame with one row representing one closed interval of HDR and the following columns:
left <dbl> : Left end of intervals.
right <dbl> : Right end of intervals.
The general algorithm of summ_hdr() consists of two steps:
1. Find "target height", i.e. a value of probability/density which divides the whole support into two sets: one with probability/density not less than target height (this is the desired HDR) and the other with strictly less. The first set should also have total probability not less than level.
2. Form a HDR as a set of closed intervals.
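The second step (forming closed intervals) can be sketched in base R once HDR membership is known for a sorted set of points, such as the "x" values of a discrete distribution or a fine grid for a continuous one. The helper name mask_to_intervals() is hypothetical and not part of pdqr; each run of consecutive points inside the HDR becomes one closed interval.

```r
# Hypothetical helper (not part of pdqr): turn a logical mask over sorted
# points into a data frame of closed intervals, one row per run of TRUE
mask_to_intervals <- function(x, in_hdr) {
  runs <- rle(in_hdr)                  # lengths and values of consecutive runs
  ends <- cumsum(runs$lengths)         # index of the last point in each run
  starts <- ends - runs$lengths + 1    # index of the first point in each run
  keep <- runs$values                  # keep only runs inside the HDR
  data.frame(left = x[starts[keep]], right = x[ends[keep]])
}

mask_to_intervals(1:4, c(TRUE, FALSE, TRUE, FALSE))  # intervals [1, 1] and [3, 3]
mask_to_intervals(1:4, c(TRUE, TRUE, TRUE, FALSE))   # one interval [1, 3]
```

On a grid the interval ends are only as precise as the grid spacing.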
If f has "discrete" type, target height is computed by looking at "x" values of "x_tbl" metadata in order of decreasing probability until their total probability is not less than level. After that, all "x" values with probability not less than target height are considered to form a HDR. Output is formed as a set of closed intervals (i.e. both edges included) inside of which lie all HDR "x" elements and no other "x" elements.
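Finding the target height for the "discrete" case can be sketched in base R as below. discrete_target_height() is a hypothetical helper, not pdqr's implementation, and the small tolerance is an assumption added to guard against cumulative sums that fall slightly below level due to floating point error.

```r
# Discrete distribution from the examples below
x <- 1:4
prob <- c(0.4, 0.2, 0.3, 0.1)

# Hypothetical helper: probability of the last value needed (in order of
# decreasing probability) to reach total probability `level`
discrete_target_height <- function(prob, level, tol = 1e-12) {
  p_sorted <- sort(prob, decreasing = TRUE)
  n_needed <- which(cumsum(p_sorted) >= level - tol)[1]
  p_sorted[n_needed]
}

height <- discrete_target_height(prob, 0.5)
height               # 0.3
x[prob >= height]    # 1 3: HDR points, i.e. intervals [1, 1] and [3, 3]
```

For level = 0.9 the same helper gives target height 0.2 and HDR points 1, 2, 3, which merge into the single interval [1, 3], in line with the examples.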
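If f has "continuous" type, target height is estimated as the 1-level quantile of the Y = d_f(X) distribution, where d_f is the d-function representing f (as_d(f) in other words) and X is a random variable represented by f. Essentially, Y has a distribution of f's density values, and its 1-level quantile is a target height. This works because P(d_f(X) >= h) is exactly the total probability of the set of points where d_f is not less than h, so taking h as the 1-level quantile of Y makes that probability not less than level. After that, HDR is formed as a set of intervals with positive width (if level is more than 0; the level = 0 case is described below) inside which density is not less than target height.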
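For a standard normal distribution, the target height can be estimated in base R by sampling X, evaluating the density at the draws to get Y, and taking its 1-level quantile. This is a Monte Carlo sketch with an assumed sample size, seed, and grid, not pdqr's actual (also approximate) computation.

```r
set.seed(1)
level <- 0.95

# Y = d_f(X): density values at random draws from the distribution itself
x_sample <- rnorm(1e5)
y_sample <- dnorm(x_sample)

# Target height is the (1 - level) quantile of Y
height <- quantile(y_sample, probs = 1 - level, names = FALSE)

# HDR: grid points where density is not less than target height.
# N(0, 1) is unimodal, so these points form a single interval.
grid <- seq(-5, 5, by = 0.001)
hdr_range <- range(grid[dnorm(grid) >= height])
hdr_range   # close to c(-1.96, 1.96)
```

For a multimodal density the selected grid points form several runs, each of which becomes its own interval.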
If level is 0, output has one interval of zero width at the point of global mode.
If level is 1, output has one interval equal to support.
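These two edge cases can be sketched for a density evaluated over a grid. hdr_edge() is a hypothetical helper, the grid-based support is an approximation, and the density 0.5 + x on [0, 1] is an arbitrary choice that stays positive at both ends of its support.

```r
# Hypothetical helper: HDR for level 0 or 1, given density values on a sorted grid
hdr_edge <- function(grid, dens, level) {
  if (level == 0) {
    # Zero-width interval at the global mode
    x_mode <- grid[which.max(dens)]
    return(data.frame(left = x_mode, right = x_mode))
  }
  if (level == 1) {
    # Whole support: first to last grid point with positive density
    support <- range(grid[dens > 0])
    return(data.frame(left = support[1], right = support[2]))
  }
  stop("`level` should be 0 or 1 in this sketch")
}

grid <- seq(-0.5, 1.5, by = 0.01)
dens <- ifelse(grid >= 0 & grid <= 1, 0.5 + grid, 0)  # mode at 1, support [0, 1]
hdr_edge(grid, dens, 0)   # zero-width interval at the mode
hdr_edge(grid, dens, 1)   # the whole support
```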
Computation of target height in case of "continuous" type is approximate, which in some extreme cases (for example, winsorized distributions) can lead to HDR having total probability very close to and even slightly lower than level.
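One way to see how close an HDR's total probability is to level is to sum CDF differences over its intervals, since for a continuous distribution a closed interval [l, r] has probability F(r) - F(l). The sketch below uses exact base R CDFs with the interval ends printed in the examples, so results can differ slightly from pdqr's own approximations. interval_prob() is a hypothetical helper, and the mixture CDF assumes that form_mix() weights its two components equally by default. Within pdqr, the region_*() family is the intended tool for such checks.

```r
# Hypothetical helper: total probability of a set of closed intervals,
# given a vectorized CDF of a continuous distribution
interval_prob <- function(hdr, cdf) {
  sum(cdf(hdr$right) - cdf(hdr$left))
}

# 95% HDR of standard normal, as printed in the examples
hdr_norm <- data.frame(left = -1.960312, right = 1.960312)
interval_prob(hdr_norm, pnorm)   # about 0.95

# 95% HDR of the N(0, 1) and N(5, 1) mixture, as printed in the examples
hdr_mix <- data.frame(left = c(-1.943712, 3.019723), right = c(1.980277, 6.943712))
p_mix <- function(q) 0.5 * pnorm(q) + 0.5 * pnorm(q, mean = 5)  # equal weights assumed
interval_prob(hdr_mix, p_mix)
```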
If the d-function has "plateaus" (consecutive values with equal probability/density) at the computed target height, total probability of HDR can be considerably bigger than level (see examples). However, this aligns with HDR definition, as density values should be not less than target height and total probability should be not less than level.
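The plateau effect is easy to see for Uniform(0, 1): Y = d_f(X) is constant, so its 1-level quantile equals that constant for any level, and every point of the support passes the threshold. This is a base R sketch; the sample size and grid points are arbitrary.

```r
level <- 0.1

# Every draw from Uniform(0, 1) has density exactly 1, so Y is constant
y_sample <- dunif(runif(1000))
height <- quantile(y_sample, probs = 1 - level, names = FALSE)   # 1

# All points of [0, 1] have density >= height, so the HDR is the whole
# support and its total probability is 1, far above level = 0.1
grid <- c(0, 0.25, 0.5, 0.75, 1)
all(dunif(grid) >= height)   # TRUE
```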
region_*() family of functions for working with output
summ_interval() for computing a single interval summary of distribution.
Other summary functions:
```r
library(pdqr)

# "discrete" functions
d_dis <- new_d(data.frame(x = 1:4, prob = c(0.4, 0.2, 0.3, 0.1)), "discrete")
summ_hdr(d_dis, 0.3)
#>   left right
#> 1    1     1
summ_hdr(d_dis, 0.5)
#>   left right
#> 1    1     1
#> 2    3     3
summ_hdr(d_dis, 0.9)
#>   left right
#> 1    1     3
# Zero width interval at global mode
summ_hdr(d_dis, 0)
#>   left right
#> 1    1     1

# "continuous" functions
d_norm <- as_d(dnorm)
summ_hdr(d_norm, 0.95)
#>        left    right
#> 1 -1.960312 1.960312
# Zero width interval at global mode
summ_hdr(d_norm, 0)
#>            left         right
#> 1 -2.904343e-12 -2.904343e-12

# Works well with mixture distributions
d_mix <- form_mix(list(as_d(dnorm), as_d(dnorm, mean = 5)))
summ_hdr(d_mix, 0.95)
#>        left    right
#> 1 -1.943712 1.980277
#> 2  3.019723 6.943712

# Plateaus
d_unif <- as_d(dunif)
# Returns all support because of density "plateau"
summ_hdr(d_unif, 0.1)
#>   left right
#> 1    0     1

# Draw HDR
plot(d_mix)
region_draw(summ_hdr(d_mix, 0.95))
```