These functions provide ways of working with a region: a data frame with
numeric "left" and "right" columns, each row of which represents a unique
finite interval (open, either type of half-open, or closed). Values of "left"
and "right" columns should create an "ordered" set of intervals:
left[1] <= right[1] <= left[2] <= right[2] <= ...
(intervals with zero width are accepted). Originally, region_*() functions
were designed to work with the output of summ_hdr() and summ_interval(), but
they can be used with any data frame that satisfies the definition of a region.
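The definition above can be checked mechanically. The following is a minimal base R sketch, assuming a hypothetical helper named is_valid_region() that is not part of the package; it checks only column presence, finiteness, and the ordering condition, and does not check that rows are unique.

```r
# Hypothetical helper (an illustration, not a package function):
# checks whether a data frame satisfies the ordering part of the definition.
is_valid_region <- function(region) {
  if (!is.data.frame(region)) return(FALSE)
  if (!all(c("left", "right") %in% names(region))) return(FALSE)

  left <- region[["left"]]
  right <- region[["right"]]
  if (!is.numeric(left) || !is.numeric(right)) return(FALSE)
  # Intervals must be finite (this also rejects NA values)
  if (!all(is.finite(left)) || !all(is.finite(right))) return(FALSE)

  # Interleave ends as left[1], right[1], left[2], right[2], ...
  ends <- as.vector(rbind(left, right))

  # The sequence must be non-decreasing; zero-width intervals are allowed
  all(diff(ends) >= 0)
}

# Ordered intervals [0, 1] and [2, 2]: satisfies the definition
is_valid_region(data.frame(left = c(0, 2), right = c(1, 2)))

# Overlapping intervals [0, 1] and [0.5, 2]: does not satisfy it
is_valid_region(data.frame(left = c(0, 0.5), right = c(1, 2)))
```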
region_is_in(region, x, left_closed = TRUE, right_closed = TRUE)
region_prob(region, f, left_closed = TRUE, right_closed = TRUE)
region_height(region, f, left_closed = TRUE, right_closed = TRUE)
region_width(region)
region_distance(region, region2, method = "Jaccard")
region_draw(region, col = "blue", alpha = 0.2)
Argument | Description |
---|---|
region | A data frame representing a region. |
x | Numeric vector to be tested for being inside the region. |
left_closed | A single logical value indicating whether left ends of intervals are treated as part of them. |
right_closed | A single logical value indicating whether right ends of intervals are treated as part of them. |
f | A pdqr-function. |
region2 | A data frame representing a region. |
method | Method for computing distance between regions in |
col | Single color of rectangles to be used. Should be appropriate for |
alpha | Single number representing the factor modifying the opacity alpha; typically in [0; 1]. |
region_is_in() returns a logical vector (with length equal to the length of x) indicating whether each element of x is inside the region.
region_prob() returns a single number between 0 and 1 representing the total probability of the region.
region_height() returns a single number representing the height of the region with respect to f, i.e. the minimum value that the corresponding d-function can return based on relevant points inside the region.
region_width() returns a single number representing the total width of the region.
region_draw() draws colored rectangles filling the region's intervals.
region_is_in() tests each value of x for being inside at least one interval of the region. In other words, if there is a row for which an element of x is between the "left" and "right" values (respecting the left_closed and right_closed options), the output for that element will be TRUE. Note that for zero-width intervals, one of left_closed or right_closed being TRUE is enough to accept that point as "in region".
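The membership rule described above can be sketched in base R. This is only an illustration under the stated rule, not the package's implementation; the helper name is_in_region() is hypothetical.

```r
# Hypothetical helper illustrating the membership rule (not a package function)
is_in_region <- function(region, x, left_closed = TRUE, right_closed = TRUE) {
  left <- region[["left"]]
  right <- region[["right"]]

  vapply(x, function(x_i) {
    # Strictly between the ends of some interval
    strictly_inside <- left < x_i & x_i < right
    # Exactly on an end that is treated as part of the interval
    on_left_end <- left_closed & (x_i == left)
    on_right_end <- right_closed & (x_i == right)
    any(strictly_inside | on_left_end | on_right_end)
  }, logical(1))
}

# Interval [1, 3) and zero-width interval at 5
region <- data.frame(left = c(1, 5), right = c(3, 5))

# Values: TRUE TRUE FALSE TRUE (3 is an open right end; 5 is accepted
# because `left_closed` is still TRUE)
is_in_region(region, c(1, 2, 3, 5), right_closed = FALSE)
```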
region_prob() computes the total probability of a region according to pdqr-function f. If f has "discrete" type, the output is computed as the sum of probabilities for all "x" values from "x_tbl" metadata which lie inside the region (respecting the left_closed and right_closed options while using region_is_in()). If f has "continuous" type, the output is computed as the integral of the density over the region (*_closed options have no effect).
region_height() computes the "height" of a region (with respect to f): the minimum value that the d-function corresponding to f can return based on relevant points inside the region. If f has "discrete" type, those relevant points are the "x" values from "x_tbl" metadata which lie inside the region (if there are no such points, the output is 0). If f has "continuous" type, the whole intervals are used as relevant points. The notion of "height" comes from the summ_hdr() function: if region is summ_hdr(f, level) for some level, then region_height(region, f) is what the summ_hdr() docs call the "target height" of the HDR. That is, the maximum value of the d-function such that the set of points at which the d-function is not less than that value has total probability not less than level.
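The "discrete" rules for region_prob() and region_height() described above can be illustrated with base R alone. In this sketch the probability table x_tbl is built directly from dbinom(), and the region with left = 6 and right = 8 is an assumed input chosen for illustration; variable names ending in _sketch are hypothetical.

```r
# Assumed setup: a binomial distribution written as an explicit table of
# "x" values and their probabilities
x_tbl <- data.frame(x = 0:10, prob = dbinom(0:10, size = 10, prob = 0.7))
region <- data.frame(left = 6, right = 8)

# Closed intervals: which "x" values lie inside some row of the region
is_in <- vapply(
  x_tbl[["x"]],
  function(x_i) any(region[["left"]] <= x_i & x_i <= region[["right"]]),
  logical(1)
)

# Total probability: sum of probabilities of "x" values inside the region
region_prob_sketch <- sum(x_tbl[["prob"]][is_in])

# Height: minimum probability among those values, or 0 if there are none
region_height_sketch <- if (any(is_in)) min(x_tbl[["prob"]][is_in]) else 0

region_prob_sketch    # approximately 0.7004233
region_height_sketch  # approximately 0.2001209
```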
region_width() computes the total width of a region, i.e. the sum of differences between the "right" and "left" columns.
region_distance() computes the distance between a pair of regions. As in summ_distance(), it is a single non-negative number representing how much two regions differ from one another (bigger values indicate bigger difference). Argument method represents the method of computing the distance. Method "Jaccard" computes the Jaccard distance: one minus the ratio of intersection width to union width. Other methods come from summ_distance() and represent the distance between regions as probability distributions:
If the total width of a region is zero (i.e. it consists only of points), the distribution is a uniform discrete one based on points from the region.
If the total width is positive, the distribution is a uniform continuous one based on intervals with positive width.
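The width and Jaccard computations described above can be sketched in base R. The helper names below are hypothetical and this is not the package's implementation; the sketch assumes both inputs satisfy the region definition (so intervals within one region overlap in at most a point) and does not handle the case of zero union width.

```r
# Hypothetical helpers (illustration only, not package functions)

# Total width: sum of differences between "right" and "left" columns
region_width_sketch <- function(region) {
  sum(region[["right"]] - region[["left"]])
}

intersection_width <- function(region1, region2) {
  # Overlap length for every pair of intervals (rows: region1, cols: region2)
  lower <- outer(region1[["left"]], region2[["left"]], pmax)
  upper <- outer(region1[["right"]], region2[["right"]], pmin)
  sum(pmax(upper - lower, 0))
}

jaccard_distance <- function(region1, region2) {
  inter <- intersection_width(region1, region2)
  # Union width via inclusion-exclusion
  union_width <- region_width_sketch(region1) +
    region_width_sketch(region2) - inter
  1 - inter / union_width
}

region1 <- data.frame(left = c(0, 2), right = c(1, 2))
region2 <- data.frame(left = 0.5, right = 1.5)

# Intersection is [0.5, 1] (width 0.5), union has width 1.5,
# so the distance is 1 - 0.5 / 1.5 = 2/3
jaccard_distance(region1, region2)
```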
region_draw() draws (on the current plot) intervals stored in region as colored rectangles vertically starting from zero and ending at the top of the plot (technically, at a "y" value of 2e8).
summ_hdr() for computing the Highest Density Region.
summ_interval() for computing a single interval summary of a distribution.
library(pdqr)
# Type "discrete"
d_binom <- as_d(dbinom, size = 10, prob = 0.7)
hdr_dis <- summ_hdr(d_binom, level = 0.6)
region_is_in(hdr_dis, 0:10)
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
## This should be not less than 0.6
region_prob(hdr_dis, d_binom)
#> [1] 0.7004233
region_height(hdr_dis, d_binom)
#> [1] 0.2001209
region_width(hdr_dis)
#> [1] 2
# Type "continuous"
d_norm <- as_d(dnorm)
hdr_con <- summ_hdr(d_norm, level = 0.95)
region_is_in(hdr_con, c(-Inf, -2, 0, 2, Inf))
#> [1] FALSE FALSE TRUE FALSE FALSE
## This should be approximately equal to 0.95
region_prob(hdr_con, d_norm)
#> [1] 0.9500426
## This should be equal to `d_norm(hdr_con[["left"]][1])`
region_height(hdr_con, d_norm)
#> [1] 0.05840531
region_width(hdr_con)
#> [1] 3.920624
# Usage of `*_closed` options
region <- data.frame(left = 1, right = 3)
## Closed intervals
region_is_in(region, 1:3)
#> [1] TRUE TRUE TRUE
## Open from left, closed from right
region_is_in(region, 1:3, left_closed = FALSE)
#> [1] FALSE TRUE TRUE
## Closed from left, open from right
region_is_in(region, 1:3, right_closed = FALSE)
#> [1] TRUE TRUE FALSE
## Open intervals
region_is_in(region, 1:3, left_closed = FALSE, right_closed = FALSE)
#> [1] FALSE TRUE FALSE
# Handling of intervals with zero width
region <- data.frame(left = 1, right = 1)
## If at least one of `*_closed` options is `TRUE`, 1 will be considered as
## "in a region"
region_is_in(region, 1)
#> [1] TRUE
region_is_in(region, 1, left_closed = FALSE)
#> [1] TRUE
region_is_in(region, 1, right_closed = FALSE)
#> [1] TRUE
## Only this will return `FALSE`
region_is_in(region, 1, left_closed = FALSE, right_closed = FALSE)
#> [1] FALSE
# Distance between regions
region1 <- data.frame(left = c(0, 2), right = c(1, 2))
region2 <- data.frame(left = 0.5, right = 1.5)
region_distance(region1, region2, method = "Jaccard")
#> [1] 0.6666667
region_distance(region1, region2, method = "KS")
#> [1] 0.5