Convert a function to a proper pdqr-function of a specific class, i.e. a function describing a distribution with finite support and finite values of probability/density.
as_p(f, ...)
# S3 method for default
as_p(f, support = NULL, ..., n_grid = 10001)
# S3 method for pdqr
as_p(f, ...)
as_d(f, ...)
# S3 method for default
as_d(f, support = NULL, ..., n_grid = 10001)
# S3 method for pdqr
as_d(f, ...)
as_q(f, ...)
# S3 method for default
as_q(f, support = NULL, ..., n_grid = 10001)
# S3 method for pdqr
as_q(f, ...)
as_r(f, ...)
# S3 method for default
as_r(f, support = NULL, ..., n_grid = 10001,
n_sample = 10000, args_new = list())
# S3 method for pdqr
as_r(f, ...)
f | Appropriate function to be converted (see Details). |
---|---|
... | Extra arguments to |
support | Numeric vector with two increasing elements describing desired
support of output. If |
n_grid | Number of grid points at which |
n_sample | Number of points to sample from |
args_new | List of extra arguments for |
A pdqr-function of corresponding class.
The general purpose of as_*() functions is to create a proper pdqr-function of the desired class from input which doesn't satisfy these conditions. The sequence of steps taken to achieve that goal is described below.
If f is already a pdqr-function, as_*() functions properly update it to have the specific class. They take the input's "x_tbl" metadata and type and use them with the corresponding new_*() function. For example, as_p(f) in case of pdqr-function f is essentially the same as new_p(x = meta_x_tbl(f), type = meta_type(f)).
If f is a function describing an "honored" distribution, it is detected and output is created in a predefined way, taking into account extra arguments supplied through the ... argument (see "Honored distributions" section for more details).
If f is some other unknown function, as_*() functions use heuristics for approximating the input distribution with a "proper" pdqr-function. Outputs of as_*() can only be pdqr-functions of type "continuous" (because of issues with support detection). It is assumed that f returns values appropriate for the desired output class of the as_*() function and output type "continuous". For example, input for as_p() should return values of some continuous cumulative distribution function (monotonically non-decreasing values from 0 to 1). To manually create a function of type "discrete", supply data frame input describing it to the appropriate new_*() function.
The general algorithm of how as_*() functions work for an unknown function is as follows:

1. Detect support. See "Support detection" section for more details.
2. Create data frame input for new_*(). The exact process differs:
   - In as_p() an equidistant grid of n_grid points is created inside the detected support. After that, the input's values at the grid are taken as reference points of the cumulative distribution function and used to approximate density at that same grid. This method has proven to work more reliably in case density goes to infinity. That grid and the density values are used as "x" and "y" columns of data frame input for new_p().
   - In as_d() the "x" column of the data frame is the same equidistant grid as in as_p(). The "y" column is taken as the input's values at this grid after possibly imputing infinity values. This imputation is done by taking the maximum of left and right linear extrapolations on the mentioned grid.
   - In as_q(), at first the inverse of input f is computed on the [0; 1] interval. It is done by approximating f with a piecewise-linear function on a [0; 1] equidistant grid with n_grid points, imputing infinity values (which ensures finite support), and computing the inverse of the approximation. This inverse of f is used to create data frame input with as_p().
   - In as_r() at first a d-function is created with new_d() based on the same sample used for support detection and extra arguments supplied as a list in the args_new argument. In other words, density estimation is done based on a sample generated from input f. After that, its values are used to create the data frame with as_d().
3. Use the appropriate new_*() function with the data frame from the previous step and type = "continuous". This step implies that all tails outside the detected support are trimmed and the data frame is normalized to represent a proper piecewise-linear density.
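The as_p() branch of this algorithm can be illustrated with a minimal base R sketch. It assumes the support is already known and uses averaged neighbouring CDF slopes as the density approximation; that averaging rule, the chosen support, and the helper `trapez_area()` are illustrative assumptions, not pdqr's actual implementation.

```r
# Sketch of the as_p()-style step for an unknown CDF (base R only).
# Assumption: support is given here; pdqr would detect it first.
cdf <- pnorm
support <- c(-4.75, 4.75)
n_grid <- 1001

# Equidistant grid inside the support and CDF reference values on it
x <- seq(support[1], support[2], length.out = n_grid)
p <- cdf(x)

# Approximate density from CDF increments: slopes between neighbours,
# averaged at interior points (illustrative choice, not pdqr's exact rule)
slopes <- diff(p) / diff(x)
y <- c(
  slopes[1],
  (slopes[-1] + slopes[-length(slopes)]) / 2,
  slopes[length(slopes)]
)

# Normalize so that the piecewise-linear density integrates to 1
# (trapezoid rule is exact for piecewise-linear functions)
trapez_area <- function(x, y) {
  sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
}
y <- y / trapez_area(x, y)

# Data frame in the "x"/"y" layout described above
x_tbl <- data.frame(x = x, y = y)
```

The normalization at the end mirrors the last step of the algorithm: whatever probability is lost by trimming tails is redistributed so that the result is a proper density.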
For an efficient workflow, some commonly used distributions are recognized as special ("honored"). Those receive different treatment in as_*() functions.
Basically, there is a manually selected list of "honored" distributions, together with enough information to detect them. Currently that list has all common univariate distributions from the 'stats' package, i.e. all except multinomial and "less common distributions of test statistics".
An "honored" distribution is recognized only if f is one of the p*(), d*(), q*(), or r*() functions describing an honored distribution and is supplied as a variable with its original name. For example, as_d(dunif) will be treated as an "honored" distribution but as_d(function(x) {dunif(x)}) will not.
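Why the original name matters can be seen in a small base R sketch of name-based recognition. The helper `is_honored_name()` and its short list of names are hypothetical and only illustrate the idea; this is not pdqr's actual detection code.

```r
# Hypothetical sketch: recognize a function by the name it was supplied under.
# NOT pdqr's actual code; the "honored" list is a tiny made-up subset.
is_honored_name <- function(f) {
  # Capture the expression the caller wrote for `f` as text
  f_name <- deparse(substitute(f))
  honored <- c(
    "punif", "dunif", "qunif", "runif",
    "pnorm", "dnorm", "qnorm", "rnorm"
  )
  # An anonymous wrapper deparses to its source code, not to a known name
  length(f_name) == 1L && f_name %in% honored
}

named_input <- is_honored_name(dunif)
wrapped_input <- is_honored_name(function(x) {dunif(x)})
```

Under this scheme `named_input` is `TRUE` and `wrapped_input` is `FALSE`, even though both inputs compute the same density.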
After it is recognized that input f represents an "honored" distribution, its support is computed based on predefined rules. Those take into account special features of the distribution (like infinite support or infinite density values) and extra arguments supplied through the ... argument. Usually output support "loses" only around 1e-6 probability on each infinite tail.
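To get a feel for how little is lost, the following base R sketch computes supports that drop 1e-6 probability on each infinite tail. Treating the exponential distribution as having only one infinite (right) tail is an assumption made for illustration; the variable names are not part of pdqr.

```r
# Sketch: finite supports that "lose" 1e-6 probability per infinite tail
tail_prob <- 1e-6

# Normal distribution: both tails are infinite, so both are trimmed
support_norm <- qnorm(c(tail_prob, 1 - tail_prob))

# Exponential distribution: left edge is finite (0), only right tail is
# trimmed (assumption for illustration)
support_exp <- c(0, qexp(1 - tail_prob))

# Probability retained inside each support
kept_norm <- diff(pnorm(support_norm))
kept_exp <- diff(pexp(support_exp))
```

For the standard normal distribution this gives edges at roughly -4.753 and 4.753.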
After that, for "discrete" type output new_d() is used with the appropriate data frame input, and for "continuous" type as_d() is used with the appropriate d*() function and support. The d-function is then converted to the desired class with as_*().
In case input is a function without any extra information, as_*() functions must know which finite support its output should have. User can supply the desired support directly with the support argument, which can also be NULL (meaning automatic detection of both edges) or have NA in place of the edges that should be detected automatically.
Support is detected in order to preserve as much information as practically reasonable. Exact methods differ:
- In as_p() support is detected as values at which the input function is equal to 1e-6 (left edge detection) and 1 - 1e-6 (right edge), which means "losing" 1e-6 probability on each tail. Note that those values are searched inside the [-10^100; 10^100] interval.
- In as_d(), at first an attempt at finding one point of non-zero density is made by probing 10000 points spread across a wide range of the real line (approximately from -1e7 to 1e7). If the input's value at all of them is zero, an error is thrown. After finding such a point, a cumulative distribution function is made by integrating the input with integrate() using the found point as reference (without this there will be poor accuracy of integrate()). The created CDF is used to find 1e-6 and 1 - 1e-6 quantiles as in as_p(), which serve as the detected support.
- In as_q() quantiles for 0 and 1 are probed for being infinite. If they are, 1e-6 and 1 - 1e-6 quantiles are used respectively instead of infinite values to form the detected support.
- In as_r() a sample of size n_sample is generated and the detected support is its range stretched by the mean difference of sorted points (to account for possible tails at which points were not generated). Note that this means that original input f "demonstrates its randomness" only once inside as_r(), with output then used for approximation of "original randomness".
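Two of the detection ideas above (for CDF input and for random generator input) can be sketched in base R. The helper names, the use of uniroot(), and the modest search interval are assumptions made for illustration; pdqr's own implementation may differ in all of these.

```r
# Sketch of support detection ideas (base R only, hypothetical helper names)

# CDF input: find points where the CDF equals tail_prob and 1 - tail_prob.
# Assumption: a modest search interval is used instead of a huge one.
detect_support_cdf <- function(cdf, tail_prob = 1e-6,
                               interval = c(-100, 100)) {
  find_q <- function(prob) {
    uniroot(function(q) cdf(q) - prob, interval = interval, tol = 1e-10)$root
  }
  c(find_q(tail_prob), find_q(1 - tail_prob))
}

# Random generator input: range of one sample, stretched on both sides by
# the mean difference between neighbouring sorted points
detect_support_sample <- function(smpl) {
  delta <- mean(diff(sort(smpl)))
  range(smpl) + c(-delta, delta)
}

support_p <- detect_support_cdf(pnorm)

set.seed(101)
smpl <- runif(10000)
support_r <- detect_support_sample(smpl)
```

Note that the mean difference of sorted points equals the sample range divided by `n - 1`, so with a large sample the stretch is small and the detected support stays close to the observed range.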
pdqr_approx_error() for computing approximation errors compared to some reference function (usually input to as_*() family).
# Convert existing "proper" pdqr-function
set.seed(101)
x <- rnorm(10)
my_d <- new_d(x, "continuous")
my_p <- as_p(my_d)
# Convert "honored" function to be a proper pdqr-function. To use this
# option, supply originally named function.
p_unif <- as_p(punif)
r_beta <- as_r(rbeta, shape1 = 2, shape2 = 2)
d_pois <- as_d(dpois, lambda = 5)
## `pdqr_approx_error()` computes pdqr approximation error
summary(pdqr_approx_error(as_d(dnorm), dnorm))
#> grid error abserror
#> Min. :-4.753 Min. :-7.979e-07 Min. :9.900e-12
#> 1st Qu.:-2.377 1st Qu.:-4.000e-07 1st Qu.:1.975e-09
#> Median : 0.000 Median :-5.552e-08 Median :5.552e-08
#> Mean : 0.000 Mean :-2.104e-07 Mean :2.104e-07
#> 3rd Qu.: 2.377 3rd Qu.:-1.975e-09 3rd Qu.:4.000e-07
#> Max. : 4.753 Max. :-9.900e-12 Max. :7.979e-07
## This will work as if input is an unknown function because of unsupported
## variable name
my_runif <- function(n) {
  runif(n)
}
r_unif_2 <- as_r(my_runif)
plot(as_d(r_unif_2))
# Convert some other function to be a "proper" pdqr-function
my_d_quadr <- as_d(function(x) {
  0.75 * (1 - x^2)
}, support = c(-1, 1))
# Support detection
unknown <- function(x) {
  dnorm(x, mean = 1)
}
## Completely automatic support detection
as_d(unknown)
#> Density function of continuous type
#> Support: ~[-37.36926, 39.36951] (10000 intervals)
## Semi-automatic support detection
as_d(unknown, support = c(-4, NA))
#> Density function of continuous type
#> Support: ~[-4, 39.36951] (10000 intervals)
as_d(unknown, support = c(NA, 5))
#> Density function of continuous type
#> Support: ~[-37.36926, 5] (10000 intervals)
## If support is very small and very distant from zero, it probably won't
## get detected in `as_d()` (throwing a relevant error)
if (FALSE) {
  as_d(function(x) {
    dnorm(x, mean = 10000, sd = 0.1)
  })
}
# Using different level of granularity
as_d(unknown, n_grid = 1001)
#> Density function of continuous type
#> Support: ~[-37.36926, 39.36951] (1000 intervals)