Convert some function to be a proper pdqr-function of specific class, i.e. a function describing distribution with finite support and finite values of probability/density.

as_p(f, ...)

# S3 method for default
as_p(f, support = NULL, ..., n_grid = 10001)

# S3 method for pdqr
as_p(f, ...)

as_d(f, ...)

# S3 method for default
as_d(f, support = NULL, ..., n_grid = 10001)

# S3 method for pdqr
as_d(f, ...)

as_q(f, ...)

# S3 method for default
as_q(f, support = NULL, ..., n_grid = 10001)

# S3 method for pdqr
as_q(f, ...)

as_r(f, ...)

# S3 method for default
as_r(f, support = NULL, ..., n_grid = 10001,
  n_sample = 10000, args_new = list())

# S3 method for pdqr
as_r(f, ...)

Arguments

f

Appropriate function to be converted (see Details).

...

Extra arguments to f.

support

Numeric vector with two increasing elements describing desired support of output. If NULL or any of its values is NA, detection is done using specific algorithms (see Details).

n_grid

Number of grid points at which f will be evaluated (see Details). Bigger values lead to better approximation precision, but worse memory usage and evaluation speed (direct and in summ_*() functions).

n_sample

Number of points to sample from f inside as_r().

args_new

List of extra arguments for new_d() to control density() inside as_r().

Value

A pdqr-function of corresponding class.

Details

The general purpose of as_*() functions is to create a proper pdqr-function of the desired class from input which does not yet satisfy the conditions of being one (finite support and finite values of probability/density). The sequence of steps taken to achieve that goal is described below.

If f is already a pdqr-function, as_*() functions properly update it to have specific class. They take input's "x_tbl" metadata and type to use with corresponding new_*() function. For example, as_p(f) in case of pdqr-function f is essentially the same as new_p(x = meta_x_tbl(f), type = meta_type(f)).

If f is a function describing "honored" distribution, it is detected and output is created in predefined way taking into account extra arguments in .... For more details see "Honored distributions" section.

If f is some other unknown function, as_*() functions use heuristics for approximating input distribution with a "proper" pdqr-function. Outputs of as_*() can be only pdqr-functions of type "continuous" (because of issues with support detection). It is assumed that f returns values appropriate for desired output class of as_*() function and output type "continuous". For example, input for as_p() should return values of some continuous cumulative distribution function (monotonically non-decreasing values from 0 to 1). To manually create function of type "discrete", supply data frame input describing it to appropriate new_*() function.
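As a rough illustration of the assumption made about as_p() input, the following base R sketch checks on a finite grid whether a function behaves like a continuous cumulative distribution function. The helper name looks_like_cdf() and the chosen grid are hypothetical and not part of pdqr; passing the check on a grid is a sanity test, not a proof.

```r
# Hypothetical helper (not part of pdqr): check on a grid that `f` returns
# finite, non-decreasing values inside [0, 1]
looks_like_cdf <- function(f, grid) {
  p <- f(grid)
  all(is.finite(p)) && all(p >= 0 & p <= 1) && all(diff(p) >= 0)
}

grid <- seq(-5, 5, by = 0.1)
cdf_ok <- looks_like_cdf(pnorm, grid)      # pnorm() is a CDF
density_ok <- looks_like_cdf(dnorm, grid)  # dnorm() decreases to the right of 0
```

Here pnorm() passes the check while dnorm() does not, so only the former would be a sensible input for as_p().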

General algorithm of how as_*() functions work for unknown function is as follows:

  • Detect support. See "Support detection" section for more details.

  • Create data frame input for new_*(). The exact process differs:

    • In as_p() an equidistant grid of n_grid points is created inside detected support. After that, input's values at the grid are taken as reference points of cumulative distribution function, which are used to approximate density at that same grid. This method has proven to work more reliably in case density goes to infinity. That grid and density values are used as "x" and "y" columns of data frame input for new_p().

    • In as_d() the "x" column of data frame is the same equidistant grid as in as_p(). The "y" column is taken as input's values at this grid after possibly imputing infinite values. This imputation is done by taking the maximum of left and right linear extrapolations on the mentioned grid.

    • In as_q(), at first inverse of input f function is computed on [0; 1] interval. It is done by approximating it with piecewise-linear function on [0; 1] equidistant grid with n_grid points, imputing infinity values (which ensures finite support), and computing inverse of approximation. This inverse of f is used to create data frame input with as_p().

    • In as_r() at first d-function with new_d() is created based on the same sample used for support detection and extra arguments supplied as list in args_new argument. In other words, density estimation is done based on sample, generated from input f. After that, its values are used to create data frame with as_d().

  • Use appropriate new_*() function with data frame from previous step and type = "continuous". This step implies that all tails outside detected support are trimmed and data frame is normalized to represent proper piecewise-linear density.
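The as_p() branch of the steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the algorithm used inside pdqr: the helper name approx_density_from_cdf() is hypothetical, density is approximated by averaging CDF slopes of adjacent intervals, and normalization uses the trapezoid rule.

```r
# Hypothetical helper (not part of pdqr): build an "x"/"y" data frame of a
# piecewise-linear density from CDF values on an equidistant grid
approx_density_from_cdf <- function(cdf, support, n_grid = 1001) {
  # Equidistant grid inside the (already detected) support
  x <- seq(support[1], support[2], length.out = n_grid)
  # CDF values at the grid serve as reference points
  p <- cdf(x)

  # Slope of the CDF on every interval of the grid
  slopes <- diff(p) / diff(x)
  # Density at a grid point: average of adjacent slopes (one-sided at edges)
  n_slopes <- length(slopes)
  y <- c(slopes[1], (slopes[-1] + slopes[-n_slopes]) / 2, slopes[n_slopes])

  # Normalize so that the trapezoid integral over the support equals 1
  total <- sum(diff(x) * (y[-1] + y[-n_grid]) / 2)
  data.frame(x = x, y = y / total)
}

x_tbl <- approx_density_from_cdf(pnorm, support = c(-4.75, 4.75))
max_abs_error <- max(abs(x_tbl$y - dnorm(x_tbl$x)))
```

A data frame of this shape is the kind of input that new_p() receives in the last step, together with type = "continuous".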

Honored distributions

For efficient workflow, some commonly used distributions are recognized as special ("honored"). Those receive different treatment in as_*() functions.

Basically, there is a manually selected list of "honored" distributions with all their information enough to detect them. Currently that list has all common univariate distributions from 'stats' package, i.e. all except multinomial and "less common distributions of test statistics".

"Honored" distribution is recognized only if f is one of p*(), d*(), q*(), or r*() function describing honored distribution and is supplied as variable with original name. For example, as_d(dunif) will be treated as "honored" distribution but as_d(function(x) {dunif(x)}) will not.
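To see why wrapping a function hides it from this kind of recognition, here is a minimal base R sketch of name-based detection. It is an assumption-laden illustration rather than the pdqr implementation: the helper is_supplied_by_honored_name() and the short honored_names vector are hypothetical.

```r
# Hypothetical list of recognized names (only the uniform family here)
honored_names <- c("punif", "dunif", "qunif", "runif")

# Hypothetical check (not the pdqr implementation): look at the expression
# that was supplied as `f`, not at the function object itself
is_supplied_by_honored_name <- function(f) {
  f_name <- deparse(substitute(f))
  length(f_name) == 1 && f_name %in% honored_names
}

named <- is_supplied_by_honored_name(dunif)
wrapped <- is_supplied_by_honored_name(function(x) {dunif(x)})
```

In this sketch named is TRUE and wrapped is FALSE, because the anonymous wrapper is not supplied under a recognized variable name even though it computes the same values.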

After it is recognized that input f represents "honored" distribution, its support is computed based on predefined rules. Those take into account special features of distribution (like infinite support or infinite density values) and supplied extra arguments in .... Usually output support "loses" only around 1e-6 probability on each infinite tail.
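The effect of "losing" 1e-6 probability only on infinite tails can be illustrated with quantile functions from 'stats'. This is a sketch of the idea under the assumption that edges are taken as plain quantiles; the actual predefined rules in the package may differ per distribution.

```r
tail_prob <- 1e-6

# Normal: both tails are infinite, so both edges are moved inward
norm_support <- qnorm(c(tail_prob, 1 - tail_prob))

# Exponential: left edge is finite (0), only the right tail is trimmed
exp_support <- c(0, qexp(1 - tail_prob))

# Uniform: support is already finite, so nothing is lost
unif_support <- qunif(c(0, 1))
```

For the standard normal distribution this gives edges of approximately -4.753 and 4.753.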

After that, for "discrete" type output new_d() is used for appropriate data frame input and for "continuous" - as_d() with appropriate d*() function and support. D-function is then converted to desired class with as_*().

Support detection

In case input is a function without any extra information, as_*() functions must know which finite support its output should have. User can supply desired support directly with support argument, which can also be NULL (meaning automatic detection of both edges) or contain NA to detect only the corresponding edges.
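How a partially specified support might be combined with automatically detected edges can be sketched as follows. The helper fill_support() is hypothetical and only illustrates the semantics of NULL and NA described above.

```r
# Hypothetical helper (not part of pdqr): take user-supplied edges where
# they are given and detected edges where they are missing
fill_support <- function(support, detected) {
  if (is.null(support)) {
    return(detected)
  }
  ifelse(is.na(support), detected, support)
}

detected <- c(-37, 39)  # illustrative values, as if found automatically
both_detected <- fill_support(NULL, detected)
right_detected <- fill_support(c(-4, NA), detected)
left_detected <- fill_support(c(NA, 5), detected)
```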

Support is detected in order to preserve as much information as practically reasonable. Exact methods differ:

  • In as_p() support is detected as values at which input function is equal to 1e-6 (left edge detection) and 1 - 1e-6 (right edge), which means "losing" 1e-6 probability on each tail. Note that those values are searched inside [-10^100; 10^100] interval.

  • In as_d(), at first an attempt at finding one point of non-zero density is made by probing 10000 points spread across wide range of real line (approximately from -1e7 to 1e7). If input's value at all of them is zero, error is thrown. After finding such point, cumulative distribution function is made by integrating input with integrate() using found point as reference (without this there will be poor accuracy of integrate()). Created CDF function is used to find 1e-6 and 1 - 1e-6 quantiles as in as_p(), which serve as detected support.

  • In as_q() quantiles for 0 and 1 are probed for being infinite. If they are, 1e-6 and 1 - 1e-6 quantiles are used respectively instead of infinite values to form detected support.

  • In as_r() sample of size n_sample is generated and detected support is its range stretched by mean difference of sorted points (to account for possible tails at which points were not generated). Note that this means that original input f "demonstrates its randomness" only once inside as_r(), with output then used for approximation of "original randomness".
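Two of the detection methods above can be sketched in base R. Both helpers are hypothetical and simplified: the CDF version searches with uniroot() inside a much narrower interval than the one mentioned above (an assumption made to keep the example fast and robust), and the sample version takes an already generated sample as input.

```r
# Hypothetical sketch of support detection for a CDF: find the points where
# the CDF equals `tail_prob` and `1 - tail_prob`
detect_support_p_sketch <- function(cdf, interval = c(-100, 100),
                                    tail_prob = 1e-6) {
  find_quantile <- function(prob) {
    uniroot(
      function(x) cdf(x) - prob,
      interval = interval, tol = 1e-9, maxiter = 10000
    )$root
  }
  c(find_quantile(tail_prob), find_quantile(1 - tail_prob))
}

# Hypothetical sketch of support detection for a sample: its range stretched
# by the mean difference of sorted points
detect_support_r_sketch <- function(smpl) {
  stretch <- mean(diff(sort(smpl)))
  range(smpl) + c(-1, 1) * stretch
}

p_support <- detect_support_p_sketch(pnorm)
r_support <- detect_support_r_sketch(c(0, 1, 3))
```

For the sample c(0, 1, 3) the sorted differences are 1 and 2, so the range [0, 3] is stretched by 1.5 on each side.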

See also

pdqr_approx_error() for computing approximation errors compared to some reference function (usually input to as_*() family).

Examples

# Convert existing "proper" pdqr-function
set.seed(101)
x <- rnorm(10)
my_d <- new_d(x, "continuous")

my_p <- as_p(my_d)

# Convert "honored" function to be a proper pdqr-function. To use this
# option, supply originally named function.
p_unif <- as_p(punif)
r_beta <- as_r(rbeta, shape1 = 2, shape2 = 2)
d_pois <- as_d(dpois, lambda = 5)

## `pdqr_approx_error()` computes pdqr approximation error
summary(pdqr_approx_error(as_d(dnorm), dnorm))
#>       grid            error               abserror
#>  Min.   :-4.753   Min.   :-7.979e-07   Min.   :9.900e-12
#>  1st Qu.:-2.377   1st Qu.:-4.000e-07   1st Qu.:1.975e-09
#>  Median : 0.000   Median :-5.552e-08   Median :5.552e-08
#>  Mean   : 0.000   Mean   :-2.104e-07   Mean   :2.104e-07
#>  3rd Qu.: 2.377   3rd Qu.:-1.975e-09   3rd Qu.:4.000e-07
#>  Max.   : 4.753   Max.   :-9.900e-12   Max.   :7.979e-07

## This will work as if input is unknown function because of unsupported
## variable name
my_runif <- function(n) {
  runif(n)
}
r_unif_2 <- as_r(my_runif)
plot(as_d(r_unif_2))

# Convert some other function to be a "proper" pdqr-function
my_d_quadr <- as_d(function(x) {
  0.75 * (1 - x^2)
}, support = c(-1, 1))

# Support detection
unknown <- function(x) {
  dnorm(x, mean = 1)
}

## Completely automatic support detection
as_d(unknown)
#> Density function of continuous type
#> Support: ~[-37.36926, 39.36951] (10000 intervals)

## Semi-automatic support detection
as_d(unknown, support = c(-4, NA))
#> Density function of continuous type
#> Support: ~[-4, 39.36951] (10000 intervals)
as_d(unknown, support = c(NA, 5))
#> Density function of continuous type
#> Support: ~[-37.36926, 5] (10000 intervals)

## If support is very small and very distant from zero, it probably won't
## get detected in `as_d()` (throwing a relevant error)
if (FALSE) {
  as_d(function(x) {
    dnorm(x, mean = 10000, sd = 0.1)
  })
}

# Using different level of granularity
as_d(unknown, n_grid = 1001)
#> Density function of continuous type
#> Support: ~[-37.36926, 39.36951] (1000 intervals)