Functions for creating new pdqr-functions based on a numeric sample or a data frame describing a distribution. They construct appropriate "x_tbl" metadata based on the input and then create a pdqr-function (of the corresponding pdqr class) defined by that "x_tbl".

```
new_p(x, type, ...)
new_d(x, type, ...)
new_q(x, type, ...)
new_r(x, type, ...)
```

| Argument | Description |
|---|---|
| `x` | Numeric vector or data frame with appropriate columns (see "Data frame input" section). |
| `type` | Type of pdqr-function. Should be one of "discrete" or "continuous". |
| `...` | Extra arguments for `density()`. |

A pdqr-function of corresponding class ("p" for `new_p()`, etc.) and type.

Data frame input `x` is treated as having enough information for creating (including normalization of "y" column) an "x_tbl" metadata. For more details see "Data frame input" section.

Numeric input is transformed into a data frame which is then used as "x_tbl" metadata (for more details see "Numeric input" section):

- If `type` is `"discrete"` then `x` is viewed as a sample from a distribution that can produce only values from `x`. Input is tabulated and normalized to form "x_tbl" metadata.
- If `type` is `"continuous"` then:
    - If `x` has 1 element, the output distribution represents a **dirac-like** distribution, which is an approximation to the singular dirac distribution.
    - If `x` has more than 1 element, the output distribution represents a **density estimation** with `density()` treating `x` as a sample.

If `x` is a numeric vector, it is transformed into a data frame which is then used as "x_tbl" metadata to create a pdqr-function of the corresponding class.

First, all `NaN`, `NA`, and infinite values are removed with warnings. If there are no elements left, an error is thrown. Then a data frame is created in a way that depends on the `type` argument.
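The filtering step can be sketched in base R as follows. This is only an illustration of the behavior described above: the helper name `filter_numbers()` and the message texts are hypothetical and are not taken from the package.

```r
# Hypothetical helper (not part of pdqr): remove NaN, NA, and infinite values
# with warnings, and throw an error if nothing is left.
filter_numbers <- function(x) {
  x_is_nan <- is.nan(x)
  # is.na() is also TRUE for NaN, so exclude NaN to report NA separately
  x_is_na <- is.na(x) & !x_is_nan
  x_is_inf <- is.infinite(x)

  if (any(x_is_nan)) {
    warning("`x` has `NaN` values, which are removed.", call. = FALSE)
  }
  if (any(x_is_na)) {
    warning("`x` has `NA` values, which are removed.", call. = FALSE)
  }
  if (any(x_is_inf)) {
    warning("`x` has infinite values, which are removed.", call. = FALSE)
  }

  x <- x[!(x_is_nan | x_is_na | x_is_inf)]
  if (length(x) == 0) {
    stop("`x` has no finite values.", call. = FALSE)
  }

  x
}

# Warnings are suppressed here only to keep the example output quiet
filtered <- suppressWarnings(filter_numbers(c(1, NA, NaN, Inf, -Inf, 2)))
```

Here `filtered` keeps only the two finite values, and an input with no finite values results in an error.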

**For "discrete" type** elements of filtered `x` are:

- Rounded to 10th digit to avoid numerical representation issues (see Note in `==`'s help page).
- Tabulated (all unique values are counted). Output data frame has three columns: "x" with unique values, "prob" with normalized (divided by sum) counts, "cumprob" with cumulative sum of "prob" column.
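As a rough illustration of these two steps, the following base R sketch builds a data frame with the same three columns. It mirrors the description above and is not the package's internal code; all variable names are chosen for the example.

```r
# Sample in which 0.1 + 0.2 and 0.3 differ only by floating point error
x <- c(0.1 + 0.2, 0.3, 1, 1, 2)

# Step 1: round to the 10th digit so such values fall into one group
x_round <- round(x, digits = 10)

# Step 2: count every unique value and normalize the counts by their sum
x_unique <- sort(unique(x_round))
counts <- tabulate(match(x_round, x_unique), nbins = length(x_unique))
prob <- counts / sum(counts)

x_tbl <- data.frame(x = x_unique, prob = prob, cumprob = cumsum(prob))
```

Without the rounding step, `0.1 + 0.2` and `0.3` would be counted as two different values.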

**For "continuous" type** output data frame has columns "x", "y", "cumprob". Choice of algorithm depends on the number of `x` elements:

- If `x` has 1 element, an "x_tbl" metadata describes a **dirac-like** "continuous" pdqr-function. It is implemented as a triangular peak with center at `x`'s value and width of `2e-8` (see Examples). This is an approximation of the singular dirac distribution. Data frame has columns "x" with value `c(x-1e-8, x, x+1e-8)`, "y" with value `c(0, 1e8, 0)` normalized to have total integral of "x"-"y" points of 1, "cumprob" `c(0, 0.5, 1)`.
- If `x` has more than 1 element, it serves as input to `density(x, ...)` for density estimation (here arguments in `...` of `new_*()` serve as extra arguments to `density()`). The output's "x" element is used as "x" column in output data frame. Column "y" is taken as "y" element of `density()` output, normalized so that piecewise-linear function passing through "x"-"y" points has total integral of 1. Column "cumprob" has cumulative probability of piecewise-linear d-function.
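Both cases can be approximated in base R with the trapezoid rule, which integrates a piecewise-linear function exactly. The sketch below assumes this is an adequate stand-in for the normalization described above; the helpers `trapez_cumprob()` and `as_continuous_tbl()` are hypothetical names and not pdqr functions.

```r
# Cumulative integral of the piecewise-linear function through (x, y),
# accumulated interval by interval with the trapezoid rule
trapez_cumprob <- function(x, y) {
  n <- length(x)
  c(0, cumsum(diff(x) * (y[-1] + y[-n]) / 2))
}

# Hypothetical helper: normalize "y" to total integral 1 and add "cumprob"
as_continuous_tbl <- function(x, y) {
  total_integral <- trapez_cumprob(x, y)[length(x)]
  y <- y / total_integral
  data.frame(x = x, y = y, cumprob = trapez_cumprob(x, y))
}

# Case 1: single value, triangular peak of width 2e-8 centered at 1
dirac_tbl <- as_continuous_tbl(x = 1 + c(-1e-8, 0, 1e-8), y = c(0, 1e8, 0))

# Case 2: more than one value, grid and heights come from density()
set.seed(101)
dens <- stats::density(rnorm(10))
dens_tbl <- as_continuous_tbl(x = dens$x, y = dens$y)
```

In both tables the last "cumprob" value is 1 up to floating point error, because "y" was divided by the total integral first.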

If `x` is a data frame, it should have numeric columns appropriate for "x_tbl" metadata of input `type`: "x", "prob" for "discrete" `type` and "x", "y" for "continuous" type ("cumprob" column will be computed inside `new_*()`). To become an appropriate "x_tbl" metadata, input data frame is ordered in increasing order of "x" column and then **imputed** in a way that depends on the `type` argument.

**For "discrete" type**:

- Values in column "x" are rounded to 10th digit to avoid numerical representation issues (see Note in `==`'s help page).
- If there are duplicate values in "x" column, they are "squashed" into one having sum of their probability in "prob" column.
- Column "prob" is normalized by its sum to have total sum of 1.
- Column "cumprob" is computed as cumulative sum of "prob" column.
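The imputation steps for a "discrete" data frame can be sketched in base R like this. The code follows the description above step by step and is not the package's implementation; the input values are made up for the example.

```r
# Unordered input with a duplicated "x" value and unnormalized "prob"
df <- data.frame(x = c(3, 1, 1, 2), prob = c(2, 1, 3, 4))

# Order by "x" and round to the 10th digit
df <- df[order(df$x), ]
df$x <- round(df$x, digits = 10)

# "Squash" duplicates: sum "prob" within every unique "x" value
x_unique <- unique(df$x)
group_id <- match(df$x, x_unique)
prob_squashed <- as.numeric(tapply(df$prob, group_id, sum))

# Normalize "prob" and compute "cumprob"
prob <- prob_squashed / sum(prob_squashed)
x_tbl <- data.frame(x = x_unique, prob = prob, cumprob = cumsum(prob))
```

The two rows with `x` equal to 1 are merged into one row, so the resulting "prob" column is `c(0.4, 0.4, 0.2)`.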

**For "continuous" type** column "y" is normalized so that piecewise-linear
function passing through "x"-"y" points has total integral of 1. Column
"cumprob" has cumulative probability of piecewise-linear d-function.

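For a small "continuous" data frame the normalization can be checked by hand. The sketch below uses base R and the trapezoid rule as an assumed equivalent of integrating the piecewise-linear function; it is an illustration, not the package's code.

```r
# Unordered input describing a triangle with peak height 10 at x = 2
df <- data.frame(x = c(3, 1, 2), y = c(0, 0, 10))

# Order by "x"
df <- df[order(df$x), ]
rownames(df) <- NULL

# Areas of the trapezoids between neighboring points: 5 and 5
n <- nrow(df)
areas <- diff(df$x) * (df$y[-1] + df$y[-n]) / 2

# Divide "y" by the total integral (10) and accumulate normalized areas
x_tbl <- data.frame(
  x = df$x,
  y = df$y / sum(areas),
  cumprob = c(0, cumsum(areas) / sum(areas))
)
```

After normalization "y" is `c(0, 1, 0)` and "cumprob" is `c(0, 0.5, 1)`.
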
```
library(pdqr)

set.seed(101)
x <- rnorm(10)
# Type "discrete": `x` values are directly tabulated
my_d_dis <- new_d(x, "discrete")
meta_x_tbl(my_d_dis)
#>             x prob cumprob
#> 1  -0.6749438  0.1     0.1
#> 2  -0.3260365  0.1     0.2
#> 3  -0.2232594  0.1     0.3
#> 4  -0.1127343  0.1     0.4
#> 5   0.2143595  0.1     0.5
#> 6   0.3107692  0.1     0.6
#> 7   0.5524619  0.1     0.7
#> 8   0.6187899  0.1     0.8
#> 9   0.9170283  0.1     0.9
#> 10  1.1739663  0.1     1.0
# Type "continuous": `x` serves as input to `density()`
my_d_con <- new_d(x, "continuous")
head(meta_x_tbl(my_d_con))
#>           x           y      cumprob
#> 1 -1.670523 0.001394314 0.000000e+00
#> 2 -1.663008 0.001491008 1.084133e-05
#> 3 -1.655493 0.001597622 2.244656e-05
#> 4 -1.647979 0.001708323 3.486833e-05
#> 5 -1.640464 0.001826333 4.814947e-05
#> 6 -1.632949 0.001952627 6.234855e-05
# Data frame input
## Values in "prob" column will be normalized automatically
my_p_dis <- new_p(data.frame(x = 1:4, prob = 1:4), "discrete")
## As are values in "y" column
my_p_con <- new_p(data.frame(x = 1:3, y = c(0, 10, 0)), "continuous")
# Using bigger bandwidth in `density()`
my_d_con_2 <- new_d(x, "continuous", adjust = 2)
plot(my_d_con, main = "Comparison of density bandwidths")
lines(my_d_con_2, col = "red")
# Dirac-like "continuous" pdqr-function is created if `x` is a single number
meta_x_tbl(new_d(1, "continuous"))
#>   x     y cumprob
#> 1 1 0e+00     0.0
#> 2 1 1e+08     0.5
#> 3 1 0e+00     1.0
```