summ_classmetric
Source: R/summ_classmetric.R
Compute metrics of the following one-dimensional binary classification setup: any x value not more than threshold is classified as "negative"; if strictly greater, as "positive". Classification metrics are computed based on two pdqr-functions: f, which represents the distribution of values that should be classified as "negative" ("true negative"), and g, the same for "positive" ("true positive").
summ_classmetric(f, g, threshold, method = "F1")
summ_classmetric_df(f, g, threshold, method = "F1")
| Argument | Description |
|---|---|
| f | A pdqr-function of any type and class. Represents distribution of "true negative" values. |
| g | A pdqr-function of any type and class. Represents distribution of "true positive" values. |
| threshold | A numeric vector of classification threshold(s). |
| method | Method of classification metric (might be a vector for summ_classmetric_df()). |
summ_classmetric() returns a numeric vector of the same length as threshold, representing classification metrics for different threshold values.

summ_classmetric_df() returns a data frame with rows corresponding to threshold values. The first column is "threshold" (with threshold values), and all other columns represent the classification metric for every input method (see Examples).
The binary classification setup used here to compute metrics is a simplified version of the most common one, in which there is a finite set of already classified objects. Usually there are N objects which are truly "negative" and P truly "positive" ones. Values N and P can vary, which often results in class imbalance. However, in the current setup both N and P are equal to 1 (the total probability of f and g).
In the common setup, classification of all N + P objects results in the following values: "TP" (number of truly "positive" values classified as "positive"), "TN" (number of negatives classified as "negative"), "FP" (number of negatives falsely classified as "positive"), and "FN" (number of positives falsely classified as "negative"). In the current setup all these values are equal to the respective "rates" (because N and P are both equal to 1).
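To make this correspondence concrete, here is a minimal sketch (assuming the pdqr package is attached; f, g, and the threshold are chosen purely for illustration) of how the four confusion values reduce to rates:

library(pdqr)

f <- as_d(dunif)  # "true negative" distribution
g <- as_d(dnorm)  # "true positive" distribution
thres <- 0.5

# With N = P = 1, confusion "counts" equal the corresponding rates
tn <- as_p(f)(thres)      # negatives correctly classified ("TN")
fp <- 1 - as_p(f)(thres)  # negatives classified as "positive" ("FP")
fn <- as_p(g)(thres)      # positives classified as "negative" ("FN")
tp <- 1 - as_p(g)(thres)  # positives correctly classified ("TP")

tp + tn + fp + fn  # total is always 2 in this setup
#> [1] 2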
Both summ_classmetric() and summ_classmetric_df() accept aliases for some classification metrics (for readability purposes).
The following classification metrics are available.

Simple metrics:

- True positive rate, method "TPR" (aliases: "TP", "sensitivity", "recall"): proportion of actual positives correctly classified as such. Computed as 1 - as_p(g)(threshold).
- True negative rate, method "TNR" (aliases: "TN", "specificity"): proportion of actual negatives correctly classified as such. Computed as as_p(f)(threshold).
- False positive rate, method "FPR" (aliases: "FP", "fall-out"): proportion of actual negatives falsely classified as "positive". Computed as 1 - as_p(f)(threshold).
- False negative rate, method "FNR" (aliases: "FN", "miss_rate"): proportion of actual positives falsely classified as "negative". Computed as as_p(g)(threshold).
- Positive predictive value, method "PPV" (alias: "precision"): proportion of output positives that are actually "positive". Computed as TP / (TP + FP).
- Negative predictive value, method "NPV": proportion of output negatives that are actually "negative". Computed as TN / (TN + FN).
- False discovery rate, method "FDR": proportion of output positives that are actually "negative". Computed as FP / (TP + FP).
- False omission rate, method "FOR": proportion of output negatives that are actually "positive". Computed as FN / (TN + FN).
- Positive likelihood, method "LR+": measures how much the odds of being "positive" increase when a value is classified as "positive". Computed as TPR / (1 - TNR).
- Negative likelihood, method "LR-": measures how much the odds of being "positive" decrease when a value is classified as "negative". Computed as (1 - TPR) / TNR.
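As a quick sanity check of the formulas above (a sketch, not package internals), the "TPR" output of summ_classmetric() should reproduce the documented expression 1 - as_p(g)(threshold):

library(pdqr)

d_unif <- as_d(dunif)
d_norm <- as_d(dnorm)
thres <- c(0, 0.5, 1)

# Metric as computed by the package ...
summ_classmetric(d_unif, d_norm, threshold = thres, method = "TPR")
# ... and the same values straight from the documented formula
1 - as_p(d_norm)(thres)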
Combined metrics (for all of them, except "error rate", a bigger value represents better classification performance):

- Accuracy, method "Acc" (alias: "accuracy"): proportion of the total number of input values that were correctly classified. Computed as (TP + TN) / 2 (here 2 is used because of the special classification setup: TP + TN + FP + FN = 2).
- Error rate, method "ER" (alias: "error_rate"): proportion of the total number of input values that were incorrectly classified. Computed as (FP + FN) / 2.
- Geometric mean, method "GM": geometric mean of TPR and TNR. Computed as sqrt(TPR * TNR).
- F1 score, method "F1": harmonic mean of PPV and TPR. Computed as 2*TP / (2*TP + FP + FN).
- Optimized precision, method "OP": accuracy, penalized for imbalanced class performance. Computed as Acc - abs(TPR - TNR) / (TPR + TNR).
- Matthews correlation coefficient, method "MCC" (alias: "corr"): correlation between the observed and predicted classifications. Computed as (TP*TN - FP*FN) / sqrt((TP+FP) * (TN+FN)) (here the equalities TP+FN = 1 and TN+FP = 1 are used to simplify the formula).
- Youden's index, method "YI" (aliases: "youden", "informedness"): evaluates the discriminative power of the classification setup. Computed as TPR + TNR - 1.
- Markedness, method "MK" (alias: "markedness"): evaluates the predictive power of the classification setup. Computed as PPV + NPV - 1.
- Jaccard, method "Jaccard": accuracy ignoring correct classification of negatives. Computed as TP / (TP + FP + FN).
- Diagnostic odds ratio, method "DOR" (alias: "odds_ratio"): ratio between positive and negative likelihoods. Computed as "LR+" / "LR-".
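All combined metrics above are plain arithmetic on the simple rates. As a hedged check, the F1 value at threshold 0 from the Examples below (0.4) can be reproduced by hand, assuming the same d_unif and d_norm inputs:

library(pdqr)

f <- as_d(dunif)  # "true negative" distribution
g <- as_d(dnorm)  # "true positive" distribution
thres <- 0

tp <- 1 - as_p(g)(thres)  # TPR; 0.5 for the standard normal at 0
fp <- 1 - as_p(f)(thres)  # FPR; 1 for the uniform on [0, 1] at 0
fn <- as_p(g)(thres)      # FNR; 0.5

# F1 = 2*TP / (2*TP + FP + FN)
2 * tp / (2 * tp + fp + fn)
#> [1] 0.4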
See also summ_separation() for computing the optimal separation threshold (which is symmetrical with respect to f and g).
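For example (a sketch relying on summ_separation()'s default method), the found threshold can be passed straight to summ_classmetric():

library(pdqr)

d_unif <- as_d(dunif)
d_norm <- as_d(dnorm)

# Threshold that best separates the two distributions
thres <- summ_separation(d_unif, d_norm)

# Youden's index at that threshold
summ_classmetric(d_unif, d_norm, threshold = thres, method = "YI")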
Other summary functions: summ_center(), summ_distance(), summ_entropy(), summ_hdr(), summ_interval(), summ_moment(), summ_order(), summ_prob_true(), summ_pval(), summ_quantile(), summ_roc(), summ_separation(), summ_spread().

Examples:
d_unif <- as_d(dunif)
d_norm <- as_d(dnorm)
t_vec <- c(0, 0.5, 0.75, 1.5)
summ_classmetric(d_unif, d_norm, threshold = t_vec, method = "F1")
#> [1] 0.4000000 0.3412008 0.3069521 0.1252455

summ_classmetric(d_unif, d_norm, threshold = t_vec, method = "Acc")
#> [1] 0.2500000 0.4042686 0.4883134 0.5334032

summ_classmetric_df(d_unif, d_norm, threshold = t_vec, method = c("F1", "Acc"))
#>   threshold        F1       Acc
#> 1      0.00 0.4000000 0.2500000
#> 2      0.50 0.3412008 0.4042686
#> 3      0.75 0.3069521 0.4883134
#> 4      1.50 0.1252455 0.5334032
# Using method aliases
summ_classmetric_df(
d_unif, d_norm, threshold = t_vec, method = c("TPR", "sensitivity")
)
#>   threshold        TPR sensitivity
#> 1      0.00 0.50000000  0.50000000
#> 2      0.50 0.30853717  0.30853717
#> 3      0.75 0.22662682  0.22662682
#> 4      1.50 0.06680635  0.06680635