Title: | K's "Don't Repeat Yourself"-Collection |
---|---|
Description: | A collection of personal helper functions to avoid redundancy in the spirit of the "Don't repeat yourself" principle of software development (<https://en.wikipedia.org/wiki/Don%27t_repeat_yourself>). |
Authors: | Lorenz A. Kapsner [cre, aut, cph] |
Maintainer: | Lorenz A. Kapsner <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.0.2 |
Built: | 2025-02-06 05:27:14 UTC |
Source: | https://github.com/kapsner/kdry |
Data transformation: Converts a matrix to data.table and encodes categorical variables as factor.
dtr_matrix2df(matrix, cat_vars = NULL)
matrix | An R |
cat_vars | A character vector with colnames that should be converted to |
A data.table is returned.
data("iris")
mat <- data.matrix(iris)
dataset <- dtr_matrix2df(mat)
str(dataset)
dataset <- dtr_matrix2df(mat, cat_vars = "Species")
str(dataset)
Return colnames in a table with index numbers.
icolnames(df)
df | A data.frame object. |
A data.table with the two columns index and name is returned.
data("iris")
icolnames(iris)
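The index/name table described above can be approximated in base R. This is only a sketch: `index_colnames` is a hypothetical name, and it returns a data.frame rather than a data.table so that it runs without extra packages.

```r
# Hypothetical base-R stand-in, not the kdry implementation:
# pair every column name with its position.
index_colnames <- function(df) {
  data.frame(
    index = seq_along(colnames(df)),
    name = colnames(df),
    stringsAsFactors = FALSE
  )
}

res <- index_colnames(iris)
res$name[res$index == 5] # "Species"
```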
Helper function to append an R list.
list.append(main_list, append_list, ...)
main_list | A list, to which another should be appended. |
append_list | A list to append to |
... | Further arguments passed to |
This function is a safe wrapper around utils::modifyList to combine lists: it checks the input types and only appends the new list if its length is greater than 0.
A list is returned.
l1 <- list("a" = 1, "b" = 2)
l2 <- list("c" = 3, "d" = 4)
list.append(l1, l2)
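The guarded-wrapper idea from the description can be sketched as follows. `safe_append` is a hypothetical name and the checks are assumptions based on the text above, not the package source.

```r
# Hypothetical sketch: guard utils::modifyList() with type checks and
# skip the merge when the list to append is empty.
safe_append <- function(main_list, append_list, ...) {
  stopifnot(is.list(main_list), is.list(append_list))
  if (length(append_list) > 0) {
    main_list <- utils::modifyList(main_list, append_list, ...)
  }
  main_list
}

l1 <- list(a = 1, b = 2)
names(safe_append(l1, list(c = 3, d = 4))) # "a" "b" "c" "d"
identical(safe_append(l1, list()), l1)     # TRUE
```

Because the merge is delegated to utils::modifyList(), names that already exist in the first list would be overwritten rather than duplicated.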
Helper function to update items in an R list.
list.update(main_list, new_list, ...)
main_list | A list whose items should be updated. |
new_list | A list with new values of items from |
... | Further arguments passed to |
This function is a safe wrapper around utils::modifyList to update items in R lists: it checks the input types and only accepts named lists.
A list is returned.
l1 <- list("a" = 1, "b" = 2)
l2 <- list("a" = 3, "b" = 4)
list.update(l1, l2)
Miscellaneous helper function to catch arguments passed with R's ellipsis ("...") in a type-safe way.
misc_argument_catcher(...)
... | Named arguments passed to a function. |
This function aims at catching arguments that have been passed to an R function using R's ellipsis ("..."). Its purpose is to catch these arguments even if a list of arguments was provided to the ellipsis.
A list is returned.
misc_argument_catcher(a = 1)
misc_argument_catcher(a = 1, b = 2, c = 3, d = "car")
misc_argument_catcher(list(a = 1, b = 2, c = 3, d = "car"))
misc_argument_catcher(list(a = 1, b = 2, c = 3, d = "car"), f = 9)
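One way to get the behavior described above is to splice unnamed list arguments into the result. The sketch below is an assumption about how this could work. `catch_dots` is a hypothetical name, and the real function may treat edge cases differently.

```r
# Hypothetical sketch: collect arguments from `...` and flatten any
# unnamed list argument into the result.
catch_dots <- function(...) {
  args <- list(...)
  out <- list()
  for (i in seq_along(args)) {
    nm <- names(args)[i]
    unnamed <- is.null(nm) || !nzchar(nm)
    if (is.list(args[[i]]) && unnamed) {
      out <- c(out, args[[i]]) # splice the list's elements in
    } else {
      out <- c(out, args[i])   # keep the named argument as is
    }
  }
  out
}

names(catch_dots(a = 1, b = 2))               # "a" "b"
names(catch_dots(list(a = 1, b = 2), f = 9))  # "a" "b" "f"
```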
Miscellaneous helper function to detect items in an object with duplicated names, e.g. in named vectors or named lists.
misc_duplicated_by_names(object, ...)
object | An R object that has names. |
... | Named arguments passed on to |
Returns a logical vector of length(object) with TRUE indicating the identified items with duplicated names.
misc_duplicated_by_names(list(a = 1, a = 1))
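A check of this kind can be reproduced with base R's duplicated() applied to the names. This is a sketch that assumes the first occurrence of a name is not flagged, which the description above does not state explicitly.

```r
# Flag items whose name has already occurred earlier in the object.
x <- list(a = 1, b = 2, a = 3)
dup <- duplicated(names(x))
dup # FALSE FALSE TRUE

# Keep only the first item for each name.
deduplicated <- x[!dup]
names(deduplicated) # "a" "b"
```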
Recursively copies directories and subdirectories.
misc_recursive_copy(source_dir, target_dir, force = FALSE)
source_dir | A character string. The path to the directory to be copied. |
target_dir | A character string. The target path. |
force | A boolean. If |
This function has no return value.
if (interactive()) {
  d1 <- file.path(tempdir(), "folder1")
  d2 <- file.path(d1, "folder2")
  d3 <- file.path(tempdir(), "new_folder")
  f1 <- file.path(d1, "file.one")
  dir.create(d2, recursive = TRUE)
  file.create(f1)
  misc_recursive_copy(d1, d3)
}
Miscellaneous helper function to subset R options by a keyword.
misc_subset_options(keyword)
keyword | A character. The keyword to subset the R options. |
This function subsets R's options() by a keyword. It returns a list of all available options that match the keyword. The keyword is evaluated as a regular expression.
A list is returned, containing the subset of R's options() that matches the keyword.
misc_subset_options("default")
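Since the keyword is treated as a regular expression, the subsetting can be sketched with grepl() on the option names. `subset_options_sketch` is a hypothetical name, and matching against the names only is an assumption.

```r
# Hypothetical sketch: keep the options whose name matches a regular expression.
subset_options_sketch <- function(keyword) {
  opts <- options()
  opts[grepl(keyword, names(opts))]
}

res <- subset_options_sketch("^warn")
names(res) # includes "warn"; other matches depend on the session
```

Anchors such as `^` work because the keyword is used as a pattern, not as a fixed string.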
Machine learning helper function to convert a vector of (in-sample) row indices of a fold into out-of-sample row indices.
mlh_outsample_row_indices(fold_list, dataset_nrows, type = NULL)
fold_list | A list of integer vectors that describe the row indices of cross-validation folds. The list must be named. |
dataset_nrows | An integer. The number of rows in the dataset. This parameter is required in order to compute the out-of-sample row indices. |
type | A character. To be used if the out-of-sample row indices need to be formatted in a special manner (default: |
If type = NULL, returns a list of the same length as fold_list, with each item containing a vector of out-of-sample row indices. If type = "glmnet", a data.table is returned with two columns, each row representing one observation of the dataset that is assigned to a specific test fold. The column "fold_id" should be passed on to the argument foldid of glmnet::cv.glmnet.
fold_list <- list(
  "Fold1" = setdiff(seq_len(100), 1:33),
  "Fold2" = setdiff(seq_len(100), 66:100),
  "Fold3" = setdiff(seq_len(100), 34:65)
)
mlh_outsample_row_indices(fold_list, 100)
mlh_outsample_row_indices(fold_list, 100, "glmnet")
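The conversion amounts to taking the complement of each fold's in-sample indices. The sketch below uses base R on a toy dataset of 10 rows. The plain `fold_id` vector at the end is an assumption about what a glmnet-style fold assignment looks like, not the package's exact output.

```r
n_rows <- 10L
fold_list <- list(
  Fold1 = setdiff(seq_len(n_rows), 1:3),
  Fold2 = setdiff(seq_len(n_rows), 4:6),
  Fold3 = setdiff(seq_len(n_rows), 7:10)
)

# Out-of-sample rows are all rows that are not in the in-sample set.
out_of_sample <- lapply(fold_list, function(in_sample) {
  setdiff(seq_len(n_rows), in_sample)
})
out_of_sample$Fold1 # 1 2 3

# One fold id per observation, in row order.
fold_id <- integer(n_rows)
for (i in seq_along(out_of_sample)) {
  fold_id[out_of_sample[[i]]] <- i
}
fold_id # 1 1 1 2 2 2 3 3 3 3
```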
Machine learning helper function to reshape a matrix of predicted probabilities to classes.
mlh_reshape(object)
object | A matrix with predicted probabilities for several classes. Each row must sum up to 1. |
Returns a factor with one element per row of object, representing the class with the highest probability for each observation.
set.seed(123)
class_0 <- rbeta(100, 2, 4)
class_1 <- (1 - class_0) * 0.4
class_2 <- (1 - class_0) * 0.6
dataset <- cbind("0" = class_0, "1" = class_1, "2" = class_2)
mlh_reshape(dataset)
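Picking the most probable class per row can be sketched with base R's max.col(). This is a minimal illustration, assuming the column names carry the class labels and that ties go to the first column. The package may resolve ties differently.

```r
probs <- rbind(
  c(0.7, 0.2, 0.1),
  c(0.1, 0.3, 0.6),
  c(0.2, 0.5, 0.3)
)
colnames(probs) <- c("0", "1", "2")

# Column index of the row-wise maximum, mapped back to the class label.
best <- max.col(probs, ties.method = "first")
pred <- factor(colnames(probs)[best], levels = colnames(probs))
as.character(pred) # "0" "2" "1"
```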
Machine learning helper function to select a subset from a data matrix or a response vector.
mlh_subset(object, ids)
object | A vector or a data matrix. Also supports subsetting of "Surv" objects. |
ids | An integer vector specifying the indices that should be selected from the object. |
Returns the specified subset of the object.
data("iris")
mlh_subset(iris, c(1:30))
mlh_subset(iris[, 5], c(1:30))
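The distinction between a response vector and a data matrix can be sketched by checking for a dim attribute. `subset_sketch` is a hypothetical name, and the handling of "Surv" objects mentioned above is deliberately left out.

```r
# Hypothetical sketch: select rows for two-dimensional objects and
# elements for plain vectors.
subset_sketch <- function(object, ids) {
  if (is.null(dim(object))) {
    object[ids]
  } else {
    object[ids, , drop = FALSE]
  }
}

dim(subset_sketch(iris, 1:30))         # 30 5
length(subset_sketch(iris[, 5], 1:30)) # 30
```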
Parallel computing helper function to check for the available cores.
pch_check_available_cores(ncores = -1L)
ncores | An integer. A number of cores requested for parallel computing (default: |
The function returns an integer that indicates the number of cores available. If ncores <= parallel::detectCores(), the function returns ncores. If ncores > parallel::detectCores(), the function returns parallel::detectCores() - 1L.
pch_check_available_cores(2)
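The two documented branches can be written down as a small pure function. `resolve_cores` and its `available` argument are hypothetical and exist only to make the logic testable without depending on the machine. The behavior of the default `ncores = -1L` is not covered, since it is not described above.

```r
# Hypothetical sketch of the documented rule:
# requests up to the detected core count are returned unchanged,
# larger requests fall back to (detected cores - 1).
resolve_cores <- function(ncores, available = parallel::detectCores()) {
  if (ncores > available) {
    available - 1L
  } else {
    ncores
  }
}

resolve_cores(2L, available = 8L)  # 2
resolve_cores(16L, available = 8L) # 7
```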
Parallel computing helper function to clean up the parallel backend.
pch_clean_up(cl)
cl | A cluster object of class |
The function returns nothing. Internally, it calls parallel::stopCluster() and foreach::registerDoSEQ().
parallel::stopCluster(), foreach::registerDoSEQ()
if (require("doParallel") && require("foreach")) {
  cl <- pch_register_parallel(pch_check_available_cores(2))
  pch_clean_up(cl)
}
Parallel computing helper function to register a parallel backend.
pch_register_parallel(ncores)
ncores | An integer. A number of cores requested for parallel computing (default: |
The function returns an object of class c("SOCKcluster", "cluster"), created with parallel::makePSOCKcluster().
parallel::makePSOCKcluster(), doParallel::registerDoParallel()
if (require("doParallel") && require("foreach")) {
  cl <- pch_register_parallel(pch_check_available_cores(2))
  pch_clean_up(cl)
}
Parallel coordinates plot
plt_parallel_coordinates( data, cols = NULL, color_variable = NULL, color_args = list(alpha = 0.6, begin = 0.1, end = 0.9, option = "inferno", direction = 1), line_jitter = list(w = 0.04, h = 0.04), text_label_size = 3.5 )
data | A data.table object with the columns containing the parameters to be plotted with the parallel coordinates plot. |
cols | A character vector with column names to subset |
color_variable | A character. The name of the column to be used to colorize the lines of the plot (default: |
color_args | A list with parameters for the color gradient (see details). |
line_jitter | A list with the elements |
text_label_size | A numeric value to define the size of the text annotations (default: |
The color gradient of the plotted lines can be defined with a list provided to the argument color_args. Its default values are alpha = 0.6, begin = 0.1, end = 0.9, option = "inferno", and direction = 1; they are passed on to ggplot2::scale_color_viridis_c().
The implementation to display categorical variables is still experimental.
Returns a parallel coordinates plot as a ggplot2 object.
ggplot2::scale_color_viridis_c()
if (require("ggplot2")) {
  data("iris")
  plt_parallel_coordinates(
    data = data.table::as.data.table(iris[, -5]),
    cols = colnames(iris)[c(-1, -5)],
    color_variable = "Sepal.Length"
  )
}
Reporting helper function: computes and formats the relative percentage of a fraction.
rep_frac_pct( count, count_reference, digits = 2, na.rm = TRUE, brackets = c("round", "square"), suffix = TRUE )
count | A numeric. The numerator. |
count_reference | A numeric. The denominator. |
digits | An integer indicating the number of decimal places. |
na.rm | A logical indicating if missings should be removed from |
brackets | A character. Either |
suffix | A character which is placed between the lower and the upper confidence bound in the formatted output. |
A character with the formatted output.
stats::median, stats::quantile, Hmisc::wtd.quantile()
rep_frac_pct(count = 40, count_reference = 200)
rep_frac_pct(count = 40, count_reference = 200, brackets = "square")
rep_frac_pct(40, 200, brackets = "square", suffix = FALSE)
Reporting helper function: computes and formats mean and standard deviation from a numeric vector.
rep_mean_sd( x, digits = 2, na.rm = TRUE, sd_brackets = c("round", "square"), sd_prefix = TRUE, weighted = FALSE, weights = NA )
x | A numeric vector. |
digits | An integer indicating the number of decimal places. |
na.rm | A logical indicating if missings should be removed from |
sd_brackets | A character. Either |
sd_prefix | A logical. If |
weighted | A logical. If |
weights | A vector with the weights (if |
A character with the formatted output.
mean(), stats::sd(), stats::weighted.mean(), Hmisc::wtd.var()
set.seed(123)
x <- rnorm(1000)
rep_mean_sd(x)
rep_mean_sd(rep(1, 10))
rep_mean_sd(x, sd_brackets = "square")
rep_mean_sd(x, sd_brackets = "square", sd_prefix = FALSE)
Reporting helper function: computes and formats median and confidence interval from a numeric vector.
rep_median_ci( x, conf_int, digits = 2, na.rm = TRUE, collapse = "to", iqr_brackets = c("round", "square"), iqr_prefix = TRUE, weighted = FALSE, weights = NA )
x | A numeric vector. |
conf_int | A numeric between 0 and 100 to indicate the confidence interval that should be computed. |
digits | An integer indicating the number of decimal places. |
na.rm | A logical indicating if missings should be removed from |
collapse | A character which is placed between the lower and the upper confidence bound in the formatted output. |
iqr_brackets | A character. Either |
iqr_prefix | A logical. If |
weighted | A logical. If |
weights | A numeric vector of weights passed further on to |
A character with the formatted output.
stats::median, stats::quantile, Hmisc::wtd.quantile()
set.seed(123)
x <- rnorm(1000)
rep_median_ci(x, conf_int = 95)
rep_median_ci(rep(1, 10), conf_int = 95)
rep_median_ci(x, conf_int = 95, collapse = "-")
rep_median_ci(x, iqr_brackets = "square", conf_int = 50)
Reporting helper function: computes and formats median and interquartile range from a numeric vector.
rep_median_iqr( x, digits = 2, na.rm = TRUE, collapse = "to", iqr_brackets = c("round", "square"), iqr_prefix = TRUE )
x | A numeric vector. |
digits | An integer indicating the number of decimal places. |
na.rm | A logical indicating if missings should be removed from |
collapse | A character which is placed between the lower and the upper confidence bound in the formatted output. |
iqr_brackets | A character. Either |
iqr_prefix | A logical. If |
This is just a special case of rep_median_ci() with the parameter conf_int set to 50.
A character with the formatted output.
set.seed(123)
x <- rnorm(1000)
rep_median_iqr(x)
rep_median_iqr(rep(1, 10))
rep_median_iqr(x, collapse = "-")
rep_median_iqr(x, iqr_brackets = "square")
rep_median_iqr(x, iqr_brackets = "square", iqr_prefix = FALSE)
rep_median_iqr(x, collapse = ";", iqr_prefix = FALSE)
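Why conf_int = 50 gives the interquartile range can be checked with base R. The sketch assumes that conf_int describes a symmetric interval around the median. `interval_probs` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: width of a symmetric interval (in percent)
# converted to the two quantile probabilities.
interval_probs <- function(conf_int) {
  half_width <- conf_int / 100 / 2
  c(0.5 - half_width, 0.5 + half_width)
}

set.seed(123)
x <- rnorm(1000)

probs <- interval_probs(50) # 0.25 0.75
bounds <- stats::quantile(x, probs = probs, names = FALSE)

# With conf_int = 50 the bounds are the first and third quartile,
# so their distance equals the interquartile range.
all.equal(diff(bounds), stats::IQR(x)) # TRUE
```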
Reporting helper function: formats p-value.
rep_pval(p, threshold = 0.001, digits = 3L)
p | The p-value that should be formatted. |
threshold | A threshold to indicate that only "< threshold" is printed as output (default: 0.001). |
digits | The number of digits of the formatted p-value (digits). |
If the p-value is lower than the threshold, the output of the function is "< threshold". Otherwise, the p-value is formatted to the number of digits.
A character with the formatted p-value.
rep_pval(0.032)
rep_pval(0.00032)
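The threshold rule can be sketched in a few lines of base R. `format_pval_sketch` is a hypothetical name, and the exact output string of the package function (spacing, rounding mode) may differ.

```r
# Hypothetical sketch: report "< threshold" for small p-values,
# otherwise print the p-value with a fixed number of decimals.
format_pval_sketch <- function(p, threshold = 0.001, digits = 3L) {
  if (p < threshold) {
    paste0("< ", threshold)
  } else {
    formatC(p, format = "f", digits = digits)
  }
}

format_pval_sketch(0.032)   # "0.032"
format_pval_sketch(0.00032) # "< 0.001"
```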
Reporting helper function: computes and formats the relative percentage of a count.
rep_sum_pct( count, count_reference, digits = 2, na.rm = TRUE, brackets = c("round", "square"), suffix = TRUE )
count | A numeric. The numerator. |
count_reference | A numeric. The denominator. |
digits | An integer indicating the number of decimal places. |
na.rm | A logical indicating if missings should be removed from |
brackets | A character. Either |
suffix | A character which is placed between the lower and the upper confidence bound in the formatted output. |
A character with the formatted output.
stats::median, stats::quantile, Hmisc::wtd.quantile()
rep_sum_pct(count = 40, count_reference = 200)
rep_sum_pct(count = 40, count_reference = 200, brackets = "square")
rep_sum_pct(40, 200, brackets = "square", suffix = FALSE)
Statistic helper function to normalize a continuous variable between zero and one.
sts_normalize(x, na.rm = FALSE)
x | A vector of type |
na.rm | A logical indicating whether missings should be removed. |
Returns a vector of the same length as x with values normalized between zero and one. If x contains missings and na.rm = TRUE, the missings are removed before normalization; otherwise, a vector of NA is returned.
sts_normalize(1:100)
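Normalization between zero and one is commonly done by min-max scaling, which the sketch below assumes. `normalize_sketch` is a hypothetical name. With na.rm = TRUE it ignores missings when computing the range and leaves them as NA in the output, which is one possible reading of the description above.

```r
# Hypothetical min-max scaling sketch.
normalize_sketch <- function(x, na.rm = FALSE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

normalize_sketch(c(2, 4, 6))                # 0.0 0.5 1.0
normalize_sketch(c(2, NA, 6))               # NA NA NA
normalize_sketch(c(2, NA, 6), na.rm = TRUE) # 0 NA 1
```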