| Title: | Machine Learning Experiments |
| --- | --- |
| Description: | Provides 'R6' objects to perform parallelized hyperparameter optimization and cross-validation. Hyperparameter optimization can be performed with Bayesian optimization (via 'ParBayesianOptimization' <https://cran.r-project.org/package=ParBayesianOptimization>) and grid search. The optimized hyperparameters can be validated using k-fold cross-validation. Alternatively, hyperparameter optimization and validation can be performed with nested cross-validation. While 'mlexperiments' focuses on core wrappers for machine learning experiments, additional learner algorithms can be supplemented by inheriting from the provided learner base class. |
| Authors: | Lorenz A. Kapsner [cre, aut, cph] |
| Maintainer: | Lorenz A. Kapsner <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.0.4 |
| Built: | 2024-11-02 04:43:56 UTC |
| Source: | https://github.com/kapsner/mlexperiments |
Helper function to handle categorical variables
handle_cat_vars(kwargs)
kwargs
A list containing keyword arguments.
This utility function separates the list element containing the names of the categorical variables from the keyword arguments list, which is then passed further on to kdry::dtr_matrix2df().
Returns a list with two elements:
params
The keyword arguments without cat_vars.
cat_vars
The vector cat_vars.
handle_cat_vars(list(cat_vars = c("a", "b", "c"), arg1 = 1, arg2 = 2))
# returns a list with $params (arg1, arg2) and $cat_vars (c("a", "b", "c"))
This learner is a wrapper around stats::glm() in order to perform a generalized linear regression. There is no implementation for tuning parameters.
Can be used with mlexperiments::MLCrossValidation.
Implemented methods:
$fit
To fit the model.
$predict
To predict new data with the model.
mlexperiments::MLLearnerBase -> LearnerGlm
new()
Create a new LearnerGlm object.
LearnerGlm$new()
This learner is a wrapper around stats::glm() in order to perform a generalized linear regression. There is no implementation for tuning parameters, thus the only experiment in which LearnerGlm can be used is MLCrossValidation.
A new LearnerGlm R6 object.
LearnerGlm$new()
clone()
The objects of this class are cloneable with this method.
LearnerGlm$clone(deep = FALSE)
deep
Whether to make a deep clone.
LearnerGlm$new()
This learner is a wrapper around class::knn() in order to perform a k-nearest neighbor classification.
Optimization metric: classification error rate
Can be used with MLTuneParameters, MLCrossValidation, and MLNestedCV.
Implemented methods:
$fit
To fit the model.
$predict
To predict new data with the model.
$cross_validation
To perform a grid search (hyperparameter optimization).
$bayesian_scoring_function
To perform a Bayesian hyperparameter optimization.
For the two hyperparameter optimization strategies ("grid" and "bayesian"), the parameter metric_optimization_higher_better of the learner is set to FALSE by default as the classification error rate (mlr3measures::ce()) is used as the optimization metric.
mlexperiments::MLLearnerBase -> LearnerKnn
new()
Create a new LearnerKnn object.
LearnerKnn$new()
This learner is a wrapper around class::knn() in order to perform a k-nearest neighbor classification. The following experiments are implemented:
MLTuneParameters
MLCrossValidation
MLNestedCV
For the two hyperparameter optimization strategies ("grid" and "bayesian"), the parameter metric_optimization_higher_better of the learner is set to FALSE by default as the classification error rate (mlr3measures::ce()) is used as the optimization metric.
A new LearnerKnn R6 object.
LearnerKnn$new()
clone()
The objects of this class are cloneable with this method.
LearnerKnn$clone(deep = FALSE)
deep
Whether to make a deep clone.
class::knn(), mlr3measures::ce()
LearnerKnn$new()
This learner is a wrapper around stats::lm() in order to perform a linear regression. There is no implementation for tuning parameters.
Can be used with mlexperiments::MLCrossValidation.
Implemented methods:
$fit
To fit the model.
$predict
To predict new data with the model.
mlexperiments::MLLearnerBase -> LearnerLm
new()
Create a new LearnerLm object.
LearnerLm$new()
This learner is a wrapper around stats::lm() in order to perform a linear regression. There is no implementation for tuning parameters, thus the only experiment in which LearnerLm can be used is MLCrossValidation.
A new LearnerLm R6 object.
LearnerLm$new()
clone()
The objects of this class are cloneable with this method.
LearnerLm$clone(deep = FALSE)
deep
Whether to make a deep clone.
LearnerLm$new()
This learner is a wrapper around rpart::rpart() in order to fit recursive partitioning and regression trees.
Optimization metric:
classification (method = "class"): classification error rate
regression (method = "anova"): mean squared error
Can be used with MLTuneParameters, MLCrossValidation, and MLNestedCV.
Implemented methods:
$fit
To fit the model.
$predict
To predict new data with the model.
$cross_validation
To perform a grid search (hyperparameter optimization).
$bayesian_scoring_function
To perform a Bayesian hyperparameter optimization.
Parameters that are specified with parameter_grid and/or learner_args are forwarded to rpart's argument control (see rpart::rpart.control() for further details).
For the two hyperparameter optimization strategies ("grid" and "bayesian"), the parameter metric_optimization_higher_better of the learner is set to FALSE by default as the classification error rate (mlr3measures::ce()) is used as the optimization metric for classification tasks and the mean squared error (mlr3measures::mse()) is used for regression tasks.
mlexperiments::MLLearnerBase -> LearnerRpart
new()
Create a new LearnerRpart object.
LearnerRpart$new()
This learner is a wrapper around rpart::rpart() in order to fit recursive partitioning and regression trees. The following experiments are implemented:
MLTuneParameters
MLCrossValidation
MLNestedCV
For the two hyperparameter optimization strategies ("grid" and "bayesian"), the parameter metric_optimization_higher_better of the learner is set to FALSE by default as the classification error rate (mlr3measures::ce()) is used as the optimization metric for classification tasks and the mean squared error (mlr3measures::mse()) is used for regression tasks.
A new LearnerRpart R6 object.
LearnerRpart$new()
clone()
The objects of this class are cloneable with this method.
LearnerRpart$clone(deep = FALSE)
deep
Whether to make a deep clone.
rpart::rpart(), mlr3measures::ce(), mlr3measures::mse(), rpart::rpart.control()
LearnerRpart$new()
Returns a metric function which can be used for the experiments (especially the cross-validation experiments) to compute the performance.
metric(name)
name
A metric name. Accepted names are the names of the metric functions exported from the mlr3measures package.
This function is a utility to select performance metrics from the mlr3measures R package and to reformat them into the form required by the mlexperiments R package. For mlexperiments, it is required that a metric function takes the two arguments ground_truth and predictions, as well as additional named arguments that are necessary to compute the performance, which are provided via the ellipsis argument (...).
When using the performance metric with an experiment of class "MLCrossValidation", such arguments can be defined as a list provided to the field performance_metric_args of the R6 class.
The main purpose of mlexperiments::metric() is convenience and the re-use of existing implementations of the metrics. However, custom functions can easily be provided to compute the performance of the experiments, simply by providing a function that takes the above-mentioned arguments and returns one performance metric value.
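A minimal sketch of such a custom function (the function name my_accuracy is purely illustrative and not part of the package) could look as follows:

# a custom metric, assuming only the interface described above: the two
# named arguments `ground_truth` and `predictions`, further arguments via
# the ellipsis, and a single numeric value as the return value
my_accuracy <- function(ground_truth, predictions, ...) {
  # fraction of correctly predicted observations
  mean(ground_truth == predictions)
}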
Returns a function that can be used as function to calculate the performance metric throughout the experiments.
metric("auc")
metric("auc")
Prepares the data to conform to the requirements of the metrics from mlr3measures.
metric_types_helper(FUN, y, perf_args)
FUN
A metric function, created with metric().
y
The outcome vector.
perf_args
A list. The arguments to call the metric function with.
The mlr3measures R package imposes some restrictions on the data types of the ground truth and the predictions, depending on the metric, i.e., the type of the task (regression or classification). Thus, it is necessary to convert the inputs to the metric function accordingly, which is done with this helper function.
Returns the calculated performance measure.
set.seed(123)
ground_truth <- sample(0:1, 100, replace = TRUE)
predictions <- sample(0:1, 100, replace = TRUE)

FUN <- metric("acc")
perf_args <- list(
  ground_truth = ground_truth,
  predictions = predictions
)

metric_types_helper(
  FUN = FUN,
  y = ground_truth,
  perf_args = perf_args
)
Basic R6 Class for the mlexperiments package
results
A list. This field is used to store the final results of the respective methods.
new()
Create a new MLBase object.
MLBase$new(seed, ncores = -1L)
seed
An integer. Needs to be set for reproducibility purposes.
ncores
An integer to specify the number of cores used for parallelization (default: -1L).
A new MLBase R6 object.
clone()
The objects of this class are cloneable with this method.
MLBase$clone(deep = FALSE)
deep
Whether to make a deep clone.
The MLCrossValidation class is used to construct a cross validation object and to perform a k-fold cross validation for a specified machine learning algorithm using one distinct hyperparameter setting.
The MLCrossValidation class requires a named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. To perform repeated k-fold cross-validations, simply provide a list with all repeated fold definitions, e.g., by specifying the argument m_rep of splitTools::create_folds().
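For illustration, a brief sketch of creating such fold lists with splitTools::create_folds() (the values here are arbitrary):

y <- sample(0:1, 100, replace = TRUE)

# plain 3-fold CV: a named list with 3 elements of row indices
fold_list <- splitTools::create_folds(
  y = y,
  k = 3,
  type = "stratified",
  seed = 123
)

# 5 times repeated 3-fold CV: a named list with 5 x 3 = 15 elements
repeated_fold_list <- splitTools::create_folds(
  y = y,
  k = 3,
  m_rep = 5,
  type = "stratified",
  seed = 123
)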
mlexperiments::MLBase -> mlexperiments::MLExperimentsBase -> MLCrossValidation
fold_list
A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().
return_models
A logical. Whether the fitted models should be returned with the results (default: FALSE).
performance_metric
Either a named list with metric functions, a single metric function, or a character vector with metric names from the mlr3measures package. The provided functions must take two named arguments: ground_truth and predictions. For metrics from the mlr3measures package, the wrapper function metric() exists in order to prepare them for use with the mlexperiments package.
performance_metric_args
A list. Further arguments required to compute the performance metric.
predict_args
A list. Further arguments required to compute the predictions.
new()
Create a new MLCrossValidation object.
MLCrossValidation$new(
  learner,
  fold_list,
  seed,
  ncores = -1L,
  return_models = FALSE
)
learner
An initialized learner object that inherits from class "MLLearnerBase".
fold_list
A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().
seed
An integer. Needs to be set for reproducibility purposes.
ncores
An integer to specify the number of cores used for parallelization (default: -1L).
return_models
A logical. Whether the fitted models should be returned with the results (default: FALSE).
The MLCrossValidation class requires a named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. To perform repeated k-fold cross-validations, simply provide a list with all repeated fold definitions, e.g., by specifying the argument m_rep of splitTools::create_folds().
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123,
  ncores = 2
)
execute()
Execute the cross validation.
MLCrossValidation$execute()
All results of the cross validation are saved in the field $results of the MLCrossValidation class. After successful execution of the cross validation, $results contains a list with the items:
"fold" A list of folds containing the following items for each cross validation fold:
"fold_ids" A vector with the utilized in-sample row indices.
"ground_truth" A vector with the ground truth.
"predictions" A vector with the predictions.
"learner.args" A list with the arguments provided to the learner.
"model" If return_models = TRUE
, the fitted model.
"summary" A data.table with the summarized results (same as
the returned value of the execute
method).
"performance" A list with the value of the performance metric calculated for each of the cross validation folds.
The function returns a data.table with the results of the cross validation. More results are accessible from the field $results of the MLCrossValidation class.
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123,
  ncores = 2
)

cv$learner_args <- list(
  k = 20,
  l = 0,
  test = parse(text = "fold_test$x")
)
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")

# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv$execute()
clone()
The objects of this class are cloneable with this method.
MLCrossValidation$clone(deep = FALSE)
deep
Whether to make a deep clone.
splitTools::create_folds(), mlr3measures::measures, metric()
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123,
  ncores = 2
)

# learner parameters
cv$learner_args <- list(
  k = 20,
  l = 0,
  test = parse(text = "fold_test$x")
)

# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")

# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv$execute()
R6 Class on which the experiment classes are built
mlexperiments::MLBase -> MLExperimentsBase
learner_args
A list containing the parameter settings of the learner algorithm.
learner
An initialized learner object that inherits from class "MLLearnerBase".
new()
Create a new MLExperimentsBase object.
MLExperimentsBase$new(learner, seed, ncores = -1L)
learner
An initialized learner object that inherits from class "MLLearnerBase".
seed
An integer. Needs to be set for reproducibility purposes.
ncores
An integer to specify the number of cores used for parallelization (default: -1L).
A new MLExperimentsBase R6 object.
set_data()
Set the data for the experiment.
MLExperimentsBase$set_data(x, y, cat_vars = NULL)
x
A matrix with the training data.
y
A vector with the target.
cat_vars
A character vector with the column names of variables that should be treated as categorical features (if applicable / supported by the respective algorithm).
The function has no return value. It internally performs quality checks on the provided data and, if passed, defines private fields of the R6 class.
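A brief sketch of declaring a categorical feature when setting the data (the dataset and parameter values here are arbitrary; whether and how the categorical information is used depends on the respective learner):

dataset <- data.frame(
  num1 = rnorm(100),
  grp = sample(1:3, 100, replace = TRUE),
  target = sample(0:1, 100, replace = TRUE)
)

fold_list <- splitTools::create_folds(
  y = dataset$target,
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLCrossValidation$new(
  learner = LearnerRpart$new(),
  fold_list = fold_list,
  seed = 123
)
cv$learner_args <- list(method = "class")
cv$predict_args <- list(type = "class")
cv$performance_metric <- metric("acc")

# declare the column "grp" as a categorical feature
cv$set_data(
  x = data.matrix(dataset[, -3]),
  y = dataset$target,
  cat_vars = "grp"
)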
clone()
The objects of this class are cloneable with this method.
MLExperimentsBase$clone(deep = FALSE)
deep
Whether to make a deep clone.
The MLLearnerBase class is used to construct a learner object that can be used with the experiment classes from the mlexperiments package. It is intended to serve as a class to inherit from when creating new learners.
The learner class exposes 4 methods that can be defined:
$fit
A wrapper around the private function fun_fit, which needs to be defined for every learner. The return value of this function is the fitted model.
$predict
A wrapper around the private function fun_predict, which needs to be defined for every learner. The function must accept the three arguments model, newdata, and ncores and is a wrapper around the respective learner's predict-function. In order to allow the passing of further arguments, the ellipsis (...) can be used. The function should return the prediction results.
$cross_validation
A wrapper around the private function fun_optim_cv, which needs to be defined when hyperparameters should be optimized with a grid search (required for use with MLTuneParameters and MLNestedCV).
$bayesian_scoring_function
A wrapper around the private function fun_bayesian_scoring_function, which needs to be defined when hyperparameters should be optimized with a Bayesian process (required for use with MLTuneParameters and MLNestedCV).
For further details please refer to the package's vignette. A minimal sketch of an inheriting learner is shown below.
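To illustrate the inheritance pattern, a minimal sketch of a custom learner follows; the exact signatures of the private functions are assumptions derived from the descriptions above, so please consult the package's vignette for the authoritative pattern:

LearnerCustomLm <- R6::R6Class(
  classname = "LearnerCustomLm",
  inherit = mlexperiments::MLLearnerBase,
  public = list(
    initialize = function() {
      # no tuning is implemented for this sketch, so the direction of the
      # optimization metric is not relevant here
      super$initialize(metric_optimization_higher_better = FALSE)
      # the private fitting function (its signature is an assumption)
      private$fun_fit <- function(x, y, ...) {
        stats::lm(y ~ ., data = data.frame(y = y, x))
      }
      # the private predict function with the three required arguments
      # `model`, `newdata`, and `ncores`
      private$fun_predict <- function(model, newdata, ncores, ...) {
        stats::predict(model, newdata = as.data.frame(newdata))
      }
    }
  )
)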
cluster_export
A character vector defining the (internal) functions that need to be exported to the parallelization cluster. This is only required when performing a Bayesian hyperparameter optimization. See also parallel::clusterExport().
metric_optimization_higher_better
A logical. Defines the direction of the optimization metric used throughout the hyperparameter optimization. This field is set automatically during the initialization of the MLLearnerBase object. Its purpose is to make it accessible by the evaluation functions from MLTuneParameters.
environment
The environment in which to search for the functions of the learner (default: -1L).
seed
Seed for reproducible results.
new()
Create a new MLLearnerBase object.
MLLearnerBase$new(metric_optimization_higher_better)
metric_optimization_higher_better
A logical. Defines the direction of the optimization metric used throughout the hyperparameter optimization.
A new MLLearnerBase R6 object.
MLLearnerBase$new(metric_optimization_higher_better = FALSE)
cross_validation()
Perform a cross-validation with an MLLearnerBase.
MLLearnerBase$cross_validation(...)
...
Arguments to be passed to the learner's cross-validation function.
A wrapper around the private function fun_optim_cv, which needs to be defined when hyperparameters should be optimized with a grid search (required for use with MLTuneParameters and MLNestedCV).
However, the function should never be executed directly but by the respective experiment wrappers MLTuneParameters and MLNestedCV.
For further details please refer to the package's vignette.
The fitted model.
learner <- MLLearnerBase$new(metric_optimization_higher_better = FALSE)
\dontrun{
  # This example cannot be run without further adaptations.
  # The method `$cross_validation()` needs to be overwritten when
  # inheriting from this class.
  learner$cross_validation()
}
fit()
Fit an MLLearnerBase object.
MLLearnerBase$fit(...)
...
Arguments to be passed to the learner's fitting function.
A wrapper around the private function fun_fit, which needs to be defined for every learner. The return value of this function is the fitted model.
However, the function should never be executed directly but by the respective experiment wrappers MLTuneParameters, MLCrossValidation, and MLNestedCV.
For further details please refer to the package's vignette.
The fitted model.
learner <- MLLearnerBase$new(metric_optimization_higher_better = FALSE)
\dontrun{
  # This example cannot be run without further adaptations.
  # The method `$fit()` needs to be overwritten when
  # inheriting from this class.
  learner$fit()
}
predict()
Make predictions from a fitted MLLearnerBase object.
MLLearnerBase$predict(model, newdata, ncores = -1L, ...)
model
A fitted model of the learner (as returned by MLLearnerBase$fit()).
newdata
The new data for which predictions should be made using the model.
ncores
An integer to specify the number of cores used for parallelization (default: -1L).
...
Further arguments to be passed to the learner's predict function.
A wrapper around the private function fun_predict, which needs to be defined for every learner. The function must accept the three arguments model, newdata, and ncores and is a wrapper around the respective learner's predict-function. In order to allow the passing of further arguments, the ellipsis (...) can be used. The function should return the prediction results.
However, the function should never be executed directly but by the respective experiment wrappers MLTuneParameters, MLCrossValidation, and MLNestedCV.
For further details please refer to the package's vignette.
The predictions for newdata.
learner <- MLLearnerBase$new(metric_optimization_higher_better = FALSE)
\dontrun{
  # This example cannot be run without further adaptations.
  # The method `$predict()` needs to be overwritten when
  # inheriting from this class.
  learner$fit()
  learner$predict()
}
bayesian_scoring_function()
Perform a Bayesian hyperparameter optimization with an MLLearnerBase.
MLLearnerBase$bayesian_scoring_function(...)
...
Arguments to be passed to the learner's Bayesian scoring function.
A wrapper around the private function fun_bayesian_scoring_function, which needs to be defined when hyperparameters should be optimized with a Bayesian process (required for use with MLTuneParameters and MLNestedCV).
However, the function should never be executed directly but by the respective experiment wrappers MLTuneParameters and MLNestedCV.
For further details please refer to the package's vignette.
The results of the Bayesian scoring.
learner <- MLLearnerBase$new(metric_optimization_higher_better = FALSE)
\dontrun{
  # This example cannot be run without further adaptations.
  # The method `$bayesian_scoring_function()` needs to be overwritten when
  # inheriting from this class.
  learner$bayesian_scoring_function()
}
clone()
The objects of this class are cloneable with this method.
MLLearnerBase$clone(deep = FALSE)
deep
Whether to make a deep clone.
MLTuneParameters, MLCrossValidation, and MLNestedCV
ParBayesianOptimization::bayesOpt(), MLTuneParameters, and MLNestedCV
MLLearnerBase$new(metric_optimization_higher_better = FALSE)
The MLNestedCV class is used to construct a nested cross validation object and to perform a nested cross validation for a specified machine learning algorithm: a hyperparameter optimization is carried out with the in-sample observations of each of the k outer folds, and the optimized hyperparameters are validated directly on the out-of-sample observations of the respective fold.
The MLNestedCV class requires a named list of predefined row indices for the outer cross validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. Furthermore, a strategy needs to be chosen ("grid" or "bayesian") for the hyperparameter optimization, as well as the parameter k_tuning to define the number of inner cross validation folds.
mlexperiments::MLBase -> mlexperiments::MLExperimentsBase -> mlexperiments::MLCrossValidation -> MLNestedCV
strategy
A character. The strategy to optimize the hyperparameters (either "grid" or "bayesian").
parameter_bounds
A named list of tuples to define the parameter bounds of the Bayesian hyperparameter optimization. For further details please see the documentation of the ParBayesianOptimization package.
parameter_grid
A matrix with named columns in which each column represents a parameter that should be optimized and each row represents a specific hyperparameter setting that should be tested throughout the procedure. For strategy = "grid", each row of the parameter_grid is considered as a setting that is evaluated. For strategy = "bayesian", the parameter_grid is passed further on to the initGrid argument of the function ParBayesianOptimization::bayesOpt() in order to initialize the Bayesian process. The maximum number of rows considered for initializing the Bayesian process can be specified with the R option "mlexperiments.bayesian.max_init", which is set to 50L by default.
optim_args
A named list of arguments that are passed further on to the Bayesian optimization, e.g., iters.n, kappa, and acq. For further details please see the documentation of the ParBayesianOptimization package.
split_type
A character. The splitting strategy to construct the k cross-validation folds. This parameter is passed further on to the function splitTools::create_folds() and defaults to "stratified".
split_vector
A vector. If a criterion other than the provided y should be considered for generating the cross-validation folds, it can be defined here. It is important that a vector of the same length as x is provided here.
k_tuning
An integer to define the number of cross-validation folds used to tune the hyperparameters.
new()
Create a new MLNestedCV object.
MLNestedCV$new(
  learner,
  strategy = c("grid", "bayesian"),
  k_tuning,
  fold_list,
  seed,
  ncores = -1L,
  return_models = FALSE
)
learner
An initialized learner object that inherits from class "MLLearnerBase".
strategy
A character. The strategy to optimize the hyperparameters (either "grid" or "bayesian").
k_tuning
An integer to define the number of cross-validation folds used to tune the hyperparameters.
fold_list
A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().
seed
An integer. Needs to be set for reproducibility purposes.
ncores
An integer to specify the number of cores used for parallelization (default: -1L).
return_models
A logical. Whether the fitted models should be returned with the results (default: FALSE).
The MLNestedCV class requires a named list of predefined row indices for the outer cross validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. Furthermore, a strategy needs to be chosen ("grid" or "bayesian") for the hyperparameter optimization, as well as the parameter k_tuning to define the number of inner cross validation folds.
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLNestedCV$new(
  learner = LearnerKnn$new(),
  strategy = "grid",
  fold_list = fold_list,
  k_tuning = 3L,
  seed = 123,
  ncores = 2
)
execute()
Execute the nested cross validation.
MLNestedCV$execute()
All results of the cross validation are saved in the field $results of the MLNestedCV class. After successful execution of the nested cross validation, $results contains a list with the items:
"results.optimization" A list with the results of the hyperparameter optimization.
"fold" A list of folds containing the following items for each cross validation fold:
"fold_ids" A vector with the utilized in-sample row indices.
"ground_truth" A vector with the ground truth.
"predictions" A vector with the predictions.
"learner.args" A list with the arguments provided to the learner.
"model" If return_models = TRUE
, the fitted model.
"summary" A data.table with the summarized results (same as
the returned value of the execute
method).
"performance" A list with the value of the performance metric calculated for each of the cross validation folds.
The function returns a data.table with the results of the nested cross validation. More results are accessible from the field $results of the MLNestedCV class.
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLNestedCV$new(
  learner = LearnerKnn$new(),
  strategy = "grid",
  fold_list = fold_list,
  k_tuning = 3L,
  seed = 123,
  ncores = 2
)

# learner args (not optimized)
cv$learner_args <- list(
  l = 0,
  test = parse(text = "fold_test$x")
)

# parameters for hyperparameter tuning
cv$parameter_grid <- expand.grid(
  k = seq(4, 68, 8)
)
cv$split_type <- "stratified"

# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")

# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv$execute()
clone()
The objects of this class are cloneable with this method.
MLNestedCV$clone(deep = FALSE)
deep
Whether to make a deep clone.
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLNestedCV$new(
  learner = LearnerKnn$new(),
  strategy = "grid",
  fold_list = fold_list,
  k_tuning = 3L,
  seed = 123,
  ncores = 2
)

# learner args (not optimized)
cv$learner_args <- list(
  l = 0,
  test = parse(text = "fold_test$x")
)

# parameters for hyperparameter tuning
cv$parameter_grid <- expand.grid(
  k = seq(4, 16, 8)
)
cv$split_type <- "stratified"

# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")

# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv$execute()
The MLTuneParameters class is used to construct a parameter tuner object and to perform the tuning of a set of hyperparameters for a specified machine learning algorithm using either a grid search or a Bayesian optimization.
The hyperparameter tuning can be performed with a grid search or a Bayesian optimization. In both cases, each hyperparameter setting is evaluated in a k-fold cross-validation on the dataset specified.
mlexperiments::MLBase -> mlexperiments::MLExperimentsBase -> MLTuneParameters
parameter_bounds
A named list of tuples to define the parameter bounds of the Bayesian hyperparameter optimization. For further details please see the documentation of the ParBayesianOptimization package.
parameter_grid
A matrix with named columns in which each column represents a parameter that should be optimized and each row represents a specific hyperparameter setting that should be tested throughout the procedure. For strategy = "grid", each row of the parameter_grid is considered as a setting that is evaluated. For strategy = "bayesian", the parameter_grid is passed further on to the initGrid argument of the function ParBayesianOptimization::bayesOpt() in order to initialize the Bayesian process. The maximum number of rows considered for initializing the Bayesian process can be specified with the R option "mlexperiments.bayesian.max_init", which is set to 50L by default.
optim_args
A named list of arguments that are passed further on to the Bayesian optimization, e.g., iters.n, kappa, and acq. For further details please see the documentation of the ParBayesianOptimization package.
split_type
A character. The splitting strategy to construct the k cross-validation folds. This parameter is passed further on to the function splitTools::create_folds() and defaults to "stratified".
split_vector
A vector. If a criterion other than the provided y should be considered for generating the cross-validation folds, it can be defined here. It is important that a vector of the same length as x is provided here.
new()
Create a new MLTuneParameters object.
MLTuneParameters$new(
  learner,
  seed,
  strategy = c("grid", "bayesian"),
  ncores = -1L
)
learner
An initialized learner object that inherits from class "MLLearnerBase".
seed
An integer. Needs to be set for reproducibility purposes.
strategy
A character. The strategy to optimize the hyperparameters (either "grid" or "bayesian").
ncores
An integer to specify the number of cores used for parallelization (default: -1L).
For strategy = "bayesian"
, the number of starting iterations can be
set using the R option "mlexperiments.bayesian.max_init"
, which
defaults to 50L
. This option reduces the provided initialization
grid to contain at most the specified number of rows. This
initialization grid is then further passed on to the initGrid
argument of ParBayesianOptimization::bayesOpt.
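For example, this option can be lowered before executing a Bayesian parameter tuning (the value 20L is arbitrary):

# consider at most 20 rows of the provided parameter grid for
# initializing the Bayesian process
options("mlexperiments.bayesian.max_init" = 20L)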
A new MLTuneParameters R6 object.
MLTuneParameters$new(
  learner = LearnerKnn$new(),
  seed = 123,
  strategy = "grid",
  ncores = 2
)
execute()
Execute the hyperparameter tuning.
MLTuneParameters$execute(k)
k
An integer to define the number of cross-validation folds used to tune the hyperparameters.
All results of the hyperparameter tuning are saved in the field $results of the MLTuneParameters class. After successful execution of the parameter tuning, $results contains a list with the items:
A data.table with the summarized results (same as the returned value of the execute method).
The best setting (according to the learner's parameter metric_optimization_higher_better) identified during the hyperparameter tuning.
The returned value of ParBayesianOptimization::bayesOpt() (only for strategy = "bayesian").
A data.table with the results of the hyperparameter optimization. The optimized metric, i.e., the cross-validated evaluation metric, is given in the column metric_optim_mean. More results are accessible from the field $results of the MLTuneParameters class.
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

tuner <- MLTuneParameters$new(
  learner = LearnerKnn$new(),
  seed = 123,
  strategy = "grid",
  ncores = 2
)

tuner$parameter_bounds <- list(k = c(2L, 80L))
tuner$parameter_grid <- expand.grid(
  k = seq(4, 68, 8),
  l = 0,
  test = parse(text = "fold_test$x")
)
tuner$split_type <- "stratified"
tuner$optim_args <- list(
  iters.n = 4,
  kappa = 3.5,
  acq = "ucb"
)

# set data
tuner$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

tuner$execute(k = 3)
clone()
The objects of this class are cloneable with this method.
MLTuneParameters$clone(deep = FALSE)
deep
Whether to make a deep clone.
ParBayesianOptimization::bayesOpt(), splitTools::create_folds()
knn_tuner <- MLTuneParameters$new(
  learner = LearnerKnn$new(),
  seed = 123,
  strategy = "grid",
  ncores = 2
)
Calculate performance measures from the prediction results.
performance(object, prediction_results, y_ground_truth, type = NULL, ...)
object
An R6 object of class "MLCrossValidation" or "MLNestedCV".
prediction_results
An object of class "mlexPredictions", as returned by the function predictions().
y_ground_truth
A vector with the ground truth of the predicted data.
type
A character to select a pre-defined set of metrics for "binary" and "regression" tasks. If not specified (default: NULL), the performance metric(s) defined in the experiment object are used.
...
A list. Further arguments required to compute the performance metrics.
The performance metric has to be specified in the object that is used to carry out the experiment, i.e., MLCrossValidation or MLNestedCV.
Please note that the option return_models = TRUE must be set in the experiment class in order to be able to compute the predictions that are required to calculate the performance.
The function returns a data.table with the computed performance metric of each fold.
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

glm_optimization <- mlexperiments::MLCrossValidation$new(
  learner = LearnerGlm$new(),
  fold_list = fold_list,
  seed = 123
)

glm_optimization$learner_args <- list(family = binomial(link = "logit"))
glm_optimization$predict_args <- list(type = "response")
glm_optimization$performance_metric_args <- list(positive = "1")
glm_optimization$performance_metric <- list(
  auc = metric("auc"),
  sensitivity = metric("sensitivity"),
  specificity = metric("specificity")
)
glm_optimization$return_models <- TRUE

# set data
glm_optimization$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv_results <- glm_optimization$execute()

# predictions
preds <- mlexperiments::predictions(
  object = glm_optimization,
  newdata = data.matrix(dataset[, -7]),
  na.rm = FALSE,
  ncores = 2L,
  type = "response"
)

# performance
mlexperiments::performance(
  object = glm_optimization,
  prediction_results = preds,
  y_ground_truth = dataset[, 7],
  positive = "1"
)

# performance - binary
mlexperiments::performance(
  object = glm_optimization,
  prediction_results = preds,
  y_ground_truth = dataset[, 7],
  type = "binary",
  positive = "1"
)
Apply an R6 object of class "MLCrossValidation" to new data to compute predictions.
predictions(object, newdata, na.rm = FALSE, ncores = -1L, ...)
object
An R6 object of class "MLCrossValidation" or "MLNestedCV".
newdata
The new data for which predictions should be made using the model fitted in each fold of the cross-validation.
na.rm
A logical. Whether missing values should be removed before computing the mean and standard deviation of the predictions across the different folds for each observation in newdata (default: FALSE).
ncores
An integer to specify the number of cores used for parallelization (default: -1L).
...
A list. Further arguments required to compute the predictions.
The function returns a data.table of class "mlexPredictions" with one row for each observation in newdata and columns containing the predictions for each fold, along with the mean and standard deviation across all folds.
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

glm_optimization <- mlexperiments::MLCrossValidation$new(
  learner = LearnerGlm$new(),
  fold_list = fold_list,
  seed = 123
)

glm_optimization$learner_args <- list(family = binomial(link = "logit"))
glm_optimization$predict_args <- list(type = "response")
glm_optimization$performance_metric_args <- list(positive = "1")
glm_optimization$performance_metric <- metric("auc")
glm_optimization$return_models <- TRUE

# set data
glm_optimization$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv_results <- glm_optimization$execute()

# predictions
preds <- mlexperiments::predictions(
  object = glm_optimization,
  newdata = data.matrix(dataset[, -7]),
  na.rm = FALSE,
  ncores = 2L,
  type = "response"
)
head(preds)
Validate that the same folds were used in two or more independent experiments.
validate_fold_equality(experiments)
experiments
A list of experiments.
This function can be applied to all implemented experiments, i.e., MLTuneParameters, MLCrossValidation, and MLNestedCV. However, it is required that the list experiments contains only experiments of the same class.
Writes messages to the console on the result of the comparison.
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) {
        rnorm(n = 500)
      },
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

# GLM
glm_optimization <- mlexperiments::MLCrossValidation$new(
  learner = LearnerGlm$new(),
  fold_list = fold_list,
  seed = 123
)
glm_optimization$learner_args <- list(family = binomial(link = "logit"))
glm_optimization$predict_args <- list(type = "response")
glm_optimization$performance_metric_args <- list(positive = "1")
glm_optimization$performance_metric <- metric("auc")
glm_optimization$return_models <- TRUE

# set data
glm_optimization$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

glm_cv_results <- glm_optimization$execute()

# KNN
knn_optimization <- mlexperiments::MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123
)
knn_optimization$learner_args <- list(
  k = 3,
  l = 0,
  test = parse(text = "fold_test$x")
)
knn_optimization$predict_args <- list(type = "prob")
knn_optimization$performance_metric_args <- list(positive = "1")
knn_optimization$performance_metric <- metric("auc")

# set data
knn_optimization$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv_results_knn <- knn_optimization$execute()

# validate folds
validate_fold_equality(
  list(glm_optimization, knn_optimization)
)