Perform calibration sampling and use selected calibration set for model tuning

fit_rf(
  spec_chem,
  response,
  variable = NULL,
  evaluation_method = "test_set",
  validation = NULL,
  split_method = "ken_stone",
  ratio_val,
  ken_sto_pc = 2,
  pc = NULL,
  invert = TRUE,
  tuning_method = "resampling",
  resampling_seed = 123,
  cv = NULL,
  ntree_max = 500,
  print = TRUE,
  env = parent.frame()
)

Arguments

spec_chem

Tibble that contains spectra, metadata and chemical reference data as list-columns. The tibble supplied to spec_chem can be generated by the join_chem_spc() function.

response

Response variable as symbol or name (without quotes, no character string). The provided response symbol needs to be a column name in the spec_chem tibble.

variable

Deprecated and replaced by response.

evaluation_method

Character string stating evaluation method. Either "test_set" (default) or "resampling". "test_set" will split the data into a calibration (training) and validation (test) set, and evaluate the final model by predicting on the validation set. If "resampling", the finally selected model will be evaluated based on the cross-validation hold-out predictions.

validation

Deprecated and replaced by evaluation_method. Default is NULL.

split_method

Method used to split the data into a calibration set and an independent test set. Default is "ken_stone", which selects calibration samples from the preprocessed spectra using the Kennard-Stone sampling algorithm. The proportion of validation samples relative to the total number of samples is specified with the argument ratio_val. split_method = "random" creates a single random split.

ratio_val

Ratio of validation (test) samples to total number of samples (calibration (training) and validation (test)).
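
To illustrate how ratio_val relates to the sizes of the two sets, here is a minimal base R sketch of a single random split. It is not the package's internal code; the sample count, the seed, and the use of round() to obtain an integer count are assumptions made for illustration only.

```r
# Hypothetical inputs; none of these values come from the package
set.seed(42)
n_total <- 100
ratio_val <- 1 / 3

# Number of validation (test) samples implied by ratio_val (rounding assumed)
n_val <- round(ratio_val * n_total)

# Draw validation rows at random; the remaining rows form the calibration set
val_idx <- sort(sample(seq_len(n_total), size = n_val))
cal_idx <- setdiff(seq_len(n_total), val_idx)
```

With these inputs the validation set holds about one third of the rows and the calibration set holds the rest, with no row in both.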

ken_sto_pc

Number of principal components used to calculate the Mahalanobis distance on PCA scores for the Kennard-Stone algorithm. Default is ken_sto_pc = 2, which uses the first two principal components.
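
The idea of Kennard-Stone sampling with Mahalanobis distance on PCA scores can be sketched in base R as follows. This is an illustrative re-implementation under stated assumptions, not the routine the package calls: the function name kennard_stone_sketch, the simulated matrix X standing in for preprocessed spectra, and the rounding of the set sizes are all hypothetical.

```r
# Kennard-Stone selection on PCA scores (illustrative sketch only)
kennard_stone_sketch <- function(X, k, n_pc = 2) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  # PCA scores are uncorrelated, so dividing each score column by its
  # standard deviation makes Euclidean distance on the scaled scores
  # equal to Mahalanobis distance on the original scores
  scores <- sweep(pca$x[, seq_len(n_pc), drop = FALSE], 2,
                  pca$sdev[seq_len(n_pc)], "/")
  d <- as.matrix(dist(scores))
  # Start with the two samples that are farthest apart
  selected <- as.vector(which(d == max(d), arr.ind = TRUE)[1, ])
  while (length(selected) < k) {
    remaining <- setdiff(seq_len(nrow(d)), selected)
    # Distance from each remaining sample to its nearest selected sample
    min_d <- apply(d[remaining, selected, drop = FALSE], 1, min)
    # Add the remaining sample that is farthest from the selected set
    selected <- c(selected, remaining[which.max(min_d)])
  }
  selected
}

# Hypothetical stand-in for a matrix of preprocessed spectra
set.seed(1)
X <- matrix(rnorm(40 * 10), nrow = 40)
ratio_val <- 1 / 4
n_cal <- nrow(X) - round(ratio_val * nrow(X))

# Select calibration rows; the rows left over form the validation set
cal_idx <- kennard_stone_sketch(X, k = n_cal, n_pc = 2)
val_idx <- setdiff(seq_len(nrow(X)), cal_idx)
```

Because selection always picks the sample farthest from those already chosen, the calibration set tends to cover the score space evenly, including its extremes.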

pc

Deprecated; the renamed argument is ken_sto_pc.

invert

Logical

tuning_method

Character string specifying the tuning method. The tuning method affects how caret selects the final set of tuning values from a list of candidate values. Possible values are "resampling" and "none". "resampling" uses a specified resampling method such as repeated k-fold cross-validation (see argument resampling_method) and the performance profile generated from the hold-out predictions to decide on the final tuning values that lead to optimal model performance. "none" forces caret to compute a final model for a predefined candidate value of the PLS tuning parameter, the number of PLS components. In this case, the value supplied by ncomp_fixed is used to set model complexity at a fixed number of components.

resampling_seed

Random seed (integer) used for generating the resampling indices that are supplied to caret::trainControl. This ensures that modeling results are reproducible when re-fitting. Default is resampling_seed = 123.
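
The effect of a fixed seed on resampling indices can be sketched in base R. The helper make_train_indices below is hypothetical and only illustrates the principle; it is not how the package builds its indices. It returns one vector of training rows per fold, which is the form that the index argument of caret::trainControl expects.

```r
# Build k-fold training indices reproducibly (illustrative helper)
make_train_indices <- function(n, k, seed) {
  set.seed(seed)
  # Assign every row to one of k folds in random order
  fold_id <- sample(rep(seq_len(k), length.out = n))
  holdout <- split(seq_len(n), fold_id)
  # Training rows for a fold are all rows outside its hold-out set
  lapply(holdout, function(h) setdiff(seq_len(n), h))
}

idx_a <- make_train_indices(n = 50, k = 10, seed = 123)
idx_b <- make_train_indices(n = 50, k = 10, seed = 123)

# Same seed, same folds: re-fitting would use identical resamples
same_folds <- identical(idx_a, idx_b)
```

Changing the seed changes the fold assignment, so performance estimates from two fits are only directly comparable when the seed is held fixed.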

cv

Deprecated. Use resampling_method instead.

ntree_max

Maximum number of random forest trees used by caret::train. Caret will aggregate a performance profile using resampling for an integer sequence from 1 to ntree_max trees.

print

Logical indicating whether model evaluation graphs should be printed.

env

Environment in which the function is evaluated. Default is parent.frame().