Calibration sampling, model tuning, and PLS regression

Perform calibration sampling and use the selected calibration set for model tuning.

fit_pls(
  spec_chem,
  response,
  variable = NULL,
  center = TRUE,
  scale = TRUE,
  evaluation_method = "test_set",
  validation = TRUE,
  split_method = "ken_stone",
  ratio_val = 1/3,
  ken_sto_pc = 2,
  pc,
  invert = TRUE,
  tuning_method = "resampling",
  resampling_method = "kfold_cv",
  cv = NULL,
  resampling_seed = 123,
  pls_ncomp_max = 20,
  ncomp_fixed = 5,
  print = TRUE,
  env = parent.frame()
)

pls_ken_stone(
  spec_chem,
  response,
  variable = NULL,
  center = TRUE,
  scale = TRUE,
  evaluation_method = "test_set",
  validation = TRUE,
  split_method = "ken_stone",
  ratio_val = 1/3,
  ken_sto_pc = 2,
  pc,
  invert = TRUE,
  tuning_method = "resampling",
  resampling_method = "kfold_cv",
  cv = NULL,
  resampling_seed = 123,
  pls_ncomp_max = 20,
  ncomp_fixed = 5,
  print = TRUE,
  env = parent.frame()
)

Arguments

spec_chem: Tibble that contains spectra, metadata and chemical reference data as list-columns. The tibble supplied to spec_chem can be generated by the join_chem_spc() function.
response: Response variable as a symbol or name (without quotes, not a character string). The response symbol must be a column name in the spec_chem tibble.
variable: Deprecated and replaced by response.
center: Logical; whether to perform mean centering of each spectrum column (e.g. wavenumber or wavelength) after common spectrum preprocessing. Default is center = TRUE.
scale: Logical; whether to perform standard deviation scaling of each spectrum column (e.g. wavenumber or wavelength) after common spectrum preprocessing. Default is scale = TRUE.
evaluation_method: Character string stating the evaluation method. Either "test_set" (default) or "resampling". "test_set" splits the data into a calibration (training) and a validation (test) set and evaluates the final model by predicting on the validation set. "resampling" evaluates the finally selected model based on the cross-validation hold-out predictions.
validation: Deprecated and replaced by evaluation_method. Default is TRUE.
split_method: Method for splitting the data into an independent test set. Default is "ken_stone", which selects samples for calibration with the Kennard-Stone sampling algorithm applied to the preprocessed spectra. The proportion of validation samples to the total number of samples can be specified in the argument ratio_val. split_method = "random" creates a single random split.
ratio_val: Ratio of validation (test) samples to the total number of samples (calibration (training) and validation (test)). Default is ratio_val = 1/3.
ken_sto_pc: Number of principal components used for calculating the Mahalanobis distance on PCA scores in the Kennard-Stone algorithm. Default is ken_sto_pc = 2, which uses the first two PCA components.
pc: Deprecated; the renamed argument is ken_sto_pc.
invert: Logical. Default is invert = TRUE.
tuning_method: Character specifying the tuning method. The tuning method determines how caret selects the final tuning values from a set of candidate values. Possible values are "resampling" (default) and "none". "resampling" uses the resampling method given in resampling_method (e.g. repeated k-fold cross-validation) and the performance profile generated from the hold-out predictions to choose the final tuning values that lead to optimal model performance. "none" forces caret to compute a final model for a predefined number of PLS components; in this case, the value supplied by ncomp_fixed sets the model complexity to a fixed number of components.
resampling_method: Character specifying the resampling method. Currently supported are "kfold_cv" (default; performs 10-fold cross-validation), "rep_kfold_cv" (performs 5-times repeated 10-fold cross-validation), "loocv" (performs leave-one-out cross-validation), and "none" (no resampling).
cv: Deprecated. Use resampling_method instead.
resampling_seed: Random seed (integer) used for generating the resampling indices, which are supplied to caret::trainControl. This makes modeling results reproducible when re-fitting. Default is resampling_seed = 123.
pls_ncomp_max: Maximum number of PLS components evaluated by caret::train. caret aggregates a performance profile using resampling for the integer sequence from 1 to pls_ncomp_max. Default is pls_ncomp_max = 20.
ncomp_fixed: Fixed number of PLS components (integer). Only used when tuning_method = "none" and resampling_method = "none". Default is ncomp_fixed = 5.
print: Logical; whether model evaluation graphs are printed. Default is print = TRUE.
env: Environment in which the function is evaluated. Default is env = parent.frame().
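The column-wise centering and scaling controlled by center and scale amounts to standardizing each spectral variable. The sketch below shows that arithmetic with base R's scale() on a small simulated matrix; the matrix and its wavenumber column names are made up for illustration, and the code is not taken from the package's source.

```r
# Simulated stand-in for preprocessed spectra (rows = samples,
# columns = wavenumbers); values are random, not real data
set.seed(42)
spc <- matrix(rnorm(5 * 4, mean = 10, sd = 3), nrow = 5,
              dimnames = list(NULL, c("4000", "3998", "3996", "3994")))

# center = TRUE subtracts each column mean;
# scale = TRUE divides each column by its standard deviation
spc_cs <- scale(spc, center = TRUE, scale = TRUE)

round(colMeans(spc_cs), 10)   # 0 for every column (up to floating-point error)
apply(spc_cs, 2, sd)          # 1 for every column (up to floating-point error)
```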
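The Kennard-Stone split configured by split_method, ratio_val and ken_sto_pc can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: ken_stone_sketch() is a hypothetical helper, the input is assumed to be a numeric matrix of already preprocessed spectra, and the number of calibration samples is assumed to be round((1 - ratio_val) * n). PCA scores are divided by their standard deviations so that Euclidean distances between scores equal Mahalanobis distances; the algorithm then starts from the two most distant samples and repeatedly adds the sample that is farthest from its nearest already selected sample. The effect of invert is not reproduced here.

```r
# Hypothetical helper (not the package's API): Kennard-Stone selection on
# standardized PCA scores.
# X: numeric matrix (rows = samples, columns = preprocessed spectral variables)
# k: number of samples to select (k >= 2); n_pc: number of PCA components
ken_stone_sketch <- function(X, k, n_pc = 2) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  scores <- pca$x[, seq_len(n_pc), drop = FALSE]
  # PCA scores are uncorrelated, so dividing each column by its standard
  # deviation makes Euclidean distance equal to Mahalanobis distance
  scores <- sweep(scores, 2, pca$sdev[seq_len(n_pc)], "/")
  D <- as.matrix(dist(scores))
  # Start with the two samples that are farthest apart
  selected <- as.vector(arrayInd(which.max(D), dim(D)))
  while (length(selected) < k) {
    remaining <- setdiff(seq_len(nrow(X)), selected)
    # Distance from each remaining sample to its nearest selected sample
    min_dist <- apply(D[remaining, selected, drop = FALSE], 1, min)
    # Add the remaining sample that is farthest from the current selection
    selected <- c(selected, remaining[which.max(min_dist)])
  }
  selected
}

# Simulated stand-in for 30 preprocessed spectra with 50 variables
set.seed(1)
X <- matrix(rnorm(30 * 50), nrow = 30)
ratio_val <- 1/3
n_cal <- round((1 - ratio_val) * nrow(X))             # assumed calibration size: 20
cal_idx <- ken_stone_sketch(X, k = n_cal, n_pc = 2)   # calibration rows
val_idx <- setdiff(seq_len(nrow(X)), cal_idx)         # validation rows
```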
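Fixing resampling_seed makes the cross-validation folds, and therefore the tuning results, reproducible. A minimal sketch, assuming a hypothetical helper make_kfold_index() written in base R: for each fold it returns the row indices used for training, which is the list format accepted by the index argument of caret::trainControl. It is not necessarily how the package generates its indices.

```r
# Hypothetical helper: k-fold training indices, reproducible via a seed
make_kfold_index <- function(n, k = 10, seed = 123) {
  set.seed(seed)  # note: resets the global random number generator
  # Randomly assign each sample to one of k folds of (nearly) equal size
  fold_id <- sample(rep(seq_len(k), length.out = n))
  # For each fold, keep the rows NOT in that fold as the training set;
  # the samples of the fold itself are the hold-out set
  index <- lapply(seq_len(k), function(f) which(fold_id != f))
  names(index) <- sprintf("Fold%02d", seq_len(k))
  index
}

idx_a <- make_kfold_index(n = 50, k = 10, seed = 123)
idx_b <- make_kfold_index(n = 50, k = 10, seed = 123)
identical(idx_a, idx_b)   # TRUE: same seed, same folds
lengths(idx_a)            # each training set contains 45 samples
```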
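With tuning_method = "resampling", candidate models with 1 to pls_ncomp_max components are compared by their hold-out error. The sketch below illustrates that idea end to end in base R on simulated data. It is a stand-in, not the package's code path: pls1_fit(), pls1_coef() and pls1_predict() are hypothetical helpers implementing single-response PLS (PLS1) with the NIPALS algorithm, the profile pools the squared hold-out errors of a 10-fold split into one RMSE per number of components (caret instead averages the RMSE over resamples), and the final number of components is simply the one with the lowest RMSE.

```r
# Hypothetical helpers (not the package's API): single-response PLS via NIPALS
pls1_fit <- function(X, y, ncomp) {
  x_mean <- colMeans(X)
  y_mean <- mean(y)
  E <- sweep(X, 2, x_mean)   # mean-centred X, deflated after each component
  f <- y - y_mean            # mean-centred y, deflated after each component
  W <- P <- matrix(0, ncol(X), ncomp)
  q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w <- drop(crossprod(E, f))          # weights, proportional to cov(X, y)
    w <- w / sqrt(sum(w^2))
    t_a <- drop(E %*% w)                # scores
    tt <- sum(t_a^2)
    p <- drop(crossprod(E, t_a)) / tt   # X loadings
    q[a] <- sum(f * t_a) / tt           # y loading
    E <- E - outer(t_a, p)              # deflate X
    f <- f - q[a] * t_a                 # deflate y
    W[, a] <- w
    P[, a] <- p
  }
  list(x_mean = x_mean, y_mean = y_mean, W = W, P = P, q = q)
}

# Regression coefficients using the first a components: B = W (P'W)^-1 q
pls1_coef <- function(fit, a) {
  W <- fit$W[, seq_len(a), drop = FALSE]
  P <- fit$P[, seq_len(a), drop = FALSE]
  drop(W %*% solve(crossprod(P, W), fit$q[seq_len(a)]))
}

pls1_predict <- function(fit, Xnew, a) {
  drop(sweep(Xnew, 2, fit$x_mean) %*% pls1_coef(fit, a)) + fit$y_mean
}

# Simulated data: 60 samples, 20 variables, y driven by three of them
set.seed(123)
n <- 60; m <- 20
X <- matrix(rnorm(n * m), n, m)
y <- drop(X[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n, sd = 0.3)

# 10-fold cross-validation over 1..ncomp_max components (cf. pls_ncomp_max)
ncomp_max <- 8
fold_id <- sample(rep(1:10, length.out = n))
sq_err <- matrix(NA_real_, n, ncomp_max)
for (k in 1:10) {
  test <- which(fold_id == k)
  fit <- pls1_fit(X[-test, , drop = FALSE], y[-test], ncomp_max)
  for (a in seq_len(ncomp_max)) {
    pred <- pls1_predict(fit, X[test, , drop = FALSE], a)
    sq_err[test, a] <- (y[test] - pred)^2
  }
}
rmse_profile <- sqrt(colMeans(sq_err))   # pooled hold-out RMSE per ncomp
ncomp_best <- which.min(rmse_profile)
final_fit <- pls1_fit(X, y, ncomp_best)  # refit on all samples
```

With tuning_method = "none", the profile step is skipped and a model with a fixed number of components (ncomp_fixed, default 5) is fitted; in this sketch that would correspond to pls1_fit(X, y, 5).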
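When evaluation_method = "test_set", the final model predicts the held-out validation samples and the predictions are compared with the observed reference values. A minimal sketch of a few common regression metrics, computed with a hypothetical helper eval_metrics() on made-up numbers; the set of statistics the package actually reports is not reproduced here.

```r
# Hypothetical helper: common regression metrics on test-set predictions
eval_metrics <- function(obs, pred) {
  res <- obs - pred
  c(
    rmse = sqrt(mean(res^2)),     # root mean squared error
    r2   = cor(obs, pred)^2,      # squared Pearson correlation
    bias = mean(pred - obs)       # mean prediction error
  )
}

# Made-up observed and predicted values for five validation samples
obs  <- c(1.2, 2.0, 2.9, 4.1, 5.0)
pred <- c(1.0, 2.2, 3.0, 3.9, 5.3)
metrics <- eval_metrics(obs, pred)
metrics
```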