Perform calibration sampling and use selected calibration set for model tuning

fit_pls(
  spec_chem,
  response,
  variable = NULL,
  center = TRUE,
  scale = TRUE,
  evaluation_method = "test_set",
  validation = TRUE,
  split_method = "ken_stone",
  ratio_val = 1/3,
  ken_sto_pc = 2,
  pc,
  invert = TRUE,
  tuning_method = "resampling",
  resampling_method = "kfold_cv",
  cv = NULL,
  resampling_seed = 123,
  pls_ncomp_max = 20,
  ncomp_fixed = 5,
  print = TRUE,
  env = parent.frame()
)

pls_ken_stone(
  spec_chem,
  response,
  variable = NULL,
  center = TRUE,
  scale = TRUE,
  evaluation_method = "test_set",
  validation = TRUE,
  split_method = "ken_stone",
  ratio_val = 1/3,
  ken_sto_pc = 2,
  pc,
  invert = TRUE,
  tuning_method = "resampling",
  resampling_method = "kfold_cv",
  cv = NULL,
  resampling_seed = 123,
  pls_ncomp_max = 20,
  ncomp_fixed = 5,
  print = TRUE,
  env = parent.frame()
)

Arguments

spec_chem

Tibble that contains spectra, metadata and chemical reference data as list-columns. The tibble supplied to spec_chem can be generated with the join_chem_spc() function.

response

Response variable as symbol or name (without quotes, no character string). The provided response symbol needs to be a column name in the spec_chem tibble.

variable

Deprecated and replaced by response.

center

Logical indicating whether to mean center each spectrum column (e.g. wavenumber or wavelength) after common spectrum preprocessing. Default is center = TRUE.

scale

Logical indicating whether to scale each spectrum column (e.g. wavenumber or wavelength) by its standard deviation after common spectrum preprocessing. Default is scale = TRUE.
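The effect of center and scale on each spectrum column can be illustrated in base R. This is a minimal sketch on simulated values; how fit_pls() applies the transformation internally is not stated here, and the object names (spc, spc_cs) are hypothetical.

```r
# Toy spectra matrix: 5 samples (rows) x 4 spectral variables (columns).
# Values are simulated for illustration, not real measurements.
set.seed(1)
spc <- matrix(
  rnorm(20, mean = 0.5, sd = 0.1), nrow = 5,
  dimnames = list(NULL, paste0("wn_", c(4000, 3000, 2000, 1000)))
)

# center = TRUE subtracts the mean of each column;
# scale = TRUE divides each centered column by its standard deviation
spc_cs <- scale(spc, center = TRUE, scale = TRUE)

col_means <- colMeans(spc_cs)    # numerically zero for every column
col_sds <- apply(spc_cs, 2, sd)  # one for every column
```

After both steps every wavenumber column has mean zero and unit standard deviation, so all spectral variables enter the model on the same scale.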

evaluation_method

Character string stating evaluation method. Either "test_set" (default) or "resampling". "test_set" will split the data into a calibration (training) and validation (test) set, and evaluate the final model by predicting on the validation set. If "resampling", the finally selected model will be evaluated based on the cross-validation hold-out predictions.

validation

Deprecated and replaced by evaluation_method. Default is TRUE.

split_method

Method used to split the data into a calibration set and an independent test set. Default is "ken_stone", which selects calibration samples with the Kennard-Stone sampling algorithm applied to the preprocessed spectra. The proportion of validation samples relative to the total number of samples can be specified with the argument ratio_val. split_method = "random" creates a single random split.

ratio_val

Ratio of validation (test) samples to the total number of samples (calibration (training) plus validation (test)). Default is ratio_val = 1/3.
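As an illustration of how ratio_val translates into set sizes, the following base R sketch performs a single random split. The rounding rule and the names n, n_val, idx_val and idx_cal are assumptions made for this example, not taken from the package.

```r
n <- 90              # hypothetical total number of samples
ratio_val <- 1/3     # share of samples held out for validation

# Assumption: the validation set size is rounded to the nearest integer
n_val <- round(ratio_val * n)

set.seed(42)
idx_val <- sort(sample(seq_len(n), size = n_val))  # validation (test) rows
idx_cal <- setdiff(seq_len(n), idx_val)            # calibration (training) rows
```

With 90 samples and ratio_val = 1/3, 30 samples are held out for validation and 60 remain for calibration.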

ken_sto_pc

Number of components used for calculating the Mahalanobis distance on PCA scores in the Kennard-Stone algorithm. Default is ken_sto_pc = 2, which will use the first two PCA components.
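The idea behind Kennard-Stone selection on PCA scores can be sketched in base R as follows. This is a simplified illustration and not the code used by fit_pls(); the function kennard_stone_sketch() and its arguments are hypothetical. It relies on the fact that PCA scores are uncorrelated, so the Euclidean distance between scores scaled to unit variance equals their Mahalanobis distance.

```r
kennard_stone_sketch <- function(x, k, pc = 2) {
  stopifnot(k >= 2, k <= nrow(x))
  # PCA on column-centered data; keep the first `pc` score vectors
  scores <- prcomp(x, center = TRUE, scale. = FALSE)$x[, seq_len(pc), drop = FALSE]
  # Scale each score vector to unit variance (Mahalanobis distance on scores)
  scores <- scale(scores, center = FALSE, scale = apply(scores, 2, sd))
  d <- as.matrix(dist(scores))
  # Start with the two samples that are farthest apart
  selected <- as.vector(which(d == max(d), arr.ind = TRUE)[1, ])
  while (length(selected) < k) {
    remaining <- setdiff(seq_len(nrow(d)), selected)
    # Distance of each remaining sample to its nearest selected sample
    min_d <- apply(d[remaining, selected, drop = FALSE], 1, min)
    # Add the sample that is farthest from the already selected set
    selected <- c(selected, remaining[which.max(min_d)])
  }
  selected
}

# Simulated data: 30 samples, 10 variables
set.seed(1)
x <- matrix(rnorm(30 * 10), nrow = 30)
idx_cal <- kennard_stone_sketch(x, k = 20, pc = 2)
idx_val <- setdiff(seq_len(nrow(x)), idx_cal)
```

The selected samples spread evenly over the score space, so the calibration set covers the spectral variation while the remaining samples form the validation set. For real analyses a maintained implementation such as kenStone() in the prospectr package is preferable to this sketch.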

pc

Deprecated; the renamed argument is ken_sto_pc.

invert

Logical

tuning_method

Character specifying tuning method. The tuning method affects how caret selects the final set of tuning values from a list of candidate values. Possible values are "resampling" and "none". "resampling" uses a specified resampling method such as repeated k-fold cross-validation (see argument resampling_method) and the performance profile generated from the hold-out predictions to decide on the final tuning values that lead to optimal model performance. "none" forces caret to compute a final model for a predefined candidate value of the PLS tuning parameter, the number of PLS components. In this case, the value supplied by ncomp_fixed is used to set model complexity at a fixed number of components.

resampling_method

Character specifying resampling method. Currently, "kfold_cv" (default, performs 10-fold cross-validation), "rep_kfold_cv" (performs 5-times repeated 10-fold cross-validation), "loocv" (performs leave-one-out cross-validation), and "none" (no resampling is performed) are supported.

cv

Deprecated. Use resampling_method instead.

resampling_seed

Random seed (integer) that will be used for generating resampling indices, which will be supplied to caret::trainControl. This ensures that modeling results are reproducible when re-fitting. Default is resampling_seed = 123.
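Why a fixed seed makes the resampling reproducible can be shown with a small base R sketch that builds 10-fold cross-validation indices. The helper make_folds() is hypothetical and not part of the package. It returns, for each fold, the row indices used for training, which is the list shape that the index argument of caret::trainControl accepts.

```r
make_folds <- function(n, k = 10, seed = 123) {
  set.seed(seed)
  # Randomly assign each of the n samples to one of k folds
  fold_id <- sample(rep(seq_len(k), length.out = n))
  # For each fold, the training indices are all samples not held out
  lapply(seq_len(k), function(i) which(fold_id != i))
}

folds_a <- make_folds(n = 60)
folds_b <- make_folds(n = 60)

# The same seed yields identical resampling indices on every call
same_folds <- identical(folds_a, folds_b)
```

Because both calls use the same seed, every model fit sees the same hold-out partitions, so differences between fits reflect the model settings rather than the random fold assignment.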

pls_ncomp_max

Maximum number of PLS components that are evaluated by caret::train. caret will aggregate a performance profile using resampling for an integer sequence from 1 to pls_ncomp_max.

ncomp_fixed

Fixed number of PLS components (integer). Only used when both tuning_method = "none" and resampling_method = "none" are set.

print

Logical indicating whether model evaluation graphs should be printed.

env

Environment in which the function is evaluated. Default is parent.frame().