R/pls-modeling.R
fit_rf.Rd
Perform calibration sampling and use selected calibration set for model tuning
fit_rf(
spec_chem,
response,
variable = NULL,
evaluation_method = "test_set",
validation = NULL,
split_method = "ken_stone",
ratio_val,
ken_sto_pc = 2,
pc = NULL,
invert = TRUE,
tuning_method = "resampling",
resampling_seed = 123,
cv = NULL,
ntree_max = 500,
print = TRUE,
env = parent.frame()
)
Tibble that contains spectra, metadata and chemical
reference as list-columns. The tibble to be supplied to spec_chem
can be generated by the join_chem_spc() function.
Response variable as symbol or name
(without quotes, no character string). The provided response symbol needs to be
a column name in the spec_chem tibble.
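As an illustration of what passing the response "as symbol" means, here is a minimal base R sketch. The helper pull_response() and the toy column C are hypothetical and not part of the package; how fit_rf captures the symbol internally is not stated here, so base R's substitute() is used purely as an assumption.

```r
# Hypothetical helper (not part of the documented package): captures an
# unquoted column name and returns that column from a data frame.
pull_response <- function(data, response) {
  response_sym <- substitute(response)    # capture the unevaluated argument
  response_name <- deparse(response_sym)  # symbol -> character name
  if (!response_name %in% names(data)) {
    stop("`", response_name, "` is not a column of `data`", call. = FALSE)
  }
  data[[response_name]]
}

# Toy data; `C` stands in for a chemical reference column
toy <- data.frame(sample_id = c("a", "b", "c"), C = c(1.2, 0.8, 2.5))
carbon <- pull_response(toy, C)  # unquoted symbol: C, not "C"
```

In this sketch a quoted "C" is rejected, because the deparsed string keeps its quotes and therefore matches no column name.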
Deprecated and replaced by response.
Character string stating evaluation method.
Either "test_set" (default) or "resampling". "test_set"
will split the data into a calibration (training) and validation (test) set,
and evaluate the final model by predicting on the validation set.
If "resampling", the finally selected model will be evaluated based
on the cross-validation hold-out predictions.
Deprecated and replaced by evaluation_method.
Default is TRUE.
Method used to split the data into an independent test
set. Default is "ken_stone", which will select samples for calibration
based on the Kennard-Stone sampling algorithm applied to preprocessed spectra. The
proportion of validation to the total number of samples can be specified
in the argument ratio_val.
split_method = "random" will create a single random split.
Ratio of validation (test) samples to total number of samples (calibration (training) and validation (test)).
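To make the meaning of ratio_val concrete, the following base R sketch draws a single random split. The rounding rule and the variable names are assumptions for illustration; the package's own split code may differ.

```r
set.seed(1)                          # arbitrary seed for this illustration only
n_total <- 100                       # total number of samples
ratio_val <- 1 / 3                   # validation samples / total samples

# Assumption: the validation set size is the rounded product
n_val <- round(ratio_val * n_total)

idx_val <- sample(seq_len(n_total), size = n_val)  # validation (test) rows
idx_cal <- setdiff(seq_len(n_total), idx_val)      # calibration (training) rows
```

With these numbers, one third of the 100 samples is held out for validation and the rest is used for calibration.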
Number of components used
for calculating the Mahalanobis distance on PCA scores when running the
Kennard-Stone algorithm.
Default is ken_sto_pc = 2, which will use the first two PCA
components.
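The idea of Kennard-Stone selection on PCA scores can be sketched in base R as below. This is an illustrative re-implementation under stated assumptions, not the code the package runs: the function name kennard_stone_sketch() and its arguments are hypothetical, and random numbers stand in for preprocessed spectra. Scaling each score column by its standard deviation makes Euclidean distance on the scaled scores equal to Mahalanobis distance, because PCA scores are uncorrelated.

```r
# Hypothetical sketch of Kennard-Stone sampling on the first `pc` PCA scores
kennard_stone_sketch <- function(x, k, pc = 2) {
  stopifnot(k >= 2, k <= nrow(x))
  pca <- prcomp(x, center = TRUE, scale. = FALSE)
  # Standardize the retained scores (Mahalanobis distance in score space)
  scores <- sweep(pca$x[, seq_len(pc), drop = FALSE], 2,
                  pca$sdev[seq_len(pc)], "/")
  d <- as.matrix(dist(scores))

  # Start with the two most distant samples
  selected <- as.vector(which(d == max(d), arr.ind = TRUE)[1, ])
  while (length(selected) < k) {
    remaining <- setdiff(seq_len(nrow(d)), selected)
    # Distance of each remaining sample to its nearest selected sample
    min_dist <- apply(d[remaining, selected, drop = FALSE], 1, min)
    # Add the sample that is farthest from the already selected set
    selected <- c(selected, remaining[which.max(min_dist)])
  }
  selected
}

set.seed(42)
spectra <- matrix(rnorm(30 * 10), nrow = 30)  # toy stand-in for spectra
idx_cal <- kennard_stone_sketch(spectra, k = 20, pc = 2)
idx_val <- setdiff(seq_len(nrow(spectra)), idx_cal)
```

Here the selected samples form the calibration set and the remainder the validation set, which matches the description of split_method above.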
Deprecated; renamed argument is ken_sto_pc.
Logical
Character specifying the tuning method. The tuning method
affects how caret selects a final tuning value set from a list of candidate
values. Possible values are "resampling", which will use a
specified resampling method such as repeated k-fold cross-validation (see
argument resampling_method) and the generated performance profile
based on the hold-out predictions to decide on the final tuning values
that lead to optimal model performance. The value "none" will force
caret to compute a final model for a predefined candidate PLS tuning
parameter (number of PLS components). In this case, the value
supplied by ncomp_fixed is used to set model complexity at
a fixed number of components.
Random seed (integer) that will be used for generating
resampling indices, which will be supplied to caret::trainControl.
This makes sure that modeling results are reproducible when re-fitting.
Default is resampling_seed = 123.
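The effect of a fixed resampling seed can be shown with a small base R sketch. The helper make_fold_indices() is hypothetical and only illustrates the principle; the returned list of training-row indices has the shape that the index argument of caret::trainControl accepts, but how the package builds its indices is not specified here.

```r
# Hypothetical helper: reproducible k-fold training indices from a seed
make_fold_indices <- function(n, k = 10, seed = 123) {
  set.seed(seed)
  # Randomly assign each of the n rows to one of k folds
  fold_id <- sample(rep(seq_len(k), length.out = n))
  # For each fold, keep the rows that are NOT held out (training rows)
  lapply(seq_len(k), function(f) which(fold_id != f))
}

idx_a <- make_fold_indices(n = 50, k = 5, seed = 123)
idx_b <- make_fold_indices(n = 50, k = 5, seed = 123)
same_indices <- identical(idx_a, idx_b)  # same seed, same folds
```

Because the fold assignment depends only on the seed, re-fitting with the same seed reuses the same hold-out sets.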
Deprecated. Use resampling_method instead.
Maximum number of random forest trees used
by caret::train. Caret will aggregate a performance profile using resampling
for an integer sequence from 1 to ntree_max trees.
Logical indicating whether model evaluation graphs should be printed.
Environment where function is evaluated. Default is parent.frame.