R/gather-spc.R
gather_spc.Rd
Gather spectra, corresponding x-axis values, and device and
measurement metadata from a nested list into a spectra tibble, so that one
row represents one spectral measurement. Spectra, x-axis values and metadata
are mapped from the individual list elements (named after the file name, including
the extension) and transformed into (list-)columns of a spectra tibble,
which is an extended data frame. For each measurement, spectral data and
metadata are combined into one row of the tidy data frame. In addition, the ID
columns unique_id, file_id, and sample_id are extracted from the "metadata"
(data frame) list entries and returned as identifier columns of
the spectra tibble. List-columns facilitate keeping related data together in
a rectangular data structure. They can be manipulated easily during
subsequent transformations, for example using the standardized functions of
the simplerspec data processing pipeline.
gather_spc(data, spc_types = "spc")
data: Recursive list whose first-level entries are named with the file name
(file_id). Each element containing a sample measurement has nested
metadata ("metadata"), spectra types (see spc_types), and corresponding
x-axis values (see section "Details on spectra data checks and matching").
The data list is a structural convention to organize spectra and their
metadata. It follows, for example, the list structure returned by the Bruker
OPUS binary reader simplerspec::read_opus_univ().
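The nested list convention described above can be sketched as follows. This is a minimal illustration in base R, not output of simplerspec::read_opus_univ(): the helper make_measurement(), the file names, the unique_id format, the numeric values, and the choice of a one-row matrix for "spc" are all assumptions made for the example.

```r
# Hypothetical helper that builds one measurement element (second list level).
# All values are made up; only the element names follow the documented convention.
make_measurement <- function(file_id, sample_id) {
  wavenumbers <- c(3998, 3996, 3994)
  list(
    # "metadata" is a data frame holding the three ID columns
    metadata = data.frame(
      unique_id = paste0(file_id, "_2020-01-01 10:00:00"),
      file_id = file_id,
      sample_id = sample_id,
      stringsAsFactors = FALSE
    ),
    # Spectrum type "spc", stored here as a one-row matrix (assumption)
    spc = matrix(
      c(0.11, 0.12, 0.13),
      nrow = 1,
      dimnames = list(file_id, as.character(wavenumbers))
    ),
    # X-axis values matching "spc"
    wavenumbers = wavenumbers
  )
}

# First list level: one element per file, named after the file name
data_sketch <- list(
  "soil_a.0" = make_measurement("soil_a.0", "soil_a"),
  "soil_a.1" = make_measurement("soil_a.1", "soil_a")
)

# Inspect the two-level structure
str(data_sketch, max.level = 2)
```

A list of this shape is what the data argument is expected to look like; real input would normally come from a reader function rather than being built by hand.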
spc_types: Character vector with the spectra types to be extracted
from the data list and gathered into list-columns. The spectra type names need
to follow the naming conventions exactly, and the element names and contents
need to be present at the second level of the data list hierarchy. These
values are allowed:
"spc" (default): final raw spectra after atmospheric compensation, if
performed (named AB in Bruker OPUS software; results from referencing
sample to reference single channel reflectance and transforming to
absorbance).
"spc_nocomp": raw spectra without atmospheric correction.
"sc_sm": single channel reflectance spectra of the samples.
"sc_rf": single channel reflectance spectra of the reference (background
spectra).
"ig_sm": interferograms of the sample spectra (currently only spectra
without x-axis list-columns are matched and returned).
"ig_rf": interferograms of the reference spectra (currently only spectra
without x-axis list-columns are matched and returned).
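As an illustration of how the allowed values constrain the spc_types argument, the sketch below validates a character vector against the six documented names. The helper check_spc_types() and the vector allowed_spc_types are hypothetical and not part of simplerspec; whether gather_spc() itself stops with an error or handles unsupported types differently is not stated here, so the error is an assumption of the sketch.

```r
# The six spectra type names listed above
allowed_spc_types <- c("spc", "spc_nocomp", "sc_sm", "sc_rf", "ig_sm", "ig_rf")

# Hypothetical validator: returns spc_types unchanged if all entries are
# allowed, otherwise stops and names the unsupported entries.
check_spc_types <- function(spc_types = "spc") {
  unsupported <- setdiff(spc_types, allowed_spc_types)
  if (length(unsupported) > 0) {
    stop(
      "Unsupported spc_types: ", paste(unsupported, collapse = ", "),
      call. = FALSE
    )
  }
  spc_types
}

# Passes: both names are in the allowed set
ok_types <- check_spc_types(c("spc", "sc_sm"))

# Fails: names must match exactly, so "absorbance" is rejected
bad_result <- tryCatch(
  check_spc_types("absorbance"),
  error = function(e) conditionMessage(e)
)
```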
Spectra tibble (spc_tbl with classes "tbl_df", "tbl", and "data.frame")
with the following (list-)columns:
"unique_id": Character vector with unique measurement identifier, likely
a string with file names in combination with date and time (extracted from
each "metadata" data frame column).
"file_id": Character vector with file name including the extension
(extracted from each "metadata" data frame column).
"sample_id": Character vector with sample identifier. For Bruker OPUS
binary files, this corresponds to the file name without the file extension;
the extension is an integer that increments with each replicate measurement
of a sample.
One or multiple of "spc", "spc_nocomp", "sc_sm", or "sc_rf":
List(s) of data.tables containing spectra type(s).
One or multiple of "wavenumbers", "wavelengths", "x_values",
"wavenumbers_sc_sm", "wavelengths_sc_sm", "x_values_sc_sm",
"wavenumbers_sc_rf", "wavelengths_sc_rf", or "x_values_sc_rf":
List(s) of numeric vectors with matched x-axis values (see "Details on
spectra data checks and matching" below).
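To make the shape of the return value more concrete, the following sketch builds a small data frame with identifier columns and two list-columns by hand. It is not produced by gather_spc(): the name spc_tbl_sketch and all values are invented, a plain base R data frame stands in for the tibble, and one-row matrices stand in for the data.tables so that the example needs no packages.

```r
# Identifier columns: one row per spectral measurement
spc_tbl_sketch <- data.frame(
  unique_id = c("soil_a.0_2020-01-01 10:00:00", "soil_a.1_2020-01-01 10:05:00"),
  file_id = c("soil_a.0", "soil_a.1"),
  sample_id = c("soil_a", "soil_a"),
  stringsAsFactors = FALSE
)

# List-column with the spectra (matrices here; an assumption for the sketch)
spc_tbl_sketch$spc <- list(
  matrix(c(0.11, 0.12, 0.13), nrow = 1),
  matrix(c(0.21, 0.22, 0.23), nrow = 1)
)

# List-column with the matched x-axis values
spc_tbl_sketch$wavenumbers <- list(
  c(3998, 3996, 3994),
  c(3998, 3996, 3994)
)

# Each row keeps a measurement's spectrum and x-axis together;
# list-columns can be processed element by element.
n_points <- vapply(spc_tbl_sketch$spc, ncol, integer(1))
first_x <- spc_tbl_sketch$wavenumbers[[1]]
```

Because related objects share a row, filtering or reordering rows keeps spectra, x-axis values, and identifiers aligned.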
gather_spc() checks whether these conditions are met for each measurement
in the list data:
Make sure that the first-level data elements are named (assumed to be
the file name the data originate from), and remove missing measurements with
an informative message.
Remove elements with duplicated file names and issue a message if there are
name duplicates at the first level.
Check whether the spc_types inputs are supported (see argument spc_types)
and present at the second level of the data list. If not, remove
all data elements for incomplete spectral measurements.
Match spectra types and possible corresponding x-axis types from
a lookup list. For each selected spectrum type (left), at least one of
the element names of the x-axis type (right) needs to be present for each
measurement in the list data:
  "spc": "wavenumbers", "wavelengths", or "x_values"
  "spc_nocomp": "wavenumbers", "wavelengths", or "x_values"
  "sc_sm": "wavenumbers_sc_sm", "wavelengths_sc_sm", or "x_values_sc_sm"
  "sc_rf": "wavenumbers_sc_rf", "wavelengths_sc_rf", or "x_values_sc_rf"
Check if "metadata" elements are present and remove data elements for
measurements with missing or incorrectly named metadata elements
(message).
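The checks above can be approximated in a few lines of base R. This is a simplified sketch under stated assumptions, not the implementation used by gather_spc(): the names x_axis_lookup, is_complete(), and filter_measurements() are hypothetical, the message wording is invented, and interferogram types are treated as needing no x-axis element because they have no entry in the lookup list.

```r
# Lookup list: spectrum type (name) and accepted x-axis element names (values)
x_axis_lookup <- list(
  spc        = c("wavenumbers", "wavelengths", "x_values"),
  spc_nocomp = c("wavenumbers", "wavelengths", "x_values"),
  sc_sm      = c("wavenumbers_sc_sm", "wavelengths_sc_sm", "x_values_sc_sm"),
  sc_rf      = c("wavenumbers_sc_rf", "wavelengths_sc_rf", "x_values_sc_rf")
)

# TRUE if one measurement has all requested spectra types, at least one
# matching x-axis element per type, and a "metadata" element.
is_complete <- function(measurement, spc_types) {
  nms <- names(measurement)
  has_types <- all(spc_types %in% nms)
  has_x <- all(vapply(spc_types, function(type) {
    candidates <- x_axis_lookup[[type]]
    # Types without a lookup entry (interferograms) need no x-axis here
    is.null(candidates) || any(candidates %in% nms)
  }, logical(1)))
  has_types && has_x && "metadata" %in% nms
}

# Drop duplicated names, then drop incomplete measurements, with messages.
filter_measurements <- function(data, spc_types = "spc") {
  stopifnot(!is.null(names(data)))
  is_dup <- duplicated(names(data))
  if (any(is_dup)) {
    message("Removed duplicated names: ",
            paste(unique(names(data)[is_dup]), collapse = ", "))
  }
  data <- data[!is_dup]
  keep <- vapply(data, is_complete, logical(1), spc_types = spc_types)
  if (any(!keep)) {
    message("Removed incomplete measurements: ",
            paste(names(data)[!keep], collapse = ", "))
  }
  data[keep]
}

# Made-up input: a duplicate, a missing "metadata", and a missing x-axis
example_data <- list(
  "a.0" = list(metadata = data.frame(file_id = "a.0"), spc = 1, wavenumbers = 1),
  "a.0" = list(metadata = data.frame(file_id = "a.0"), spc = 1, wavenumbers = 1),
  "b.0" = list(spc = 1, wavenumbers = 1),
  "c.0" = list(metadata = data.frame(file_id = "c.0"), spc = 1)
)
kept <- filter_measurements(example_data)
```

In this example only the first "a.0" element passes all checks; the others are dropped with a message naming them.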