Gather spectra, corresponding x-axis values, and device and measurement metadata from a nested list into a spectra tibble, so that one row represents one spectral measurement. Spectra, x-axis values, and metadata are mapped from the individual list elements (named by file name, including the extension) and transformed into (list-)columns of a spectra tibble, which is an extended data frame. For each measurement, spectral data and metadata are combined into one row of this tidy data frame. In addition, the ID columns unique_id, file_id, and sample_id are extracted from the "metadata" data frame entries of the list and returned as identifier columns of the spectra tibble. List-columns keep related data together in a rectangular data structure. They can easily be manipulated in subsequent transformations, for example with the standardized functions of the simplerspec data processing pipeline.

gather_spc(data, spc_types = "spc")

Arguments

data

Recursive list whose first-level elements are named by file name (file_id). Each element containing a sample measurement holds nested metadata ("metadata"), one or more spectra types (see spc_types), and the corresponding x-axis values (see section "Details on spectra data checks and matching"). The data list is a structural convention for organizing spectra and their metadata. It follows, for example, the list structure returned by the Bruker OPUS binary reader simplerspec::read_opus_univ().
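To make the expected layout concrete, the following base R sketch builds a small mock data list by hand. The helper make_measurement(), the file names, the unique_id format, and all numeric values are made up for illustration; real lists from read_opus_univ() contain more elements, and simplerspec stores spectra as data.tables, whereas plain data frames are used here to avoid extra dependencies.

```r
# Hypothetical mock of the nested list layout (made-up values, not real OPUS
# data). Plain data frames stand in for the data.tables used by simplerspec.
wn <- c(4000, 3998, 3996)  # made-up wavenumbers

make_measurement <- function(file_id, absorbance) {
  list(
    # One-row metadata data frame holding the ID columns
    metadata = data.frame(
      unique_id = paste0(file_id, "_2020-01-01_12:00:00"),  # made-up format
      file_id = file_id,
      sample_id = sub("\\.[0-9]+$", "", file_id),  # drop integer extension
      stringsAsFactors = FALSE
    ),
    # One-row spectrum with one column per x-axis value
    spc = as.data.frame(
      matrix(absorbance, nrow = 1, dimnames = list(NULL, as.character(wn)))
    ),
    # x-axis values matching the spectrum columns
    wavenumbers = wn
  )
}

# First level: one element per file, named by file name including extension
data <- list(
  "soil_A.0" = make_measurement("soil_A.0", c(0.51, 0.52, 0.50)),
  "soil_A.1" = make_measurement("soil_A.1", c(0.49, 0.50, 0.48))
)

names(data)       # "soil_A.0" "soil_A.1"
names(data[[1]])  # "metadata" "spc" "wavenumbers"
```

The second-level names ("metadata", "spc", "wavenumbers") are the ones that gather_spc() looks up, so they must match the naming conventions exactly.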

spc_types

Character vector with the spectra types to be extracted from the data list and gathered into list-columns. The spectra type names must exactly follow the naming conventions below, and elements with these names and contents must be present at the second level of the data list. These values are allowed:

  • "spc" (default): Final raw spectra after atmospheric compensation, if performed (named AB in Bruker OPUS software; obtained by referencing the sample single channel reflectance to the reference single channel reflectance and transforming the result to absorbance).

  • "spc_nocomp": Raw spectra without atmospheric compensation

  • "sc_sm": Single channel reflectance spectra of the samples

  • "sc_rf": Single channel reflectance spectra of the reference (background spectra)

  • "ig_sm": Interferograms of the sample spectra (currently matched and returned without x-axis list-columns)

  • "ig_rf": Interferograms of the reference spectra (currently matched and returned without x-axis list-columns)
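Because the spectra type names must match exactly, a typo in spc_types is a common source of errors. The base R sketch below validates a requested vector against the allowed values listed above. The helper check_spc_types() is hypothetical and is not part of simplerspec; it only illustrates the exact-match requirement.

```r
# Hypothetical helper (not part of simplerspec): validate requested spectra
# types against the allowed values listed above.
allowed_spc_types <- c("spc", "spc_nocomp", "sc_sm", "sc_rf", "ig_sm", "ig_rf")

check_spc_types <- function(spc_types) {
  # Names must match exactly, so setdiff() reveals unsupported entries
  bad <- setdiff(spc_types, allowed_spc_types)
  if (length(bad) > 0) {
    stop("Unsupported spc_types: ", paste(bad, collapse = ", "), call. = FALSE)
  }
  invisible(spc_types)
}

check_spc_types(c("spc", "sc_sm"))  # passes silently
try(check_spc_types("absorbance"))  # signals an error naming "absorbance"
```

With simplerspec installed, requesting several types at once follows the documented signature, for example gather_spc(data, spc_types = c("spc", "sc_sm", "sc_rf")).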

Value

Spectra tibble (spc_tbl with classes "tbl_df", "tbl", and "data.frame") with the following (list-)columns:

  • "unique_id": Character vector with unique measurement identifiers, typically strings combining the file name with the measurement date and time (extracted from the corresponding column of each "metadata" data frame).

  • "file_id": Character vector with file names including the extension (extracted from the corresponding column of each "metadata" data frame).

  • "sample_id": Character vector with sample identifiers. For Bruker OPUS binary files, this corresponds to the file name without the file extension; OPUS uses integer extensions that increment with each replicate measurement of a sample.

  • One or more of "spc", "spc_nocomp", "sc_sm", or "sc_rf": List-column(s) of data.tables containing the spectra of the given type(s).

  • One or more of "wavenumbers", "wavelengths", "x_values", "wavenumbers_sc_sm", "wavelengths_sc_sm", "x_values_sc_sm", "wavenumbers_sc_rf", "wavelengths_sc_rf", or "x_values_sc_rf": List-column(s) of numeric vectors with the matched x-axis values (see "Details on spectra data checks and matching" below).
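The shape of the returned object can be sketched in base R: ID columns come from row-binding the one-row "metadata" data frames, and spectra and x-axis values become list-columns with one element per measurement. The helper gather_sketch() and the mock input are hypothetical, and the sketch returns a plain data frame rather than a tibble; it illustrates the one-row-per-measurement layout, not the simplerspec implementation.

```r
# Minimal base-R sketch of the gathering idea (hypothetical helper
# gather_sketch(); not the simplerspec implementation). Mock values are made up.
wn <- c(4000, 3998, 3996)
mock <- function(fid, ab) {
  list(
    metadata = data.frame(unique_id = paste0(fid, "_t1"), file_id = fid,
                          sample_id = sub("\\.[0-9]+$", "", fid),
                          stringsAsFactors = FALSE),
    spc = as.data.frame(matrix(ab, nrow = 1,
                               dimnames = list(NULL, as.character(wn)))),
    wavenumbers = wn
  )
}
data <- list("soil_A.0" = mock("soil_A.0", c(0.51, 0.52, 0.50)),
             "soil_B.0" = mock("soil_B.0", c(0.61, 0.60, 0.59)))

gather_sketch <- function(data, spc_type = "spc", x_type = "wavenumbers") {
  id_cols <- c("unique_id", "file_id", "sample_id")
  # Row-bind the one-row metadata data frames into the ID columns
  out <- do.call(rbind, lapply(data, function(m) m$metadata[, id_cols]))
  rownames(out) <- NULL
  # I() keeps each list intact as a list-column (one element per row)
  out[[spc_type]] <- I(unname(lapply(data, `[[`, spc_type)))
  out[[x_type]] <- I(unname(lapply(data, `[[`, x_type)))
  out
}

spc_tbl <- gather_sketch(data)
spc_tbl[, c("file_id", "sample_id")]
```

Each cell of spc_tbl$spc holds a whole one-row spectrum, and the matching cell of spc_tbl$wavenumbers holds its x-axis values, so the two stay aligned when rows are filtered or reordered.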

Details on spectra data checks and matching

gather_spc() performs the following checks for each measurement in the data list:

  1. Make sure that the first-level elements of data are named (the names are assumed to be the file names the data originate from), and remove missing measurements with an informative message.

  2. Remove elements with duplicated file names at the first level, and raise a message if any duplicates are found.

  3. Check whether the spc_types inputs are supported (see argument spc_types) and present at the second level of the data list. If not, remove the data elements of incomplete spectral measurements.

  4. Match spectra types to their possible corresponding x-axis types using a lookup list. For each selected spectrum type (left), at least one of the listed x-axis element names (right) must be present for each measurement in data:

    • "spc" : "wavenumbers", "wavelengths", or "x_values"

    • "spc_nocomp" : "wavenumbers", "wavelengths", or "x_values"

    • "sc_sm" : "wavenumbers_sc_sm", "wavelengths_sc_sm", or "x_values_sc_sm"

    • "sc_rf" : "wavenumbers_sc_rf", "wavelengths_sc_rf", or "x_values_sc_rf"

  5. Check whether "metadata" elements are present, and remove data elements for measurements with missing or incorrectly named metadata elements (with a message).
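The five checks can be sketched in base R as a single filtering function over a mock data list. The helper check_measurements(), the lookup object x_lookup, the message texts, and the mock measurements are all hypothetical. The sketch also assumes that "missing" measurements are NULL elements and that unsupported spc_types stop with an error. It mirrors the documented rules and is not the simplerspec source.

```r
# Hypothetical sketch (not the simplerspec source) of checks 1 to 5 above,
# applied to made-up mock data. Assumes missing measurements are NULL entries.
x_lookup <- list(
  spc        = c("wavenumbers", "wavelengths", "x_values"),
  spc_nocomp = c("wavenumbers", "wavelengths", "x_values"),
  sc_sm      = c("wavenumbers_sc_sm", "wavelengths_sc_sm", "x_values_sc_sm"),
  sc_rf      = c("wavenumbers_sc_rf", "wavelengths_sc_rf", "x_values_sc_rf")
)
supported <- c(names(x_lookup), "ig_sm", "ig_rf")

check_measurements <- function(data, spc_types = "spc") {
  # Step 1: first-level elements must be named; drop missing (NULL) entries
  stopifnot(!is.null(names(data)), all(nzchar(names(data))))
  is_missing <- vapply(data, is.null, logical(1))
  if (any(is_missing)) {
    message("Removing missing measurements: ",
            paste(names(data)[is_missing], collapse = ", "))
  }
  data <- data[!is_missing]
  # Step 2: drop elements with duplicated file names (keep first occurrence)
  is_dup <- duplicated(names(data))
  if (any(is_dup)) {
    message("Removing duplicated file names: ",
            paste(unique(names(data)[is_dup]), collapse = ", "))
  }
  data <- data[!is_dup]
  # Step 3 (assumption): unsupported types stop with an error
  stopifnot(all(spc_types %in% supported))
  # Steps 3 to 5 per measurement: spectra present, at least one matching
  # x-axis element (interferograms have no lookup entry), metadata present
  keep <- vapply(data, function(m) {
    has_spc <- all(spc_types %in% names(m))
    has_x <- all(vapply(spc_types, function(s) {
      is.null(x_lookup[[s]]) || any(x_lookup[[s]] %in% names(m))
    }, logical(1)))
    has_spc && has_x && "metadata" %in% names(m)
  }, logical(1))
  if (any(!keep)) {
    message("Removing incomplete measurements: ",
            paste(names(data)[!keep], collapse = ", "))
  }
  data[keep]
}

# Made-up measurements: complete, duplicated, no metadata, no x-axis, NULL
ok <- list(metadata = data.frame(file_id = "a.0"), spc = data.frame(x = 1),
           wavenumbers = 4000)
data <- list("a.0" = ok, "a.0" = ok,
             "b.0" = list(spc = data.frame(x = 1), wavenumbers = 4000),
             "c.0" = list(metadata = data.frame(file_id = "c.0"),
                          spc = data.frame(x = 1)),
             "d.0" = NULL)
checked <- check_measurements(data)
names(checked)  # "a.0"
```

Filtering the list before gathering means every remaining element can be turned into a complete row, so the later list-column construction does not need to handle gaps.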