The gapctd package processes CTD data collected during Alaska Fisheries Science Center bottom trawl surveys. The entire data processing workflow is run in R, aside from optional decoding of raw data files in SBE Data Processing software. The gapctd package uses the oce package (Kelley et al., 2022) for handling CTD data. Data are processed using R versions of modules from SBE Data Processing software and methods for optimizing module parameters that have been applied to underway CTD and glider data (Garau et al., 2011; Ullman and Hebert, 2014).
The data processing workflow described below was developed for processing data from Sea-Bird SBE19plus V2 CTDs with induction pumps that are deployed on bottom-trawl survey gear in the eastern Bering Sea, Gulf of Alaska, and Aleutian Islands.
Install SBE Data Processing software (available from the manufacturer) and the gapctd package.
devtools::install_github("afsc-gap-products/gapctd")
Load the gapctd package. Assign vessel, cruise, and region (BS, GOA, or AI) variables and set the source directory where the CTD .hex files and .xmlcon file are stored.
library(gapctd)
# Select vessel, cruise, and region
vessel <- 94
cruise <- c(202101, 202202)
region <- "BS"
ctd_dir <- "G:/RACE_CTD/data/2021/ebs/v94_ctd1" # Directory w/ CTD data (.hex) and config (.xmlcon)
processing_method <- "gapctd"
The setup_gapctd_directory function sets up the working directory for processing data from a single CTD, vessel, and cruise. Raw CTD data files (.hex) and configuration files (.xmlcon) are copied from the source directory, ctd_dir, to the /data/ and /psa_xmlcon/ directories. If use_sbedp_to_convert = TRUE, system commands are executed to run SBE Data Processing (SBEDP) to convert .hex files to human-readable decoded CTD data files (.cnv) that are saved in the /cnv/ directory. If use_sbedp_to_convert = FALSE, files are decoded using gapctd::hex_to_cnv(). SBEDP is slightly faster than the gapctd method, but the gapctd method avoids the need to install SBEDP software.
Note: A new directory/project will need to be set up for each cruise/vessel/CTD combination.
gapctd:::setup_gapctd_directory(processing_method = processing_method,
                                ctd_dir = ctd_dir,
                                use_sbedp_to_convert = FALSE)
Left: SBE Data Processing converting raw data files after calling setup_gapctd_directory(). Right: Contents of the working directory after successfully running setup_gapctd_directory().
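The copy step described above can be pictured with a minimal base R sketch. This is not the gapctd implementation: copy_ctd_files is a hypothetical helper, the source files are empty placeholders created in a temporary directory, and only the /data/ and /psa_xmlcon/ directory names are taken from the text.

```r
# Hypothetical helper (not part of gapctd): copy raw CTD files into a
# project layout with /data/ and /psa_xmlcon/ subdirectories.
copy_ctd_files <- function(ctd_dir, project_dir) {
  data_dir <- file.path(project_dir, "data")
  xmlcon_dir <- file.path(project_dir, "psa_xmlcon")
  dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
  dir.create(xmlcon_dir, recursive = TRUE, showWarnings = FALSE)

  hex_files <- list.files(ctd_dir, pattern = "\\.hex$",
                          full.names = TRUE, ignore.case = TRUE)
  xmlcon_files <- list.files(ctd_dir, pattern = "\\.xmlcon$",
                             full.names = TRUE, ignore.case = TRUE)

  # Stop early if the source directory lacks the inputs needed for processing
  if (length(hex_files) == 0) stop("No .hex files found in ", ctd_dir)
  if (length(xmlcon_files) == 0) stop("No .xmlcon file found in ", ctd_dir)

  file.copy(hex_files, data_dir)
  file.copy(xmlcon_files, xmlcon_dir)

  invisible(list(hex = basename(hex_files), xmlcon = basename(xmlcon_files)))
}

# Usage with placeholder files in a temporary directory
src <- file.path(tempdir(), "ctd_source")
dir.create(src, showWarnings = FALSE)
file.create(file.path(src, c("cast_001.hex", "cast_002.hex", "config.xmlcon")))

proj <- file.path(tempdir(), "ctd_project")
copied <- copy_ctd_files(src, proj)
list.files(proj, recursive = TRUE)
```

Checking for missing inputs before copying is a design choice of this sketch; it surfaces a wrong ctd_dir path immediately rather than partway through processing.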
The wrapper_run_gapctd function runs the run_gapctd function on each CTD file in the working directory. The run_gapctd function processes data from a single CTD data file (.cnv) using the processing steps.
Note: wrapper_run_gapctd takes 8+ hours to run for a full vessel/cruise on a typical NOAA laptop.
# Run data processing algorithm on files. Write .rds
gapctd:::wrapper_run_gapctd(cnv_dir_path = here::here("cnv"), # Path to decoded CTD data (.cnv) files
                            processing_method = processing_method, # Processing method
                            vessel = vessel,
                            cruise = cruise,
                            channel = NULL)
In the code above, user-specified conductivity cell thermal mass correction parameters are provided to the function as: ctm_pars = list(alpha_C = 0.04, beta_C = 1/8). If profile data from the deployment do not pass QA/QC checks, these parameters are optimized in subsequent steps.
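To show how the two thermal mass parameters act on the data, here is a minimal sketch of the recursive filter form described for the Cell Thermal Mass module in the SBE Data Processing manual. It is not the gapctd implementation. The helper name cell_thermal_mass is hypothetical, the mapping of alpha_C and beta_C onto the filter's alpha and beta is an assumption, and the 0.25 s sample interval (4 Hz sampling) is an assumed default. Conductivity is taken to be in S/m and temperature in degrees Celsius.

```r
# Hypothetical helper (not the gapctd implementation): recursive filter
# for conductivity cell thermal mass correction.
# Assumptions: conductivity in S/m, temperature in deg C, and a fixed
# sample interval in seconds (0.25 s corresponds to 4 Hz sampling).
cell_thermal_mass <- function(conductivity, temperature,
                              alpha = 0.04, beta = 1 / 8,
                              sample_interval = 0.25) {
  stopifnot(length(conductivity) == length(temperature))
  n <- length(temperature)

  # Filter coefficients derived from alpha, beta, and the sample interval
  a <- 2 * alpha / (sample_interval * beta + 2)
  b <- 1 - 2 * a / alpha

  # Sensitivity of conductivity to temperature at each scan
  dc_dt <- 0.1 * (1 + 0.006 * (temperature - 20))

  # The correction starts at zero and is updated scan by scan
  ctm <- numeric(n)
  for (i in seq_len(n)[-1]) {
    d_temp <- temperature[i] - temperature[i - 1]
    ctm[i] <- -b * ctm[i - 1] + a * dc_dt[i] * d_temp
  }

  conductivity + ctm
}

# Usage: a 1 deg C step in temperature produces a correction that decays
# over the following scans
temperature <- c(5, 5, 6, 6, 6, 6)
conductivity <- rep(3, 6)
cell_thermal_mass(conductivity, temperature)
```

In this form, alpha sets the initial amplitude of the correction after a temperature change and beta sets how quickly the correction decays, which is why both are candidates for optimization when the default values leave visible errors.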
Outputs from wrapper_run_gapctd are stored in oce objects that are saved in R data (.rds) files in /output/gapctd/. The files include up to three segments for each deployment (downcast, bottom, upcast). Haul metadata are included with each of the segments. If any segment is missing, it will not be included in the file (e.g., if the CTD shut off during the deployment and there are no upcast data, there will not be an upcast in the object). The end of the filename denotes which of the four processing methods was used for the data.
Top: Console messages while running wrapper_run_gapctd(). Bottom: Contents of the /output/gapctd/ directory after successfully running wrapper_run_gapctd().
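Because a segment may be absent from an output file, it can help to check what a file contains before working with it. The sketch below uses a mock deployment built from plain data frames rather than real oce objects, and it assumes the segments are stored as list elements named downcast, bottom, and upcast. That naming is an assumption for illustration and should be checked against an actual gapctd output file.

```r
# Mock deployment for illustration only. Real gapctd output files hold
# oce objects with haul metadata; the element names are an assumption.
deployment <- list(
  downcast = data.frame(pressure = 1:3, temperature = c(8.1, 7.9, 7.5)),
  bottom = data.frame(pressure = 3, temperature = 7.5)
  # No upcast element, as if the CTD shut off before retrieval
)

rds_path <- tempfile(fileext = ".rds")
saveRDS(deployment, rds_path)

# Read the file back and report which segments are present
dat <- readRDS(rds_path)
expected_segments <- c("downcast", "bottom", "upcast")
available <- expected_segments %in% names(dat)
names(available) <- expected_segments
missing_segments <- expected_segments[!available]

available
missing_segments
```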
Different data processing methods are used for different casts due to variation in operating conditions among profiles. Visually review results of casts that were processed using each of the four methods and select the best method by typing the corresponding number into the console. Plots are displayed using interactive plotly graphs that allow zooming in/out, hiding layers, etc.
gapctd::select_best_method(
rds_dir_path = here::here("output", processing_method))
Plotly interface for select_best_method() showing data processed using four methods (Typical, Typical CTM, Temperature-Salinity Area, Minimal Salinity Gradient).
Console prompt for select_best_method() input, where inputs correspond with method numbers identified on the plot.
IMPORTANT: For this step, your display/GUI must be set to Actual Size in RStudio! Use the drop-down menu (View > Actual Size) to set your RStudio display to Actual Size.
There are often dynamic errors in profiles after the automated processing steps that typically appear as large salinity spikes. The qc_flag_interpolate function provides a graphical user interface that allows users to inspect plots of profile data and select erroneous data that should be removed and interpolated. The wrapper_flag_interpolate function is a wrapper for qc_flag_interpolate.
# Visually inspect, flag, and interpolate
gapctd:::wrapper_flag_interpolate(rds_dir_path = here::here("output", processing_method),
                                  review = c("density", "salinity"))
There are two sets of plots to review for every cast. Set #1 shows temperature, salinity, and density profiles. Set #2 shows the rate of change in salinity and salinity.
The goal of this step is to flag large transient errors. Pay careful attention to the range on the x-axis when assessing the magnitude of spikes in the data. Flags should not be applied to small errors and/or errors that persist for multiple consecutive depth bins. Correcting small and persistent errors is unnecessary because the other cast from a deployment is often suitable or subsequent corrections may resolve the errors.
1. Review the right panel for density errors.
2. Left-click on any points in the right panel (density) that should be removed and interpolated. Do not select the points if there are errors in salinity that do not produce large errors in density.
3. Press Esc. If any points were selected, conductivity and temperature for the selected pressure bin will be removed and salinity and pressure will be recalculated.
4. Repeat steps 1-3 until there are no more errors to remove.
Set 1 plots: Pressure versus temperature (left), salinity (center), and density (right).
1. Review the panels for salinity errors.
2. Left-click on any points in the left panel (salinity) that should be removed and interpolated.
3. Press Esc. If any points were selected, conductivity and temperature for the selected pressure bin will be removed and salinity and pressure will be recalculated.
4. Repeat steps 1-3 until there are no more errors to remove.
Set 2 plots: Rate of change in salinity (left) and salinity (right).
Example of selecting and interpolating showing set 1 profiles before (top) and after (bottom) removing data from the shallowest depth bin.
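The remove-and-interpolate operation applied to flagged bins can be sketched in base R. This is not gapctd's qc_flag_interpolate: interpolate_flagged is a hypothetical helper, and linear interpolation against pressure with stats::approx is an assumed choice of method for illustration.

```r
# Hypothetical helper (not gapctd's qc_flag_interpolate): drop flagged
# bins and fill them by linear interpolation against pressure.
interpolate_flagged <- function(pressure, values, flagged) {
  stopifnot(length(pressure) == length(values),
            length(pressure) == length(flagged))
  keep <- !flagged & !is.na(values)
  if (sum(keep) < 2) {
    stop("At least two unflagged values are needed to interpolate")
  }
  # rule = 2 carries the nearest retained value outward when a flagged
  # bin sits at the top or bottom of the profile
  stats::approx(x = pressure[keep], y = values[keep],
                xout = pressure, rule = 2)$y
}

# Usage: bin 4 holds a large transient spike in temperature
pressure <- 1:6
temperature <- c(8.0, 7.8, 7.6, 9.9, 7.2, 7.0)
flagged <- pressure == 4
interpolate_flagged(pressure, temperature, flagged)
```

The same function would be applied to conductivity for the flagged bin, after which derived variables would be recalculated from the interpolated values.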
The review_profiles
function provides an interface for visually
inspecting profiles for each deployment and selecting the profiles to
include in the final data product. The goal of this step is to select
the profile(s) without dynamic errors (e.g., unreasonable salinity
spikes or density inversions).
# Review profiles
gapctd:::review_profiles(rds_dir_path = here::here("output", processing_method),
                         threshold = -1e-5,
                         in_pattern = "_qc.rds")
Data from each deployment are displayed one at a time. If a downcast and upcast are both available, both will be shown simultaneously. If only one cast is available, only the downcast or upcast will be shown.
Follow instructions in the console to select profiles to use in the final data product: both casts (1), downcast (d), upcast (u), or none (0). If both casts are available from a deployment but neither is selected, data from the deployment will be reprocessed using a different approach to conductivity cell thermal mass correction and profiles will be reviewed again after processing.
Plots: Downcast (top row of profile plots) and upcast (bottom row of profile plots). Left-side panels show temperature (red) and salinity (green) versus depth. Right-side panels show density anomaly (blue) and buoyancy frequency (brown) versus depth. A vertical line on the right panels shows the buoyancy frequency threshold below which the density structure of the water column could be considered unstable. R Console output: User interface for selecting casts.
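The stability check behind the threshold line can be illustrated with a simplified calculation. The sketch below is not how review_profiles works internally: flag_unstable is a hypothetical helper, the threshold is assumed to apply to squared buoyancy frequency (N^2, in s^-2), and N^2 is approximated from in situ density differences between adjacent depth bins. The oce package provides swN2() for a fuller calculation.

```r
# Hypothetical helper (not gapctd's review_profiles): approximate N^2
# between adjacent depth bins and flag values below a threshold.
# Assumptions: depth in m (positive downward), density in kg/m^3, and
# in situ density differences used as a simplification.
flag_unstable <- function(depth, density, threshold = -1e-5, g = 9.81) {
  stopifnot(length(depth) == length(density), length(depth) >= 2)

  # With depth positive downward, density increasing with depth gives
  # N^2 > 0 (stable); a density inversion gives N^2 < 0
  n2 <- (g / mean(density)) * diff(density) / diff(depth)

  data.frame(
    mid_depth = depth[-1] - diff(depth) / 2,
    n2 = n2,
    unstable = n2 < threshold
  )
}

# Usage: the profile has a density inversion between 2 m and 3 m
depth <- c(1, 2, 3, 4, 5)
density <- c(1025.0, 1025.2, 1025.1, 1025.3, 1025.4)
flag_unstable(depth, density)
```

A slightly negative threshold, rather than zero, tolerates small inversions that can arise from measurement noise while still flagging larger ones.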
Move all of the accepted profiles from /output/gapctd/ to the /final_cnv/ directory.
# Finalize
finalize_data(rds_dir_path = here::here("output", processing_method))
Final profile data in the /final_cnv/ directory.
Prepare the data product once all data from the survey have been processed, with each vessel/cruise/CTD combination in its own directory.
Garau, B., Ruiz, S., Zhang, W. G., Pascual, A., Heslop, E., Kerfoot, J., & Tintoré, J. (2011). Thermal lag correction on slocum CTD glider data. Journal of Atmospheric and Oceanic Technology, 28(9), 1065–1071. https://doi.org/10.1175/JTECH-D-10-05030.1
Kelley, D. E., Richards, C., & Layton, C. (2022). oce: an R package for Oceanographic Analysis. Journal of Open Source Software, 7(71), 3594. https://doi.org/10.21105/joss.03594
Ullman, D. S., & Hebert, D. (2014). Processing of underway CTD data. Journal of Atmospheric and Oceanic Technology, 31(4), 984–998. https://doi.org/10.1175/JTECH-D-13-00200.1