Batch process CTD data in R

The gapctd package processes CTD data collected during Alaska Fisheries Science Center bottom trawl surveys. The entire data processing workflow is run in R aside from decoding of raw data files in SBE Data Processing Software. The gapctd package uses the oce package (Kelley et al., 2022) for handling CTD data. Data are processed using R versions of modules from SBE Data Processing software and methods for optimizing module parameters that have been applied to underway CTD and glider data (Garau et al., 2011; Ullman and Hebert, 2014).

The data processing workflow described below was developed for processing data from Sea-Bird SBE19plus V2 CTDs with induction pumps that are deployed on bottom-trawl survey gear in the eastern Bering Sea, Gulf of Alaska, and Aleutian Islands.

1. Installation

Install SBE Data Processing software (available from the manufacturer) and the gapctd package.

devtools::install_github("afsc-gap-products/gapctd")

2. Load package, define global variables, and connect to Oracle

Load the gapctd package. Assign vessel, cruise, and region (BS, GOA, or AI) variables and set the source directory where the CTD .hex files and the .xmlcon file are stored.

library(gapctd)

# Select vessel, cruise, and region
vessel <- 94
cruise <- c(202101, 202202)
region <- "BS"
ctd_dir <- "G:/RACE_CTD/data/2021/ebs/v94_ctd1" # Directory w/ CTD data (.hex) and config (.xmlcon)
processing_method <- "gapctd"
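
Before setting up the processing directory, it can help to confirm that the source directory contains the expected raw inputs. The sketch below is a base R illustration only: count_ctd_inputs is a hypothetical helper that is not part of gapctd, and the temporary directory and file names are placeholders standing in for ctd_dir.

```r
# Hypothetical helper (not part of gapctd): count raw CTD inputs in a directory
count_ctd_inputs <- function(dir) {
  hex <- list.files(dir, pattern = "\\.hex$", ignore.case = TRUE)
  xmlcon <- list.files(dir, pattern = "\\.xmlcon$", ignore.case = TRUE)
  c(hex = length(hex), xmlcon = length(xmlcon))
}

# Demonstration on a temporary directory with placeholder files
demo_dir <- file.path(tempdir(), "ctd_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("cast_001.hex", "cast_002.hex", "config.xmlcon")))

n_inputs <- count_ctd_inputs(demo_dir)
print(n_inputs)
```

In practice the helper would be called as count_ctd_inputs(ctd_dir); a count of zero .hex files or zero .xmlcon files suggests the path is wrong.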

3. Set up directory for processing

The setup_gapctd_directory function sets up the working directory for processing data from a single CTD, vessel, and cruise. Raw CTD data files (.hex) and configuration files (.xmlcon) are copied from the source directory, ctd_dir, to the /data/ and /psa_xmlcon/ directories. If use_sbedp_to_convert = TRUE, system commands are executed to run SBE Data Processing (SBEDP), which converts the .hex files to human-readable decoded CTD data files (.cnv) that are saved in the /cnv/ directory. If use_sbedp_to_convert = FALSE, files are decoded using gapctd::hex_to_cnv(). SBEDP is slightly faster than the gapctd method, but the gapctd method avoids the need to install SBEDP.

Note: A new directory/project must be set up for each cruise/vessel/CTD combination.

gapctd:::setup_gapctd_directory(processing_method = processing_method, 
                                ctd_dir = ctd_dir,
                                use_sbedp_to_convert = FALSE)

Left: SBE Data Processing converting raw data files after calling setup_gapctd_directory(). Right: Contents of the working directory after successfully running setup_gapctd_directory().
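
A quick way to confirm the setup step worked is to check that the directories named above exist in the project. This is a minimal base R sketch, not a gapctd function; project_root is a placeholder temporary directory, and only two of the three directories are created so that the check reports a missing one.

```r
# Placeholder project root; in practice this would be the working directory
project_root <- file.path(tempdir(), "gapctd_demo_project")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_root, "cnv"), showWarnings = FALSE)

# Directories described in this section
expected_dirs <- c("data", "psa_xmlcon", "cnv")
dir_status <- dir.exists(file.path(project_root, expected_dirs))
names(dir_status) <- expected_dirs
print(dir_status)
```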

4. Run gapctd processing methods on CTD files

The wrapper_run_gapctd function runs the run_gapctd function on each CTD file in the working directory. The run_gapctd function processes data from a single CTD data file (.cnv) using the processing steps.

Note: wrapper_run_gapctd takes 8+ hours to run for a full vessel/cruise on a typical NOAA laptop.

# Run data processing algorithm on files. Write .rds
gapctd:::wrapper_run_gapctd(cnv_dir_path = here::here("cnv"), # Path to decoded CTD data (.cnv) files
                            processing_method = processing_method, # Processing method
                            vessel = vessel,
                            cruise = cruise,
                            channel = NULL)

User-specified conductivity cell thermal mass correction parameters are provided to the function as ctm_pars = list(alpha_C = 0.04, beta_C = 1/8); the call above does not set this argument explicitly. If profile data from the deployment do not pass QA/QC checks, these parameters are optimized in subsequent steps.

Outputs from wrapper_run_gapctd are stored in oce objects that are saved in R data (.rds) files in /output/gapctd/. The files include three segments for each deployment (downcast, bottom, upcast). Haul metadata are included with each of the segments. If any segment is missing, it will not be included in the file (e.g., if the CTD shut off during the deployment and there are no upcast data, there will be no upcast segment in the object). The end of the filename denotes which of the four processing methods was used for the data.

Top: Console messages while running wrapper_run_gapctd(). Bottom: Contents of the /output/gapctd/ directory after successfully running wrapper_run_gapctd().
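
The check for missing segments can be sketched in base R as below. The layout is an assumption made for illustration: the sketch treats each .rds file as a named list with elements called downcast, bottom, and upcast, and uses small data frames as stand-ins for the oce objects that the real files contain.

```r
# Mock deployment without an upcast; real files hold oce objects, not data frames
mock_deployment <- list(
  downcast = data.frame(pressure = 1:3, temperature = c(8.1, 7.9, 7.5)),
  bottom = data.frame(pressure = c(3, 3), temperature = c(7.5, 7.5))
)
rds_path <- tempfile(fileext = ".rds")
saveRDS(mock_deployment, rds_path)

# Read the file back and report which segments are present
deployment <- readRDS(rds_path)
expected_segments <- c("downcast", "bottom", "upcast")  # assumed element names
present <- expected_segments %in% names(deployment)
names(present) <- expected_segments
print(present)
```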

5. Select best method

Different data processing methods are used for different casts due to variation in operating conditions among profiles. Visually review results of casts that were processed using each of the four methods and select the best method by typing the corresponding number into the console. Plots are displayed using interactive plotly graphs that allow zooming in/out, hiding layers, etc.

gapctd::select_best_method(
  rds_dir_path = here::here("output", processing_method))

Plotly interface for select_best_method() showing data processed using four methods (Typical, Typical CTM, Temperature-Salinity Area, Minimal Salinity Gradient).

Console prompt for select_best_method() input, where inputs correspond with method numbers identified on the plot.

6. Visually inspect, flag, and interpolate bad data

IMPORTANT: For this step, your display/GUI must be set to Actual Size in RStudio! Use the drop-down menu (View > Actual Size) to set your RStudio display to Actual Size.

There are often dynamic errors in profiles after the automated processing steps that typically appear as large salinity spikes. The qc_flag_interpolate function provides a graphical user interface that allows users to inspect plots of profile data and select erroneous data that should be removed and interpolated. The wrapper_flag_interpolate function is a wrapper for qc_flag_interpolate.

# Visually inspect, flag, and interpolate
gapctd:::wrapper_flag_interpolate(rds_dir_path = here::here("output", processing_method),
                                  review = c("density", "salinity"))

There are two sets of plots to review for every cast. Set #1 shows temperature, salinity, and density profiles. Set #2 shows the rate of change in salinity and salinity.

The goal of this step is to flag large transient errors. Pay careful attention to the range on the x-axis when assessing the magnitude of spikes in the data. Flags should not be applied to small errors and/or errors that persist for multiple consecutive depth bins. Correcting small and persistent errors is unnecessary because the other cast from a deployment is often suitable or subsequent corrections may resolve the errors.

Set 1

1. Review the right panel for density errors.
2. Left-click on any points in the right panel (density) that should be removed and interpolated. Do not select the points if there are errors in salinity that do not produce large errors in density.
3. Press Esc. If any points were selected, conductivity and temperature for the selected pressure bin will be removed and salinity and pressure will be recalculated.
4. Repeat steps 1-3 until there are no more errors to remove.

Set 1 plots: Pressure versus temperature (left), salinity (center), and density (right).

Set 2

1. Review the panels for salinity errors.
2. Left-click on any points in the left panel (salinity) that should be removed and interpolated.
3. Press Esc. If any points were selected, conductivity and temperature for the selected pressure bin will be removed and salinity and pressure will be recalculated.
4. Repeat steps 1-3 until there are no more errors to remove.

Set 2 plots: Rate of change in salinity (left) and salinity (right).

Example of selecting and interpolating showing set 1 profiles before (top) and after (bottom) removing data from the shallowest depth bin.
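
The remove-and-interpolate operation applied to flagged pressure bins can be illustrated with base R. This is a sketch under stated assumptions, not the qc_flag_interpolate implementation: it assumes linear interpolation over pressure, uses made-up profile values, and interpolate_flagged is a hypothetical helper.

```r
# Made-up profile with a conductivity spike in pressure bin 3
profile <- data.frame(
  pressure = 1:6,
  temperature = c(8.0, 7.8, 7.6, 7.4, 7.2, 7.0),
  conductivity = c(3.20, 3.19, 3.60, 3.17, 3.16, 3.15)
)
flagged <- profile$pressure %in% 3  # bins the reviewer clicked on

# Hypothetical helper: drop flagged values, then fill them by linear
# interpolation over pressure (rule = 2 holds end values constant)
interpolate_flagged <- function(x, pressure, flagged) {
  x[flagged] <- NA
  approx(x = pressure[!flagged], y = x[!flagged], xout = pressure, rule = 2)$y
}

profile$temperature <- interpolate_flagged(profile$temperature, profile$pressure, flagged)
profile$conductivity <- interpolate_flagged(profile$conductivity, profile$pressure, flagged)
print(profile)
```

Derived variables would then be recalculated from the interpolated temperature and conductivity; that step is omitted here.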

7. Select profiles to include in data product (first round)

The review_profiles function provides an interface for visually inspecting profiles for each deployment and selecting the profiles to include in the final data product. The goal of this step is to select the profile(s) without dynamic errors (e.g., unreasonable salinity spikes or density inversions).

# Review profiles
gapctd:::review_profiles(rds_dir_path = here::here("output", processing_method),
                         threshold = -1e-5, 
                         in_pattern = "_qc.rds")

Data from each deployment are displayed one at a time. If a downcast and upcast are both available, both will be shown simultaneously. If only one cast is available, only the downcast or upcast will be shown.

Follow instructions in the console to select profiles to use in the final data product: both casts (1), downcast (d), upcast (u), or none (0). If both casts are available from a deployment but neither is selected, data from the deployment will be reprocessed using a different approach to conductivity cell thermal mass correction and profiles will be reviewed again after processing.

Plots: Downcast (top row of profile plots) and upcast (bottom row of profile plots). Left-side panels show temperature (red) and salinity (green) versus depth. Right-side panels show density anomaly (blue) and buoyancy frequency (brown) versus depth. A vertical line on the right panels shows the buoyancy frequency threshold below which the density structure of the water column could be considered unstable. R Console output: User interface for selecting casts.
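
To make the stability threshold concrete, the sketch below flags depth intervals where the squared buoyancy frequency falls below the threshold value used in the review_profiles call (-1e-5). It is an illustration only: the finite-difference formula N^2 = (g / rho) * d(rho)/d(depth), the threshold units (s^-2), and the profile values are assumptions, and gapctd may compute buoyancy frequency differently.

```r
g <- 9.81  # gravitational acceleration (m s^-2)

# Made-up profile: density decreases with depth between 2 m and 3 m (inversion)
depth <- c(1, 2, 3, 4, 5)                                    # m, positive down
density <- c(1025.00, 1025.10, 1025.05, 1025.20, 1025.30)    # kg m^-3

# Finite-difference squared buoyancy frequency for each depth interval (s^-2)
n2 <- (g / mean(density)) * diff(density) / diff(depth)

threshold <- -1e-5  # same value as the threshold argument above
unstable <- n2 < threshold
print(data.frame(upper_depth = head(depth, -1), n2 = n2, unstable = unstable))
```

Only the interval between 2 m and 3 m is flagged, because density decreases with depth there.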

8. Finalize data

Move all of the accepted profiles from /output/gapctd/ to the /final_cnv/ directory.

# Finalize
finalize_data(rds_dir_path = here::here("output", processing_method))

Final profile data in the /final_cnv/ directory.

9. Prepare the data product

Prepare the data product once all data from the survey have been processed; the processed data may reside in different directories.

References

Garau, B., Ruiz, S., Zhang, W. G., Pascual, A., Heslop, E., Kerfoot, J., & Tintoré, J. (2011). Thermal lag correction on Slocum CTD glider data. Journal of Atmospheric and Oceanic Technology, 28(9), 1065–1071. https://doi.org/10.1175/JTECH-D-10-05030.1

Kelley, D. E., Richards, C., & Layton, C. (2022). oce: an R package for Oceanographic Analysis. Journal of Open Source Software, 7(71), 3594. https://doi.org/10.21105/joss.03594

Ullman, D. S., & Hebert, D. (2014). Processing of underway CTD data. Journal of Atmospheric and Oceanic Technology, 31(4), 984–998. https://doi.org/10.1175/JTECH-D-13-00200.1
