
MDF/MOSAiC forcing options #500

Draft
wants to merge 50 commits into base: main

Conversation

@davidclemenssewall (Contributor) commented Sep 24, 2024

For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium,
please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers

PR checklist

  • Short (1 sentence) summary of your PR:
    Enable the ingestion of forcing data from the MOSAiC Expedition and provide more flexibility for ice and ocean initialization.
  • Developer(s):
    David Clemens-Sewall
  • Suggest PR reviewers from list in the column to the right.
  • Please copy the PR test results link or provide a summary of testing completed below.
    Icepack base_suite with baseline comparison to main branch:
    215 measured results of 215 total results
    215 of 215 tests PASSED
  • How much do the PR code changes differ from the unmodified code?
    • bit for bit
    • different at roundoff level
    • more substantial
  • Does this PR create or have dependencies on CICE or any other models?
    • Yes
    • No
  • Does this PR add any new test cases?
    • Yes
    • No
      • But we probably should add a standard case for the MOSAiC Expedition, to do so we'll need to add the forcing data to the Icepack standard
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/.)
    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please document the changes in detail, including why the changes are made. This will become part of the PR commit log.
    The overall goal of these options added to the Icepack driver is to make it easier to use Icepack as a standalone tool forced with observations. These changes have been implemented specifically for the MOSAiC Expedition, but my hope is that they will be readily extendable to other field data (or reanalysis, etc.), so that new datasets can simply be used in Icepack rather than requiring code changes for every new forcing dataset.

Observational data presents the following complications:

  1. Missing values. Real observational datasets almost always have missing values (power loss, radiometer dome iced over, etc...). Furthermore, future versions of the observational datasets may fill some of these missing values (e.g., by reprocessing data that previously failed qc).
  2. Variable frequencies. Different data streams may be measured at different sampling frequencies. Data might be measured more often than the model's timestep (thus requiring aggregation) or less often (thus requiring interpolation).
  3. Different start and end time points. Different data streams may have different start or end times, and updated versions of datasets may extend the time domain.
  4. Merged observational datasets are a work-in-progress. The 3 challenges above all touch on the fact that for sea ice field campaigns, it can take years from the end of the campaign until a truly finalized observational dataset is available to force Icepack with (e.g., MOSAiC is 4 years and counting, SHEBA still does not have such a dataset). We still want to be able to use campaign data to support Icepack development during this time, without having to change the Icepack driver every time the forcing dataset is updated.

Additionally, for comparing with field data it is very useful to be able to easily adjust:

  1. The initial state of the sea ice and ocean mixed layer.
  2. Fixed values for the oceanic heat flux convergence (qdp) and sea surface salinity (sss).

To address these challenges the PR provides the following features:

  1. Precalculate the forcing values at each timestep on initialization (in init_forcing). The current paradigm for forcing data in Icepack is that when init_forcing is called, a subroutine for the specific dataset (e.g., atm_CFS) stores the forcing data in the *_data arrays in essentially the same format as the raw data, without timestamp information; it also hardcodes the maximum size of a forcing dataset to 8760. Then, at each timestep, the get_forcing subroutine has a code block for each forcing dataset that contains the timestamp information and interpolates the forcing data to the given timestep. Under this paradigm, I could not find a graceful way to handle missing values, forcing data with a higher sampling frequency than the Icepack timestep, or changing start times of the forcing dataset. This PR adds a new option, precalc_forc, for use with the MOSAiC forcing. When precalc_forc = .true., the forcing data are aggregated/interpolated to the Icepack timestep in the call to init_forcing and stored in the *_data arrays; the get_forcing subroutine then merely indexes into the *_data arrays.
  2. Read forcing data from MDF-formatted netCDF files (Uttal et al. 2024; https://doi.org/10.5194/gmd-17-5225-2024, https://gitlab.com/mdf-makers/mdf-toolkit). I chose this format because its framework seemed well-aligned, but it is still in development so we can add variables to it that we need for the ice model (e.g., mixed layer depth).
  3. To handle variable frequencies and missing data, forcing data are first temporally averaged for each timestep and then interpolated. For a given data variable (var_data), the MOSAiC_average subroutine takes the average of all forcing datapoints within each timestep +- 0.5 dt, excluding missing values, and stores the results in var_data. If there is no valid data within 0.5 dt of a given timestep (e.g., most timesteps if the model timestep is much smaller than the sampling period), then a missing value is placed in var_data(timestep). The MOSAiC_interpolate subroutine then linearly interpolates missing values in var_data.
  4. The option to explicitly set in the namelist the initial ice and snow thicknesses for the slab and ITD grid cells and the SST for the 3 ice grid cells.
  5. The option to explicitly set in the namelist fixed ocean mixed layer conditions qdp_fixed, sss_fixed, and hmix_fixed. These are overwritten by forcing data.
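The average-then-interpolate scheme described in items 1 and 3 can be sketched as follows. This is a minimal Python illustration of the algorithm, not the Fortran in the PR; the function and variable names are made up here, and missing values are represented as None:

```python
def precalc_forcing(t_obs, var_obs, t_model, dt):
    """Bin-average observations to model timesteps, then fill gaps.

    t_obs, var_obs : observation times and values (missing -> None)
    t_model        : model timestep times (same units as t_obs)
    dt             : model timestep length
    """
    # Step 1 (the MOSAiC_average role): average all valid observations
    # within +/- 0.5*dt of each model timestep; empty bins stay None.
    var_data = []
    for t in t_model:
        vals = [v for to, v in zip(t_obs, var_obs)
                if v is not None and t - 0.5 * dt <= to < t + 0.5 * dt]
        var_data.append(sum(vals) / len(vals) if vals else None)

    # Step 2 (the MOSAiC_interpolate role): linearly interpolate bins
    # that received no valid data, between the nearest valid bins.
    known = [i for i, v in enumerate(var_data) if v is not None]
    for i in range(len(var_data)):
        if var_data[i] is None:
            lo = max((k for k in known if k < i), default=None)
            hi = min((k for k in known if k > i), default=None)
            if lo is not None and hi is not None:
                w = (t_model[i] - t_model[lo]) / (t_model[hi] - t_model[lo])
                var_data[i] = var_data[lo] + w * (var_data[hi] - var_data[lo])
    return var_data
```

With 1-minute atmospheric data and a larger model timestep, step 1 does the work; with daily ocean data and an hourly timestep, most bins come back empty and step 2 fills them. Endpoints with no bracketing data are simply left missing in this sketch.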

Outstanding issues:

  • Automatically detect cadence of the inputs and adjust accordingly (currently the code requires that the atmospheric forcing is at 1 minute cadence and that the oceanic forcing is at daily cadence)
  • It is not obvious to a user that setting the atmospheric forcing (atm_data_type = 'clim') can overwrite the oceanic heat flux convergence.
  • The atm_MOSAiC and ocn_MOSAiC subroutines are nearly identical except for the data variables. It would probably be cleaner to combine them into a single subroutine.
  • Perhaps input checking should be added to prevent users from setting precalc_forc = .true. with non MOSAiC forcings.
  • Update documentation to include:
    • How driver handles MDF formatted forcing
    • MOSAiC forcing
    • Initial ice and SST conditions in namelist
    • Fixed ocean mixed layer conditions in namelist
  • Add a warning in the log if the MOSAiC time domain includes the leg 4-5 transition (observatory relocation)
  • Add MOSAiC test case
  • Add link to MOSAiC forcing data to wiki
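On the first outstanding item, one lightweight approach (purely a sketch, not code from this PR) would be to infer the cadence from the timestamp vector itself, e.g. as the most common spacing between consecutive records:

```python
from collections import Counter

def detect_cadence(times):
    """Guess a forcing stream's cadence (in the same units as `times`)
    as the most common spacing between consecutive timestamps; taking
    the mode rather than the mean tolerates occasional gaps left by
    missing records."""
    diffs = [b - a for a, b in zip(times, times[1:])]
    return Counter(diffs).most_common(1)[0][0]
```

For example, detect_cadence([0, 60, 120, 180, 300, 360]) returns 60 even though one record between 180 and 300 is missing.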

davidclemenssewall and others added 28 commits May 8, 2023 13:55
Just updating for_igs to consortium main. Only conflict was in icedrv_init use section
@davidclemenssewall (Contributor, Author):

@apcraig When I try running the base_suite on Windows Subsystem for Linux (-m conda -e linux), I fail the run stage for the conda_linux_smoke_col_1x1_alt03_debug_run1year and conda_linux_smoke_col_1x1_alt04_debug_run1year tests due to a floating-point exception. I've copied the relevant sections from the runlog below. Strangely, this error does not arise in the GH macos-testing. It also does not occur when I run these tests with the main branch locally on WSL. So it seems to be some combination of the platform and the code changes, although I cannot see how these code changes in the driver cause this error in the atmospheric stability. I was hoping you might have a suggestion on how I can fix this. Thanks!

Tail from the icepack runlog:

  Initial forcing data year =         2015
  Final   forcing data year =         2015
 Reading /root/icepack-dirs/input/Icepack_data/forcing/SHEBA/open_clos_lindsay.dat                                      


Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f05dd5e951f in ???
#1  0x557a0eb656c2 in compute_stability_parameter
        at /root/code/CICE/Icepack/columnphysics/icepack_atmo.F90:992
#2  0x557a0eb67de2 in atmo_boundary_layer
        at /root/code/CICE/Icepack/columnphysics/icepack_atmo.F90:276
#3  0x557a0eb65d98 in __icepack_atmo_MOD_icepack_atm_boundary
        at /root/code/CICE/Icepack/columnphysics/icepack_atmo.F90:954
#4  0x557a0eaf4c59 in __icedrv_step_MOD_ocean_mixed_layer
        at /root/code/CICE/Icepack/configuration/driver/icedrv_step.F90:1251
#5  0x557a0ea69c25 in coupling_prep
        at /root/code/CICE/Icepack/configuration/driver/icedrv_RunMod.F90:303
#6  0x557a0ea6d57b in __icedrv_runmod_MOD_ice_step
        at /root/code/CICE/Icepack/configuration/driver/icedrv_RunMod.F90:221
#7  0x557a0ea6d6dc in __icedrv_runmod_MOD_icedrv_run
        at /root/code/CICE/Icepack/configuration/driver/icedrv_RunMod.F90:57
#8  0x557a0ea6944d in icedrv
        at /root/code/CICE/Icepack/configuration/driver/icedrv_MAIN.F90:51
#9  0x557a0ea69753 in main
        at /root/code/CICE/Icepack/configuration/driver/icedrv_MAIN.F90:28

@dabail10 (Contributor) commented Oct 9, 2024

Hi all. Let's discuss this at the meeting tomorrow. I understand that it would be nice to have this as part of the Icepack forcing dataset on zenodo. However, the National Science Foundation requires that the datasets be put somewhere like the Arctic Data Center, where they currently reside. This would mean duplicating the data on zenodo. Is there a way to have a "link" from zenodo to the ADC?

@anton-seaice left a comment:

Thanks for doing this David! This will be super useful

As Elizabeth noted, it needs some documentation.

And when adding the link to the forcing data to the wiki, I think the DOI link is best:

e.g. https://doi.org/10.18739/A2GX44W6J

 character(char_len), public :: &
    atm_data_format, & ! 'bin'=binary or 'nc'=netcdf
    ocn_data_format, & ! 'bin'=binary or 'nc'=netcdf
    bgc_data_format, & ! 'bin'=binary or 'nc'=netcdf
-   atm_data_type,   & ! 'default', 'clim', 'CFS'
    ocn_data_type,   & ! 'default', 'SHEBA'
+   atm_data_type,   & ! 'default', 'clim', 'CFS', 'MOSAiC'


I think we could use a more general name than MOSAiC for the new data type - it supports any MDF file, right?

@davidclemenssewall (Contributor, Author):

Thanks for the suggestions Anton! @eclare108213 and @apcraig, what would you think about making the atm_data_type and ocn_data_type for this 'MDF' instead of 'MOSAiC'? Then I would also propose that we make the default path for these forcing data: /icepack-dirs/input/Icepack_data/forcing/MDF.

Contributor:

MDF works for me. But how will you distinguish between MOSAiC MDF data and some other kind of MDF data?

@davidclemenssewall (Contributor, Author):

I believe the only place where it matters to the code that this is MOSAiC MDF data is when we check if the simulation time includes the MOSAiC leg 4 to leg 5 ship repositioning. For that check, we can check whether the forcing filename contains the string 'MOSAiC'.


Gah - that's an annoying quirk. I don't have a straightforward answer to that. This sounds like a different measurement site though? So should the experiment start again at that point with different initial conditions?

Contributor:

Should we have two different MOSAiC options, for instance MOSAIC123 and MOSAIC45, or some other names? These would be two separate forcing/IC options?

@davidclemenssewall (Contributor, Author):

I think that's a good idea. I have not worked much with the leg 5 (August-September near the North Pole) data but I will inquire about how we could have an appropriate initial condition.

Based on this discussion, I think it would be cleanest if the data archive contained separate files for legs 1-4 (October-July along transpolar drift) and leg 5 (August-September near the North Pole). I will inquire within the MOSAiC Model Forcing Dataset working group to see whether others agree and if we can update the data archive.

Comment on lines +190 to +195
! fixed ocean mixed layer properties (are overwritten by forcing data)
real (kind=dbl_kind), public :: &
sss_fixed , & ! Sea surface salinity (PSU)
qdp_fixed , & ! Deep ocean heat flux (negative upward, W/m^2)
hmix_fixed ! Mixed layer depth (m)


My preference would be to do this in a different PR, so it's clear what the scope of the change is (i.e., it's not related to the MOSAiC forcing data).

@davidclemenssewall (Contributor, Author):

I totally agree that this would be best practice. But I admit that I'm loath to do so because the branch I originally made these modifications on is a bit stale and it doesn't feel like a productive use of time to sort through the merge conflicts and do all of the testing if the ultimate outcome for the code is the same. If others feel strongly about it I will try to get to it sometime next week. @apcraig @eclare108213 @dabail10 What do you think?

Contributor:

I don't think it's worth the extra work at this point, to make a separate PR.

Comment on lines 1216 to 1218
call icedrv_system_abort(string=subname//&
' ERROR: only NetCDF input implemented for atm_MOSAiC', &
file=__FILE__,line=__LINE__)


I think it's clearer if you put this at line 1087, if:

if (atm_data_format /= 'nc') then ... abort

@davidclemenssewall (Contributor, Author):

Done. This will be fixed in the next push.


Thanks!

@davidclemenssewall (Contributor, Author) commented Oct 11, 2024

Outstanding issues:

  • Automatically detect cadence of the inputs and adjust accordingly (currently the code requires that the atmospheric forcing is at 1 minute cadence and that the oceanic forcing is at daily cadence)

If this is easy to implement, let's do it. Otherwise I'd rather merge the PR earlier and improve it later.

  • It is not obvious to a user that setting the atmospheric forcing (atm_data_type = 'clim') can overwrite the oceanic heat flux convergence.

This could be noted in the documentation and also as a warning in the diagnostics log file when namelist values are written out.

  • The atm_MOSAiC and ocn_MOSAiC subroutines are nearly identical except for the data variables. It would probably be cleaner to combine them into a single subroutine.

Your choice.

  • Perhaps input checking should be added to prevent users from setting precalc_forc = .true. with non MOSAiC forcings.

Definitely.

I also recommend adding the forcing to our datasets and creating a few options for testing, including one that represents a 'production' configuration. I will still recommend that our codes be used in coupled configurations to capture important feedbacks, but we can adjust our stern warnings in the Icepack docs to acknowledge the MOSAiC configuration as a tool to be used thoughtfully.

Latest commit addresses these with the exception of the warnings, documentation updates, and MOSAiC testing. Working on those currently...

@apcraig (Contributor) commented Oct 13, 2024

Looks like this is going forward nicely. I am mostly comfortable with the proposals thus far. But I think it's a mistake to have a feature that works if "MOSAIC" is in the filename. Please don't implement something like that. We should create a new namelist option if we need to. For now, we can use MDF. If something comes along later that's MDF but not MOSAiC, we'll create a new option for it called "BLAHBLAH" or "BBMDF" or something. Or if what we have now is specific to MOSAiC even though it's MDF, let's call it MOSAIC or MosaicMDF or something. If the implementation is general for MDF, let's use MDF. How general or specific the current implementation is should drive the name we pick for the namelist setting.

@davidclemenssewall (Contributor, Author) replied:


I think that the right approach here is to break the MOSAiC forcing data into separate MDF files for legs 1-4 and leg 5. Then the forcing code can remain totally general for MDF, and we have two different set_env options for initializing the different legs as you suggested.

@davidclemenssewall changed the title from "Mosaic forcing clean" to "MDF/MOSAiC forcing options" on Nov 11, 2024
@xcwang2019:

@davidclemenssewall @eclare108213 @dabail10 Sorry for replying to this PR so late. Our group has constructed atmospheric and oceanic forcing files for Icepack using MOSAiC observations. The hourly forcing file is separated into two periods, and the format is txt. Should I attach these files here, or should I put them on zenodo and provide a link here? These forcing files were used with Icepack 1.1.0 and there is no code change involved. Thanks.

@eclare108213 (Contributor):

@xcwang2019 Let's move discussion of your forcing data to issue #488.
@davidclemenssewall is this PR ready for review or are you still working on it?

@davidclemenssewall (Contributor, Author):

@eclare108213 apologies for the delay here. I am waiting on some additional processed precipitation data that I would like to include in the next version of the MOSAiC forcing data. Once I have the updated dataset released I will request review here.

@davidclemenssewall (Contributor, Author):

Hi all,

This is almost ready for review but I'm running into two issues on testing:

  1. When I attempt to run the smoke test using the MOSAiC forcing with the debug flag set, I receive a floating-point exception when I try to open the netCDF file (nf90_open(...)). I've copied the error below. It appears that this might be an issue with the underlying netCDF Fortran library (I found another thread on this here: https://support.unidata.ucar.edu/archives/netcdf/msg15037.html). When I run without the debug flag the test passes fine. I've contacted the person who asked that support question to see if there was a resolution, but I was curious whether anyone here has run into a similar issue?

  2. The standard restart test doesn't work for the MOSAiC forcing because drift 1 of the MOSAiC campaign only ran from October 2019 to July 2020. Would it be reasonable to change the restart test so that the restart point is somewhere within npt (e.g., halfway through the length of the full run)? Or perhaps we should have a third test option (like 'restart_half') that would do this and leave the current restart test alone.

The ReadTheDocs build also failed on this push, but I think that was just a hiccup on the RTD server; the documentation built fine when I restarted it. See here: https://davidclemenssewall-icepack.readthedocs.io/en/mosaic_forcing_clean/.

I have uploaded version 2 of the MOSAiC Model Forcing Datasets, including the changes requested by the consortium, to the Arctic Data Center. In case you want to test them before they are published, I've put them in a google drive folder here: https://drive.google.com/drive/folders/1xGAeCJDwbvkXJkA-3DZXyr75GOGnoZa4?usp=sharing. Just let me know if you need access.

Cheers,
David

Error when attempting to open netcdf file in debug mode:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fc0a1b4951f in ???
        at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#1  0x7fc0a18b1b72 in ???
#2  0x7fc0a1840cba in ???
#3  0x7fc0a18d01e2 in ???
#4  0x7fc0a1627a22 in ???
#5  0x7fc0a16c4385 in ???
#6  0x7fc0a21a0b9f in ???
#7  0x7fc0a21a952d in ???
#8  0x7fc0a2124213 in ???
#9  0x7fc0a2126359 in ???
#10  0x7fc0a21264d4 in ???
#11  0x7fc0a202ea7b in ???
#12  0x7fc0a20d5fd7 in ???
#13  0x559b0715f35b in atm_mdf
        at /root/code/CICE/icepack/configuration/driver/icedrv_forcing.F90:1097
#14  0x559b07174ed6 in __icedrv_forcing_MOD_init_forcing
        at /root/code/CICE/icepack/configuration/driver/icedrv_forcing.F90:195
#15  0x559b071347c2 in __icedrv_initmod_MOD_icedrv_initialize
        at /root/code/CICE/icepack/configuration/driver/icedrv_InitMod.F90:137
#16  0x559b071348dd in icedrv
        at /root/code/CICE/icepack/configuration/driver/icedrv_MAIN.F90:45
#17  0x559b07134be8 in main
        at /root/code/CICE/icepack/configuration/driver/icedrv_MAIN.F90:28

@anton-seaice:

  1. When I attempt to run the smoke test using the MOSAiC forcing with the debug flag set, I receive a floating-point exception when I try to open the netCDF file (nf90_open(...)). I've copied the error below. It appears that this might be an issue with the underlying netCDF Fortran library (I found another thread on this here: https://support.unidata.ucar.edu/archives/netcdf/msg15037.html). When I run without the debug flag the test passes fine. I've contacted the person who asked that support question to see if there was a resolution, but I was curious whether anyone here has run into a similar issue?

If this is using a netCDF4 file, there was an issue with the underlying HDF5 library in some versions last year that caused this sort of error. I think it only occurred in HDF5 version 1.14.3 and was also patched in the spack build. See HDFGroup/hdf5#3831

@davidclemenssewall (Contributor, Author):

  1. When I attempt to run the smoke test using the MOSAiC forcing with the debug flag set, I receive a floating-point exception when I try to open the netCDF file (nf90_open(...)). I've copied the error below. It appears that this might be an issue with the underlying netCDF Fortran library (I found another thread on this here: https://support.unidata.ucar.edu/archives/netcdf/msg15037.html). When I run without the debug flag the test passes fine. I've contacted the person who asked that support question to see if there was a resolution, but I was curious whether anyone here has run into a similar issue?

If this is using a netCDF4 file, there was an issue with the underlying HDF5 library in some versions last year that caused this sort of error. I think it only occurred in HDF5 version 1.14.3 and was also patched in the spack build. See HDFGroup/hdf5#3831

Updating hdf5 worked, the smoke test now passes. Thanks @anton-seaice!
