add LICHT LIDAR on R/V Meteor #12

Merged
merged 10 commits into from
Dec 15, 2023

Conversation

leifdenby
Collaborator

No description provided.

Contributor

@RobertPincus RobertPincus left a comment


How is a user to know from this which of b or t to choose? Is there some way to provide user feedback/info in the catalog?

@leifdenby
Collaborator Author

> How is a user to know from this which of b or t to choose? Is there some way to provide user feedback/info in the catalog?

Thanks for asking. Re-reading the wiki pages for LICHT and CORAL (https://wiki.mpimet.mpg.de/doku.php?id=analysis:data:bco:ramanlidars:raman-lidar-coral#data_access and https://wiki.mpimet.mpg.de/doku.php?id=analysis:data:bco:ramanlidars:raman-lidar-licht#data_access), I can see that the b and t options aren't detailed for LICHT, but I assume they have the same meaning as for CORAL. I don't quite understand the difference, though, so I'll email Ilya about this and update the description in the catalog before merging this pull request.

@observingClouds
Collaborator

Hi,
b files are files with the focus on backscatter and water vapour, while t files contain the temperature profiles.

@RobertPincus
Contributor

Thanks, @observingClouds. Parameterizing the catalog certainly makes it smaller but also somewhat more opaque. For the P3 data I am likely to keep the files explicit.

@d70-t
Contributor

d70-t commented Oct 5, 2020

I'd also suggest formulating the catalog more explicitly. In the end, the catalog might be what is used to generate overview pages about the available datasets, so it should include enough information to understand what's inside each dataset and to discover all data from looking at the catalog file alone. Instead of keeping the catalog file small, I'd rather suggest generating the catalog files with a script if maintaining them manually would be too tedious.
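
As a rough illustration of the script-generated approach, a minimal sketch could look like the following. It writes one explicit entry per product instead of a `content_type` parameter. The entry names, the `netcdf` driver choice and the `example.org` URL template are assumptions made for illustration, not the contents of the actual catalog.

```python
# Hypothetical generator for explicit catalog entries (one per product).
# The descriptions follow the b/t explanation given in this thread;
# names, driver and URL template are placeholders, not the real catalog.
PRODUCTS = {
    "b": "files with the focus on backscatter and water vapour",
    "t": "files containing the temperature profiles",
}


def build_catalog(instrument, url_template, products=PRODUCTS):
    """Return catalog YAML text with one source per product key."""
    lines = ["sources:"]
    for key, description in products.items():
        lines += [
            f"  {instrument}_{key}:",
            f'    description: "{instrument} {key} product: {description}"',
            "    driver: netcdf",
            "    args:",
            f'      urlpath: "{url_template.format(content_type=key)}"',
        ]
    return "\n".join(lines) + "\n"


catalog_text = build_catalog(
    "LICHT_LIDAR", "https://example.org/licht/{content_type}/*.nc"
)
print(catalog_text)
```

Each product then shows up under its own name with its own description, so an overview page can be built from the catalog file alone.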

@RobertPincus
Contributor

@observingClouds Do we want to close this as stale?

@observingClouds
Collaborator

@leifdenby you have already put all the information together. Would you mind just splitting the b and t datasets into separate entries?

@ninarobbins would also be a good contributor here who could help us add the right metadata to the b and t datasets.

@RobertPincus I haven't lost hope yet 🤣 @leifdenby will prove that I'm correct 😜

@ninarobbins

ninarobbins commented May 25, 2023

Hi, I hope I can provide some clarification about the b and t files of the lidars. It is the same idea for both CORAL and LICHT.

The processing from Level 0 to Level 1 of the Raman lidar data results in two products: slow (t) and fast (b). The data in the slow product is smoothed in time over the temperature smoothing window; by default this window is 118 min for LICHT and 60 min for low-resolution CORAL data. The slow product contains the temperature data, but also water vapor smoothed over this longer window. The fast product is smoothed in time over the (shorter) window specified for the rest of the variables (the default is 2 min for LICHT and low-resolution CORAL data, which is the time interval of the Level 0 data); the fast product contains the backscatter data and also the water vapor smoothed over the shorter window.

Both of these smoothing intervals can be specified by the user in the configuration file when doing the processing, and each run of the processing code that converts Level_0 data to Level_1 results in a slow and a fast product.
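
To make the slow/fast distinction concrete, here is a minimal sketch of the idea. It is not the actual Level 0 to Level 1 processing code: the boxcar (running-mean) filter, the function name and the synthetic input are assumptions. Only the default window lengths (2 min and 118 min for LICHT) and the 2 min sampling come from the description above.

```python
import numpy as np


def boxcar_smooth(data, window_min, dt_min=2.0):
    """Running mean along the last (time) axis; window given in minutes.

    Assumes the window is not longer than the time series itself.
    """
    n = max(1, int(round(window_min / dt_min)))
    kernel = np.ones(n)

    def smooth_1d(x):
        total = np.convolve(x, kernel, mode="same")
        # Number of samples that actually contributed, so edges stay unbiased.
        count = np.convolve(np.ones_like(x), kernel, mode="same")
        return total / count

    return np.apply_along_axis(smooth_1d, -1, data)


# Synthetic (Length, Time) field at 2 min resolution, purely illustrative.
rng = np.random.default_rng(0)
noisy = 1.0 + rng.normal(size=(4, 720))

fast = boxcar_smooth(noisy, window_min=2)    # "b"-like: one sample per window
slow = boxcar_smooth(noisy, window_min=118)  # "t"-like: 59 samples per window
print(fast.std(), slow.std())
```

With a window equal to the sampling interval the fast result is just the input, while the long window averages out much of the variability. That matches the picture of the same water vapor field appearing in both products at different effective time resolutions.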

I hope that helps!

@observingClouds
Collaborator

Thanks @ninarobbins that is very helpful. Just one more question for clarification. The fast product only contains quantities that can be retrieved at the fast speed, while the slow product includes quantities that need a longer integration time, i.e. everything that is connected with the temperature retrieval, correct? WaterVaporMixingRatio is just in both datasets because it forms the basis of the relative humidity retrieval?

import intake
cat = intake.open_catalog("https://raw.githubusercontent.com/leifdenby/eurec4a-intake/meteor-licht-lidar/catalog.yml")
b = cat.ships.meteor.LICHT_LIDAR.to_dask()
t = cat.ships.meteor.LICHT_LIDAR(content_type='t').to_dask()

b.data_vars
#Data variables:
#    Altitude                                (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
#    VerticalResolution                      (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
#    UnixTime                                (Time) int32 dask.array<chunksize=(720,), meta=np.ndarray>
#    Backscatter532                          (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    ErrorBackscatter532                     (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    ParticleLinearDepolarisationRatio       (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    ErrorParticleLinearDepolarisationRatio  (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    VolumeLinearDepolarisationRatio         (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    ErrorVolumeLinearDepolarisationRatio    (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    Backscatter355                          (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    ErrorBackscatter355                     (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    CloudMask_float                         (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    CloudMask                               (Length, Time) int32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    WaterVaporMixingRatio                   (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    ErrorWaterVapor                         (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>

t.data_vars
#Data variables:
#    Altitude                      (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
#    AltitudeGradients             (Length_gradients) float32 dask.array<chunksize=(483,), meta=np.ndarray>
#    VerticalResolution            (Length) float32 dask.array<chunksize=(484,), meta=np.ndarray>
#    UnixTime                      (Time) int32 dask.array<chunksize=(720,), meta=np.ndarray>
#    Temperature355                (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    ErrorTemperature355           (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    TemperatureGradients355       (Length_gradients, Time) float32 dask.array<chunksize=(483, 720), meta=np.ndarray>
#    ErrorTemperatureGradients355  (Length_gradients, Time) float32 dask.array<chunksize=(483, 720), meta=np.ndarray>
#    WaterVaporMixingRatio         (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    ErrorWaterVapor               (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    RelativeHumidity355           (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
#    ErrorRelativeHumidity355      (Length, Time) float32 dask.array<chunksize=(484, 720), meta=np.ndarray>
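
For reference, the overlap between the two listings can be checked without loading any data. This is a small sketch that simply copies the variable names from the output above into two sets:

```python
# Variable names copied from the b.data_vars and t.data_vars listings above.
b_vars = {
    "Altitude", "VerticalResolution", "UnixTime",
    "Backscatter532", "ErrorBackscatter532",
    "ParticleLinearDepolarisationRatio", "ErrorParticleLinearDepolarisationRatio",
    "VolumeLinearDepolarisationRatio", "ErrorVolumeLinearDepolarisationRatio",
    "Backscatter355", "ErrorBackscatter355",
    "CloudMask_float", "CloudMask",
    "WaterVaporMixingRatio", "ErrorWaterVapor",
}
t_vars = {
    "Altitude", "AltitudeGradients", "VerticalResolution", "UnixTime",
    "Temperature355", "ErrorTemperature355",
    "TemperatureGradients355", "ErrorTemperatureGradients355",
    "WaterVaporMixingRatio", "ErrorWaterVapor",
    "RelativeHumidity355", "ErrorRelativeHumidity355",
}

shared = sorted(b_vars & t_vars)   # present in both products
only_t = sorted(t_vars - b_vars)   # temperature-related quantities
print(shared)
print(only_t)
```

Apart from the coordinate-like variables, only `WaterVaporMixingRatio` and `ErrorWaterVapor` appear in both products.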

@ninarobbins

@observingClouds yes, I believe that's right!

@leifdenby
Collaborator Author

> @leifdenby you have already put all the information together. Would you mind just splitting the b and t datasets into separate entries?

Yup! I can get that done :) It might have to wait till the end of the week, but I'll put it on my TODO list.

@observingClouds
Collaborator

Hi @leifdenby,
I hope it's okay that I took over your branch here. I have now applied kerchunk to make the dataset a bit more user-friendly. There are, however, quite a few factors that make this dataset challenging:

  • several time dimensions with unsupported, non-CF-compliant units
  • very small chunks (one per timestep) that result in a very large reference file (about 60 MB; only about 4 MB when compressed)
  • dimension order is mostly (Length, Time), which is not standard
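
To illustrate why one chunk per timestep inflates the reference file, and why it compresses so well, here is a small stand-alone sketch. It builds a mock reference set in the kerchunk version 1 JSON layout (`"variable/chunk-index": [url, offset, length]`). The URL, the byte offsets and the assumption of uncompressed float32 chunks are made up for illustration; this is not the script used for this pull request.

```python
import gzip
import json


def make_refs(n_times, variables, url="https://example.org/file.nc"):
    """Mock kerchunk-style references: one entry per variable and timestep."""
    refs = {}
    offset = 0
    length = 484 * 4  # 484 float32 values in a one-timestep chunk (assumed)
    for var in variables:
        for i in range(n_times):
            # Dimensions are (Length, Time): chunk 0 along Length, i along Time.
            refs[f"{var}/0.{i}"] = [url, offset, length]
            offset += length
    return {"version": 1, "refs": refs}


refs = make_refs(720, ["Backscatter532", "Backscatter355"])
raw = json.dumps(refs).encode()
packed = gzip.compress(raw)
print(len(refs["refs"]), "entries,", len(raw), "bytes raw,", len(packed), "bytes gzipped")
```

Every entry repeats the same URL and a near-identical key pattern, so the number of entries grows with the number of timesteps while the content stays highly redundant.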

It is probably possible to fix these issues within the reference file as well but it would be great to fix all these issues in the original dataset. For now I think this is the best we can do. If you like my current solution I will try to add the reference files to https://observations.ipsl.fr/aeris/eurec4a-data/SHIPS/RV-METEOR/Raman_Lidar_LICHT/version_2020.07.31/nc/ and remove them from IPFS to keep things simple.
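
Until the original files are fixed, a user could work around the time and dimension-order issues after loading. The following is a sketch on synthetic arrays only. It assumes that `UnixTime` holds seconds since 1970-01-01 UTC, which the variable name suggests but which is not confirmed in this thread.

```python
import numpy as np

# Synthetic stand-ins with the same layout as the dataset: (Length, Time).
unix_time = np.array([1579478400, 1579478520, 1579478640], dtype=np.int32)
backscatter = np.zeros((484, unix_time.size), dtype=np.float32)

# Assumption: UnixTime is seconds since the Unix epoch (UTC).
time_utc = unix_time.astype(np.int64).astype("datetime64[s]")

# Reorder to the more common (Time, Length) layout; this is a view, not a copy.
backscatter_tl = backscatter.T

print(time_utc[0], backscatter_tl.shape)
```

With xarray, the equivalent steps would be assigning the decoded times as a coordinate and calling `transpose("Time", "Length")` on the dataset.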

@observingClouds
Collaborator

Just for future reference: I used this script to create the reference files.

@RobertPincus
Contributor

@observingClouds You asked for a review. Do you want it now, or should I wait for the dataset fixes you propose above?

@observingClouds
Collaborator

@RobertPincus I don't expect the dataset issues to be fixed in the near future, so I'm asking for a review of the current workaround.

Contributor

@RobertPincus RobertPincus left a comment


I don't quite understand the kerchunking bit, but this seems like a nice step forward. Since the tests pass it must be working :-).

@d70-t
Contributor

d70-t commented Dec 15, 2023

Maybe this is somewhat unfortunate timing, but as the TCO group is in the process of providing data online via zarr, the LICHT data on Meteor is now available on DKRZ's Swift store. See here for some initial documentation, or try the following:

import intake
cat = intake.open_catalog("https://tcodata.mpimet.mpg.de/catalog.yaml")
LICHT_b = cat.METEOR.EUREC4A.lidar_LICHT_LR_b_v1.to_dask()
LICHT_t = cat.METEOR.EUREC4A.lidar_LICHT_LR_t_v1.to_dask()

The data has been cleaned up by @ninarobbins and should likely be preferred over older versions of the data. It has also been rechunked and should load in a reasonable time.

@observingClouds
Collaborator

@d70-t that's great. Thanks @ninarobbins for also reprocessing the Meteor lidar data. That is amazing!

Contributor

@d70-t d70-t left a comment


Thanks for cleaning up 👍

@observingClouds observingClouds merged commit 6446702 into eurec4a:master Dec 15, 2023
2 checks passed
5 participants