No pdf #7

Open
wants to merge 7 commits into
base: main
32 changes: 9 additions & 23 deletions docs/GIS-data.md
@@ -1,22 +1,22 @@
# GIS Data


The GEOGLOWS GIS data used in the hydrologic model is available for users to download and use for their own purposes. This dataset is referred to as hydrography, hydrofabric, or river network. It is vector data with points and lines with coordinates, not grid data, and it includes two main components:
The GEOGLOWS GIS data used in RFS is available for users to download and use for their own purposes. This dataset is referred to as hydrography, hydrofabric, or river network. It is vector data with points and lines with coordinates, not grid data, and it includes four main components:

- The exact **stream center lines** used in the hydrologic model.
- The exact **catchment boundaries** used in the hydrologic model

Each stream centerline corresponds to exactly one unique catchment boundary. The streams and catchments each have a unique 9-digit ID that identifies the catchment. This ID is the same for the stream and the corresponding catchment.
- The exact **stream center lines** used in RFS. Each stream has a unique 9-digit ID, which is referred to as a reachID, link number, or stream ID. This is the file called "streams_{vpu}.gpkg".
- The **catchment boundaries** used in RFS. These are the boundaries around each of the streamlines and represent the area connected to that streamline. Each catchment is identified using the same link number as its stream center line. This is the file called "catchments_{vpu}.spatialite". Each stream centerline corresponds to exactly one unique catchment boundary.
- The **connection points** used in RFS where different stream centerlines connect. Each point has the an attribute called DSLINKNO which represents the one downstream link number for each of the points. It has another attribute called USLINKNOs. This is a comma seperated list of the link numbers upstream of the nexus point. This is the file called "nexus_{vpu}.gpkg".
- The **merged lake catchments** used in RFS to represent the locations of lakes. Stream catchments that were identified through GIS searching to be part of a lake were merged to represent the lakes. Therefore, each lake will have a different shape than the actual lake boundary, based on the shapes of the merged stream catchments. This is the file called "lakes_{vpu}.gpkg".
Comment on lines +6 to +9
⚠️ Potential issue

Grammar and Typographical Corrections Needed in Bullet Points

  • On line 8, the phrase "Each point has the an attribute" contains a duplicate determiner. It should be corrected to "Each point has an attribute."
  • Additionally, "comma seperated list" should be corrected to "comma separated list."
    These fixes will improve clarity and professionalism in the documentation.
🧰 Tools
🪛 LanguageTool

[grammar] ~8-~8: Two determiners in a row. Choose either “the” or “an”.
Context: ...eam centerlines connect. Each point has the an attribute called DSLINKNO which represe...

(DT_DT)
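
The nexus bullet in this hunk describes `USLINKNOs` as a comma-separated list of upstream link numbers. A minimal sketch of how a reader might parse that attribute in Python follows. The record and its link numbers are made up for illustration, and the assumption that the attribute arrives as a plain string is not confirmed by the docs.

```python
def parse_upstream_links(uslinknos):
    """Split a comma-separated USLINKNOs value into a list of integer link numbers."""
    if uslinknos is None:
        return []
    # Skip empty pieces so "" and trailing commas do not raise errors.
    return [int(part) for part in str(uslinknos).split(",") if part.strip()]


# Hypothetical nexus record; these link numbers are illustrative only.
nexus_point = {"DSLINKNO": 110000003, "USLINKNOs": "110000001, 110000002"}
upstream = parse_upstream_links(nexus_point["USLINKNOs"])
print(upstream)
```

Returning an empty list for a missing value keeps headwater points, which have nothing upstream, from needing special handling.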


---

## VPUs

The GIS data is divided into 125 smaller pieces called VPUs. This makes the large quantity of data easier to manage and access. Each VPU represents one watershed (such as the Amazon River Basin or the Nile River Basin) or a combination of watersheds. The following image shows the VPU breakdown throughout the world.
The GIS data is divided into 125 smaller pieces called VPUs (vector processing units). This makes the large quantity of data easier to manage and access. Each VPU represents one watershed (such as the Amazon River Basin or the Nile River Basin) or a combination of watersheds. The following image shows the VPU breakdown throughout the world.

![image](vpu-boundary.png)

The VPU boundaries are also available for download to help identify which VPU includes a user's area of interest. Then the catchments and streams are able to downloaded as an entire VPU.
The VPU boundaries are also available for download to help identify which VPU includes a user's area of interest. The other GIS data sets should be downloaded based on the VPU of interest and are downloaded as an entire VPU.

---

@@ -28,20 +28,10 @@ The V2 streams have the following attributes, which come from the TauDEM delinea

- **LINKNO** - A river ID number unique to the TDXHydro delineation. In TDXHydro v1, this is not globally unique. In future versions, this will be the same as geoglowsID.
- **DSLINKNO** - The ID of the river immediately downstream of the segment represented on that row.
- **USLINKNO*** - There will be 1 column per river segment upstream of the river on this row.
- **DSNODEID** - The node identifier for the node at the downstream end of the river.
- **strmOrder** - The Strahler stream order.*
- **Length** - Geodesic length in meters of the river segment.
- **Magnitude** - The Shreve stream magnitude.*
- **USContArea** - The total drainage area upstream of the most upstream point (i.e., the inlet) of this segment.*
- **DSContArea** - The total drainage area upstream of the most downstream point (i.e., the outlet) of this segment.
- **strmDrop** - The change in elevation between the inlet and outlet of the river segment.*
- **Slope** - The average stream slope, equal to "strmDrop / Length."
- **StraightL** - Distance from start to end of a river in a straight line between the first and last points.*
- **WSNO** - Watershed number.
- **DOUTEND** - Distance to the eventual outlet from the end of the river.*
- **DOUTSTART** - Distance to the eventual outlet from the start of the river.*
- **DOUTMID** - Distance to the eventual outlet from the midpoint of the river.*
- **LengthGeodesucMaters** - Geodesic length in meters of the river segment.
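
Because each row stores the ID of its immediate downstream segment in `DSLINKNO`, a river's path to its outlet can be followed one link at a time. The sketch below assumes the attribute table has already been loaded into a plain dictionary mapping `LINKNO` to `DSLINKNO`, and it assumes an outlet is marked with `-1`. Both are assumptions for illustration, and the toy network is invented.

```python
def trace_downstream(linkno, dslinkno_by_link, outlet_value=-1):
    """Return the list of link numbers from `linkno` down to the outlet."""
    path = [linkno]
    seen = {linkno}
    current = linkno
    while True:
        # A link missing from the table is treated like an outlet.
        nxt = dslinkno_by_link.get(current, outlet_value)
        # Stop at the outlet marker, or if a cycle would be revisited.
        if nxt == outlet_value or nxt in seen:
            break
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


# Toy network: links 1 and 2 join into 3, which drains to 4 (the outlet).
network = {1: 3, 2: 3, 3: 4, 4: -1}
print(trace_downstream(1, network))
```

The `seen` set is a guard against malformed tables that contain loops. A valid river network should never trigger it.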

V2 streams also have the following additional attributes added by the GEOGLOWS modelers:

@@ -66,8 +56,4 @@ There were some slight modifications made to the TDX-Hydro dataset when creating
- Streams that had no length and no upstream/downstream segments were removed along with their associated catchments.
- Streams with no length but with upstream and downstream segments were removed with their associated catchments, and the upstream and/or downstream segments were modified to preserve the connectivity of the network.
- For many of the regions, headwater streams were dissolved with the downstream segments.
- Small watersheds that did not represent real flowing streams were often dropped.

## Learn More

For some more detailed examples of getting and using the GIS data, please look at [GIS Data.pdf](https://drive.google.com/file/d/10NrEV3GAQlI5OypeWn6pCAInDiGBFHLX/view?usp=sharing)
- Small watersheds that did not represent real flowing streams were often dropped.
Binary file added docs/api-window-pop-up.png
4 changes: 2 additions & 2 deletions docs/bias-correction.md
@@ -1,6 +1,6 @@
# Bias Correction and SABER

The GEOGLOWS Hydrologic Model exhibits biases that can limit its precision, prompting the development of a bias correction approach. To correct these systematic biases at instrumented locations, we propose the Monthly Flow Duration Curve Quantile-Mapping (MFDC-QM) method. This method targets biases related to flow variability and correlation. The GEOGLOWS Hydrologic Model does not assimilate observed streamflow data into its initial calculation. However, the bias-correction technique allows for the global data to be applied locally. Local users can have more confidence in their data because they can know that their observed data is able to be used to improve the modeled data at their location.
RFS exhibits biases that can limit its precision, prompting the development of a bias correction approach. To correct these systematic biases at instrumented locations, we propose the Monthly Flow Duration Curve Quantile-Mapping (MFDC-QM) method. This method targets biases related to flow variability and correlation. RFS does not assimilate observed streamflow data into its initial calculation. However, the bias-correction technique allows for the global data to be applied locally. Local users can have more confidence in their data because they can know that their observed data is able to be used to improve the modeled data at their location.

After applying the bias correction, we observed a significant improvement in the distribution of bias and variability ratios, with a slight improvement in correlation values across the stations, resulting in more reliable simulations and improved Kling-Gupta Efficiency (KGE) metrics: bias, variability, and correlation.

@@ -18,7 +18,7 @@ To dive deeper into the analysis of bias correction and performance evaluation,

The SABER method is a bias correction tool designed for large hydrologic models like GEOGLOWS, specifically addressing the issue of model biases in both gauged and ungauged river basins. SABER uses flow duration curves (FDC) to compare the observed discharge with the simulated values from hydrologic models, identifying and correcting biases. For ungauged locations, where direct observations are unavailable, SABER uses the scalar flow duration curve (SFDC).

Unlike bias-correction, which each institution performs locally, SABER is performed by the GEOGLOWS team and is not done by the end users. We use the gauge data made available to us to perform an improvement to all the model results. This process is still in experimentation and is not currently being applied to the data accessed by the end-users. We hope for it to be applied in future versions of the GEOGLOWS Hydrologic Model.
Unlike bias-correction, which each institution performs locally, SABER is performed by the GEOGLOWS team and is not done by the end users. We use the gauge data made available to us to perform an improvement to all the model results. This process is still in experimentation and is not currently being applied to the data accessed by the end-users. We hope for it to be applied in future versions of RFS.

SABER allows the bias correction process to extend to ungauged basins by analyzing similar watershed behaviors based on spatial proximity and clustering of flow regimes. This method is particularly useful for regions where data scarcity limits traditional calibration, such as in global models like GEOGLOWS, ensuring more accurate discharge forecasts across large spatial domains.

Binary file added docs/calendar-forecast.png
2 changes: 1 addition & 1 deletion docs/data-catalog.md
@@ -1,6 +1,6 @@
# Using AWS Buckets

The **GEOGLOWS Hydrological Model Version 2** allows users to download global streamflow data directly from AWS. This provides access to both retrospective simulation data and 15-day streamflow forecasts. These datasets are hosted in S3 buckets, optimized for time series analysis and bulk downloads.
RFS allows users to download global streamflow data directly from AWS. This provides access to both retrospective simulation data and 15-day streamflow forecasts. These datasets are hosted in S3 buckets, optimized for time series analysis and bulk downloads.

Users can easily access and analyze these data using **Python** and **Jupyter notebooks**, with detailed tutorials available.

36 changes: 30 additions & 6 deletions docs/data-service.md
@@ -7,12 +7,36 @@ For more information, visit the [GEOGLOWS API Documentation](https://geoglows.ec
![image](api.png)
---

## Using the API in Applications
## Using the API

The API can be used in applications requiring streamflow data and can be integrated directly into Python workflows.
In order to use the API, most functions require you to know your river ID number. You can find more information about finding your river number here: [Finding River Numbers](https://data.geoglows.org/tutorials/finding-river-numbers). You can download the GIS data by VPU through the data catalog or select a stream on the web application and get a river number that way.

### Using the API Website

To use the API website, follow these steps:

**Step 1:** Click the blue **“Get”** button next to the command you are interested in. This opens a window where you can enter your parameters.

![API Window Pop-up](api-window-pop-up.png)

**Step 2:** Before entering any numbers, click **“Try it out”** to enable input fields. This allows you to enter numbers and select response formats.

**Step 3:** Enter the required information:
- A **9-digit river ID number** (also known as a COMID or Link Number) in the `river_id` field. This is required.
- Choose either `csv` or `json` from the dropdown menu under `format`. The default selection is `csv`.
- For **forecast data queries**, enter a date in `YYYYMMDD` format. If left blank, it will return the most recent forecast.

The following resources provide guidance:
- [Programmatic_Access Colab.ipynb](https://colab.research.google.com/drive/19PiUTU2noCvNGr6r-1i9cv0YMduTxATs?usp=sharing): A walkthrough example of how to use the GEOGLOWS API.
- [Programmatic Access 2.0.pdf](https://drive.google.com/file/d/195LGTwbi4-Ho4JW15qZT-PDgUn10qit1/view?usp=sharing): A presentation with additional details.
![Execute Button](execute-button.png)

These resources demonstrate how to leverage the API effectively for custom applications and analyses.
**Step 4:** Click the **blue “Execute”** button at the bottom of the screen. The system will process your request and load for a few seconds. Once finished, you will receive a response code along with an option to download the file.

![API Response](response-api.png)
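
The same inputs collected in Steps 1 through 4 can be assembled into a request URL in code. The sketch below only builds and validates the URL and does not send a request. `BASE_URL` is a placeholder, the `forecast` endpoint name, the URL layout (river ID in the path), and the `date` parameter name are all assumptions for illustration. Check the API documentation for the real values. Only the `river_id` and `format` field names and the `YYYYMMDD` date format come from the steps above.

```python
import re
from urllib.parse import urlencode

# Placeholder only: substitute the real base URL from the API documentation.
BASE_URL = "https://example.invalid/api"


def build_query_url(endpoint, river_id, fmt="csv", date=None):
    """Validate the inputs from Step 3 and return a query URL string."""
    if not re.fullmatch(r"\d{9}", str(river_id)):
        raise ValueError("river_id must be a 9-digit number")
    if fmt not in ("csv", "json"):
        raise ValueError("format must be 'csv' or 'json'")
    params = {"format": fmt}
    if date is not None:
        # Forecast queries accept a date; omitting it requests the latest one.
        if not re.fullmatch(r"\d{8}", str(date)):
            raise ValueError("date must be in YYYYMMDD format")
        params["date"] = str(date)
    return f"{BASE_URL}/{endpoint}/{river_id}?{urlencode(params)}"


# Hypothetical river ID, used only to show the shape of the result.
print(build_query_url("forecast", 123456789, fmt="json", date="20240101"))
```

Validating before sending avoids a round trip to the server for inputs that can never succeed, such as an 8-digit river ID.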

### Accessing the API Using Python

One of the easiest ways to access the API is through Python. There is a **GEOGLOWS Python package** (documented here: [GEOGLOWS API Documentation](https://geoglows.readthedocs.io/en/latest/api-documentation.html)) that contains commands for basic analysis and querying specific types of data.

This Python notebook provides examples of using the API in Python, as well as utilizing the Python package: [Programmatic_Access Colab.ipynb](https://colab.research.google.com/drive/19PiUTU2noCvNGr6r-1i9cv0YMduTxATs?usp=sharing)


The API can be used in applications requiring streamflow data and can be integrated directly into Python workflows.
Binary file added docs/execute-button.png
6 changes: 2 additions & 4 deletions docs/exercises.md
@@ -1,10 +1,8 @@
## Return periods, flow duration curves, and average flows

In hydrological analysis, return periods are used to estimate the probability of extreme events like floods. While the Weibull Distribution is often used to calculate return periods based on historical data, it is limited by the length of the data series and cannot predict events beyond the observed records. For the GEOGLOWS Model, the Gumbel Distribution is applied instead, as it better models extreme values and allows for extrapolation, making it possible to calculate return periods for events beyond the available data.
In hydrological analysis, return periods are used to estimate the probability of extreme events like floods. While the Weibull Distribution is often used to calculate return periods based on historical data, it is limited by the length of the data series and cannot predict events beyond the observed records. For RFS, the Gumbel Distribution is applied instead, as it better models extreme values and allows for extrapolation, making it possible to calculate return periods for events beyond the available data.
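
To illustrate how a Gumbel distribution allows extrapolation beyond the record length, the sketch below estimates the flow for a given return period using the textbook method-of-moments frequency factor. This is a generic formulation and is not confirmed to be the exact fitting procedure used in RFS. The annual maximum flows are invented.

```python
import math
import statistics


def gumbel_return_flow(annual_maxima, return_period):
    """Estimate the flow for a return period (years) from annual maximum flows."""
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1 year")
    mean = statistics.mean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    # Gumbel frequency factor; 0.5772 approximates the Euler-Mascheroni constant.
    k = -(math.sqrt(6) / math.pi) * (
        0.5772 + math.log(math.log(return_period / (return_period - 1)))
    )
    return mean + k * std


# Invented annual maximum flows in cubic meters per second.
maxima = [100.0, 150.0, 120.0, 180.0, 140.0]
for years in (2, 10, 100):
    print(years, round(gumbel_return_flow(maxima, years), 1))
```

Note that the 100-year estimate comes from only five values here, which shows the extrapolation but would be far too short a record for real use.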

Additionally, **Flow Duration Curves (FDCs)** are used to represent the percentage of time that streamflow is likely to equal or exceed certain flow rates, providing insights into the variability of water resources. The model also includes the analysis of daily seasonality to understand patterns of streamflow throughout the year, monthly seasonality to observe changes between months, and annual mean discharge to detect long-term trends. These analyses are critical for effective water resource management, flood forecasting, and understanding hydrological patterns. The following presentation gives more of a background on the return periods and flow duration curves. It shows the equations used in the GEOGLOWS Hydrologic Model to represent these things.

[Return_Periods-FDC-Average_Flows.pdf](https://drive.google.com/file/d/10Si933D0fxaUrJFmIJr-WdOyjOkm453m/view?usp=sharing)
Additionally, **Flow Duration Curves (FDCs)** are used to represent the percentage of time that streamflow is likely to equal or exceed certain flow rates, providing insights into the variability of water resources. The model also includes the analysis of daily seasonality to understand patterns of streamflow throughout the year, monthly seasonality to observe changes between months, and annual mean discharge to detect long-term trends. These analyses are critical for effective water resource management, flood forecasting, and understanding hydrological patterns.
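
A flow duration curve can be computed directly from a flow series: the flow that is equaled or exceeded p percent of the time is the (100 - p)th percentile of the series. The sketch below uses NumPy's default linear interpolation, which is one of several plotting conventions and may differ from the one used in RFS. The flow series is synthetic.

```python
import numpy as np


def flow_duration_curve(flows, exceedance_percents):
    """Return the flow equaled or exceeded for each exceedance percentage."""
    flows = np.asarray(flows, dtype=float)
    exceed = np.asarray(exceedance_percents, dtype=float)
    # Exceeded p% of the time corresponds to the (100 - p)th percentile.
    return np.percentile(flows, 100.0 - exceed)


# Synthetic daily flows: the integers 1 through 100.
daily_flows = np.arange(1, 101)
fdc = flow_duration_curve(daily_flows, [10, 50, 90])
print(fdc)
```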

To further explore the analysis of return periods, flow duration curves, and seasonal averages, we invite you to follow along with our interactive demonstration in the provided Google Colab notebook. This hands-on notebook will guide you through the process, using real data from the Tensift River in Morocco. You can access and run the notebook directly in your browser:

Binary file added docs/filtered-streams.png
12 changes: 9 additions & 3 deletions docs/forecast.md
@@ -1,6 +1,6 @@
# Forecast Data

The GEOGLOWS model produces ensemble streamflow forecasts using data from the ECMWF (European Centre for Medium-Range Weather Forecasts) ensemble system. Forecasts are produced daily and are available by 12 PM UTC. Similar to the retrospective data, the units are in cubic meters per second.
RFS produces ensemble streamflow forecasts using data from the ECMWF (European Centre for Medium-Range Weather Forecasts) ensemble system. Forecasts are produced daily and are available by 12 PM UTC. Similar to the retrospective data, the units are in cubic meters per second.

Each forecast includes a **50+1 member ensemble**:
- **1 baseline (control)** prediction
@@ -12,11 +12,18 @@ The forecast has a **3-hour time step**, where each flow value represents the av
![image](img17.png)
---

The streamflow forecast is updated daily using the 24-hour mean value from the ensemble members on the previous day as the initial condition.

The ensemble members have a spatial resolution of 9 kilometers horizontally.

These meteorological forecasts are converted into runoff using the HTESSEL hydrological model. These results are then downscaled using an area-weighted gridding to vector methodology and subsequently routed through the drainage network.


## Ensemble Probabilities and Interpretation

Each ensemble member has an equal probability of occurring. Therefore, forecasts are best understood by looking at summaries of the ensembles rather than individual members.

Forecast plots are designed to help users interpret the range of possible outcomes and uncertainties. The most commonly used forecast plot includes the median, the 20th percentile, and the 80th percentile. These represent 60% of the probability distribution within the ensemble members and provide insight into the potential variability of future streamflows. This approach allows users to see the range of probable scenerios for their streams.
Forecast plots are designed to help users interpret the range of possible outcomes and uncertainties. The most commonly used forecast plot includes the median, the 20th percentile, and the 80th percentile. These represent 60% of the probability distribution within the ensemble members and provide insight into the potential variability of future streamflows. This approach allows users to see the range of probable scenarios for their streams.
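
The median and the 20th and 80th percentiles described above can be computed across ensemble members at each time step. The sketch below assumes the forecast is held as a 2-D array with one row per member and one column per time step. That layout, and the random values standing in for a real forecast, are assumptions for illustration.

```python
import numpy as np


def summarize_ensemble(ensemble):
    """Return per-time-step 20th percentile, median, and 80th percentile."""
    # axis=0 collapses the member dimension, leaving one value per time step.
    p20, p50, p80 = np.percentile(ensemble, [20, 50, 80], axis=0)
    return {"p20": p20, "median": p50, "p80": p80}


# Stand-in data: 51 members by 8 time steps of random positive flows.
rng = np.random.default_rng(0)
members = rng.gamma(shape=2.0, scale=50.0, size=(51, 8))
summary = summarize_ensemble(members)
print(summary["median"].round(1))
```

The band between `p20` and `p80` covers the middle 60% of members, which matches the shaded region described in the text.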

---

@@ -34,5 +41,4 @@ The following graph shows an example of a forecast plot:
1. The **black line** represents the best estimate of future river flow.
2. The **blue shaded region** represents the uncertainty in the prediction. The narrower the blue region, the more confident the model is. The true flow is more likely than not to fall within the blue shaded area.

For more details, refer to the document: [Forecast_Data.pdf](https://drive.google.com/file/d/1_dDtF3F74Un8PKVkZZdslDjp_MP64-dX/view?usp=sharing).

2 changes: 1 addition & 1 deletion docs/forecasted-bias-correction.md
@@ -1,6 +1,6 @@
# Forecast Bias Correction

The **GEOGLOWS model** applies bias correction to its forecast data by assuming the forecast shares the same biases as the retrospective simulation. This process involves mapping forecasted streamflow values to a non-exceedance probability using the historical simulation's flow duration curve and then replacing the forecasted values with corresponding values from the observed flow duration curve.
RFS applies bias correction to its forecast data by assuming the forecast shares the same biases as the retrospective simulation. This process involves mapping forecasted streamflow values to a non-exceedance probability using the historical simulation's flow duration curve and then replacing the forecasted values with corresponding values from the observed flow duration curve.
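
The two-step mapping described above can be sketched with empirical flow duration curves and linear interpolation. This is a simplified illustration under stated assumptions and not the RFS implementation: it uses a single curve for all months, evenly spaced plotting positions, and synthetic data in which the observations are exactly twice the simulation.

```python
import numpy as np


def bias_correct_forecast(forecast, simulated_hist, observed_hist):
    """Map forecast flows through the simulated FDC onto the observed FDC."""
    sim_sorted = np.sort(np.asarray(simulated_hist, dtype=float))
    obs_sorted = np.sort(np.asarray(observed_hist, dtype=float))
    # Evenly spaced non-exceedance probabilities for each sorted series.
    sim_prob = np.linspace(0.0, 1.0, sim_sorted.size)
    obs_prob = np.linspace(0.0, 1.0, obs_sorted.size)
    # Step 1: forecast value -> non-exceedance probability (simulated curve).
    prob = np.interp(forecast, sim_sorted, sim_prob)
    # Step 2: probability -> flow value on the observed curve.
    return np.interp(prob, obs_prob, obs_sorted)


# Synthetic history: the simulation is biased low by a factor of two.
simulated = np.arange(1, 101, dtype=float)
observed = 2.0 * simulated
corrected = bias_correct_forecast([1.0, 50.5, 100.0], simulated, observed)
print(corrected)
```

Forecast values outside the historical simulated range are clamped to the lowest or highest observed value by `np.interp`, so extremes beyond the record are not extrapolated in this sketch.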

![forecasts](forecast-bias-correction.png)

Binary file added docs/imagen.png