Commit cb6f080

Doc strings, type hints, code revisions (#34)
1 parent 9658a29 commit cb6f080

7 files changed, +86 -53 lines changed

docs/api-documentation.rst

+1
@@ -9,4 +9,5 @@ API Documentation
     api-documentation/bias
     api-documentation/plots
     api-documentation/analyze
+    api-documentation/streams
     api-documentation/streamflow

docs/api-documentation/analyze.rst

-2
@@ -2,8 +2,6 @@
 geoglows.analyze
 ================

-Analyze
-~~~~~~~
 Functions which post process results from the streamflow data service into additional, useful products

 .. automodule:: geoglows.analyze

docs/api-documentation/streams.rst

+14
@@ -0,0 +1,14 @@
+================
+geoglows.streams
+================
+
+The functions in this module lookup metadata for rivers using a table of metadata about the GEOGLOWS model. This needs
+to be downloaded or it can be retrieved and cached by the metadata table function in the data module.
+
+If you download the table in advance, you can specify it with the PYGEOGLOWS_METADATA_TABLE_PATH environment variable
+which will be checked at runtime. If it is not set, you need to restart the runtime or use the download function to
+retrieve it.
+
+.. automodule:: geoglows.streams
+    :members:
+        river_to_vpu, latlon_to_river, river_to_latlon
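
Review note: the environment-variable lookup described in this new page can be sketched as follows. This is a minimal illustration, not the package's actual code. Only the variable name `PYGEOGLOWS_METADATA_TABLE_PATH` comes from the docs above; the helper `resolve_metadata_table_path` and the fallback path are hypothetical.

```python
import os
from pathlib import Path

ENV_VAR = "PYGEOGLOWS_METADATA_TABLE_PATH"  # name taken from the docs above


def resolve_metadata_table_path(default: Path) -> Path:
    """Return the path from the environment variable if set, else the default.

    Hypothetical helper for illustration; not part of the geoglows API.
    """
    override = os.environ.get(ENV_VAR)
    return Path(override) if override else default


# Hypothetical fallback location; the real default is defined by the package.
fallback = Path.home() / "geoglows-metadata-table.parquet"
print(resolve_metadata_table_path(fallback))
```

If the package reads the variable only once at import time (an assumption, since that code is not part of this diff), changing it later in the same session would have no effect, which would be consistent with the advice to restart the runtime.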

docs/index.rst

+6-13
@@ -1,32 +1,25 @@
 geoglows
 ========
-.. image:: https://anaconda.org/geoglows/geoglows/badges/platforms.svg
+.. image:: https://anaconda.org/conda-forge/geoglows/badges/platforms.svg
     :target: https://anaconda.org/geoglows/geoglows
 .. image:: https://img.shields.io/pypi/v/geoglows
     :target: https://pypi.org/project/geoglows
-.. image:: https://anaconda.org/geoglows/geoglows/badges/latest_release_date.svg
+.. image:: https://anaconda.org/conda-forge/geoglows/badges/latest_release_date.svg
     :target: https://anaconda.org/geoglows/geoglows

 The geoglows Python package enables access to data, API's, and code developed for the `GEOGLOWS Streamflow Model <https://geoglows.ecmwf.int>`_.
 Read more about GEOGLOWS at `<https://geoglows.org>`_

-Demos
-=====
-These links will be maintained to reference the most updated versions of the tutorials.
-The tutorials are GitHub Gists which you can copy and launch in a Google Collaboratory setting directly from the GitHub.
-
-- Retrieve & plot GEOGLOWS model data: `<https://gist.github.com/rileyhales/873896e426a5bd1c4e68120b286bc029>`_
-- Finding Stream ID #'s programmatically: `<https://gist.github.com/rileyhales/ad92d1fce3aa36ef5873f2f7c2632d31>`_
-- Bias Evaluation and Calibration at a point: `<https://gist.github.com/rileyhales/d5290e12b5858d59960d0898fbd0ed69>`_
-- Generate/Download High Res Plot Images: `<https://gist.github.com/rileyhales/9b5bbb0c5f307eb14b9f1ced39d641e4>`_
+For demos, tutorials, and other training materials for GEOGLOWS and the geoglows Python packge, please visit
+`<https://data.geoglows.org>`_.

 About GEOGLOWS ECMWF Streamflow
 ===============================
-GEOGLOWS ECMWF Streamflow Project: This project provides access to the results of a hydrologic model that is run each
+GEOGLOWS ECMWF Streamflow Project: This project provides access to the results of a hydrological model that is run each
 day. The model is based on a group of unique weather forecasts, known as an ensemble, from ECMWF. Each unique
 precipitation forecast, known as an ensemble member, produces a unique streamflow forecast. There are 52 members of the
 ensemble that drives the model each day. The ERA-5 historical precipitation dataset to also used to produce a
-hindcasted streamflow on each river. `Read more here <https://geoglows.ecmwf.int>`_.
+retrospective streamflow on each river. `Read more here <https://geoglows.ecmwf.int>`_.

 .. toctree::
     :caption: Table of Contents

geoglows/__init__.py

+1-1
@@ -12,6 +12,6 @@
     'bias', 'plots', 'data', 'analyze', 'streams', 'tables', 'streamflow',
     'METADATA_TABLE_PATH'
 ]
-__version__ = '1.2.0'
+__version__ = '1.2.1'
 __author__ = 'Riley Hales'
 __license__ = 'BSD 3-Clause Clear License'

geoglows/data.py

+36-36
@@ -6,6 +6,7 @@
 import requests
 import s3fs
 import xarray as xr
+import numpy as np

 from ._constants import METADATA_TABLE_PATH
 from .analyze import (
@@ -85,7 +86,7 @@ def from_aws(*args, **kwargs):
         df = ds.to_dataframe().round(2).reset_index()

         # rename columns to match the REST API
-        if isinstance(river_id, int):
+        if isinstance(river_id, int) or isinstance(river_id, np.int64):
             df = df.pivot(index='time', columns='ensemble', values='Qout')
         else:
             df = df.pivot(index=['time', 'rivid'], columns='ensemble', values='Qout')
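
Review note: the check above names `np.int64` explicitly, so other NumPy integer widths (for example `np.int32`) would still fall through to the multi-river branch. A broader check, sketched here as a suggestion rather than the project's approach, tests against `numbers.Integral`, with which NumPy registers its integer scalar types. The helper name `is_single_river_id` is hypothetical.

```python
import numbers


def is_single_river_id(river_id) -> bool:
    """True for one integer-like ID, False for lists and other containers.

    Hypothetical helper; numbers.Integral covers the built-in int and, when
    NumPy is installed, NumPy integer scalars such as np.int64.
    bool is excluded explicitly because it is a subclass of int.
    """
    return isinstance(river_id, numbers.Integral) and not isinstance(river_id, bool)


print(is_single_river_id(110_000_001))    # True
print(is_single_river_id([110_000_001]))  # False
```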
@@ -120,7 +121,7 @@ def from_rest(*args, **kwargs):
         endpoint = f'https://{endpoint}' if not endpoint.startswith(('https://', 'http://')) else endpoint

         version = kwargs.get('version', DEFAULT_REST_ENDPOINT_VERSION)
-        assert version in ('v1', 'v2', ), ValueError(f'Unrecognized model version parameter: {version}')
+        assert version in ('v2', ), ValueError(f'Unrecognized model version parameter: {version}')

         product_name = function.__name__.replace("_", "").lower()

@@ -131,7 +132,7 @@ def from_rest(*args, **kwargs):
                              'Use data_source="aws" and version="v2" for multiple river_ids.')
         river_id = int(river_id) if river_id else None
         if river_id and version == 'v2':
-            assert river_id < 1_000_000_000 and river_id >= 110_000_000, ValueError('River ID must be a 9 digit integer')
+            assert 1_000_000_000 > river_id >= 110_000_000, ValueError('River ID must be a 9 digit integer')

         # request parameter validation before submitting
         for key in ('endpoint', 'version', 'river_id'):
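
Review note: the assertions in `from_rest` pass a `ValueError` instance as the assert message, so a failed check raises `AssertionError` (with the `ValueError` as its argument) rather than `ValueError`, and the checks are skipped entirely under `python -O`. A sketch of the same range check with an explicit `raise`, using a hypothetical helper name:

```python
def validate_river_id(river_id: int) -> int:
    """Return river_id unchanged if it is in the 9 digit range, else raise ValueError.

    Hypothetical helper for illustration; not part of the geoglows API.
    """
    # Same bounds as the assert in the diff: 110_000_000 <= river_id < 1_000_000_000
    if not 110_000_000 <= river_id < 1_000_000_000:
        raise ValueError('River ID must be a 9 digit integer')
    return river_id


print(validate_river_id(110_000_000))  # 110000000
```

Callers can then catch `ValueError` directly, and the check still runs when Python is started with optimizations enabled.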
@@ -178,8 +179,7 @@ def main(*args, **kwargs):
         assert source in ('rest', 'aws'), ValueError(f'Unrecognized data source requested: {source}')
         if source == 'rest':
             return from_rest(*args, **kwargs)
-        else:
-            return from_aws(*args, **kwargs)
+        return from_aws(*args, **kwargs)
     main.__doc__ = function.__doc__  # necessary for code documentation auto generators
     return main

@@ -191,7 +191,7 @@ def dates(**kwargs) -> dict or str:
     Gets a list of available forecast product dates

     Keyword Args:
-        data_source: location to query for data, either 'rest' or 'aws'. default is aws.
+        data_source (str): location to query for data, either 'rest' or 'aws'. default is aws.

     Returns:
         dict or str
@@ -204,14 +204,14 @@ def dates(**kwargs) -> dict or str:

 @_forecast_endpoint_decorator
 def forecast(*, river_id: int, date: str, format: str, data_source: str,
-             **kwargs) -> pd.DataFrame or dict or str:
+             **kwargs) -> pd.DataFrame or xr.Dataset:
     """
     Gets the average forecasted flow for a certain river_id on a certain date

     Keyword Args:
-        river_id (str): the ID of a stream, should be a 9 digit integer
+        river_id (int): the ID of a stream, should be a 9 digit integer
         date (str): a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified
-        format (str): csv, json, or url, default csv
+        format: if data_source=="rest": csv, json, or url, default csv. if data_source=="aws": df or xarray
         data_source (str): location to query for data, either 'rest' or 'aws'. default is aws.

     Returns:
@@ -222,16 +222,16 @@ def forecast(*, river_id: int, date: str, format: str, data_source: str,

 @_forecast_endpoint_decorator
 def forecast_stats(*, river_id: int, date: str, format: str, data_source: str,
-                   **kwargs) -> pd.DataFrame or dict or str:
+                   **kwargs) -> pd.DataFrame or xr.Dataset:
     """
     Retrieves the min, 25%, mean, median, 75%, and max river discharge of the 51 ensembles members for a river_id
     The 52nd higher resolution member is excluded

     Keyword Args:
-        river_id: the ID of a stream, should be a 9 digit integer
-        date: a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified
-        format: if data_source=="rest": csv, json, or url, default csv. if data_source=="aws": df or xarray
-        data_source: location to query for data, either 'rest' or 'aws'. default is aws.
+        river_id (int): the ID of a stream, should be a 9 digit integer
+        date (str): a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified
+        format (str): if data_source=="rest": csv, json, or url, default csv. if data_source=="aws": df or xarray
+        data_source (str): location to query for data, either 'rest' or 'aws'. default is aws.

     Returns:
         pd.DataFrame or dict or str
@@ -241,15 +241,15 @@ def forecast_stats(*, river_id: int, date: str, format: str, data_source: str,

 @_forecast_endpoint_decorator
 def forecast_ensembles(*, river_id: int, date: str, format: str, data_source: str,
-                       **kwargs) -> pd.DataFrame or dict or str:
+                       **kwargs) -> pd.DataFrame or xr.Dataset:
     """
     Retrieves each of 52 time series of forecasted discharge for a river_id on a certain date

     Keyword Args:
-        river_id: the ID of a stream, should be a 9 digit integer
-        date: a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified
-        format: if data_source=="rest": csv, json, or url, default csv. if data_source=="aws": df or xarray
-        data_source: location to query for data, either 'rest' or 'aws'. default is aws.
+        river_id (int): the ID of a stream, should be a 9 digit integer
+        date (str): a string specifying the date to request in YYYYMMDD format, returns the latest available if not specified
+        format (str): if data_source=="rest": csv, json, or url, default csv. if data_source=="aws": df or xarray
+        data_source (str): location to query for data, either 'rest' or 'aws'. default is aws.

     Returns:
         pd.DataFrame or dict or str
@@ -258,17 +258,16 @@ def forecast_ensembles(*, river_id: int, date: str, format: str, data_source: st


 @_forecast_endpoint_decorator
-def forecast_records(*, river_id: int, start_date: str, end_date: str, format: str, data_source: str,
+def forecast_records(*, river_id: int, start_date: str, end_date: str, format: str,
                      **kwargs) -> pd.DataFrame or dict or str:
     """
     Retrieves a csv showing the ensemble average forecasted flow for the year from January 1 to the current date

     Keyword Args:
-        river_id: the ID of a stream, should be a 9 digit integer
-        start_date: a YYYYMMDD string giving the earliest date this year to include, defaults to 14 days ago.
-        end_date: a YYYYMMDD string giving the latest date this year to include, defaults to latest available
-        data_source: location to query for data, either 'rest' or 'aws'. default is aws.
-        format: if data_source=="rest": csv, json, or url, default csv. if data_source=="aws": df or xarray
+        river_id (int): the ID of a stream, should be a 9 digit integer
+        start_date (str): a YYYYMMDD string giving the earliest date this year to include, defaults to 14 days ago.
+        end_date (str): a YYYYMMDD string giving the latest date this year to include, defaults to latest available
+        format (str): csv, json, or url, default csv.

     Returns:
         pd.DataFrame or dict or str
@@ -280,11 +279,11 @@ def forecast_records(*, river_id: int, start_date: str, end_date: str, format: s
 def retrospective(river_id: int or list, format: str = 'df') -> pd.DataFrame or xr.Dataset:
     """
     Retrieves the retrospective simulation of streamflow for a given river_id from the
-    AWS Open Data Program GEOGloWS V2 S3 bucket
+    AWS Open Data Program GEOGLOWS V2 S3 bucket

     Args:
-        river_id: the ID of a stream, should be a 9 digit integer
-        format: the format to return the data, either 'df' or 'xarray'. default is 'df'
+        river_id (int): the ID of a stream, should be a 9 digit integer
+        format (str): the format to return the data, either 'df' or 'xarray'. default is 'df'

     Returns:
         pd.DataFrame
@@ -302,12 +301,12 @@ def historical(*args, **kwargs):
     return retrospective(*args, **kwargs)


-def daily_averages(river_id: int or list) -> pd.DataFrame or xr.Dataset:
+def daily_averages(river_id: int or list) -> pd.DataFrame:
     """
     Retrieves daily average streamflow for a given river_id

     Args:
-        river_id: the ID of a stream, should be a 9 digit integer
+        river_id (int): the ID of a stream, should be a 9 digit integer

     Returns:
         pd.DataFrame
@@ -321,7 +320,7 @@ def monthly_averages(river_id: int or list) -> pd.DataFrame:
     Retrieves monthly average streamflow for a given river_id

     Args:
-        river_id: the ID of a stream, should be a 9 digit integer
+        river_id (int): the ID of a stream, should be a 9 digit integer

     Returns:
         pd.DataFrame
@@ -335,7 +334,7 @@ def annual_averages(river_id: int or list) -> pd.DataFrame:
     Retrieves annual average streamflow for a given river_id

     Args:
-        river_id: the ID of a stream, should be a 9 digit integer
+        river_id (int): the ID of a stream, should be a 9 digit integer

     Returns:
         pd.DataFrame
@@ -344,13 +343,13 @@ def annual_averages(river_id: int or list) -> pd.DataFrame:
     return calc_annual_averages(df)


-def return_periods(river_id: int or list, format: str = 'df') -> pd.DataFrame:
+def return_periods(river_id: int or list, format: str = 'df') -> pd.DataFrame or xr.Dataset:
     """
     Retrieves the return period thresholds based on a specified historic simulation forcing on a certain river_id.

     Args:
-        river_id: the ID of a stream, should be a 9 digit integer
-        format: the format to return the data, either 'df' or 'xarray'. default is 'df'
+        river_id (int): the ID of a stream, should be a 9 digit integer
+        format (str): the format to return the data, either 'df' or 'xarray'. default is 'df'

     Returns:
         pd.DataFrame
@@ -369,7 +368,7 @@ def metadata_tables(columns: list = None) -> pd.DataFrame:
     """
     Retrieves the master table of rivers metadata and properties as a pandas DataFrame
     Args:
-        columns: optional subset of columns names to read from the parquet
+        columns (list): optional subset of columns names to read from the parquet

     Returns:
         pd.DataFrame
@@ -379,6 +378,7 @@ def metadata_tables(columns: list = None) -> pd.DataFrame:
         warn = f"""
         Local copy of geoglows v2 metadata table not found. You should download a copy for optimal performance and
         to make the data available when you are offline. A copy of the table will be cached at {METADATA_TABLE_PATH}.
+        Alternatively, set the environment variable PYGEOGLOWS_METADATA_TABLE_PATH to the path of the table.
         """
         warnings.warn(warn)
         df = pd.read_parquet('https://geoglows-v2.s3-website-us-west-2.amazonaws.com/tables/package-metadata-table.parquet')
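
Review note: several return annotations in this file join types with `or` (for example `pd.DataFrame or xr.Dataset`). Python evaluates `or` as an ordinary expression and keeps the first truthy operand, so the stored annotation is only the first type. The sketch below uses built-in types as stand-ins for the pandas and xarray classes and shows `typing.Union` as one way to record both alternatives; the function names are hypothetical.

```python
from typing import Union


def first_only(x) -> int or str:  # the expression `int or str` evaluates to `int`
    return x


def both(x) -> Union[int, str]:  # keeps both alternatives in the annotation
    return x


print(first_only.__annotations__['return'])  # <class 'int'>
print(both.__annotations__['return'])
```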

geoglows/streams.py

+28-1
@@ -5,7 +5,16 @@
 __all__ = ['river_to_vpu', 'latlon_to_river', 'river_to_latlon', ]


-def river_to_vpu(river_id: int) -> str or int:
+def river_to_vpu(river_id: int) -> int:
+    """
+    Gives the VPU number for a given River ID number
+
+    Args:
+        river_id (int): a 9 digit integer that is a valid GEOGLOWS River ID number
+
+    Returns:
+        int: a 3 digit integer that is the VPU number for the given River ID number
+    """
     return (
         metadata_tables(columns=['LINKNO', 'VPUCode'])
         .loc[lambda x: x['LINKNO'] == river_id, 'VPUCode']
@@ -14,12 +23,30 @@ def river_to_vpu(river_id: int) -> str or int:


 def latlon_to_river(lat: float, lon: float) -> int:
+    """
+    Gives the River ID number whose outlet is nearest the given lat and lon
+    Args:
+        lat (float): a latitude
+        lon (float): a longitude
+
+    Returns:
+        int: a 9 digit integer that is a valid GEOGLOWS River ID number
+    """
     df = metadata_tables(columns=['LINKNO', 'lat', 'lon'])
     df['dist'] = ((df['lat'] - lat) ** 2 + (df['lon'] - lon) ** 2) ** 0.5
     return df.loc[lambda x: x['dist'] == df['dist'].min(), 'LINKNO'].values[0]


 def river_to_latlon(river_id: int) -> np.ndarray:
+    """
+    Gives the lat and lon of the outlet of the river with the given River ID number
+
+    Args:
+        river_id (int): a 9 digit integer that is a valid GEOGLOWS River ID number
+
+    Returns:
+        np.ndarray: a numpy array of floats, [lat, lon]
+    """
     return (
         metadata_tables(columns=['LINKNO', 'lat', 'lon'])
         .loc[lambda x: x['LINKNO'] == river_id, ['lat', 'lon']]
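
Review note: `latlon_to_river` selects the row with the smallest straight-line distance measured in degrees, which is not a geodesic distance. The same selection can be sketched without pandas on a tiny made-up table. The IDs and coordinates below are hypothetical, not real GEOGLOWS rivers, and `nearest_river` is an illustrative name.

```python
import math

# Hypothetical (river_id, lat, lon) rows standing in for the metadata table
ROWS = [
    (110_000_001, 10.0, 20.0),
    (110_000_002, 10.5, 20.5),
    (110_000_003, -3.0, 40.0),
]


def nearest_river(lat: float, lon: float) -> int:
    """Return the ID of the row whose (lat, lon) is closest in degree space."""
    return min(ROWS, key=lambda row: math.hypot(row[1] - lat, row[2] - lon))[0]


print(nearest_river(10.4, 20.4))  # 110000002
```

Because a degree of longitude covers less ground away from the equator, the row chosen this way may differ from the geographically nearest outlet at high latitudes.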
