[1]:
import earthkit.hydro as ekh
import numpy as np
import xarray as xr
Handling xarray and array inputs and outputs¶
For every operation, earthkit-hydro has two versions:
an xarray version, called at
ekh.submodule.operationan array version, called at
ekh.submodule.array.operation
This notebook demonstrates this for some simple operations.
[2]:
network = ekh.river_network.load("efas", "5", use_cache=False)
Cache disabled.
xarray versions¶
xarray is the high-level recommended approach for conducting operations on river networks.
For this tutorial, example xarray data is created, but in practice this can be loaded from any supported file with ekd.from_source(...).to_xarray() (or xr.open_dataset(...)).
[3]:
example_arr = np.random.rand(2, *network.shape)
lat = network.coords['lat']
lon = network.coords['lon']
step = np.arange(2)
example_da = xr.DataArray(
example_arr,
dims = ["step", "lat", "lon"],
coords = {"step": step, "lat": lat, "lon": lon},
name = "precip",
attrs={"units": "m", "description": "Sample precipitation data"}
)
example_da
[3]:
<xarray.DataArray 'precip' (step: 2, lat: 2970, lon: 4530)> Size: 215MB
array([[[0.15711655, 0.52936434, 0.56102303, ..., 0.06878928,
0.28992038, 0.92908718],
[0.91444499, 0.31846002, 0.10545373, ..., 0.59321767,
0.63550775, 0.80646403],
[0.38624433, 0.68650667, 0.16244652, ..., 0.71864169,
0.05510311, 0.42310523],
...,
[0.75660688, 0.02474129, 0.46234213, ..., 0.75432537,
0.24582409, 0.09432709],
[0.30669821, 0.81120274, 0.18280503, ..., 0.47122936,
0.45315729, 0.47209622],
[0.3014824 , 0.43804469, 0.33486898, ..., 0.91205994,
0.00772117, 0.84704023]],
[[0.57485927, 0.04379043, 0.1061605 , ..., 0.10572855,
0.03497205, 0.52542233],
[0.59805427, 0.05859767, 0.26522621, ..., 0.8811148 ,
0.7002739 , 0.10340192],
[0.68846108, 0.83715735, 0.7100234 , ..., 0.42872913,
0.62742133, 0.33977993],
...,
[0.6860338 , 0.57439944, 0.91223467, ..., 0.03131001,
0.16203806, 0.57760851],
[0.76017868, 0.43437957, 0.64427315, ..., 0.49580478,
0.70649661, 0.42344819],
[0.60303899, 0.79094081, 0.32377667, ..., 0.83884337,
0.71465062, 0.44616404]]])
Coordinates:
* step (step) int64 16B 0 1
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24
Attributes:
units: m
description: Sample precipitation dataAll operations can then be directly conducted on this xarray dataarray, and will return back an xarray dataarray.
For example, we can do an upstream.sum, or a catchments.sum.
[4]:
ekh.upstream.sum(network, example_da)
[4]:
<xarray.DataArray 'precip' (step: 2, lat: 2970, lon: 4530)> Size: 215MB
array([[[0.15711655, 0.52936434, 0.56102303, ..., nan,
nan, nan],
[0.91444499, 1.23290502, 1.86772309, ..., nan,
nan, nan],
[0.38624433, 0.84895319, 0.16244652, ..., nan,
nan, nan],
...,
[ nan, nan, nan, ..., 0.75432537,
0.71705345, 0.09432709],
[ nan, nan, nan, ..., 0.47122936,
2.46722595, 3.7863624 ],
[ nan, nan, nan, ..., 1.28929405,
0.00772117, 0.84704023]],
[[0.57485927, 0.04379043, 0.1061605 , ..., nan,
nan, nan],
[0.59805427, 0.65665193, 0.96566858, ..., nan,
nan, nan],
[0.68846108, 1.54718075, 0.7100234 , ..., nan,
nan, nan],
...,
[ nan, nan, nan, ..., 0.03131001,
0.65784284, 0.57760851],
[ nan, nan, nan, ..., 0.49580478,
3.73241039, 4.60202261],
[ nan, nan, nan, ..., 1.65342032,
0.71465062, 0.44616404]]])
Coordinates:
* step (step) int64 16B 0 1
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24[5]:
ekh.catchments.sum(network, example_da, locations = {
"gauge_1": (70.475, 28.32),
"gauge_2": (42.225, 50.24)
}
)
[5]:
<xarray.DataArray 'precip' (step: 2, node_index: 2)> Size: 32B
array([[ 6710.97380367, 272520.24025306],
[ 6716.52716166, 272441.37894161]])
Coordinates:
* step (step) int64 16B 0 1
* node_index (node_index) int64 16B 26066 4153831
lat (node_index) float64 16B 28.32 50.24
lon (node_index) float64 16B 70.47 42.23
name (node_index) <U7 56B 'gauge_1' 'gauge_2'We can also pass a dataset, in which case we will be returned back a corresponding dataset.
[6]:
example_ds = xr.Dataset(
data_vars={"var1": example_da, "var2": example_da+1}
)
example_ds
[6]:
<xarray.Dataset> Size: 431MB
Dimensions: (step: 2, lat: 2970, lon: 4530)
Coordinates:
* step (step) int64 16B 0 1
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24
Data variables:
var1 (step, lat, lon) float64 215MB 0.1571 0.5294 ... 0.7147 0.4462
var2 (step, lat, lon) float64 215MB 1.157 1.529 1.561 ... 1.715 1.446[7]:
ekh.upstream.sum(network, example_ds)
[7]:
<xarray.Dataset> Size: 431MB
Dimensions: (step: 2, lat: 2970, lon: 4530)
Coordinates:
* step (step) int64 16B 0 1
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24
Data variables:
var1 (step, lat, lon) float64 215MB 0.1571 0.5294 ... 0.7147 0.4462
var2 (step, lat, lon) float64 215MB 1.157 1.529 1.561 ... 1.715 1.446Note that the coordinates were automatically detected in the above example. In some cases, it will not be possible infer the spatial coordinates in the xarray object. See for example the following:
[8]:
example_da_uninferrable = xr.DataArray(
example_arr,
dims = ["step", "uninferrable_name_1", "uninferrable_name_2"],
coords = {"step": step, "uninferrable_name_1": lat, "uninferrable_name_2": lon},
name = "precip",
attrs={"units": "m", "description": "Sample precipitation data"}
)
This will raise a ValueError: Could not autodetect xarray core dims. However, we can manually specify the input core dimensions for each of the xarray inputs via the argument input_core_dims.
[9]:
ekh.upstream.sum(network, example_da_uninferrable, input_core_dims=[["uninferrable_name_1", "uninferrable_name_2"]])
[9]:
<xarray.DataArray 'precip' (step: 2, uninferrable_name_1: 2970,
uninferrable_name_2: 4530)> Size: 215MB
array([[[0.15711655, 0.52936434, 0.56102303, ..., nan,
nan, nan],
[0.91444499, 1.23290502, 1.86772309, ..., nan,
nan, nan],
[0.38624433, 0.84895319, 0.16244652, ..., nan,
nan, nan],
...,
[ nan, nan, nan, ..., 0.75432537,
0.71705345, 0.09432709],
[ nan, nan, nan, ..., 0.47122936,
2.46722595, 3.7863624 ],
[ nan, nan, nan, ..., 1.28929405,
0.00772117, 0.84704023]],
[[0.57485927, 0.04379043, 0.1061605 , ..., nan,
nan, nan],
[0.59805427, 0.65665193, 0.96566858, ..., nan,
nan, nan],
[0.68846108, 1.54718075, 0.7100234 , ..., nan,
nan, nan],
...,
[ nan, nan, nan, ..., 0.03131001,
0.65784284, 0.57760851],
[ nan, nan, nan, ..., 0.49580478,
3.73241039, 4.60202261],
[ nan, nan, nan, ..., 1.65342032,
0.71465062, 0.44616404]]])
Coordinates:
* step (step) int64 16B 0 1
* uninferrable_name_1 (uninferrable_name_1) float64 24kB 72.24 ... 22.76
* uninferrable_name_2 (uninferrable_name_2) float64 36kB -25.24 ... 50.24We need to specify this for each xarray argument. In the below example, we have two xarray arguments so we specify the input_core_dims for both, in the order defined by the function signature.
[10]:
ekh.upstream.sum(network, example_da, node_weights=example_da_uninferrable, input_core_dims=[["lat", "lon"], ["uninferrable_name_1", "uninferrable_name_2"]])
[10]:
<xarray.DataArray 'precip' (step: 2, lat: 2970, lon: 4530)> Size: 215MB
array([[[2.46856100e-02, 2.80226605e-01, 3.14746841e-01, ...,
nan, nan, nan],
[8.36209649e-01, 9.37626435e-01, 1.22897353e+00, ...,
nan, nan, nan],
[1.49184683e-01, 4.97680282e-01, 2.63888722e-02, ...,
nan, nan, nan],
...,
[ nan, nan, nan, ...,
5.69006757e-01, 2.82486592e-01, 8.89760050e-03],
[ nan, nan, nan, ...,
2.22057110e-01, 1.46205664e+00, 2.40240863e+00],
[ nan, nan, nan, ...,
9.74158904e-01, 5.96164933e-05, 7.17477153e-01]],
[[3.30463180e-01, 1.91760206e-03, 1.12700525e-02, ...,
nan, nan, nan],
[3.57668905e-01, 3.61102592e-01, 4.33365136e-01, ...,
nan, nan, nan],
[4.73978662e-01, 1.20496566e+00, 5.04133234e-01, ...,
nan, nan, nan],
...,
[ nan, nan, nan, ...,
9.80316509e-04, 2.72078712e-01, 3.33631585e-01],
[ nan, nan, nan, ...,
2.45822378e-01, 2.64913548e+00, 3.02750620e+00],
[ nan, nan, nan, ...,
1.36719381e+00, 5.10725503e-01, 1.99062347e-01]]])
Coordinates:
* step (step) int64 16B 0 1
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24It is also possible to give purely array inputs, in which case an xarray dataarray is returned with the coordinate information of the river network. For example,
[11]:
ekh.upstream.sum(network, example_arr)
[11]:
<xarray.DataArray 'out' (axis1: 2, lat: 2970, lon: 4530)> Size: 215MB
array([[[0.15711655, 0.52936434, 0.56102303, ..., nan,
nan, nan],
[0.91444499, 1.23290502, 1.86772309, ..., nan,
nan, nan],
[0.38624433, 0.84895319, 0.16244652, ..., nan,
nan, nan],
...,
[ nan, nan, nan, ..., 0.75432537,
0.71705345, 0.09432709],
[ nan, nan, nan, ..., 0.47122936,
2.46722595, 3.7863624 ],
[ nan, nan, nan, ..., 1.28929405,
0.00772117, 0.84704023]],
[[0.57485927, 0.04379043, 0.1061605 , ..., nan,
nan, nan],
[0.59805427, 0.65665193, 0.96566858, ..., nan,
nan, nan],
[0.68846108, 1.54718075, 0.7100234 , ..., nan,
nan, nan],
...,
[ nan, nan, nan, ..., 0.03131001,
0.65784284, 0.57760851],
[ nan, nan, nan, ..., 0.49580478,
3.73241039, 4.60202261],
[ nan, nan, nan, ..., 1.65342032,
0.71465062, 0.44616404]]])
Coordinates:
* axis1 (axis1) int64 16B 0 1
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24Note: Array inputs must follow the convention that the last dimensions are the river network related dimensions, in this case lat and lon. Therefore, for any vectorised operations, the extra dimensions must be leading.
array versions¶
A lower level API is also available for users interested in working directly with arrays. This API does not accept any xarray inputs, but works analagously to the xarray API. Taking the two above examples, the array forms are:
[12]:
ekh.upstream.array.sum(network, example_arr)
[12]:
array([[[0.15711655, 0.52936434, 0.56102303, ..., nan,
nan, nan],
[0.91444499, 1.23290502, 1.86772309, ..., nan,
nan, nan],
[0.38624433, 0.84895319, 0.16244652, ..., nan,
nan, nan],
...,
[ nan, nan, nan, ..., 0.75432537,
0.71705345, 0.09432709],
[ nan, nan, nan, ..., 0.47122936,
2.46722595, 3.7863624 ],
[ nan, nan, nan, ..., 1.28929405,
0.00772117, 0.84704023]],
[[0.57485927, 0.04379043, 0.1061605 , ..., nan,
nan, nan],
[0.59805427, 0.65665193, 0.96566858, ..., nan,
nan, nan],
[0.68846108, 1.54718075, 0.7100234 , ..., nan,
nan, nan],
...,
[ nan, nan, nan, ..., 0.03131001,
0.65784284, 0.57760851],
[ nan, nan, nan, ..., 0.49580478,
3.73241039, 4.60202261],
[ nan, nan, nan, ..., 1.65342032,
0.71465062, 0.44616404]]])
[13]:
ekh.catchments.array.sum(network, example_arr, locations = {
"gauge_1": (70.475, 28.32),
"gauge_2": (42.225, 50.24)
}
)
[13]:
array([[ 6710.97380367, 272520.24025306],
[ 6716.52716166, 272441.37894161]])