[1]:
import earthkit.hydro as ekh
import numpy as np
import xarray as xr
network = ekh.river_network.load("efas", "5", use_cache=False)
Cache disabled.
Gridded/raster vs masked/vector river networks
By default, earthkit-hydro returns gridded outputs when possible, since these are generally easier to interpret. However, each river network also has a masked/vector representation, which returns only the values at each node of the river network graph. This is harder to interpret but computationally more efficient, so it can be used in performance-sensitive circumstances. The behaviour can be changed via the return_type function argument.
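To make the two representations concrete, here is a minimal NumPy sketch of how a gridded field and a masked field relate. It is not earthkit-hydro's internal implementation: the toy 3x4 grid, the boolean `mask` and all variable names are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical toy domain: True marks grid cells that are river network nodes.
mask = np.array(
    [
        [True, True, False, False],
        [False, True, True, False],
        [False, False, True, True],
    ]
)

# A gridded field has one value per grid cell.
gridded_field = np.arange(mask.size, dtype=float).reshape(mask.shape)

# Gridded -> masked: keep only the values at network nodes (row-major order).
masked_field = gridded_field[mask]

# Masked -> gridded: scatter node values back, with NaN everywhere else.
regridded = np.full(mask.shape, np.nan)
regridded[mask] = masked_field

print(masked_field.shape)  # one value per node
print(regridded)
```

The masked array holds one value per node, while the gridded array has the full domain shape and is padded with NaN outside the network.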
The gridded version returns a full grid, with NaNs at masked locations.
[2]:
ekh.upstream.sum(network, np.random.rand(*network.shape), return_type="gridded")
[2]:
<xarray.DataArray 'out' (lat: 2970, lon: 4530)> Size: 108MB
array([[0.31441124, 0.59091242, 0.97068519, ..., nan, nan,
nan],
[0.27791134, 0.90448908, 2.01488283, ..., nan, nan,
nan],
[0.18090902, 1.50039727, 0.53583085, ..., nan, nan,
nan],
...,
[ nan, nan, nan, ..., 0.71282058, 1.2345076 ,
0.32471655],
[ nan, nan, nan, ..., 0.30923209, 2.73792786,
3.34827361],
[ nan, nan, nan, ..., 0.20266639, 0.91326591,
0.30667545]])
Coordinates:
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24
The masked version returns values only at the river network nodes.
[3]:
ekh.upstream.sum(network, np.random.rand(*network.shape), return_type="masked")
[3]:
<xarray.DataArray 'out' (node_index: 7446075)> Size: 60MB
array([0.23856868, 0.63199563, 0.70555127, ..., 1.50910996, 0.10167544,
0.40516712])
Coordinates:
* node_index (node_index) int64 60MB 0 1 2 3 ... 7446072 7446073 7446074
lat (node_index) float64 60MB 72.24 72.22 72.21 ... 22.77 22.76
lon (node_index) float64 60MB -25.24 -25.24 -25.24 ... 50.24 50.24
The difference can be clearly seen in the array shapes:
[4]:
print("gridded shape:", ekh.upstream.array.sum(network, np.random.rand(*network.shape), return_type="gridded").shape, "=", "network.shape:", network.shape)
print("masked shape:", ekh.upstream.array.sum(network, np.random.rand(*network.shape), return_type="masked").shape, "=", "network.n_nodes", network.n_nodes)
gridded shape: (2970, 4530) = network.shape: (2970, 4530)
masked shape: (7446075,) = network.n_nodes 7446075
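The sizes reported in the xarray outputs above follow directly from these shapes. As a rough check, assuming 8 bytes per float64 value and counting only the data array (not its coordinates):

```python
# Shapes taken from the outputs above.
grid_shape = (2970, 4530)
n_nodes = 7446075
bytes_per_value = 8  # float64

n_cells = grid_shape[0] * grid_shape[1]
gridded_bytes = n_cells * bytes_per_value
masked_bytes = n_nodes * bytes_per_value

print(f"gridded: {gridded_bytes / 1e6:.0f} MB")  # gridded: 108 MB
print(f"masked: {masked_bytes / 1e6:.0f} MB")    # masked: 60 MB
print(f"fraction of grid cells that are nodes: {n_nodes / n_cells:.2f}")  # 0.55
```

For this network, roughly half of the grid cells are river network nodes, so the masked data array is correspondingly smaller.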
Note that the input field was gridded in both cases. The field can also be supplied in masked form, and the output can still be returned as either gridded or masked.
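One way to picture how a function could accept either input form is to normalise the field by its shape. The sketch below is an illustration only: `to_masked` is a hypothetical helper, not part of earthkit-hydro, and the toy `mask` stands in for the river network.

```python
import numpy as np


def to_masked(field, mask):
    """Return a 1D per-node array from either a gridded or a masked input.

    Hypothetical helper for illustration; not part of earthkit-hydro.
    """
    field = np.asarray(field)
    n_nodes = int(mask.sum())
    if field.shape == mask.shape:
        # Gridded input: pick out the values at network nodes.
        return field[mask]
    if field.shape == (n_nodes,):
        # Already one value per node.
        return field
    raise ValueError(f"unexpected field shape {field.shape}")


mask = np.array([[True, False], [True, True]])
from_grid = to_masked(np.array([[1.0, 2.0], [3.0, 4.0]]), mask)
from_nodes = to_masked(np.array([1.0, 3.0, 4.0]), mask)
print(from_grid, from_nodes)
```

Both calls yield the same per-node array, which is why the choice of input form and the choice of return_type can be made independently.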
[5]:
ekh.upstream.sum(network, np.random.rand(network.n_nodes), return_type="gridded")
[5]:
<xarray.DataArray 'out' (lat: 2970, lon: 4530)> Size: 108MB
array([[0.15397929, 0.99131352, 0.08821796, ..., nan, nan,
nan],
[0.61874943, 1.06494109, 2.85583533, ..., nan, nan,
nan],
[0.86773297, 0.56631639, 0.03820655, ..., nan, nan,
nan],
...,
[ nan, nan, nan, ..., 0.74946061, 0.89856876,
0.83419313],
[ nan, nan, nan, ..., 0.73135084, 2.78773471,
3.86566138],
[ nan, nan, nan, ..., 1.75621633, 0.07463169,
0.33194373]])
Coordinates:
* lat (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
* lon (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24
This also works with xarray inputs.
[6]:
example_arr = np.random.rand(network.n_nodes)
index = np.arange(network.n_nodes)
example_da = xr.DataArray(
    example_arr,
    dims=["index"],
    coords={"index": index},
    name="precip",
    attrs={"units": "m", "description": "Sample precipitation data"},
)
ekh.upstream.sum(network, example_da, return_type="gridded")
[6]:
<xarray.DataArray 'precip' (lat: 2970, lon: 4530)> Size: 108MB
array([[0.44191097, 0.21351225, 0.29030066, ..., nan, nan,
nan],
[0.63691408, 1.23258398, 2.42445828, ..., nan, nan,
nan],
[0.63425073, 0.64538632, 0.54336083, ..., nan, nan,
nan],
...,
[ nan, nan, nan, ..., 0.28802293, 1.21333914,
0.18728341],
[ nan, nan, nan, ..., 0.38447251, 2.7660881 ,
3.74619826],
[ nan, nan, nan, ..., 0.6940397 , 0.54701896,
0.690217 ]])
Dimensions without coordinates: lat, lon
Changing the return_type in each function call can be cumbersome, so a default can be set for the river network object itself.
[7]:
print("default is gridded: ", ekh.upstream.sum(network, np.random.rand(*network.shape)).shape)
network.set_default_return_type("masked")
print("default is now masked: ", ekh.upstream.sum(network, np.random.rand(*network.shape)).shape)
default is gridded: (2970, 4530)
default is now masked: (7446075,)
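The pattern of an object-level default with an optional per-call argument can be sketched as follows. `ToyNetwork` and `identity` are hypothetical stand-ins written for this illustration. Whether earthkit-hydro resolves the argument in exactly this way, including a per-call value taking precedence over the default, is an assumption.

```python
import numpy as np


class ToyNetwork:
    """Hypothetical stand-in for a river network object."""

    def __init__(self, mask):
        self.mask = mask
        self.default_return_type = "gridded"

    def set_default_return_type(self, return_type):
        if return_type not in ("gridded", "masked"):
            raise ValueError(f"unknown return_type: {return_type!r}")
        self.default_return_type = return_type


def identity(network, node_values, return_type=None):
    """Toy operation returning node values in the requested representation."""
    # Fall back to the network default when no return_type is given.
    return_type = return_type or network.default_return_type
    if return_type == "masked":
        return node_values
    out = np.full(network.mask.shape, np.nan)
    out[network.mask] = node_values
    return out


toy = ToyNetwork(np.array([[True, False], [True, True]]))
values = np.array([1.0, 2.0, 3.0])
print(identity(toy, values).shape)  # (2, 2)
toy.set_default_return_type("masked")
print(identity(toy, values).shape)  # (3,)
print(identity(toy, values, return_type="gridded").shape)  # (2, 2)
```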