[1]:
import earthkit.hydro as ekh
import numpy as np
import xarray as xr

network = ekh.river_network.load("efas", "5", use_cache=False)
Cache disabled.

Gridded/raster vs masked/vector river networks

By default, earthkit-hydro returns gridded outputs when possible, since these are generally easier to interpret. However, each river network also has a masked/vector representation, which returns values only for the nodes of the river network graph. This is harder to interpret but computationally more efficient, so it is useful in performance-sensitive circumstances. The behaviour can be changed via the return_type function argument.
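
To make the two representations concrete, here is a minimal NumPy-only sketch of how a masked/vector field relates to a gridded/raster one. The tiny 3x4 mask and the row-major node ordering are assumptions chosen for illustration; they are not taken from earthkit-hydro's internals.

```python
import numpy as np

# Hypothetical 3x4 domain: True marks cells that are river network nodes.
mask = np.array([
    [True,  True,  False, False],
    [False, True,  True,  False],
    [False, False, True,  True],
])

# Masked/vector form: one value per node (row-major order of the mask, by assumption).
node_values = np.arange(1.0, mask.sum() + 1)

# Gridded/raster form: scatter the node values into a NaN-filled grid.
gridded = np.full(mask.shape, np.nan)
gridded[mask] = node_values

# Going back: boolean indexing recovers the per-node values.
recovered = gridded[mask]

print(gridded)
print(recovered)
```

The gridded array carries a NaN for every cell outside the network, which is what makes it easy to plot but larger to store and process.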

The default gridded output is a full grid, with NaNs at masked locations.

[2]:
ekh.upstream.sum(network, np.random.rand(*network.shape), return_type="gridded")
[2]:
<xarray.DataArray 'out' (lat: 2970, lon: 4530)> Size: 108MB
array([[0.31441124, 0.59091242, 0.97068519, ...,        nan,        nan,
               nan],
       [0.27791134, 0.90448908, 2.01488283, ...,        nan,        nan,
               nan],
       [0.18090902, 1.50039727, 0.53583085, ...,        nan,        nan,
               nan],
       ...,
       [       nan,        nan,        nan, ..., 0.71282058, 1.2345076 ,
        0.32471655],
       [       nan,        nan,        nan, ..., 0.30923209, 2.73792786,
        3.34827361],
       [       nan,        nan,        nan, ..., 0.20266639, 0.91326591,
        0.30667545]])
Coordinates:
  * lat      (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
  * lon      (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24

The masked version returns values only at the river network nodes.

[3]:
ekh.upstream.sum(network, np.random.rand(*network.shape), return_type="masked")
[3]:
<xarray.DataArray 'out' (node_index: 7446075)> Size: 60MB
array([0.23856868, 0.63199563, 0.70555127, ..., 1.50910996, 0.10167544,
       0.40516712])
Coordinates:
  * node_index  (node_index) int64 60MB 0 1 2 3 ... 7446072 7446073 7446074
    lat         (node_index) float64 60MB 72.24 72.22 72.21 ... 22.77 22.76
    lon         (node_index) float64 60MB -25.24 -25.24 -25.24 ... 50.24 50.24

The difference can be clearly seen in the array shapes:

[4]:
print("gridded shape:", ekh.upstream.array.sum(network, np.random.rand(*network.shape), return_type="gridded").shape, "=", "network.shape:", network.shape)
print("masked shape:", ekh.upstream.array.sum(network, np.random.rand(*network.shape), return_type="masked").shape, "=", "network.n_nodes", network.n_nodes)
gridded shape: (2970, 4530) = network.shape: (2970, 4530)
masked shape: (7446075,) = network.n_nodes 7446075
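
The sizes reported in the DataArray headers above follow directly from these shapes. As a quick arithmetic check, assuming float64 values and counting only the data array (not the coordinates):

```python
# Shapes taken from the outputs above.
grid_shape = (2970, 4530)
n_nodes = 7446075
bytes_per_value = 8  # float64

n_cells = grid_shape[0] * grid_shape[1]
gridded_bytes = n_cells * bytes_per_value
masked_bytes = n_nodes * bytes_per_value

print(f"gridded: {gridded_bytes / 1e6:.0f} MB")
print(f"masked:  {masked_bytes / 1e6:.0f} MB")
print(f"fraction of grid cells that are nodes: {n_nodes / n_cells:.2f}")
```

Just over half of the grid cells are network nodes, so for this network the masked data array is a little over half the size of the gridded one.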

Note that the input field was gridded in both cases. The field can also be supplied in masked form, and the output can still be either gridded or masked.

[5]:
ekh.upstream.sum(network, np.random.rand(network.n_nodes), return_type="gridded")
[5]:
<xarray.DataArray 'out' (lat: 2970, lon: 4530)> Size: 108MB
array([[0.15397929, 0.99131352, 0.08821796, ...,        nan,        nan,
               nan],
       [0.61874943, 1.06494109, 2.85583533, ...,        nan,        nan,
               nan],
       [0.86773297, 0.56631639, 0.03820655, ...,        nan,        nan,
               nan],
       ...,
       [       nan,        nan,        nan, ..., 0.74946061, 0.89856876,
        0.83419313],
       [       nan,        nan,        nan, ..., 0.73135084, 2.78773471,
        3.86566138],
       [       nan,        nan,        nan, ..., 1.75621633, 0.07463169,
        0.33194373]])
Coordinates:
  * lat      (lat) float64 24kB 72.24 72.22 72.21 72.19 ... 22.79 22.77 22.76
  * lon      (lon) float64 36kB -25.24 -25.23 -25.21 ... 50.21 50.22 50.24
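
One plausible way to accept both input forms is to dispatch on the array shape: a field matching the grid shape is treated as gridded, and a 1D field with one entry per node is treated as masked. The sketch below illustrates that idea with plain NumPy; `to_masked` is a hypothetical helper, and this is not a description of how earthkit-hydro handles inputs internally.

```python
import numpy as np

def to_masked(field, mask):
    """Return `field` as a 1D per-node array (hypothetical helper)."""
    field = np.asarray(field)
    n_nodes = int(mask.sum())
    if field.shape == mask.shape:
        # Gridded input: keep only the cells that are network nodes.
        return field[mask]
    if field.shape == (n_nodes,):
        # Already masked: one value per node.
        return field
    raise ValueError(
        f"expected shape {mask.shape} or {(n_nodes,)}, got {field.shape}"
    )

# Hypothetical 2x2 domain with three nodes.
mask = np.array([[True, False], [True, True]])
gridded_field = np.array([[1.0, 9.0], [2.0, 3.0]])
masked_field = np.array([1.0, 2.0, 3.0])

# Both input forms reduce to the same per-node values.
print(to_masked(gridded_field, mask))
print(to_masked(masked_field, mask))
```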

Masked inputs can also be provided as xarray objects.

[6]:
example_arr = np.random.rand(network.n_nodes)

index = np.arange(network.n_nodes)

example_da = xr.DataArray(
    example_arr,
    dims=["index"],
    coords={"index": index},
    name="precip",
    attrs={"units": "m", "description": "Sample precipitation data"},
)

ekh.upstream.sum(network, example_da, return_type="gridded")
[6]:
<xarray.DataArray 'precip' (lat: 2970, lon: 4530)> Size: 108MB
array([[0.44191097, 0.21351225, 0.29030066, ...,        nan,        nan,
               nan],
       [0.63691408, 1.23258398, 2.42445828, ...,        nan,        nan,
               nan],
       [0.63425073, 0.64538632, 0.54336083, ...,        nan,        nan,
               nan],
       ...,
       [       nan,        nan,        nan, ..., 0.28802293, 1.21333914,
        0.18728341],
       [       nan,        nan,        nan, ..., 0.38447251, 2.7660881 ,
        3.74619826],
       [       nan,        nan,        nan, ..., 0.6940397 , 0.54701896,
        0.690217  ]])
Dimensions without coordinates: lat, lon

Changing the return_type in each function call can be cumbersome, so a default can be set for the river network object itself.

[7]:
print("default is gridded: ", ekh.upstream.sum(network, np.random.rand(*network.shape)).shape)
network.set_default_return_type("masked")
print("default is now masked: ", ekh.upstream.sum(network, np.random.rand(*network.shape)).shape)
default is gridded:  (2970, 4530)
default is now masked:  (7446075,)