Frequently Asked Questions

General

What river network formats does earthkit-hydro support?

PCRaster D8, ArcGIS/ESRI D8, CaMa-Flood, HydroSHEDS, MERIT-Hydro and GRIT formats are all supported. See earthkit.hydro.river_network for the full list of formats and pre-computed networks.

What array backends are supported?

NumPy, xarray, CuPy, PyTorch, JAX, MLX and TensorFlow. The default backend is NumPy. You can switch backends via network.to_device(array_backend=chosen_array_backend). See Handling xarray and multiple array backends for details.

Does earthkit-hydro support GPU acceleration?

Yes. Any backend with GPU support (CuPy, PyTorch, JAX, etc.) can be used directly. Load or convert a river network to the desired backend on a GPU device, and all subsequent operations with that network run on the GPU.

Can I use earthkit-hydro with xarray datasets?

Yes. All top-level functions accept xarray objects and return xarray objects with coordinates preserved, so they integrate naturally with common climate and weather data workflows.

Does earthkit-hydro handle bifurcating river networks?

Yes. Networks where flow splits at a node (e.g. distributary channels, braided rivers) are fully supported. This is a key distinction from many other hydrological tools that assume tree-structured networks.
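To make the difference from a tree-structured network concrete, here is a minimal sketch of flow accumulation on a toy network with one bifurcation. It is plain Python and does not use earthkit-hydro's data structures or API. The node names, the `downstream` mapping, the `accumulate` helper and the even 50/50 split at the bifurcation are all assumptions chosen for illustration.

```python
# Hypothetical toy network (not an earthkit-hydro object): each node maps to
# a list of (downstream_node, fraction_of_flow) pairs.
downstream = {
    "A": [("B", 1.0)],
    "B": [("C", 0.5), ("D", 0.5)],  # bifurcation: flow splits at B (assumed even split)
    "C": [("E", 1.0)],
    "D": [("E", 1.0)],              # the two branches rejoin at E
    "E": [],                        # outlet
}

# Nodes listed so that every node appears before its downstream neighbours.
order = ["A", "B", "C", "D", "E"]


def accumulate(local_input, downstream, order):
    """Pass each node's accumulated flow to all of its downstream nodes."""
    total = dict(local_input)
    for node in order:
        for target, fraction in downstream[node]:
            total[target] += fraction * total[node]
    return total


# One unit of local input at every node.
local_input = {node: 1.0 for node in order}
accumulated = accumulate(local_input, downstream, order)
print(accumulated)
```

In a tree, each node has at most one downstream neighbour, so the inner loop would run at most once per node. Allowing a list of downstream neighbours is what lets the same loop handle distributaries and braided channels.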

Installation

What Python version do I need?

We support stable Python versions. Check the [status of Python versions](https://devguide.python.org/versions/) to see which versions are currently stable. As of April 2025, Python 3.10 or later is required.

How do I install GPU support?

Install the GPU backend of your choice separately (e.g. pip install torch), then convert your river network with network.to_device(array_backend=chosen_array_backend, device=chosen_device).
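Before converting a network, it can help to check which backend packages are importable in the current environment. The sketch below uses only the standard library. The `BACKEND_MODULES` mapping and the `installed_backends` helper are hypothetical names introduced here, and the module names are the usual top-level import names of each package, not identifiers taken from earthkit-hydro.

```python
import importlib.util

# Assumed mapping from backend name to its top-level importable module.
BACKEND_MODULES = {
    "numpy": "numpy",
    "cupy": "cupy",
    "torch": "torch",
    "jax": "jax",
    "mlx": "mlx",
    "tensorflow": "tensorflow",
}


def installed_backends():
    """Report which backend packages can be found, without importing them."""
    found = {}
    for name, module in BACKEND_MODULES.items():
        try:
            found[name] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # Treat a package that cannot be inspected as unavailable.
            found[name] = False
    return found


available = installed_backends()
for name, present in available.items():
    print(f"{name:<12}{'installed' if present else 'missing'}")
```

This only confirms that a package is installed. It does not confirm that a GPU is visible to that package, which each backend reports through its own API.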

Data and performance

Loading a custom river network is slow. How can I speed it up?

Creating a river network from a raw flow direction file requires topological sorting, which is expensive for large grids. Export the processed network once with network.export("my_network.joblib") and reload it with ekh.river_network.create("my_network.joblib", "precomputed"). See Loading a river network for details.
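To illustrate why this step is worth caching, here is a minimal sketch of a topological sort using Kahn's algorithm on a toy network. This is a generic textbook method in plain Python and is not a description of how earthkit-hydro implements the sort. The `downstream` mapping and node names are hypothetical.

```python
from collections import deque


def topological_order(downstream):
    """Order nodes so that each node precedes its downstream nodes (Kahn's algorithm)."""
    # Count how many upstream neighbours each node has.
    n_upstream = {node: 0 for node in downstream}
    for targets in downstream.values():
        for target in targets:
            n_upstream[target] += 1

    # Start from the headwaters: nodes with nothing upstream.
    ready = deque(node for node, count in n_upstream.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in downstream[node]:
            n_upstream[target] -= 1
            if n_upstream[target] == 0:
                ready.append(target)

    if len(order) != len(downstream):
        raise ValueError("network contains a cycle")
    return order


# Hypothetical toy network: two headwaters joining at a confluence.
downstream = {
    "headwater_1": ["confluence"],
    "headwater_2": ["confluence"],
    "confluence": ["outlet"],
    "outlet": [],
}
order = topological_order(downstream)
print(order)
```

Every node and every connection is visited, so the work grows with the size of the grid. Doing it once and reloading the exported result avoids repeating that cost in every session.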

How are missing values handled?

earthkit-hydro follows the NumPy convention: missing values are represented as np.nan and propagate through all operations. See Missing value handling philosophy for the rationale behind this design.
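The propagation behaviour comes from IEEE 754 floating-point arithmetic, which NumPy and plain Python floats both follow. The sketch below uses only the standard library to show what "propagate" means in practice. The values are made up, and the final filtering step is a hypothetical preprocessing choice made by the caller, not an earthkit-hydro option.

```python
import math

# Hypothetical upstream values, one of which is missing.
upstream_values = [1.5, math.nan, 2.0]

# Any arithmetic involving NaN yields NaN, so the missing value reaches the total.
total = sum(upstream_values)
print(math.isnan(total))  # True

# NaN never compares equal to anything, including itself,
# so test for it with math.isnan rather than ==.
print(total == total)  # False

# Skipping missing values has to be an explicit decision by the caller.
valid_total = sum(v for v in upstream_values if not math.isnan(v))
print(valid_total)  # 3.5
```

In a river network this means that a single missing value at an upstream node marks every downstream accumulated value as missing too, which makes gaps in the input visible rather than silently ignored.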

I’m migrating from PCRaster. Where do I start?

See the PCRaster compatibility page for a function-by-function translation table and a summary of the key differences.

What is the difference between distance and length?

In earthkit-hydro, distances are edge properties (the cost of traversing a connection between two nodes) while lengths are node properties (the extent associated with each node). This distinction matters at confluences and bifurcations. See Distance vs. length concepts for a full explanation.
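As a rough illustration of the two concepts, the sketch below stores distances on edges and lengths on nodes for a toy confluence, then sums each along a path. It is plain Python with made-up values and hypothetical helper names (`path_distance`, `path_length`). Counting the lengths of both the first and last node of a path is an assumption made here for simplicity, so refer to the concepts page for the conventions earthkit-hydro actually uses.

```python
# Hypothetical toy confluence: A and B both flow into C, which flows into D.
edge_distance = {("A", "C"): 3.0, ("B", "C"): 5.0, ("C", "D"): 2.0}  # one value per connection
node_length = {"A": 1.0, "B": 4.0, "C": 2.0, "D": 1.5}               # one value per node


def path_distance(path):
    """Sum the edge distances between consecutive nodes of the path."""
    return sum(edge_distance[(u, v)] for u, v in zip(path, path[1:]))


def path_length(path):
    """Sum the node lengths of every node on the path (endpoints included by assumption)."""
    return sum(node_length[node] for node in path)


for path in (["A", "C", "D"], ["B", "C", "D"], ["C"]):
    print(path, path_distance(path), path_length(path))
```

A path made of a single node has zero distance, because no connection is traversed, but it still has that node's length. At the confluence, the two incoming connections carry separate distances, while node C contributes one length regardless of which branch the path arrives from.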