Array backend design¶

This page explains why earthkit-hydro supports multiple array backends and how this design choice affects your work.

The flexibility challenge¶

Hydrological datasets come in many forms (NetCDF files, GeoTIFFs, CSV tables, databases), and scientists work in different computing ecosystems:

Climate scientists often use xarray for labeled multi-dimensional arrays
Machine learning practitioners use PyTorch, JAX, or TensorFlow
Performance-focused users leverage CuPy for GPU acceleration
Traditional scientific computing relies on NumPy

Rather than forcing everyone into one framework, earthkit-hydro is backend-agnostic.

How it works¶

As in earthkit more broadly, earthkit-hydro operations work with any array backend that supports basic operations such as indexing, aggregation, and mathematical functions. The library:

Detects what array type you provide (numpy, cupy, torch, etc.)
Dispatches operations to backend-appropriate implementations
Returns results in the same array type you provided

This means:

import numpy as np
import torch
import earthkit.hydro as ekh

network = ekh.river_network.load("efas", "5")

# Works with NumPy
numpy_data = np.ones(network.shape)
result_np = ekh.upstream.array.sum(network, numpy_data)  # Returns NumPy array

# Works with PyTorch
torch_data = torch.ones(network.shape)
result_torch = ekh.upstream.array.sum(network, torch_data)  # Returns PyTorch tensor

No conversion needed, no backend lock-in.
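The detect, dispatch and return steps listed above can be illustrated with a minimal sketch. It is not earthkit-hydro's internal code: `get_namespace`, `scatter_add` and `add_to_downstream` are hypothetical helpers, and only NumPy and PyTorch branches are written out. The sketch finds the array's own module (through the Python array API standard's `__array_namespace__` where the array provides it), picks a backend-specific scatter-add, and hands back an array of the input's type.

```python
import importlib

import numpy as np


def get_namespace(x):
    """Hypothetical helper: return the array module that owns ``x``."""
    # Array API standard objects (e.g. NumPy >= 2.0 arrays) expose their namespace.
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    # Otherwise fall back to the top-level module of the array's type,
    # e.g. "torch" for torch.Tensor.
    return importlib.import_module(type(x).__module__.split(".")[0])


def scatter_add(xp, target, index, values):
    """Return a copy of ``target`` with ``values`` added at ``index``.

    Repeated indices accumulate, which routing needs when several cells
    drain into the same downstream cell.
    """
    if xp.__name__ == "torch":
        # PyTorch: out-of-place index_add, so autograd can track it.
        return target.index_add(0, index, values)
    if xp.__name__ == "numpy":
        out = target.copy()
        np.add.at(out, index, values)  # unbuffered, so duplicate indices accumulate
        return out
    raise TypeError(f"no scatter_add implementation for {xp.__name__}")


def add_to_downstream(values, sources, targets):
    """One toy routing step: each source cell adds its value to its target cell."""
    xp = get_namespace(values)  # 1. detect the backend
    # 2. dispatch to the backend's scatter-add; 3. result has the input's type
    return scatter_add(xp, values, targets, values[sources])


# Cells 0 and 1 both drain into cell 2.
values = np.array([1.0, 2.0, 0.5])
result = add_to_downstream(values, np.array([0, 1]), np.array([2, 2]))
# result is a NumPy array with values 1.0, 2.0, 3.5
```

Passing a `torch.Tensor` (with `torch.int64` index tensors) to `add_to_downstream` would take the PyTorch branch and return a tensor instead.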

Benefits for machine learning¶

Supporting ML frameworks natively enables:

Differentiability: Operations with PyTorch/JAX are differentiable, allowing:

Gradient-based optimization of hydrological models
Integration of physical constraints in neural networks
Parameter estimation through backpropagation
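To illustrate what differentiability buys, here is a minimal PyTorch sketch of gradients flowing through a flow-accumulation step. `toy_upstream_sum` and the three-cell chain are hypothetical, written for illustration rather than taken from earthkit-hydro; the point is only that an accumulation built from differentiable tensor operations lets `backward()` report how outlet discharge depends on the runoff in every cell.

```python
import torch


def toy_upstream_sum(values, edges):
    """Hypothetical toy accumulation (not the earthkit-hydro implementation).

    ``edges`` is a list of (source, target) cell pairs, ordered so that every
    edge leaving a cell comes after all edges entering it.
    """
    acc = values.clone()
    for source, target in edges:
        # Out-of-place update so autograd records every step.
        acc = acc.index_add(0, torch.tensor([target]), acc[source:source + 1])
    return acc


# Toy chain network: cell 0 -> cell 1 -> cell 2 (outlet).
edges = [(0, 1), (1, 2)]
runoff = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

discharge = toy_upstream_sum(runoff, edges)  # accumulated values: 1, 3, 6
discharge[2].backward()  # gradient of outlet discharge with respect to runoff
print(runoff.grad)       # every upstream cell contributes once: 1, 1, 1
```

The same gradients are what an optimizer would use to adjust parameters that produce `runoff`.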

GPU acceleration: Operations run on the GPU automatically when given CuPy arrays or PyTorch CUDA tensors, which helps with:

Processing large spatial domains
Ensemble simulations
Real-time applications
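Ensembles benefit because many members can be processed in one batched call. The sketch below is an illustration under stated assumptions, not earthkit-hydro code: it writes upstream accumulation on a hypothetical three-cell chain as a dense matrix (possible because accumulation is linear in its input) and applies it to 50 ensemble members at once, on the GPU when PyTorch can see one and on the CPU otherwise.

```python
import torch

# Use the GPU when one is available; the code below is identical either way.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy chain 0 -> 1 -> 2 as a dense accumulation matrix: row i has ones for
# cell i and every cell upstream of it.
accumulation = torch.tensor(
    [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [1.0, 1.0, 1.0]],
    device=device,
)

# 50 ensemble members x 3 cells of runoff, all routed in one matrix product.
runoff = torch.ones(50, 3, device=device)
discharge = runoff @ accumulation.T  # each member: 1, 2, 3
```

A dense matrix is only sensible at toy sizes; real river networks need a sparse or graph-based formulation.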

Framework integration: Seamless use in existing ML pipelines without data conversion overhead.
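As a sketch of what such integration can look like, the hypothetical `ToyRoutingLayer` below wraps a fixed accumulation matrix for a three-cell chain in a `torch.nn.Module` with one learnable runoff coefficient, and a standard optimizer recovers that coefficient from synthetic observations. None of these names come from earthkit-hydro; the example only shows that routing expressed in tensor operations drops into ordinary PyTorch training code.

```python
import torch
from torch import nn


class ToyRoutingLayer(nn.Module):
    """Hypothetical layer: scale runoff by a learnable coefficient, then
    accumulate it along a toy chain network 0 -> 1 -> 2."""

    def __init__(self):
        super().__init__()
        self.coefficient = nn.Parameter(torch.tensor(0.5))
        # Fixed (non-trained) accumulation matrix, moved with the module by .to().
        self.register_buffer(
            "accumulation",
            torch.tensor([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]),
        )

    def forward(self, runoff):
        return (self.coefficient * runoff) @ self.accumulation.T


layer = ToyRoutingLayer()
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

runoff = torch.ones(4, 3)                               # batch of 4 samples
observed = torch.tensor([1.0, 2.0, 3.0]).expand(4, 3)   # synthetic, generated with coefficient 1.0

for _ in range(200):
    optimizer.zero_grad()
    loss = torch.mean((layer(runoff) - observed) ** 2)
    loss.backward()
    optimizer.step()

print(layer.coefficient.item())  # close to 1.0
```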

Benefits for traditional workflows¶

Even if you don’t use ML frameworks, backend flexibility provides:

xarray integration: Preserve dimension labels, coordinates, and metadata throughout your workflow:

import earthkit.data as ekd
import earthkit.hydro as ekh

# Input as xarray with coordinates and metadata
runoff = ekd.from_source("file", "runoff.nc").to_xarray()["runoff"]

network = ekh.river_network.load("efas", "5")

discharge = ekh.upstream.sum(network, runoff)
# discharge is still an xarray DataArray with coordinates!
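One common way for an array-level library to keep xarray labels is to run the computation on the DataArray's underlying data and re-attach dimensions, coordinates and attributes afterwards. The sketch below shows that pattern with a hypothetical `keep_labels` decorator and a stand-in operation (`np.cumsum` along one axis); it illustrates the idea and is not how earthkit-hydro implements its xarray support.

```python
import functools

import numpy as np
import xarray as xr


def keep_labels(func):
    """Hypothetical decorator: apply an array-level function to the raw data
    of a DataArray and return a DataArray with the same dims, coords and attrs."""

    @functools.wraps(func)
    def wrapper(da, *args, **kwargs):
        result = func(da.data, *args, **kwargs)  # plain array in, plain array out
        return da.copy(data=result)              # same labels, new values
    return wrapper


@keep_labels
def cumulative_along_x(values):
    # Stand-in for an array-level operation that preserves the input shape.
    return np.cumsum(values, axis=-1)


field = xr.DataArray(
    np.ones((2, 3)),
    dims=("y", "x"),
    coords={"x": [10.0, 20.0, 30.0]},
    attrs={"long_name": "toy input"},
    name="field",
)
result = cumulative_along_x(field)
print(result.dims, result.attrs)  # ('y', 'x') {'long_name': 'toy input'}
```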

Performance portability: Switch to GPU execution by changing array type, not code:

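import numpy as np
import earthkit.hydro as ekh

network = ekh.river_network.load("efas", "5")

# CPU version (np.ones is a placeholder for your own data)
data = np.ones(network.shape)
result = ekh.upstream.array.sum(network, data)  # Returns NumPy array

# GPU version - same operation, just a different array type
import cupy as cp
data = cp.asarray(data)  # copy the array to GPU memory
result = ekh.upstream.array.sum(network, data)  # Runs on GPU, returns CuPy array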
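To keep a single script usable on machines with and without a GPU, the array module can be chosen at run time. This is a general NumPy/CuPy pattern rather than an earthkit-hydro feature, and `pick_array_module` is a hypothetical helper; arrays created from the chosen module would then be passed to earthkit-hydro functions unchanged.

```python
import numpy as np


def pick_array_module():
    """Hypothetical helper: return CuPy if it is installed and sees a GPU, else NumPy."""
    try:
        import cupy as cp

        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
    except Exception:
        # ImportError when CuPy is missing; CUDA runtime errors when no
        # driver or device is available.
        pass
    return np


xp = pick_array_module()
data = xp.ones((3, 4))     # allocated on the GPU if xp is CuPy
total = float(data.sum())  # float() brings the scalar back to the host for either backend
print(xp.__name__, total)
```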

See also¶