Array backend design¶
This page explains why earthkit-hydro supports multiple array backends and how this design choice affects your work.
The flexibility challenge¶
Hydrological datasets come in many forms - NetCDF files, GeoTIFFs, CSV tables, databases - and scientists use different computing ecosystems:
Climate scientists often use xarray for labeled multi-dimensional arrays
Machine learning practitioners use PyTorch, JAX, or TensorFlow
Performance-focused users leverage CuPy for GPU acceleration
Traditional scientific computing relies on NumPy
Rather than forcing everyone into one framework, earthkit-hydro is backend-agnostic.
How it works¶
earthkit-hydro operations work with any array backend that supports basic operations like indexing, aggregation, and mathematical operations. The library:
Detects what array type you provide (numpy, cupy, torch, etc.)
Dispatches operations to backend-appropriate implementations
Returns results in the same array type you provided
This means:
import numpy as np
import torch
import earthkit.hydro as ekh
network = ekh.river_network.load("efas", "5")
# Works with NumPy
numpy_data = np.ones(network.shape)
result_np = ekh.accumulate(network, numpy_data) # Returns NumPy array
# Works with PyTorch
torch_data = torch.ones(network.shape)
result_torch = ekh.accumulate(network, torch_data) # Returns PyTorch tensor
No conversion needed, no backend lock-in.
Benefits for machine learning¶
Supporting ML frameworks natively enables:
Differentiability: Operations with PyTorch/JAX are differentiable, allowing:
Gradient-based optimization of hydrological models
Integration of physical constraints in neural networks
Parameter estimation through backpropagation
GPU acceleration: Automatic GPU execution with CuPy/PyTorch CUDA tensors for:
Processing large spatial domains
Ensemble simulations
Real-time applications
Framework integration: Seamless use in existing ML pipelines without data conversion overhead.
Benefits for traditional workflows¶
Even if you don’t use ML frameworks, backend flexibility provides:
xarray integration: Preserve dimension labels, coordinates, and metadata throughout your workflow:
import xarray as xr
import earthkit.hydro as ekh
# Input as xarray with coordinates and metadata
runoff = xr.open_dataset("runoff.nc")["runoff"]
network = ekh.river_network.load("efas", "5")
# Output preserves xarray structure
discharge = ekh.accumulate(network, runoff)
# discharge is still an xarray DataArray with coordinates!
Performance portability: Switch to GPU execution by changing array type, not code:
# CPU version
import numpy as np
data = np.array(...)
result = ekh.accumulate(network, data)
# GPU version - same operation, just different array type
import cupy as cp
data = cp.array(...)
result = ekh.accumulate(network, data) # Runs on GPU!
Trade-offs and limitations¶
Not all operations possible on all backends: Some backends lack certain features. For example:
JAX arrays are immutable, limiting in-place operations
Some sparse operations aren’t available in all frameworks
earthkit-hydro will raise clear errors when operations aren’t supported.
Performance varies: While all backends work, performance characteristics differ:
NumPy: Mature, CPU-optimized
CuPy: Drop-in GPU acceleration
PyTorch: Optimized for ML, good GPU support
JAX: JIT compilation can be very fast
Memory layout matters: Each framework has preferred memory layouts. earthkit-hydro tries to respect these, but cross-framework conversion can be expensive.
Choosing a backend¶
Use NumPy if: You want simplicity, wide compatibility, and CPU-based computation
Use xarray if: You work with labeled multi-dimensional earth science data
Use CuPy if: You need GPU acceleration without PyTorch/TensorFlow overhead
Use PyTorch if: You’re integrating with ML models or need autograd
Use JAX if: You want JIT compilation and functional programming benefits
Use TensorFlow if: You’re already in TensorFlow ecosystem
Implementation details¶
earthkit-hydro uses array protocols rather than explicit type checking:
Duck typing for compatibility
__array_namespace__support for Array API standardMinimal backend-specific code
This approach future-proofs the library as new array libraries emerge.
See also¶
Handling xarray and multiple array backends - Practical guide to using different backends
Switching array backends - Tutorial on array backend usage
Performance considerations - When to choose which backend for performance