Optimising performance

Cache river networks

Creating a river network from a raw flow direction file requires topological sorting, which is the most expensive step. Export the result and reload it in subsequent runs:

import earthkit.hydro as ekh

# First time: create and export
network = ekh.river_network.create("my_network.nc", "pcr_d8", "file")
network.export("my_network.joblib")

# Subsequent runs: 100–1000x faster
network = ekh.river_network.create("my_network.joblib", "precomputed", "file")

Pre-computed networks loaded via ekh.river_network.load are already optimised.
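
The create-once, reload-later pattern above can be wrapped in a small helper so that a script does not need to know whether the cache already exists. This is a minimal sketch, not part of earthkit-hydro: `load_or_create` and its `create` and `load` arguments are hypothetical names, and the only assumption about the returned object is that it has the `export` method shown above.

```python
from pathlib import Path


def load_or_create(cache_path, create, load):
    """Return the cached object if cache_path exists, else build and export it.

    create: callable with no arguments that builds the object from raw data.
    load:   callable taking the cache path that reads the exported file.
    Both are supplied by the caller (hypothetical helper, not a library API).
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Fast path: skip the expensive build entirely
        return load(cache_path)
    obj = create()
    # Persist the result so the next run takes the fast path
    obj.export(str(cache_path))
    return obj


# Possible usage with the calls from the example above (not executed here):
#
# import earthkit.hydro as ekh
# network = load_or_create(
#     "my_network.joblib",
#     create=lambda: ekh.river_network.create("my_network.nc", "pcr_d8", "file"),
#     load=lambda p: ekh.river_network.create(str(p), "precomputed", "file"),
# )
```

If the raw flow direction file changes, the cached file has to be deleted by hand, since this sketch does not check for staleness.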

GPU acceleration

For large domains (> 1M cells), moving to a GPU backend can give significant speedups:

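import cupy as cp
import earthkit.hydro as ekh

# Load a pre-computed network and move it to the GPU
network = ekh.river_network.load("efas", "5").to_device(array_backend="cupy")
# field is assumed to be an existing NumPy array defined on the network grid
field_gpu = cp.asarray(field)
result_gpu = ekh.upstream.sum(network, field_gpu)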

# Move result back to CPU if needed
result = cp.asnumpy(result_gpu)

PyTorch, JAX, and other GPU-capable backends work the same way.
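
When the same script has to run on machines with and without a GPU, the backend name can be chosen at runtime. The following is a minimal sketch using only the standard library plus an optional CuPy import; `pick_array_backend` is a hypothetical helper, not an earthkit-hydro function.

```python
import importlib.util


def pick_array_backend(preferred="cupy", fallback="numpy"):
    """Return `preferred` if it is importable and usable, else `fallback`.

    Hypothetical helper for illustration only.
    """
    # Not installed at all: fall back without importing anything
    if importlib.util.find_spec(preferred) is None:
        return fallback
    if preferred == "cupy":
        try:
            import cupy

            # Raises (or returns 0) when no CUDA device is visible
            if cupy.cuda.runtime.getDeviceCount() < 1:
                return fallback
        except Exception:
            return fallback
    return preferred


backend_name = pick_array_backend()
```

The resulting `backend_name` could then be passed as `array_backend` to `to_device`. That `"numpy"` is an accepted value there is an assumption to confirm against the earthkit-hydro documentation.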

Reduce network size for testing

Extract a regional subnetwork for faster development cycles:

mask = (lats > 40) & (lats < 50) & (lons > 0) & (lons < 10)
small_network = ekh.subnetwork.from_mask(network, mask)
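
The snippet above does not show where `lats` and `lons` come from. As a minimal sketch, assuming the network lives on a regular latitude/longitude grid and that the mask should have the same 2D shape as that grid, they could be built with NumPy as follows. The grid extent and resolution here are made up for illustration.

```python
import numpy as np

# Hypothetical 1D cell-centre coordinates (2.5 degree spacing)
lat_1d = np.linspace(60.0, 30.0, 13)   # north to south
lon_1d = np.linspace(-10.0, 20.0, 13)  # west to east

# Expand to 2D arrays with shape (n_lat, n_lon)
lats, lons = np.meshgrid(lat_1d, lon_1d, indexing="ij")

# Boolean mask selecting the cells strictly inside the bounding box
mask = (lats > 40) & (lats < 50) & (lons > 0) & (lons < 10)

print(mask.shape, int(mask.sum()))
```

With a real network, the coordinates would be taken from the dataset the network was built from rather than generated.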

See also

Performance considerations — Performance characteristics in depth
Array backend design — Choosing the right backend
Handling xarray and multiple array backends — Switching backends