Distance vs. length concepts

Understanding the distinction between distance and length is fundamental to working with river networks in earthkit-hydro. Although the two terms are often used interchangeably in everyday language, they represent different concepts in hydrological network analysis.

Why this distinction matters

In many hydrological tools, distance and length calculations are conflated, which can lead to:

Incorrect routing time estimates
Errors in distance-decay calculations
Confusion at confluences and bifurcations
Inappropriate weighting schemes

earthkit-hydro makes this distinction explicit to enable more accurate and flexible analysis.

Node properties vs. edge properties

The fundamental difference lies in what is being measured:

Lengths are node properties

A length is associated with a grid cell or graph node itself. It represents the length of river channel within that cell.

One length value per node
Represents channel length within the cell
Stays a single value at confluences, however many branches meet there
Used for: channel residence time, friction calculations, node-based weighting

Distances are edge properties

A distance is associated with the connection (edge) between two nodes. It represents the distance traveled along the flow path from one node to another.

One distance value per edge (connection between nodes)
Represents the distance between two connected cells (e.g. measured between cell centers)
Can differ for each branch at confluences/bifurcations
Used for: travel distance calculations, path finding, edge-based weighting
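As a minimal sketch of the difference, the two kinds of property can be stored in plain Python dictionaries. This does not use earthkit-hydro's own data structures; the node names and numeric values are made up for illustration.

```python
# Hypothetical three-cell chain: a -> b -> c (names and values are illustrative).

# Lengths are node properties: one value per node.
node_length = {"a": 1.2, "b": 1.0, "c": 1.4}

# Distances are edge properties: one value per (upstream, downstream) pair.
edge_distance = {("a", "b"): 1.0, ("b", "c"): 1.0}

# A chain of n nodes carries n lengths but only n - 1 distances.
n_nodes = len(node_length)
n_edges = len(edge_distance)
print(n_nodes, n_edges)
```

Keying lengths by node and distances by node pair makes it impossible to look up one where the other is meant.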

Visual explanation

Consider this simple river network:

In the highlighted segment, which spans three cells joined by two connections (each taken as one unit):

Length = 3 (sum of channel lengths within the 3 cells)
Distance = 2 (number of connections/edges traveled)

Even for a straight channel with uniform cells, these values differ because:

Length accounts for the actual channel path within each cell
Distance counts the connections between cell centers
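The two numbers above can be reproduced with a short sketch. It assumes every cell has unit length and every connection has unit distance; the cell names are hypothetical.

```python
# Three cells in a row, joined by two connections (hypothetical names).
path = ["cell_1", "cell_2", "cell_3"]
connections = list(zip(path, path[1:]))  # consecutive pairs along the path

# Assumed unit values: each cell has length 1, each connection has distance 1.
unit_length = {cell: 1 for cell in path}
unit_distance = {edge: 1 for edge in connections}

total_length = sum(unit_length[cell] for cell in path)
total_distance = sum(unit_distance[edge] for edge in connections)

print(total_length, total_distance)  # 3 2
```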

Implications at confluences

The distinction becomes especially important at confluences:

Scenario: Two tributaries meet at a confluence node.

The confluence node has one length (the channel length within that cell)
But there are multiple distances (one for each incoming branch)

This means:

A parcel of water traveling down tributary A experiences distance_A to the confluence
A parcel from tributary B experiences distance_B to the confluence
Both experience the same length when passing through the confluence cell

Why this matters: If you are computing a quantity that decays with distance traveled (e.g., pollutant attenuation), you need edge-based distances to account correctly for the different paths into the confluence.
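The confluence scenario can be sketched in a few lines. The values are illustrative, and the exponential decay law and its rate are assumptions chosen only to show that the two branches are treated differently; they are not taken from earthkit-hydro.

```python
import math

# Tributary cells A and B both drain into confluence cell C (illustrative values).
node_length = {"A": 1.0, "B": 1.0, "C": 1.0}        # one length per node
edge_distance = {("A", "C"): 1.0, ("B", "C"): 1.5}  # one distance per incoming edge

# The confluence has a single length, shared by water from either tributary.
confluence_length = node_length["C"]

# It has one distance per incoming branch.
incoming = {u: d for (u, v), d in edge_distance.items() if v == "C"}

# Assumed exponential attenuation with a hypothetical rate per unit distance.
decay_rate = 0.2
remaining = {u: math.exp(-decay_rate * d) for u, d in incoming.items()}

print(confluence_length, incoming, remaining)
```

Water arriving from B has traveled further, so less of the tracked quantity remains than for water arriving from A, even though both pass through the same confluence cell.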

Mathematical formulation

For a path through nodes \(n_1, n_2, ..., n_k\):

\[ \begin{aligned}\text{Total length} &= \sum_{i=1}^{k} L(n_i)\\ \text{Total distance} &= \sum_{i=1}^{k-1} D(n_i, n_{i+1})\end{aligned} \]

Where:

\(L(n_i)\) is the length property of node \(n_i\)
\(D(n_i, n_{i+1})\) is the distance property of the edge from node \(n_i\) to node \(n_{i+1}\)

Note that length is summed over \(k\) nodes, while distance is summed over \(k-1\) edges.
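The two sums translate directly into code. The sketch below uses hypothetical helper functions and made-up, non-uniform values; it is not part of the earthkit-hydro API.

```python
def total_length(path, node_length):
    """Sum the node property L over all k nodes of the path."""
    return sum(node_length[n] for n in path)


def total_distance(path, edge_distance):
    """Sum the edge property D over the k - 1 consecutive node pairs of the path."""
    return sum(edge_distance[(u, v)] for u, v in zip(path, path[1:]))


# Illustrative values for a four-node path n1 -> n2 -> n3 -> n4.
node_length = {"n1": 0.5, "n2": 1.0, "n3": 1.5, "n4": 1.0}
edge_distance = {("n1", "n2"): 1.0, ("n2", "n3"): 1.5, ("n3", "n4"): 1.0}

path = ["n1", "n2", "n3", "n4"]
print(total_length(path, node_length))      # 4.0 (four terms)
print(total_distance(path, edge_distance))  # 3.5 (three terms)

# A single-node path still has a length but no distance.
print(total_length(["n1"], node_length))      # 0.5
print(total_distance(["n1"], edge_distance))  # 0
```

The single-node case makes the off-by-one explicit: with \(k = 1\) the length sum has one term and the distance sum is empty.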

Relationship to graph theory

This distinction aligns with standard graph theory terminology:

Node weights (lengths) = properties of vertices
Edge weights (distances) = properties of edges

River networks are directed graphs that carry both node weights and edge weights, and both matter for hydrological calculations.

Implementation in earthkit-hydro

earthkit-hydro provides separate APIs for distances and lengths:

ekh.distance.* - for edge-based distance calculations
ekh.length.* - for node-based length calculations

Both support:

Minimum and maximum (shortest/longest path)
Upstream and downstream directions
Custom weighting
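To illustrate what minimum and maximum mean when more than one path exists, here is a conceptual sketch in plain Python. It does not call earthkit-hydro and makes no claim about how the library computes these quantities; the network, the values and the `distance_to` helper are hypothetical, and the network is assumed to be acyclic.

```python
# Hypothetical network with a bifurcation at "a" that rejoins at "d":
#   a -> b -> d   and   a -> c -> d
edge_distance = {
    ("a", "b"): 1.0,
    ("b", "d"): 1.0,
    ("a", "c"): 1.5,
    ("c", "d"): 1.5,
}

# Build a lookup of the downstream neighbours of each node.
downstream = {}
for u, v in edge_distance:
    downstream.setdefault(u, []).append(v)


def distance_to(target, start, pick):
    """Distance from start down to target, combining branches with pick (min or max).

    Returns None if target cannot be reached from start.
    """
    if start == target:
        return 0.0
    candidates = []
    for nxt in downstream.get(start, []):
        rest = distance_to(target, nxt, pick)
        if rest is not None:  # skip branches that never reach the target
            candidates.append(edge_distance[(start, nxt)] + rest)
    return pick(candidates) if candidates else None


shortest = distance_to("d", "a", min)
longest = distance_to("d", "a", max)
print(shortest, longest)  # 2.0 3.0
```

Custom weighting corresponds to changing the values stored per edge; the traversal itself stays the same.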

See the how-to guides for practical usage examples.