# Averaging over a region

Although this may not sound like *real* regridding, averaging a gridded field over a region is supported by `ESMF`

. This works because the `conservative`

regridding method preserves the areal average of the input field. That is, *the value at each output grid cell is the average input value over the output grid area*. Instead of mapping the input field unto rectangular outputs cells, it’s mapped unto an irregular mesh defined by an outer polygon. In other words, applying the regridding
weights computes the exact areal-average of the input grid over each polygon.

This process relies on converting `shapely.Polygon`

and `shapely.MultiPolygon`

objects into `ESMF.Mesh`

objects. However, ESMF meshes do not support all features that come with shapely’s (Multi)Polyons. Indeed, mesh elements do not support interior holes, or multiple non-touching parts, as do `shapely`

objects. The `xesmf.SpatialAverager`

class works around these issues by computing independent weights for interior holes and multi-part geometries, before combining the weights.

Transforming polygons into a `ESMF.Mesh`

is a slow process. Users looking for faster (but approximate) methods may want to explore regionmask or clisops.

Also, note that low resolution polygons such as large boxes might get distorted when mapped on a sphere. Make sure polygon segments are at sufficiently high resolution, on the order of 1°. The `shapely`

package (v2) has a ``segmentize`

function <https://shapely.readthedocs.io/en/latest/reference/shapely.segmentize.html>`__ to add vertices to polygon segments.

The following example shows just how simple it is to compute the average over different countries. The notebook used `geopandas`

, a simple and efficient container for geometries, and `descartes`

for plotting maps. Make sure both packages are installed, as they are not `xesmf`

dependencies.

```
[1]:
```

```
%matplotlib inline
import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon, MultiPolygon
import numpy as np
import xarray as xr
import xesmf as xe
import warnings
warnings.filterwarnings("ignore")
xr.set_options(display_style='text')
```

```
[1]:
```

```
<xarray.core.options.set_options at 0x7fdc3c03b1c0>
```

## Simple example

In this example we’ll create a synthetic global field, then compute its average over six countries.

### Download country outlines

```
[2]:
```

```
# Load some polygons from the internet
regs = gpd.read_file(
"https://cdn.jsdelivr.net/npm/world-atlas@2/countries-10m.json"
)
# Select a few countries for the sake of the example
regs = regs.iloc[[5, 9, 37, 67, 98, 155]]
# Simplify the geometries to a 0.02 deg tolerance, which is 1/100 of our grid.
# The simpler the polygons, the faster the averaging, but we lose some precision.
regs["geometry"] = regs.simplify(tolerance=0.02, preserve_topology=True)
regs
```

```
[2]:
```

id | name | geometry | |
---|---|---|---|

5 | 032 | Argentina | MULTIPOLYGON (((-67.19287 -22.82225, -67.02727... |

9 | 156 | China | MULTIPOLYGON (((77.79858 35.49614, 77.66178 35... |

37 | 710 | South Africa | MULTIPOLYGON (((19.98200 -24.75230, 20.10800 -... |

67 | 724 | Spain | MULTIPOLYGON (((-5.34065 35.84736, -5.37665 35... |

98 | 466 | Mali | POLYGON ((-12.26352 14.77561, -12.13752 14.784... |

155 | 484 | Mexico | MULTIPOLYGON (((-97.13797 25.96581, -97.16677 ... |

```
[3]:
```

```
# Create synthetic global data
ds = xe.util.grid_global(2, 2)
ds = ds.assign(field=xe.data.wave_smooth(ds.lon, ds.lat))
ds
```

```
[3]:
```

<xarray.Dataset> Dimensions: (x: 180, x_b: 181, y: 90, y_b: 91) Coordinates: lon (y, x) float64 -179.0 -177.0 -175.0 -173.0 ... 175.0 177.0 179.0 lat (y, x) float64 -89.0 -89.0 -89.0 -89.0 ... 89.0 89.0 89.0 89.0 lon_b (y_b, x_b) int64 -180 -178 -176 -174 -172 ... 172 174 176 178 180 lat_b (y_b, x_b) int64 -90 -90 -90 -90 -90 -90 -90 ... 90 90 90 90 90 90 Dimensions without coordinates: x, x_b, y, y_b Data variables: field (y, x) float64 2.0 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0 2.0 2.0

```
[4]:
```

```
# Display the global field and countries' outline.
fig, ax = plt.subplots()
ds.field.plot(ax=ax, x="lon", y="lat")
regs.plot(ax=ax, edgecolor="k", facecolor="none")
```

```
[4]:
```

```
<AxesSubplot:xlabel='lon', ylabel='lat'>
```

### Compute the field average over each country

`xesmf.SpatialAverager`

is a class designed to average an `xarray.DataArray`

over a list of polygons. It behaves similarly to `xesmf.Regridder`

, but has options to deal specifically with polygon outputs. It uses the `conservative`

regridding, and can store and reuse weights.

```
[5]:
```

```
savg = xe.SpatialAverager(ds, regs.geometry, geom_dim_name="country")
savg
```

```
[5]:
```

```
xESMF SpatialAverager
Weight filename: spatialavg_90x180_6.nc
Reuse pre-computed weights? False
Input grid shape: (90, 180)
Output list length: 6
```

When called, the `SpatialAverager`

instance returns a `DataArray`

of averages over the `geom`

dimension, here countries. `lon`

and `lat`

coordinates are the centroids each polygon.

```
[6]:
```

```
out = savg(ds.field)
out = out.assign_coords(country=xr.DataArray(regs["name"], dims=("country",)))
out
```

```
[6]:
```

<xarray.DataArray 'field' (country: 6)> array([1.58448717, 1.49195654, 2.48197742, 2.5740491 , 2.89305772, 1.26271837]) Coordinates: lon (country) float64 -65.18 103.8 25.09 -3.646 -3.541 -102.5 lat (country) float64 -35.39 36.56 -29.0 40.23 17.35 23.95 * country (country) object 'Argentina' 'China' ... 'Mali' 'Mexico' Attributes: regrid_method: conservative

As the order of the polygons is conserved in the output, we can easily include the results back into our `geopandas`

dataframe.

```
[7]:
```

```
regs["field_avg"] = out.values
regs
```

```
[7]:
```

id | name | geometry | field_avg | |
---|---|---|---|---|

5 | 032 | Argentina | MULTIPOLYGON (((-67.19287 -22.82225, -67.02727... | 1.584487 |

9 | 156 | China | MULTIPOLYGON (((77.79858 35.49614, 77.66178 35... | 1.491957 |

37 | 710 | South Africa | MULTIPOLYGON (((19.98200 -24.75230, 20.10800 -... | 2.481977 |

67 | 724 | Spain | MULTIPOLYGON (((-5.34065 35.84736, -5.37665 35... | 2.574049 |

98 | 466 | Mali | POLYGON ((-12.26352 14.77561, -12.13752 14.784... | 2.893058 |

155 | 484 | Mexico | MULTIPOLYGON (((-97.13797 25.96581, -97.16677 ... | 1.262718 |

```
[8]:
```

```
fig, ax = plt.subplots(figsize=(12, 5))
ds.field.plot(ax=ax, x="lon", y="lat", cmap="Greys_r")
handles = regs.plot(
column="field_avg", ax=ax, edgecolor="k", vmin=1, vmax=3, cmap="viridis"
)
```

### Extract the weight mask from the averager

The weights are stored in a sparse matrix structure in `SpatialAverager.weights`

. The sparse matrix can be converted to a full DataArray, but note that this will increase memory usage proportional to the number of polygons.

```
[9]:
```

```
# Convert sparse matrix to numpy array, it has size : (n_in, n_out)
# So reshape to the same shape as ds + polygons
w = xr.DataArray(
savg.weights.toarray().reshape(regs.geometry.size, *ds.lon.shape),
dims=("country", *ds.lon.dims),
coords=dict(country=out.country, **ds.lon.coords),
)
plt.subplots_adjust(top=0.9)
facets = w.plot(col="country", col_wrap=2, aspect=2, vmin=0, vmax=0.05)
facets.cbar.set_label("Averaging weights")
```

```
<Figure size 432x288 with 0 Axes>
```

This also allows to quickly check that the weights are indeed normalized, that the sum of each mask is 1.

```
[10]:
```

```
w.sum(dim=["y", "x"]).values
```

```
[10]:
```

```
array([1., 1., 1., 1., 1., 1.])
```