xESMF backend usage and benchmark
xESMF isn’t just a wrapper of ESMPy. It only uses ESMPy to generate regridding weights, but has its own Scipy-based method for applying weights (see more about regridding weights).
We switch to the Scipy method because its serial performance is much higher than ESMPy’s own engine and can also reuse weights (issue#2). ESMPy’s native method is available in the backend, mainly for benchmarking Scipy results in unit tests.
Here we show how to use xESMF backend and compare the performance of two methods. Note that the backend is still pretty easy to use compared to the original ESMPy – it just doesn’t have a fancy API and cannot deal with xarray metadata.
[1]:
import os
import numpy as np
import xesmf as xe
# backend functions
from xesmf.backend import (
Grid,
esmf_regrid_build,
esmf_regrid_apply,
)
from xesmf.smm import read_weights, apply_weights
Prepare data
We use the same data as in the reusing regridder example, but convert xarray DataSet to pure numpy arrays to work with the backend.
[2]:
ds_in = xe.util.grid_2d(
-120, 120, 0.4, -60, 60, 0.3 # longitude range and resolution
) # latitude range and resolution
ds_out = xe.util.grid_2d(-120, 120, 0.6, -60, 60, 0.4)
ds_in.coords["time"] = np.arange(1, 11)
ds_in.coords["lev"] = np.arange(1, 51)
ds_in["data2D"] = xe.data.wave_smooth(ds_in["lon"], ds_in["lat"])
ds_in["data4D"] = ds_in["time"] * ds_in["lev"] * ds_in["data2D"]
[3]:
# backend only accepts pure numpy array
lon_in = ds_in["lon"].values
lat_in = ds_in["lat"].values
lon_out = ds_out["lon"].values
lat_out = ds_out["lat"].values
data_in = ds_in["data4D"].values
data_in.shape
[3]:
(10, 50, 400, 600)
Make ESMF Grid objects
[4]:
grid_in = Grid.from_xarray(lon_in.T, lat_in.T)
grid_out = Grid.from_xarray(lon_out.T, lat_out.T)
This is a native ESMPy Grid object:
[5]:
type(grid_in)
[5]:
xesmf.backend.Grid
We pass the transpose (lon.T
) because ESMPy prefer Fortran-ordering to C-ordering (see this issue).
[6]:
lon_in.flags # numpy arrays are mostly C-ordered
[6]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
[7]:
lon_in.T.flags # a memory view on its tranpose would be Fortran-ordered
[7]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
Compute weights
[8]:
filename = "test_weights.nc" # weight filename
if os.path.exists(filename):
os.remove(filename) # ESMPy will complain if the file exists
Computing weights takes ~7s, as in the reusing regridder example.
[9]:
%%time
regrid = esmf_regrid_build(grid_in, grid_out, 'bilinear',
extra_dims=[50, 10], # reversed to Fortran-ordering
filename=filename)
CPU times: user 4.08 s, sys: 207 ms, total: 4.29 s
Wall time: 4.35 s
It returns a native ESMPy Regrid object:
[10]:
type(regrid)
[10]:
ESMF.api.regrid.Regrid
It also writes weights to disk so we can then read them back for Scipy.
[11]:
%%bash
ncdump -h test_weights.nc
netcdf test_weights {
dimensions:
n_s = 480000 ;
variables:
double S(n_s) ;
int col(n_s) ;
int row(n_s) ;
}
Apply weights using ESMPy backend
It takes ~3s with ESMPy’s native method.
[12]:
%%time
data_out_esmpy = esmf_regrid_apply(regrid, data_in.T).T
CPU times: user 1.59 s, sys: 2.51 s, total: 4.1 s
Wall time: 12.3 s
The first .T
converts C-ordering to F-ordering for ESMPy, and the second .T
converts the result back to C-ordering. It just gets a memory view and thus incurs almost no overhead.
[13]:
data_out_esmpy.flags
[13]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
[14]:
data_out_esmpy.shape # broadcasted over extra dimensions
[14]:
(10, 50, 300, 400)
Apply weights using Scipy backend
Read weights back for Scipy. read_weights
needs to know the shape of the sparse matrix, i.e. how many points in input and output grids.
[15]:
weights = read_weights(filename, lon_in.size, lon_out.size)
weights
[15]:
<120000x240000 sparse matrix of type '<class 'numpy.float64'>'
with 480000 stored elements in COOrdinate format>
apply_weights
needs to know shape of the output grid.
[16]:
lon_out.shape
[16]:
(300, 400)
[17]:
%%time
data_out_scipy = apply_weights(weights, data_in, lon_in.shape, lon_out.shape)
CPU times: user 842 ms, sys: 776 ms, total: 1.62 s
Wall time: 6.07 s
It is several times faster than ESMPy’s native method. The conclusion seems to be pretty robust across different platforms (feel free to verify on your own), so we choose Scipy as the default backend.
A likely explanation for this performance discrepancy is, the original ESMF is optimized for large processor counts (~1000 CPUs) at the expense of serial performance (ESMF team, personal communication).
[18]:
data_out_scipy.shape # broadcasted over extra dimensions
[18]:
(10, 50, 300, 400)
[19]:
np.testing.assert_equal(data_out_scipy, data_out_esmpy) # exactly the same
[20]:
os.remove(filename) # clean-up