{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# xESMF backend usage and benchmark\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "xESMF isn't just a wrapper of ESMPy. It only uses ESMPy to generate regridding\n", "weights, but has its own Scipy-based method for applying weights (see\n", "[more about regridding weights](./Reuse_regridder.ipynb#Why-applying-regridding-is-so-fast?)).\n", "\n", "We switch to the Scipy method because its serial performance is much higher than\n", "ESMPy's own engine and can also reuse weights\n", "([issue#2](https://github.com/JiaweiZhuang/xESMF/issues/2)). ESMPy's native\n", "method is available in the backend, mainly for benchmarking Scipy results in\n", "unit tests.\n", "\n", "Here we show how to use xESMF backend and compare the performance of two\n", "methods. Note that the backend is still pretty easy to use compared to the\n", "original ESMPy -- it just doesn't have a fancy API and cannot deal with xarray\n", "metadata.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "import xesmf as xe\n", "\n", "# backend functions\n", "from xesmf.backend import (\n", " Grid,\n", " esmf_regrid_build,\n", " esmf_regrid_apply,\n", ")\n", "from xesmf.smm import read_weights, apply_weights" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare data\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the same data as in the\n", "[reusing regridder example](./Reuse_regridder.ipynb), but convert xarray DataSet\n", "to pure numpy arrays to work with the backend.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "ds_in = xe.util.grid_2d(\n", " -120, 120, 0.4, -60, 60, 0.3 # longitude range and resolution\n", ") # latitude range and resolution\n", "ds_out = xe.util.grid_2d(-120, 120, 0.6, -60, 60, 0.4)\n", "ds_in.coords[\"time\"] = np.arange(1, 11)\n", "ds_in.coords[\"lev\"] = np.arange(1, 51)\n", "ds_in[\"data2D\"] = xe.data.wave_smooth(ds_in[\"lon\"], ds_in[\"lat\"])\n", "ds_in[\"data4D\"] = ds_in[\"time\"] * ds_in[\"lev\"] * ds_in[\"data2D\"]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10, 50, 400, 600)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# backend only accepts pure numpy array\n", "lon_in = ds_in[\"lon\"].values\n", "lat_in = ds_in[\"lat\"].values\n", "\n", "lon_out = ds_out[\"lon\"].values\n", "lat_out = ds_out[\"lat\"].values\n", "\n", "data_in = ds_in[\"data4D\"].values\n", "data_in.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make ESMF Grid objects\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "grid_in = Grid.from_xarray(lon_in.T, lat_in.T)\n", "grid_out = Grid.from_xarray(lon_out.T, lat_out.T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a native ESMPy Grid object:\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "xesmf.backend.Grid" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(grid_in)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We pass the transpose (`lon.T`) because ESMPy prefer Fortran-ordering to\n", "C-ordering (see this\n", "[issue](https://github.com/nawendt/esmpy-tutorial/issues/4)).\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " C_CONTIGUOUS : True\n", " F_CONTIGUOUS : False\n", " OWNDATA : True\n", " WRITEABLE : True\n", " ALIGNED : True\n", " WRITEBACKIFCOPY : False\n", " UPDATEIFCOPY : False" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lon_in.flags # numpy arrays are mostly C-ordered" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " C_CONTIGUOUS : False\n", " F_CONTIGUOUS : True\n", " OWNDATA : False\n", " WRITEABLE : True\n", " ALIGNED : True\n", " WRITEBACKIFCOPY : False\n", " UPDATEIFCOPY : False" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lon_in.T.flags # a memory view on its tranpose would be Fortran-ordered" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute weights\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "filename = \"test_weights.nc\" # weight filename\n", "if os.path.exists(filename):\n", " os.remove(filename) # ESMPy will complain if the file exists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Computing weights takes ~7s, as in the\n", "[reusing regridder example](./Reuse_regridder.ipynb#Build-Regridder).\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 4.08 s, sys: 207 ms, total: 4.29 s\n", "Wall time: 4.35 s\n" ] } ], "source": [ "%%time\n", "regrid = esmf_regrid_build(grid_in, grid_out, 'bilinear',\n", " extra_dims=[50, 10], # reversed to Fortran-ordering\n", " filename=filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It returns a native ESMPy Regrid object:\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ESMF.api.regrid.Regrid" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(regrid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It also writes weights to disk so we can then read them back for Scipy.\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "netcdf test_weights {\n", "dimensions:\n", "\tn_s = 480000 ;\n", "variables:\n", "\tdouble S(n_s) ;\n", "\tint col(n_s) ;\n", "\tint row(n_s) ;\n", "}\n" ] } ], "source": [ "%%bash\n", "ncdump -h test_weights.nc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply weights using ESMPy backend\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It takes ~3s with ESMPy's native method.\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1.59 s, sys: 2.51 s, total: 4.1 s\n", "Wall time: 12.3 s\n" ] } ], "source": [ "%%time\n", "data_out_esmpy = esmf_regrid_apply(regrid, data_in.T).T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first `.T` converts C-ordering to F-ordering for ESMPy, and the second `.T`\n", "converts the result back to C-ordering. It just gets a memory view and thus\n", "incurs almost no overhead.\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " C_CONTIGUOUS : True\n", " F_CONTIGUOUS : False\n", " OWNDATA : False\n", " WRITEABLE : True\n", " ALIGNED : True\n", " WRITEBACKIFCOPY : False\n", " UPDATEIFCOPY : False" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_out_esmpy.flags" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10, 50, 300, 400)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_out_esmpy.shape # broadcasted over extra dimensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply weights using Scipy backend\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read weights back for Scipy. `read_weights` needs to know the shape of the\n", "sparse matrix, i.e. how many points in input and output grids.\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<120000x240000 sparse matrix of type ''\n", "\twith 480000 stored elements in COOrdinate format>" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weights = read_weights(filename, lon_in.size, lon_out.size)\n", "weights" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`apply_weights` needs to know shape of the output grid.\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(300, 400)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lon_out.shape" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 842 ms, sys: 776 ms, total: 1.62 s\n", "Wall time: 6.07 s\n" ] } ], "source": [ "%%time\n", "data_out_scipy = apply_weights(weights, data_in, lon_in.shape, lon_out.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is several times faster than ESMPy's native method. The conclusion seems to\n", "be pretty robust across different platforms (feel free to verify on your own),\n", "so we choose Scipy as the default backend.\n", "\n", "A likely explanation for this performance discrepancy is, the original ESMF is\n", "optimized for large processor counts (~1000 CPUs) at the expense of serial\n", "performance (ESMF team, personal communication).\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10, 50, 300, 400)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_out_scipy.shape # broadcasted over extra dimensions" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_equal(data_out_scipy, data_out_esmpy) # exactly the same" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "os.remove(filename) # clean-up" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }