Lazy evaluation on Dask arrays

If you are unfamiliar with Dask, read Parallel computing with Dask in Xarray documentation first. The current version only supports dask arrays on a single machine. Support of Dask.distributed is in roadmap.

xESMF’s Dask support is mostly for lazy evaluation and out-of-core computing, to allow processing large volumes of data with limited memory. You might also get moderate speed-up on a multi-core machine by choosing proper chunk sizes, but that generally won’t help your entire pipeline too much, because the read-regrid-write pipeline is severely I/O limited (see this issue for more discussions). On a single machine, the disk bandwidth is typically limited to ~500 MB/s, and you cannot process data faster than such rate. If you need much faster data processing rate, you should resort to parallel file systems on HPC clusters or distributed storage on public cloud platforms. Please refer to the Pangeo project for more information.

[1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import dask.array as da  # need to have dask.array installed, although not directly using it here.
import xarray as xr
import xesmf as xe

A simple example

Prepare input data

[2]:
ds = xr.tutorial.open_dataset("air_temperature", chunks={"time": 500})
ds
[2]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • lat: 25
    • lon: 53
    • time: 2920
    • lat
      (lat)
      float32
      75.0 72.5 70.0 ... 20.0 17.5 15.0
      standard_name :
      latitude
      long_name :
      Latitude
      units :
      degrees_north
      axis :
      Y
      array([75. , 72.5, 70. , 67.5, 65. , 62.5, 60. , 57.5, 55. , 52.5, 50. , 47.5,
             45. , 42.5, 40. , 37.5, 35. , 32.5, 30. , 27.5, 25. , 22.5, 20. , 17.5,
             15. ], dtype=float32)
    • lon
      (lon)
      float32
      200.0 202.5 205.0 ... 327.5 330.0
      standard_name :
      longitude
      long_name :
      Longitude
      units :
      degrees_east
      axis :
      X
      array([200. , 202.5, 205. , 207.5, 210. , 212.5, 215. , 217.5, 220. , 222.5,
             225. , 227.5, 230. , 232.5, 235. , 237.5, 240. , 242.5, 245. , 247.5,
             250. , 252.5, 255. , 257.5, 260. , 262.5, 265. , 267.5, 270. , 272.5,
             275. , 277.5, 280. , 282.5, 285. , 287.5, 290. , 292.5, 295. , 297.5,
             300. , 302.5, 305. , 307.5, 310. , 312.5, 315. , 317.5, 320. , 322.5,
             325. , 327.5, 330. ], dtype=float32)
    • time
      (time)
      datetime64[ns]
      2013-01-01 ... 2014-12-31T18:00:00
      standard_name :
      time
      long_name :
      Time
      array(['2013-01-01T00:00:00.000000000', '2013-01-01T06:00:00.000000000',
             '2013-01-01T12:00:00.000000000', ..., '2014-12-31T06:00:00.000000000',
             '2014-12-31T12:00:00.000000000', '2014-12-31T18:00:00.000000000'],
            dtype='datetime64[ns]')
    • air
      (time, lat, lon)
      float32
      dask.array<chunksize=(500, 25, 53), meta=np.ndarray>
      long_name :
      4xDaily Air temperature at sigma level 995
      units :
      degK
      precision :
      2
      GRIB_id :
      11
      GRIB_name :
      TMP
      var_desc :
      Air temperature
      dataset :
      NMC Reanalysis
      level_desc :
      Surface
      statistic :
      Individual Obs
      parent_stat :
      Other
      actual_range :
      [185.16 322.1 ]
      Array Chunk
      Bytes 15.48 MB 2.65 MB
      Shape (2920, 25, 53) (500, 25, 53)
      Count 7 Tasks 6 Chunks
      Type float32 numpy.ndarray
      53 25 2920
  • Conventions :
    COARDS
    title :
    4x daily NMC reanalysis (1948)
    description :
    Data is from NMC initialized reanalysis (4x/day). These are the 0.9950 sigma level values.
    platform :
    Model
    references :
    http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html
[3]:
ds.chunks
[3]:
Frozen(SortedKeysDict({'time': (500, 500, 500, 500, 500, 420), 'lat': (25,), 'lon': (53,)}))
[4]:
ds["air"].data
[4]:
Array Chunk
Bytes 15.48 MB 2.65 MB
Shape (2920, 25, 53) (500, 25, 53)
Count 7 Tasks 6 Chunks
Type float32 numpy.ndarray
53 25 2920

Build regridder

[5]:
ds_out = xr.Dataset(
    {
        "lat": (["lat"], np.arange(16, 75, 1.0)),
        "lon": (["lon"], np.arange(200, 330, 1.5)),
    }
)

regridder = xe.Regridder(ds, ds_out, "bilinear")
regridder
[5]:
xESMF Regridder
Regridding algorithm:       bilinear
Input grid shape:           (25, 53)
Output grid shape:          (59, 87)
Output grid dimension name: ('lat', 'lon')
Periodic in longitude?      False

Apply to xarray Dataset/DataArray

[6]:
# only build the dask graph; actual computation happens later when calling compute()
%time ds_out = regridder(ds)
ds_out
using dimensions ('lat', 'lon') from data variable air as the horizontal dimensions for this dataset.
CPU times: user 2.09 ms, sys: 0 ns, total: 2.09 ms
Wall time: 2.06 ms
[6]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • lat: 59
    • lon: 87
    • time: 2920
    • time
      (time)
      datetime64[ns]
      2013-01-01 ... 2014-12-31T18:00:00
      standard_name :
      time
      long_name :
      Time
      array(['2013-01-01T00:00:00.000000000', '2013-01-01T06:00:00.000000000',
             '2013-01-01T12:00:00.000000000', ..., '2014-12-31T06:00:00.000000000',
             '2014-12-31T12:00:00.000000000', '2014-12-31T18:00:00.000000000'],
            dtype='datetime64[ns]')
    • lon
      (lon)
      float64
      200.0 201.5 203.0 ... 327.5 329.0
      array([200. , 201.5, 203. , 204.5, 206. , 207.5, 209. , 210.5, 212. , 213.5,
             215. , 216.5, 218. , 219.5, 221. , 222.5, 224. , 225.5, 227. , 228.5,
             230. , 231.5, 233. , 234.5, 236. , 237.5, 239. , 240.5, 242. , 243.5,
             245. , 246.5, 248. , 249.5, 251. , 252.5, 254. , 255.5, 257. , 258.5,
             260. , 261.5, 263. , 264.5, 266. , 267.5, 269. , 270.5, 272. , 273.5,
             275. , 276.5, 278. , 279.5, 281. , 282.5, 284. , 285.5, 287. , 288.5,
             290. , 291.5, 293. , 294.5, 296. , 297.5, 299. , 300.5, 302. , 303.5,
             305. , 306.5, 308. , 309.5, 311. , 312.5, 314. , 315.5, 317. , 318.5,
             320. , 321.5, 323. , 324.5, 326. , 327.5, 329. ])
    • lat
      (lat)
      float64
      16.0 17.0 18.0 ... 72.0 73.0 74.0
      array([16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.,
             30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43.,
             44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57.,
             58., 59., 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71.,
             72., 73., 74.])
    • air
      (time, lat, lon)
      float64
      dask.array<chunksize=(500, 59, 87), meta=np.ndarray>
      Array Chunk
      Bytes 119.91 MB 20.53 MB
      Shape (2920, 59, 87) (500, 59, 87)
      Count 13 Tasks 6 Chunks
      Type float64 numpy.ndarray
      87 59 2920
  • regrid_method :
    bilinear
[7]:
ds_out["air"].data  # chunks are preserved
[7]:
Array Chunk
Bytes 119.91 MB 20.53 MB
Shape (2920, 59, 87) (500, 59, 87)
Count 13 Tasks 6 Chunks
Type float64 numpy.ndarray
87 59 2920
[8]:
%time result = ds_out['air'].compute()  # actually applies regridding
CPU times: user 298 ms, sys: 212 ms, total: 510 ms
Wall time: 154 ms
[9]:
type(result.data), result.data.shape
[9]:
(numpy.ndarray, (2920, 59, 87))

Invalid chunk sizes to avoid

Dask support relies on xarray.apply_ufunc, which requires only chunking over extra/broadcasting dimensions (like time and lev), not horizontal/core dimensions (lat, lon, or x, y).

[10]:
# xESMF doesn't like chunking over horizontal dimensions
ds_bad = ds.chunk({"lat": 10, "lon": 10, "time": None})
ds_bad
[10]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • lat: 25
    • lon: 53
    • time: 2920
    • lat
      (lat)
      float32
      75.0 72.5 70.0 ... 20.0 17.5 15.0
      standard_name :
      latitude
      long_name :
      Latitude
      units :
      degrees_north
      axis :
      Y
      array([75. , 72.5, 70. , 67.5, 65. , 62.5, 60. , 57.5, 55. , 52.5, 50. , 47.5,
             45. , 42.5, 40. , 37.5, 35. , 32.5, 30. , 27.5, 25. , 22.5, 20. , 17.5,
             15. ], dtype=float32)
    • lon
      (lon)
      float32
      200.0 202.5 205.0 ... 327.5 330.0
      standard_name :
      longitude
      long_name :
      Longitude
      units :
      degrees_east
      axis :
      X
      array([200. , 202.5, 205. , 207.5, 210. , 212.5, 215. , 217.5, 220. , 222.5,
             225. , 227.5, 230. , 232.5, 235. , 237.5, 240. , 242.5, 245. , 247.5,
             250. , 252.5, 255. , 257.5, 260. , 262.5, 265. , 267.5, 270. , 272.5,
             275. , 277.5, 280. , 282.5, 285. , 287.5, 290. , 292.5, 295. , 297.5,
             300. , 302.5, 305. , 307.5, 310. , 312.5, 315. , 317.5, 320. , 322.5,
             325. , 327.5, 330. ], dtype=float32)
    • time
      (time)
      datetime64[ns]
      2013-01-01 ... 2014-12-31T18:00:00
      standard_name :
      time
      long_name :
      Time
      array(['2013-01-01T00:00:00.000000000', '2013-01-01T06:00:00.000000000',
             '2013-01-01T12:00:00.000000000', ..., '2014-12-31T06:00:00.000000000',
             '2014-12-31T12:00:00.000000000', '2014-12-31T18:00:00.000000000'],
            dtype='datetime64[ns]')
    • air
      (time, lat, lon)
      float32
      dask.array<chunksize=(2920, 10, 10), meta=np.ndarray>
      long_name :
      4xDaily Air temperature at sigma level 995
      units :
      degK
      precision :
      2
      GRIB_id :
      11
      GRIB_name :
      TMP
      var_desc :
      Air temperature
      dataset :
      NMC Reanalysis
      level_desc :
      Surface
      statistic :
      Individual Obs
      parent_stat :
      Other
      actual_range :
      [185.16 322.1 ]
      Array Chunk
      Bytes 15.48 MB 1.17 MB
      Shape (2920, 25, 53) (2920, 10, 10)
      Count 44 Tasks 18 Chunks
      Type float32 numpy.ndarray
      53 25 2920
  • Conventions :
    COARDS
    title :
    4x daily NMC reanalysis (1948)
    description :
    Data is from NMC initialized reanalysis (4x/day). These are the 0.9950 sigma level values.
    platform :
    Model
    references :
    http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html
[11]:
# regridder(ds_bad)  # uncomment this line to see the error message
[12]:
# besides rechunking data properly, another simple fix is to convert to numpy array without chunking
regridder(ds_bad.load())
using dimensions ('lat', 'lon') from data variable air as the horizontal dimensions for this dataset.
[12]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • lat: 59
    • lon: 87
    • time: 2920
    • time
      (time)
      datetime64[ns]
      2013-01-01 ... 2014-12-31T18:00:00
      standard_name :
      time
      long_name :
      Time
      array(['2013-01-01T00:00:00.000000000', '2013-01-01T06:00:00.000000000',
             '2013-01-01T12:00:00.000000000', ..., '2014-12-31T06:00:00.000000000',
             '2014-12-31T12:00:00.000000000', '2014-12-31T18:00:00.000000000'],
            dtype='datetime64[ns]')
    • lon
      (lon)
      float64
      200.0 201.5 203.0 ... 327.5 329.0
      array([200. , 201.5, 203. , 204.5, 206. , 207.5, 209. , 210.5, 212. , 213.5,
             215. , 216.5, 218. , 219.5, 221. , 222.5, 224. , 225.5, 227. , 228.5,
             230. , 231.5, 233. , 234.5, 236. , 237.5, 239. , 240.5, 242. , 243.5,
             245. , 246.5, 248. , 249.5, 251. , 252.5, 254. , 255.5, 257. , 258.5,
             260. , 261.5, 263. , 264.5, 266. , 267.5, 269. , 270.5, 272. , 273.5,
             275. , 276.5, 278. , 279.5, 281. , 282.5, 284. , 285.5, 287. , 288.5,
             290. , 291.5, 293. , 294.5, 296. , 297.5, 299. , 300.5, 302. , 303.5,
             305. , 306.5, 308. , 309.5, 311. , 312.5, 314. , 315.5, 317. , 318.5,
             320. , 321.5, 323. , 324.5, 326. , 327.5, 329. ])
    • lat
      (lat)
      float64
      16.0 17.0 18.0 ... 72.0 73.0 74.0
      array([16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.,
             30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43.,
             44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57.,
             58., 59., 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71.,
             72., 73., 74.])
    • air
      (time, lat, lon)
      float64
      296.1 296.4 296.6 ... 241.0 241.5
      array([[[296.13399675, 296.38669304, 296.63889823, ..., 296.47490793,
               296.43398913, 296.19924566],
              [295.97800871, 296.18274797, 296.42534501, ..., 296.09262341,
               296.07802394, 295.72098714],
              [296.04001766, 296.13556275, 296.30247974, ..., 295.77692914,
               295.73997197, 295.35693248],
              ...,
              [245.04017912, 245.36087049, 245.56096188, ..., 233.93629106,
               235.51802332, 238.0780694 ],
              [243.27991042, 243.77519503, 244.17375053, ..., 233.81591274,
               235.33999633, 237.63241841],
              [242.24003289, 242.87912303, 243.43775032, ..., 233.84791841,
               235.41999207, 237.49641598]],
      
             [[296.25399643, 296.70203773, 297.03166485, ..., 296.06514956,
               296.03998263, 296.01773136],
              [296.2179898 , 296.56767711, 296.82291528, ..., 295.7292558 ,
               295.6800262 , 295.5138904 ],
              [296.23999022, 296.42058286, 296.56714652, ..., 295.50442291,
               295.41998903, 295.19133215],
              ...,
              [245.52028453, 245.73709231, 245.85148963, ..., 231.64759509,
               232.67802699, 234.83033953],
              [243.29994515, 243.61404829, 243.85326489, ..., 231.80653129,
               232.72003168, 234.51923375],
              [242.70001369, 243.03800427, 243.31726258, ..., 232.22256285,
               233.15997775, 234.71925176]],
      
             [[296.31998597, 296.35233477, 296.37027072, ..., 296.69703874,
               296.59998477, 296.42993717],
              [296.23999022, 296.37072264, 296.42865111, ..., 296.39312798,
               296.20003046, 295.98447426],
              [296.07996829, 296.20134835, 296.24744824, ..., 296.17051878,
               295.85798008, 295.63218076],
              ...,
              [246.92034241, 246.75294557, 246.50912779, ..., 231.18562131,
               232.24003595, 234.61904532],
              [244.13992017, 244.03040136, 243.89556811, ..., 231.78222568,
               232.82012307, 234.90301222],
              [243.22002405, 243.13672999, 243.05876072, ..., 233.39835074,
               234.45993206, 236.27912467]],
      
             ...,
      
             [[297.62998356, 298.25582152, 298.65503226, ..., 295.7786526 ,
               295.66999665, 295.50299763],
              [297.07004997, 297.7199817 , 298.19914665, ..., 295.58670885,
               295.55000304, 295.2150872 ],
              [296.38994762, 296.98154159, 297.52424514, ..., 295.48204978,
               295.46998597, 295.05016151],
              ...,
              [251.81041188, 251.72296686, 251.55990593, ..., 240.75713161,
               241.37000425, 242.46456685],
              [247.96982451, 247.87036819, 247.69574495, ..., 241.55307104,
               241.93009016, 242.64624471],
              [245.73007797, 245.53418764, 245.25570503, ..., 242.92917313,
               243.20994271, 243.68633428]],
      
             [[297.1899857 , 297.6237982 , 297.95503528, ..., 295.4505457 ,
               295.32998658, 295.05487915],
              [296.59005424, 297.09596253, 297.49915579, ..., 295.29059716,
               295.1700073 , 294.7989566 ],
              [295.76992812, 296.23054587, 296.65708274, ..., 295.15442138,
               295.00998537, 294.626334  ],
              ...,
              [252.39037715, 252.04962511, 251.65524335, ..., 241.72163807,
               242.61000973, 243.93278497],
              [249.06987326, 248.66822819, 248.21431545, ..., 242.12742803,
               242.79003593, 243.74867727],
              [247.43005818, 246.93209439, 246.37428385, ..., 242.84748108,
               243.38996739, 244.10871142]],
      
             [[297.04997563, 297.38785212, 297.63503348, ..., 296.09892817,
               295.98999483, 295.690696  ],
              [296.4100463 , 296.84401245, 297.17914422, ..., 295.82701626,
               295.79001767, 295.49075727],
              [295.62992871, 296.00626644, 296.35697279, ..., 295.59472609,
               295.56998294, 295.28282378],
              ...,
              [252.51039665, 252.14637887, 251.70761893, ..., 240.09173906,
               240.75006397, 241.87173824],
              [248.92985255, 248.55305942, 248.11071361, ..., 240.21550695,
               240.53002923, 241.23719787],
              [247.01007067, 246.58490681, 246.10268406, ..., 240.91155301,
               241.00997318, 241.45322238]]])
  • regrid_method :
    bilinear