NEMO Atmospheric Forcing Files

Description of the algorithms used to construct NEMO atmospheric forcing files from the Environment Canada GEM 2.5km HRDPS research model product (http://collaboration.cmc.ec.gc.ca/science/outgoing/dominik.jacques/meopar_east/).

The algorithms are implemented in the nowcast.workers.make_weather_forcing worker module of the GoMSS_Nowcast package.

Contents

In [1]:
from glob import glob
from pathlib import Path

import matplotlib.pyplot as plt
import pandas
import xarray
In [2]:
%matplotlib inline

Concatentation of gzipped hourly forecast files

The hourly forecast files downloaded from http://collaboration.cmc.ec.gc.ca/science/outgoing/dominik.jacques/meopar_east/ are gzip compressed netCDF files. The NEMO atmospheric forcing files that the make_weather_forcing worker produces contain all of the hours from a given day’s forecast. In order to perform that concatenation in conjunction with the calculations on the precipitation variable, the hourly forecast files are decompressed into a temporary directory.

For this example, we’ll produce the 2016-10-26 forcing file. To do that we need to work with the files from the 2016-10-26 forecast (excluding the hour 0 file) as well as the final 2 hourly files from the 2016-10-25 forecast:

In [3]:
gunzip_dir = Path('/tmp/gunzipped/')
for f in sorted([f for f in gunzip_dir.iterdir()]):
    print(f)
/tmp/gunzipped/2016102500_023_2.5km_534x684.nc
/tmp/gunzipped/2016102500_024_2.5km_534x684.nc
/tmp/gunzipped/2016102600_001_2.5km_534x684.nc
/tmp/gunzipped/2016102600_002_2.5km_534x684.nc
/tmp/gunzipped/2016102600_003_2.5km_534x684.nc
/tmp/gunzipped/2016102600_004_2.5km_534x684.nc
/tmp/gunzipped/2016102600_005_2.5km_534x684.nc
/tmp/gunzipped/2016102600_006_2.5km_534x684.nc
/tmp/gunzipped/2016102600_007_2.5km_534x684.nc
/tmp/gunzipped/2016102600_008_2.5km_534x684.nc
/tmp/gunzipped/2016102600_009_2.5km_534x684.nc
/tmp/gunzipped/2016102600_010_2.5km_534x684.nc
/tmp/gunzipped/2016102600_011_2.5km_534x684.nc
/tmp/gunzipped/2016102600_012_2.5km_534x684.nc
/tmp/gunzipped/2016102600_013_2.5km_534x684.nc
/tmp/gunzipped/2016102600_014_2.5km_534x684.nc
/tmp/gunzipped/2016102600_015_2.5km_534x684.nc
/tmp/gunzipped/2016102600_016_2.5km_534x684.nc
/tmp/gunzipped/2016102600_017_2.5km_534x684.nc
/tmp/gunzipped/2016102600_018_2.5km_534x684.nc
/tmp/gunzipped/2016102600_019_2.5km_534x684.nc
/tmp/gunzipped/2016102600_020_2.5km_534x684.nc
/tmp/gunzipped/2016102600_021_2.5km_534x684.nc
/tmp/gunzipped/2016102600_022_2.5km_534x684.nc
/tmp/gunzipped/2016102600_023_2.5km_534x684.nc
/tmp/gunzipped/2016102600_024_2.5km_534x684.nc

To avoid restart artifacts from the HRDPS product, the final hour of the previous day’s forecast is used as the 0th hour of weather forcing:

In [4]:
nc_filepaths = [gunzip_dir/'2016102500_024_2.5km_534x684.nc']

We extend that list with the hourly forecast files for the day of interest:

In [5]:
nc_filepaths.extend(
    [gunzip_dir/Path(f).name for f in sorted(glob((gunzip_dir/'20161026*').as_posix()))])
for f in nc_filepaths:
    print(f)
/tmp/gunzipped/2016102500_024_2.5km_534x684.nc
/tmp/gunzipped/2016102600_001_2.5km_534x684.nc
/tmp/gunzipped/2016102600_002_2.5km_534x684.nc
/tmp/gunzipped/2016102600_003_2.5km_534x684.nc
/tmp/gunzipped/2016102600_004_2.5km_534x684.nc
/tmp/gunzipped/2016102600_005_2.5km_534x684.nc
/tmp/gunzipped/2016102600_006_2.5km_534x684.nc
/tmp/gunzipped/2016102600_007_2.5km_534x684.nc
/tmp/gunzipped/2016102600_008_2.5km_534x684.nc
/tmp/gunzipped/2016102600_009_2.5km_534x684.nc
/tmp/gunzipped/2016102600_010_2.5km_534x684.nc
/tmp/gunzipped/2016102600_011_2.5km_534x684.nc
/tmp/gunzipped/2016102600_012_2.5km_534x684.nc
/tmp/gunzipped/2016102600_013_2.5km_534x684.nc
/tmp/gunzipped/2016102600_014_2.5km_534x684.nc
/tmp/gunzipped/2016102600_015_2.5km_534x684.nc
/tmp/gunzipped/2016102600_016_2.5km_534x684.nc
/tmp/gunzipped/2016102600_017_2.5km_534x684.nc
/tmp/gunzipped/2016102600_018_2.5km_534x684.nc
/tmp/gunzipped/2016102600_019_2.5km_534x684.nc
/tmp/gunzipped/2016102600_020_2.5km_534x684.nc
/tmp/gunzipped/2016102600_021_2.5km_534x684.nc
/tmp/gunzipped/2016102600_022_2.5km_534x684.nc
/tmp/gunzipped/2016102600_023_2.5km_534x684.nc
/tmp/gunzipped/2016102600_024_2.5km_534x684.nc

Now we construct an “in-memory” representation of the the netCDF file that is our objective. To do that we use the xarray package.

In [6]:
ds = xarray.open_mfdataset([nc_filepath.as_posix() for nc_filepath in nc_filepaths])

“In-memory” is not strictly accurate because the concatenated dataset is rather large. The xarray.open_mfdataset() function employs another Python library called dask behind the scenes. dask delays the processing of the concatenated dataset until it is written to disk, enabling the the processing to be applied to the constituent forecast files one at a time. Doing so avoids loading all of the forecast files into memory at once, and takes advantage of multiple cores to do the processing.

So, we now have a data structure in which the hourly forecast files that we want are concatenated into time-series of atmospheric forcing variables.

Calculation of precipitation flux from cumulative precipitation

To visualize the dataset we will use a point in the HRDPS grid that is close to Yarmouth Airport (YQI). There is an Environment Canada meteo monitoring station at YQI from which hourly and daily observations are available.

In [7]:
y, x = 274, 155
print(ds.nav_lat.values[0, y, x], ds.nav_lon.values[0, y, x] - 360)
43.7817 -66.1517944336

Looking at the precipitation variable in our dataset:

In [8]:
ds.precip[:, y, x].plot(
    marker='o', label='{0.long_name} [{0.units}]'.format(ds.precip))
lgd = plt.legend(loc='best')
../_images/GoMSS-NEMO-model_NEMO-AtmosForcing_15_0.png

It is evident that the forecast precipitation is a cumulative value.

The hourly precipitation amount (in \(\frac{m}{hr}\)) is the hour-to-hour differences of the total precipitation values. To convert that to precipitation flux the factor is:

\[\frac{m}{hr} \times 1000 \frac{kg}{m^3} \times \frac{1}{3600} \frac{hr}{s} = \frac{1}{3.6}\]

So, working backward from the end of the forecast period, we calculate the flux for each hour:

In [9]:
for hr in range(24, 1, -1):
    ds.precip.values[hr] = (ds.precip.values[hr] - ds.precip.values[hr-1]) / 3.6
    print(hr, ds.precip.values[hr, y, x])
24 2.53478194483e-06
23 7.41960118628e-06
22 3.80719171113e-05
21 3.07586419189e-05
20 3.72315601756e-06
19 3.96086963721e-06
18 5.9035503202e-07
17 4.94063392075e-06
16 1.63548651876e-05
15 7.79322969417e-06
14 3.79420170147e-06
13 3.09635652229e-06
12 3.78012191504e-05
11 0.000188606726523
10 7.70576459925e-05
9 3.61938469319e-07
8 1.35291985417e-06
7 1.09993359527e-05
6 5.07865479449e-06
5 1.09111574097e-05
4 4.44664894733e-06
3 1.38603910374e-05
2 9.56290983191e-06

The hour 1 flux is the hour 1 cumulative value divided by 3.6 because the cumulative value at hour 0 is, by definition, zero:

In [10]:
ds.precip.values[1] /= 36
print(1, ds.precip.values[1, y, x])
1 5.98205941868e-07

As above, we avoid restart artifacts from the HRDPS product by calculating the hour 0 flux from the difference between the cumulative precipitation values in the final 2 hours of the previous day’s forecast files:

In [11]:
day_m1_precip = {}
for hr in (23, 24):
    nc_filepath = gunzip_dir/'2016102500_0{}_2.5km_534x684.nc'.format(hr)
    with xarray.open_dataset(nc_filepath.as_posix()) as hr_ds:
        day_m1_precip[hr] = hr_ds.precip.values[0, ...]
ds.precip.values[0] = (day_m1_precip[24] - day_m1_precip[23]) / 36
print(0, ds.precip.values[0, y, x])
0 2.44207007604e-07

Finally, we update the metadata attributes of the precip variable to reflect the change from cumulative to flux values:

In [12]:
ds.precip.attrs['units'] = 'kg m-2 s-1'
ds.precip.attrs['long_name'] = 'precipitation flux'
ds.precip.attrs['valid_max'] = 3

With that our precipitation near YQI now looks like:

In [13]:
ds.precip[:, y, x].plot(
    marker='o', label='{0.long_name} [{0.units}]'.format(ds.precip))
lgd = plt.legend(loc='best')
../_images/GoMSS-NEMO-model_NEMO-AtmosForcing_25_0.png

Calculation of solid precipitation component

NEMO uses the solid phase precipitation (snow) flux in addition to the total precipitation flux. Since the HRDPS product provides only total precipitation (which we have converted to flux above), we will calculate the snow flux by assuming that precipitation at grid points where the air temperature is >273.15 Kelvin is snow.

In [14]:
snow = ds.precip.copy()
snow.name = 'snow'
snow.attrs['short_name'] = 'snow'
snow.attrs['long_name'] = 'solid precipitation flux'
snow.values[ds.tair.values >= 273.15] = 0
snow
Out[14]:
<xarray.DataArray 'snow' (time_counter: 25, y: 684, x: 534)>
array([[[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        ...,
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00]],

       [[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        ...,
        [  4.71105673e-10,   3.35895238e-10,   1.86492365e-10, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  4.66930149e-10,   3.34405960e-10,   1.79570322e-10, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  4.60884318e-10,   3.14873707e-10,   1.62259786e-10, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00]],

       [[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        ...,
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00]],

       ...,
       [[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        ...,
        [  8.63036891e-08,   4.41076745e-08,   1.20501846e-08, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  6.95294449e-08,   3.00454644e-08,   7.77361731e-09, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  4.83119125e-08,   2.01756123e-08,   6.08449439e-09, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00]],

       [[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        ...,
        [  1.90385521e-07,   1.24513437e-07,   6.79671233e-08, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  1.85208925e-07,   1.17347238e-07,   6.19586229e-08, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  1.67548953e-07,   1.08436949e-07,   6.12702590e-08, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00]],

       [[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        ...,
        [  3.47586359e-07,   2.59830652e-07,   1.76381094e-07, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  3.47946904e-07,   2.53843366e-07,   1.63869116e-07, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
        [  3.07654394e-07,   2.29752730e-07,   1.54432540e-07, ...,
           0.00000000e+00,   0.00000000e+00,   0.00000000e+00]]])
Coordinates:
  * y             (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
  * x             (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
  * time_counter  (time_counter) datetime64[ns] 2016-10-26 ...
Attributes:
    units: kg m-2 s-1
    valid_min: 0.0
    valid_max: 3
    long_name: solid precipitation flux
    short_name: snow
    online_operation: inst(only(x))
    axis: TYX
    savelog10: 0.0

Merging the snow DataArray into the dataset to produce a new DataSet object:

In [15]:
ds_with_snow = xarray.merge((ds, snow))
ds_with_snow
Out[15]:
<xarray.Dataset>
Dimensions:       (time_counter: 25, x: 534, y: 684)
Coordinates:
  * y             (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
  * x             (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
  * time_counter  (time_counter) datetime64[ns] 2016-10-26 ...
Data variables:
    tair          (time_counter, y, x) float64 279.8 279.5 279.2 279.9 279.7 ...
    seapres       (time_counter, y, x) float64 1.027e+05 1.027e+05 1.027e+05 ...
    precip        (time_counter, y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    nav_lon       (time_counter, y, x) float32 285.263 285.289 285.314 ...
    nav_lat       (time_counter, y, x) float32 40.6528 40.6412 40.6295 ...
    solar         (time_counter, y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    therm_rad     (time_counter, y, x) float64 244.5 244.3 243.9 244.5 243.9 ...
    v_wind        (time_counter, y, x) float64 -2.484 -2.443 -2.443 -2.67 ...
    u_wind        (time_counter, y, x) float64 1.914 1.888 1.881 2.001 1.927 ...
    qair          (time_counter, y, x) float64 0.002977 0.002969 0.003103 ...
    snow          (time_counter, y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...

Although we can see above that there are non-zero snow fluxes at some grid points it appears that there is no significant snow flux at YQI on 2016-10-26:

In [16]:
ds_with_snow.snow[:, y, x].plot(
    marker='o', label='{0.long_name} [{0.units}]'.format(ds_with_snow.snow))
lgd = plt.legend(loc='best')
../_images/GoMSS-NEMO-model_NEMO-AtmosForcing_31_0.png

Storing the atmospheric forcing file

The forcing files are stored as netCDF4/HDF5 files. Doing so allows Lempel-Ziv deflation to be applied to each of the variables, resulting in a 50% to 80% reduction in the file size on disk.

The time units must also be specified when storing the file. We will use the fairly common seconds since 1970-01-01 00:00:00 units.

Those things are applied to the dataset by providing an encoding dict:

In [17]:
encoding = {var: {'zlib': True} for var in ds_with_snow.data_vars}
encoding['time_counter'] = {
    'units': 'seconds since 1970-01-01 00:00:00',
    'zlib': True,
}

encoding
Out[17]:
{'nav_lat': {'zlib': True},
 'nav_lon': {'zlib': True},
 'precip': {'zlib': True},
 'qair': {'zlib': True},
 'seapres': {'zlib': True},
 'snow': {'zlib': True},
 'solar': {'zlib': True},
 'tair': {'zlib': True},
 'therm_rad': {'zlib': True},
 'time_counter': {'units': 'seconds since 1970-01-01 00:00:00', 'zlib': True},
 'u_wind': {'zlib': True},
 'v_wind': {'zlib': True}}

And, with that, we can store the dataset on disk:

In [18]:
ds_with_snow.to_netcdf('gem2.5_y2016m10d26.nc', encoding=encoding)

It is at this point that the concatenation of the hourly forecast files into the forcing dataset occurs, as well as the calculation on the precipitation variable. The to_netcdf() method may take 60 seconds or longer, depending on the memory size, CPU speed, and number of cores available.

Plots of forcing variables time-series at a point

In [19]:
fig, axs = plt.subplots(5, 2, figsize=(20, 10))
var_axs = [
    ('precip', 0, 0),
    ('snow', 0, 1),
    ('tair', 1, 0),
    ('qair', 1, 1),
    ('u_wind', 2, 0),
    ('v_wind', 2, 1),
    ('solar', 3, 0),
    ('therm_rad', 3, 1),
    ('seapres', 4, 0),
]
for var, ax_row, ax_col in var_axs:
    factor = 10000 if var == 'precip' else 1
    (ds_with_snow.data_vars[var][:, y, x] * factor).plot(
        ax=axs[ax_row, ax_col], marker='o',
        label='{0.long_name} [{0.units}]'.format(ds_with_snow.data_vars[var]))
    axs[ax_row, ax_col].legend(loc='best')
    axs[ax_row, ax_col].grid()
fig_title = fig.suptitle(
    '{0[0]:%Y-%m-%d %H:%M} UTC to {0[24]:%Y-%m-%d %H:%M} UTC\n'
    '{1}°N, {2}°W'
    .format(
        pandas.to_datetime(ds_with_snow.time_counter.values),
        ds_with_snow.nav_lat[0, y, x].values,
        abs(ds_with_snow.nav_lon[0, y, x].values - 360),
))
../_images/GoMSS-NEMO-model_NEMO-AtmosForcing_38_0.png
In [20]:
ds.close()
ds_with_snow.close()