Skip to content

Scaling a Subset of Input Data in NetCDF Format

Reference: For general python handling of nc files using Xarray: http://xarray.pydata.org/en/stable/index.html

I was working on a set of input data of Ocean DMS concentration. I suspected the reported values over the Southern Ocean (vaguely defined as ocean south to 30ºS) were a bit high. I wanted to reduce those values by 50% to see if my model would match observations better. 🌊

Surprisingly, I couldn't find a quick way to do it. Here I present a workaround. (Please let me know if you have a smarter way! 💡)

import wget                        # for downloading data via URL links
import xarray as xr                   # the major tool to work with NetCDF data!
from matplotlib import pyplot as plt  # for plotting

Open the monthly Ocean DMS nc file

The marine DMS data used in this example can be downloaded from the Dalhousie University server. Or, we can download it from Python.

wget.download('http://rain.ucis.dal.ca/ctm/HEMCO/DMS/v2015-07/DMS_lana.geos.1x1.nc')
'DMS_lana.geos.1x1.nc'

We will use the function xr.open_dataset from the xarray package.

ds = xr.open_dataset('DMS_lana.geos.1x1.nc')

ds # display the dataset
<xarray.Dataset>
Dimensions: (lat: 181, lon: 360, time: 12)
Coordinates:

- lon (lon) float32 -180.0 -179.0 -178.0 -177.0 ... 177.0 178.0 179.0
- lat (lat) float32 -89.75 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0 89.75
- time (time) datetime64[ns] 1985-01-01 1985-02-01 ... 1985-12-01
  Data variables:
  DMS_OCEAN (time, lat, lon) float32 ...
  Attributes:
  CDI: Climate Data Interface version 1.6.4 (...
  Conventions: COARDS
  history: Mon Jul 20 16:29:35 2015: ncatted -O -a units,DMS_OCEAN,o,c...
  Title: COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-17+)
  Format: NetCDF-3
  Model: GEOS5_47L
  Delta_Lon: 1.0
  Delta_Lat: 1.0
  NLayers: 47
  Start_Date: 19850101
  Start_Time: 0
  End_Date: 19850201
  End_Time: 0
  Delta_Time: 744
  CDO: Climate Data Operators version 1.6.4 (...

We can also visualize annual mean of the data for a quick check:

ds['DMS_OCEAN'].mean(dim=['time']).plot()
<matplotlib.collections.QuadMesh at 0x7fd61bbd01f0>

png

Scaling down the Southern Ocean DMS

As mentioned, we want to scale down the ocean DMS below 30ºS.

# Step 1: make two copies of the data set, one for grid cells with lat > 30ºS, one for <30ºS

dms_above30S = ds['DMS_OCEAN'].sel(lat=slice(-30,90))

dms_below30S = ds['DMS_OCEAN'].sel(lat=slice(-90,-30-1e-9)) # we add a small value here to avoid the grid cells with lat = 30ºS to appear in both data sets.

# Step 2: reduce the DMS in the dataset of interest by 50%

dms_below30S = dms_below30S * 0.5

# Step 3: merge the untouched and scaled data
ds_partly_scaled = xr.merge([dms_below30S, dms_above30S], compat="no_conflicts")

To ensure our exercise is correct, let's make some plots to compare the DMS concentrations in ds_partly_scaled and the original dataset, ds.

(ds_partly_scaled['DMS_OCEAN'] / ds['DMS_OCEAN']).plot(col='time', col_wrap=3)
<xarray.plot.facetgrid.FacetGrid at 0x7fd61bd3ba00>

png

Yellow = 1 (untouched); Purple = 0.5(scaled). Mission accomplished?

ds_partly_scaled # missing attributes from the original dataset.
<xarray.Dataset>
Dimensions: (lat: 181, lon: 360, time: 12)
Coordinates:

- lat (lat) float64 -89.75 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0 89.75
- lon (lon) float32 -180.0 -179.0 -178.0 -177.0 ... 177.0 178.0 179.0
- time (time) datetime64[ns] 1985-01-01 1985-02-01 ... 1985-12-01
  Data variables:
  DMS_OCEAN (time, lat, lon) float32 0.0 0.0 ... 1.4103641e-07 1.4103641e-07

Preserving metadata from the original nc file

Not yet! Now the new merged data set is missing the Attributes of the original.

To fix this, we need to make a xarray copy of the original to preserve the original information, but with the new scaled values.

ds_new = ds.copy(data=ds_partly_scaled)

ds_new # now with the Attributes
<xarray.Dataset>
Dimensions: (lat: 181, lon: 360, time: 12)
Coordinates:

- lon (lon) float32 -180.0 -179.0 -178.0 -177.0 ... 177.0 178.0 179.0
- lat (lat) float64 -89.75 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0 89.75
- time (time) datetime64[ns] 1985-01-01 1985-02-01 ... 1985-12-01
  Data variables:
  DMS_OCEAN (time, lat, lon) float32 0.0 0.0 ... 1.4103641e-07 1.4103641e-07
  Attributes:
  CDI: Climate Data Interface version 1.6.4 (...
  Conventions: COARDS
  history: Mon Jul 20 16:29:35 2015: ncatted -O -a units,DMS_OCEAN,o,c...
  Title: COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-17+)
  Format: NetCDF-3
  Model: GEOS5_47L
  Delta_Lon: 1.0
  Delta_Lat: 1.0
  NLayers: 47
  Start_Date: 19850101
  Start_Time: 0
  End_Date: 19850201
  End_Time: 0
  Delta_Time: 744
  CDO: Climate Data Operators version 1.6.4 (...

Finally, save the scaled DMS in a new file

And, it's one-line easy.

ds_new.to_netcdf('DMS_lana.geos.1x1.SouthernHalved.nc') # always good to have a descriptive name

Again, I know that it's likely not the best way. Please let me know if you have a smarter way. I am always open to learn!