Scaling a Subset of Input Data in NetCDF Format

Reference: For general python handling of nc files using Xarray: http://xarray.pydata.org/en/stable/index.html

I am working on a set of input data of Ocean DMS concentration. I suspect the reported values over the Southern Ocean (vaguely defined as ocean south to 30ºS) is a bit high. I want to reduce those values by 50% to see if my model will match observations better.

Surprisingly, I didn’t find a quick way to do it. Here I present a work-a-round. (Please let me know if you have a smarter way.)

import wget                        # for downloading data via URL links
import xarray as xr                   # the major tool to work with NetCDF data!
from matplotlib import pyplot as plt  # for plotting

Open the monthly Ocean DMS nc file

The marine DMS data used in this example can be downloaded here. Or, we can download it from Python.

wget.download('http://rain.ucis.dal.ca/ctm/HEMCO/DMS/v2015-07/DMS_lana.geos.1x1.nc')
'DMS_lana.geos.1x1.nc'

We will use the function xr.open_dataset from the xarray package.

ds = xr.open_dataset('DMS_lana.geos.1x1.nc')

ds # display the dataset
<xarray.Dataset>
Dimensions:    (lat: 181, lon: 360, time: 12)
Coordinates:

  • lon (lon) float32 -180.0 -179.0 -178.0 -177.0 … 177.0 178.0 179.0
  • lat (lat) float32 -89.75 -89.0 -88.0 -87.0 … 87.0 88.0 89.0 89.75
  • time (time) datetime64[ns] 1985-01-01 1985-02-01 … 1985-12-01 Data variables: DMS_OCEAN (time, lat, lon) float32 … Attributes: CDI: Climate Data Interface version 1.6.4 (http://code.zmaw.de/p... Conventions: COARDS history: Mon Jul 20 16:29:35 2015: ncatted -O -a units,DMS_OCEAN,o,c… Title: COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-17+) Format: NetCDF-3 Model: GEOS5_47L Delta_Lon: 1.0 Delta_Lat: 1.0 NLayers: 47 Start_Date: 19850101 Start_Time: 0 End_Date: 19850201 End_Time: 0 Delta_Time: 744 CDO: Climate Data Operators version 1.6.4 (http://code.zmaw.de/p...

We can also visualize annual mean of the data for a quick check:

ds['DMS_OCEAN'].mean(dim=['time']).plot()
<matplotlib.collections.QuadMesh at 0x7fd61bbd01f0>

png

Scaling down the Southern Ocean DMS

As mentioned, we want to scale down the ocean DMS below 30ºS.

# Step 1: make two copies of the data set, one for grid cells with lat > 30ºS, one for <30ºS

dms_above30S = ds['DMS_OCEAN'].sel(lat=slice(-30,90))

dms_below30S = ds['DMS_OCEAN'].sel(lat=slice(-90,-30-1e-9)) # we add a small value here to avoid the grid cells with lat = 30ºS to appear in both data sets.

# Step 2: reduce the DMS in the dataset of interest by 50% 

dms_below30S = dms_below30S * 0.5

# Step 3: merge the untouched and scaled data
ds_partly_scaled = xr.merge([dms_below30S, dms_above30S], compat="no_conflicts")

To ensure our exercise is correct, let’s make some plots to compare the DMS concentrations in ds_partly_scaled and the original dataset, ds.

(ds_partly_scaled['DMS_OCEAN'] / ds['DMS_OCEAN']).plot(col='time', col_wrap=3)
<xarray.plot.facetgrid.FacetGrid at 0x7fd61bd3ba00>

png

Yellow = 1 (untouched); Purple = 0.5(scaled). Mission accomplished?

ds_partly_scaled # missing attributes from the original dataset.
<xarray.Dataset>
Dimensions:    (lat: 181, lon: 360, time: 12)
Coordinates:

  • lat (lat) float64 -89.75 -89.0 -88.0 -87.0 … 87.0 88.0 89.0 89.75

  • lon (lon) float32 -180.0 -179.0 -178.0 -177.0 … 177.0 178.0 179.0

  • time (time) datetime64[ns] 1985-01-01 1985-02-01 … 1985-12-01 Data variables: DMS_OCEAN (time, lat, lon) float32 0.0 0.0 … 1.4103641e-07 1.4103641e-07

Preserving metadata from the original nc file

Not yet! Now the new merged data set is missing the Attributes of the original.

To fix this, we need to make a xarray copy of the original to preserve the original information, but with the new scaled values.

ds_new = ds.copy(data=ds_partly_scaled)

ds_new # now with the Attributes
<xarray.Dataset>
Dimensions:    (lat: 181, lon: 360, time: 12)
Coordinates:

  • lon (lon) float32 -180.0 -179.0 -178.0 -177.0 … 177.0 178.0 179.0

  • lat (lat) float64 -89.75 -89.0 -88.0 -87.0 … 87.0 88.0 89.0 89.75

  • time (time) datetime64[ns] 1985-01-01 1985-02-01 … 1985-12-01 Data variables: DMS_OCEAN (time, lat, lon) float32 0.0 0.0 … 1.4103641e-07 1.4103641e-07 Attributes: CDI: Climate Data Interface version 1.6.4 (http://code.zmaw.de/p... Conventions: COARDS history: Mon Jul 20 16:29:35 2015: ncatted -O -a units,DMS_OCEAN,o,c… Title: COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-17+) Format: NetCDF-3 Model: GEOS5_47L Delta_Lon: 1.0 Delta_Lat: 1.0 NLayers: 47 Start_Date: 19850101 Start_Time: 0 End_Date: 19850201 End_Time: 0 Delta_Time: 744 CDO: Climate Data Operators version 1.6.4 (http://code.zmaw.de/p...

Finally, save the scaled DMS in a new file

And, it’s one-line easy.

ds_new.to_netcdf('DMS_lana.geos.1x1.SouthernHalved.nc') # always good to have a descriptive name

Again, I know that it’s likely not the best way. Please let me know if you have a smarter way. I am always open to learn!

Ka Ming FUNG
Ka Ming FUNG
Data Scientist

Data Scientist who’s interested in the interactions between food security, air pollution, environmental health, and climate change

Related