Aggregating Data
Overview
Teaching: 0 min
Exercises: 0 min
Questions
How do I aggregate data over various dimensions?
Objectives
The Oceanic Niño Index (ONI) is used by the Climate Prediction Center to monitor and predict El Niño and La Niña. It is defined as the 3-month running mean of sea surface temperature (SST) anomalies in the Niño3.4 region. We will use aggregation methods from xarray to calculate this index.
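Written out as equations, the definition above can be sketched as follows. The symbols are notation introduced here for illustration, not taken from the lesson: T_t is the Niño3.4 area-mean SST in month t, m(t) is its calendar month, and N_m is the number of months in the record that fall in calendar month m. The three lines are the monthly climatology, the anomaly, and the centered 3-month running mean.

```latex
\bar{T}_m = \frac{1}{N_m} \sum_{t \,:\, m(t) = m} T_t, \qquad
a_t = T_t - \bar{T}_{m(t)}, \qquad
\mathrm{ONI}_t = \frac{a_{t-1} + a_t + a_{t+1}}{3}
```

In this lesson the climatology is averaged over the whole record in the file; operational ONI calculations may use a different base period.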
First Steps
Create a new notebook and save it as Aggregating.ipynb.
Import the standard set of packages we use:
import xarray as xr
import numpy as np
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
Read in the dataset we wrote in the last notebook.
file='/scratch/kpegion/nino34_1982-2019.oisstv2.nc'
ds=xr.open_dataset(file)
ds
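The path above points to the instructor's file system. If that file isn't available to you, a synthetic stand-in with the same layout lets you follow along. This is a sketch under stated assumptions: the variable name sst and the time, lat, and lon dimensions come from the code in this lesson, while the monthly frequency, the 1-degree grid, and the invented seasonal cycle plus random noise are assumptions, not real OISST values.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the Nino3.4 SST file (made-up numbers, NOT real OISST data).
# Assumed layout: monthly 'time', 1-degree 'lat'/'lon' points inside 5S-5N, 170W-120W.
rng = np.random.default_rng(0)
time = pd.date_range('1982-01-01', periods=48, freq='MS')  # 4 years of month starts
lat = np.arange(-4.5, 5.0, 1.0)      # 10 latitudes
lon = np.arange(190.5, 240.0, 1.0)   # 50 longitudes (0-360 convention: 170W = 190E)

# Invented seasonal cycle (warmest in April) plus random noise at every grid point
month = time.month.values
seasonal = 27.0 + np.cos(2 * np.pi * (month - 4) / 12)
sst = seasonal[:, None, None] + 0.5 * rng.standard_normal((time.size, lat.size, lon.size))

ds = xr.Dataset({'sst': (('time', 'lat', 'lon'), sst)},
                coords={'time': time, 'lat': lat, 'lon': lon})
print(ds)
```

The later steps in this lesson run unchanged on this ds; only the numbers differ from the real data.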
Take the mean over the lat and lon dimensions to average over the whole region:
ds_nino34_index=ds.mean(dim=['lat','lon'])
ds_nino34_index
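To see exactly what mean(dim=['lat','lon']) does, here is a tiny hand-made example with hypothetical values on a 2 x 2 grid. Averaging over lat and lon removes both dimensions and leaves one value per time step, the same result as averaging the underlying NumPy array over its last two axes. Note that this is a plain, unweighted average of the grid points.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny hand-made dataset: 3 monthly times on a 2 x 2 grid (values chosen for easy checking)
sst = np.array([[[26.0, 27.0], [28.0, 29.0]],
                [[27.0, 28.0], [29.0, 30.0]],
                [[25.0, 26.0], [27.0, 28.0]]])
ds = xr.Dataset({'sst': (('time', 'lat', 'lon'), sst)},
                coords={'time': pd.date_range('1982-01-01', periods=3, freq='MS'),
                        'lat': [-2.5, 2.5], 'lon': [200.0, 210.0]})

# Average over both spatial dimensions: one value per time step remains
index = ds.mean(dim=['lat', 'lon'])
print(index['sst'].dims)             # ('time',)
print(index['sst'].values.tolist())  # [27.5, 28.5, 26.5]

# Same as averaging the NumPy array over its lat and lon axes (1 and 2)
print(np.allclose(index['sst'].values, sst.mean(axis=(1, 2))))  # True
```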
Our data now has only a time dimension. Make a plot.
plt.plot(ds_nino34_index['time'],ds_nino34_index['sst'])
Calculate Anomalies
An anomaly is a departure from normal, where "normal" is called the climatology. We often calculate anomalies when working with climate data. We will spend time in a future class learning how to calculate climatology and anomalies using another feature of xarray called groupby. Today, I will show you the steps with little explanation.
ds_climo=ds_nino34_index.groupby('time.month').mean()
ds_anoms=ds_nino34_index.groupby('time.month')-ds_climo
ds_anoms
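Here is a minimal sketch of the same two steps on a synthetic monthly series (made-up values, assumed to start in January 1982). groupby('time.month').mean() averages all Januaries together, all Februaries together, and so on, producing a month dimension of length 12. Subtracting it from the grouped data removes the matching month's climatology from each time step. As a check, each calendar month's anomalies then average to zero.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 3-year monthly index series (invented numbers, not real Nino3.4 data)
time = pd.date_range('1982-01-01', periods=36, freq='MS')
months = time.month.values
rng = np.random.default_rng(1)
values = 27.0 + np.cos(2 * np.pi * (months - 4) / 12) + 0.3 * rng.standard_normal(36)
ds_nino34_index = xr.Dataset({'sst': ('time', values)}, coords={'time': time})

# Climatology: one mean per calendar month -> new 'month' dimension of length 12
ds_climo = ds_nino34_index.groupby('time.month').mean()
# Anomalies: subtract the matching month's climatology from every time step
ds_anoms = ds_nino34_index.groupby('time.month') - ds_climo

print(ds_climo.sizes['month'])  # 12
print(ds_anoms['sst'].dims)     # ('time',)

# Check: the anomalies for each calendar month average to (numerically) zero
for m in range(1, 13):
    assert abs(ds_anoms['sst'].values[months == m].mean()) < 1e-10
```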
Plot our data.
plt.plot(ds_anoms['time'],ds_anoms['sst'])
Why do I constantly print and plot the data?
Printing the data so I can see its dimensions after each step provides a check on whether my code did what I intended it to do.
Plotting also gives me a quick look to make sure everything makes sense.
I encourage you to do the same when developing and testing new code!
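When the same checks come up repeatedly, they can also be written as code that stops with an error, instead of relying on you to spot a problem in the printout. The helper below is a hypothetical sketch (check_time_series is not part of xarray or this lesson) that confirms a variable has only a time dimension and no missing values, which is what we expect of the area-averaged index.

```python
import numpy as np
import pandas as pd
import xarray as xr

def check_time_series(ds, var='sst'):
    """Hypothetical helper: raise ValueError unless ds[var] is 1-D along 'time' with no NaNs."""
    da = ds[var]
    if da.dims != ('time',):
        raise ValueError(f'expected only a time dimension, got {da.dims}')
    if bool(da.isnull().any()):
        raise ValueError(f'{var} contains missing values')
    return da.sizes['time']  # number of time steps, handy for a quick sanity print

# Made-up example: a 12-month series passes the check
time = pd.date_range('1982-01-01', periods=12, freq='MS')
good = xr.Dataset({'sst': ('time', np.linspace(26.0, 28.0, 12))}, coords={'time': time})
print(check_time_series(good))  # 12

# A field that still has a spatial dimension fails loudly
bad = xr.Dataset({'sst': (('time', 'lat'), np.zeros((12, 2)))},
                 coords={'time': time, 'lat': [-2.5, 2.5]})
try:
    check_time_series(bad)
except ValueError as err:
    print('check failed:', err)
```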
Rolling (Running Means)
The ONI is calculated using a 3-month running mean. This can be done using the rolling function.
Reading and Learning from Documentation
Read the documentation for the xarray rolling function (xarray.Dataset.rolling). Following its example, make a 3-month running mean of the ds_anoms data.
Solution
ds_3m=ds_anoms.rolling(time=3,center=True).mean().dropna(dim='time')
ds_3m
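On a five-month made-up series, the same chain shows what dropna(dim='time') does. The first and last months have no complete centered window, so they are NaN after rolling and are dropped, leaving a series two months shorter than the input. If you would rather keep the end points, the min_periods argument of rolling lets xarray average over however many values fall inside the window; this is shown only as a design alternative, not as part of the ONI definition.

```python
import pandas as pd
import xarray as xr

# Invented anomalies for five consecutive months
ds_anoms = xr.Dataset({'sst': ('time', [1.0, 2.0, 3.0, 4.0, 5.0])},
                      coords={'time': pd.date_range('1982-01-01', periods=5, freq='MS')})

# Same chain as the solution: centered 3-month mean, then drop the incomplete end points
ds_3m = ds_anoms.rolling(time=3, center=True).mean().dropna(dim='time')
print(ds_3m['time'].dt.strftime('%Y-%m').values.tolist())  # ['1982-02', '1982-03', '1982-04']
print(ds_3m['sst'].values.tolist())                         # [2.0, 3.0, 4.0]

# Alternative: require only 2 valid values per window, so the end points are kept
ds_3m_edges = ds_anoms.rolling(time=3, center=True, min_periods=2).mean()
print(ds_3m_edges['sst'].values.tolist())  # [1.5, 2.0, 3.0, 4.0, 4.5]
```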
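Let’s plot our original and 3-month running mean data together. We plot both against time because ds_3m is two months shorter than ds_anoms (dropna removed the first and last months), so plotting against the array index would shift the smoothed curve by one month.
plt.plot(ds_anoms['time'],ds_anoms['sst'],color='r')
plt.plot(ds_3m['time'],ds_3m['sst'],color='b')
plt.legend(['orig','smooth'])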
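A small made-up example shows how the window is aligned in time. With rolling(time=3, center=True).mean(), each month gets the average of the previous, current, and next month; with the default center=False, it gets the average of the current month and the two before it. In both cases, months without a full 3-month window are NaN.

```python
import pandas as pd
import xarray as xr

# Five invented monthly anomalies, chosen so the averages are easy to check by hand
da = xr.DataArray([1.0, 2.0, 3.0, 4.0, 5.0], dims='time',
                  coords={'time': pd.date_range('1982-01-01', periods=5, freq='MS')})

# Centered window: previous, current, and next month
centered = da.rolling(time=3, center=True).mean()
print(centered.values.tolist())  # [nan, 2.0, 3.0, 4.0, nan]

# Trailing window (default center=False): current month and the two before it
trailing = da.rolling(time=3).mean()
print(trailing.values.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```

The centered version is what the solution above uses, so each smoothed value stays labeled with the middle month of its window.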
Some other aggregation functions
There are a number of other aggregation functions, such as std, min, max, and sum, among others.
Using the original dataset in this notebook, ds, find and plot the maximum SST at each grid point over the time dimension.
Solution
ds_max=ds.max(dim='time')
plt.contourf(ds_max['sst'],cmap='Reds')
plt.colorbar()
Using the original dataset in this notebook, ds, calculate and plot the standard deviation at each grid point over the time dimension.
Solution
ds_std=ds.std(dim='time')
plt.contourf(ds_std['sst'],cmap='RdBu_r')
plt.colorbar()
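A small made-up 4 x 2 x 3 array makes it easy to check what these reductions do: each one over time collapses that dimension and leaves a (lat, lon) map, matching the NumPy reduction over axis 0. One detail worth knowing is that xarray's std uses ddof=0 (the population standard deviation) by default, like NumPy; pass ddof=1 if you want the sample standard deviation.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Invented SSTs: 4 monthly times on a 2 x 3 grid
rng = np.random.default_rng(2)
sst = 26.0 + rng.standard_normal((4, 2, 3))
ds = xr.Dataset({'sst': (('time', 'lat', 'lon'), sst)},
                coords={'time': pd.date_range('1982-01-01', periods=4, freq='MS'),
                        'lat': [-2.5, 2.5], 'lon': [200.0, 210.0, 220.0]})

# Each reduction over 'time' leaves one value per grid point
ds_std = ds.std(dim='time')                 # population std (ddof=0), as in NumPy
ds_std_sample = ds.std(dim='time', ddof=1)  # sample std
ds_min = ds.min(dim='time')
ds_max = ds.max(dim='time')
ds_sum = ds.sum(dim='time')

print(ds_max['sst'].dims)  # ('lat', 'lon')
# Same numbers as the NumPy reductions over the time axis (axis 0)
print(np.allclose(ds_std['sst'].values, sst.std(axis=0)),
      np.allclose(ds_std_sample['sst'].values, sst.std(axis=0, ddof=1)))  # True True
```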
Key Points