Read Fortran Binary Data Files

Although most of the data files I work with in atmosphere, ocean, and climate science are in self-describing formats such as netCDF and GRIB, I still sometimes encounter binary data files written by Fortran with direct or sequential access. I want to read these files and convert them to an xarray.Dataset so I can use all the xarray methods and tools to process them.

This notebook demonstrates how to read a Fortran sequential-access binary file and convert it to an xarray.Dataset.

Thanks to Phil Pegion at University of Colorado/CIRES and NOAA/PSL for showing me how to do this.

Data

A colleague gave me the location of the following file (Thanks David Straus!). It consists of EOFs (spatial patterns of variability) based on anomalies of 500 hPa geopotential height and 250 hPa zonal winds. The file is located here on the COLA servers:

A [GrADS](http://cola.gmu.edu/grads/) .ctl file was provided with the data.  It has the important metadata that will be needed to know how to read the file.  Here are its contents:

```
dset /project/mjo/straus/ERAI_T42/eofs/DJF/5day_means/eofs_Z500U250_PNA_DJF_T42.1980-2014.dat
undef -9999.9
title T42 gridded Minerva EOFs config Z500U250_PNA_DJF_T42
options sequential yrev
xdef 128 linear 0.0 2.8125
ydef 64 LEVELS
 -87.86 -85.10 -82.31 -79.53 -76.74 -73.95 -71.16 -68.37
 -65.58 -62.79 -60.00 -57.21 -54.42 -51.63 -48.84 -46.04
 -43.25 -40.46 -37.67 -34.88 -32.09 -29.30 -26.51 -23.72
 -20.93 -18.14 -15.35 -12.56 -9.77 -6.98 -4.19 -1.40
 1.40 4.19 6.98 9.77 12.56 15.35 18.14 20.93
 23.72 26.51 29.30 32.09 34.88 37.67 40.46 43.25
 46.04 48.84 51.63 54.42 57.21 60.00 62.79 65.58
 68.37 71.16 73.95 76.74 79.53 82.31 85.10 87.86
zdef 1 levels 1000
tdef 50 linear 01dec1980 1dy
vars 2
geo 0 9 geo 500
uwn 0 9 geo 250
endvars
```

The key things to note from the .ctl file are:

* lons = 128, starting from 0.0 and increasing in increments of 2.8125
* lats = 64, and their values are listed explicitly
* the file has 50 “times”; these correspond to 50 EOFs
* there are 2 variables in the file, geo (500 hPa geopotential height) and uwn (250 hPa zonal wind)
* under options, it says sequential, so our file is sequential access. This means that each record carries 2 extra INTEGER*4 values: one at the beginning of the record and one at the end.
* under options, it also says yrev, so the lats in the file are stored in reverse order
* missing data is indicated with values of -9999.9
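To make the record layout concrete, here is a minimal sketch that builds one synthetic record in memory and checks that both of its markers hold the payload size in bytes. It assumes 4-byte little-endian markers, which is common but depends on the compiler and machine that wrote the file. `make_record` and `check_first_record` are hypothetical helper names used only for illustration.

```python
import io
import struct

import numpy as np

nlons, nlats = 128, 64
payload_bytes = nlons * nlats * 4  # bytes of real*4 data in one record


def make_record(grid):
    """Build one synthetic sequential record: marker, data, marker."""
    body = np.asarray(grid, dtype="<f4").tobytes()
    marker = struct.pack("<i", len(body))  # assumed 4-byte little-endian marker
    return marker + body + marker


def check_first_record(fileobj, expected_bytes):
    """Return True if the leading and trailing markers both equal expected_bytes."""
    (head,) = struct.unpack("<i", fileobj.read(4))
    fileobj.seek(head, 1)  # skip over the payload, relative to the current position
    (tail,) = struct.unpack("<i", fileobj.read(4))
    return head == expected_bytes and tail == expected_bytes


# Stand-in for the real data file: one record of zeros held in memory
fake_file = io.BytesIO(make_record(np.zeros((nlats, nlons))))
markers_ok = check_first_record(fake_file, payload_bytes)
print(markers_ok)
```

The same check could be run on a real file by passing the opened binary file object in place of the in-memory buffer. If it fails, the file may use a different byte order or marker size.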

The format of the data file is based on the format that GrADS expects, described [here](http://cola.gmu.edu/grads/gadoc/aboutgriddeddata.html#structure).

Each record is a grid of all lats and lons. The records are in the following order:

  1. Time 1, geo

  2. Time 1, uwn

  3. Time 2, geo

  4. Time 2, uwn
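The ordering above means the variable index changes fastest and the time index slowest. A small sketch of that mapping, where `describe_record` is a hypothetical helper and record numbers are counted from zero:

```python
var_names = ["geo", "uwn"]  # order of the variables in the .ctl file


def describe_record(rec):
    """Map a zero-based record number to (one-based time, variable name)."""
    time_index, var_index = divmod(rec, len(var_names))
    return time_index + 1, var_names[var_index]


# The first four records, matching the list above
order = [describe_record(rec) for rec in range(4)]
print(order)
```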

[31]:
import numpy as np
from array import array
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt

Define the path and filename

[32]:
path='/project/mjo/straus/ERAI_T42/eofs/DJF/5day_means/'
fname='eofs_Z500U250_PNA_DJF_T42.1980-2014.dat'

Define the dimensions of the data based on the .ctl file

[33]:
nlons=128
nlats=64
neofs=50
nvars=2
missing_value=-9999.9

Define the coordinates of the data based on the .ctl file

[34]:
# Define lats
lats=[-87.86, -85.10, -82.31, -79.53, -76.74, -73.95, -71.16, -68.37, \
      -65.58, -62.79, -60.00, -57.21, -54.42, -51.63, -48.84, -46.04, \
      -43.25, -40.46, -37.67, -34.88, -32.09, -29.30, -26.51, -23.72, \
      -20.93, -18.14, -15.35, -12.56, -9.77,  -6.98,  -4.19,  -1.40, \
      1.40,   4.19,   6.98,   9.77,  12.56,  15.35,  18.14,  20.93, \
      23.72,  26.51,  29.30,  32.09,  34.88,  37.67,  40.46,  43.25, \
      46.04,  48.84,  51.63,  54.42,  57.21,  60.00,  62.79,  65.58, \
      68.37,  71.16,  73.95,  76.74,  79.53,  82.31,  85.10,  87.86]

# Yrev in .ctl file indicates lats are reversed
lats=lats[::-1]

# Define lons
lons=np.arange(nlons)*2.8125 + 0.0

# The 50 "times" in the .ctl file are really EOF numbers, so use an integer index
eofs=np.arange(neofs)

Define the length of each record. Be sure to include the 2 extra integers for sequential access.

[35]:
recl=(nlons*nlats+2)*4
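As a quick sanity check on the record length, the arithmetic can be spelled out and compared with the size of the file on disk. This is only a sketch: it assumes every record has the same length and that the file holds nothing besides these records. `size_matches` is a hypothetical helper.

```python
import os

nlons, nlats, neofs, nvars = 128, 64, 50, 2

# 8192 grid values plus 2 markers, 4 bytes each
recl = (nlons * nlats + 2) * 4

# One record per variable per "time"
expected_size = neofs * nvars * recl


def size_matches(filename):
    """Return True if the file size equals the size implied by the .ctl file."""
    return os.path.getsize(filename) == expected_size


print(recl, expected_size)
```

A mismatch would suggest that the dimensions, the number of variables, or the assumption about the record markers is wrong.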

Create empty array to store the data

[36]:
data=np.zeros((neofs,nlats,nlons,nvars))

Read the data

[37]:
# Open the file; the with block closes it once the loops finish
with open(path+fname,'rb') as luin:

    # Loop over all times
    for e in range(neofs):

        # Loop over both variables
        for v in range(nvars):

            # Read in one fortran record in bytes
            tmp=luin.read(recl)

            # Convert to single precision (real 32bit) in the machine's native byte order
            tmp1=array('f',tmp)

            # Pull out the data array, leaving behind the leading and trailing
            # fortran sequential record markers
            tmp2=tmp1[1:-1]

            # Create a 2d array (lat x lon) and store it in the data array
            data[e,:,:,v]=np.reshape(tmp2,(nlats,nlons))
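As an alternative to the explicit loop, the same record layout can be described with a NumPy structured dtype and decoded in one call. This is a sketch of a different technique from the one used above, shown on a tiny synthetic grid. It assumes 4-byte little-endian markers and values; the field names `head`, `grid`, and `tail` are made up for illustration.

```python
import numpy as np

# Tiny dimensions so the example is easy to inspect
nlons, nlats, neofs, nvars = 4, 3, 2, 2

# One sequential record: leading marker, lat x lon grid, trailing marker
record = np.dtype([("head", "<i4"),
                   ("grid", "<f4", (nlats, nlons)),
                   ("tail", "<i4")])

# Build synthetic bytes laid out like the file: time 1 var 1, time 1 var 2, ...
synthetic = np.zeros(neofs * nvars, dtype=record)
synthetic["head"] = synthetic["tail"] = nlats * nlons * 4
synthetic["grid"] = np.arange(neofs * nvars * nlats * nlons,
                              dtype="<f4").reshape(-1, nlats, nlons)
raw = synthetic.tobytes()

# Decode every record at once and drop the markers by selecting the grid field
records = np.frombuffer(raw, dtype=record)
data = records["grid"].reshape(neofs, nvars, nlats, nlons)

first_var = data[:, 0]   # all "times" of the first variable
second_var = data[:, 1]  # all "times" of the second variable
print(data.shape)
```

With a real file, `np.fromfile(filename, dtype=record)` using the full grid sizes would take the place of the synthetic bytes. Note that the variable axis here comes second, not last as in the loop above.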

Extract out our two variables

[38]:
z500=data[:,:,:,0]
u250=data[:,:,:,1]

Take care of missing data by setting it to NaN

[39]:
z500[z500<=missing_value]=np.nan
u250[u250<=missing_value]=np.nan
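The `<=` comparison matters here: the file stores -9999.9 in single precision, and that value is not exactly equal to the double precision -9999.9, so a test with `==` would miss it. The sketch below shows the difference on a two-element array and uses `np.isclose` as one possible alternative; the array contents are made up for illustration.

```python
import numpy as np

missing_value = -9999.9

# Values as they would arrive: stored as float32, then widened to float64
stored = np.array([1.5, missing_value], dtype=np.float32).astype(np.float64)

# An exact comparison fails because of the float32 rounding
exact_match = stored == missing_value

# A tolerance-based comparison finds the missing value
close_match = np.isclose(stored, missing_value)

# Replace the flagged values with NaN without modifying the input
masked = np.where(close_match, np.nan, stored)
print(exact_match, close_match, masked)
```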

Put the data into an xarray.Dataset

[40]:
# 500 hPa Geopotential Height
z500_ds=xr.DataArray(z500,
                coords={'eofnum':eofs,
                        'lat':lats,
                        'lon': lons},
                        dims=['eofnum','lat','lon'])
z500_ds=z500_ds.to_dataset(name='z500')

# 250 hPa Zonal Wind
u250_ds=xr.DataArray(u250,
                coords={'eofnum':eofs,
                        'lat':lats,
                        'lon': lons},
                        dims=['eofnum','lat','lon'])
u250_ds=u250_ds.to_dataset(name='u250')

# Merge to have both in the same `xarray.Dataset`
ds=xr.merge([z500_ds,u250_ds])
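The same result can also be reached without the intermediate `DataArray` objects and the merge, by passing both variables to the `xr.Dataset` constructor. A minimal sketch on small made-up arrays (the coordinate values and shapes are placeholders, not the real grid):

```python
import numpy as np
import xarray as xr

# Small stand-ins for the real coordinates and data
eofs = np.arange(2)
lats = [1.40, -1.40]
lons = np.arange(3) * 2.8125
z500 = np.zeros((2, 2, 3))
u250 = np.ones((2, 2, 3))

dims = ("eofnum", "lat", "lon")

# Each data variable is given as a (dims, values) pair
ds = xr.Dataset(
    data_vars={"z500": (dims, z500), "u250": (dims, u250)},
    coords={"eofnum": eofs, "lat": lats, "lon": lons},
)
print(ds)
```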

This dataset is defined on a global grid, but it only contains valid data in a certain region; all other values are marked as missing. So we drop every lon and lat for which all the data are missing.

[41]:
ds=ds.dropna(dim='lon',how='all').dropna(dim='lat',how='all')
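To see what `how='all'` does, here is a small sketch on a made-up field where only an interior block holds valid data. A lon or lat is dropped only when every value along it is NaN. The coordinate values are placeholders.

```python
import numpy as np
import xarray as xr

# 2 EOFs on a 3 x 4 grid, all missing except an interior block
field = np.full((2, 3, 4), np.nan)
field[:, 1:, 1:3] = 1.0

demo = xr.DataArray(
    field,
    dims=["eofnum", "lat", "lon"],
    coords={"lat": [10.0, 20.0, 30.0], "lon": [0.0, 90.0, 180.0, 270.0]},
).to_dataset(name="z500")

# Drop lons, then lats, where every value is missing
trimmed = demo.dropna(dim="lon", how="all").dropna(dim="lat", how="all")
print(trimmed["z500"].shape)
```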

Now we have an xarray.Dataset with both variables in it

[42]:
ds
[42]:
xarray.Dataset
    • eofnum: 50
    • lat: 22
    • lon: 55
    • eofnum
      (eofnum)
      int64
      0 1 2 3 4 5 6 ... 44 45 46 47 48 49
      array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
             18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
             36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
    • lat
      (lat)
      float64
      79.53 76.74 73.95 ... 23.72 20.93
      array([79.53, 76.74, 73.95, 71.16, 68.37, 65.58, 62.79, 60.  , 57.21, 54.42,
             51.63, 48.84, 46.04, 43.25, 40.46, 37.67, 34.88, 32.09, 29.3 , 26.51,
             23.72, 20.93])
    • lon
      (lon)
      float64
      149.1 151.9 154.7 ... 298.1 300.9
      array([149.0625, 151.875 , 154.6875, 157.5   , 160.3125, 163.125 , 165.9375,
             168.75  , 171.5625, 174.375 , 177.1875, 180.    , 182.8125, 185.625 ,
             188.4375, 191.25  , 194.0625, 196.875 , 199.6875, 202.5   , 205.3125,
             208.125 , 210.9375, 213.75  , 216.5625, 219.375 , 222.1875, 225.    ,
             227.8125, 230.625 , 233.4375, 236.25  , 239.0625, 241.875 , 244.6875,
             247.5   , 250.3125, 253.125 , 255.9375, 258.75  , 261.5625, 264.375 ,
             267.1875, 270.    , 272.8125, 275.625 , 278.4375, 281.25  , 284.0625,
             286.875 , 289.6875, 292.5   , 295.3125, 298.125 , 300.9375])
    • z500
      (eofnum, lat, lon)
      float64
      -0.002158 -0.002206 ... 0.02562
      array([[[-0.002158  , -0.00220595, -0.00228811, ...,  0.00327728,
                0.00300202,  0.00273191],
              [-0.00108383, -0.00128649, -0.00155361, ...,  0.00501033,
                0.00468588,  0.00440384],
              [-0.00026226, -0.00073544, -0.00130028, ...,  0.00727348,
                0.00694905,  0.00672204],
              ...,
              [ 0.00074248,  0.00130832,  0.00190614, ..., -0.0083919 ,
               -0.00664706, -0.00494827],
              [ 0.00231581,  0.00295281,  0.00362799, ..., -0.00347408,
               -0.00200538, -0.00059551],
              [ 0.00294929,  0.00352495,  0.00414385, ...,  0.00032963,
                0.00150379,  0.00257015]],
      
             [[ 0.02804737,  0.02952897,  0.03100066, ...,  0.00897593,
                0.00889765,  0.00883475],
              [ 0.03061595,  0.03248763,  0.03436497, ...,  0.00791413,
                0.00814203,  0.00835402],
              [ 0.03187996,  0.03406161,  0.03627764, ...,  0.00662772,
                0.00719402,  0.00768509],
              ...,
              [-0.01016358, -0.01049233, -0.0109251 , ...,  0.00520642,
                0.00627226,  0.00718058],
              [-0.00729834, -0.00732004, -0.00740464, ...,  0.0050148 ,
                0.00580437,  0.00641984],
              [-0.00459142, -0.00439604, -0.00423954, ...,  0.00432985,
                0.00484394,  0.0052273 ]],
      
             [[ 0.00612653,  0.00697537,  0.00784884, ...,  0.02855317,
                0.02793981,  0.0272431 ],
              [ 0.00512187,  0.00622983,  0.00738804, ...,  0.03207463,
                0.03138904,  0.03061321],
              [ 0.00410869,  0.00556858,  0.00711428, ...,  0.03529465,
                0.03467516,  0.03395033],
              ...,
              [-0.00271158, -0.00279433, -0.00307642, ..., -0.0118286 ,
               -0.01089127, -0.00978137],
              [-0.0045    , -0.00489188, -0.00544718, ..., -0.00718506,
               -0.00629644, -0.00534736],
              [-0.00458211, -0.00515804, -0.00583121, ..., -0.00363517,
               -0.00283486, -0.00202041]],
      
             ...,
      
             [[-0.01864509, -0.01075046, -0.00266998, ...,  0.04010107,
                0.04090822,  0.04174188],
              [-0.05606855, -0.04493044, -0.03325174, ...,  0.03202585,
                0.03159026,  0.03131655],
              [-0.08085682, -0.06702159, -0.05223473, ...,  0.02522704,
                0.02314838,  0.02118093],
              ...,
              [ 0.02627099,  0.02653411,  0.02628059, ...,  0.00544976,
                0.00259615,  0.00035392],
              [ 0.02030761,  0.02021568,  0.01993696, ...,  0.00059026,
               -0.0018854 , -0.00314069],
              [ 0.01401448,  0.01381945,  0.01367958, ..., -0.00225147,
               -0.00419645, -0.00532932]],
      
             [[ 0.04751408,  0.04561215,  0.04331871, ..., -0.00338019,
               -0.00301781, -0.00286037],
              [ 0.06166209,  0.05696166,  0.05177492, ...,  0.0201336 ,
                0.02451151,  0.0284923 ],
              [ 0.07115426,  0.0632488 ,  0.05454243, ...,  0.03891129,
                0.04851751,  0.05747972],
              ...,
              [ 0.02097011,  0.02108839,  0.02070948, ..., -0.00358492,
               -0.00651896, -0.00976063],
              [ 0.01574826,  0.01597328,  0.01606623, ..., -0.00159008,
               -0.00374378, -0.00641956],
              [ 0.01074952,  0.01153862,  0.01228776, ..., -0.00168329,
               -0.00290302, -0.00454319]],
      
             [[-0.0435605 , -0.04174044, -0.04017138, ...,  0.01181326,
                0.00920507,  0.00700798],
              [-0.00962206, -0.00871575, -0.00828135, ...,  0.00396727,
                0.00270966,  0.00181388],
              [ 0.01499207,  0.01506878,  0.01479533, ...,  0.00439176,
                0.00604054,  0.00743839],
              ...,
              [ 0.01366874,  0.01115884,  0.00764642, ...,  0.02832497,
                0.03550598,  0.04137369],
              [ 0.01450719,  0.01236514,  0.00922952, ...,  0.02451376,
                0.02978178,  0.03393774],
              [ 0.01300429,  0.01168108,  0.00948214, ...,  0.01950187,
                0.02298067,  0.02562426]]])
    • u250
      (eofnum, lat, lon)
      float64
      0.001532 0.001246 ... -0.01382
      array([[[ 1.53196824e-03,  1.24586432e-03,  9.37075180e-04, ...,
                6.81354245e-03,  6.58087712e-03,  6.39861170e-03],
              [ 1.26479927e-03,  6.66564796e-04,  2.11234419e-05, ...,
                8.02088436e-03,  7.93377869e-03,  7.94643722e-03],
              [ 1.78612303e-04, -7.32205750e-04, -1.68895361e-03, ...,
                9.76998080e-03,  9.75693483e-03,  9.87682119e-03],
              ...,
              [ 3.72749264e-03,  3.31535260e-03,  3.01158777e-03, ...,
                3.34553272e-02,  3.23658995e-02,  3.10922600e-02],
              [ 6.10744406e-04, -5.86430833e-04, -1.63436146e-03, ...,
                2.82338019e-02,  2.65578907e-02,  2.45361514e-02],
              [-1.07795326e-03, -2.63791531e-03, -3.96567956e-03, ...,
                1.90932527e-02,  1.66952163e-02,  1.38983568e-02]],
      
             [[ 1.00594200e-02,  1.12567339e-02,  1.24731837e-02, ...,
               -3.62311769e-03, -2.97147152e-03, -2.32227636e-03],
              [ 8.11136235e-03,  9.27418191e-03,  1.04863551e-02, ...,
               -4.78920713e-03, -3.68509651e-03, -2.62138387e-03],
              [ 4.97528724e-03,  5.94995404e-03,  7.01734098e-03, ...,
               -7.43465358e-03, -6.03462383e-03, -4.71026357e-03],
              ...,
              [ 1.62234008e-02,  1.70329642e-02,  1.82482880e-02, ...,
               -1.31671099e-04, -1.40730094e-03, -2.23548221e-03],
              [ 1.41241997e-02,  1.47185987e-02,  1.56279914e-02, ...,
               -5.68011869e-03, -7.20053446e-03, -8.12136848e-03],
              [ 1.13609452e-02,  1.16782486e-02,  1.21250292e-02, ...,
               -9.37343389e-03, -1.12014730e-02, -1.26279639e-02]],
      
             [[ 5.09383040e-04,  1.29347155e-03,  2.09772983e-03, ...,
                1.21854078e-02,  1.20889908e-02,  1.18973004e-02],
              [-3.26130103e-05,  9.23476007e-04,  1.93817867e-03, ...,
                1.22913644e-02,  1.25519698e-02,  1.27273453e-02],
              [-5.42328809e-04,  6.01720414e-04,  1.84066058e-03, ...,
                1.15508195e-02,  1.21098235e-02,  1.25962971e-02],
              ...,
              [ 7.66455848e-03,  6.53720181e-03,  5.07525913e-03, ...,
                3.00171692e-02,  3.08416262e-02,  3.11397780e-02],
              [ 1.31286019e-02,  1.30052613e-02,  1.28297415e-02, ...,
                3.02441884e-02,  3.00411005e-02,  2.93272361e-02],
              [ 1.56389531e-02,  1.62307527e-02,  1.68851018e-02, ...,
                2.49920674e-02,  2.42295861e-02,  2.30317246e-02]],
      
             ...,
      
             [[-7.12160170e-02, -6.53306171e-02, -5.89000285e-02, ...,
               -1.14088216e-04, -1.22301700e-03, -2.37001758e-03],
              [-4.89874184e-02, -4.24760096e-02, -3.52199897e-02, ...,
               -2.93661770e-03, -5.10399602e-03, -7.26001710e-03],
              [-1.08780088e-02, -4.85702697e-03,  1.81823678e-03, ...,
               -1.03401281e-02, -1.11119458e-02, -1.17218299e-02],
              ...,
              [-2.37228852e-02, -2.00150795e-02, -1.47998091e-02, ...,
               -3.04615218e-02, -3.48764658e-02, -4.01759148e-02],
              [-8.65714066e-03, -6.96240878e-03, -5.22414362e-03, ...,
               -1.07812416e-02, -1.70599930e-02, -2.31175795e-02],
              [-6.50559831e-03, -4.75029740e-03, -2.25897972e-03, ...,
                3.75310797e-03, -1.70688587e-03, -4.97795641e-03]],
      
             [[ 5.29689156e-02,  4.81081195e-02,  4.32048775e-02, ...,
                6.89526349e-02,  7.62244985e-02,  8.32419321e-02],
              [ 3.72534320e-02,  3.16580981e-02,  2.59131584e-02, ...,
                8.02049711e-02,  9.05841812e-02,  1.00608267e-01],
              [ 1.99556220e-02,  1.36172855e-02,  6.86939387e-03, ...,
                6.63188621e-02,  7.91637301e-02,  9.17789042e-02],
              ...,
              [-9.82709881e-03, -6.93462091e-03, -3.35136335e-03, ...,
               -3.82250622e-02, -3.13070156e-02, -2.07038224e-02],
              [-5.54733863e-03, -4.38756030e-03, -2.61910656e-03, ...,
               -7.21446276e-02, -6.06871732e-02, -4.43444364e-02],
              [-5.97783830e-03, -7.21936952e-03, -7.65510974e-03, ...,
               -6.61494285e-02, -5.45869619e-02, -3.94132882e-02]],
      
             [[ 6.16648011e-02,  6.04927838e-02,  5.89608960e-02, ...,
                2.50439486e-03,  2.16319901e-03,  2.11935118e-03],
              [ 6.20941110e-02,  6.10787347e-02,  5.98138869e-02, ...,
                4.89066541e-03,  6.79735607e-03,  8.60090461e-03],
              [ 2.95377690e-02,  2.76271030e-02,  2.61110347e-02, ...,
                1.57942995e-02,  1.97941288e-02,  2.30482370e-02],
              ...,
              [ 1.10309701e-02,  1.46572599e-02,  1.64470412e-02, ...,
                2.29242872e-02,  1.70561224e-02,  5.99110778e-03],
              [-1.02932882e-02, -1.30164074e-02, -1.45851653e-02, ...,
                2.29679886e-02,  1.35463998e-02,  1.58838506e-04],
              [-1.93685014e-02, -2.25512218e-02, -2.36124173e-02, ...,
                1.39191495e-02,  7.63459189e-04, -1.38242673e-02]]])

Let’s see what our data look like

[43]:
plt.contourf(ds['z500'][0,:,:])
[43]:
<matplotlib.contour.QuadContourSet at 0x7fb0c0872f28>
../_images/examples_read-fortran-binary_28_1.png
[44]:
plt.contourf(ds['u250'][0,:,:])
[44]:
<matplotlib.contour.QuadContourSet at 0x7fb0c086cc88>
../_images/examples_read-fortran-binary_29_1.png
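The plots above use array indices on both axes. Because the lats are stored from north to south, it can help to pass the coordinates explicitly so the axes show longitude and latitude. A minimal sketch with made-up coordinates and a made-up field, not the EOF data:

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder coordinates, with lat decreasing as in the dataset
lat = np.array([30.0, 20.0, 10.0])
lon = np.array([150.0, 180.0, 210.0, 240.0])

# Placeholder field with shape (lat, lon)
field = np.add.outer(lat, lon / 100.0)

fig, ax = plt.subplots()
cs = ax.contourf(lon, lat, field)  # x and y coordinates passed explicitly
ax.set_xlabel("lon")
ax.set_ylabel("lat")
fig.colorbar(cs, ax=ax)
```

For the dataset itself, the equivalent call would pass `ds['lon']`, `ds['lat']`, and one EOF of `ds['z500']`.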

Write our data out to a netCDF file

[45]:
ds.to_netcdf('eofs.nc')