NCI data access training
Monash University, July 2017
Trainers
Dr Jingbo Wang
Dr Joseph Antony
Dr Adam Steer
Aims:
Students should leave with an understanding of how to:
1. Access data from NCI remotely using a Web Coverage Service (WCS) request:
directly in a web browser or as a programmatic web request
using Python's OWSLib library
2. Access data remotely using the NetCDF Subset Service (NCSS):
directly in a web browser or as a programmatic request
using Python's Siphon library
Assumptions:
Some familiarity with Python 3 and the Jupyter environment
Some familiarity with netCDF files
Students have been provided with NCI materials on data discovery
Caution: most of NCI's example notebooks were developed using Python 2. This is a Python 3 environment, so if in doubt about some aspect of the code, please ask.
Task: extracting a data subset from a massive file (ocean colour, 15.65 GB)
We want to grab ocean colour data for a small region off the south-east coast of Australia (say, off the coast of Victoria), but we don't want to download the whole 15 GB file to do that. Here is our dataset in the NCI THREDDS catalogue:
http://dapds00.nci.org.au/thredds/catalog/u39/public/data/modis/oc.stacked/v201503/catalog.html
We'll use two different services, the Web Coverage Service (WCS) and the NetCDF Subset Service (NCSS), to get some data.
1. Web Coverage Service
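Both services are reached through URLs that sit alongside the catalogue URL above. A minimal sketch, assuming NCI's THREDDS server uses the default path prefixes (`/thredds/wcs/`, `/thredds/ncss/`, `/thredds/dodsC/` for OPeNDAP); prefixes depend on server version and configuration, so confirm them against the "Access" links on the dataset's catalogue page. The file name `example_file.nc` and the helper `service_url` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Catalogue page for the ocean colour collection (from the text above)
CATALOG_URL = ("http://dapds00.nci.org.au/thredds/catalog/"
               "u39/public/data/modis/oc.stacked/v201503/catalog.html")

# Default THREDDS path prefixes per service (assumed; confirm against the
# "Access" links on the dataset's catalogue page)
SERVICE_PREFIXES = {
    "wcs": "wcs",        # Web Coverage Service
    "ncss": "ncss",      # NetCDF Subset Service
    "opendap": "dodsC",  # OPeNDAP
}


def service_url(catalog_url: str, filename: str, service: str) -> str:
    """Map a THREDDS catalogue URL plus a file name onto a service endpoint."""
    parts = urlsplit(catalog_url)
    marker = "/thredds/catalog/"
    if marker not in parts.path:
        raise ValueError(f"not a THREDDS catalogue URL: {catalog_url}")
    # Directory relative to the catalogue root, dropping the trailing 'catalog.html'
    rel_dir = parts.path.split(marker, 1)[1].rpartition("/")[0]
    segments = ["thredds", SERVICE_PREFIXES[service]]
    if rel_dir:
        segments.append(rel_dir)
    segments.append(filename)
    return urlunsplit((parts.scheme, parts.netloc, "/" + "/".join(segments), "", ""))


# 'example_file.nc' is a hypothetical placeholder, not a real file in the collection
for name in SERVICE_PREFIXES:
    print(name, service_url(CATALOG_URL, "example_file.nc", name))
```

Deriving endpoints this way avoids copying each service link by hand when moving between files in the same collection.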
Required libraries:
OWSLib
matplotlib
scipy
numpy
io
Reference notebooks:
This example is based on material here: https://github.com/geopython/OWSLib/blob/master/examples/wcs-thredds-prism.py
We might not know what is in our data - how can we find out?
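One way to find out with OWSLib is to read the service's GetCapabilities document and inspect each layer's metadata. A sketch, assuming WCS version 1.0.0 (the version used in the OWSLib THREDDS example referenced above); `WCS_URL` is a hypothetical placeholder for the dataset's WCS endpoint, and `summarise_layer` and `describe_service` are our own helper names. The attributes read (`boundingBoxWGS84`, `timelimits`, `supportedFormats`, `supportedCRS`) are those OWSLib exposes on WCS 1.0.0 coverage metadata.

```python
# Hypothetical placeholder: copy the real WCS access link from the THREDDS catalogue
WCS_URL = "http://dapds00.nci.org.au/thredds/wcs/u39/public/data/modis/oc.stacked/v201503/<file>.nc"


def summarise_layer(layer) -> dict:
    """Collect the WCS 1.0.0 metadata needed to build a GetCoverage request."""
    return {
        "bbox_wgs84": layer.boundingBoxWGS84,  # (minx, miny, maxx, maxy)
        "time_limits": layer.timelimits,       # first and last time positions
        "formats": layer.supportedFormats,     # output formats the server offers
        "crs": layer.supportedCRS,             # may be empty
    }


def describe_service(url: str = WCS_URL) -> dict:
    """Fetch GetCapabilities and summarise every layer (needs network and OWSLib)."""
    # Imported here so summarise_layer can be used without OWSLib installed
    from owslib.wcs import WebCoverageService

    wcs = WebCoverageService(url, version="1.0.0")
    return {name: summarise_layer(layer) for name, layer in wcs.contents.items()}
```

With the real endpoint substituted, `describe_service()` returns one summary per layer, which covers the spatial and temporal bounds, formats and CRS information needed for a request.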
We see two available layers, 'chl_oc3' and 'l2 flags'. Since we are after ocean colour and not quality-assurance flags, let's proceed with 'chl_oc3'.
Now we know enough to build a query and get some data:
spatial and temporal boundaries
available file formats
CRS information is not available, so let's try WGS84 (EPSG:4326, OGC:CRS84)
Try pasting the request URL into a web browser, or modify any of its parameters to see what you get.
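A request URL of this kind can also be built without OWSLib, which is handy for pasting into a browser. A minimal sketch of WCS 1.0.0 key-value-pair encoding using only the standard library; the endpoint, the bounding box off Victoria and the date are hypothetical placeholders, so check them against the layer's bounding box and time limits first. `get_coverage_url` is our own helper name.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: use the dataset's real WCS endpoint
WCS_URL = "http://dapds00.nci.org.au/thredds/wcs/u39/public/data/modis/oc.stacked/v201503/<file>.nc"


def get_coverage_url(endpoint, coverage, bbox, time, fmt="GeoTIFF", crs="OGC:CRS84"):
    """Build a WCS 1.0.0 GetCoverage request URL from (west, south, east, north)."""
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must be (west, south, east, north)")
    params = {
        "service": "WCS",
        "version": "1.0.0",
        "request": "GetCoverage",
        "coverage": coverage,
        "crs": crs,
        "bbox": f"{west},{south},{east},{north}",
        "time": time,
        "format": fmt,
    }
    # Keep commas and colons unescaped so the URL stays readable in a browser
    return f"{endpoint}?{urlencode(params, safe=',:')}"


# Hypothetical region off the Victorian coast and a hypothetical time step
VIC_BBOX = (141.0, -40.5, 150.0, -37.5)
url = get_coverage_url(WCS_URL, "chl_oc3", VIC_BBOX, "2010-01-01T00:00:00Z")
print(url)
```

Changing `format` or `time` in the printed URL and reloading it in the browser is a quick way to see what the server accepts.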
WCS summary
In this section we showed how to build a Web Coverage Service request using data held as NetCDF files at NCI. The reference notebook 'THREDDS_WMS_WCS.ipynb' shows in detail how to construct URLs and what all the components mean. Here, we used a Python library as a convenient tool to get data (1.15 MB from a 15 GB file) for a specific time and region, then do a quick visualisation in an interactive notebook without writing any files out.
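Not writing files out means keeping the response bytes in memory. A sketch, assuming the request asked for `format=NetCDF3` (check the layer's supported formats first): SciPy's `netcdf_file` can read a NetCDF-3 file from an `io.BytesIO` buffer. `read_netcdf3_bytes` is a hypothetical helper name.

```python
import io

import numpy as np
from scipy.io import netcdf_file


def read_netcdf3_bytes(data: bytes, varname: str) -> np.ma.MaskedArray:
    """Read one variable from an in-memory NetCDF-3 file, without touching disk."""
    # mmap=False is required for file-like objects such as BytesIO;
    # maskandscale=True masks _FillValue/missing_value and applies scale/offset
    with netcdf_file(io.BytesIO(data), mode="r", mmap=False, maskandscale=True) as nc:
        # Copy so the array stays valid after the file object is closed
        return np.ma.array(nc.variables[varname][:], copy=True)
```

Typical use is `chl = read_netcdf3_bytes(response_bytes, "chl_oc3")`, where `response_bytes` is whatever the WCS request returned. For GeoTIFF responses, GDAL can read from memory through its `/vsimem/` virtual file system instead.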
If you're racing ahead, pick some other NCI data examples and try the same process. See what you come up with!
GDAL and Cartopy are also available in this environment - can you make a prettier map?
2. NetCDF Subset Service and Siphon
Required libraries:
netCDF4
Siphon
numpy
matplotlib
datetime
Reference notebooks:
https://github.com/nci/nci-notebooks/blob/master/Data_Access/Using_Thredds/THREDDS_DataAccess.ipynb
https://github.com/nci/nci-notebooks/blob/master/Data_Access/Using_Siphon/Python_Siphon_II.ipynb
This material is mainly based on the notebook https://github.com/nci/nci-notebooks/blob/master/Data_Access/Using_Siphon/Python_Siphon_II.ipynb and material here: https://unidata.github.io/siphon/examples/ncss/NCSS_Example.html#sphx-glr-examples-ncss-ncss-example-py
Now that we have chosen a dataset, how can we find out more about it before we request data?
The next few cells run various netCDF metadata queries and return the results.
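Siphon's `NCSS` object exposes this metadata as `.variables` and `.metadata`; underneath, NCSS publishes a dataset description at `<ncss endpoint>/dataset.xml`. A standard-library sketch that pulls variable names, the lat/lon box and the time span out of that document. The element names (`grid`, `LatLonBox`, `TimeSpan`) are assumed from the TDS grid-dataset XML, `NCSS_URL` is a hypothetical placeholder, and the helper names are our own.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical placeholder: use the dataset's real NCSS access link
NCSS_URL = "http://dapds00.nci.org.au/thredds/ncss/u39/public/data/modis/oc.stacked/v201503/<file>.nc"


def parse_ncss_dataset_xml(xml_doc) -> dict:
    """Extract grids, horizontal extent and time span from an NCSS dataset.xml document."""
    root = ET.fromstring(xml_doc)  # accepts str or bytes
    # Variable name -> dimension list, e.g. "time lat lon"
    grids = {g.get("name"): g.get("shape") for g in root.iter("grid")}
    box = root.find(".//LatLonBox")
    extent = None
    if box is not None:
        extent = {k: float(box.findtext(k)) for k in ("west", "east", "south", "north")}
    span = root.find(".//TimeSpan")
    times = (span.findtext("begin"), span.findtext("end")) if span is not None else None
    return {"grids": grids, "latlon_box": extent, "time_span": times}


def fetch_ncss_metadata(ncss_url: str = NCSS_URL) -> dict:
    """Download dataset.xml (needs network) and parse it."""
    with urllib.request.urlopen(f"{ncss_url}/dataset.xml", timeout=60) as resp:
        return parse_ncss_dataset_xml(resp.read())
```

Checking the reported extent and time span before building a query avoids requests that fall outside the data.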
Now that we know something about the data, we can construct a request to get part of it
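Such a request can be written either as a plain NCSS URL or through Siphon's query builder. A sketch of both, assuming the standard NCSS grid parameters (`var`, `north`, `south`, `east`, `west`, `time_start`, `time_end`, `accept`); the endpoint, region and dates are hypothetical, and naive datetimes are treated as UTC. Siphon is imported inside `fetch_with_siphon` so the URL builder works without it.

```python
from datetime import datetime
from urllib.parse import urlencode

# Hypothetical placeholder: use the dataset's real NCSS access link
NCSS_URL = "http://dapds00.nci.org.au/thredds/ncss/u39/public/data/modis/oc.stacked/v201503/<file>.nc"
ISO = "%Y-%m-%dT%H:%M:%SZ"  # naive datetimes are assumed to be UTC


def ncss_query_url(ncss_url, var, bbox, start, end, accept="netcdf"):
    """Build an NCSS grid request for a (west, south, east, north) box and a time range."""
    west, south, east, north = bbox
    params = {
        "var": var,
        "north": north, "south": south, "east": east, "west": west,
        "time_start": start.strftime(ISO),
        "time_end": end.strftime(ISO),
        "accept": accept,
    }
    return f"{ncss_url}?{urlencode(params, safe=':')}"


def fetch_with_siphon(ncss_url, var, bbox, start, end):
    """Same request via Siphon; returns a netCDF4.Dataset (needs network, Siphon, netCDF4)."""
    from siphon.ncss import NCSS

    ncss = NCSS(ncss_url)
    west, south, east, north = bbox
    query = ncss.query()
    query.lonlat_box(west=west, east=east, south=south, north=north)
    query.time_range(start, end)
    query.variables(var).accept("netcdf")
    return ncss.get_data(query)


# Hypothetical region off Victoria and a hypothetical two-month window
url = ncss_query_url(NCSS_URL, "chl_oc3", (141.0, -40.5, 150.0, -37.5),
                     datetime(2010, 1, 1), datetime(2010, 3, 1))
print(url)
```

The time range is what lets NCSS return a block with several time steps rather than a single slice.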
NCSS summary
We've shown here how to generate a programmatic request for data subsets using the NetCDF subset service available on NCI's THREDDS data server. We've also shown that we can visualise data subsets quickly and easily.
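A quick-look plot only needs a 2-D field and its coordinates. A sketch, assuming the subset has been reduced to a (lat, lon) array with 1-D coordinate arrays (for a 3-D NCSS block, pick one time step first). The log colour scale is our choice, because chlorophyll concentrations typically span orders of magnitude; `plot_chl` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm


def plot_chl(lon, lat, chl, title="chl_oc3"):
    """Quick-look map of a 2-D chlorophyll field on a regular lon/lat grid."""
    # Keep any existing mask (e.g. from netCDF4) and also hide NaNs
    field = np.ma.masked_invalid(np.ma.asarray(chl, dtype=float))
    field = np.ma.masked_less_equal(field, 0)  # a log scale needs positive values
    fig, ax = plt.subplots(figsize=(8, 5))
    mesh = ax.pcolormesh(lon, lat, field, norm=LogNorm(), shading="auto")
    fig.colorbar(mesh, ax=ax, label="chl_oc3")
    ax.set_xlabel("Longitude (degrees east)")
    ax.set_ylabel("Latitude (degrees north)")
    ax.set_title(title)
    return fig
```

For coastlines and map projections, the same arrays can be passed to a Cartopy `GeoAxes` instead of a plain Matplotlib axis.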
How is NCSS different from WCS?
NCSS primarily gives us the ability to pull n-dimensional data subsets. In the WCS example, we could only grab a single time slice; with NCSS we can extract small blocks of data in three or more dimensions. Both are essentially lazy: the data can be queried before download. NCSS allows somewhat deeper interrogation than WCS, because WCS is designed to be agnostic to the underlying data format, whereas NCSS is specifically designed to expose NetCDF file properties.
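Conceptually, both services turn a bounding box into index ranges on the file's coordinate axes and return only those elements; the difference here is how many time steps come back. A local NumPy sketch on a synthetic cube (random values, not real data, and not how either server is implemented internally); `bbox_to_slices` is a hypothetical helper name.

```python
import numpy as np


def bbox_to_slices(lat, lon, bbox):
    """Index slices covering a (west, south, east, north) box on monotonic 1-D axes."""
    west, south, east, north = bbox
    lat_idx = np.where((lat >= south) & (lat <= north))[0]
    lon_idx = np.where((lon >= west) & (lon <= east))[0]
    if lat_idx.size == 0 or lon_idx.size == 0:
        raise ValueError("bbox does not overlap the grid")
    # On a monotonic axis the selected indices are contiguous
    return slice(lat_idx[0], lat_idx[-1] + 1), slice(lon_idx[0], lon_idx[-1] + 1)


# Synthetic (time, lat, lon) cube standing in for the remote variable
lat = np.linspace(-10.0, -45.0, 36)  # descending, 1-degree spacing
lon = np.linspace(110.0, 160.0, 51)
cube = np.random.default_rng(0).random((4, lat.size, lon.size))

ys, xs = bbox_to_slices(lat, lon, (141.0, -40.5, 150.0, -37.5))
wcs_like = cube[0, ys, xs]     # one time step: a 2-D slice
ncss_like = cube[1:3, ys, xs]  # several time steps: a 3-D block
print(wcs_like.shape, ncss_like.shape)  # (3, 10) (2, 3, 10)
```

Either way, only the selected elements need to travel over the network, which is why a few megabytes can be pulled out of a 15 GB file.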
Which is better for your task?
That's your personal decision. Go forth and analyse!