A sturgeon fish
Internally, GLOS refers to legacy data infrastructure as “Sturgeon.”

By Joe Smith, Cyberinfrastructure Engineer

Gigabytes of observing data have been sent to GLOS over the past decade, capturing a snapshot of the life of the lakes. To make sure that this historical dataset is available on Seagull, we worked to accurately transfer data, platform-by-platform.

As datasets were added, they became available via Seagull ERDDAP, and soon they will be available on Seagull’s front-end to plot and compare across multiple platforms.

Icons and filenames of hundreds of NetCDF files.
Sets of NetCDF files, like these from 2018 data collected by SBEDISON, were transferred to Seagull, platform-by-platform.

All of the legacy historical data, including quality checks, totals nearly 32 GB of NetCDF files. Those files are stored in the cloud, in an AWS S3 bucket, and will be preserved indefinitely on Seagull.

The transfer process worked like this:

1) We scan the entire collection of NetCDF files for parameters and make sure each one is registered in both Seagull Staging (a private testing version) and Production (the version everyone can access at seagull.glos.org).

2) We execute a test transfer to Seagull of a few files from a selected platform and do a manual quality control check. If any parameters were missed or logged incorrectly, we get a message and investigate. We then perform visual checks via a staging ERDDAP server.

3) Once our confidence is high, we transfer all NetCDF files from that platform to Seagull Staging. As in Step 2, we get a message if something did not transfer or was not logged correctly, and we then do a visual check using a staging ERDDAP server. If the code stops unexpectedly, we log the files already transferred so that, when we restart the process, we pick up where we were interrupted.

4) Once the staging environment is accurately populated, we initiate the transfer to the production environment and perform more quality checks, ensuring there is no duplicate data in the system.

5) Once everything looks good, all of a platform’s NetCDF data is imported to the Seagull Production environment.
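
For readers curious about the restart behavior in Step 3, here is a minimal Python sketch of a resumable transfer. It is an illustration under stated assumptions, not the code GLOS actually runs: the function names, the plain-text log file, and the `upload` callback (which stands in for whatever sends one file to Seagull) are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable


def load_done(log_path: Path) -> set[str]:
    """Return the file names already recorded as transferred."""
    if not log_path.exists():
        return set()
    lines = log_path.read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def transfer_platform(
    files: Iterable[Path],
    upload: Callable[[Path], None],  # hypothetical: sends one file to Seagull
    log_path: Path,
) -> tuple[list[str], list[tuple[str, str]]]:
    """Transfer files one by one, skipping any already listed in the log.

    Returns (sent, failed), where failed pairs each file name with the
    error message so it can be investigated later.
    """
    done = load_done(log_path)
    sent: list[str] = []
    failed: list[tuple[str, str]] = []

    with log_path.open("a") as log:
        for path in sorted(files):
            name = str(path)
            if name in done:
                continue  # transferred in an earlier run; avoids duplicates
            try:
                upload(path)
            except Exception as exc:
                failed.append((name, str(exc)))
                continue
            # Record success immediately so an interruption loses no progress.
            log.write(name + "\n")
            log.flush()
            done.add(name)
            sent.append(name)

    return sent, failed
```

Calling `transfer_platform` a second time with the same log file sends only the files that are not yet in the log, which is the "pick back up where we were interrupted" behavior. Because a file is logged only after its upload succeeds, rerunning the transfer also does not send the same file twice.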

A plot from Seagull showing surface water temperature in 2020.
Users can plot or download historical data on Seagull.

GLOS has even more data in other archives that we plan to import into Seagull in the future. Some of this resides in our old metadata catalog. We’ll check to see which entries are outdated, which entries could use an updated version under Seagull’s metadata catalog (powered by Geoportal), and which entries can be safely transitioned as-is to Seagull.

Joe Smith seated with computers and posters behind him.
Transferring data while in the home office.

If you have any questions or comments, feel free to reach out to me at joe@glos.org. Otherwise, you can always reach out to the rest of the team at support@glos.org.

An illustration showing an arrow from a server pointing to Seagull and the text "Transfer Complete."
