Data from Sensors

Introduction

This section describes how to submit instrument-based sensor data, collected from an environmental sensor at a fixed or moving location, to Axiom Data Science.

Sensor data is sometimes called streaming data, but at Axiom we differentiate between the two. Sensor data comes directly from an instrument (or was collected by a sensor directly), and may or may not be delivered automatically. In contrast, streaming data is broadcast by an instrument for anyone to see and find.

This distinction is nuanced, and there is overlap. We welcome further discussion!

Terminology

A Station represents a physical platform that collects observations via one or more sensor packages. These sensors provide a stream of data for one or more variables. Data collected over time for one or more variables forms a dataset.

Note

Example: the Chukchi Ice Detection Buoy. This station is a mooring with multiple Sea-Bird SBE 37-SI sensors that collect data for temperature, conductivity, and pressure variables. This mooring is deployed annually, with slight tweaks to the sensor setup each time, so there is one dataset per year.

The CF Conventions we refer to are the Climate and Forecast Metadata Conventions that are community maintained and aim to improve data sharing.

General Guidelines

Use community standards. Follow CF Conventions for naming variables and structuring data. If possible, use a structured, self-describing format like NetCDF. For real-time data, use an existing, community-supported data server like ERDDAP.

Be consistent. Use the same file format and variable names across datasets, and from station to station. We use scripts to ingest data whenever possible, so any inconsistencies will require manual intervention and lead to delays.

Data Submission Guidelines

To host your data, we need two things from you: a station definition and your datasets. The station definition only needs to be provided once and updated as needed. For continuous, real-time data submission, you should set up a data server so that we can pull the data from you on a regular basis. For historical or manual data submission, we can set up a data transfer through Research Workspace (reach out to info@researchworkspace.com if you do not already have an account with us).

Please fill out the Sensor Data Ingest Request Form to complete your data submission request with Axiom. The sections below provide more details about the information requested in this form, as well as how best to prepare your data and metadata for submission.

Requirements to Submit

  • A dataset with header information, as clean as possible

  • Metadata for the dataset, that is, information about the scientific and technical details of the measurements

  • An access point for that data (API, THREDDS, manual transfer of files, etc) and any authentication Axiom will need to access it

  • Any pre-processing needs identified

  • Any quality control methods documented

  • A technical point of contact

  • A scientific point of contact

  • If information not already provided in metadata:

    • A list of the stations included in the dataset, in a CSV file with a header

    • A README TXT file with at least the file list, data accreditation, and data licensing detailed

More details about preparing your data and metadata are provided below.

Dataset Details

Please provide your dataset in one of the following file formats, in order of preference:

  • NetCDF

  • CSV or TSV

Sometimes, in-situ sensor data is not ready for data manipulation or is not readily usable by standard tools. Examples of pre-processing we have seen include loading data into ERDDAP servers or applying calibration adjustments to the instruments. These steps are best done by the scientific team responsible for the sensor.

NetCDF Guidelines
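
As a starting point, here is a minimal sketch of a CF/ACDD-style time-series file, shown as CDL (the text form produced by ncdump). The station name, variables, and attribute values are illustrative assumptions, not a required template; see the metadata standards listed below for the full conventions.

```
netcdf example_station {
dimensions:
    time = UNLIMITED ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "seconds since 1970-01-01T00:00:00Z" ;
        time:calendar = "gregorian" ;
    double sea_water_temperature(time) ;
        sea_water_temperature:standard_name = "sea_water_temperature" ;
        sea_water_temperature:units = "degree_Celsius" ;
        sea_water_temperature:_FillValue = -9999. ;

// global attributes:
    :Conventions = "CF-1.6, ACDD-1.3" ;
    :title = "Example mooring temperature time series" ;
    :featureType = "timeSeries" ;
}
```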

CSV Guidelines

  • First row: variable name

  • Second row: units for each variable

  • First column: time

    • Use ISO 8601 format (e.g., 2018-01-01T00:00:13Z)

    • Provide time in UTC

  • QC format (if providing with data)

    • use the variable name + "_qc" (e.g., air_temperature and air_temperature_qc) for the header value

    • use QARTOD flag values (1=pass, 2=not evaluated, 3=suspect, 4=fail, 9=missing)

    • see the IOOS QARTOD manuals for more about QARTOD

    • please reach out for more information about how to format QC flags if you have questions or if you would like Axiom to apply QC to your data
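
Putting the rules above together, the first lines of a conforming CSV might look like the following. The variable name and values are illustrative, and leaving the units cell empty for the flag column is an assumption; reach out if you prefer a different convention.

```
time,air_temperature,air_temperature_qc
UTC,degree_Celsius,
2018-01-01T00:00:13Z,2.41,1
2018-01-01T01:00:13Z,2.38,3
```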

Table 1. Example of tabular format for CSV data

datetime      temperature  salinity
UTC           c            ppm
5/1/24 12:00  29.01        34.33
5/2/24 12:00  27.33        35.01

Table 2. Example of CSV data with units included in column headers

datetime      temperature_c  salinity_ppm
5/1/24 12:00  29.01          34.33
5/2/24 12:00  27.33          35.01

If the CSV headers and variables do not already follow CF Conventions, provide a separate document with a mapping of the columns to them. For example: the variable _temp_ may refer to _sea_surface_temperature_.
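
Such a mapping document can itself be a small two-column CSV. The rows below are illustrative; the right-hand values are names from the CF standard name table.

```
column_name,cf_standard_name
temp,sea_surface_temperature
sal,sea_water_practical_salinity
press,sea_water_pressure
```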

Note

TIMESTAMPS Unfortunately, some instruments record local time only. If the date and timestamp data cannot be provided in ISO 8601 format, please explain the context and provide the following details in the dataset’s README file: format details, such as YYYYMM; the time zone used; whether daylight saving time is observed; and, if a calendar is used, which one.
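
If the local-time offset is known and fixed, converting to ISO 8601 UTC before submission is straightforward. This sketch assumes a fixed UTC-9 offset (e.g., Alaska standard time with no daylight saving) purely for illustration; use the offset documented for your instrument.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed local offset of UTC-9; replace with the offset
# documented in your dataset's README.
LOCAL_TZ = timezone(timedelta(hours=-9))

def to_iso_utc(local_stamp):
    """Convert a local 'M/D/YY HH:MM' stamp to an ISO 8601 UTC string."""
    local = datetime.strptime(local_stamp, "%m/%d/%y %H:%M").replace(tzinfo=LOCAL_TZ)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_iso_utc("5/1/24 12:00"))  # 2024-05-01T21:00:00Z
```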

Providing Access to the Data

For both real-time and manually updated data, we will want to know the following:

  • Where and how is this hosted? Please provide links and any instructions for accessing the data.

  • How often is the dataset updated?

For continuous, real-time data submission, you should set up a data server, if you don’t have one already, so that we can pull the data from you on a regular basis. Please use one of the following server options, in order of preference:

  • ERDDAP

  • REST API

  • THREDDS

  • Public files on an HTTP web server (Apache, nginx, etc)

  • FTP server

Note

ERDDAP is preferred because it guarantees a consistent dataset structure, allows us to pull data in a format of our choosing, and we can re-use data ingestion scripts across multiple data sources. A REST API also guarantees structured data, but we have to write new scripts for each API we interact with.
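
Part of what makes ERDDAP reusable is its predictable tabledap URL scheme: dataset ID, response format, requested variables, and constraints are all encoded in the request URL. The sketch below builds such a URL; the server address and dataset ID are hypothetical, and constraint operators like ">" are percent-encoded as ERDDAP accepts.

```python
from urllib.parse import quote

# Hypothetical ERDDAP server; substitute your own tabledap endpoint.
BASE = "https://erddap.example.org/erddap/tabledap"

def tabledap_url(dataset_id, variables, start_iso):
    """Build a tabledap CSV request for variables at/after start_iso."""
    var_part = ",".join(variables)
    # Percent-encode '>' in the time constraint; keep '=' and ':' literal.
    constraint = quote("time>=" + start_iso, safe="=:")
    return f"{BASE}/{dataset_id}.csv?{var_part}&{constraint}"

print(tabledap_url("example_buoy", ["time", "sea_water_temperature"],
                   "2024-05-01T00:00:00Z"))
```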

If none of the above apply and this is not a real-time data submission, we can get you set up with data transfer through Research Workspace. If you do not have an account with us, please email info@researchworkspace.com for access and specify the data portal, project, and/or organization that your inquiry relates to.

Points of Contact

Please consider providing multiple points of contact in the metadata, preferably both a science team member and an information technology (IT) team member. These people will be crucial should the Axiom team need to interpret the data or troubleshoot access to it.

Metadata

Include as much descriptive information about datasets, sensors, platforms, models, analysis methods, and quality-control procedures as possible. Metadata is essential for the long-term usability and reuse of information.

In general, metadata should follow one of the following standards:

  • ERDDAP

  • ISO 19115 for dataset and collection-level metadata

  • ISO 19115-2 XML Metadata, Part 2: Extensions for Imagery and Gridded Data

  • IOOS Metadata Profile for NetCDF

  • NetCDF-CF 1.6: Climate and Forecast conventions for NetCDF

  • ACDD 1.3: Attribute Conventions for Data Discovery

Data providers may request access to the Research Workspace to create ISO 19115 compliant metadata using the metadata editor and its associated help documentation.

Station List

For each station, please provide the following in a separate CSV file:

  • Station name

  • Location (lat, lng, depth/elevation)

  • Platform type (fixed, buoy, etc)

  • Expected data date range (so that we can double-check we have everything) as start date and end date

  • Instrument and data affiliations/attributions:

    • at minimum, provide the primary institution affiliated with this data

    • if possible, provide any other affiliations, such as the operator or funder

  • Links to web pages or documents with more background information, if available

  • If there have been other deployments, please let us know so we can keep notes for our reference, even if you are not providing data for those deployments

Some of this information may be included in the README as well.
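
A station-list CSV covering the fields above might look like the following. Every column name and value here is a placeholder, not a required schema; include whichever columns apply to your stations.

```
station_name,platform_type,latitude,longitude,depth_m,start_date,end_date,primary_institution,info_url
Example Mooring A,mooring,57.05,-135.33,25,2023-08-01,2024-07-31,Example University,https://example.org/mooring-a
```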

README.txt

README files provide valuable narrative, human-readable details about the dataset. Please include the following details:

  • The file list

  • Data accreditation

  • Data licensing details
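
For example, a short README.txt covering those three items might look like this; all file names, organizations, and the license choice are placeholders.

```
Files:
  station_list.csv   - stations included in this submission
  mooring_2023.csv   - observations from the 2023 deployment

Accreditation:
  Data collected by the Example University Oceanography Lab.
  Please credit "Example University" when using this data.

License:
  Creative Commons Attribution 4.0 International (CC BY 4.0)
```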

QC Information

Please include the following information in as much detail as possible:

  • Have you performed QC checks on this data? If so, are the results of these tests available in the dataset? Do you have any links to documentation about your QC process?

  • How are the QC variables defined in your datasets?

  • Are you able to provide the results of individual QC tests, an aggregated QC flag, or both?

If QC has not been applied to your data, would you like us to apply QARTOD? See IOOS QARTOD manuals and ioos_qc for more details.
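
When individual QARTOD test flags are rolled up into a single aggregated flag, the usual convention is that the worst result wins: fail outranks suspect, which outranks pass, with not-evaluated and missing lowest. The precedence below follows that common convention, but confirm against the IOOS QARTOD manuals and the ioos_qc implementation before relying on it.

```python
# Assumed QARTOD roll-up precedence (highest wins):
# 4 fail > 3 suspect > 1 pass > 2 not evaluated > 9 missing.
PRECEDENCE = {9: 0, 2: 1, 1: 2, 3: 3, 4: 4}

def aggregate_flags(flags):
    """Collapse the QARTOD flags for one observation into a single flag."""
    return max(flags, key=lambda f: PRECEDENCE[f])

print(aggregate_flags([1, 1, 3, 1]))  # 3 (one suspect outranks passes)
print(aggregate_flags([1, 2, 1]))     # 1 (pass outranks not evaluated)
```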