Metadata and Documentation
Descriptive metadata and documentation are critical to maintaining data quality. Metadata is “data about the data”: it describes and contextualizes the dataset so that future users can understand it. Beyond standardized metadata, useful documentation might include standard operating procedures, field notes, and similar records from which metadata may be derived or which the metadata may reference.
Throughout the data lifecycle, both the metadata and documentation must be recorded and updated to reflect the actions taken on the data. This includes collection, acquisition, processing, quality review, and analysis, as well as any other stage of the data lifecycle.
Metadata
Metadata describes a dataset so that it can be understood and properly reused in the future. A metadata record states where the data were collected, who is responsible for the data, why the data were created, and how the data are organized. Metadata generally follows a standard format to ensure that the semantics of metadata fields are understood by both creators and consumers of the metadata, to ease the use of the metadata in catalog and discovery systems, and to simplify automatic machine-to-machine transfer of records.
The Research Workspace includes an integrated metadata editor to generate metadata in the FGDC-endorsed ISO 19110 and 19115-2 standards for geospatial metadata. Refer to the Metadata Best Practices section for help creating scientific metadata using the Research Workspace metadata editor. The BPSM document provides field-by-field guidance on how to write high-quality metadata.
Additional Data Documentation
Save documentation about the data in non-proprietary file formats, such as .txt, .xml, or .pdf.
Images, pictures, or figures should be saved as JPEG or GIF files.
Name each documentation file using the same logical naming convention as the related data file(s), with an added indication that the file is a metadata record, e.g. [data_file_name]_METADATA.xml.
For complicated datasets, supplemental documentation is more useful when structured as a user’s guide for the data. When constructing such a guide, include enough detail for someone with sufficient domain knowledge to understand, trust, and reuse your data 20+ years in the future.
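The naming convention described in this section, [data_file_name]_METADATA.xml, can be applied consistently with a small helper. The following is a minimal sketch, not part of the Research Workspace: the function name `metadata_name` and the example file names are hypothetical, and it assumes the metadata record uses the `_METADATA` suffix and `.xml` extension from the example above.

```python
from pathlib import Path


def metadata_name(data_file: str, suffix: str = "_METADATA", ext: str = ".xml") -> str:
    """Return the companion metadata file name for a data file.

    Hypothetical helper: keeps the data file's base name, drops its
    extension, and appends the suffix and extension of the metadata record.
    """
    stem = Path(data_file).stem  # base name without directory or final extension
    return f"{stem}{suffix}{ext}"


# Example with a made-up data file name.
print(metadata_name("ctd_casts_2019.csv"))  # prints ctd_casts_2019_METADATA.xml
```

Deriving the metadata name from the data file name, rather than typing it by hand, keeps the two files paired even when the data file is later renamed.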
Data Preservation
Preservation of data involves publishing or archiving the data in an organizational or domain-focused repository to ensure their long-term viability, accessibility, and usability by the broader scientific community. Planning to archive your data should be part of the planning process for your research project, and not left until the very end.
Early in your project, identify the data archive you’ll use and any additional requirements it imposes on data submissions; doing so will save you valuable time as your project finishes. Compiling data files and metadata for the archive should be straightforward if you have followed these best practice guidelines throughout the course of your project.
The Research Workspace includes an automated submission pathway for final datasets to the DataONE network. This pathway helps you assemble and validate the materials being sent to the archive.
For submission purposes, a complete dataset shall include:
Data files: data should be saved in preservation-ready formats with descriptive file names in a well-organized file structure.
Metadata: robust and standards-compliant metadata should accompany the data. Metadata may be provided at the file, folder, or campaign level depending on the specific structure of your archival package.
Supplemental documentation and files: provide any additional documentation or files required to help future users understand and properly use your data.
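Before submission, the checklist above can be partly verified with a script. The following is a rough sketch under stated assumptions, not the validation performed by the Research Workspace: it assumes file-level metadata named [data_file_name]_METADATA.xml and stored beside each data file, and the function name `missing_metadata` and the set of data extensions are hypothetical. Packages that document data at the folder or campaign level would need a different check.

```python
from pathlib import Path

# Assumption: extensions treated as data files; adjust for your own package.
DATA_EXTENSIONS = {".csv", ".nc"}


def missing_metadata(package_dir: str) -> list[str]:
    """List data files that have no companion _METADATA.xml record.

    Paths are returned relative to package_dir, in sorted order.
    """
    root = Path(package_dir)
    missing = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything that is not a recognized data file.
        if not path.is_file() or path.suffix.lower() not in DATA_EXTENSIONS:
            continue
        # Expected metadata record sits next to the data file.
        record = path.with_name(f"{path.stem}_METADATA.xml")
        if not record.exists():
            missing.append(path.relative_to(root).as_posix())
    return missing


# Usage (hypothetical folder name):
#   for name in missing_metadata("my_archive_package"):
#       print("No metadata record for", name)
```

An empty result only shows that every data file has a record with the expected name; it says nothing about whether the metadata content is complete or standards-compliant.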
If you plan to submit your data to a repository other than those with built-in submission capabilities from the Research Workspace, it is important to understand the additional data management steps required by the data archive center.