Data Management Cheat Sheet
Managing data responsibly isn’t easy, even for simple scientific projects. For large projects and campaigns, it can quickly feel overwhelming. Good data management requires planning, communication, and willpower, but the rewards are well worth the effort. This page provides a manageable amount of information to get you started with data management and serves as a handy reference for things like file formatting and naming conventions.
For more detailed information, please consult our full Data Management Best Practices page.
And if you read through the information below and still find yourself needing help, please email us at metadata@axiomdatascience.com.
Data Organization
Make a data management plan before you collect your data, including specifics on how your data will be processed, organized, and archived.
Come up with logical naming conventions for folders, labels, and files, and follow those conventions throughout your project.
Establish a hierarchical structure for your data files, and avoid nesting subfolders more than three levels deep.
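Naming and folder rules are easiest to follow when they can be checked automatically. The sketch below assumes a hypothetical convention of the form project_site_YYYYMMDD_variable_vNN.ext; the pattern, the `build_filename` and `check_tree` helpers, and the example names are illustrative assumptions, not a prescribed standard. It builds names that follow the convention and flags files whose names break it or that sit more than three subfolders below the project root.

```python
import re
from pathlib import Path

# Hypothetical convention (illustrative, not a standard):
#   <project>_<site>_<YYYYMMDD>_<variable>_v<NN>.<ext>
# e.g. gulfsurvey_site03_20230614_temperature_v01.csv
# The date field is only checked for eight digits, not for a valid calendar date.
FILENAME_PATTERN = re.compile(
    r"(?P<project>[a-z0-9]+)_(?P<site>[a-z0-9]+)_(?P<date>\d{8})_"
    r"(?P<variable>[a-z0-9]+)_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)"
)

MAX_DEPTH = 3  # at most three layers of subfolders below the project root


def build_filename(project, site, date, variable, version, ext):
    """Assemble a lowercase file name that follows the convention above."""
    name = f"{project}_{site}_{date}_{variable}_v{version:02d}.{ext}".lower()
    if not FILENAME_PATTERN.fullmatch(name):
        raise ValueError(f"name does not follow the convention: {name}")
    return name


def check_tree(root):
    """Return sorted (problem, relative_path) pairs for files that break the rules."""
    root = Path(root)
    problems = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        depth = len(rel.parts) - 1  # number of folders between the root and the file
        if depth > MAX_DEPTH:
            problems.append(("too deep", rel.as_posix()))
        if not FILENAME_PATTERN.fullmatch(path.name):
            problems.append(("bad name", rel.as_posix()))
    return sorted(problems)
```

For example, `build_filename("GulfSurvey", "site03", "20230614", "temperature", 1, "csv")` returns `gulfsurvey_site03_20230614_temperature_v01.csv`. Running `check_tree` on the project folder before sharing or archiving catches drift from the convention early, while it is still cheap to fix.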
Data and File Formatting
Use open, non-proprietary, text-based file formats whenever possible.
Decide on logical file-naming conventions and stick to them.
Follow established conventions (e.g., CF Conventions) for data headers and variables whenever possible.
Decide on conventions for coded and null values in your dataset and stick with them.
For biological data, include the Integrated Taxonomic Information System (ITIS) Taxonomic Serial Number (TSN) and the World Register of Marine Species (WoRMS) AphiaID for each taxon.
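As an illustration of an open, text-based format with a declared null code, the sketch below writes and reads plain CSV using Python's standard `csv` module. The null code `-9999`, the column names, and the helper functions are hypothetical choices; what matters is that whichever code you pick is documented and applied consistently.

```python
import csv
import io
import math

# Hypothetical project conventions; pick your own and document them.
NULL_CODE = "-9999"  # written wherever a value is missing
COLUMNS = ["time_utc", "station_id", "sea_water_temperature_degC"]


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def write_csv(rows, stream):
    """Write rows as plain-text CSV, replacing missing values with NULL_CODE.

    When writing to a real file, open it with newline="" as the csv docs recommend.
    """
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([NULL_CODE if is_missing(v) else v for v in row])


def read_csv(stream):
    """Read the CSV back, converting NULL_CODE to None and temperatures to float."""
    records = []
    for rec in csv.DictReader(stream):
        temp = rec["sea_water_temperature_degC"]
        rec["sea_water_temperature_degC"] = None if temp == NULL_CODE else float(temp)
        records.append(rec)
    return records


if __name__ == "__main__":
    buf = io.StringIO()
    write_csv([("2023-06-14T12:00:00Z", "site03", 12.4),
               ("2023-06-14T13:00:00Z", "site03", None)], buf)
    print(buf.getvalue())
    buf.seek(0)
    print(read_csv(buf))
```

Because the file is plain text, anyone can open it without special software, and the single null code makes missing values unambiguous to both people and programs.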
Common File Format Specs
The table below outlines specifications for some common data file formats. For all formats, follow CF Conventions for naming whenever possible. If the CF Conventions don’t cover a name used in your project, refer to the Marine Metadata Interoperability Ontology Registry and Repository.
| File Format | Specifications |
|---|---|
| Databases | |
| Spatial Media | |
| Sensor Data | |
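The naming rule above (CF Conventions first, then the Marine Metadata Interoperability Ontology Registry and Repository for anything CF doesn't cover) can be applied as a simple review step. The sketch below assumes each variable is described by a dictionary of attributes loosely modelled on netCDF variable attributes (`standard_name`, `units`). `KNOWN_CF_NAMES` is a tiny hand-picked subset of CF standard names, not the full Standard Name Table, and `review_variables` is a hypothetical helper.

```python
# Small, hand-picked subset of CF standard names (illustrative only; consult the
# current CF Standard Name Table for the full, authoritative list).
KNOWN_CF_NAMES = {
    "sea_water_temperature",
    "sea_water_practical_salinity",
    "air_temperature",
}


def review_variables(variables):
    """Sort variables into CF-named ones and ones that need a vocabulary lookup.

    `variables` maps a variable name to a dict of attributes such as
    `standard_name` and `units`.
    """
    cf_named, needs_lookup, missing_units = [], [], []
    for name, attrs in variables.items():
        if attrs.get("standard_name") in KNOWN_CF_NAMES:
            cf_named.append(name)
        else:
            # Not covered by the CF subset: candidate for an MMI ORR search.
            needs_lookup.append(name)
        if not attrs.get("units"):
            missing_units.append(name)
    return {
        "cf_named": cf_named,
        "needs_lookup": needs_lookup,
        "missing_units": missing_units,
    }


if __name__ == "__main__":
    print(review_variables({
        "temp": {"standard_name": "sea_water_temperature", "units": "degree_C"},
        "chl_flu": {"long_name": "chlorophyll fluorescence", "units": ""},
    }))
```

In practice the lookup set would come from the published CF table rather than being typed by hand; the point of the sketch is the order of the decision, not the list itself.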
Data Quality Management
Assign specific quality assurance tasks to specific people involved in your project.
Define parameter names, units, and null value codes before collecting data.
Review all data for missing, anomalous, or invalid values immediately after collection.
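A first-pass review for missing, invalid, and anomalous values can be scripted so it runs as soon as data come in. This sketch assumes a water-temperature series in degrees Celsius; the range limits, the step threshold, the `-9999` null code, and the `flag_values` helper are placeholders you would replace with the names, units, and null codes defined for each parameter before collection.

```python
import math

# Hypothetical thresholds for a water-temperature series (degrees C).
VALID_MIN, VALID_MAX = -2.0, 35.0  # physically plausible range
MAX_STEP = 2.0                     # largest plausible change between readings
NULL_CODE = -9999.0                # project null code


def flag_values(values):
    """Return one flag per value: 'missing', 'invalid', 'anomalous', or 'ok'.

    missing:   None, NaN, or the project's null code
    invalid:   outside the plausible range
    anomalous: in range, but jumps more than MAX_STEP from the last 'ok' value
    """
    flags = []
    last_ok = None  # most recent value flagged 'ok'
    for v in values:
        if v is None or v == NULL_CODE or (isinstance(v, float) and math.isnan(v)):
            flags.append("missing")
        elif not VALID_MIN <= v <= VALID_MAX:
            flags.append("invalid")
        elif last_ok is not None and abs(v - last_ok) > MAX_STEP:
            flags.append("anomalous")
        else:
            flags.append("ok")
            last_ok = v
    return flags
```

Comparing against the last `ok` value means a single spike is flagged once, while a genuine shift in conditions keeps being flagged until someone looks at it. Flags like these mark values for human review; they don't decide on their own that data are bad.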
Metadata and Documentation
Document how your data are collected, processed, and preserved at each stage of your project.
Budget time to prepare your data for long-term preservation once your data are finalized.
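One lightweight way to document each stage is an append-only log that also records a checksum for every file a step touches, so archived files can later be verified against the log. The JSON Lines format, the SHA-256 choice, and the `log_step`/`read_log` helpers below are assumptions for illustration, not a required format.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_step(log_path, stage, description, files=()):
    """Append one step (with checksums of the listed files) to a JSON Lines log."""
    entry = {
        "time_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "stage": stage,  # e.g. "collection", "processing", "preservation"
        "description": description,
        "files": {str(f): sha256_of(f) for f in files},
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_log(log_path):
    """Load every entry from the log, skipping blank lines."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Recomputing `sha256_of` for each archived file and comparing it with the logged value is a quick check that nothing changed between final processing and long-term preservation.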