This page describes data management best practices at PO.DAAC. It covers how PO.DAAC manages its data from acceptance through ingest, distribution, and retirement, along with the documents and templates associated with each of those stages. It also describes the recommended file formats (NetCDF, HDF) and metadata conventions (CF, ISO 19115, ACDD) for our data providers.
Every dataset that enters the PO.DAAC for archival and/or distribution must go through an acceptance process. The primary driver of this process is the Dataset Gap Analysis and Prioritization (DGAP). The DGAP provides the means for identifying (see Figure 1 under Dataset Lifecycle) individual datasets, corresponding to particular geophysical parameters, that are not yet archived or distributed by the PO.DAAC. Once identified, these datasets are ranked and prioritized according to their corresponding geophysical parameters. The datasets with the highest priority and ranking are then given the green light for the final phase of acceptance, in which a system impact assessment is performed to estimate the costs to the PO.DAAC of ingesting the new datasets. When approved by the Systems Engineer, a Submission Agreement (a.k.a. Memorandum of Understanding) is signed between the dataset provider and the PO.DAAC manager. Once accepted, the new dataset is ready to be integrated into the PO.DAAC.
Related Links:
- NASA Earth Science Data Preservation Content Specification: https://earthdata.nasa.gov/sites/default/files/field/document/423-SPEC-0...
- ESIP Preservation and Stewardship Cluster: http://wiki.esipfed.org/index.php/Preservation_and_Stewardship
- ESIP Data Study Cluster: http://wiki.esipfed.org/index.php/Data_Study_Working_Group
- ESIP Discovery Cluster: http://wiki.esipfed.org/index.php/Discovery_Cluster
- ESIP Documentation Cluster: http://wiki.esipfed.org/index.php/Category:Documentation_Cluster
- ESIP Information Quality Cluster: http://wiki.esipfed.org/index.php/Information_Quality
- Earth Science Data System Working Groups (ESDSWG): https://earthdata.nasa.gov/esdswg
PO.DAAC recommends that data providers deliver their products in the HDF5 or NetCDF4 file formats. Both formats are built on versatile data models that support a self-describing, machine-independent representation of data, thus promoting interoperability, tool use, and the sharing of scientific data. Both have been used extensively to distribute NASA satellite data and in many international satellite missions, for Level 2, 3, and 4 satellite data as well as in situ data.
Some of the features and support common to both formats are:
- A versatile data model that can represent very complex data objects and a wide variety of metadata.
- A completely portable file format with no limit on the number or size of data objects in the collection.
- A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
- A rich set of integrated performance features that allow for access time and storage space optimizations.
- Community tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.
CF and ACDD Metadata Standards Overview for PO.DAAC Portal
In addition to advancing and providing recommendations on scientific file format standards as discussed above, PO.DAAC adopts, advocates for, and provides guidance on the usage of appropriate metadata standards for geospatial data. Metadata provide important descriptive information on data and file elements. Conformity to community-developed and approved metadata standards facilitates the consistent and valid semantic interpretation of information and data, which is critical for ensuring efficient and automated data discovery and for interoperability with tools and services across distributed and heterogeneous earth science data systems. While several established and emerging geospatial metadata frameworks exist (e.g. FGDC, ISO 19115), two metadata standards frameworks are particularly important in the remote sensing context: the "Attribute Conventions for Data Discovery" (ACDD) and the "Climate and Forecast" (CF) convention. Implementation of ACDD and CF metadata by data providers is strongly encouraged, and the PO.DAAC works closely with providers to advise on metadata and file format aspects as part of our data archival acceptance policies. Here we provide an overview of these metadata frameworks with useful pointers to additional information. Data providers with further specific metadata questions who are considering data archival submissions should contact the PO.DAAC (email: podaac@podaac.jpl.nasa.gov).
ACDD
The ACDD convention, developed initially by Unidata as a complement to its netCDF self-describing scientific file format and now adopted by the ESIP Federation, provides a standards specification for global metadata attributes associated with the "header" portion of data files, aimed at facilitating efficient and automated discovery of geospatial datasets. It consists of a series of standard attributes and naming conventions that provide a well-rounded and synoptic characterization of the scope and contents of data files, including, for example, the spatio-temporal extent of the file data, the type and processing level of the data, keyword descriptors, and details on the dataset provider. Complete documentation on ACDD metadata is available at http://wiki.esipfed.org/index.php/Category:Attribute_Conventions_Dataset_Discovery, and a listing of the key ACDD attributes recommended by the PO.DAAC as a minimum for inclusion in data files is given in the table below.
Table 1. Key ACDD Metadata Attributes defined and illustrated
| Attribute Name | Type | Description | Example Implementation |
|---|---|---|---|
| date_created | string | The date and time the data file was created, in the form "yyyy-mm-ddThh:mm:ssZ". This time format is ISO 8601 compliant. | date_created = "2012-04-06T16:26:33Z" |
| time_coverage_start | string | Representative date and time of the start of the granule in the ISO 8601 compliant format "yyyymmddThhmmssZ". | time_coverage_start = "20120101T013102Z" |
| time_coverage_end | string | Representative date and time of the end of the granule in the ISO 8601 compliant format "yyyymmddThhmmssZ". | time_coverage_end = "20120102T000843Z" |
| geospatial_lat_max | float | Decimal degrees north, range -90 to +90. | geospatial_lat_max = 90.0f |
| geospatial_lat_min | float | Decimal degrees north, range -90 to +90. | geospatial_lat_min = -90.0f |
| geospatial_lon_max | float | Decimal degrees east, range -180 to +180. | geospatial_lon_max = 180.0f |
| geospatial_lon_min | float | Decimal degrees east, range -180 to +180. | geospatial_lon_min = -180.0f |
| geospatial_lat_resolution | float | Latitude resolution in units matching geospatial_lat_units. | geospatial_lat_resolution = 1 |
| geospatial_lon_resolution | float | Longitude resolution in units matching geospatial_lon_units. | geospatial_lon_resolution = 1 |
| geospatial_lat_units | string | Units of the latitudinal resolution. Typically "degrees_north". | geospatial_lat_units = "degrees_north" |
| geospatial_lon_units | string | Units of the longitudinal resolution. Typically "degrees_east". | geospatial_lon_units = "degrees_east" |
| platform | string | Satellite(s) used to create this data file. | platform = "Aquarius/SAC-D" |
| sensor | string | Sensor(s) used to create this data file. | sensor = "Aquarius" |
| project | string | Project/mission name. | project = "Aquarius" |
| product_version | string | The product version of this data file, which may differ from the file version used in the file naming convention. | product_version = "1.3" |
| processing_level | string | Product processing level (e.g. L2, L3, L4). | processing_level = "3" |
| keywords | string | Comma-separated list of GCMD Science Keywords from http://gcmd.nasa.gov/learn/keyword_list.html | keywords = "SURFACE SALINITY, SALINITY, AQUARIUS SAC-D" |
CF
The Climate and Forecast (CF) metadata convention, spearheaded by the Program for Climate Model Diagnosis and Intercomparison (PCMDI) at Lawrence Livermore National Laboratory, provides a standards specification for both global (file header) and variable-level metadata attributes, aimed at facilitating both discovery and interoperability of datasets used in climate science and modeling, including remote sensing data. This includes standards for variable dimensioning and for variables holding observational measurement data, the georeference and time-reference data linked to them, and associated auxiliary variables. The CF convention consists of a series of well-defined attributes, auxiliary variables, and naming and value-assignment conventions that provide a standard characterization of the contents of data files, facilitating robust semantic interpretation and usage of the data. Further information on CF metadata is available at http://cf-pcmdi.llnl.gov/. The current version of the CF standard is 1.6, and complete documentation of this latest specification is available at http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6. The tables below summarize the key global and variable-level attributes of the CF convention recommended for datasets slated for PO.DAAC archival.
Table 2. Key Global CF Attributes
| Attribute Name | Type | Description | Example Implementation |
|---|---|---|---|
| Conventions | string | Version of the convention standard implemented by the file, interpreted as a directory name relative to a directory that is a repository of documents describing sets of discipline-specific conventions. | Conventions = "CF-1.6" |
| title | string | A succinct description of what is in the dataset. | title = "Aquarius CAP Level-3 1x1 Deg Gridded 7-Day Bin Averaged Maps" |
| history | string | Used to document provenance. Provides an audit trail for modifications to the original data. We recommend that each line begin with a timestamp indicating the date and time of day that the program was executed. | history = "L2_1.3CAP2.1.4" |
| institution | string | Specifies where the original data was produced. | institution = "JPL" |
| source | string | The method of production of the original data. If it was model-generated, source should name the model and its version, as specifically as could be useful. If it is observational, source should characterize it (e.g., "surface observation" or "radiosonde"). | source = "CAPV1.3-HDF5" |
| comment | string | Miscellaneous information about the data or methods used to produce it. | comment = "rolling 7 day means at 1 degree spatial resolution" |
| references | string | Published or web-based references that describe the data or methods used to produce it. | references = "Yueh, S., Tang, W., Fore, A., Freedman, A., Neumann, G., Chaubell, J., Hayashi, A. (2012). Simultaneous Salinity and Wind Retrieval Using the CAP Algorithm for Aquarius. http://www.igarss2012.org/Papers/viewpapers.asp?papernum=1596" |
Table 3. CF Measurement Variable Attributes. Both the variable dimensions and the associated attribute list are included as part of the variable declaration in netCDF implementations, which takes the general form:

    TYPE VariableName(DimensionX, ...);   e.g.  float sss(IdLat, IdLon);
        :Attribute1 = ...;                      :long_name = ...;
        ...
        :AttributeN = ...;                      :comment = ...;
| Attribute Name | Type | Description | Example Implementation |
|---|---|---|---|
| long_name | string | Custom/long descriptive name of the variable. | :long_name = "Sea Surface Salinity" |
| standard_name | string | Standard variable name used to describe a physical quantity (case sensitive, no whitespace). Lists of standard variables and associated units are available at http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table... | :standard_name = "sea_surface_salinity" |
| units | string | Standard unit name for the standard variable. | :units = "1e-3" |
| valid_range | float | Comma-separated minimum and maximum values of the physical quantity, defining the valid measurement range. | :valid_range = 0.0f, 45.0f |
| scale_factor | float | Slope of the scaling relationship applied to transform measurement data to the appropriate geophysical quantity representation. | :scale_factor = 1.0f |
| add_offset | float | Intercept of the scaling relationship applied to transform measurement data to the appropriate geophysical quantity representation. | :add_offset = 0.0f |
| _FillValue | float | Assigned value in the data file designating a null or missing observation. | :_FillValue = -9999.0f |
| comment | string | Optional attribute field allowing provision of further free-form information about the variable. | :comment = "level-3 analysed sea surface salinity values obtained from the Combined Active Passive (CAP) algorithm. Cell values are means for the temporal interval and 1-degree spatial grid" |
Table 4. Georeferencing Variable Attributes. Both the variable dimensions and the associated attribute list are included as part of the variable declaration in netCDF implementations, which takes the general form:

    TYPE VariableName(DimensionX);   e.g.  float Lat(IdLat);
        :Attribute1 = ...;                 :long_name = ...;
        ...
        :AttributeN = ...;                 :units = ...;
One can see that several attributes are shared across Measurement and Georeferencing variables, but that there are some specialized attributes specific to the latter variable type.
| Attribute Name | Type | Description | Example Implementation |
|---|---|---|---|
| long_name | string | Custom/long descriptive name of the variable. | long_name = "longitude" |
| standard_name | string | Standard variable name used to describe the georeferencing variable (e.g. latitude, longitude, height). | standard_name = "longitude" |
| axis | string | Corresponding variable axis for plotting (e.g. X, Y, Z). | axis = "X" |
| units | string | Standard unit name for the standard georeferencing variable (e.g. "degrees_north", "degrees_east", "m"). | units = "degrees_east" |
Table 5. Temporal Variable Attributes. Both the variable dimensions and the associated attribute list are included as part of the variable declaration in netCDF implementations, which takes the general form:

    TYPE VariableName(DimensionT);   e.g.  float MeasurementTimes(t);
        :Attribute1 = ...;                 :long_name = ...;
        ...
        :AttributeN = ...;                 :_FillValue = ...;

Here too, several attributes are shared with the Measurement and Georeferencing variables, though the units attribute has a specialized usage, as shown below.
| Attribute Name | Type | Description | Example Implementation |
|---|---|---|---|
| long_name | string | Custom/long descriptive name of the variable. | :long_name = "time of measurement" |
| standard_name | string | Standard variable name used to describe the temporal variable (i.e. time). | :standard_name = "time" |
| units | string | Standard unit descriptor (e.g. days, hours, seconds) cited against a standard reference date ("since" date/time in ISO format). | :units = "days since 1970-01-01 00:00:00" |
| _FillValue | float | Assigned value in the data file designating a null or missing observation. | :_FillValue = -9999.0f |
In addition to conventions for flag-related attributes, the CF 1.6 convention now also provides specifications for auxiliary variables that further qualify the core variable types listed above. Examples include:
- The grid_mapping attribute and variables, which describe the map projection between the coordinate variables and true latitude and longitude coordinates.
- Grid-cell-related attributes and variables, such as bounds and cell_methods, which qualify the gridding regime used and the aggregate operations conducted to yield the grid data values.
- Standard attribute structures for representing complex sampling geometries, from point data to time series, profiles, and trajectories.
Full descriptions of these capabilities and further information on the core attribute types summarized above are provided in the CF documentation available at http://cf-pcmdi.llnl.gov/.
ISO 19115
ISO 19115 is an international standard for geographic metadata that facilitates machine readability. At PO.DAAC we have consolidated web services (http://podaac.jpl.nasa.gov/ws) that will convert dataset metadata to make it ISO 19115 compliant.