RDA Dataset Submission Appraisal and Selection Process
Upon submission:
The RDA DECS manager is notified as soon as a Dataset Submission Form has been submitted. If the RDA DECS manager has any questions regarding the dataset, the manager or a designated DECS representative will contact the dataset submitter by email, using the address associated with the dataset submitter's registered RDA account. The dataset submitter should send all additional information and feedback to the RDA by email. The emails exchanged might be added to the dataset information to record the provenance of the dataset.
Upon acceptance:
After the dataset has been accepted for ingest with the RDA, an RDA DECS dataset specialist (DS) will contact the dataset submitter and the dataset contact provided on the Dataset Submission Form to complete the following tasks:
- Submission of the actual data files:
- The method for transferring the data files, including any supplementary files that are relevant to the data files, could vary slightly depending on the file structure, format, and/or size of the data. The responsible DECS dataset specialist will work with the data submitter to determine the best method for data transfer. The data transfer workflow typically proceeds according to the following steps:
- The data submitter is asked to host the data files to be transferred on a remote server and provide the DECS dataset specialist with a manifest that includes the complete list of files to be transferred and the MD5 checksum for each file. As there can be certain nuances depending on the type of systems and transfer protocol used, the data submitter will work with the DECS dataset specialist to determine the best structure for the manifest file, and the appropriate method to compute the MD5 checksums for that specific use case.
- The DECS dataset specialist will use FTP, HTTP(S), or GridFTP to transfer the data files from the submitter's server to the DECS server according to the manifest details.
- Once a data file has been transferred to the DECS server, its MD5 checksum is computed and compared against the submitter-provided MD5 value to verify data integrity.
- The manifest list is used to verify that the complete file set has been transferred from the submitter's server to the DECS server.
- File upload to the RDA is not an option for data submitters.
- The dataset submitter should keep a copy of the data files on their local server until the dataset creation process is complete, to avoid data loss.
- The RDA DECS team might ask for clarification of the file naming conventions/structures.
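The integrity checks in the transfer workflow above can be illustrated with a minimal sketch. It assumes an md5sum-style manifest (one "checksum, whitespace, relative path" pair per line); the actual manifest structure is agreed between the data submitter and the DECS dataset specialist, so the layout and the function names `md5_of_file`, `parse_manifest`, and `verify_transfer` are hypothetical and not part of any RDA tool.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse lines of the assumed form '<md5 hex>  <relative path>' into a dict."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        checksum, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading '*'; drop it if present.
        expected[name.lstrip("*")] = checksum.lower()
    return expected


def verify_transfer(manifest_text, data_dir):
    """Check received files against the manifest.

    Returns (missing, mismatched): names listed in the manifest but absent from
    data_dir, and names whose recomputed MD5 differs from the manifest value.
    """
    missing, mismatched = [], []
    for name, checksum in parse_manifest(manifest_text).items():
        path = Path(data_dir) / name
        if not path.is_file():
            missing.append(name)
        elif md5_of_file(path) != checksum:
            mismatched.append(name)
    return missing, mismatched
```

An empty `missing` list corresponds to the completeness check against the manifest, and an empty `mismatched` list corresponds to the per-file MD5 validation.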
- Collaboration, verification, and confirmation of metadata record, including the information that will be used for the dataset's landing page under the RDA website:
- The Dataset Submission Form will be used as the basis for creating the metadata record within RDA. However, the RDA DECS dataset specialist (DS) might also contact the dataset submitter/dataset contact to confirm additional information.
- If an existing metadata record or any additional descriptive documents have been created previously for the dataset, please inform the DS, as this information may assist with the dataset metadata creation process.
- To populate the dataset metadata, the DS must enter a minimum set of required metadata fields, as highlighted in the “Metadata fields” section of the metadata manager tool. This minimum set was selected so that it maps to metadata schemas that are commonly recognized and supported by the RDA's scientific community, which positions the RDA to support long-term preservation.
- Dataset collection level metadata is maintained in a native RDA schema based on ISO representations (e.g. ISO 8601) and leverages Global Change Master Directory (GCMD) controlled vocabulary keywords.
- Tools are provided to map the native RDA metadata into community-standard schemas according to the relevant standard specifications, including: DataCite; GCMD Directory Interchange Format (DIF); Dublin Core; Federal Geographic Data Committee (FGDC); International Organization for Standardization (ISO) 19139 and ISO 19115-3; and JSON-LD Structured Data.
- Please find an example of the available standard metadata schemas provided by the RDA by reviewing the “Metadata Record” menu found at the bottom of an example dataset homepage.
- Additionally, all of the listed metadata schemas, plus the THREDDS schema, can be accessed through the RDA Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) web service.
- Dataset collection level content metadata, derived from “file level” metadata harvested during data file archival (see “About Data File Content Metadata”), is populated into the dataset collection metadata once files have been archived into the dataset collection. Content metadata is automatically updated as additional files are archived in a dataset collection over time. For an example of a summary metadata product derived from “file level” metadata, please see the “detailed metadata” summary found on an example dataset homepage.
- Any changes to RDA dataset metadata are tracked and preserved for provenance purposes as described in RDA Dataset Change Management Strategies.
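As an illustration of how metadata could be harvested over OAI-PMH, the sketch below builds request URLs from the six verbs defined by the OAI-PMH 2.0 protocol. The base URL is a placeholder, since the actual RDA endpoint is not given here, and the helper name `oai_request_url` is hypothetical. The `oai_dc` prefix is the Dublin Core format that the protocol requires every repository to support; the prefixes for the other schemas listed above would need to be discovered with a `ListMetadataFormats` request.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; not the real RDA OAI-PMH address.
BASE_URL = "https://example.org/oai"

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(verb, base_url=BASE_URL, **arguments):
    """Build an OAI-PMH GET request URL from a verb and its keyword arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Drop arguments explicitly set to None so optional parameters can be omitted.
    query.update({key: value for key, value in arguments.items() if value is not None})
    return f"{base_url}?{urlencode(query)}"


# Ask which metadata formats the repository exposes, then list Dublin Core records.
formats_url = oai_request_url("ListMetadataFormats")
records_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
```

The resulting URLs can be fetched with any HTTP client; the responses are XML documents defined by the protocol.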
- Collaboration, verification, and confirmation of agreed upon data file format adherence, completeness and understandability:
- The RDA DECS dataset specialist (DS) will scan all dataset files with the RDA's gatherxml tool to assess adherence to the agreed-upon format specification and file completeness. In addition to validating adherence to the data format and convention, the DS will run random checks on the data files by plotting sample fields to make sure the data values are physically reasonable.
- If issues are discovered with data file format adherence, completeness of data files, or the data values themselves, the DS will iteratively work with the data submitter to fix the data issues before the full dataset will be archived.
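The idea of checking that data values are physically reasonable can be sketched with a simple range test. This is not the DS's procedure, which relies on gatherxml and on plotting sample fields; it is a swapped-in, minimal stand-in. The field names and bounds in `PLAUSIBLE_RANGES` are hypothetical values chosen only for illustration, and `implausible_values` is a hypothetical helper.

```python
import math

# Hypothetical plausibility bounds (illustrative only, not RDA criteria).
PLAUSIBLE_RANGES = {
    "air_temperature_K": (150.0, 350.0),
    "sea_level_pressure_hPa": (850.0, 1100.0),
}


def implausible_values(field_name, values, ranges=PLAUSIBLE_RANGES):
    """Return (index, value) pairs that are NaN or fall outside the field's bounds."""
    low, high = ranges[field_name]
    return [
        (index, value)
        for index, value in enumerate(values)
        if math.isnan(value) or not (low <= value <= high)
    ]


# A sample of temperatures in kelvin with one obviously bad value at index 1.
flagged = implausible_values("air_temperature_K", [273.15, 9999.0, 288.0])
```

Any flagged values would then be raised with the data submitter as part of the iterative review described above.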
- Collaboration on data curation level and any related data transformations or restructuring:
- The RDA DECS dataset specialist will work with the data submitter to agree upon the appropriate curation level to be used as described in Description of RDA Dataset Collection Curation Levels. If any data restructuring or transformation is needed, it will be agreed upon during this step.
- If applicable, data transformation workflow steps will be documented by the DECS dataset specialist, and provided under the documentation tab of the dataset collection. Additional details can be found in RDA Dataset Change Management Strategies.
- Creation of dataset citation:
- The RDA supports transparent data sharing and recognition of data contributions through dataset citation. A citation is therefore created for each of the RDA's datasets and can be reconfigured to meet the following formats:
- American Geophysical Union (AGU)
- American Meteorological Society (AMS)
- DataCite
- Copernicus Publications
- Federation of Earth Science Information Partners (ESIP)
- Geoscience Data Journal
- An RDA DECS dataset specialist will work with the dataset submitter/dataset contact to confirm the information required to construct the appropriate citation (e.g. authors, title, and affiliated institutions).
- Please note that once the data files are ingested and the metadata record/dataset landing page have been completed, the RDA DECS dataset specialist will confirm the dataset information again with the dataset submitter/dataset contact before registering the dataset to acquire the official digital object identifier (DOI). This DOI will be included as part of the dataset citation.
- Confirmation of rights/terms, conditions for use/collaboration, and ownership:
- Before registering the dataset to acquire the official DOI, the RDA DECS dataset specialist will also verify with the dataset submitter/dataset contact the rights/terms, conditions for use/collaboration, and ownership information that was submitted via the Dataset Submission Form.
- Any modifications should be discussed and confirmed at this time.
- Release of public announcement of dataset:
- Once the dataset has been ingested completely with the RDA and the dataset's metadata record and the landing page have been finalized, the dataset will be announced publicly via RDA's blog site.
- The dataset submitter/dataset contact is encouraged to collaborate with the RDA DECS dataset specialist to create the dataset's public announcement.
Upon rejection:
If the dataset has been rejected for ingest with RDA, the RDA DECS team will ensure that submitted dataset information remains available for access by the Dataset Submission Form provider. Additionally, whenever possible, the RDA DECS team will assist in providing recommendations regarding alternative archive/repositories for depositing the dataset.
Additional details can be found in the Research Data Archive Dataset Ingest to Dissemination Workflow Overview.
Frequently Asked Questions:
- If my dataset has been rejected for ingest with the RDA and I do not deposit the dataset with another archive/repository, will the RDA consider my dataset at a later time?
- Depending on the original reason for not accepting the dataset, it is possible for the RDA to re-evaluate the dataset and update its previous decision. However, the RDA DECS team cannot currently guarantee re-evaluation within a specific time frame.