RDA Dataset Maintenance

Creating a new dataset

  1. Use the Metadata Manager to add a new dataset. The Metadata Manager is linked from the top-level page of the DECS Internal web, as well as under the "Tools" tab in the DECS Internal web. When you add a new dataset, you will reserve the new dataset number and create a shell metadata profile, which you can come back and complete later. A dataset home page will NOT be created at this point.
  2. Archive the data files using dsarch and generate content metadata (see About Data File Content Metadata below) at the same time. See the dsarch documentation for complete information on dsarch usage. Run "gatherxml -showinfo" at the command line to see a list of all file-content-metadata-generating utilities and the supported formats. If your dataset has an unsupported format and there are many files and/or the dataset will be continually updated, please ask Bob to add support for the format.
  3. Go back to the Metadata Manager and fill in the dataset metadata (see this guide for help with the Metadata Manager). If you have generated content metadata for your data files, you will be able to specify the GCMD science keywords in one step from the content metadata rather than manually, and several other fields (dataset period, data format, etc.) will not even show because they will have already been filled for you from the content metadata.

Maintaining an existing dataset

  • You can update the dataset-level metadata at any time with the Metadata Manager. You can enter from the Metadata Manager home page or you can go to the internal dataset web page and click the "Edit" link in the upper right. Next to the edit link, you will also see a "History" link that will allow you to view the history of dataset metadata changes.
    • Content metadata can be generated, renamed or deleted for a data file at any time. It is most efficient to generate content metadata at the time of archiving, and dsarch will automatically call the appropriate content metadata utility for the action being performed on the data file.
      • For manual operation of the utilities, see About Data File Content Metadata below.
      • To see a content metadata summary for a dataset, go to the Metadata Manager home page, and on the right, select the dataset from the "View a Metadata Summary for Dataset" pull-down menu.

Data file lists

There are currently two types of data file lists, static and dynamic, and both are generated for MSS and Web files.

  • Static list:
    The static list is a specialist-controlled file list generated from the information in the RDADB. It is called a "static" list because it is presented in only one way. A user can't interact with the list and change how the information is presented. You can manage the static list information with dsarch.
  • Dynamic list:
    The dynamic list is a user-controlled file list generated mainly from data file content metadata, but also from some other information in the RDADB. It is called a "dynamic" list because depending on the selections a user makes, the file list will be presented in different ways. Other than using the -GX option in dsarch or generating data file content metadata manually, the group information for the dataset is the only information you can actively manage for the dynamic file list.

Why are there two types of lists?

There are two types of lists because currently not all datasets have content metadata for their data files, and some datasets may never have content metadata. For datasets that do have content metadata, both the static and dynamic lists are presented as options to the user. The reason for this is that when the dynamic list was first developed, it only displayed the data files for which content metadata had been generated. For some datasets, not all data files have content metadata, and so the dynamic list was "incomplete". The dynamic list now includes all of the data files for a dataset, but if a user "customizes" the dynamic list, the resulting customized list will only show data files that have content metadata, since content metadata are the basis on which the lists can be customized.

How are the lists generated?

  • The Static list is generated by calling publish_filelist. Run the utility with no arguments to get usage information.
  • The Dynamic list is generated automatically when you generate content metadata for a data file. So if you have a dataset with 100 files and you only generate content metadata for one of those files, you will still get a dynamic list. This list will contain ALL of the MSS primary (type P) or Web data (type D) files for the dataset, unless the user customizes the list.
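
The difference between the full and the customized dynamic list can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the code that builds the real list: the DataFile record, its field names, and the type code "X" used in the example are hypothetical, and a real customized list would also apply the user's selections.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    # Hypothetical record; the real information lives in RDADB.
    name: str
    location: str               # "MSS" or "Web"
    file_type: str              # "P" = MSS primary, "D" = Web data
    has_content_metadata: bool


# File type shown in the dynamic list for each location, as described above.
LISTED_TYPE = {"MSS": "P", "Web": "D"}


def dynamic_list(files, location, customized=False):
    """Return the names of the files that would appear in the dynamic list."""
    wanted = LISTED_TYPE[location]
    listed = [f for f in files
              if f.location == location and f.file_type == wanted]
    if customized:
        # Customization is driven by content metadata, so files without
        # content metadata drop out of a customized list.
        listed = [f for f in listed if f.has_content_metadata]
    return [f.name for f in listed]
```

With one MSS primary file that has content metadata and one that does not, the uncustomized call returns both names, while customized=True returns only the first.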

About data file content metadata

  • Data file content metadata are used in many ways:
    • to auto-fill some of the dataset-level metadata fields (data period, GCMD science keywords, geospatial coverage, etc.) that otherwise must be manually entered by the specialist
    • to create the dynamic file list
    • to display detailed information about the contents of the data file
    • to create the "Detailed Metadata" view for a dataset (if available, this will appear as a section on the dataset home page)
  • Generating content metadata:
    • gatherxml generates file content metadata, and it is called automatically if you include the -GX option in dsarch (it is most efficient to generate the content metadata at the time of archiving). If you need to run gatherxml manually, run it with no arguments to get usage information.
  • Deleting content metadata:
    • dcm deletes file content metadata. By default, dsarch runs dcm when you delete a file with the -DL option. If you need to run dcm manually, run it with no arguments to get usage information. Note: Because of the metadata summarization that must be done after content metadata are deleted, it is more efficient (i.e., faster) to delete content metadata for multiple data files in one call to dcm than to call dcm once for each data file.
  • Renaming content metadata:
    • rcm renames content metadata when you rename a data file or move it to another dataset. By default, dsarch runs rcm when you move a file with the -MV option. If you need to run rcm manually, run it with no arguments to get usage information.
  • Updating the content metadata database:
    • scm inserts content metadata information into RDADB. It is normally called by one of the above-listed utilities, so you will not usually need to call it yourself. However, you will sometimes need to call scm manually, particularly after changing dataset information with dsarch, IF the dataset has one or more files for which content metadata have been generated:
      • you move data files into or out of groups using a non-archive dsarch action (an action other than -AM, -AW, -AB, -DL, -MV)
      • you archive a data file without the -GX option (meaning you don't want to or can't generate content metadata for the file)
    • In any of the above cases, call scm as follows:
      • to refresh information about MSS files:
        • scm -d nnn.n -rm gindex | all  
      • to refresh information about web files:
        • scm -d nnn.n -rw gindex | all
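
If you script these refreshes, a small helper can assemble the scm command line from the syntax shown above. This is a minimal sketch under stated assumptions: it reads "gindex | all" as "either a group index or the literal word all", it assumes a group index is a non-negative integer, and the function name and the dataset number 999.9 are placeholders. It only builds the argument list and does not run scm.

```python
import re

# Dataset numbers are written "nnn.n" in the syntax above.
_DATASET_RE = re.compile(r"\d{3}\.\d")

# -rm refreshes MSS file information, -rw refreshes Web file information.
_REFRESH_FLAG = {"mss": "-rm", "web": "-rw"}


def build_scm_refresh(dataset, file_kind, group="all"):
    """Return the argv list for an scm refresh call (hypothetical helper)."""
    if not _DATASET_RE.fullmatch(dataset):
        raise ValueError(f"dataset number must look like nnn.n, got {dataset!r}")
    if file_kind not in _REFRESH_FLAG:
        raise ValueError("file_kind must be 'mss' or 'web'")
    if group != "all":
        # Assumption: a group index is a non-negative integer.
        if isinstance(group, bool) or not isinstance(group, int) or group < 0:
            raise ValueError("group must be 'all' or a non-negative integer")
    return ["scm", "-d", dataset, _REFRESH_FLAG[file_kind], str(group)]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(build_scm_refresh("999.9", "mss")))
```

On a host where scm is installed, the returned list can be passed to subprocess.run(argv, check=True).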