GATHERXML

NAME
gatherxml - scan a data file and extract metadata

SYNOPSIS
gatherxml -f format -d [ds]nnn.n [-R] [-S] [-l local_file] [-m member_name] FILE

gatherxml -showinfo

gatherxml -f format -d [ds]nnn.n -I FILE|invall

DESCRIPTION
gatherxml scans a data file and extracts the metadata necessary to enable discovery and
access. It saves the extracted metadata in a system of XML files and then calls scm,
which populates RDADB, builds disk file caches, and regenerates the dataset description.

FILE The file to scan. For an HPSS file, the path must begin with "/FS/DSS/". For web
files, specify either the full URL (e.g. https://rda.ucar.edu/data/dsnnn.n/...) or
the path relative to the dataset home directory, as for the -WF option of
dsarch.

-f {format} The data format of the file that is being scanned. Use gatherxml -showinfo to
see a list of all supported data formats.

-d [ds]{nnn.n} The dataset number, optionally prepended with "ds".

-I Inventory only. FILE must be a web file. Use this flag when the file's metadata has
already been scanned but no inventory exists for it yet.

In place of providing a specific filename, you can specify "invall" and
gatherxml will determine which files in the dataset do not have inventories
and generate all of them in a single call.
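
The argument forms described above can be checked before calling gatherxml. The
following is a minimal sketch only: classify_file and normalize_dsid are hypothetical
helper names, not part of gatherxml, and the sketch assumes that each "n" in nnn.n is
a single decimal digit and that both http and https URLs count as web URLs.

```shell
#!/bin/sh
# Hypothetical helpers, not part of gatherxml. They only mirror the argument
# rules described in this page.

# Report which form a FILE argument takes.
classify_file() {
    case "$1" in
        "")                  echo "unknown" ;;   # nothing to classify
        /FS/DSS/*)           echo "hpss" ;;      # HPSS path
        http://*|https://*)  echo "url" ;;       # full web URL (http is an assumption)
        /*)                  echo "unknown" ;;   # other absolute paths are not described here
        *)                   echo "relative" ;;  # relative to the dataset home directory
    esac
}

# Strip an optional leading "ds" and check the nnn.n pattern
# (assumption: every n is one decimal digit).
normalize_dsid() {
    id=${1#ds}
    case "$id" in
        [0-9][0-9][0-9].[0-9]) echo "$id" ;;
        *) echo "invalid dataset number: $1" >&2; return 1 ;;
    esac
}
```

For instance, classify_file applied to "/FS/DSS/DS099.0/1979/cfsinput.19790101.htar"
reports an HPSS path, and normalize_dsid accepts both "ds093.0" and "093.0".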

OPTIONS
-l {local_file} Local filename. This option is only valid for HPSS files. Use this option to
specify a local file that exactly matches the HPSS file. gatherxml then scans the
local copy instead of retrieving the file from HPSS, which can speed up the scan
because it does not have to wait for the hsi transfer to complete.

-m {member_name} HTAR member name. This option is only valid for HPSS files. Use this option
to specify the name of the specific HTAR member that you want to scan.

-R Don't regenerate the dataset web description.

-S Don't summarize the metadata (i.e., don't rebuild disk file caches) for the
dataset.
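
Because -l and -m are valid only for HPSS files, a wrapper script could reject them
early for any other kind of FILE. This is a sketch under that reading of the option
descriptions: check_hpss_only_options is a hypothetical name, not part of gatherxml,
and gatherxml's own handling of such a mismatch is not described here.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of gatherxml.
# Usage: check_hpss_only_options FILE [option ...]
# Returns 1 if -l or -m is given for a FILE that does not begin with /FS/DSS/.
check_hpss_only_options() {
    file=$1
    shift
    for opt in "$@"; do
        case "$opt" in
            -l|-m)
                case "$file" in
                    /FS/DSS/*) ;;   # HPSS file: the option is allowed
                    *)
                        echo "$opt is only valid for HPSS files: $file" >&2
                        return 1 ;;
                esac ;;
        esac
    done
    return 0
}
```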

EXAMPLES
gatherxml -f grib2 -d 093.0 \
    https://rda.ucar.edu/data/ds093.0/1979/pgbhnl.gdas.19790101-19790105.tar

- OR -
gatherxml -f grib2 -d 093.0 1979/pgbhnl.gdas.19790101-19790105.tar

Scan the file "pgbhnl.gdas.19790101-19790105.tar", which is in the directory
/glade/p/rda/data/ds093.0/1979 and which RDA users access at the URL
"https://rda.ucar.edu/data/ds093.0/1979/pgbhnl.gdas.19790101-19790105.tar".

gatherxml -f radbufr -d 099.0 -m 1979010100/1bhrs2.gdas.1979010100 \
    /FS/DSS/DS099.0/1979/cfsinput.19790101.htar

Scan the member file "1979010100/1bhrs2.gdas.1979010100", which is a member of the HTAR file
"/FS/DSS/DS099.0/1979/cfsinput.19790101.htar".

gatherxml -f grib2 -d 093.0 -R -S \
    -l /glade/p/rda/work/dattore/cfsr/1979/diabf01.gdas.19790101-19790105.tar \
    /FS/DSS/DS093.0/1979/diabf01.gdas.19790101-19790105.tar

Scan the HPSS file "/FS/DSS/DS093.0/1979/diabf01.gdas.19790101-19790105.tar". Instead of
retrieving the file from HPSS, use the copy in /glade/p/rda/work/dattore/cfsr/1979.

Don't regenerate the dataset description or summarize the metadata for the dataset. The -R and
-S options are particularly useful when backfilling file content metadata for a dataset,
because each gatherxml run then skips those steps and finishes faster. After a series of
runs that used -R and -S, you must run scm manually to summarize the dataset metadata and
regenerate the dataset description. Otherwise, the dataset information will be out of sync.
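
A backfill of this kind might be organized as in the dry-run sketch below, which only
prints the commands it would run. backfill_commands is a hypothetical helper and the
two file names are placeholders. The arguments of scm are not documented on this page,
so the sketch prints a reminder rather than an scm command line.

```shell
#!/bin/sh
# Hypothetical dry-run helper, not part of gatherxml.
# Usage: backfill_commands FORMAT DSID FILE...
# Prints one gatherxml command per file, each with -R and -S, followed by a
# reminder to run scm once after the whole series.
backfill_commands() {
    format=$1
    dsid=$2
    shift 2
    for file in "$@"; do
        printf 'gatherxml -f %s -d %s -R -S %s\n' "$format" "$dsid" "$file"
    done
    # scm's arguments are not shown on this page; consult the scm documentation.
    printf '# then run scm once for dataset %s\n' "$dsid"
}

# Placeholder file names, for illustration only.
backfill_commands grib2 093.0 1979/file_a.tar 1979/file_b.tar
```

Review the printed commands, run them, and finish with a single scm run so that the
summary and description are rebuilt once rather than once per file.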

SEE ALSO
scm, rcm, dcm, dsgen

AUTHOR
Bob Dattore (dattore@ucar.edu)