GATHERXML

NAME
gatherxml - scan a data file and extract metadata

SYNOPSIS
gatherxml -f format -d dnnnnnn [-R] [-S] [-l local_file] [-m member_name] FILE

gatherxml -showinfo

gatherxml -f format -d dnnnnnn -I FILE|invall

DESCRIPTION
gatherxml scans a data file and extracts the metadata necessary to enable discovery and
access. gatherxml saves the extracted metadata in a system of XML files and then calls scm,
which populates RDADB, builds the disk file caches, and regenerates the dataset description.

FILE The file to scan. For web files, specify either the full URL
(e.g. https://rda.ucar.edu/data/dnnnnnn/...) or the path relative to the dataset home
directory, as for the -WF option of dsarch.

-f {format} The data format of the file being scanned. Run gatherxml -showinfo to see a list
of all supported data formats.

-d d{nnnnnn} The dataset number.

-I Inventory only. FILE must be a web file. Use this flag when the metadata for the file has
already been scanned but no inventory exists for it yet.

Instead of a specific filename, you can specify "invall"; gatherxml will then determine
which files in the dataset lack inventories and generate all of them in a single call.
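As a rough sketch of the two forms of -I, the shell function below builds either call. The
function name make_inventory, the dataset d093000, and the grib2 format are illustrative
assumptions, and the sketch assumes that every file inventoried through "invall" uses the
format given with -f.

```shell
#!/bin/sh
# Hypothetical helper (not part of gatherxml) showing the two forms of -I.
# Set GATHERXML=echo to print the command instead of running it.
GATHERXML=${GATHERXML:-gatherxml}

# make_inventory DSNUM FORMAT [WEBFILE]
#   With WEBFILE: inventory one web file whose metadata was already scanned.
#   Without it:   pass "invall" so gatherxml finds every file in the dataset
#                 that lacks an inventory and generates them in one call.
#   Assumption: all files to be inventoried use FORMAT.
make_inventory() {
  "$GATHERXML" -f "$2" -d "$1" -I "${3:-invall}"
}
```

For example, "make_inventory d093000 grib2" runs "gatherxml -f grib2 -d d093000 -I invall".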

OPTIONS
-l {local_file} Local filename. This option is only valid for HPSS files. Use this option to
specify a local file that exactly matches the HPSS file. This avoids retrieving the file from
HPSS and can speed up the scan, since gatherxml does not have to wait for the hsi transfer to
complete.

-m {member_name} HTAR member name. This option is only valid for HPSS files. Use this option
to specify the name of the specific HTAR member that you want to scan.

-R Don't regenerate the dataset web description.

-S Don't summarize the metadata for the dataset (i.e. don't rebuild the disk file caches).

EXAMPLES
gatherxml -f grib2 -d d093000 \
    https://rda.ucar.edu/data/d093000/1979/pgbhnl.gdas.19790101-19790105.tar

- OR -
gatherxml -f grib2 -d d093000 1979/pgbhnl.gdas.19790101-19790105.tar

Scan the file "pgbhnl.gdas.19790101-19790105.tar", which resides in the directory
/glade/p/rda/data/d093000/1979 and which RDA users will access at the URL
"https://rda.ucar.edu/data/d093000/1979/pgbhnl.gdas.19790101-19790105.tar".

gatherxml -f radbufr -d d099000 -m 1979010100/1bhrs2.gdas.1979010100 \
    /FS/DSS/DS099.0/1979/cfsinput.19790101.htar

Scan "1979010100/1bhrs2.gdas.1979010100", which is a member of the HTAR file
"/FS/DSS/DS099.0/1979/cfsinput.19790101.htar".

gatherxml -f grib2 -d d093000 -R -S \
    -l /glade/p/rda/work/dattore/cfsr/1979/diabf01.gdas.19790101-19790105.tar \
    /FS/DSS/DS093.0/1979/diabf01.gdas.19790101-19790105.tar

Scan the HPSS file "/FS/DSS/DS093.0/1979/diabf01.gdas.19790101-19790105.tar". Instead of
retrieving the file from HPSS, use the copy in /glade/p/rda/work/dattore/cfsr/1979.

Don't regenerate the dataset description or summarize the metadata for the dataset. The -R and
-S options are particularly useful when backfilling file content metadata for a dataset,
because skipping these two steps makes each individual gatherxml run considerably faster.
However, after a series of runs that used -R and -S, you must run scm manually to summarize
the dataset metadata and regenerate the dataset description. If you don't, your dataset
information will be out of sync.
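A backfill along these lines can be scripted as a loop over the files. The sketch below
assumes a list of web file paths that all belong to one dataset and share one format; the
function name backfill_scan is hypothetical, and the final scm run is left as a comment
because its options are documented under scm, not here.

```shell
#!/bin/sh
# Hypothetical backfill helper (not part of gatherxml).
# Set GATHERXML=echo to print the commands instead of running them.
GATHERXML=${GATHERXML:-gatherxml}

# backfill_scan DSNUM FORMAT FILE...
backfill_scan() {
  dsnum=$1   # dataset number, e.g. d093000
  fmt=$2     # data format, as listed by "gatherxml -showinfo"
  shift 2
  for f in "$@"; do
    # -R and -S skip regenerating the description and rebuilding the
    # disk file caches, so each scan finishes faster
    "$GATHERXML" -f "$fmt" -d "$dsnum" -R -S "$f" || return 1
  done
  # After the loop, run scm once to summarize the dataset metadata and
  # regenerate the dataset description (see scm for its options);
  # otherwise the dataset information will be out of sync.
}
```

Stopping at the first failed scan (the "|| return 1") is a design choice of this sketch: it
makes it harder to run scm over a partial backfill without noticing.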

SEE ALSO
scm, rcm, dcm, dsgen

AUTHOR
Bob Dattore (dattore@ucar.edu)