Integrating NCAR’s data infrastructure with the OSDF

Project Website

Project Overview

Data-intensive research, including data analytics, machine learning,
and data assimilation, continues to drive innovation and discovery
across the geosciences. An obstacle to scientific discovery is that
critical research datasets are distributed and stored across many
disparate locations, making it challenging for researchers to
access data outside of their home environment and investigate
cross-disciplinary relationships such as those explored at NCAR and NEON.
The Open Science Data Federation (OSDF,
https://osg-htc.org/services/osdf.html) is working to overcome this
challenge by providing a unified view of datasets stored across
autonomous facilities, integrated with the high-throughput
computational resources of the Open Science Pool (OSPool,
https://osg-htc.org/services/open_science_pool.html). We propose to
integrate NCAR’s curated research data collections with the OSDF by
acquiring, operating, and maintaining OSDF Origin and OSDF Cache nodes
and by providing research, consulting, community engagement, and
training services to: 1) broaden community access to NCAR’s
model-generated datasets (climate projections and historical
reanalysis) and observing-facility datasets on NSF national
cyberinfrastructure resources; 2) explore, develop, and publish example
workflows that leverage OSDF/OSPool resources to support investigation
of reference research use cases and identify future needs in the
OSDF/OSPool infrastructure; and 3) engage and train researchers on how
their research workflows can leverage the capabilities of the OSDF,
including how they can develop and run workflows on OSPool resources
and share their personal datasets through the OSDF for reuse by others.

Example research use cases

-We have developed an example research use case that performs
bias correction of daily temperatures and documented it on GitHub.
This use case demonstrates the ingestion of data from both the NCAR
origin and the AWS OpenData origin:
https://github.com/NCAR/osdf_examples
-Datasets from the NCAR origin are now accessible:
--All NCAR datasets are accessible from NCAR's OSDF origin under
https://osdf-director.osg-htc.org/ncar/rda/<dnnnnnn>, where <dnnnnnn>
is a placeholder for the dataset's unique identifier
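
The URL pattern above can be sketched as a small helper. This is a minimal illustration, not an official client: the function name `ncar_osdf_url`, the assumption that an identifier is the letter "d" followed by six digits (read off the <dnnnnnn> placeholder), and the example identifier `d000000` are all hypothetical. Only the base URL comes from the text above.

```python
import re

# Base URL taken from the text above; everything else is illustrative.
NCAR_RDA_BASE = "https://osdf-director.osg-htc.org/ncar/rda"

# ASSUMPTION: identifiers look like "d" + six digits, per the <dnnnnnn> placeholder.
_DATASET_ID = re.compile(r"d\d{6}")


def ncar_osdf_url(dataset_id: str, relative_path: str = "") -> str:
    """Build the OSDF URL for an NCAR dataset (hypothetical helper)."""
    if not _DATASET_ID.fullmatch(dataset_id):
        raise ValueError(f"expected an identifier like d000000, got {dataset_id!r}")
    url = f"{NCAR_RDA_BASE}/{dataset_id}"
    if relative_path:
        # Avoid a doubled slash when the caller passes a leading "/".
        url = f"{url}/{relative_path.lstrip('/')}"
    return url


if __name__ == "__main__":
    # "d000000" is a made-up identifier used only to show the URL shape.
    print(ncar_osdf_url("d000000"))
```

The optional `relative_path` argument assumes that files sit below the dataset directory. The actual layout under each dataset is not described here and should be checked against the dataset itself.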

Future Plans

We plan to develop additional example research use cases and will
host a hackathon in the summer of 2025, as part of the Pythia
cookoff series, to generate community-developed example use cases
that leverage OSDF resources.