The data starts here 2

Feb. 24, 2016
Posted by: RDA Team

Note: This page was originally sourced from our Blogger page: http://ncarrda.blogspot.com/2016/01/the-data-starts-here-2.html

Professor Catherine D'Ignazio asked, "What would feminist data visualization look like?" I had never thought about data visualization and feminism together before, but her essay at the MIT Center for Civic Media is well worth a read. So are all the essays from the Responsible Data Forum's recent event on data visualization.

Her concept of feminist data visualization is just plain sound data visualization (dataviz).
  1. Invent new ways to represent uncertainty, outsides, missing data, and flawed methods.
  2. Invent new ways to reference the material economy behind the data.
  3. Make dissent possible.

How this applies at the RDA

  1. RDA data set home pages give as much information as possible about the provenance of the data, including temporal and spatial coverage (or the lack of it).
  2. We provide links to details about the data. When possible, we reference the relationships between data. For example, the "Related Resources" section of analysis and reanalysis home pages contains links to information about how the data was produced. Likewise, the "Related RDA Datasets" section links to the data incorporated into them.

    Whenever I encounter physical artifacts that produce data of the type we serve at the RDA, I take a picture. (See "The data starts here.")

    In this video, I launch a weather balloon in California's Central Valley (south of Fresno), timed to match a satellite overpass, while my husband, a spectroscopist, flies along the satellite path in a Twin Otter airplane with his spectrometer (for detecting concentrations of trace gases).

    Vertical profiles of temperature (T) and humidity (Q), along with detailed spectral emissions data for the trace gases, are needed as inputs to radiative transfer models in order to generate the trace gas "retrievals" for the spectrometers on both the plane and the satellite. This reliance of one data product on another is called a data dependency.

    Ordinarily, satellite retrievals take their first-guess T and Q from an analysis. For satellite calibration, we launched a weather balloon to provide "ground truth". We can then perform the retrieval twice, once with the balloon data and once with the analysis data, to get a better estimate of the error contributed by using an analysis.

    Collecting field data is difficult and expensive. We were a popular pair of field scientists because we provided two scientists for the price of one hotel room. ;-)

    I launched an earlier balloon that morning, timed to coincide with an IASI satellite overpass. IASI data is one of the new sources added in the recent update to ds735.0.

    Most weather balloons are launched at the synoptic times of 00 and 12 UTC. If you calibrate only to synoptic balloons, you could introduce a bias by tuning your retrieval to the specifics of the regions your satellite covers at 00 and 12 UTC. Asynoptic ground truth, collected at other times, is data gold.

  3. In her essay, D'Ignazio explains that making dissent possible means developing ways to lower the technological barrier for people who want to see the data worked up in another way. Dataviz requires making decisions about what to show and what to hide. As scientists, we must be vigilant to root out the hidden bias lurking in the data, or in ourselves.

    One good way to do that is to have many people, from diverse backgrounds, examine the data. The RDA provides data tutorials because we want to open up data to new audiences with fresh perspectives.

    D'Ignazio says that we also need to figure out how to help people who aren't proficient in dataviz programming suggest other ways of looking at the data to dataviz specialists.
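
For readers who want to play with the balloon ideas from item 2, here is a minimal Python sketch, not the procedure we used in the field. It does two things: it flags a launch as asynoptic when it falls more than some tolerance away from 00 or 12 UTC, and it computes the mean difference (bias) and root-mean-square difference between an analysis profile and a balloon profile on the pressure levels they share. The function names, the one-hour tolerance, the dictionary layout, and every number in the example are assumptions made up for illustration. Comparing the input profiles is only a first look; it does not replace running the retrieval with each input.

```python
from datetime import datetime, timezone
from math import sqrt

SYNOPTIC_HOURS_UTC = (0, 12)  # the standard launch times named in the post


def hours_from_synoptic(launch: datetime) -> float:
    """Hours between a timezone-aware launch time and the nearest 00 or 12 UTC."""
    t = launch.astimezone(timezone.utc)
    hour = t.hour + t.minute / 60 + t.second / 3600
    # Include 24 so that 23:30 UTC counts as 0.5 h away from 00 UTC.
    return min(abs(hour - h) for h in (*SYNOPTIC_HOURS_UTC, 24))


def is_asynoptic(launch: datetime, tolerance_hours: float = 1.0) -> bool:
    """True when the launch is outside the tolerance window (1 h is an assumption)."""
    return hours_from_synoptic(launch) > tolerance_hours


def profile_difference(balloon: dict, analysis: dict) -> tuple:
    """Return (bias, rmse) of analysis minus balloon on shared pressure levels.

    Both arguments map a pressure level in hPa to a value such as T in kelvin.
    """
    levels = sorted(set(balloon) & set(analysis))
    if not levels:
        raise ValueError("profiles share no pressure levels")
    diffs = [analysis[p] - balloon[p] for p in levels]
    bias = sum(diffs) / len(diffs)
    rmse = sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Made-up values for illustration only; these are not measurements from the flight.
launch = datetime(2016, 1, 15, 21, 30, tzinfo=timezone.utc)
balloon_t = {1000: 288.0, 850: 280.0, 500: 255.0}
analysis_t = {1000: 289.0, 850: 281.0, 500: 254.0}

print("asynoptic launch:", is_asynoptic(launch))
print("bias, rmse (K):", profile_difference(balloon_t, analysis_t))
```

A real comparison would first interpolate both profiles onto common levels; the sketch simply keeps the levels that appear in both.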