NCEP Model Performance

March 7, 2016
Posted by: RDA Team

Note: This page was originally sourced from our Blogger page:

Users of gridded analysis or forecast data sets may wonder how well the analyses reflect the measurements that were assimilated into them. Furthermore, how do the forecasts compare to reality?

Welcome to the world of verification statistics.

If you took all the radiosondes that were ingested into an analysis like GDAS/FNL, you could compute the mean difference (bias) and the root mean squared error (RMSE) between the measurements and the analysis for the exact same time. The overall goal is to minimize the biases globally, while allowing small biases at individual stations.
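As a minimal sketch of those two statistics, the Python below computes bias and RMSE for a handful of paired values. The temperatures are made up for illustration, not real radiosonde or GDAS/FNL data, and the sign convention (analysis minus observation, so a positive bias means the analysis is too warm) is an assumption rather than something NCEP's site is confirmed to use.

```python
import math


def bias_and_rmse(analysis, observed):
    """Return (bias, rmse) for paired analysis and observed values.

    Assumed convention: difference = analysis - observed.
    """
    if len(analysis) != len(observed) or not analysis:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [a - o for a, o in zip(analysis, observed)]
    # Bias is the mean difference; errors of opposite sign cancel.
    bias = sum(diffs) / len(diffs)
    # RMSE squares each difference first, so errors cannot cancel.
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Hypothetical temperatures (K) at four stations, same level and time.
analysis_t = [251.0, 262.5, 270.0, 281.5]
observed_t = [250.0, 263.0, 269.0, 280.0]

bias, rmse = bias_and_rmse(analysis_t, observed_t)
print(f"bias = {bias:.2f} K, RMSE = {rmse:.2f} K")
```

Because the bias lets positive and negative differences cancel while the RMSE does not, a data set can have a near-zero global bias and still show a sizable RMSE.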

This NCEP EMC site allows you to view some useful statistics for each analysis cycle (00Z, 06Z, 12Z, 18Z). For instance, if you compare the analyses and forecasts from the 18Z analysis/forecast cycle against all upper air measurements, you see a slight warm bias in the troposphere and a slight cool bias in the stratosphere at Forecast Hour 0 (analysis time).
18Z analysis cycle GFS temperature bias for forecast hours 0-168, compared to conventional upper air soundings. Operational GFS on the left and experimental GFS on the right.
Notice that the fit is not perfect. The operational GFS model is shown on the left; an experimental version (GFSX) is shown on the right. GFSX appears to be a slight improvement.

Let's look at the Root Mean Squared Error (RMSE).  Are you amazed that we can forecast the global temperature to within 2.5 degrees 5 days ahead?  Or are you young enough to take that for granted?
18Z analysis cycle GFS temperature RMSE for forecast hours 0-168, compared to conventional upper air soundings. The RMSE of GFSX (right) is smaller than that of GFS (left).
Again, GFSX appears to be an improvement over the current operational GFS model. After monitoring both, NCEP EMC scientists may decide to implement GFSX as the new operational GFS.

Tweaks like this are common, as I explained in "Analysis, forecast, reanalysis--what's the difference?" If consistent processing is important for your work, always use a reanalysis.

Here's a vertical cross-section of the same verification data, at forecast hour 48. The web site does not offer a 0-hour graph, but the first plot shows that the 48-hour forecast errors are slightly larger than those of the analysis.
Bias between upper air stations and the 48-hour GFS forecast.
The NCEP EMC Mesoscale Verification site offers further insight into GFS vs GDAS/FNL. If you read "What's the difference between GFS and FNL?", you may recall that the GDAS/FNL analysis takes place several hours later than GFS, so that it can incorporate more observations. By the time GDAS/FNL is ready, the 12-hour GFS forecast representing the same time should be ready.

The 500 mb height, the height of the pressure level that has roughly half of the atmosphere's mass below it, gives you an indication of temperatures and major atmospheric features such as highs and lows. GDAS/FNL shows slightly sharper features than GFS, but notice how well they agree with each other overall.
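A rough back-of-the-envelope check of why 500 mb is treated as the halfway level, sketched in Python. It assumes hydrostatic balance (pressure at a level is proportional to the mass of air above it) and the standard-atmosphere constants listed in the comments. The function names are illustrative only, and the real 500 mb height varies from day to day and place to place, which is exactly what makes it a useful map.

```python
# Standard-atmosphere constants (assumed values, not taken from GFS or FNL).
P0_MB = 1013.25      # mean sea-level pressure, mb (hPa)
T0_K = 288.15        # sea-level temperature, K
LAPSE_K_PER_M = 0.0065   # tropospheric lapse rate, K/m
G_M_S2 = 9.80665     # gravitational acceleration, m/s^2
R_DRY = 287.053      # gas constant for dry air, J/(kg K)


def mass_fraction_below(p_mb):
    """Fraction of the atmosphere's mass below pressure level p_mb."""
    # Hydrostatic balance: p / P0 is the fraction of mass still above.
    return 1.0 - p_mb / P0_MB


def standard_height_m(p_mb):
    """Height of a pressure level in the standard troposphere, in metres."""
    exponent = R_DRY * LAPSE_K_PER_M / G_M_S2
    return (T0_K / LAPSE_K_PER_M) * (1.0 - (p_mb / P0_MB) ** exponent)


frac_500 = mass_fraction_below(500.0)
height_500 = standard_height_m(500.0)
print(f"mass below 500 mb: {frac_500:.1%}")
print(f"standard 500 mb height: {height_500:.0f} m")
```

Under these assumptions, slightly more than half of the atmosphere's mass sits below 500 mb, and the level lies at roughly 5.5 km. Colder air columns are denser and lower that height, which is why the 500 mb height field also tracks temperature.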

I hope that, in studying these statistics, you agree with me that numerical weather prediction (NWP) is a major triumph of human ingenuity and cooperation.