Instituting an observation database capability in the NCEP GSI
Tom Hamill, Jeff Whitaker, Scott Gregory
NOAA / ESRL Physical Sciences Division
Presentation to DAOS, Exeter, England, April 2016
Our intent: an observation database (ODB) capability for the GSI. The file(s) would contain: observations; metadata (time stamp, QC flags, lat, lon, observation error assigned); O minus F (including spread if ensemble DA is used); O minus A; bias correction information; and usage flags. Such files would be easily shareable and in user-friendly formats for external use.
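As an illustration of the record contents listed above, here is a minimal sketch of one observation with its assimilation feedback. The class name, field names, and types are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ObsRecord:
    """One observation plus assimilation feedback (all names hypothetical)."""
    value: float                            # the observation itself (O)
    time: datetime                          # metadata: time stamp
    lat: float
    lon: float
    qc_flag: int
    obs_error: float                        # observation error assigned
    o_minus_f: Optional[float] = None       # None until the DA system has seen the ob
    ens_spread: Optional[float] = None      # only meaningful with ensemble DA
    o_minus_a: Optional[float] = None
    bias_correction: Optional[float] = None
    used: Optional[bool] = None             # usage flag

    @property
    def background(self) -> Optional[float]:
        """Background equivalent F = O - (O - F), if the departure is known."""
        if self.o_minus_f is None:
            return None
        return self.value - self.o_minus_f


rec = ObsRecord(value=285.0, time=datetime(2016, 4, 1, tzinfo=timezone.utc),
                lat=50.7, lon=-3.5, qc_flag=0, obs_error=1.0, o_minus_f=1.5)
raw = ObsRecord(value=285.0, time=rec.time, lat=50.7, lon=-3.5,
                qc_flag=0, obs_error=1.0)
```

Leaving the feedback fields optional lets the same record type describe an observation both before and after it has passed through the assimilation.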
Current NCEP observations data flow (Dennis Keyser, from the 2013 DTC GSI tutorial). Bottom line: it's complicated.
Vision for how this might work in JEDAI (GSI code refactor)
[Flow diagram; components in the order shown]
GTS (and other sources)
Observation Processing: read obs (BUFR, HDF, netCDF, ASCII); set R; basic QC.
Observation Database Manager: relational database (like ODB) and/or hierarchical dataset (HDF5, netCDF); replaces BUFR tanks and dump files. Data exchanged with the Observer: O, R, metadata; O minus F, O minus A, QC.
Observer: read background forecast(s); data thinning/channel selection; interpolation; forward operator; background check QC; compute O minus F, O minus A.
Solver (hybrid EnVar, EnKF)
Our project: global reanalysis, ~2000 to present. How do we add an ODB-like capability for this? Several options: (1) Adapt to the current GSI a branch of code from the NCEP CFS reanalysis (> 5 years old) that provides this. The software is old, and the display infrastructure (GrADS-based) is old. (2) Totally refactor the GSI and include ODB (see upcoming slide). (3) Patch the existing GSI system code to produce the observation database capability we need for this project (see upcoming slide), being mindful of the longer-term refactor and making the code useful for it.
Patching the GSI to provide ODB capability while we await the GSI refactor. This can be done in the short term for the next GEFS reanalysis/reforecast without changing the NCEP operational workflow. It enables easier access to observations and assimilation feedback information for monitoring, diagnosis, and research. It amounts to pre-processing the input BUFR and post-processing the output GSI diagnostic files.
GSI data assimilation system (the current system)
Input: BUFR (prepbufr, radiance bufr, satwnd bufr, gpsro bufr).
Output: analyses; GSI diagnostic files (containing obs ingested by GSI, departures, QC, bias correction info).
GSI data assimilation system with proposed modifications
Input: BUFR (prepbufr, radiance bufr, satwnd bufr, gpsro bufr).
Output: analyses; GSI diagnostic files (containing obs ingested by GSI, departures, QC, bias correction info).
Python utilities: (1) bufr2nc (outputs BUFR as a netCDF file); (2) merge_gsidiag_bufr2nc (matches records in the GSI diagnostic file with the original BUFR; adds departures, ensemble spread in ob space, QC, other DA info); (3) new_obs_2nc (various codes tailored to particular research data sets). nc2prepbufr (converts back to GSI-compatible BUFR).
Observation database.
Other observation sources (field experiments, research instruments, etc.).
OpenDAP data server, access to the research community.
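The record-matching step attributed to merge_gsidiag_bufr2nc can be sketched as a keyed lookup. This is only an illustration under stated assumptions, not the utility's actual code: the function names, the dictionary-based records, the key fields (station_id, obs_type, time, lat, lon, pressure), and the rounding tolerances are all hypothetical.

```python
# Hypothetical names for the feedback fields copied from the diagnostic file.
FEEDBACK_FIELDS = ("o_minus_f", "o_minus_a", "ens_spread", "qc_flag", "used")


def match_key(rec, ndigits=2):
    """Build a lookup key from fields assumed to be present in both files."""
    return (rec["station_id"], rec["obs_type"], rec["time"],
            round(rec["lat"], ndigits), round(rec["lon"], ndigits),
            round(rec["pressure"], 1))


def merge_feedback(bufr_records, diag_records):
    """Attach diagnostic-file feedback to each original BUFR record.

    Records with no match in the diagnostic file are kept, with feedback
    fields set to None and in_diag set to False.
    """
    diag_by_key = {match_key(d): d for d in diag_records}
    merged = []
    for rec in bufr_records:
        out = dict(rec)  # copy so the input records are left untouched
        diag = diag_by_key.get(match_key(rec))
        for name in FEEDBACK_FIELDS:
            out[name] = diag.get(name) if diag is not None else None
        out["in_diag"] = diag is not None
        merged.append(out)
    return merged


bufr = [
    {"station_id": "03808", "obs_type": "t", "time": "2016040100",
     "lat": 50.22, "lon": -5.33, "pressure": 850.0, "value": 275.3},
    {"station_id": "03808", "obs_type": "t", "time": "2016040100",
     "lat": 50.22, "lon": -5.33, "pressure": 500.0, "value": 251.1},
]
diag = [
    {"station_id": "03808", "obs_type": "t", "time": "2016040100",
     "lat": 50.22, "lon": -5.33, "pressure": 850.0,
     "o_minus_f": 0.4, "o_minus_a": 0.1, "ens_spread": 0.6,
     "qc_flag": 0, "used": True},
]
merged = merge_feedback(bufr, diag)
```

Keeping unmatched observations makes it possible to see which reports never reached the assimilation. Rounding coordinates is a crude tolerance that can fail for values near a rounding boundary, so a real matcher would likely need something more careful.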
Conclusions. We're building an interim ODB capability into the GSI while we await a more complete JEDAI code refactor. We're at the early stages and are happy to learn from the experiences of others. The current design is necessitated by a reanalysis project where we must deliver quickly, but we'd still like to make as much of our software reusable as possible.
Fit of short-term forecasts to observations can be revealing of DA system characteristics. From Dee et al. (2011), http://onlinelibrary.wiley.com/doi/10.1002/qj.828/pdf
Fit of other quantities (here, bias corrections) can also be revealing. From Dee et al. (2011), http://onlinelibrary.wiley.com/doi/10.1002/qj.828/pdf
Fit of observations to longer-lead forecasts can also be revealing. With stochastic physics and verification against the analysis, the ECMWF ensemble appears to be underspread (leftmost column). Verifying against raobs or AMSU-A channel 5, one has the impression that the ensemble is still underspread. Yamaguchi et al. (2016), http://onlinelibrary.wiley.com/doi/10.1002/qj.2675/abstract
ncdump -h of a sample netCDF file with BUFR data
Sample Python code for reading in a netCDF ODB-type file and generating basic statistics.
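The code shown on this slide is not reproduced in the text, so the following is a stand-in sketch rather than the authors' script. It assumes a file with one-dimensional variables named o_minus_f and used (both names hypothetical) and computes the count, mean, and RMS of the departures, grouped by usage flag. Reading requires the netCDF4 package; the statistics themselves use only the standard library.

```python
import math
from collections import defaultdict


def read_columns(path, names=("o_minus_f", "used")):
    """Read 1-D variables from a netCDF file into plain Python lists.

    Variable names are assumptions; masked (missing) entries become None.
    """
    from netCDF4 import Dataset  # lazy import: only needed when reading a file
    with Dataset(path) as nc:
        return {name: nc.variables[name][:].tolist() for name in names}


def departure_stats(departures, used):
    """Count, mean, and RMS of departures, grouped by usage flag."""
    groups = defaultdict(list)
    for d, u in zip(departures, used):
        if d is None or math.isnan(d):
            continue  # skip missing departures
        groups[bool(u)].append(d)
    stats = {}
    for flag, vals in groups.items():
        n = len(vals)
        stats[flag] = {
            "count": n,
            "mean": sum(vals) / n,
            "rms": math.sqrt(sum(v * v for v in vals) / n),
        }
    return stats


# Synthetic values standing in for columns read from a file.
stats = departure_stats([1.0, -1.0, 3.0, None, 5.0], [1, 1, 0, 1, 0])
```

With a real file, the two steps would be chained as `cols = read_columns("sample.nc")` followed by `departure_stats(cols["o_minus_f"], cols["used"])`, where the file name is again a placeholder.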
Demonstration plot