Gridpoint Statistical Interpolation (GSI) Version 3.0 User's Guide


Gridpoint Statistical Interpolation (GSI) Version 3.0 User's Guide

Developmental Testbed Center

National Center for Atmospheric Research
National Centers for Environmental Prediction, NOAA
Global Systems Division, Earth System Research Laboratory, NOAA

April 2011

Foreword

This User's Guide describes the Gridpoint Statistical Interpolation (GSI) Version 3.0 data assimilation system, released on April 29, 2011. This version of GSI is based on the GSI repository from February 2011. As the GSI is further developed, this document will be enhanced and updated to match the released version.

The following lists some of the new functions and changes included in release version 3.0 of GSI:

- Improved use of satellite data
- SST updating and direct assimilation of radiances for SST
- Updated observation error statistics and background error statistics
- Inclusion of ASCAT data
- GOES wind thinning, with improved QC and observation error specification for all satellite winds
- Updated ozone assimilation
- Assimilation of tropical storms with pseudo sea-level pressure observations
- Features for hybrid ensemble/variational data assimilation
- Flexible bundle for setting the state and control vectors
- Inclusion of the GSD cloud analysis scheme
- Inclusion of PM2.5 data analysis with a CMAQ background
- Fixes and improvements to GPS bending angle assimilation

Please note that, due to the version update, some diagnostic files and static information ("fixed") files have changed as well. For the latest version of this document, please visit the GSI users' website. Please send questions and comments to gsi_help@ucar.edu.

Contributors to this guide:
DTC: Ming Hu, Hui Shao, Donald Stark, Kathryn Newman, Chunhua Zhou, Xiang-Yu Huang
NOAA/NCEP/EMC: John Derber, Russ Treadon, Mike Lueken, Wan-Shu Wu
NCAR: Syed Rizvi, Zhiquan Liu, Tom Auligne, Arthur Mizzi
NOAA/ESRL/GSD: Steve Weygandt, Dezso Devenyi, Joseph Olson

Acknowledgements: We thank the U.S. Air Force Weather Agency and the National Oceanic and Atmospheric Administration for their support of this work.

Table of Contents

Chapter 1: Overview
Chapter 2: Software Installation
  2.1 Introduction
  2.2 Obtaining the Source Code
  2.3 System Requirements and External Libraries
    2.3.1 Compilers
    2.3.2 NetCDF and MPI
    2.3.3 LAPACK and BLAS
  2.4 Supplemental Libraries
  2.5 Compiling GSI
    2.5.1 Environment Variables
    2.5.2 Configure and Compile
  2.6 Getting Help and Reporting Problems
  2.7 Porting the GSI to a New System
Chapter 3: Running GSI
  Input Files
  GSI Run Script
    Steps in the GSI run script
    Customization of the GSI run script
      Setting up the machine environment
      Setting up the running environment
      Setting up an analysis case
    Description of the sample script to run GSI
  Introduction to Most Often Used GSI Namelist Options
  Files in GSI Run Directory
Chapter 4: GSI Diagnostics and Tuning
  Understanding Standard Output (stdout)
  Single Observation Test
    Setup a single observation test
    Examples of single observation tests for GSI
  Control Data Usage
  Domain Partition for Parallelization and Observation Distribution
  Observation Innovation Statistics
    Conventional observations
    Satellite radiance
  Convergence Information
  Analysis Increments
  Running Time and Memory Usage
Chapter 5: GSI Applications
  5.1: Assimilating Conventional Observations with GSI
    5.1.1: Run script
    5.1.2: Run GSI and check the run status

    5.1.3: Check for successful GSI completion
    5.1.4: Diagnose GSI analysis results
    5.1.5: Check analysis fit to observations
    5.1.6: Check the minimization
    5.1.7: Check analysis increment
  5.2: Assimilating Radiance Data with GSI
    5.2.1: Run script
    5.2.2: Run GSI and check run status
    5.2.3: Diagnose GSI analysis results
    5.2.4: Check file fort
    5.2.5: Check analysis increment
  5.3: Assimilating GPS Radio Occultation Data with GSI
    5.3.1: Run script
    5.3.2: Run GSI and check the run status
    5.3.3: Diagnose GSI analysis results
    5.3.4: Check file fort
    5.3.5: Check analysis increment
  Summary
Chapter 6: GSI Theory and Code Structure
  GSI Theory
    3DVAR equations used by GSI
    Iterations to find the optimal results
    Analysis variables
  GSI Code Structure
    Main process
    GSI background IO (for 3DVAR)
    Observation ingestion
    Observation innovation calculation
    Inner iteration
Chapter 7: Observation and Background Error Statistics
  Conventional Observation Errors
    Getting original observation errors
    Observation error adjustment and gross error check within GSI
  Background Error Covariance
    Processing of background error statistics
    Apply background error covariance
  Bias Correction for Satellite Radiance Observation
Chapter 8: BUFR and PrepBUFR
  BUFR File Process (encode, decode, and append)
    Decoding/reading data from a simple BUFR file
    Encoding/writing data into a simple BUFR file
    Appending data to a simple BUFR file
  Examples for GSI PrepBUFR file processing
    Practice with examples
    Discussion of PrepBUFR quality markers
    Further information
  NCEP Operational BUFR/PrepBUFR Files
  Data Resources for Community Users
  Extended BUFR/PrepBUFR support from DTC

Appendix A: GSI tools
  A.1 BUFR Tools and SSRC
  A.2 Read GSI Diagnostic Files
  A.3 Read and Plot Convergence Information from fort
  A.4 Plot Single Observation Test Result and Analysis Increment
Appendix B: GSI Namelist

Chapter 1: Overview

The Gridpoint Statistical Interpolation (GSI) system is a unified variational data assimilation (DA) system for both global and regional applications. It was initially developed by the National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP) as a next-generation analysis system based on the then-operational Spectral Statistical Interpolation (SSI) analysis system. Instead of being constructed in spectral space like the SSI, the GSI is constructed in physical space and is designed to be a flexible, state-of-the-art system that is efficient on available parallel computing platforms. After initial development, the GSI analysis system became operational as the core of the North American Data Assimilation System (NDAS) in June 2006 and the Global Data Assimilation System (GDAS) in May 2007 at NOAA. Since then, the GSI system has been adopted in various operational systems, including the National Aeronautics and Space Administration (NASA) global atmospheric analysis system, the NCEP Real-Time Mesoscale Analysis (RTMA) system, the Hurricane WRF (HWRF), the Rapid Refresh (RR) system, and the Air Force Weather Agency (AFWA) operational system. The number of groups involved in operational GSI development has also expanded, and now includes the NASA Goddard Global Modeling and Assimilation Office (GMAO), the NOAA Earth System Research Laboratory (ESRL) Global Systems Division (GSD), and the National Center for Atmospheric Research (NCAR) Earth System Laboratory's Mesoscale and Microscale Meteorology Division (MMM). Starting in 2007, the Developmental Testbed Center (DTC) has collaborated with the major GSI development groups to transition the operational GSI system into a community system that supports distributed development.
The DTC has complemented the development groups by providing GSI documentation, porting GSI to multiple platforms, and testing GSI in an independent and objective environment that remains functionally equivalent to that of the operational centers. Working with the NCEP Environmental Modeling Center (EMC), the DTC maintains a community GSI repository that is equivalent to the operational developmental repository and facilitates community development of GSI. Based on this repository, the DTC releases GSI code annually, with intermediate bug fixes. The first community version of the GSI system was released by the DTC in 2009. Since then, the DTC has provided community support through the GSI helpdesk (gsi_help@ucar.edu), the community GSI webpage, and annual community GSI tutorials/workshops. In late 2010, a GSI review committee was formed to coordinate distributed development of the GSI system. The committee is composed of the primary GSI research and operational centers, including NCEP/EMC, NASA/GMAO, NOAA/ESRL, NCAR/MMM, AFWA, and the DTC. As one of the committee members, the DTC represents the research community, with GSI users from all over the world.

The GSI Review Committee primarily steers distributed GSI development and community code management and support. The responsibilities of the Review Committee are divided into two major aspects: coordination and code review. The purpose and guiding principles of the Review Committee are as follows:

Coordination and Advisory:
- Propose and shepherd new development
- Coordinate ongoing and new development
- Process management
- Community support recommendation

GSI Code Review:
- Establish and manage a unified GSI coding standard followed by all GSI developers.
- Establish and manage a process for the proposal and commitment of new developments to the GSI repository.
- Review proposed modifications to the code trunk.
- Decide whether code change proposals are accepted or denied for inclusion in the repository, and manage the repository.
- Oversee the timely testing and inclusion of code into the repository.

The Review Committee is committed to facilitating the transition from research to operations (R2O). Prospective code contributors should contact the DTC through the GSI helpdesk (gsi_help@ucar.edu). The DTC will help prospective contributors go through a scientific review by the GSI Review Committee to avoid any potential conflicts in code development. Upon approval by the GSI Review Committee, the DTC will work with the prospective contributors on the preparation and integration of their code into the GSI repository.

As a critical part of GSI user support, this User's Guide is provided to assist users in applying GSI to data assimilation and analysis studies. It was written by the DTC and reviewed by the GSI Review Committee members. Please note that the DTC currently focuses on testing and evaluation of GSI for limited-area Numerical Weather Prediction (NWP) applications; however, the long-term plan includes transitioning to global forecast applications.
This documentation describes the version 3.0 release from the spring of 2011, which includes new capabilities and enhancements as well as bug fixes. This version of GSI is based on a revision of the community GSI repository from February 2011. Currently, GSI v3.0 is still a three-dimensional variational (3D-Var) system, with some modules developed for the upcoming four-dimensional variational (4D-Var) and observation sensitivity capabilities. Combined with an ensemble system, GSI can also be used in an ensemble-variational hybrid data assimilation system with the appropriate configuration.

The User's Guide is organized as follows. Chapter 1 provides a background introduction to GSI. Chapter 2 contains basic information about how to get started with GSI, including system requirements; required software (and how to obtain it); how to download GSI; and

information about compilers, libraries, and how to build the code. Chapter 3 focuses on the input files needed to run GSI and how to configure and run GSI. Chapter 4 provides information about diagnostics and tuning of the system. Chapter 5 illustrates how to set up configurations to run GSI with conventional, radiance, and GPSRO data, and how to diagnose the results. Chapter 6 introduces the data assimilation and minimization theory behind the GSI system and the GSI code structure. Chapter 7 covers the basics of the background and observation errors. Finally, Chapter 8 introduces BUFR/PrepBUFR file processing for observation ingestion and the data sources available to the research community. Appendix A introduces the community tools available for GSI users, and Appendix B is a complete list of the GSI namelist options with explanations.

Chapter 2: Software Installation

2.1 Introduction

The DTC community GSI is a community distribution of the operational GSI at NOAA. The operational code was developed on IBM supercomputers at NOAA/NCEP. The community GSI takes the operational code and integrates it with a general build system that allows GSI to build and run on a variety of platforms. The current version of GSI has been successfully ported to Linux platforms using Intel or PGI Fortran compilers, IBM supercomputers, Intel Macintosh computers using the PGI Fortran compiler, and SGI Altix systems. The following instructions are for the DTC community GSI only; they are not for internal use at NCEP.

This chapter describes how to build and install the GSI system on your local computer. Section 2.2 introduces how to obtain the source code and gives a brief overview of the GSI directory. Section 2.3 covers the system requirements (tools, libraries, and environment variable settings) and currently supported platforms. Section 2.4 covers the supplemental libraries included with the distribution. Section 2.5 outlines the general steps for compiling GSI. Section 2.6 discusses what to do if you have problems with the build and where to get help. Section 2.7 briefly illustrates how to port GSI to another platform.

2.2 Obtaining the Source Code

The community GSI source code, including the build system and documentation, is available from the DTC community GSI users website by selecting the Downloads and then GSI System tabs on the vertical menu. New users must first register before downloading the source code. Returning users need only provide their registration email address. By selecting the link to the comgsi V3.0 tarball, users can download the source code for the version 3 release, which is the latest. It is important to always select the newest release of the community GSI. Due to their size, the CRTM coefficients are provided as a separate link, the CRTM coefficients tarball.
The community GSI version 3 comes in a tar file named comgsi_v3.tar.gz. The tar file may be unpacked by using the UNIX commands:

gunzip comgsi_v3.tar.gz
tar -xvf comgsi_v3.tar

This creates the top-level GSI directory comgsi_v3.
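After unpacking, a quick check can confirm that the expected top-level directories are present. The sketch below is a minimal POSIX sh helper; the directory names follow the table in the next section, and include/ and lib/ are omitted because the build creates them later:

```shell
# layout_ok: succeed if the unpacked GSI tree contains the expected
# top-level directories (include/ and lib/ are created later by the build).
layout_ok() {
  for d in src fix run arch util; do
    [ -d "$1/$d" ] || { echo "missing: $1/$d"; return 1; }
  done
  echo "layout ok"
}

# Typical use:
# layout_ok ./comgsi_v3
```

If a directory is reported missing, the tar file was likely truncated during download and should be fetched again.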

After downloading the source code, and prior to building, the user should check the links to known issues on the DTC website to determine if any code updates are required.

The GSI system includes GSI source code, the build system, supplemental libraries, fixed files, and run scripts. The following table lists the system components found inside the main GSI directory:

Directory      Content
src/main/      GSI source code and makefile.
src/libs/      Source directory for supplemental libraries.
fix/           Fixed input files required by a GSI analysis, such as background error covariances and observation error tables (excluding CRTM coefficients).
include/       Include file directory, created by the build system.
lib/           Directory containing compiled supplemental libraries, created by the build.
run/           Directory for the executable gsi.exe and a sample run script.
arch/          Build options and machine architecture specifics.
util/          Tools.

The src/ directory contains the source code for GSI and the supplemental libraries, in two subdirectories: main/ holds the main GSI source code along with its makefiles, and libs/ holds the source code for the supplemental libraries. The fix/ directory holds the static, or fixed, input files used by GSI, such as background error covariances and observation error tables. As of version 3, the GSI tar file no longer holds the CRTM coefficients; because of their size, they are now stored separately. This will be discussed further in the next chapter. The include/ and lib/ directories are used by the build process to store include and library files, respectively. The run/ directory contains the executable and a sample run script for conducting an analysis run. The arch/ directory contains the machine-specific information used by the build system; the build system will be discussed in more detail in section 2.7, where porting the build is covered.
Lastly, the util/ directory contains tools for GSI diagnostics.

2.3 System Requirements and External Libraries

The source code for GSI is written in FORTRAN, FORTRAN 90, and C. In addition, the parallel executables require some flavor of MPI for the distributed memory parallelism, and the I/O relies on the NetCDF I/O libraries. Beyond standard shell scripts, the build system relies on the Perl scripting language and GNU Make. The basic requirements for building and running the GSI system are the following:

- FORTRAN 90/95 compiler

- C compiler
- MPI v1.2+
- Perl
- NetCDF V3.6+ (version 3 series only)
- LAPACK and BLAS

Because these tools and libraries are typically the purview of system administrators to install and maintain, they are lumped together here as part of the basic system requirements.

2.3.1 Compilers

The DTC community GSI system successfully builds and runs on IBM AIX, Linux, SGI, and Mac Darwin platforms. The following compiler/OS combinations are supported:

- IBM with the xlf Fortran compiler
- Linux (both 32- and 64-bit) with PGI pgf90 or Intel ifort
- Mac Darwin with PGI
- SGI Altix with Intel ifort

At this time, GSI is only tested on current compilers. Unforeseen build issues may occur when using an older compiler version. As always, the best results come from using the most recent version of compilers.

2.3.2 NetCDF and MPI

GSI requires a number of support libraries not included with the source code. Many of these libraries may be part of the compiler installation, and are subsequently referred to as system libraries. For our needs, the most important of these libraries are NetCDF and MPI.

The NetCDF library is an exception to the rule of using the most recent version of libraries. GSI requires NetCDF for I/O operations, but is not compatible with NetCDF V4, which diverges significantly from version 3 and is not supported. Only the most recent V3 series of the library should be used; the preferred version is NetCDF V3.6+. The NetCDF libraries can be downloaded from the Unidata website.

Typically, the NetCDF library is installed in a directory that is included in the user's path, such as /usr/local/lib. When this is not the case, the environment variable NETCDF can be set to point to the location of the library. For csh/tcsh, the path can be set with the command:

setenv NETCDF /path_to_netcdf_library/

For bash/ksh, the path can be set with the command:

export NETCDF=/path_to_netcdf_library/

It is crucial that system libraries, such as NetCDF, be built with the same FORTRAN compiler, compiler version, and compatible flags as used to compile the remainder of the source code. This is often an issue on systems with multiple FORTRAN compilers, or when the option to build with multiple word sizes (e.g., 32-bit vs. 64-bit addressing) is available.

Many default Linux installations include a version of NetCDF. Typically this version is only compatible with code compiled using gcc. To build GSI, a version of the library must be built using your preferred compiler and with both C and FORTRAN bindings. If you have any doubts about your installation, ask your system administrator.

A version of the MPI library must be installed to build and run the GSI executable as a distributed memory parallel code. Just as with the NetCDF library, the MPI library must be built with the same FORTRAN compiler, and use the same word size option flags, as the remainder of the source code. Installing MPI on a system is typically a job for the system administrator and will not be addressed here. If you are running GSI on a computer at a large center, check the machine's documentation before you ask the local system administrator.

On Linux or Mac Darwin systems, you can typically determine whether MPI is available by running the following UNIX commands:

which mpif90
which mpicc
which mpirun

If any of these tests returns "Command Not Found", there may be a problem with your MPI installation or the configuration of the user's account. Contact your system administrator for help if you have any questions.

2.3.3 LAPACK and BLAS

LAPACK and BLAS are open-source mathematics libraries for the solution of linear algebra problems. The source code for these libraries is freely available to download from NETLIB.
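The checks above can be wrapped into small helper functions. The POSIX sh sketch below is illustrative: the tool list and the NetCDF version policy come from the text, while the function names and the idea of parsing a version string are assumptions (query your installation for its version string however your site provides it):

```shell
# check_tools: print the names of any listed commands not found on PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "$missing"
}

# netcdf_v3_ok: succeed only for version strings in the supported 3.6+
# series (GSI is not compatible with NetCDF V4).
netcdf_v3_ok() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]
}

# Typical use before a build:
# check_tools mpif90 mpicc mpirun perl make
```

An empty result from check_tools means every tool was found; otherwise the missing names are printed so they can be reported to the system administrator.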

Most commercial compilers provide their own optimized versions of these routines. These optimized versions of BLAS and LAPACK provide superior performance to the open-source versions.

On IBM machines, the AIX compiler is often, but not always, installed with the Engineering and Scientific Subroutine Libraries, or ESSL. The ESSL libraries include highly optimized parallel versions of many of the LAPACK and BLAS routines. Because GSI was developed on an IBM platform, by design the ESSL libraries provide all of the needed linear algebra routines.

On Linux or Mac Darwin systems, GSI supports the Intel ifort and PGI pgf90 compilers. The Intel compiler has its own optimized version of the BLAS and LAPACK routines, called the Math Kernel Library, or MKL; the MKL libraries are sufficient to provide the necessary routines. The PGI compiler typically comes with its own version of the BLAS and LAPACK libraries, which also contain the necessary routines; for PGI these libraries are loaded automatically. Should your compiler come without any vendor-supplied linear algebra libraries, it will be necessary to download and build your own local version of the BLAS and LAPACK libraries.

2.4 Supplemental Libraries

For the convenience of the user, supplemental libraries for building GSI are included in the src/libs/ directory (except WRF). These libraries are built automatically when GSI is built, and are listed in the table below:

Directory         Content
bacio             NCEP BACIO library
bufr              NCEP BUFR library
crtm_gfsgsi_2.0   JCSDA community radiative transfer model
gfsio             Unformatted Fortran record IO for GFS
gsdcloud          GSD cloud analysis library
sfcio             NCEP GFS surface file I/O module
sigio             NCEP GFS atmospheric file I/O module
sp                NCEP spectral/grid transforms
w3                NCEP W3 library (date/time manipulation, GRIB)
WRF               WRF I/O API libraries used by GSI (not included with the GSI tar file)

The one library in this table that is not included with the source code is WRF. GSI uses the WRF I/O APIs to read NetCDF files created as WRF output. Therefore a copy of successfully compiled WRF is required for the GSI build. WRF can be obtained from the DTC community web site.

Either of the WRF dynamic cores, ARW or NMM, can be used with GSI. Please note that only WRFV3.2 and WRFV3.3 have been tested with this version of GSI.

2.5 Compiling GSI

The community GSI requires the same build environment as the WRF model, including the NetCDF, MPI, and LAPACK libraries. In addition, GSI makes direct calls to the WRF I/O API libraries included with the WRF model. Therefore the WRF model must be built prior to building GSI. The DTC provides a utility to build GSI and its libraries automatically on multiple platforms. This section introduces how to use the scripts that run this build system to compile GSI. Users familiar with the WRF build system will notice the similarity between that build system and the one used by GSI.

2.5.1 Environment Variables

Before configuring the GSI code to be built, at least one, and at most three, environment variables may need to be set. The first, WRF_DIR, defines the path to the root of the WRF build directory and is mandatory. The second is LAPACK_PATH, which indicates the path to the LAPACK library on your system. The third is NETCDF, which indicates the path to the NetCDF library on your system. These variables must be set prior to running the configure script, and in many cases may already be defined as part of your login environment.

To set the path variable WRF_DIR pointing to the WRF code, type (for csh/tcsh):

setenv WRF_DIR /path_to_WRF_root_directory/

and for bash/ksh:

export WRF_DIR=/path_to_WRF_root_directory/

The additional two environment variables, LAPACK_PATH and NETCDF, may be needed on some systems. Type echo $NETCDF; as long as it does not come back as undefined, you may skip setting the NETCDF path. Typically the environment variable LAPACK_PATH needs to be set only on Linux systems without a vendor-provided version of LAPACK. IBM systems usually have the ESSL library installed and therefore do not need LAPACK.
Likewise, the PGI compiler often comes with a vendor-provided version of LAPACK that links automatically with the compiler. Problems with the vendor-supplied LAPACK library are more likely to occur

with the Intel compiler. While the Intel compilers typically have the MKL libraries installed, the ifort compiler does not automatically load the library. It is therefore necessary to set the LAPACK_PATH variable to the location of the MKL libraries when using the Intel compiler. Supposing that the MKL library path is stored in the environment variable MKL, the LAPACK path may be set for csh/tcsh with:

setenv LAPACK_PATH $MKL

and for bash/ksh:

export LAPACK_PATH=$MKL

2.5.2 Configure and Compile

To build GSI, change into the comgsi_v3 directory and issue the configure command:

./configure

Choose one of the configure options listed. For example, on IBM computers the listed options are:

1. AIX 64-bit (dmpar,optimize)
2. AIX 32-bit (dmpar,optimize)

You may choose either option depending on your computer platform. On Linux computers, the listed options are:

1. Linux x86_64, PGI compiler (dmpar,optimize)
2. Linux x86_64, Intel compiler (dmpar,optimize)

After selecting the proper option, run the compile script:

./compile >& build.log

To conduct a complete clean, which removes ALL built files in ALL directories as well as configure.gsi, type:

./clean -a

A complete clean is strongly recommended if the compilation failed or if the configuration file is changed.
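Putting the preceding steps together, a typical build session looks like the following command sequence. This is a sketch, not a definitive recipe: every path is a placeholder to be replaced with your own, and the NETCDF and LAPACK_PATH lines are needed only in the situations described above.

```
# bash/ksh syntax; substitute your own paths
export WRF_DIR=/path_to_WRF_root_directory/   # mandatory: compiled WRF tree
export NETCDF=/path_to_netcdf_library/        # only if 'echo $NETCDF' is undefined
export LAPACK_PATH=$MKL                       # e.g., Intel builds without vendor LAPACK

cd comgsi_v3
./configure            # select the option matching your platform/compiler
./compile >& build.log
ls run/gsi.exe         # success leaves the executable in run/
```

If the final listing fails, inspect build.log as described in the next section.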

Following the compile command, the GSI executable gsi.exe can be found in the run/ directory. If the executable is not found, check the compilation log file (build.log) to determine what went wrong.

2.6 Getting Help and Reporting Problems

Should the user experience any difficulty building GSI on their system, please first confirm that all the required software is properly installed (section 2.3). Next, check that the external libraries exist and that their paths are correct. Lastly, check the resource file configure.gsi for errors in any of the paths or settings. Should all of these check out, feel free to contact the community GSI supporters at gsi_help@ucar.edu for assistance.

Problems building GSI may be reported to the helpdesk at gsi_help@ucar.edu. Please include with your email a copy of your configure.gsi file, a complete description of your system (e.g., Linux with PGI), and the contents of your PATH variable, obtained by typing echo $PATH and capturing the output.

2.7 Porting the GSI to a New System

Porting GSI to build with another compiler or on a new platform requires modifying some of the system defaults. The porting process is iterative: modify the build parameters, run the compile script, diagnose the build errors, and repeat. The build system consists of a number of scripts and a single directory, listed in the table below:

Name        Content
makefile    Top-level makefile.
arch/       Build options and machine architecture specifics.
clean       Script to clean up the directory structure.
configure   Script to configure the build environment for compilation; creates a resource file called configure.gsi.
compile     Script for building the GSI system; requires the existence of configure.gsi prior to running.

The build process is initiated by the compile script. This script calls the top-level makefile after setting up the build environment. The compile script uses information contained within the resource file configure.gsi to set the paths and environment variables required by the compile.
The resource file configure.gsi is generated by running the configure script. The configure script calls a Perl script that combines fixed build

information stored in the arch/ directory with machine-specific and user-provided build information to construct the configure.gsi resource file.

An additional script is provided to clean up the directory structure. Running the clean script scrubs the directory structure of object and module files. Running clean -a removes everything generated by the build, including the library files, executables, and the configure resource file. Should the build fail, it is strongly recommended that the user run ./clean -a prior to running the compile script again.

The arch/ directory contains a number of files used to construct the configure resource file configure.gsi:

- preamble: uniform requirements for the code, such as word size.
- configure.defaults: selection of compilers and options. Users can edit this file if a change to the compilation options or library locations is needed, or to add a new compilation option.
- postamble: standard compilation (make) rules and dependencies.

Most users will have no need to modify any of these files, with one exception: configure.defaults. This file contains build information, such as compiler flags and paths to system libraries, specific to a particular machine architecture and compiler. Typically, users will only need to modify configure.defaults when porting to a new system, or if their computer has been set up in a unique way. Therefore the key file to start with when having porting issues is the configure.defaults file located in the arch/ directory.

To illustrate this process, consider the recent port of GSI to the Macintosh OSX platform using the PGI Fortran compiler. On opening the configure.defaults file, a collection of platform/compiler-specific entries can be seen. The first entry is for the IBM platform, using the xlf compiler with 64-bit word size.
This entry is indicated by a label at the top of the block starting with the tag #ARCH. For the 64-bit IBM build, the tag is:

#ARCH AIX 64-bit #dmpar

The block for the 64-bit IBM build is immediately followed by the 32-bit IBM build entry, which is indicated by the tag:

#ARCH AIX 32-bit #dmpar

Each subsequent build configuration is separated by a similar tag. These tags are used by the build script to locate a specific build configuration. For this illustration, we wish to create the build rule for the Macintosh OSX platform using the PGI compiler. Looking through the entries in the configure.defaults file, suppose there are no Macintosh build rules. If there were, we could use one of these as an initial

guess at the correct build rule. Instead, there are two rules for Linux that use the PGI Fortran compiler. We select the 64-bit one, make a copy of it, and modify the tag to say:

#ARCH DARWIN (MACOS) PGI Compiler 64-bit #dmpar

Since this rule is based on the 64-bit PGI rule, the default compiler names should be correct. To simplify things, remove any architecture-dependent and debugging flags from the Fortran flags variable FFLAGS. Also remove any load flags from the variable LDFLAGS. Once the build completes without errors, these flags can be reintroduced one at a time to see if they are compatible with the new platform.

Once a simplified build rule for the target platform has been created, run the configure script and then the compile script. Redirect the output from the compile script to a log file for reference. If the build completes, the port is finished. If, as is more likely, there is a compile or linking error, examine the log file to see what caused the issue, modify the configure file, and try compiling again.
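As a concrete illustration, a new stanza might start out looking something like the sketch below. Only the #ARCH tag and the FFLAGS/LDFLAGS variables come from the text; the compiler-variable names and values are hypothetical placeholders, to be copied from the existing Linux PGI entry in your own configure.defaults:

```
#ARCH DARWIN (MACOS) PGI Compiler 64-bit #dmpar
# (hypothetical starting point -- copy the real variable names and
#  settings from the Linux PGI entry in your configure.defaults)
FC       = pgf90       # placeholder Fortran compiler variable
CC       = pgcc        # placeholder C compiler variable
FFLAGS   =             # start empty; re-add flags one at a time
LDFLAGS  =             # likewise for load flags
```

Starting from a stripped-down stanza like this keeps the first compile attempts simple, so that each flag added back can be tested in isolation.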

Chapter 3: Running GSI

This chapter starts by discussing the input files required to run GSI. It then proceeds with a detailed explanation of a sample run script and some of the most commonly used options from the GSI namelist. It concludes with samples of the products produced by a successful GSI run.

3.1 Input Files

In most cases, three types of input files (background, observation, and fixed files) must be available to run GSI. In some special idealized cases, such as a pseudo single observation test, GSI can be run without any observations.

Background or first guess field

As with other data analysis systems, the background or first guess fields may come from a separately conducted model forecast or from a previous cycle. The following is a list of the types of background fields supported by this release code:

a) WRF NMM input fields in binary format
b) WRF NMM input fields in NetCDF format
c) WRF ARW input fields in binary format
d) WRF ARW input fields in NetCDF format
e) GFS input fields in binary format
f) NEMS-NMMB input fields
g) RTMA input fields (2-dimensional binary format)

Please note that WRF is a community model system including two dynamical cores: the Advanced Research WRF (ARW) and the Nonhydrostatic Mesoscale Model (NMM). The GFS (Global Forecast System), NEMS (National Environmental Modeling System)-NMMB (Nonhydrostatic Mesoscale Model B-Grid), and RTMA (Real-Time Mesoscale Analysis) are operational systems of NCEP. Currently, the DTC supports GSI for regional community models, such as WRF NMM and ARW. As such, most of the multiple-platform tests were conducted using WRF NetCDF background files (b, d). While the formal regression test suite tests all 7 background file formats (a-g) on an IBM platform, the NetCDF options are the most thoroughly tested. For the current release, the following tests have been performed:

1. Backgrounds (a)-(g) were tested with regression cases on IBM.
2.
NMM NetCDF (b) and ARW NetCDF (d) were tested with regression cases on Linux with ifort and PGI, and on Mac with PGI.
3. NMM NetCDF (b) and ARW NetCDF (d) were tested with cycled cases.

Observations

GSI can analyze many types of observational data, including conventional data, satellite observations (such as AIRS, IASI, GPSRO (bending angle or refractivity),

SBUV/2 ozone, GOME ozone, GOES sounder, GOES imager, AVHRR), and radar data. For use with GSI, these observations are saved in BUFR format (with NCEP-specified features). The default observation file names used in GSI, the observations they contain, and example file names are listed below:

prepbufr: conventional observations, including ps, t, q, pw, uv, spd, dw, sst, from observation platforms such as METAR and soundings. Example: gdas1.t12z.prepbufr
amsuabufr: AMSU-A 1b radiance (brightness temperatures) from satellites NOAA-15, 16, 17, 18, 19 and METOP-A. Example: gdas1.t12z.1bamua.tm00.bufr_d
amsubbufr: AMSU-B 1b radiance (brightness temperatures) from satellites NOAA-15, 16, 17. Example: gdas1.t12z.1bamub.tm00.bufr_d
radarbufr: radar radial velocity Level 2.5 data. Example: ndas.t12z.radwnd.tm12.bufr_d
gpsrobufr: GPS radio occultation observations. Example: gdas1.t12z.gpsro.tm00.bufr_d
ssmirrbufr: precipitation rate observations from SSM/I. Example: gdas1.t12z.spssmi.tm00.bufr_d
tmirrbufr: precipitation rate observations from TMI. Example: gdas1.t12z.sptrmm.tm00.bufr_d
sbuvbufr: SBUV/2 ozone observations from satellites NOAA-16, 17, 18, 19. Example: gdas1.t12z.osbuv8.tm00.bufr_d
hirs2bufr: HIRS2 1b radiance from satellite NOAA-14. Example: gdas1.t12z.1bhrs2.tm00.bufr_d
hirs3bufr: HIRS3 1b radiance observations from satellites NOAA-16, 17. Example: gdas1.t12z.1bhrs3.tm00.bufr_d
hirs4bufr: HIRS4 1b radiance observations from satellites NOAA-18, 19 and METOP-A. Example: gdas1.t12z.1bhrs4.tm00.bufr_d
msubufr: MSU observations from satellite NOAA-14. Example: gdas1.t12z.1bmsu.tm00.bufr_d
airsbufr: AMSU-A and AIRS radiances from satellite AQUA. Example: gdas1.t12z.airsev.tm00.bufr_d
mhsbufr: Microwave Humidity Sounder observations from NOAA-18 and METOP-A. Example: gdas1.t12z.1bmhs.tm00.bufr_d
ssmitbufr: SSM/I observations from satellites F13, F14, F15. Example: gdas1.t12z.ssmit.tm00.bufr_d
amsrebufr: AMSR-E radiance from satellite AQUA. Example: gdas1.t12z.amsre.tm00.bufr_d
ssmisbufr: SSMIS radiances from satellite F16. Example: gdas1.t12z.ssmis.tm00.bufr_d
gsnd1bufr: GOES sounder radiance (sndrd1, sndrd2, sndrd3, sndrd4) from GOES-11, 12, 13. Example: gdas1.t12z.goesfv.tm00.bufr_d
l2rwbufr: NEXRAD Level 2 radial velocity. Example: ndas.t12z.nexrad.tm12.bufr_d
gsndrbufr: GOES sounder radiance from GOES-11, 12. Example: gdas1.t12z.goesnd.tm00.bufr_d
gimgrbufr: GOES imager radiance from GOES-11, 12
omibufr: Ozone Monitoring Instrument (OMI) observations from NASA Aura. Example: gdas1.t12z.omi.tm00.bufr_d
iasibufr: Infrared Atmospheric Sounding Interferometer (IASI) sounder observations from METOP-A. Example: gdas1.t12z.mtiasi.tm00.bufr_d
gomebufr: Global Ozone Monitoring Experiment (GOME) ozone observations from METOP-A. Example: gdas1.t12z.gome.tm00.bufr_d
mlsbufr: Aura MLS stratospheric ozone data. Example: gdas1.t12z.mlsbufr.tm00.bufr_d
tcvitl: synthetic tropical cyclone MSLP observations. Example: gdas1.t12z.syndata.tcvitals.tm00
modisbufr: MODIS aerosol total column AOD observations from AQUA and TERRA
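In the run directory, each observation file is simply linked under the GSI name listed above; for example (the directory and the downloaded file names below are placeholders):

```shell
# Sketch: stage observation files under their GSI-recognized names.
OBS_ROOT=$(mktemp -d)     # stands in for the real observation directory
WORKDIR=$(mktemp -d)      # stands in for the GSI run directory

# Pretend two downloaded BUFR files exist.
touch ${OBS_ROOT}/gdas1.t12z.prepbufr
touch ${OBS_ROOT}/gdas1.t12z.1bamua.tm00.bufr_d

cd ${WORKDIR}
ln -s ${OBS_ROOT}/gdas1.t12z.prepbufr           prepbufr
ln -s ${OBS_ROOT}/gdas1.t12z.1bamua.tm00.bufr_d amsuabufr
```

Links (rather than copies) are sufficient here because GSI only reads the observation files; the sample run script in Section 3.2 does exactly this.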

Remarks:

1. Because the current regional models do not have ozone as a prognostic variable, ozone data are not assimilated at the regional scale.

2. GSI can analyze all of the data types on the list, but each GSI run (whether for operations or a case study) only uses a subset of the data, chosen based on quality, forecast impact, and availability. Some data may be outdated and no longer available, some are in monitoring mode, and some may have quality issues during certain periods. Users are encouraged to check data quality prior to running an analysis; NCEP provides online resources that include data quality history.

GSI can be run without any observations to see how the moisture constraint modifies the first guess (background) field. GSI can also be run in a pseudo single observation mode, which does not require any BUFR observation files. In this mode, users specify the observation information in the namelist section SINGLEOB_TEST (see Section 4.2 for details). As more data files are used, additional information is added through the GSI analysis.

Fixed files (statistics and control files)

The GSI system has a directory of static or fixed files (the so-called fix/ directory), which includes many files required by a GSI analysis.
The following list gives the fixed file names used in the GSI code, the content of each file, and corresponding example files in fix/ for the regional test case:

anavinfo: information file to set control and analysis variables. Examples: anavinfo_wrf_nambe, anavinfo_wrf_globalbe
berror_stats: background error covariance. Examples: nam_nmmstat_na.gcv, nam_glb_berror.f77.gcv
errtable: observation error table. Example: nam_errtable.r3dv

Observation data control files (explained in more detail in Section 4.3):

convinfo: conventional observation information file. Example: global_convinfo.txt
satinfo: satellite channel information file. Example: global_satinfo.txt
pcpinfo: precipitation rate observation information file. Example: global_pcpinfo.txt
ozinfo: ozone observation information file. Example: global_ozinfo.txt

Bias correction files used by the radiance analysis:

satbias_angle: satellite scan-angle-dependent bias correction file. Example: global_satangbias.txt
satbias_in: satellite variational bias correction coefficient file. Example: ndas.t06z.satbias.tm03

Radiance coefficient files used by the CRTM:

EmisCoeff.bin: IR surface emissivity coefficients
AerosolCoeff.bin: aerosol coefficients
CloudCoeff.bin: cloud scattering and emission coefficients
${satsen}.SpcCoeff.bin: sensor spectral response characteristics
${satsen}.TauCoeff.bin: transmittance coefficients

Each operational system, such as GFS, NAM, and RTMA, has its own set of fixed files. Therefore, for each fixed file used in GSI, there may be several optional files in the fixed file directory fix/. For example, for the background error covariance file, both nam_nmmstat_na.gcv (from the NAM system) and nam_glb_berror.f77.gcv (from the global model forecast) can be used. For Linux users, we also provide the same background error covariance files with the opposite byte order, such as nam_nmmstat_na.gcv_little_endian and nam_glb_berror.f77.gcv_little_endian.

3.2 GSI Run Script

Steps in the GSI run script

The GSI run script creates the run time environment necessary for running the GSI executable. A typical GSI run script includes the following steps:

1. Request computer resources to run GSI.
2. Set environment variables for the machine architecture.
3. Set experiment variables (such as experiment name, analysis time, background, and observations).
4. Check the definitions of required variables.
5. Generate a run directory for GSI (sometimes called the working or temporary directory).
6. Copy the GSI executable to the run directory.
7. Copy the background file to the run directory.
8. Link observations to the run directory.
9.
Link fixed files (statistic, control, and coefficient files) to the run directory.
10. Generate the GSI namelist.
11. Run the GSI executable.
12. Post-process: save analysis results, generate diagnostic files, and clean the run directory.
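The twelve steps can be sketched as a bare-bones skeleton. All paths below are placeholders and the staging commands are stand-ins; this is not the released run script:

```shell
#!/bin/ksh
# Minimal sketch of the run-script flow; placeholder paths throughout.
ANAL_TIME=2011032212                       # step 3: experiment variables
WORK_ROOT=$(mktemp -d)/run_${ANAL_TIME}    # step 5: fresh run directory
mkdir -p ${WORK_ROOT} && cd ${WORK_ROOT}

# Steps 6-9: stage executable, background, observations, fixed files
# (touch stands in for the real cp / ln -s commands in this sketch).
touch gsi.exe wrf_inout prepbufr berror_stats

# Step 10: generate a (here minimal) namelist.
cat > gsiparm.anl << EOF
 &SETUP
 /
EOF

# Step 11 would be: ${RUN_COMMAND} ./gsi.exe > stdout 2>&1
```

The released script fleshes out each step with the error checks and file lists dissected in the rest of this section.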

Typically, users need to modify specific parts of the run script (steps 1, 2, and 3) to fit their computer environment and to point to the correct input/output files and directories. This section covers each of these modifications and then dissects a sample run script designed to run on a Linux cluster, an IBM supercomputer, and a Linux workstation. Users should start with the provided run script and modify it for their own run environment and case configuration.

Customization of the GSI run script

Setting up the machine environment

This section focuses on modifying the machine-specific entries in the run script, corresponding to step 1. Specifically, this consists of setting Unix/Linux environment variables and selecting the correct parallel run time environment (batch system with options). Six parallel environments will be considered:

IBM supercomputer using LSF (Load Sharing Facility)
IBM supercomputer using LoadLevel
Linux cluster using PBS (Portable Batch System)
Linux cluster using LSF
Linux workstation (with no batch system)
Intel Mac Darwin workstation with the PGI compiler (with no batch system)

Let us start with the batch queuing system. On large computing platforms, machine resources are allocated through a queuing system. A job (along with its queuing options) is submitted to a queue, where it waits until sufficient machine resources become available. Once they are, the code executes until it completes or the allocated resources are exhausted. Two queuing systems are listed below as examples:

1) IBM with LSF

#BSUB -P????????
                          # account number
#BSUB -a poe              # at NCAR: bluevista
#BSUB -x                  # exclusive use of node (not_shared)
#BSUB -n 12               # number of total tasks
#BSUB -R "span[ptile=2]"  # how many tasks per node (up to 8)
#BSUB -J gsi              # job name
#BSUB -o gsi.out          # output filename (%J to add job id)
#BSUB -e gsi.err          # error filename
#BSUB -W 00:02            # wall time
#BSUB -q regular          # queue

set -x

# Set environment variables for IBM
export MP_SHARED_MEMORY=yes
export MEMORY_AFFINITY=MCM

export BIND_TASKS=yes

# Set environment variables for threads
export SPINLOOPTIME=10000
export YIELDLOOPTIME=40000
export AIXTHREAD_SCOPE=S
export MALLOCMULTIHEAP=true
export XLSMPOPTS="parthds=1:spins=0:yields=0:stack= "

# Set environment variables for user preferences
export XLFRTEOPTS="nlwidth=80"
export MP_LABELIO=yes

2) Linux Cluster with PBS

#$ -S /bin/ksh
#$ -N GSI_test
#$ -cwd
#$ -r y
#$ -pe comp 64
#$ -l h_rt=0:20:00
#$ -A??????

The remaining platforms (Linux workstation and Mac) generally do not run a batch system; in these cases, steps 1 and 2 can be skipped. In both of the examples above, environment variables specify system resource management, such as the number of processors, the name/type of queue, the maximum wall clock time allocated to the job, options for standard output and standard error, etc. Some platforms need additional Unix environment variables that further define the run environment. These settings can significantly affect GSI run efficiency and the accuracy of GSI results; please check with your system administrator for the optimal settings on your computer system. Note that while GSI can be run with any number of processors, using more than (number_of_levels) times (number_of_variables) processors will not scale well.

Setting up the running environment

There are only two options to define in this block. The option ARCH selects the machine architecture; it is a function of platform type, compiler, and batch queuing system. The option GSIPROC sets the processor count used in the run and also determines whether the job runs as a multiple-processor or single-processor job. Several choices for ARCH are listed in the sample run script; one of them should be applicable to the user's system. Please check with your system administrator about running parallel MPI jobs on your system.
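The scaling note above gives a simple upper bound on the useful processor count. For example, with 60 levels and an assumed 6 analysis variables (the variable count here is illustrative; it depends on the analysis configuration):

```shell
# Rough upper bound on processors that still scale:
# (number_of_levels) x (number_of_variables)
levels=60        # e.g., LEVS=60 in the sample script
variables=6      # illustrative; depends on the analysis configuration
max_useful=$((levels * variables))
echo "requesting more than ${max_useful} processors will not scale well"
```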

The ARCH options, with their platform, compiler, and batch queuing system, are:

IBM_LSF: IBM AIX, xlf/xlc, LSF
IBM_LoadLevel: IBM AIX, xlf/xlc, LoadLevel
LINUX_Intel: Linux workstation, Intel, mpirun if GSIPROC > 1
LINUX_Intel_LSF: Linux cluster, Intel, LSF
LINUX_Intel_PBS: Linux cluster, Intel, PBS
LINUX_PGI: Linux workstation, PGI, mpirun if GSIPROC > 1
LINUX_PGI_LSF: Linux cluster, PGI, LSF
LINUX_PGI_PBS: Linux cluster, PGI, PBS
DARWIN_PGI: Mac Darwin, PGI, mpirun if GSIPROC > 1

# GSIPROC = processor number used for GSI analysis
#
GSIPROC=1
ARCH='LINUX_PGI'

# Supported configurations:
#   IBM_LSF, IBM_LoadLevel
#   LINUX_Intel, LINUX_Intel_LSF, LINUX_Intel_PBS,
#   LINUX_PGI, LINUX_PGI_LSF, LINUX_PGI_PBS,
#   DARWIN_PGI

Setting up an analysis case

This section discusses setting up variables specific to the analysis case, such as the analysis time, working directory, background and observation files, location of fixed files and CRTM coefficients, and the GSI executable. A few cautions to be aware of:

The time in the background file must be consistent with the time in the observation file used for the GSI run (there is a namelist option to turn this check off).

Even if their contents are identical, PrepBUFR/BUFR files will differ if they were created on platforms with different endian byte order (Linux Intel vs. IBM). If users obtain PrepBUFR/BUFR files from an IBM system, these files must be converted before they can be used on a Linux system. Appendix A.1 discusses the conversion tool ssrc used to byte-swap observation files.
#
#####################################################
# case set up (users should change this part)
#####################################################
#
# ANAL_TIME= analysis time (YYYYMMDDHH)
# WORK_ROOT= working directory, where GSI runs
# PREPBUFR = path of PrepBUFR conventional obs
# BK_FILE  = path and name of background file
# OBS_ROOT = path of observation files
# CRTM_ROOT= path of CRTM coefficient files
# FIX_ROOT = path of fix files
# GSI_EXE  = path and name of the GSI executable
#
ANAL_TIME=
WORK_ROOT=./run_${ANAL_TIME}_arw_single

PREPBUFR=newgblav.gdas1.t12z.prepbufr.nr
BK_FILE=./bkARW/wrfout_d01_ _12:00:00
OBS_ROOT=./GSI_DATA/obs
CRTM_ROOT=./comGSI/crtm/CRTM_Coefficients
FIX_ROOT=./release_V3/fix
GSI_EXE=./release_V3/run/gsi.exe

The next part of this block focuses on additional options that specify important aspects of the GSI configuration. Option bk_core indicates the specific WRF core used to create the background files and is used to set the WRF core when generating the namelist. Option bkcv_option specifies the background error covariance to be used in the script; two background error covariance matrices are provided with the release, one from the NCEP global data assimilation system (GDAS) and one from the NAM data assimilation system (NDAS). Please check Section 7.2 for more details about the GSI background error covariance. Option if_clean sets whether the run script deletes temporary intermediate files in the working directory after a GSI run completes.

#
# bk_core    = which WRF core is used as background (NMM or ARW)
# bkcv_option= which background error covariance and parameters will
#              be used (GLOBAL or NAM)
# if_clean   = clean: delete temporary files in working directory (default)
#              no   : leave running directory as is (this is for debug only)
bk_core=ARW
bkcv_option=NAM
if_clean=clean

Description of the sample script to run GSI

Listed below is an annotated run script (Courier New) with explanations of each function block. For further details on the first three blocks of the script, which users need to change, check the preceding sections:

#!/bin/ksh
#####################################################
# machine set up (users should change this part)
#####################################################
#
#
# GSIPROC = processor number used for GSI analysis
#
GSIPROC=1
ARCH='LINUX_PGI'

# Supported configurations:
#   IBM_LSF, IBM_LoadLevel
#   LINUX_Intel, LINUX_Intel_LSF, LINUX_Intel_PBS,
#   LINUX_PGI, LINUX_PGI_LSF, LINUX_PGI_PBS,
#   DARWIN_PGI
#
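As an aside, the case-setup options bk_core, bkcv_option, and if_clean described above are free-form strings, so a typo only surfaces later when the namelist misbehaves. A small guard such as the following sketch (not part of the released script) fails fast instead:

```shell
# Hypothetical sanity check for the case-setup options.
bk_core=ARW
bkcv_option=NAM
if_clean=clean

case ${bk_core} in
  ARW|NMM) ;;   # supported WRF cores
  *) echo "ERROR: bk_core must be ARW or NMM"; exit 1 ;;
esac
case ${bkcv_option} in
  GLOBAL|NAM) ;;   # supported background error options
  *) echo "ERROR: bkcv_option must be GLOBAL or NAM"; exit 1 ;;
esac
case ${if_clean} in
  clean|no) ;;
  *) echo "ERROR: if_clean must be clean or no"; exit 1 ;;
esac
```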

#####################################################
# case set up (users should change this part)
#####################################################
#
# ANAL_TIME= analysis time (YYYYMMDDHH)
# WORK_ROOT= working directory, where GSI runs
# PREPBUFR = path of PrepBUFR conventional obs
# BK_FILE  = path and name of background file
# OBS_ROOT = path of observation files
# CRTM_ROOT= path of CRTM coefficient files
# FIX_ROOT = path of fix files
# GSI_EXE  = path and name of the GSI executable
ANAL_TIME=
WORK_ROOT=./run_${ANAL_TIME}_arw_single
PREPBUFR=newgblav.gdas1.t12z.prepbufr.nr
BK_FILE=./bkARW/wrfout_d01_ _12:00:00
OBS_ROOT=./GSI_DATA/obs
CRTM_ROOT=./comGSI/crtm/CRTM_Coefficients
FIX_ROOT=./release_V3/fix
GSI_EXE=./release_V3/run/gsi.exe
#
# bk_core    = which WRF core is used as background (NMM or ARW)
# bkcv_option= which background error covariance and parameters will
#              be used (GLOBAL or NAM)
# if_clean   = clean: delete temporary files in working directory (default)
#              no   : leave running directory as is (this is for debug only)
bk_core=ARW
bkcv_option=NAM
if_clean=clean
#

Up to this point, users should be able to run GSI for simple cases without changing the script. However, some advanced users may need to change the following blocks for special applications, such as the use of radiance data, cycled runs, or running GSI on a platform not tested by the DTC.

#####################################################
# Users should NOT change script after this point
#####################################################
#

The next block sets the byte order and the run command used to run GSI on multiple platforms. ARCH is set at the beginning of the script. Option BYTE_ORDER tells the script the byte order of the machine, which selects the right byte-order files for the CRTM coefficients and background error covariance. The choice depends on the combination of platform (e.g., IBM vs. Linux) and compiler (e.g., ifort vs. pgi).
case $ARCH in

   'IBM_LSF')
      ###### IBM LSF (Load Sharing Facility)
      BYTE_ORDER=Big_Endian
      RUN_COMMAND="mpirun.lsf "
      ;;

   'IBM_LoadLevel')
      ###### IBM LoadLevel

      BYTE_ORDER=Big_Endian
      RUN_COMMAND="poe "
      ;;

   'LINUX_Intel')
      BYTE_ORDER=Little_Endian
      if [ $GSIPROC = 1 ]; then
         #### Linux workstation - single processor
         RUN_COMMAND=""
      else
         ###### Linux workstation - mpi run
         RUN_COMMAND="mpirun -np ${GSIPROC} -machinefile ~/mach "
      fi
      ;;

   'LINUX_Intel_LSF')
      ###### LINUX LSF (Load Sharing Facility)
      BYTE_ORDER=Little_Endian
      RUN_COMMAND="mpirun.lsf "
      ;;

   'LINUX_Intel_PBS')
      BYTE_ORDER=Little_Endian
      #### Linux cluster PBS (Portable Batch System)
      RUN_COMMAND="mpirun -np ${GSIPROC} "
      ;;

   'LINUX_PGI')
      BYTE_ORDER=Little_Endian
      if [ $GSIPROC = 1 ]; then
         #### Linux workstation - single processor
         RUN_COMMAND=""
      else
         ###### Linux workstation - mpi run
         RUN_COMMAND="mpirun -np ${GSIPROC} -machinefile ~/mach "
      fi
      ;;

   'LINUX_PGI_LSF')
      ###### LINUX LSF (Load Sharing Facility)
      BYTE_ORDER=Little_Endian
      RUN_COMMAND="mpirun.lsf "
      ;;

   'LINUX_PGI_PBS')
      BYTE_ORDER=Little_Endian
      ###### Linux cluster PBS (Portable Batch System)
      RUN_COMMAND="mpirun -np ${GSIPROC} "
      ;;

   'DARWIN_PGI')
      ### Mac - mpi run
      BYTE_ORDER=Little_Endian
      if [ $GSIPROC = 1 ]; then
         #### Mac workstation - single processor
         RUN_COMMAND=""
      else
         ###### Mac workstation - mpi run
         RUN_COMMAND="mpirun -np ${GSIPROC} -machinefile ~/mach "
      fi
      ;;

   * )
      print "error: $ARCH is not a supported platform configuration."

      exit 1
      ;;
esac

The next block checks that all the variables needed for a GSI run are properly defined. These variables should have been defined in the earlier part of this script.

#
################################################################
# Check GSI needed environment variables are defined and exist
#

# Make sure ANAL_TIME is defined and in the correct format
if [ ! "${ANAL_TIME}" ]; then
  echo "ERROR: \$ANAL_TIME is not defined!"
  exit 1
fi

# Make sure WORK_ROOT is defined and exists
if [ ! "${WORK_ROOT}" ]; then
  echo "ERROR: \$WORK_ROOT is not defined!"
  exit 1
fi

# Make sure the background file exists
if [ ! -r "${BK_FILE}" ]; then
  echo "ERROR: ${BK_FILE} does not exist!"
  exit 1
fi

# Make sure OBS_ROOT is defined and exists
if [ ! "${OBS_ROOT}" ]; then
  echo "ERROR: \$OBS_ROOT is not defined!"
  exit 1
fi
if [ ! -d "${OBS_ROOT}" ]; then
  echo "ERROR: OBS_ROOT directory '${OBS_ROOT}' does not exist!"
  exit 1
fi

# Set the path to the GSI static files
if [ ! "${FIX_ROOT}" ]; then
  echo "ERROR: \$FIX_ROOT is not defined!"
  exit 1
fi
if [ ! -d "${FIX_ROOT}" ]; then
  echo "ERROR: fix directory '${FIX_ROOT}' does not exist!"
  exit 1
fi

# Set the path to the CRTM coefficients
if [ ! "${CRTM_ROOT}" ]; then
  echo "ERROR: \$CRTM_ROOT is not defined!"
  exit 1
fi
if [ ! -d "${CRTM_ROOT}" ]; then

  echo "ERROR: fix directory '${CRTM_ROOT}' does not exist!"
  exit 1
fi

# Make sure the GSI executable exists
if [ ! -x "${GSI_EXE}" ]; then
  echo "ERROR: ${GSI_EXE} does not exist!"
  exit 1
fi

# Check to make sure the number of processors for
# running GSI was specified
if [ -z "${GSIPROC}" ]; then
  echo "ERROR: The variable \$GSIPROC must be set to contain the number of processors to run GSI"
  exit 1
fi

The next block creates a working directory (workdir) in which GSI will run. It should have enough disk space to hold all the files needed for the run. This directory is cleaned before each run; therefore, save any files you need from a previous run before rerunning GSI.

#
###################################################################
# Create the ram work directory and cd into it
workdir=${WORK_ROOT}
echo " Create working directory:" ${workdir}
if [ -d "${workdir}" ]; then
  rm -rf ${workdir}
fi
mkdir -p ${workdir}
cd ${workdir}

After creating the working directory, copy the GSI executable, background, observation, and fixed files into it.

#
###################################################################

echo " Copy GSI executable, background file, and link observation bufr to working directory"

# Save a copy of the GSI executable in the workdir
cp ${GSI_EXE} gsi.exe

Note: Copy the background file to the working directory as wrf_inout. The file wrf_inout will be overwritten by GSI to store the analysis result.

# Bring over background field
# (it's modified by GSI so we can't link to it)
cp ${BK_FILE} ./wrf_inout

Note: You can link observation files to the working directory because GSI will not overwrite them. The observations that can be analyzed by GSI are listed in dfile of the namelist section OBS_INPUT in this example script. Most of the conventional observations are in a single file named prepbufr, while radiance data are in separate files, one per satellite instrument, such as AMSU-A or HIRS. All of these observation files must be linked to the GSI-recognized file names given in dfile. Please check the table in Section 3.1 under the observation section for a detailed explanation of the links and the meaning of each file name listed below.

# Link to the prepbufr data
ln -s ${PREPBUFR} ./prepbufr

# Link to the radiance data
# ln -s ${OBS_ROOT}/ndas.t12z.1bamua.tm12.bufr_d amsuabufr
# ln -s ${OBS_ROOT}/ndas.t12z.1bhrs4.tm12.bufr_d hirs4bufr
# ln -s ${OBS_ROOT}/ndas.t12z.1bmhs.tm12.bufr_d  mhsbufr

The following block copies constant fixed files from the fix/ directory and links the CRTM coefficients. Please check Section 3.1 for the meaning of each fixed file. For background error covariances, observation errors, and data information files, two sets of fixed files are provided: one based on GFS statistics and one based on NAM statistics.
#
###################################################################

echo " Copy fixed files and link CRTM coefficient files to working directory"

# Set fixed files
#   berror   = forecast model background error statistics
#   specoef  = CRTM spectral coefficients
#   trncoef  = CRTM transmittance coefficients
#   emiscoef = CRTM coefficients for IR sea surface emissivity model
#   aerocoef = CRTM coefficients for aerosol effects
#   cldcoef  = CRTM coefficients for cloud effects
#   satinfo  = text file with information about assimilation of
#              brightness temperatures
#   satangl  = angle dependent bias correction file (fixed in time)
#   pcpinfo  = text file with information about assimilation of
#              precipitation rates
#   ozinfo   = text file with information about assimilation of ozone data
#   errtable = text file with obs error for conventional data (regional only)
#   convinfo = text file with information about assimilation of conventional data
#   bufrtable= text file ONLY needed for single obs test (oneobstest=.true.)
#   bftab_sst= bufr table for sst ONLY needed for sst retrieval (retrieval=.true.)

if [ ${bkcv_option} = GLOBAL ] ; then
  echo ' Use global background error covariance'
  if [ ${BYTE_ORDER} = Little_Endian ] ; then

    BERROR=${FIX_ROOT}/nam_glb_berror.f77.gcv_Little_Endian
  else
    BERROR=${FIX_ROOT}/nam_glb_berror.f77.gcv
  fi
  OBERROR=${FIX_ROOT}/nam_errtable.r3dv
  ANAVINFO=${FIX_ROOT}/anavinfo_wrf_globalbe
else
  echo ' Use NAM background error covariance'
  if [ ${BYTE_ORDER} = Little_Endian ] ; then
    BERROR=${FIX_ROOT}/nam_nmmstat_na.gcv_Little_Endian
  else
    BERROR=${FIX_ROOT}/nam_nmmstat_na.gcv
  fi
  OBERROR=${FIX_ROOT}/nam_errtable.r3dv
  ANAVINFO=${FIX_ROOT}/anavinfo_wrf_nambe
fi

SATANGL=${FIX_ROOT}/global_satangbias.txt
SATINFO=${FIX_ROOT}/global_satinfo.txt
CONVINFO=${FIX_ROOT}/global_convinfo.txt
OZINFO=${FIX_ROOT}/global_ozinfo.txt
PCPINFO=${FIX_ROOT}/global_pcpinfo.txt

RTMFIX=${CRTM_ROOT}
RTMEMIS=${RTMFIX}/EmisCoeff/${BYTE_ORDER}/EmisCoeff.bin
RTMAERO=${RTMFIX}/AerosolCoeff/${BYTE_ORDER}/AerosolCoeff.bin
RTMCLDS=${RTMFIX}/CloudCoeff/${BYTE_ORDER}/CloudCoeff.bin

# copy Fixed fields to working directory
cp $ANAVINFO anavinfo
cp $BERROR   berror_stats
cp $SATANGL  satbias_angle
cp $SATINFO  satinfo
cp $CONVINFO convinfo
cp $OZINFO   ozinfo
cp $PCPINFO  pcpinfo
cp $OBERROR  errtable
#
#
## CRTM Spectral and Transmittance coefficients
ln -s $RTMEMIS EmisCoeff.bin
ln -s $RTMAERO AerosolCoeff.bin
ln -s $RTMCLDS CloudCoeff.bin

nsatsen=`cat satinfo | wc -l`
isatsen=1
while [[ $isatsen -le $nsatsen ]]; do
  flag=`head -n $isatsen satinfo | tail -1 | cut -c1-1`
  if [[ "$flag" != "!" ]]; then
    satsen=`head -n $isatsen satinfo | tail -1 | cut -f 2 -d" "`
    spccoeff=${satsen}.SpcCoeff.bin
    if [[ ! -s $spccoeff ]]; then
      ln -s $RTMFIX/SpcCoeff/${BYTE_ORDER}/$spccoeff $spccoeff
      ln -s $RTMFIX/TauCoeff/${BYTE_ORDER}/${satsen}.TauCoeff.bin ${satsen}.TauCoeff.bin
    fi
  fi

  isatsen=` expr $isatsen + 1 `
done

# Only need this file for single obs test
bufrtable=${FIX_ROOT}/prepobs_prep.bufrtable
cp $bufrtable ./prepobs_prep.bufrtable

# for satellite bias correction
cp ${FIX_ROOT}/ndas.t06z.satbias.tm03 ./satbias_in

Set up some constants used in the GSI namelist. Please note that these constants are used for background error tuning and should be set based on the specific application. Two sample sets of constants are provided for the different background error covariance options: one used in GFS operations and one in NAM operations.

#
###################################################################
# Set some parameters for use by the GSI executable and to build the namelist

echo " Build the namelist "

export JCAP=62
export LEVS=60
export JCAP_B=62
export DELTIM=${DELTIM:-$((3600/($JCAP/20)))}

if [ ${bkcv_option} = GLOBAL ] ; then
  vs_op='0.7,'
  hzscl_op='1.7,0.8,0.5,'
else
  vs_op='1.0,'
  hzscl_op='0.373,0.746,1.50,'
fi

if [ ${bk_core} = NMM ] ; then
  bk_core_arw='.false.'
  bk_core_nmm='.true.'
else
  bk_core_arw='.true.'
  bk_core_nmm='.false.'
fi

The following large chunk of script generates the GSI namelist, called gsiparm.anl, in the working directory. A detailed explanation of each variable can be found in Section 3.3 and Appendix B.

 &SETUP
   miter=2,niter(1)=10,niter(2)=10,
   write_diag(1)=.true.,write_diag(2)=.false.,write_diag(3)=.true.,
   gencode=78,qoption=2,
   factqmin=0.0,factqmax=0.0,deltim=$DELTIM,
   ndat=67,iguess=-1,
   oneobtest=.false.,retrieval=.false.,
   nhr_assimilation=3,l_foto=.false.,
   use_pbl=.false.,use_compress=.false.,nsig_ext=13,gpstop=30.,
 /

 &GRIDOPTS
   JCAP=$JCAP,JCAP_B=$JCAP_B,NLAT=$NLAT,NLON=$LONA,nsig=$LEVS,hybrid=.true.,
   wrf_nmm_regional=${bk_core_nmm},wrf_mass_regional=${bk_core_arw},
   diagnostic_reg=.false.,
   filled_grid=.false.,half_grid=.true.,netcdf=.true.,
 /
 &BKGERR
   vs=${vs_op}
   hzscl=${hzscl_op}
   bw=0.,fstat=.true.,
 /
 &ANBKGERR
   anisotropic=.false.,an_vs=1.0,ngauss=1,
   an_flen_u=-5.,an_flen_t=3.,an_flen_z=-200.,
   ifilt_ord=2,npass=3,normal=-200,grid_ratio=4.,nord_f2a=4,
 /
 &JCOPTS
 /
 &STRONGOPTS
   jcstrong=.false.,jcstrong_option=3,nstrong=0,nvmodes_keep=20,period_max=3.,
   baldiag_full=.true.,baldiag_inc=.true.,
 /
 &OBSQC
   dfact=0.75,dfact1=3.0,noiqc=.false.,c_varqc=0.02,vadfile='prepbufr',
 /
 &OBS_INPUT
   dmesh(1)=120.0,dmesh(2)=60.0,dmesh(3)=60.0,dmesh(4)=60.0,dmesh(5)=120,time_window_max=1.5,
   dfile(01)='prepbufr',  dtype(01)='ps',       dplat(01)=' ',       dsis(01)='ps',                 dval(01)=1.0,  dthin(01)=0,
   dfile(02)='prepbufr',  dtype(02)='t',        dplat(02)=' ',       dsis(02)='t',                  dval(02)=1.0,  dthin(02)=0,
   dfile(03)='prepbufr',  dtype(03)='q',        dplat(03)=' ',       dsis(03)='q',                  dval(03)=1.0,  dthin(03)=0,
   dfile(04)='prepbufr',  dtype(04)='uv',       dplat(04)=' ',       dsis(04)='uv',                 dval(04)=1.0,  dthin(04)=0,
   dfile(05)='prepbufr',  dtype(05)='spd',      dplat(05)=' ',       dsis(05)='spd',                dval(05)=1.0,  dthin(05)=0,
   dfile(06)='radarbufr', dtype(06)='rw',       dplat(06)=' ',       dsis(06)='rw',                 dval(06)=1.0,  dthin(06)=0,
   dfile(07)='prepbufr',  dtype(07)='dw',       dplat(07)=' ',       dsis(07)='dw',                 dval(07)=1.0,  dthin(07)=0,
   dfile(08)='prepbufr',  dtype(08)='sst',      dplat(08)=' ',       dsis(08)='sst',                dval(08)=1.0,  dthin(08)=0,
   dfile(09)='prepbufr',  dtype(09)='pw',       dplat(09)=' ',       dsis(09)='pw',                 dval(09)=1.0,  dthin(09)=0,
   dfile(10)='gpsrobufr', dtype(10)='gps_ref',  dplat(10)=' ',       dsis(10)='gps',                dval(10)=1.0,  dthin(10)=0,
   dfile(11)='ssmirrbufr',dtype(11)='pcp_ssmi', dplat(11)='dmsp',    dsis(11)='pcp_ssmi',           dval(11)=1.0,  dthin(11)=-1,
   dfile(12)='tmirrbufr', dtype(12)='pcp_tmi',  dplat(12)='trmm',    dsis(12)='pcp_tmi',            dval(12)=1.0,  dthin(12)=-1,
   dfile(13)='sbuvbufr',  dtype(13)='sbuv2',    dplat(13)='n16',     dsis(13)='sbuv8_n16',          dval(13)=1.0,  dthin(13)=0,
   dfile(14)='sbuvbufr',  dtype(14)='sbuv2',    dplat(14)='n17',     dsis(14)='sbuv8_n17',          dval(14)=1.0,  dthin(14)=0,
   dfile(15)='sbuvbufr',  dtype(15)='sbuv2',    dplat(15)='n18',     dsis(15)='sbuv8_n18',          dval(15)=1.0,  dthin(15)=0,
   dfile(16)='omibufr',   dtype(16)='omi',      dplat(16)='aura',    dsis(16)='omi_aura',           dval(16)=1.0,  dthin(16)=6,
   dfile(17)='hirs2bufr', dtype(17)='hirs2',    dplat(17)='n14',     dsis(17)='hirs2_n14',          dval(17)=6.0,  dthin(17)=1,
   dfile(18)='hirs3bufr', dtype(18)='hirs3',    dplat(18)='n16',     dsis(18)='hirs3_n16',          dval(18)=0.0,  dthin(18)=1,
   dfile(19)='hirs3bufr', dtype(19)='hirs3',    dplat(19)='n17',     dsis(19)='hirs3_n17',          dval(19)=6.0,  dthin(19)=1,
   dfile(20)='hirs4bufr', dtype(20)='hirs4',    dplat(20)='n18',     dsis(20)='hirs4_n18',          dval(20)=0.0,  dthin(20)=1,
   dfile(21)='hirs4bufr', dtype(21)='hirs4',    dplat(21)='metop-a', dsis(21)='hirs4_metop-a',      dval(21)=6.0,  dthin(21)=1,
   dfile(22)='gsndrbufr', dtype(22)='sndr',     dplat(22)='g11',     dsis(22)='sndr_g11',           dval(22)=0.0,  dthin(22)=1,
   dfile(23)='gsndrbufr', dtype(23)='sndr',     dplat(23)='g12',     dsis(23)='sndr_g12',           dval(23)=0.0,  dthin(23)=1,
   dfile(24)='gimgrbufr', dtype(24)='goes_img', dplat(24)='g11',     dsis(24)='imgr_g11',           dval(24)=0.0,  dthin(24)=1,
   dfile(25)='gimgrbufr', dtype(25)='goes_img', dplat(25)='g12',     dsis(25)='imgr_g12',           dval(25)=0.0,  dthin(25)=1,
   dfile(26)='airsbufr',  dtype(26)='airs',     dplat(26)='aqua',    dsis(26)='airs281subset_aqua', dval(26)=20.0, dthin(26)=1,
   dfile(27)='msubufr',   dtype(27)='msu',      dplat(27)='n14',     dsis(27)='msu_n14',            dval(27)=2.0,  dthin(27)=2,
   dfile(28)='amsuabufr', dtype(28)='amsua',    dplat(28)='n15',     dsis(28)='amsua_n15',          dval(28)=10.0, dthin(28)=2,
   dfile(29)='amsuabufr', dtype(29)='amsua',    dplat(29)='n16',     dsis(29)='amsua_n16',          dval(29)=0.0,  dthin(29)=2,
   dfile(30)='amsuabufr', dtype(30)='amsua',    dplat(30)='n17',     dsis(30)='amsua_n17',          dval(30)=0.0,  dthin(30)=2,
   dfile(31)='amsuabufr', dtype(31)='amsua',    dplat(31)='n18',     dsis(31)='amsua_n18',          dval(31)=10.0, dthin(31)=2,
   dfile(32)='amsuabufr', dtype(32)='amsua',    dplat(32)='metop-a', dsis(32)='amsua_metop-a',      dval(32)=10.0, dthin(32)=2,
   dfile(33)='airsbufr',  dtype(33)='amsua',    dplat(33)='aqua',    dsis(33)='amsua_aqua',         dval(33)=5.0,  dthin(33)=2,
   dfile(34)='amsubbufr', dtype(34)='amsub',    dplat(34)='n15',     dsis(34)='amsub_n15',          dval(34)=3.0,  dthin(34)=3,
   dfile(35)='amsubbufr', dtype(35)='amsub',    dplat(35)='n16',     dsis(35)='amsub_n16',          dval(35)=3.0,  dthin(35)=3,
   dfile(36)='amsubbufr', dtype(36)='amsub',    dplat(36)='n17',     dsis(36)='amsub_n17',          dval(36)=3.0,  dthin(36)=3,
   dfile(37)='mhsbufr',   dtype(37)='mhs',      dplat(37)='n18',     dsis(37)='mhs_n18',            dval(37)=3.0,  dthin(37)=3,
   dfile(38)='mhsbufr',   dtype(38)='mhs',      dplat(38)='metop-a', dsis(38)='mhs_metop-a',        dval(38)=3.0,  dthin(38)=3,
   dfile(39)='ssmitbufr', dtype(39)='ssmi',     dplat(39)='f13',     dsis(39)='ssmi_f13',           dval(39)=0.0,  dthin(39)=4,
   dfile(40)='ssmitbufr', dtype(40)='ssmi',     dplat(40)='f14',     dsis(40)='ssmi_f14',           dval(40)=0.0,  dthin(40)=4,
   dfile(41)='ssmitbufr', dtype(41)='ssmi',     dplat(41)='f15',     dsis(41)='ssmi_f15',           dval(41)=0.0,  dthin(41)=4,

35 Running GSI dfile(42)='amsrebufr', dtype(42)='amsre_low', dplat(42)='aqua', dsis(42)='amsre_aqua', dval(42)=0.0, dthin(42)=4, dfile(43)='amsrebufr', dtype(43)='amsre_mid', dplat(43)='aqua', dsis(43)='amsre_aqua', dval(43)=0.0, dthin(43)=4, dfile(44)='amsrebufr', dtype(44)='amsre_hig', dplat(44)='aqua', dsis(44)='amsre_aqua' dval(44)=0.0, dthin(44)=4, dfile(45)='ssmisbufr', dtype(45)='ssmis', dplat(45)='f16', dsis(45)='ssmis_f16', dval(45)=0.0, dthin(45)=4, dfile(46)='gsnd1bufr', dtype(46)='sndrd1', dplat(46)='g12', dsis(46)='sndrd1_g12', dval(46)=1.5, dthin(46)=5, dfile(47)='gsnd1bufr', dtype(47)='sndrd2', dplat(47)='g12', dsis(47)='sndrd2_g12', dval(47)=1.5, dthin(47)=5, dfile(48)='gsnd1bufr', dtype(48)='sndrd3', dplat(48)='g12', dsis(48)='sndrd3_g12', dval(48)=1.5, dthin(48)=5, dfile(49)='gsnd1bufr', dtype(49)='sndrd4', dplat(49)='g12', dsis(49)='sndrd4_g12', dval(49)=1.5, dthin(49)=5, dfile(50)='gsnd1bufr', dtype(50)='sndrd1', dplat(50)='g11', dsis(50)='sndrd1_g11', dval(50)=1.5, dthin(50)=5, dfile(51)='gsnd1bufr', dtype(51)='sndrd2', dplat(51)='g11', dsis(51)='sndrd2_g11', dval(51)=1.5, dthin(51)=5, dfile(52)='gsnd1bufr', dtype(52)='sndrd3', dplat(52)='g11', dsis(52)='sndrd3_g11', dval(52)=1.5, dthin(52)=5, dfile(53)='gsnd1bufr', dtype(53)='sndrd4', dplat(53)='g11', dsis(53)='sndrd4_g11', dval(53)=1.5, dthin(53)=5, dfile(54)='gsnd1bufr', dtype(54)='sndrd1', dplat(54)='g13', dsis(54)='sndrd1_g13', dval(54)=1.5, dthin(54)=5, dfile(55)='gsnd1bufr', dtype(55)='sndrd2', dplat(55)='g13', dsis(55)='sndrd2_g13', dval(55)=1.5, dthin(55)=5, dfile(56)='gsnd1bufr', dtype(56)='sndrd3', dplat(56)='g13', dsis(56)='sndrd3_g13', dval(56)=1.5, dthin(56)=5, dfile(57)='gsnd1bufr', dtype(57)='sndrd4', dplat(57)='g13', dsis(57)='sndrd4_g13', dval(57)=1.5, dthin(57)=5, dfile(58)='iasibufr', dtype(58)='iasi', dplat(58)='metop-a', dsis(58)='iasi586_metop-a', dval(58)=20.0, dthin(58)=1, dfile(59)='gomebufr', dtype(59)='gome', dplat(59)='metop-a', dsis(59)='gome_metop-a', 
dval(59)=1.0, dthin(59)=6, dfile(60)='sbuvbufr', dtype(60)='sbuv2', dplat(60)='n19', dsis(60)='sbuv8_n19', dval(60)=1.0, dthin(60)=0, dfile(61)='hirs4bufr', dtype(61)='hirs4', dplat(61)='n19', dsis(61)='hirs4_n19', dval(61)=6.0, dthin(61)=1, dfile(62)='amsuabufr', dtype(62)='amsua', dplat(62)='n19', dsis(62)='amsua_n19', dval(62)=10.0, dthin(62)=2, dfile(63)='mhsbufr', dtype(63)='mhs', dplat(63)='n19', dsis(63)='mhs_n19', dval(63)=3.0, dthin(63)=3, dfile(64)='tcvitl' dtype(64)='tcp', dplat(64)=' ', dsis(64)='tcp', dval(64)=1.0, dthin(64)=0, dfile(65)='modisbufr', dtype(65)='modis', dplat(65)='aqua', dsis(65)='modis_aqua', dval(65)=1.0, dthin(65)=6, dfile(66)='modisbufr', dtype(66)='modis', dplat(66)='terra', dsis(66)='modis_terra', dval(66)=1.0, dthin(66)=6, dfile(67)='mlsbufr', dtype(67)='mls', dplat(67)='aura', dsis(67)='mls_aura', dval(67)=1.0, dthin(67)=0, / &SUPEROB_RADAR del_azimuth=5.,del_elev=.25,del_range=5000.,del_time=.5,elev_angle_max=5., minnum=50,range_max= , l2superob_only=.false., / &LAG_DATA / &HYBRID_ENSEMBLE l_hyb_ens=.false., / &RAPIDREFRESH_CLDSURF l_cloud_analysis=.false., / &CHEM / &SINGLEOB_TEST maginnov=1.0,magoberr=0.8,oneob_type='t', oblat=38.,oblon=279.,obpres=500.,obdattim=${anal_time}, obhourset=0., / EOF Note: EOF indicates the end of GSI namelist. The following block runs GSI and checks if GSI has successfully completed. # # ################################################### # run GSI ################################################### echo ' Run GSI with' ${bk_core} 'background' case $ARCH in 'IBM_LSF' 'IBM_LoadLevel') ${RUN_COMMAND}./gsi.exe < gsiparm.anl > stdout 2>&1 ;; 30

   * )
      ${RUN_COMMAND} ./gsi.exe > stdout 2>&1 ;;
esac

##################################################################
#  run time error check
##################################################################
error=$?

if [ ${error} -ne 0 ]; then
  echo "ERROR: ${GSI} crashed  Exit status=${error}"
  exit ${error}
fi

The following block saves the analysis results with understandable names and appends the analysis time to some output file names. Among them, stdout contains the runtime output of GSI and wrf_inout contains the analysis result.

#
##################################################################
#
#   GSI updating satbias_in (only for cycling assimilation)

# Copy the output to more understandable names
cp stdout      stdout.anl.${anal_time}
cp wrf_inout   wrfanl.${anal_time}
ln fort.201    fit_p1.${anal_time}
ln fort.202    fit_w1.${anal_time}
ln fort.203    fit_t1.${anal_time}
ln fort.204    fit_q1.${anal_time}
ln fort.207    fit_rad1.${anal_time}

The following block collects the diagnostic files. The diagnostic files are merged and categorized by outer loop and data type. Setting write_diag to true directs GSI to write out diagnostic information for each observation station. This information is very useful for checking analysis details. Please check Appendix A.2 for the tool to read and analyze these diagnostic files.

# Loop over first and last outer loops to generate innovation
# diagnostic files for indicated observation types (groups)
#
#   NOTE:  Since we set miter=2 in the GSI namelist SETUP, outer
#          loop 03 will contain innovations with respect to
#          the analysis.  Creation of o-a innovation files
#          is triggered by write_diag(3)=.true.  The setting
#          write_diag(1)=.true. turns on creation of o-g
#          innovation files.
#

ls -l pe0*.* > listpe
loops="01 03"
for loop in $loops; do

case $loop in
  01) string=ges;;
  03) string=anl;;
   *) string=$loop;;
esac

#  Collect diagnostic files for obs types (groups) below
listall="conv amsua_metop-a mhs_metop-a hirs4_metop-a hirs2_n14 msu_n14 \
       sndr_g08 sndr_g10 sndr_g12 sndr_g08_prep sndr_g10_prep sndr_g12_prep \
       sndrd1_g08 sndrd2_g08 sndrd3_g08 sndrd4_g08 sndrd1_g10 sndrd2_g10 \
       sndrd3_g10 sndrd4_g10 sndrd1_g12 sndrd2_g12 sndrd3_g12 sndrd4_g12 \
       hirs3_n15 hirs3_n16 hirs3_n17 amsua_n15 amsua_n16 amsua_n17 \
       amsub_n15 amsub_n16 amsub_n17 hsb_aqua airs_aqua amsua_aqua \
       goes_img_g08 goes_img_g10 goes_img_g11 goes_img_g12 \
       pcp_ssmi_dmsp pcp_tmi_trmm sbuv2_n16 sbuv2_n17 sbuv2_n18 \
       omi_aura ssmi_f13 ssmi_f14 ssmi_f15 hirs4_n18 amsua_n18 mhs_n18 \
       amsre_low_aqua amsre_mid_aqua amsre_hig_aqua ssmis_las_f16 \
       ssmis_uas_f16 ssmis_img_f16 ssmis_env_f16"

for type in $listall; do
   count=`grep ${type}_${loop} listpe | wc -l`
   if [[ $count -gt 0 ]]; then
      cat pe*${type}_${loop}* > diag_${type}_${string}.${anal_time}
   fi
done
done

The following script cleans the temporary files:

# Clean working directory to save only important files
ls -l * > list_run_directory
if [ ${if_clean} = clean ]; then
  echo ' Clean working directory after GSI run'
  rm -f *Coeff.bin     # all CRTM coefficient files
  rm -f pe0*           # diag files on each processor
  rm -f listpe         # list of diag files on each processor
  rm -f obs_input.*    # observation scratch files
  rm -f siganl sigf03  # background scratch files
  rm -f xhatsave.*     # some information on each processor
  rm -f fsize_*        # temporary file for bufr size
fi

If GSI finishes successfully, the script exits with 0:

exit 0

Introduction to the Most Often Used GSI Namelist Options

The complete namelist options and their explanations are listed in Appendix B. For most GSI analysis applications, only a few namelist variables need to be changed. Here we introduce the most often used variables for regional analyses:

1. Set up the number of outer loops and inner loop iterations

To change the number of outer loops and the number of inner iterations in each outer loop, you need to modify the following three variables in the namelist:

miter: number of outer loops of the analysis.
niter(1): maximum number of inner loop iterations for the 1st outer loop. The inner loop will stop when it reaches this maximum number, when it reaches the convergence condition, or when it fails to converge.
niter(2): maximum number of inner loop iterations for the 2nd outer loop.

2. Set up the analysis variable for moisture

If you want to change the moisture variable used in the analysis, here is the namelist variable:

qoption = 1 or 2: If qoption=1, the moisture analysis variable is pseudo-relative humidity. The saturation specific humidity, qsatg, is computed from the guess and held constant during the inner loop. Thus, the RH control variable can only change via changes in specific humidity, q. If qoption=2, the moisture analysis variable is normalized RH. This formulation allows RH to change in the inner loop via changes to surface pressure, temperature, or specific humidity.

3. Set up the background file

The following four variables define which background field will be used in the GSI analysis:

regional: if true, perform a regional GSI run using either ARW or NMM inputs as the background. If false, perform a global GSI analysis.
wrf_nmm_regional: if true, the background comes from WRF NMM. When using other background fields, set it to false.
wrf_mass_regional: if true, the background comes from WRF ARW. When using other background fields, set it to false.
netcdf: if true, WRF files are in NetCDF format; otherwise WRF files are in binary format. This option only works for a regional GSI analysis.

4. Set up the output of diagnostic files

The following variables decide whether GSI will write out diagnostic results in certain loops:

write_diag(1): if true, write out diagnostic data at the beginning of the analysis, so that we can have information on Observation - Background (O-B).
write_diag(2): if true, write out diagnostic data between the 1st and 2nd outer loops.

write_diag(3): if true, write out diagnostic data at the end of the 2nd outer loop (the analysis, if the number of outer loops is 2), so that we can have information on Observation - Analysis (O-A).

Please check Appendix A.2 for the tools to read the diagnostic files.

5. Set up the link to the observation files

The following variables link the observation files to the GSI observation IO:

ndat: number of observation variables (not observation types). This number should be consistent with the number of observation variable lines in section OBS_INPUT. So if adding a new observation variable, ndat must be incremented by one and one new line added in OBS_INPUT. Based on the dimensions of the variables in OBS_INPUT (e.g., dfile), the maximum value of ndat is 200 in this version.
dfile(01)='prepbufr': observation file name. The observation file contains observations used for a GSI analysis. This file can include several observation variables from different observation types. The file name is a GSI default and cannot be changed.
dtype(01)='ps': analysis variable name that GSI can read in and handle. In this example, GSI will read all ps observations from the file prepbufr. Please note this name should be consistent with that used in the GSI code.
dplat(01): sets up the observation platform inside the file dfile.
dsis(01): sets up the data name (including both data type and platform name) used inside GSI. Please see Section 4.3 for examples.

6. Set up the observation time window

In the namelist section OBS_INPUT, use time_window_max to set the maximum half time window (hours) for all data types. In the convinfo file, you can use the column twindow to set the half time window for a certain data type (hours). For conventional observations, only observations within the smaller of these two windows will be kept for further processing. For other observations, observations within time_window_max will be kept for further processing.

7. Set up data thinning

1) Radiance data thinning

Radiance data thinning is controlled through two GSI namelist variables in the section &OBS_INPUT. Below is an example of the section:

 &OBS_INPUT
   dmesh(1)=120.0,dmesh(2)=60.0,dmesh(3)=60.0,dmesh(4)=60.0,dmesh(5)=120,time_window_max=1.5,
   dfile(01)='prepbufr',  dtype(01)='ps',       dplat(01)=' ',    dsis(01)='ps',        dval(01)=1.0, dthin(01)=0,
   dfile(11)='ssmirrbufr',dtype(11)='pcp_ssmi', dplat(11)='dmsp', dsis(11)='pcp_ssmi',  dval(11)=1.0, dthin(11)=-1,
   dfile(12)='tmirrbufr', dtype(12)='pcp_tmi',  dplat(12)='trmm', dsis(12)='pcp_tmi',   dval(12)=1.0, dthin(12)=-1,
   dfile(13)='sbuvbufr',  dtype(13)='sbuv2',    dplat(13)='n16',  dsis(13)='sbuv8_n16', dval(13)=1.0, dthin(13)=0,
   dfile(14)='sbuvbufr',  dtype(14)='sbuv2',    dplat(14)='n17',  dsis(14)='sbuv8_n17', dval(14)=1.0, dthin(14)=0,
   dfile(15)='sbuvbufr',  dtype(15)='sbuv2',    dplat(15)='n18',  dsis(15)='sbuv8_n18', dval(15)=1.0, dthin(15)=0,
   dfile(17)='hirs2bufr', dtype(17)='hirs2',    dplat(17)='n14',  dsis(17)='hirs2_n14', dval(17)=6.0, dthin(17)=1,
   dfile(18)='hirs3bufr', dtype(18)='hirs3',    dplat(18)='n16',  dsis(18)='hirs3_n16', dval(18)=0.0, dthin(18)=1,
   dfile(19)='hirs3bufr', dtype(19)='hirs3',    dplat(19)='n17',  dsis(19)='hirs3_n17', dval(19)=6.0, dthin(19)=1,
   dfile(20)='hirs4bufr', dtype(20)='hirs4',    dplat(20)='n18',  dsis(20)='hirs4_n18', dval(20)=0.0, dthin(20)=1,
   dfile(27)='msubufr',   dtype(27)='msu',      dplat(27)='n14',  dsis(27)='msu_n14',   dval(27)=2.0, dthin(27)=2,
   dfile(28)='amsuabufr', dtype(28)='amsua',    dplat(28)='n15',  dsis(28)='amsua_n15', dval(28)=10.0,dthin(28)=2,
   dfile(34)='amsubbufr', dtype(34)='amsub',    dplat(34)='n15',  dsis(34)='amsub_n15', dval(34)=3.0, dthin(34)=3,
   dfile(35)='amsubbufr', dtype(35)='amsub',    dplat(35)='n16',  dsis(35)='amsub_n16', dval(35)=3.0, dthin(35)=3,
   dfile(41)='ssmitbufr', dtype(41)='ssmi',     dplat(41)='f15',  dsis(41)='ssmi_f15',  dval(41)=0.0, dthin(41)=4,

The two namelist variables that control the radiance data thinning are the real array dmesh in the first line and the integer array dthin in the last column.
The dmesh array gives a set of mesh sizes (in km) for the radiance thinning grids, while dthin defines whether the corresponding data type needs to be thinned and which thinning grid (mesh size) to use. If the value of dthin is:

an integer less than or equal to 0: no thinning is done
an integer larger than 0: this radiance data type will be thinned on a thinning grid with the mesh size defined by dmesh(dthin)

The following gives several thinning examples defined by the above sample &OBS_INPUT section:

Data type 1 (ps in prepbufr): no thinning because dthin(01)=0
Data type 11 (pcp_ssmi from dmsp): no thinning because dthin(11)=-1
Data type 17 (hirs2 from NOAA-14): thinning on a 120 km grid because dthin(17)=1 and dmesh(1)=120
Data type 41 (ssmi from f15): thinning on a 60 km grid because dthin(41)=4 and dmesh(4)=60

2) Conventional data thinning

The conventional data can also be thinned. However, the setup of thinning is not in the namelist. To give users a complete picture of data thinning, we briefly introduce conventional data thinning here. There are three columns, ithin, rmesh, and pmesh, in the convinfo file to configure conventional data thinning:

ithin: 0 = no thinning; 1 = thinning with a grid mesh decided by rmesh and pmesh
rmesh: horizontal thinning grid size in km
pmesh: vertical thinning grid size in mb; if 0, then use the background vertical grid.
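The dmesh/dthin rule above can be sketched in shell. This is an illustrative helper only (not part of the released run scripts); the function name thinning_mesh is hypothetical, and the mesh values are copied from the sample &OBS_INPUT section:

```shell
#!/bin/sh
# Illustrative sketch only: apply the dmesh/dthin selection rule described
# above. dmesh values (km) copied from the sample &OBS_INPUT section.
dmesh_1=120.0; dmesh_2=60.0; dmesh_3=60.0; dmesh_4=60.0; dmesh_5=120.0

thinning_mesh () {
  # $1 = dthin value of an OBS_INPUT entry
  if [ "$1" -le 0 ]; then
    # dthin <= 0: no thinning for this data type
    echo "no thinning"
  else
    # dthin > 0: thin on the grid whose mesh size is dmesh(dthin)
    eval echo "thinned on a \$dmesh_$1 km grid"
  fi
}

thinning_mesh 0     # ps in prepbufr, dthin(01)=0
thinning_mesh -1    # pcp_ssmi from dmsp, dthin(11)=-1
thinning_mesh 1     # hirs2 from NOAA-14, dthin(17)=1
thinning_mesh 4     # ssmi from f15, dthin(41)=4
```

Running the four example calls reproduces the four thinning decisions listed above.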

8. Set up the background error factor

In the namelist section BKGERR, use vs to set up the scale factor for the vertical correlation length and hzscl to set up the scale factors for horizontal smoothing. The scale factors for the variance of each analysis variable are set in the anavinfo file.

9. Single observation test

To do a single observation test, the following namelist option has to be set to true first:

oneobtest=.true.

Then go to the namelist section SINGLEOB_TEST to set up the single observation location and variable to be tested. Please see Section 4.2 for an example and details on the single observation test.

3.4 Files in GSI Run Directory

When completed, GSI will create a number of files in the run directory. Below is an example of the files generated in the run directory from one of the GSI test case runs. This case was run to perform a regional GSI analysis with a WRF ARW NetCDF background using both conventional (prepbufr) and radiance data (AMSU-A, AMSU-B, HIRS4). The analysis time is 12Z 22 March. Four processors were used. To make the run directory more readable, we turned on the clean option in the run script, which deleted all temporary intermediate files:

amsuabufr                 fit_rad       gsi.exe
amsubbufr                 fit_t         gsiparm.anl
anavinfo                  fit_w         hirs4bufr
berror_stats              fort.201      l2rwbufr
convinfo                  fort.202      list_run_directory
diag_amsua_metop-a_anl    fort.203      ozinfo
diag_amsua_metop-a_ges    fort.204      pcpbias_out
diag_amsua_n15_anl        fort.205      pcpinfo
diag_amsua_n15_ges        fort.206      prepbufr
diag_amsua_n18_anl        fort.207      prepobs_prep.bufrtable
diag_amsua_n18_ges        fort.208      satbias_angle
diag_amsub_n17_anl        fort.209      satbias_in
diag_amsub_n17_ges        fort.210      satbias_out
diag_conv_anl             fort.211      satinfo
diag_conv_ges             fort.212      stdout
diag_hirs4_metop-a_anl    fort.213      stdout.anl
diag_hirs4_metop-a_ges    fort.214      wrf_inout
errtable                  fort.215      wrfanl
fit_p                     fort.217
fit_q                     fort.220

There are several important files that hold the GSI analysis results and diagnostic information. We will introduce these files and their contents in detail in the following chapter. Here is a brief list of what these files contain:

stdout.anl (stdout): standard text output file, which is a copy of stdout with the analysis time appended. It is the most often used file

to check GSI analysis processes as well as basic and important information about the analyses. We will explain the content of stdout in Section 4.1, and users are encouraged to read this file in detail to get an idea of the GSI process order.
wrfanl (wrf_inout): analysis results, written only if GSI completes successfully and a WRF file is used as the background. It is a copy of wrf_inout with the analysis time appended. The format is the same as that of the background file.
diag_conv_anl.(time): diagnostic files for conventional and GPS RO observations at the final analysis step (analysis departure for each observation; please see Appendix A.2 for more information).
diag_conv_ges.(time): diagnostic files for conventional and GPS RO observations at the initial analysis step (background departure for each observation; please see Appendix A.2 for more information).
diag_(instrument_satellite)_anl: diagnostic files for satellite observations at the final analysis step.
diag_(instrument_satellite)_ges: diagnostic files for satellite observations at the initial analysis step.
gsiparm.anl: namelist, generated by the run script.
fit_(variable).(time): copies of fort.2?? with meaningful names (variable name and analysis time). They are statistical results of observation departures from the background or the analysis, according to observation variable. Please see Section 4.5 for more details.
fort.220: output from the inner loop minimization (pcgsoi.f90). Please see Section 4.6 for details.
*info: info files that control data usage. Please see Section 4.3 for details.
berror_stats and errtable: background error and observation error files.
list_run_directory: the complete list of files in the run directory before cleaning.

Please note that diag_(instrument_satellite)_anl.(time) and diag_conv_anl.(time) contain important information such as the analysis increment for each observation (O-A); also, diag_conv_ges and diag_(instrument_satellite)_ges include the observation innovation for each observation (O-B). These files can be very helpful for understanding the detailed impact of data on the analysis. A tool is provided to process these files, which is introduced in Appendix A.2.

There are many intermediate files in this directory during the running stage, and the complete list of files before cleaning is saved in the file list_run_directory. A little knowledge about the content of these files is very helpful for debugging when a GSI run crashes. Please check the following list for the meaning of these files:

(Note: you may not see all files in the list because different observational data are used. Also, the fixed files prepared for a GSI run, such as the CRTM coefficient and background error statistics files, are not included.)

sigf03: background files in binary format (typically sigf03, sigf06 and sigf09). These are temporary files holding the binary background fields. When you see this file, at least the background file has been read in successfully.
siganl: analysis results in binary format. When this file exists, the analysis part has finished.
*bufr: observation BUFR files.
pe????.(conv or instrument_satellite)_(outer loop): diagnostic files for conventional and satellite observations at each outer loop and on each sub-domain (???? = subdomain id).
obs_input.????: observation scratch files (one file holds one observation type within the analysis domain and time; ???? = observation type id in the namelist).
pcpbias_out: output precipitation bias correction file.
satbias_out: output satellite bias correction file.
xhatsave.(outer loop).????: control vector for a certain subdomain and outer loop (???? = subdomain id).
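Because sigf03 appears once the background has been read in and siganl appears once the minimization has finished, a quick progress check of a run directory can be sketched as below. This is an illustrative helper only (not part of the released scripts); gsi_progress and RUNDIR are hypothetical names:

```shell
#!/bin/sh
# Illustrative sketch only: infer how far a GSI run progressed from the
# intermediate files described in the list above.
RUNDIR=${RUNDIR:-.}   # assumed variable pointing at the GSI run directory

gsi_progress () {
  if [ -f "$RUNDIR/siganl" ]; then
    echo "analysis finished (siganl exists)"
  elif [ -f "$RUNDIR/sigf03" ]; then
    echo "background read in, analysis not finished (sigf03 exists)"
  else
    echo "background not read in yet, check stdout"
  fi
}

gsi_progress
```

This only tells you which stage a crashed run reached; the reason for the crash still has to come from stdout.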

Chapter 4: GSI Diagnostics and Tuning

The purpose of this chapter is to help users understand and diagnose the GSI analysis. The chapter starts with an introduction to the content and structure of the GSI standard output (stdout). It continues with the use of a single observation to check the features of the GSI analysis, and with introductions to observation usage control, analysis domain partition, fit files, and the optimization process information in the GSI standard and other output files.

4.1 Understanding Standard Output (stdout)

In Section 3.4, we listed the files in the GSI run directory following a successful GSI analysis and briefly introduced the contents of several important files. Of these, stdout is the most useful, because the user can obtain the most critical information about the GSI analysis from this file. From stdout, users can find out whether the GSI has completed successfully, whether the optimal iterations look good, and whether the background and analysis fields are reasonable. The stdout file contains the GSI standard output from all processors. Understanding the content of this file can also be very helpful for finding where and why the GSI failed when it crashes.

The structure of stdout shows the typical steps in a meteorological data analysis system:

1. Read in all data and prepare the analysis:
   a. Read in configuration (namelist)
   b. Read in background
   c. Partition domain and data for parallel analysis
   d. Read in observation(s)
   e. Read in constant fields (fixed files)
2. Optimal iteration (analysis)
3. Save analysis result

In this section, the detailed structure and content of stdout are explained using an on-line example case. This case uses a WRF-ARW NetCDF file as the background, and analyzes several observations typical of operations, including most of the conventional observation data and several radiance data types (AMSU-A, AMSU-B, HIRS3). The case was run on the NCAR IBM supercomputer, using 4 processors.

To keep the output concise and more readable, most repeated content was deleted (shown by the blue dotted line). For the same reason, the precision of some numbers has been reduced to avoid line breaks in stdout.

This is the start of the GSI analysis. It shows some run time information of the analysis and the control variables.

0:
0:
0: *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *.
0:  PROGRAM GSI_ANL HAS BEGUN. COMPILED  ORG: NP23
0:  STARTING DATE-TIME  MAR 29, :44: TUE :
0:
0:
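A quick way to confirm that a run at least reached this point is to search stdout for the start banner shown above. This is an illustrative helper only (check_gsi_started is a hypothetical name; the banner string is taken from the stdout excerpt above):

```shell
#!/bin/sh
# Illustrative sketch only: confirm GSI started by searching the stdout
# file written by the run script for the program start banner.
check_gsi_started () {
  if grep -q "PROGRAM GSI_ANL HAS BEGUN" "${1:-stdout}"; then
    echo "GSI started"
  else
    echo "GSI banner not found"
  fi
}

check_gsi_started stdout
```

If the banner is missing, the problem usually happened before GSI itself ran (job submission, executable path, or background file links), so check the job log rather than the GSI diagnostics.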

0: state_vectors*init_anasv: 2D-STATE VARIABLES ps sst
0: state_vectors*init_anasv: 3D-STATE VARIABLES u v tv tsen q oz cw p3d
0: state_vectors*init_anasv: ALL STATE VARIABLES u v tv tsen q oz cw p3d ps sst
0: control_vectors*init_anacv: 2D-CONTROL VARIABLES ARE ps sst
0: control_vectors*init_anacv: 3D-CONTROL VARIABLES ARE sf vp t q oz cw
0: control_vectors*init_anacv: MOTLEY CONTROL VARIABLES stl sti
0: control_vectors*init_anacv: ALL CONTROL VARIABLES sf vp ps t q oz sst cw stl sti
0: INIT_IO: reserve units lendian_in= 15 and lendian_out= 66 for little endian i/o

Next is the content of all namelist variables used in this analysis. The first part shows the 4DVAR setup. Please note that while this version of the GSI includes some 4DVAR code, it is untested in this release. The general setup for the GSI analysis (3DVAR) starts after the &SETUP label. Please check Appendix B for the definitions and default values of each namelist variable.

0: SETUP_4DVAR: l4dvar= F
0: SETUP_4DVAR: winlen=
0: SETUP_4DVAR: winoff=
0: SETUP_4DVAR: hr_obsbin=
0: SETUP_4DVAR: nobs_bins= 1
0: SETUP_4DVAR: nsubwin,nhr_subwin= 1 3
0: SETUP_4DVAR: lsqrtb= F
0: SETUP_4DVAR: lcongrad= F
0: SETUP_4DVAR: lbfgsmin= F
0: SETUP_4DVAR: ltlint= F
0: SETUP_4DVAR: ladtest,lgrtest= F F
0: SETUP_4DVAR: lwrtinc= F
0: SETUP_4DVAR: lanczosave= F
0: SETUP_4DVAR: jsiga= -1
0: SETUP_4DVAR: nwrvecs= -1
0: GSIMOD: reset time window for one or more OBS_INPUT entries to
0: INIT_OBSMOD_VARS: ndat_times,ndat_types,ndat=
0: INIT_OBSMOD_VARS: nhr_assimilation= 3
0: GSIMOD: ***WARNING*** reset oberrflg= T
0: calling gsisub with following input parameters:
0:
0:
0: &SETUP
0: GENCODE= , FACTQMIN= E+00, FACTQMAX=
0: E+00, DELTIM= , DTPHYS=
0: , BIASCOR= , BCOPTION=1, DIURNALBC=
0: E+00, NDAT=62, NITER=0, 50, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
0: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NITER_NO_QC= , ,
0: , , , , , , , ,
0: , , , , , , , ,
0: , , , , , , , ,
0: , , , , , , , ,
0: , , , , , , , ,
0: , , , , , , , ,
0: , , , MITER=2, QOPTION=2, NHR_ASSIMILATION=3, MIN_OFFSET=
0: 180, IOUT_ITER=220, NPREDP=6, RETRIEVAL=F, DIAG_RAD=T, DIAG_PCP=T, DIAG_CONV=T,
0: DIAG_OZONE=T, DIAG_AERO=T, DIAG_CO=F, IGUESS=-1, WRITE_DIAG=F, T, F, T, F, F, F
0: F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F
0: F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, ONEOBTEST=F, SFCMODEL=F,
0: DTBDUV_ON=T, IFACT10=0, L_FOTO=F, OFFTIME_DATA=F, NPRED_CONV_MAX=0, ID_BIAS_PS=
0: 0, ID_BIAS_T=0, ID_BIAS_SPD=120, CONV_BIAS_PS= E+00,
0: CONV_BIAS_T= E+00, CONV_BIAS_SPD= E+00,
0: STNDEV_CONV_PS= , STNDEV_CONV_T= ,
0: STNDEV_CONV_SPD= , USE_PBL=F, USE_COMPRESS=F, NSIG_EXT=13,
0: GPSTOP= , PERTURB_OBS=F, PERTURB_FACT= ,
0: OBERROR_TUNE=F, PRESERVE_RESTART_DATE=F, CRTM_COEFFS_PATH=./
0:
0:
0: ,
0: BERROR_STATS=berror_stats
0:

0:
0: , NEWPC4PRED=F, ADP_ANGLEBC=F, ANGORD=0,
0: PASSIVE_BC=F, USE_EDGES=T, LOBSDIAGSAVE=F, L4DVAR=F, LSQRTB=F, LCONGRAD=F,
0: LBFGSMIN=F, LTLINT=F, NHR_OBSBIN=-1, NHR_SUBWIN=3, NWRVECS=-1, LADTEST=F,
0: LGRTEST=F, LOBSKEEP=F, LSENSRECOMPUTE=F, JSIGA=-1, LOBSENSFC=F, LOBSENSJB=F,
0: LOBSENSINCR=F, LOBSENSADJ=F, LOBSENSMIN=F, IOBSCONV=0, IDMODEL=F, LWRTINC=F,
0: JITERSTART=1, JITEREND=1, LOBSERVER=F, LANCZOSAVE=F, LLANCDONE=F, LFERRSCALE=F,
0: PRINT_DIAG_PCG=F, TSENSIBLE=F, LGSCHMIDT=F, LREAD_OBS_SAVE=F, LREAD_OBS_SKIP=F,
0: USE_GFS_OZONE=F, CHECK_GFS_OZONE_DATE=F, REGIONAL_OZONE=F, LWRITE_PREDTERMS=F,
0: LWRITE_PEAKWT=F, USE_GFS_NEMSIO=F
0: /
0: &GRIDOPTS
0: &BKGERR
0: &ANBKGERR
0: &JCOPTS
0: &STRONGOPTS
0: &OBSQC
0: &SUPEROB_RADAR
0: &LAG_DATA
0: &HYBRID_ENSEMBLE
0: &RAPIDREFRESH_CLDSURF
0: &CHEM

Next, the background fields for the analysis are read in, and the maximum and minimum values of the fields at each vertical level are displayed. Here, only part of the variables znu and T is shown, and all other variables read by the GSI are listed only by their variable name in the NetCDF file (rmse_var = ). The maximum and minimum values are useful for a quick verification that the background fields have been read in successfully.

0: dh1 = 1
0: iy,m,d,h,m,s=
: dh1 = 1
0: before rmse var T
0: after rmse var T
0: dh1 = 1
0: rmse_var = T
0: ndim1 = 3
0: ordering = XYZ
0: k,znu(k)=
: k,znu(k)=
: k,znu(k)= E-02
0: k,znu(k)= E-02
0: rmse_var=znw
0: rmse_var=rdx
0: rmse_var=rdy
0: rmse_var=mapfac_m
0: rmse_var=xlat
0: rmse_var=xlong

0: rmse_var=mub
0: rmse_var=mu
0: rmse_var=phb
0: rmse_var=t
0: ordering=xyz
0: WrfType,WRF_REAL=
: ndim1= 3
0: staggering= N/A
0: start_index=
: end_index=
: k,max,min,mid T=
: k,max,min,mid T=
: k,max,min,mid T=
: k,max,min,mid T=
: k,max,min,mid T=
: k,max,min,mid T=
0: rmse_var=qvapor
0: rmse_var=u
0: rmse_var=v
0: rmse_var=landmask
0: rmse_var=seaice
0: rmse_var=sst
0: rmse_var=ivgtyp
0: rmse_var=isltyp
0: rmse_var=vegfra
0: rmse_var=snow
0: rmse_var=u10
0: rmse_var=v10
0: rmse_var=smois
0: rmse_var=tslb
0: rmse_var=tsk

The following shows how the analysis domain is partitioned into sub-domains on each processor; in this example, 4 processors were used (see Section 4.4 for more information):

0: general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1=
: general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1=
: general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1=
: general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1=

Next, information is given on the horizontal dimensions of the sub-domains, and grid-related variables are set:

0: in general_sub2grid_create_info, kbegin=
: in general_sub2grid_create_info, kend=
: INIT_GRID_VARS: number of threads 1
0: INIT_GRID_VARS: for thread 1 jtstart,jtstop =

The analysis and background file times are displayed next. They should be the same:

3: READ_wrf_mass_FILES: analysis date,minutes
: READ_wrf_mass_FILES: sigma guess file, nming E
: GESINFO: Guess date is E+00
2: GESINFO: Analysis date is
: READ_wrf_mass_FILES: sigma fcst files used in analysis :
: READ_wrf_mass_FILES: surface fcst files used in analysis:

Read in radar station information and prepare for radar radial velocity. This case didn't have radar radial velocity data linked. There is a warning about opening the file, but this doesn't impact the rest of the GSI analysis.

0: RADAR_BUFR_READ_ALL: problem opening level 2 bufr file "l2rwbufr"

Read in and show the content of the observation info files (see Section 4.3 for details). Here is part of convinfo:

0: PCPINFO_READ: no pcpbias file. set predxp=0.0
0: READ_CONVINFO: tcp
: READ_CONVINFO: ps E
: READ_CONVINFO: ps E
: READ_CONVINFO: t E
: READ_CONVINFO: t E
: READ_CONVINFO: t E
: CREATE_PCP_RANDOM: rseed,krsize=

Partition the background fields into subdomains for parallel analysis. The background is passed to each subdomain's set of processors for calculation.

0: at 0 in read_wrf_mass_guess
0: at 0.1 in read_wrf_mass_guess
0: at 1 in read_wrf_mass_guess, lm = 50
0: at 1 in read_wrf_mass_guess, num_mass_fields= 214
0: at 1 in read_wrf_mass_guess, nfldsig = 1
0: at 1 in read_wrf_mass_guess, num_all_fields= 214
0: at 1 in read_wrf_mass_guess, npe = 4
0: at 1 in read_wrf_mass_guess, num_loc_groups= 53
0: at 1 in read_wrf_mass_guess, num_all_pad = 216
0: at 1 in read_wrf_mass_guess, num_loc_groups= 54
0: READ_WRF_MASS_GUESS: open lendian_in= 15 to file=sigf03
0: ifld, temp1(im/2,jm/2)= E+05
1: READ_WRF_MASS_GUESS: open lendian_in= 15 to file=sigf03
1: ifld, temp1(im/2,jm/2)= E+04
2: READ_WRF_MASS_GUESS: open lendian_in= 15 to file=sigf03
3: READ_WRF_MASS_GUESS: open lendian_in= 15 to file=sigf03
2: ifld, temp1(im/2,jm/2)= E+03
3: ifld, temp1(im/2,jm/2)= E+03
0: ifld, temp1(im/2,jm/2)= E+03
1: ifld, temp1(im/2,jm/2)= E+03
0: in read_wrf_mass_guess, num_doubtful_sfct_all = 0
0: in read_wrf_mass_guess, num_doubtful_sfct_all = 0

Display information on control variable array allocation:

0: control_vectors: length=
0: control_vectors: currently allocated= 0
0: control_vectors: maximum allocated= 0
0: control_vectors: number of allocates= 0
0: control_vectors: number of deallocates= 0
0: control_vectors: Estimated max memory used= 0.0 Mb

Show the source of the observation errors used in the analysis:

0: CONVERR: using observation errors from user provided table

Read in all the observations. The reading of the observations is distributed over all the processors, with each processor reading at least one type. To speed up the reading of some of the larger datasets, a single type can be read by more than one processor (ntasks). Reset the file status depending on whether the observation time matches the analysis time and on how offtime_date is set. This step also checks for consistency in satellite data files and known types. The observation list is determined by the availability of the observation BUFR files and the settings in the observation info files.

1: read_obs_check: bufr file date is prepbufr ps
0: read_obs_check: bufr file date is prepbufr uv
2: read_obs_check: bufr file date is prepbufr t
3: read_obs_check: bufr file date is prepbufr q
2: read_obs_check: bufr file date is 0 radarbufr rw not used
2: read_obs_check: bufr file date is 0 gpsrobufr gps_ref not used
2: read_obs_check: bufr file date is 0 sbuvbufr sbuv2 not used
2: read_obs_check: bufr file date is 0 hirs3bufr hirs3 not used
2: read_obs_check: bufr file date is 0 gsndrbufr sndr not used
2: read_obs_check: bufr file date is 0 airsbufr airs not used
0: read_obs_check: bufr file date is prepbufr sst
0: read_obs_check: bufr file date is 0 tmirrbufr pcp_tmi not used
0: read_obs_check: bufr file date is 0 omi omi not used
0: data type hirs2_n14 not used in info file -- do not read filehirs2bufr
3: read_obs_check: bufr file date is amsuabufr amsua
3: read_obs_check: bufr file date is amsuabufr amsua
3: read_obs_check: bufr file date is amsubbufr amsub
1: read_obs_check: bufr file date is 0 iasibufr iasi not used
0: READ_OBS: read 36 amsub amsub_n17 using ntasks= : READ_OBS: read 1 ps ps using ntasks= : READ_OBS: read 2 t t using ntasks= : READ_OBS: read 3 q q using ntasks= : READ_OBS: read 4 uv uv using ntasks= : READ_OBS: read
8 sst sst using ntasks= : READ_OBS: read 9 pw pw using ntasks= : READ_OBS: read 21 hirs4 hirs4_metop-a using ntasks= : READ_OBS: read 28 amsua amsua_n15 using ntasks= : READ_OBS: read 31 amsua amsua_n18 using ntasks= : READ_OBS: read 32 amsua amsua_metop-a using ntasks=

Then, display basic statistics for the full horizontal surface fields:

0: GETSFC: enter with nlat_sfc,nlon_sfc= 0 0 and nlat,nlon= : GETSFC: set nlat_sfc,nlon_sfc=
0:================================================================================
0:Status Var Mean Min Max
0:sfcges2 FC E E E+00
0:sfcges2 VTYP E E E+01
0:sfcges2 VFRC E E E-01
0:sfcges2 SRGH E E E+00
0:sfcges2 STMP E E E+02
0:sfcges2 SMST E E E+00
0:sfcges2 SST E E E+02
0:sfcges2 SNOW E E E+02
0:sfcges2 ISLI E E E+00
0:sfcges2 STYP E E E+01
0:================================================================================

Then, loop over all data files to read in the observations:

2: READ_PREPBUFR: messages/reports = 640 / ntread = 1
3: READ_PREPBUFR: messages/reports = 640 / ntread = 1
3: READ_PREPBUFR: time offset is hours.
2:READ_PREPBUFR: file=prepbufr type=ps sis=ps nread= ithin= 0 rmesh= isfcalc= 0 ndata= ntask= 1
3:READ_PREPBUFR: file=prepbufr type=t sis=t nread= ithin= 0 rmesh= isfcalc= 0 ndata= ntask= 1
2: READ_PREPBUFR: messages/reports = 640 / ntread = 1
0:READ_BUFRTOVS: file=amsubbufr type=amsub sis=amsub_n17 nread= ithin= 3 rmesh= isfcalc= 0 ndata= 254 ntask= 2
3: READ_PREPBUFR: messages/reports = 640 / ntread = 1
3:READ_PREPBUFR: file=prepbufr type=pw sis=pw nread= 279 ithin= 0 rmesh= isfcalc= 0 ndata= 279 ntask= 1
2:READ_PREPBUFR: file=prepbufr type=sst sis=sst nread= ithin= 0 rmesh= isfcalc= 0 ndata= 1047 ntask= 1
0: READ_PREPBUFR: messages/reports = 640 / ntread = 1
1: READ_PREPBUFR: messages/reports = 640 / ntread = 6
3:READ_BUFRTOVS: file=amsuabufr type=amsua sis=amsua_metop-a nread= ithin= 2 rmesh= isfcalc= 0 ndata= 1750 ntask= 1
2:READ_BUFRTOVS: file=amsuabufr type=amsua sis=amsua_n18 nread= ithin= 2 rmesh= isfcalc= 0 ndata= ntask= 1
0:READ_PREPBUFR: file=prepbufr type=q sis=q nread= ithin= 0 rmesh= isfcalc= 0 ndata= ntask= 1
1: READ_PREPBUFR: obstype,ictype(nc),rmesh,pflag,nlevp,pmesh=uv :
1: READ_PREPBUFR: obstype,ictype(nc),rmesh,pflag,nlevp,pmesh=uv :
1: READ_PREPBUFR: obstype,ictype(nc),rmesh,pflag,nlevp,pmesh=uv :
1: READ_PREPBUFR: obstype,ictype(nc),rmesh,pflag,nlevp,pmesh=uv :
1: READ_PREPBUFR: obstype,ictype(nc),rmesh,pflag,nlevp,pmesh=uv :
1:READ_PREPBUFR: file=prepbufr type=uv sis=uv nread= ithin= 0 rmesh= isfcalc= 0 ndata= ntask= 1
1:READ_BUFRTOVS: file=amsuabufr type=amsua sis=amsua_n15 nread= ithin= 2 rmesh= isfcalc= 0 ndata= ntask= 1
0:READ_BUFRTOVS: file=hirs4bufr type=hirs4 sis=hirs4_metop-a nread= ithin= 1 rmesh= isfcalc= 0 ndata= 1178 ntask= 1

Next, partition the observations into subdomains.
The observation distribution is summarized below by listing the number of observations of each observed variable in each subdomain (see Section 4.4 for more information):

3:OBS_PARA: ps :OBS_PARA: t :OBS_PARA: q :OBS_PARA: uv :OBS_PARA: sst :OBS_PARA: pw :OBS_PARA: hirs4 metop-a :OBS_PARA: amsua n :OBS_PARA: amsua n :OBS_PARA: amsua metop-a :OBS_PARA: amsub n

Give information on ingesting the background error statistics:

0: m_berror_stats_reg::berror_read_bal_reg(prebal_reg): get balance variables"berror_stats". mype,nsigstat,nlatstat =
0: m_berror_stats_reg::berror_read_wgt_reg(prewgt_reg): read error amplitudes "berror_stats". mype,nsigstat,nlatstat =
0: Assigned default statistics to variable q
0: Assigned default statistics to variable oz

Notice that the output displayed below has repeated entries. This is because the information is written from inside the outer loop. Typically the outer loop is iterated twice.
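The OBS_PARA block can be tallied with a short script to get the total count of each observation type over all subdomains. This is a sketch, not part of GSI; the per-subdomain counts in the example are hypothetical, and it assumes each OBS_PARA line ends with one integer count per subdomain:

```python
def tally_obs_para(lines):
    """Sum the per-subdomain counts in OBS_PARA stdout lines.
    Returns a dict {obs_type: total_count}."""
    totals = {}
    for line in lines:
        if "OBS_PARA:" not in line:
            continue
        fields = line.split("OBS_PARA:", 1)[1].split()
        # Trailing numeric fields are per-subdomain counts; the leading
        # non-numeric fields name the observation type (e.g. "amsua n15").
        counts = []
        while fields and fields[-1].isdigit():
            counts.append(int(fields.pop()))
        name = " ".join(fields)
        totals[name] = totals.get(name, 0) + sum(counts)
    return totals

# Example with hypothetical counts for a 4-subdomain run:
log = ["3:OBS_PARA: ps 620 571 534 570",
       "3:OBS_PARA: amsua n15 0 110 119 112"]
print(tally_obs_para(log))
```

A type whose total is zero in every subdomain was read but not distributed, which often points at a time-window or info-file setting.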

For each outer loop, the work begins with the calculation of the observation innovation. This calculation is done by the subroutine setuprhsall, which sets up the right-hand side (rhs) of the analysis equation. This information is contained in the stdout file and is discussed in the following sections:

Start the first outer analysis loop:

0: GLBSOI: jiter,jiterstart,jiterlast,jiterend=

Read in the CRTM coefficients for the first outer loop, compute the observation innovation, and save the diagnostic information for the first outer loop. In stdout, only information related to radiance data is printed. The complete information can be found in the diagnostic files for each observation (for details see Appendix A.2):

0: INIT_CRTM: crtm_init() on path "./"
0: Read_SpcCoeff_Binary(INFORMATION) : FILE:./amsua_n15.SpcCoeff.bin;
0: SpcCoeff RELEASE.VERSION: 7.03 N_CHANNELS=15
0: amsua_n15 AntCorr RELEASE.VERSION: 1.04 N_FOVS=30 N_CHANNELS=15
0: Read_ODPS_Binary(INFORMATION) : FILE:./amsua_n15.TauCoeff.bin;
0: ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=2 N_ABSORBERS=1 N_CHANNELS=15 N_COEFFS=
0: Read_EmisCoeff_Binary(INFORMATION) : FILE:./EmisCoeff.bin;
0: EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
0: SETUPRAD: write header record for amsua_n to file dir.0000/amsua_n15_
0: INIT_CRTM: crtm_init() on path "./"
0: EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
0: SETUPRAD: write header record for amsua_n to file dir.0000/amsua_n18_
0: GENSTATS_GPS: no profiles to process (nprof_gfs= 0 ), EXIT routine
0: obsdiags: Bytes per element= 91
0: obsdiags: length total, used= : obsdiags: Estimated memory usage= 7.0 Mb

The inner iteration of the first outer loop is discussed in the example below.
In this simple example, the maximum number of iterations is 50.

Print the Jo components (the observation term for each observation type) at the beginning of the inner loop:

0: Begin Jo table outer loop
0: Observation Type Nobs Jo Jo/n
0:surface pressure E :temperature E :wind E :moisture E :radiance E : Nobs Jo Jo/n
0: Jo Global E : End Jo table outer loop

Print the cost function values for each inner iteration (see Section 4.6 for more details):

0: GLBSOI: START pcgsoi jiter= 1
0:pcgsoi: gnorm(1:2),b= E E E+00
0: stprat : stprat E-12

0:Initial cost function = E+05
0:Initial gradient norm = E+03
0: Minimization iteration 0
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E+00
0:pcgsoi: cost,grad,step = E E E-03
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat : stprat E-14
0: Minimization iteration 1
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E-01
0:pcgsoi: cost,grad,step = E E E-03
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat : stprat E-14
0: Minimization iteration 50
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E-04
0:pcgsoi: cost,grad,step = E E E-02
0: Minimization final diagnostics

At the end of the 1st outer loop, print some diagnostics about the guess fields after adding the analysis increment to the guess:

0: pcgsoi: Updating guess
0:================================================================================
0:Status Var Mean Min Max
0:analysis U E E E+01
0:analysis V E E E+01
0:analysis TV E E E+02
0:analysis Q E E E-02
0:analysis TSEN E E E+02
0:analysis OZ E E E+00
0:analysis CW E E E+00
0:analysis DIV E E E+00
0:analysis VOR E E E+00
0:analysis PRSL E E E+02
0:analysis PS E E E+02
0:analysis SST E E E+02
0:analysis radb E E E+01
0:analysis pcpb E E E+00
0:================================================================================

Show some information on the control variable array allocation after cleaning up the major fields at the end of the inner loop.

0: state_vectors: latlon11,latlon1n,latlon1n1,lat2,lon2,nsig= : state_vectors: length= : state_vectors: currently allocated= 0
0: state_vectors: maximum allocated= 2
0: state_vectors: number of allocates= 2
0: state_vectors: number of deallocates= 2
0:state_vectors: Estimated max memory used= Mb
0: Writing control vector to file xhatsave : Norm xhatsave=
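A quick way to verify that the minimization is behaving is to extract the total cost J from the grepcost lines and check that it decreases monotonically across the inner iterations. A sketch, assuming the first number after the '=' is J (the values in the example are hypothetical):

```python
import re

# Matches plain and exponent-form numbers (GSI may print D-exponents).
NUM = re.compile(r"[-+]?\d*\.?\d+(?:[EeDd][-+]?\d+)?")

def cost_series(lines):
    """Pull the total cost J from grepcost stdout lines."""
    costs = []
    for line in lines:
        if "grepcost" in line:
            nums = NUM.findall(line.split("=", 1)[1])
            if nums:
                costs.append(float(nums[0].replace("D", "E").replace("d", "e")))
    return costs

def is_decreasing(costs):
    """True if the cost never increases between successive iterations."""
    return all(b <= a for a, b in zip(costs, costs[1:]))

# Hypothetical values: J should fall with each inner iteration.
log = ["0:grepcost J,Jb,Jo,Jc,Jl = 8.61E+04 0.0 8.61E+04 0.0 0.0",
       "0:grepcost J,Jb,Jo,Jc,Jl = 7.90E+04 1.2E+02 7.89E+04 0.0 0.0"]
print(cost_series(log), is_decreasing(cost_series(log)))
```

An increasing cost, or a gradient reduction that stalls early, usually indicates a problem with the background or observation error settings.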

Now, start the second outer loop.

0: GLBSOI: jiter,jiterstart,jiterlast,jiterend=

Read in the CRTM coefficients from the fix files for the second outer loop and save diagnostic information at the beginning of the second outer loop:

0: INIT_CRTM: crtm_init() on path "./"
0: Read_SpcCoeff_Binary(INFORMATION) : FILE:./amsua_n15.SpcCoeff.bin;
0: SpcCoeff RELEASE.VERSION: 7.03 N_CHANNELS=15
0: amsua_n15 AntCorr RELEASE.VERSION: 1.04 N_FOVS=30 N_CHANNELS=15
0: Read_ODPS_Binary(INFORMATION) : FILE:./amsua_n15.TauCoeff.bin;
0: ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=2 N_ABSORBERS=1 N_CHANNELS=15 N_COEFFS=
0: Read_EmisCoeff_Binary(INFORMATION) : FILE:./EmisCoeff.bin;
0: EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
2: INIT_CRTM: crtm_init() on path "./"
2: Read_SpcCoeff_Binary(INFORMATION) : FILE:./hirs4_metop-a.SpcCoeff.bin;
2: SpcCoeff RELEASE.VERSION: 7.02 N_CHANNELS=19
2: Read_ODPS_Binary(INFORMATION) : FILE:./hirs4_metop-a.TauCoeff.bin;
2: ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=5 N_ABSORBERS=3 N_CHANNELS=19 N_COEFFS=
2: Read_EmisCoeff_Binary(INFORMATION) : FILE:./EmisCoeff.bin;
2: EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
2: INIT_CRTM: crtm_init() on path "./"
2: Read_SpcCoeff_Binary(INFORMATION) : FILE:./amsua_metop-a.SpcCoeff.bin;
2: SpcCoeff RELEASE.VERSION: 7.03 N_CHANNELS=15
2: amsua_metop-a AntCorr RELEASE.VERSION: 1.04 N_FOVS=30 N_CHANNELS=15
2: Read_ODPS_Binary(INFORMATION) : FILE:./amsua_metop-a.TauCoeff.bin;
2: ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=2 N_ABSORBERS=1 N_CHANNELS=15 N_COEFFS=
2: Read_EmisCoeff_Binary(INFORMATION) : FILE:./EmisCoeff.bin;
2: EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
0: INIT_CRTM: crtm_init() on path "./"
0: Read_SpcCoeff_Binary(INFORMATION) : FILE:./amsua_n18.SpcCoeff.bin;
0: SpcCoeff RELEASE.VERSION: 7.03 N_CHANNELS=15
0: amsua_n18 AntCorr RELEASE.VERSION: 1.04 N_FOVS=30 N_CHANNELS=15
0: Read_ODPS_Binary(INFORMATION) : FILE:./amsua_n18.TauCoeff.bin;
0: ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=2 N_ABSORBERS=1 N_CHANNELS=15 N_COEFFS=
0: Read_EmisCoeff_Binary(INFORMATION) : FILE:./EmisCoeff.bin;
0: EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
0: GENSTATS_GPS: no profiles to process (nprof_gfs= 0 ), EXIT routine
0: obsdiags: Bytes per element= 91
0: obsdiags: length total, used= :obsdiags: Estimated memory usage= 7.0 Mb

The output of the inner iterations of the second outer loop is shown below. In this example, the maximum number of iterations is 50.

Print the Jo components (the observation term for each observation type) at the beginning of the inner loop:

0: Begin Jo table outer loop
0: Observation Type Nobs Jo Jo/n
0:surface pressure E :temperature E :wind E :moisture E :radiance E : Nobs Jo Jo/n
0: Jo Global E : End Jo table outer loop
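The Jo/n column in the table above is simply Jo divided by Nobs, an average penalty per observation that can be compared across observation types and between the background and analysis Jo tables. A small illustration of the arithmetic with hypothetical numbers:

```python
# Hypothetical Jo-table rows: (observation type, Nobs, Jo)
rows = [("surface pressure", 747, 5.6e2),
        ("temperature", 1182, 1.1e3),
        ("wind", 7532, 8.4e3)]

for name, nobs, jo in rows:
    # Jo/n should decrease between the background and analysis Jo tables
    # as the analysis draws closer to the observations.
    print(f"{name:20s} Nobs={nobs:6d} Jo={jo:10.3e} Jo/n={jo / nobs:6.3f}")

# The "Jo Global" row is the sum over all types:
jo_global = sum(jo for _, _, jo in rows)
n_global = sum(n for _, n, _ in rows)
print(f"{'Jo Global':20s} Nobs={n_global:6d} Jo={jo_global:10.3e} "
      f"Jo/n={jo_global / n_global:6.3f}")
```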

Print the cost function values for each inner iteration (see Section 4.6 for more details):

0: GLBSOI: START pcgsoi jiter= 2
0:pcgsoi: gnorm(1:2),b= E E E+00
0: stprat : stprat E-13
0:Initial cost function = E+04
0:Initial gradient norm = E+03
0: Minimization iteration 0
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E+00
0:pcgsoi: cost,grad,step = E E E-03
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat : stprat E-15
0: Minimization iteration 1
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E-01
0:pcgsoi: cost,grad,step = E E E-03
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat : stprat E-15
0: Minimization iteration 2
0: Minimization iteration 50
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E-03
0:pcgsoi: cost,grad,step = E E E-02
0: Minimization final diagnostics

Save the analysis results. Again, only part of variable T is shown; all other variables are listed according to their variable names in the NetCDF file (rmse_var = ). The maximum and minimum values are useful for a quick check of the reasonableness of the analysis:

0: pcgsoi: Updating guess
0: at 2 in wrwrfmassa
0: at 3 in wrwrfmassa
0: at 6 in wrwrfmassa
0: iy,m,d,h,m,s= : nlon,lat,sig_regional= : rmse_var=p_top
0: ordering=0
0: WrfType,WRF_REAL= : end_index1= : max,min MUB= : max,min psfc= : max,min MU= : rmse_var=mu
0: ordering=xy
0: WrfType,WRF_REAL= : ndim1= 2
0: staggering= N/A
0: start_index= : end_index1= : k,max,min,mid T= : k,max,min,mid T=

0: k,max,min,mid T= : k,max,min,mid T= : rmse_var=t
0: rmse_var=qvapor
0: rmse_var=u
0: rmse_var=v
0: rmse_var=seaice
0: rmse_var=sst
0: rmse_var=tsk

Print some diagnostics about the analysis results after adding the analysis increment to the guess:

0:================================================================================
0:Status Var Mean Min Max
0:analysis U E E E+01
0:analysis V E E E+01
0:analysis TV E E E+02
0:analysis Q E E E-02
0:analysis TSEN E E E+02
0:analysis OZ E E E+00
0:analysis CW E E E+00
0:analysis DIV E E E+00
0:analysis VOR E E E+00
0:analysis PRSL E E E+02
0:analysis PS E E E+02
0:analysis SST E E E+02
0:analysis radb E E E+01
0:analysis pcpb E E E+00
0:================================================================================

Show some information on the control variable allocation after cleaning up the major fields at the end of the 2nd outer loop.

0: state_vectors: latlon11,latlon1n,latlon1n1,lat2,lon2,nsig= : state_vectors: length= : state_vectors: currently allocated= 0
0: state_vectors: maximum allocated= 2
0: state_vectors: number of allocates= 4
0: state_vectors: number of deallocates= 4
0:state_vectors: Estimated max memory used= Mb
0: Writing control vector to file xhatsave : Norm xhatsave=

After the completion of the analysis, the subroutine setuprhsall is called again if write_diag(3)=.true., to calculate the analysis O-A information.
Because the CRTM initialization is inside setuprhsall, we see the section reading in the CRTM coefficients a third time:

0: INIT_CRTM: crtm_init() on path "./"
0: Read_SpcCoeff_Binary(INFORMATION) : FILE:./amsua_n15.SpcCoeff.bin;
0: SpcCoeff RELEASE.VERSION: 7.03 N_CHANNELS=15
0: amsua_n15 AntCorr RELEASE.VERSION: 1.04 N_FOVS=30 N_CHANNELS=15
0: Read_ODPS_Binary(INFORMATION) : FILE:./amsua_n15.TauCoeff.bin;
0: ODPS RELEASE.VERSION: 2.01 N_LAYERS=100 N_COMPONENTS=2 N_ABSORBERS=1 N_CHANNELS=15 N_COEFFS=
0: Read_EmisCoeff_Binary(INFORMATION) : FILE:./EmisCoeff.bin;

0: EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
0: SETUPRAD: write header record for amsua_n to file dir.0000/amsua_n15_
0: INIT_CRTM: crtm_init() on path "./"
0: Read_EmisCoeff_Binary(INFORMATION) : FILE:./EmisCoeff.bin;
0: EmisCoeff RELEASE.VERSION: 2.02 N_ANGLES= 16 N_FREQUENCIES= 2223 N_WIND_SPEEDS= 11
0: SETUPRAD: write header record for amsua_n to file dir.0000/amsua_n18_
0: GENSTATS_GPS: no profiles to process (nprof_gfs= 0 ), EXIT routine
0: obsdiags: Bytes per element= 91
0: obsdiags: length total, used= :obsdiags: Estimated memory usage= 7.0 Mb

Print the Jo components (the observation term for each observation type) after the analysis, which shows the fit of the analysis results to the data:

0: Begin Jo table outer loop
0: Observation Type Nobs Jo Jo/n
0:surface pressure E :temperature E :wind E :moisture E :radiance E : Nobs Jo Jo/n
0: Jo Global E : End Jo table outer loop

The end of the GSI analysis (a successful analysis must reach this point, but reaching it does not necessarily mean the analysis was successful):

0: ENDING DATE-TIME MAR 29, :18: TUE
0: PROGRAM GSI_ANL HAS ENDED. IBM RS/6000 SP
0:*. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *.

For runs on IBM computers only, additional machine resource statistics are provided. This section gives very useful information about the computer resources used in the analysis. Linux platforms do not provide this sort of information; users must look for other ways to find the time and memory usage of the analysis run.
0:*****************RESOURCE STATISTICS*******************************
0:
0:The total amount of wall time = :The total amount of time in user mode = :The total amount of time in sys mode = :The maximum resident set size (KB) = :Average shared memory use in text segment (KB*sec) = :Average unshared memory use in data segment (KB*sec) = :Average unshared memory use in stack segment(kb*sec) = 0
0:Number of page faults without I/O activity = :Number of page faults with I/O activity = 697
0:Number of times process was swapped out = 0
0:Number of times filesystem performed INPUT = 0
0:Number of times filesystem performed OUTPUT = 0
0:Number of IPC messages sent = 0
0:Number of IPC messages received = 0
0:Number of Signals delivered = 0
0:Number of Voluntary Context Switches = :Number of InVoluntary Context Switches = 596
0:*****************END OF RESOURCE STATISTICS*************************
0:
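On Linux, one way to recover similar timing and memory numbers is to wrap the GSI executable with /usr/bin/time -v, or to query the resource module from a Python driver script, as sketched below. This is not part of GSI; the fields reported and their units vary by platform (ru_maxrss is kilobytes on Linux but bytes on macOS):

```python
import resource
import time

start = time.time()
# ... launch and wait on the GSI executable here ...
usage = resource.getrusage(resource.RUSAGE_SELF)

print(f"wall time (s):            {time.time() - start:.2f}")
print(f"user CPU time (s):        {usage.ru_utime:.2f}")
print(f"system CPU time (s):      {usage.ru_stime:.2f}")
print(f"max resident set (KB):    {usage.ru_maxrss}")
print(f"page faults (with I/O):   {usage.ru_majflt}")
print(f"voluntary ctx switches:   {usage.ru_nvcsw}")
```

To measure a child process (such as an MPI launcher started with subprocess), use resource.RUSAGE_CHILDREN instead of RUSAGE_SELF.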

4.2 Single Observation Test

A single observation test is a GSI run with only one (pseudo) observation at a specific location in the analysis domain. By examining the analysis increments, one can visualize important features of the analysis, such as the ratio of the background and observation variances and the pattern of the background error covariance. The single observation test is therefore the first check users should perform after successfully installing the GSI.

Setup of a single observation test:

To perform the single observation test with the GSI, the following GSI namelist variables need to be set, which should be done by editing the run script. Under the &SETUP section, turn on the single observation test:

oneobtest=.true.,

Under the &SINGLEOB_TEST section, set up the single observation features, for example:

maginnov=1.0,
magoberr=1.0,
oneob_type='t',
oblat=20.,
oblon=285.,
obpres=850.,
obdattim= ,
obhourset=0.,

Note: please check Section 3.4 for the explanation of each parameter. From these parameters, we can see that a useful observation in the analysis should include information such as the observation type (oneob_type), value (maginnov), error (magoberr), location (oblat, oblon, obpres), and time (obdattim, obhourset).

In the analysis, the GSI first generates a prepbufr file containing only one observation, based on the information given in the &SINGLEOB_TEST namelist section. To generate this prepbufr file, the GSI needs to read in a PrepBUFR table, which is not needed for a GSI analysis with real observations. The table can be found in the fix/ directory and needs to be copied to the run directory. Please check that the following lines are in the GSI run script before running the single observation test:

bufrtable=${fix_root}/prepobs_prep.bufrtable
cp $bufrtable ./prepobs_prep.bufrtable
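When running a series of single observation tests, the namelist fragments above can be generated from a small script. A sketch, not part of GSI: the function name and the default date are hypothetical, and a real &SETUP section contains many more variables than the single switch shown here.

```python
def singleob_namelist(obs_type="t", maginnov=1.0, magoberr=1.0,
                      oblat=20.0, oblon=285.0, obpres=850.0,
                      obdattim=2011032212, obhourset=0.0):
    """Render the &SETUP switch and &SINGLEOB_TEST section for a
    GSI single-observation test (obdattim here is a hypothetical date)."""
    return (
        " &SETUP\n"
        "   oneobtest=.true.,\n"
        " /\n"
        " &SINGLEOB_TEST\n"
        f"   maginnov={maginnov}, magoberr={magoberr},\n"
        f"   oneob_type='{obs_type}',\n"
        f"   oblat={oblat}, oblon={oblon}, obpres={obpres},\n"
        f"   obdattim={obdattim}, obhourset={obhourset},\n"
        " /\n"
    )

print(singleob_namelist())
```

Writing the rendered text into the run script's namelist (or into gsiparm.anl directly) makes it easy to loop over observation types or locations.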

Examples of single observation tests for the GSI

To give users a taste of the single observation test, a single temperature observation (oneob_type='t') with a 1-degree innovation (maginnov=1.0) and a 1-degree observation error (magoberr=1.0) was used to perform the test.

[Figure: horizontal cross sections (left column) and vertical cross sections (right column) of the analysis increments of T, U, V, and Q from a single T observation]

This single observation was located at the center of the domain. The results are shown as horizontal and vertical cross sections through the point of the maximum analysis increment. The figures were generated using NCL scripts, which can be found in the util/analysis_utilities/plots_ncl directory.

4.3 Control Data Usage

The observation data used in the GSI analysis are controlled by two parts of the GSI system:

1. The GSI namelist (inside the run script), section &OBS_INPUT: in this namelist section, observation files (dfile) are linked to the observed variables used in the GSI analysis (dtype), for example:

dfile(01)='prepbufr', dtype(01)='ps', dplat(01)=' ', dsis(01)='ps', dval(01)=1.0, dthin(01)=0,
dfile(02)='prepbufr', dtype(02)='t', dplat(02)=' ', dsis(02)='t', dval(02)=1.0, dthin(02)=0,
dfile(03)='prepbufr', dtype(03)='q', dplat(03)=' ', dsis(03)='q', dval(03)=1.0, dthin(03)=0,
dfile(28)='amsuabufr', dtype(28)='amsua', dplat(28)='n15', dsis(28)='amsua_n15', dval(28)=10.0, dthin(28)=2,
dfile(29)='amsuabufr', dtype(29)='amsua', dplat(29)='n16', dsis(29)='amsua_n16', dval(29)=0.0, dthin(29)=2,

Here, the conventional observations ps, t, and q will be read in from the file prepbufr, and AMSU-A radiances from the NOAA-15 and NOAA-16 satellites will be read in from amsuabufr. Deleting these lines in &OBS_INPUT, or making the observation files unavailable, turns off the use of these observations in the GSI analysis. The GSI has default observation filenames for the different observation types (such as prepbufr for conventional observations). All observation files need to be linked into the working directory under these default names. Usually the run script makes these links after setting up the working directory, so turning these links on or off can also control the use of data. Please see Section 3.1 for a list of the default observation file names in the GSI and the corresponding observations.

2.
Use info files

For each variable, observations can come from multiple platforms (data types or observing instruments). For example, surface pressure (ps) can come from METAR observation stations (data type 187) and rawinsonde stations (data type 120). Several files named *info in the GSI system (under fix/) control the usage of observations based on the data type. Here is a list of the info files and their functions:

convinfo - controls the usage of conventional data, including tcp, ps, t, q, pw, sst, uv, spd, dw, radial wind (Level 2 rw and 2.5 srw), gps_ref, gps_bnd
satinfo - controls the usage of satellite radiance data. Instruments include amsua, amsub, hirs3, hirs4, mhs, ssmi, ssmis, iasi, amsre, sndr, etc., and satellites include NOAA 16, 17, 18, Aqua, GOES 11, 12, 13, METOP-A, etc.
ozinfo - controls the usage of ozone data, including sbuv6 and sbuv8 from NOAA 14, 16, 17, 18, 19, omi_aura, gome_metop-a
pcpinfo - controls the usage of precipitation data, including pcp_ssmi, pcp_tmi
aeroinfo - controls the usage of aerosol data, including modis_aqua and modis_terra

The header of each info file explains its content. Here we discuss the two most often used info files:

convinfo

The convinfo file controls the usage of conventional data. The following is part of the content of convinfo:

!otype type sub iuse twindow numgrp ngroup nmiter gross ermax ermin var_b var_pg ithin rmesh pmesh npred
tcp ps ps ps ps ps ps ps t t t t t t t t t t t

The meaning of each column is explained in the header of the file and is also listed in the following table:

otype - observation variable (t, uv, q, etc.)
type - prepbufr observation type (if available)
sub - prepbufr subtype (not yet available)
iuse - flag to use/not use/monitor the data: =1, use the data; the data type will be read and used in the analysis after quality control; =0, read in and process the data, use it for quality control, but do NOT assimilate it; =-1, monitor the data; this data type will be read in but not used in the GSI analysis
twindow - time window (+/- hours) for data used in the analysis
numgrp - cross validation parameter: number of groups
ngroup - cross validation parameter: group to remove from data use
nmiter - cross validation parameter: external iteration to introduce removed data
gross - gross error parameter: gross error
ermax - gross error parameter: maximum error
ermin - gross error parameter: minimum error
var_b - variational quality control parameter: b parameter
var_pg - variational quality control parameter: pg parameter
ithin - flag to turn on thinning (0: no thinning; 1: thinning)
rmesh - size of the thinning mesh (in kilometers)
pmesh - size of the vertical thinning mesh
npred - number of bias correction predictors

From this table, we can see that iuse controls the usage of the data and twindow controls the time window of data usage. gross, ermax, and ermin are for gross quality control.

satinfo

The satinfo file contains information about the channels, sensors, and satellites. It specifies the observation error for each channel, how the channel is used (assimilate, monitor, etc.), the type of channel (infrared or microwave), and other useful information. The following is part of the content of satinfo:

!sensor/instr/sat chan iuse error ermax var_b var_pg
amsua_n amsua_n amsua_n amsua_n amsua_n amsua_n amsua_n amsua_n hirs3_n hirs3_n hirs3_n hirs3_n

The meaning of each column is explained in the header of the file and is also listed in the following table:

sensor/instr/sat - sensor, instrument, and satellite name
chan - channel number for the given sensor
iuse - =1, use this channel's data; =-1, do not use this channel's data
error - variance for each satellite channel
ermax - error maximum for the gross check of observations
var_b - possible range of the variable for gross errors
var_pg - probability of gross error

4.4 Domain Partition for Parallelization and Observation Distribution

In the standard output file (stdout), there is an information block that lists each sub-domain partition, including the domain number (task), start point (istart, jstart), and dimensions of the sub-domain (ilat1, jlon1). Here is an example from the case we showed in Section 4.1. Please note that 4 processors were used in the analysis:

0:general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= :general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= :general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1= :general_DETER_SUBDOMAIN: task,istart,jstart,ilat1,jlon1=

The standard output file (stdout) also has an information block that shows the distribution of the different kinds of observations in each sub-domain. This block follows the observation input section and comes right before the radiance initialization section. For example, the following is the observation distribution of the case shown in Section 4.1 (12 UTC 22 March 2011) using 4 processors. From the case introduction, we know that the prepbufr (conventional data) and the radiance BUFR files of amsua, amsub, and hirs were used.
So in this list, we can see that the conventional observations (ps, t, q, uv, sst, and pw) and the radiance data (amsua, amsub, and hirs from METOP-A and NOAA 15, 17, and 18) were distributed among the 4 sub-domains:

Observation type / number of observations in each subdomain:

3:OBS_PARA: ps :OBS_PARA: t :OBS_PARA: q :OBS_PARA: uv :OBS_PARA: sst :OBS_PARA: pw :OBS_PARA: hirs4 metop-a :OBS_PARA: amsua n :OBS_PARA: amsua n :OBS_PARA: amsua metop-a :OBS_PARA: amsub n

This list is a good way to quickly check which kinds of data are used in the analysis and how they are distributed across the analysis domain.

4.5 Observation Innovation Statistics

After the GSI analysis, a group of files named fort.2* (other than fort.220; see the explanation of fort.220 in the next section) contain information on the fit of the observations to the current solution in each outer loop. The content of each of these files is listed below:

fort.201 or fit_p1.analysis_time - fit of surface pressure data (mb)
fort.202 or fit_w1.analysis_time - fit of wind data (m/s)
fort.203 or fit_t1.analysis_time - fit of temperature data (K)
fort.204 or fit_q1.analysis_time - fit of q data (percent of guess q saturation)
fort.205 - fit of precipitable water data (mm)
fort.206 - fit of ozone observations from sbuv6_n14 (, _n16, _n17, _n18) and sbuv8_n16 (, _n17, _n18, _n19)
fort.207 or fit_rad1.analysis_time - fit of satellite radiance data, including: amsua_n15(, n16, n17, n18, metop-a, aqua, n19), amsub_n15(, n16, n17), hirs3_n16(, n17), hirs4_n18(, metop-a, n19), mhs_n18(, metop-a, n19), hirs2_n14, iasi616_metop-a, airs281subset_aqua, avhrr3_n16(, n17, n18), amsre_aqua, ssmi_f13(, f14, f15), ssmis_f16, msu_n14, sndr_g11(, g12, g13), imgr_g11(, g12, g13), sndrd1(, 2, 3, 4)_g11(, g12, g13)
fort.208 - pcp_ssmi, pcp_tmi
fort.209 - rw
fort.210 - dw
fort.211 - srw1, srw2
fort.212 - GPS RO
fort.213 - fit of conventional sst data (C)
fort.214 - tropical cyclone central pressure
fort.215 - Lagrangian data

To help users understand the information inside these files, the following are some examples of their contents with corresponding explanations.

Conventional observations

Example of files containing single-level data (fort.201, fort.205, fort.213):

pressure levels (hpa)=
it obs type stype count bias rms cpen qcpen
o-g 01 ps
o-g 01 ps
o-g 01 ps
o-g 01 ps
o-g 01 all
o-g 01 ps rej
o-g 01 ps rej
o-g 01 ps rej
o-g 01 ps rej
o-g 01 ps rej
o-g 01 rej all
o-g 01 ps mon
o-g 01 ps mon
o-g 01 ps mon
o-g 01 ps mon
o-g 01 ps mon
o-g 01 mon all

Example of files containing multiple-level data (fort.202, fort.203, fort.204):

ptop it obs type styp pbot
o-g 01 uv count
o-g 01 uv bias
o-g 01 uv rms
o-g 01 uv cpen
o-g 01 uv qcpen
o-g 01 uv count
o-g 01 uv bias
o-g 01 uv rms
o-g 01 uv cpen
o-g 01 uv qcpen
o-g 01 all count
o-g 01 all bias
o-g 01 all rms
o-g 01 all cpen
o-g 01 all qcpen
o-g 01 uv rej count
o-g 01 uv rej bias
o-g 01 uv rej rms
o-g 01 uv rej cpen
o-g 01 uv rej qcpen
o-g 01 uv rej count
o-g 01 uv rej bias
o-g 01 uv rej rms
o-g 01 uv rej cpen
o-g 01 uv rej qcpen
o-g 01 rej all count
o-g 01 rej all bias
o-g 01 rej all rms
o-g 01 rej all cpen
o-g 01 rej all qcpen
o-g 01 uv mon count
o-g 01 uv mon bias
o-g 01 uv mon rms

o-g 01 uv mon cpen
o-g 01 uv mon qcpen

Please note that we deleted the layers from to hPa to make each row fit on one line. We show only observation types 220 and 223 as an example. The following table lists the meaning of the information in the fort.2* files, except fort.207:

it - outer loop number: =01, fit of the observations to the background; = final outer loop + 1, fit of the observations to the analysis fields
obs - observation variable (such as uv, ps) and the usage of the type: blank, used in the GSI analysis; mon, monitored (read in but not assimilated by the GSI); rej, rejected by quality control in the GSI
type - observation type (see Section 8.2 for details)
styp - observation subtype (not used now)
ptop - for multiple-level data, pressure at the top of the layer
pbot - for multiple-level data, pressure at the bottom of the layer
count - number of observations in terms of time, domain, and type
bias - bias of the observation departure for each outer loop (it)
rms - root mean square of the departure for each outer loop (it)
cpen - observation part of the penalty (cost function)
qcpen - nonlinear qc penalty

The contents of the fort.2* files are calculated from O-B or O-A for each observation. Detailed information about each observation is saved in the diagnostic files. For the content of the diagnostic files, please check the array rdiagbuf in one of the setup subroutines for conventional data, for example setupt.f90. We provide a tool in Appendix A.2 to help users read the information in the diagnostic files.

Satellite radiance

The file fort.207 is the statistics file for radiance data. It contains important information about the radiance data analysis. The first part of fort.207 lists the content of the file radinfo, which is the info file controlling the data usage for radiance data. This part starts with a line like:

RADINFO_READ: jpch_rad=

We can see there are 1577 channels listed in the radinfo file, and the 1577 lines following this line show the detailed setup for each channel. The 2nd part of the file is a list of mass bias correction coefficients, which starts from a line like:

RADINFO_READ: guess air mass bias correction coefficients below

All 1577 channels can be used in the GSI analysis; therefore, there is the same number of mass bias correction coefficients, one set per channel, though some of the coefficients are 0. The 3rd part of fort.207 is like the other fit files in that it contains 3 repeated sections giving detailed statistical information about the data at three stages: before the 1st outer loop, between the 1st and 2nd outer loops, and after the 2nd outer loop. We will use the results before the 1st outer loop as examples to explain the content of the statistical results:

Summaries for various statistics as a function of observation type

sat type penalty nobs iland isnoice icoast ireduce ivarl nlgross
metop-a hirs
qcpenalty qc1 qc2 qc3 qc4 qc5 qc6 qc7
sat type penalty nobs iland isnoice icoast ireduce ivarl nlgross
n15 amsua
qcpenalty qc1 qc2 qc3 qc4 qc5 qc6 qc7
rad total penalty_all=
rad total qcpenalty_all=
rad total failed nonlinqc= 0

The following table lists the meaning of the information in the above statistics:

sat: satellite name
type: observation type
penalty: cost function contribution from this observation type
nobs: number of good observations used in the assimilation
iland: number of observations over land
isnoice: number of observations over sea ice and snow
icoast: number of observations over coast
ireduce: number of observations that reduce qc bounds in the tropics
ivarl: number of observations tossed by the gross check
nlgross: number of observations tossed by nonlinear qc
qcpenalty: nonlinear qc penalty from this data type
qc1-7: number of observations whose quality control criteria have been adjusted by each qc method (1-7)

rad total penalty_all: sum of the penalty over all radiance observation types
rad total qcpenalty_all: sum of the qcpenalty over all radiance observation types
rad total failed nonlinqc: total number of observations tossed by nonlinear qc over all radiance observation types

Note: one observation may include multiple channels, and not all channels are used in the analysis.

Summaries for various statistics as a function of channel

amsua_n
amsua_n
amsua_n
amsua_n
amsua_n
amsua_n
amsua_n
amsua_n
amsua_n
amsua_n
amsua_n
amsua_n
hirs4_metop-a
hirs4_metop-a
hirs4_metop-a
hirs4_metop-a
hirs4_metop-a
hirs4_metop-a
hirs4_metop-a
hirs4_metop-a
hirs4_metop-a

The following table lists the meaning of each column in the above statistics:

1: sequence number
2: channel number for a given radiance observation type
3: radiance observation type (for example, amsua_n15)
4: number of good observations (nobs) used in the GSI analysis for this channel
5: number of observations tossed by the gross check for this channel
6: variance for the satellite channel
7: bias (observation - guess before bias correction)
8: bias (observation - guess after bias correction)
9: penalty contribution from this channel
10: (observation - guess with bias correction)**2
11: standard deviation
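As a quick illustration of reading the per-channel table with standard tools, the sketch below pulls out the channel number, the observation count (column 4), and the post-correction bias (column 8) for one sensor. The sample rows and all their numbers are made up for illustration; real fort.207 lines follow the same column order but with different values and spacing.

```shell
# Illustrative sketch only: the sample rows mimic the fort.207 channel
# table (columns: seq, channel, type, nobs, tossed, variance,
# bias-before, bias-after, penalty, squared departure, std deviation).
cat > fort207_channels.txt <<'EOF'
  1   1  amsua_n15  2353  0  250.000   0.282   0.042  1253.1  0.533  0.729
  2   2  amsua_n15  2353  0  250.000  -0.097   0.011   980.4  0.417  0.645
EOF
# print channel, nobs, and bias after bias correction for amsua_n15
awk '$3=="amsua_n15" {printf "channel %d: nobs=%d bias(after)=%.3f\n", $2, $4, $8}' fort207_channels.txt
```

The same awk one-liner, pointed at a real fort.207, gives a compact per-channel bias summary for any sensor name in column 3.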

Final summary for each observation type

it satellite instrument # read # keep # assim penalty qcpnlty cpen qccpen
o-g 01 rad n18 hirs
o-g 01 rad metop-a hirs
o-g 01 rad n15 amsua
o-g 01 rad n16 amsua
o-g 01 rad n17 amsua
o-g 01 rad n18 amsua
o-g 01 rad metop-a amsua
o-g 01 rad n17 amsub

The following table lists the meaning of the information in the above statistics:

it: stage (o-g 01 rad = before the 1st outer loop for radiance data)
satellite: satellite name (n15 = NOAA-15)
instrument: instrument name (amsua)
# read: number of data (channel values) read in within the analysis time window and domain
# keep: number of data (channel values) after data thinning
# assim: number of data (channel values) used in the analysis (passed all qc processes)
penalty: cost function contribution from this observation type
qcpnlty: nonlinear qc penalty from this data type
cpen: penalty / (# assim)
qccpen: qcpnlty / (# assim)

4.6 Convergence Information

There are two ways to check the convergence information for each iteration of the GSI:

1. Standard output file (stdout): The values of the cost function and the norm of the gradient for each iteration are listed in the file stdout. Here is an example showing the first two iterations from the first outer loop:

0: Minimization iteration 0
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E+00
0:pcgsoi: cost,grad,step = E E E-03
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat : stprat E-14
0: Minimization iteration 1
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E-01

0:pcgsoi: cost,grad,step = E E E-03
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat : stprat E-14

Here are the first two iterations from the second outer loop:

0: Minimization iteration 0
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E+00
0:pcgsoi: cost,grad,step = E E E-03
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat : stprat E-15
0: Minimization iteration 1
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E-01
0:pcgsoi: cost,grad,step = E E E-03
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat : stprat E-15

We can clearly see the outer loop number and the inner loop number (Minimization iteration). The meaning of the names used in stdout is explained in the following:

J,Jb,Jo,Jc,Jl: the values of the cost function (J, or penalty), background term (Jb), observation term (Jo), dry pressure constraint term (Jc), and negative and excess moisture term (Jl)
grad: inner product of gradients (norm of the gradient (Y*X))
reduction:
cost: the value of the cost function (= J)
step: stepsize (α)
gnorm(1:2): 1 = (norm of the gradient)**2, 2 = (norm of the gradient)**2
b: parameter to estimate the new search direction
stprat: convergence in stepsize estimation

2. Convergence information in file fort.220: In file fort.220, users can find more detailed information about each iteration. We will use the first two iterations as an example to explain the meaning of each value:

1) J= E E E E E E E E E E E E E E E E E E+00

70 GSI Diagnostics and Tuning E E E E E E E E+00 2) S= E E E E E E E E E E E E E E E E E E E E E E E E E E+00 3) b= e E E E E E E E E E E E E E E E E E E E E E E E E E+00 4) c= E E E E E E E E E E E E E E E E E E E E E E E E E E+00 5) stepsize estimates = E E E-02 6) stepsize guesses = E E E E E E E-02 7) penalties = E E E E E E E+05 8) penalty,grad,a,b= E E E E+00 9) pnorm,gnorm, step? E E+01 good J= E E E E E E E E E E E E E E E E E E E E E E E E E E+00 S= E E E E E E E E E E E E E E E E E E E E E E E E E E+00 b= e E E+00 65

E E E E E E E E E E E E E E E E E E E E E E E+00
c= E E E E E E E E E E E E E E E E E E E E E E E E E E+00
stepsize estimates = E E E-02
stepsize guesses = E E E E E E E-02
penalties = E E E E E E E+04
penalty,grad,a,b= E E E E+00
pnorm,gnorm, step? E E+00 good

For each inner iteration, there are 9 different outputs. We labeled the outputs of the 1st iteration and give a detailed explanation below:

1) - 4): detailed information on the cost function (J=), the stpx (S = bpen(i)/cpen(i)), bpen (b=), and cpen (c=). There are 26 (5 + number of observation types) items listed in each, and the meanings of these items are:

1: contribution from background, satellite radiance bias, and precipitation bias
2: placeholder for a future linear term
3: contribution from the dry pressure constraint term (Jc)
4: contribution from the negative moisture constraint term (Jl/Jq)
5: contribution from the excess moisture term (Jl/Jq)

Items 6-26 are contributions to Jo from different observation types:

6: contribution from ps observation term
7: contribution from t observation term
8: contribution from w observation term
9: contribution from q observation term
10: contribution from spd observation term
11: contribution from srw observation term
12: contribution from rw observation term
13: contribution from dw observation term
14: contribution from sst observation term
15: contribution from pw observation term
16: contribution from pcp observation term
17: contribution from oz observation term
18: contribution from o3l observation term (not used)
19: contribution from gps observation term

20: contribution from rad observation term
21: contribution from tcp observation term
22: contribution from lagrangian tracer
23: contribution from carbon monoxide
24: contribution from modis aerosol aod
25: contribution from level modis aero aod
26: contribution from in-situ pm2_5 obs

Also, it is suggested that users check stpcalc.f90 for the code that produces the above information.

5): information on the stepsize estimates
6): information on the stepsize guesses
7): information on the penalties
8) and 9): information on the cost function and gradient, which is explained in the following:

penalty: the cost function of all observations and background
grad: inner product of gradients (norm of the gradient (Y*X))
a: stepsize (α)
b: parameter to estimate the new search direction
pnorm:
gnorm:

To evaluate the convergence of the iteration, we usually make some plots based on the above information, such as the value of the cost function and the norm of the gradient. The following is an example of plots showing the evolution of the cost function and the norm of the gradient in different outer loops:

Evolution of the cost function (left column) and the norm of the gradient (right column) in the first outer loop (top row) and the second outer loop (bottom row)
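Since the total cost J is simply the sum of the 26 items listed above, a quick consistency check is to add the individual components and compare against the printed J. The values below are hypothetical stand-ins, not numbers from a real run; only the relationship (J = sum of components) comes from the description above.

```shell
# Hypothetical sanity check: J should equal the sum of its components.
# Here only the background term plus two observation terms are used,
# with made-up values.
awk 'BEGIN { jb=2931.2; jo_ps=12945.8; jo_t=61469.3; printf "J = %.1f\n", jb + jo_ps + jo_t }'
```

With all 26 components from a real fort.220 record, the same sum should reproduce the J value printed for that iteration.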

4.7 Analysis Increments

Analysis increments are defined as the analysis fields minus the background fields. A plot of the analysis increments can help users understand how the analysis procedure modifies the background fields according to the observations, the errors (background and observation), and other constraints. You can either calculate analysis minus guess and plot the difference fields, or use the tools introduced in Appendix A.4 to check the analysis increments for each observation.

4.8 Running Time and Memory Usage

Other than the analysis increments, running time and memory usage are other important features of an analysis system, especially for operational code like GSI. Some computer operating systems provide the CPU time, wall time, and memory usage after the job has completed, but an easy way to determine how much wall time was used by GSI is to check the time tags of files generated at the beginning and the end of the run, for example:

Wall time of GSI analysis = time of wrfanl - time of convinfo

For a GSI run using an IBM computer, there is a resource statistics section at the end of the stdout file, which gives information about the run time and memory usage of the analysis (see the example at the end of Section 4.1).
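The timestamp arithmetic above can be scripted. The sketch below assumes GNU coreutils (stat -c %Y prints the modification time in epoch seconds); it creates two stand-in files with touch -d purely for demonstration — in a real run directory you would point it at the actual convinfo and wrfanl files.

```shell
# Demonstration only: create stand-in files with known timestamps.
# In a real run directory, convinfo and wrfanl already exist.
touch -d '2011-03-22 12:00:00' convinfo
touch -d '2011-03-22 12:05:30' wrfanl
# wall time = mtime(wrfanl) - mtime(convinfo), assuming GNU stat
start=$(stat -c %Y convinfo)
end=$(stat -c %Y wrfanl)
echo "GSI wall time: $((end - start)) seconds"
```

On systems without GNU stat (e.g. macOS), `stat -f %m` plays the same role.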

Chapter 5: GSI Applications

In this chapter, the knowledge from the previous chapter will be applied to several GSI cases to show how to set up GSI with various different data sources, and how to properly check the run status and results in order to determine if a particular GSI application was successful. Note the examples here only use the WRF ARW system; global experiments or WRF NMM runs are similar, but require different background and observation files. It is assumed that the reader has successfully compiled GSI on a local machine, and has the following data available:

1. Background file
When using WRF, WPS and Real.exe will be run to create a WRF input file: wrfinput_<domain>_<yyyy-mm-dd_hh:mm:ss>
2. Conventional data
NAM PREPBUFR data can be obtained from the server: ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nam/prod
Note: NDAS prepbufr data was chosen to increase the amount of data
3. Radiance data and GPS RO data
GDAS BUFR files can be obtained from the following server: ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/gfs/prod
Note: GDAS data was chosen to get better coverage for radiance and GPS RO

GSI Case Study

The following case study will give users an example of a successful GSI run with various different data sources. Users are welcome to download these example data from the GSI users webpage (online case for release version 3) or create a new real-time background with observation data from the above servers. The background and observations used in this case study are as follows:

1. Background files: wrfinput_d01_ _12:00:00
The horizontal grid spacing is 30 km with 51 vertical sigma levels.

Figure 5.1: The terrain (left) and land mask (right) of the background used in this case study

2. Conventional data: NAM PrepBUFR data from 22 March 2011, 12 UTC.
File: nam.t12z.prepbufr.tm00.nr
3. Radiance and GPS RO data: GDAS BUFR data from 22 March 2011, 12 UTC.
Files:
gdas.t12z.1bamua.tm00.bufr_d
gdas.t12z.1bamub.tm00.bufr_d
gdas.t12z.1bhrs4.tm00.bufr_d
gdas.t12z.gpsro.tm00.bufr_d

This case study was run on NCAR's Bluefire. We assume the background file is located at: /ptmp/GSI/data/DTC/NA30km/bk and all observations are located at: /ptmp/GSI/data/DTC/NA30km/obs

5.1. Assimilating Conventional Observations with GSI

5.1.1: Run script

With GSI successfully compiled and background and observational data acquired, move to the ./run directory under ./comGSI_v3 to set up the GSI run following the sample script run_gsi.ksh. To properly run run_gsi.ksh, several steps need to be set up:

Set up the batch queuing system

To run GSI with multiple processors, a job queuing head has to be added at the beginning of the run_gsi.ksh script. The setup of the job queue depends on the machine and the job control system. More examples of the setup are described in section The following example is set up to run on NCAR Bluefire, which is an IBM supercomputer with LSF. The job head is as follows:

#######
# set up for IBM AIX
######
##
## Below (IBM queuing system) commands
#BSUB -P???????
#BSUB -a poe
#BSUB -x                 # exclusive use of node (not_shared)
#BSUB -n 4               # number of total tasks
#BSUB -R "span[ptile=2]" # how many tasks per node (up to 8)
#BSUB -J gsi             # job name
#BSUB -o gsi.out         # output filename (%J to add job id)

#BSUB -e gsi.err         # error filename
#BSUB -W 00:02
#BSUB -q regular         # queue

In order to increase the run speed of GSI on the IBM supercomputer, the following environmental configurations were made:

set -x
# Set environment variables for IBM
export MP_SHARED_MEMORY=yes
export MEMORY_AFFINITY=MCM
export BIND_TASKS=yes
# Set environment variables for threads
export SPINLOOPTIME=10000
export YIELDLOOPTIME=40000
export AIXTHREAD_SCOPE=S
export MALLOCMULTIHEAP=true
export XLSMPOPTS="parthds=1:spins=0:yields=0:stack= "
# Set environment variables for user preferences
export XLFRTEOPTS="nlwidth=80"
export MP_LABELIO=yes

In order to find out how to set up the job head, a good method is to use an existing MPI job script and copy the job head over.

Set up the number of processors and the job queue system used. For this example, IBM_LSF and 4 processors were used:

GSIPROC=4
ARCH='IBM_LSF'

Set up the case data, analysis time, and the GSI fix, exe, and CRTM coefficient directories:

Set up the analysis time:

ANAL_TIME=

Set up a working directory, which will hold all the analysis results. This directory must have correct write permissions, as well as enough space to hold the output. Also, for the following example, the directory /ptmp/test/ has to exist.

WORK_ROOT=/ptmp/test/gsiprd_${ANAL_TIME}_prepbufr

Set the path to the background file:

BK_FILE=/ptmp/GSI/data/DTC/NA30km/bk/wrfinput_d01_ _12:00:00

Set the path to the observation directory and the PrepBUFR file within the observation directory. All observations to be assimilated should be in the observation directory.

OBS_ROOT=/ptmp/GSI/data/DTC/NA30km/obs
PREPBUFR=/ptmp/GSI/data/DTC/NA30km/obs/nam.t12z.prepbufr.tm00.nr

Set the GSI system used for this case: the fix and CRTM coefficient directories as well as the location of the GSI executable:

FIX_ROOT=/blhome/GSI/comGSI_v3Beta/fix
CRTM_ROOT=/ptmp/GSI/CRTM/CRTM_Coefficients
GSI_EXE=/blhome/GSI/comGSI_v3Beta/run/gsi.exe

Set which background and background error file to use:

bk_core=arw
bkcv_option=nam
if_clean=clean

This example uses the ARW NetCDF background; therefore bk_core is set to ARW. The regional background error covariance files were also used in this case. Finally, the run scripts are set to clean the run directory, deleting all temporary intermediate files.

5.1.2: Run GSI and check the run status

Once the run scripts are set up properly for the case and machine, GSI can be run through the run scripts. On NCAR Bluefire, the GSI run is submitted as follows:

be1005en% bsub < run_gsi.ksh

To check the job status:

be1105en% bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
user RUN regular be1005en 2*be0108en gsi Mar 18 14:57 2*be0307en

The above output shows that the job is running. Move to the working directory and check the details. Because we have the following working directory setup:

WORK_ROOT=/ptmp/test/gsiprd_${ANAL_TIME}_prepbufr

Go to directory /ptmp/test to check the GSI run directory. A directory named ./gsiprd_ _prepbufr should have been created. This directory is the run directory for this GSI case study. While GSI is still running, the contents of this directory should include:

imgr_g12.taucoeff.bin
imgr_g13.spccoeff.bin
imgr_g13.taucoeff.bin
ssmi_f15.spccoeff.bin
ssmi_f15.taucoeff.bin
ssmis_f16.spccoeff.bin

These are CRTM coefficients that have been linked to the run directory through the GSI run scripts. Additionally, many other files are linked or copied to this run directory or generated during the run, such as:

stdout: standard out file
wrf_inout: background file
gsiparm.anl: GSI namelist
prepbufr: PrepBUFR file for conventional observation
convinfo: data usage control for conventional data
berror_stats: background error file
errtable: observation error file

The presence of these files indicates that the GSI run scripts have successfully set up a run environment for GSI and the GSI executable is running. While GSI is still running, checking the content of the standard output file (stdout) can monitor the stage of the GSI analysis:

be1105en% tail -f stdout
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E-01
0:pcgsoi: cost,grad,step = E E E-02
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat : stprat E-15

The above output lines show that GSI is in the inner iteration stage. It may take several minutes to finish the GSI run. Once GSI has finished running, the number of files in the directory will be greatly reduced from those during the run stage. This is because the run script was set to clean the run directory after a successful run. The important analysis result files and configuration files will remain in the run directory. Please check Section

for more details on GSI run results. Upon successful completion of GSI, the run directory looks as follows:

anavinfo      fort.204  l2rwbufr
berror_stats  fort.205  ozinfo
convinfo      fort.206  pcpbias_out
dir.0000      fort.207  pcpinfo
dir.0001      fort.208  prepbufr
dir.0002      fort.209  prepobs_prep.bufrtable
dir.0003      fort.210  satbias_angle
errtable      fort.211  satbias_in
fit_p         fort.212  satbias_out
fit_q         fort.213  satinfo
fit_rad       fort.214  stdout
fit_t         fort.215  stdout.anl
fit_w         fort.217  wrf_inout
fort.201      fort.220  wrfanl
fort.202      gsi.exe
fort.203      gsiparm.anl

5.1.3: Check for successful GSI completion

It is important to always check for successful completion of the GSI analysis. Completion of the GSI run without crashing does not guarantee a successful analysis. First, check the stdout file in the run directory to be sure GSI completed each step without any obvious problems. The following are several important steps to check:

1. Read in the namelist

The following lines show GSI started normally and has read in the namelist:

0: SETUP_4DVAR: lcongrad= F
0: SETUP_4DVAR: lbfgsmin= F
0: SETUP_4DVAR: ltlint= F
0: SETUP_4DVAR: ladtest,lgrtest= F F
0: SETUP_4DVAR: lwrtinc= F
0: SETUP_4DVAR: lanczosave= F

2. Read in the background field

The following lines, immediately following the namelist section, show that GSI is reading the background fields. Checking the range of the max and min values will indicate if certain background fields are normal.

0: end_index=
: max,min XLAT(:,1)=
: max,min XLAT(1,:)=
: xlat(1,1),xlat(nlon,1)=
: xlat(1,nlat),xlat(nlon,nlat)=

0: rmse_var=xlong
...
0: rmse_var=u
0: ordering=xyz
0: WrfType,WRF_REAL=
: ndim1= 3
0: staggering= N/A
0: start_index=
: end_index=
: k,max,min,mid U=
: k,max,min,mid U=
: k,max,min,mid U=
: k,max,min,mid U=
: k,max,min,mid U=

3. Read in observational data

Skipping through a majority of the content towards the middle of the stdout file, the following lines will appear:

3:OBS_PARA: ps
:OBS_PARA: t
:OBS_PARA: q
:OBS_PARA: uv
:OBS_PARA: sst
:OBS_PARA: pw

This table is important to see if the observations have been read in, which types of observations have been read in, and the distribution of observations in each sub domain. At this point, GSI has read in all the data needed for the analysis. Following this table is the inner iteration information.

4. Inner iteration

The inner iteration step in the stdout file will look as follows:

0: Minimization iteration 0
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E+00
0:pcgsoi: cost,grad,step = E E E-02
0:pcgsoi: gnorm(1:2),b= E E E-01
0: stprat E-01
0: stprat E-15

Similar information will be repeated for each inner loop. In this case, 2 outer loops with 50 inner loops in each outer loop have been set. The last iteration looks like:

0: Minimization iteration 39
0:grepcost J,Jb,Jo,Jc,Jl = E E E E E+00
0:grepgrad grad,reduction= E E-04
0:pcgsoi: cost,grad,step = E E E-02
0: PCGSOI: WARNING **** Stopping inner iteration ***
0: gnorm E-10 less than E-09
0: Minimization final diagnostics

Clearly, the iteration met the stop threshold before reaching the maximum iteration number (50). As a quick check of the iteration: the J value should descend through each iteration. Here, the J has a value of E+04 at the beginning and a value of E+04 at the final iteration. This means the value has been reduced by almost half, which is an expected reduction.

5. Write out analysis results

The final step of the GSI analysis procedure looks very similar to the portion where the background fields were read in:

0: max,min MU=
: rmse_var=mu
0: ordering=xy
0: WrfType,WRF_REAL=
: ndim1= 2
0: staggering= N/A
0: start_index=
: end_index1=
: k,max,min,mid T=
: k,max,min,mid T=
: k,max,min,mid T=
: k,max,min,mid T=

As an indication that GSI has successfully run, several lines will appear at the bottom of the file:

0: ENDING DATE-TIME MAR 22, :52: TUE
: PROGRAM GSI_ANL HAS ENDED. IBM RS/6000 SP
0:*. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *. *

GSI was mainly developed on an IBM machine, so the IBM RS/6000 SP markers will still appear on a Linux machine. After carefully investigating each portion of the stdout file, it can be concluded that GSI successfully ran through every step and there were no run issues. A more complete description of the stdout file can be found in Section 4.1. However, it cannot be concluded that GSI did a successful analysis until more diagnosis has been completed.
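The "reduced by almost half" check above can be made quantitative with a one-liner; the initial and final J values below are hypothetical stand-ins for the two numbers printed in stdout.

```shell
# Hypothetical values for J at iteration 0 (j0) and at the final
# iteration (jn); real values come from the grepcost lines in stdout.
awk 'BEGIN { j0=7.7e4; jn=4.2e4; printf "J reduced by %.0f%%\n", (1 - jn/j0) * 100 }'
```

A reduction of roughly a third to a half of the initial cost over the minimization is the kind of behavior this section describes as expected.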

5.1.4: Diagnose GSI analysis results

5.1.4.1: Check analysis fit to observations

The analysis uses observations to correct the background fields, pushing the analysis results to fit the true state under certain constraints. The easiest way to confirm the GSI analysis fits the observations better than the background is to check a set of files with names fort.2??, where ?? is a number from 01 to 19. In the run scripts, several fort files have also been renamed as fit_t1 (q1, p1, rad1, w1).yyyymmddhh. Please check Section for a detailed explanation of the fit files. The following are several examples of these fit files.

fit_t (fort.203)

This file shows how the background and analysis fields fit to temperature observations. The contents of this file show three data types were used in the analysis: 120, 130, and 180. Also included are the observation number, bias, and rms of observation minus background (O-B) on each level for the three data types. The following is a part of the file:

ptop
it obs type styp
pbot
o-g 01 t count
o-g 01 t bias
o-g 01 t rms
o-g 01 t count
o-g 01 t bias
o-g 01 t rms
o-g 01 t count
o-g 01 t bias
o-g 01 t rms
o-g 01 all count
o-g 01 all bias
o-g 01 all rms
o-g 03 t count
o-g 03 t bias
o-g 03 t rms
o-g 03 t count
o-g 03 t bias
o-g 03 t rms
o-g 03 t count
o-g 03 t bias
o-g 03 t rms
o-g 03 all count
o-g 03 all bias
o-g 03 all rms

For example: data type 120 has 1249 observations in level hPa, a bias of -0.19, and a rms of The last column shows the statistics for the whole atmosphere. There are several summary lines for all data types, which is indicated by

"all" in the data type column. For the summary O-B (which is "o-g 01" in the file), we have observations total, a bias of -0.66, and a rms of Skipping ahead in the fort file, the "o-g 03" rows (under "it") show the observation minus analysis (O-A) information. Under the summary ("all") lines, it can be seen that there were total observations, a bias of -0.03, and a rms of This shows that from the background to the analysis the bias was reduced to -0.03, and the rms was reduced from 2.16. This is about a 34% reduction, which is a reasonable value for a large-scale analysis.

fit_w (fort.202)

This file demonstrates how the background and analysis fields fit to wind observations. This file (as well as fit_q1) is formatted the same as the above example. Therefore, only the summary lines will be shown for O-B and O-A to gain a quick view of the fitting:

ptop
it obs type styp
pbot
o-g 01 all count
o-g 01 all bias
o-g 01 all rms
o-g 03 all count
o-g 03 all bias
o-g 03 all rms

O-B: observations in total, bias is 0.32 and rms is 4.72
O-A: observations in total, bias is 0.32 and rms is 3.18

The total bias was not reduced; however, the rms was reduced from 4.72 to 3.18 (~33% reduction).

fit_q (fort.204)

This file demonstrates how the background and analysis fields fit to moisture observations (relative humidity). The summary lines for O-B and O-A are as follows:

ptop
it obs type styp
pbot
o-g 01 all count
o-g 01 all bias
o-g 01 all rms
o-g 03 all count
o-g 03 all bias
o-g 03 all rms

O-B: 4240 observations in total and bias is and rms is

O-A: 4241 observations in total and bias is and rms is

The total bias and rms were reduced.

fit_p (fort.201)

This file demonstrates how the background and analysis fields fit to surface pressure observations. Because the surface pressure is two-dimensional, the table is formatted differently than the three-dimensional fields shown above. Once again, the summary lines will be shown for O-B and O-A to gain a quick view of the fitting:

it obs type stype count bias rms cpen qcpen
o-g 01 all
o-g 03 all

O-B: observations in total and bias is and rms is
O-A: observations in total and bias is and rms is

Both the total bias and rms were reduced. These statistics show that the analysis results fit the observations closer than the background, which is what the analysis is supposed to do. How closely the analysis fits the observations is based on the ratio of the background error variance and the observation error variance.

5.1.4.2: Check the minimization

In addition to the minimization information in the stdout file, GSI writes more detailed information into a file called fort.220. The content of fort.220 is explained in section Below is an example of a quick check of the trend of the cost function and norm of gradient. The value should get smaller with each iteration step. In the run directory, the cost function and norm of gradient information can be dumped into an output file by:

be1005en% grep 'penalty,grad,a,b=' fort.220 | sed -e 's/penalty,grad,a,b=//g' > cost_gradient.txt

The file cost_gradient.txt includes 6 columns; however, only the first 4 columns are shown below. The first 5 and last 5 lines read are:

E E E E
E E E E
E E E E
E E E-03

E E E E-04

The first column is the outer loop number and the second column is the inner iteration number. The third column is the cost function, and the fourth column is the norm of the gradient. It can be seen that both the cost function and the norm of the gradient are descending with each iteration. To get a complete picture of the minimization process, the cost function and norm of gradient can be plotted using a provided NCL script located under ./util/Analysis_Utilities/plot_ncl/GSI_cost_gradient.ncl. The plot is shown below:

Figure 5.2: Cost function and norm of gradient change with iteration steps

The above plots demonstrate that both the cost function and norm of gradient descend very fast in the first 10 iterations in both outer loops and drop very slowly after the 10th iteration. It can also be seen that the norm of gradient in the second outer loop ascends in the first several iterations.
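The descent can also be checked automatically rather than by eye by scanning the third column of cost_gradient.txt. The sample file created below is illustrative only; the real file comes from the grep command shown earlier.

```shell
# Illustrative sample in the 6-column cost_gradient.txt layout
# (outer loop, inner iteration, cost, gradient norm, ...); the
# numbers are made up.
cat > cost_gradient_sample.txt <<'EOF'
1 0 0.773946E+05 0.575115E+03 0.0 0.0
1 1 0.763786E+05 0.499438E+03 0.0 0.0
1 2 0.755671E+05 0.447221E+03 0.0 0.0
EOF
# report whether the cost (column 3) decreases at every iteration
awk 'NR > 1 && $3 >= prev { bad = 1 } { prev = $3 } END { print (bad ? "not descending" : "descending") }' cost_gradient_sample.txt
```

Note that a strictly-descending check like this is only appropriate within one outer loop; as discussed above, the norm of the gradient may ascend briefly at the start of the second outer loop.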

5.1.4.3: Check analysis increment

The analysis increment gives us an idea of where and how much the background fields have been changed by the observations. Another useful graphics tool that can be used to look at the analysis increment is located under ./util/Analysis_Utilities/plot_ncl/Analysis_increment.ncl. The graphic below shows the analysis increment at the 15th level.

Figure 5.3: Analysis increment at the 15th level

It can be clearly seen that the U.S. CONUS domain has many upper level observations and that the data availability over the ocean is very sparse.

5.2. Assimilating Radiance Data with GSI

5.2.1: Run script

Adding radiance data into the GSI analysis is very straightforward after a successful run of GSI with conventional data. The same run scripts from the above section can be used to run GSI with radiance data, with or without PrepBUFR data. The key step for adding the radiance data is to properly link the radiance BUFR data files to the GSI run directory with the names listed in the GSI namelist &OBS_INPUT section. This example adds the following three radiance BUFR files:

AMSU-A: gdas1.t12z.1bamua.tm00.bufr_d
AMSU-B: gdas1.t12z.1bamub.tm00.bufr_d
HIRS4: gdas1.t12z.1bhrs4.tm00.bufr_d

The above radiance BUFR files are saved in the location indicated by OBS_ROOT; therefore the following three lines can be inserted below the link of the PrepBUFR data in run_gsi.ksh:

ln -s ${OBS_ROOT}/gdas1.t12z.1bamua.tm00.bufr_d amsuabufr
ln -s ${OBS_ROOT}/gdas1.t12z.1bamub.tm00.bufr_d amsubbufr
ln -s ${OBS_ROOT}/gdas1.t12z.1bhrs4.tm00.bufr_d hirs4bufr

If it is desired to run radiance data in addition to conventional PrepBUFR data, the following link to the PrepBUFR data can be left as is:

ln -s ${PREPBUFR} ./prepbufr

To analyze radiance data without conventional PrepBUFR data, this line can be commented out in run_gsi.ksh:

## ln -s ${PREPBUFR} ./prepbufr

In the following example, the case study will include both radiance and conventional observations. In order to link the correct name for the radiance BUFR file, the namelist section &OBS_INPUT should be referenced. This section has a list of data types and BUFR file names that can be used in GSI. The 1st column, dfile, is the file name recognized by GSI. The 2nd column, dtype, and 3rd column, dplat, are the data type and data platform in the BUFR/PrepBUFR file listed in dfile, respectively. More detailed information on data usage can be found in Section 4.3.
For example, the following line tells us the AMSU-A observations from NOAA-17 should be in a BUFR file named amsuabufr:

dfile(30)='amsuabufr',dtype(30)='amsua',dplat(30)='n17',dsis(30)='amsua_n17',dval(30)=0.0,dthin(30)=2,
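A quick way to find the expected file name for a given platform is to search the &OBS_INPUT section of gsiparm.anl. The sketch below runs against a one-line sample copy of the entry above; pointing the grep/sed at the real gsiparm.anl works the same way.

```shell
# Sample &OBS_INPUT entry (copied from the example above)
cat > obs_input_sample.txt <<'EOF'
 dfile(30)='amsuabufr',dtype(30)='amsua',dplat(30)='n17',dsis(30)='amsua_n17',dval(30)=0.0,dthin(30)=2,
EOF
# print the dfile name for entries whose dplat is 'n17'
grep "dplat([0-9]*)='n17'" obs_input_sample.txt | sed -e "s/.*dfile([0-9]*)='\([^']*\)'.*/\1/"
```

The printed name (amsuabufr here) is the link name the radiance BUFR file must be given in the run directory.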

In radiance data assimilation, two important setups, data thinning and bias correction, need to be checked carefully. The following is a brief description of these two setups:

Radiance data thinning

The radiance data thinning is set up in the namelist section &OBS_INPUT. The following is a part of the namelist in that section:

dmesh(1)=120.0,dmesh(2)=60.0,dmesh(3)=60.0,dmesh(4)=60.0,dmesh(5)=120
dfile(30)='amsuabufr',dtype(30)='amsua',dplat(30)='n17',dsis(30)='amsua_n17',dval(30)=0.0,dthin(30)=2,

The first line of &OBS_INPUT has a thinning grid list in the array dmesh. In each data type line, the last column of the line, dthin(30)=2, is used to select the mesh grid used in the thinning. It can be seen that the data thinning option for NOAA-17 AMSU-A observations is 60 km because dmesh(2) is set to 60 km. For more information about data thinning, please refer to section 3.3.

Radiance data bias correction

The radiance data bias correction is very important for a successful radiance data analysis. In the run scripts, there are two points related to bias correction:

SATANGL=${FIX_ROOT}/global_satangbias.txt
cp ${FIX_ROOT}/ndas.t06z.satbias.tm03 ./satbias_in

The first file (global_satangbias.txt) gives GSI the angle bias, which is calculated outside GSI. The second file (ndas.t06z.satbias.tm03) gives GSI the mass bias, which is calculated inside GSI from the previous cycle. In the released version 3.0, the mass bias correction values in ndas.t06z.satbias.tm03 are all 0, which means there was no good estimate available for the mass bias correction in this case. The angle bias file global_satangbias.txt was also out of date. These two files can be found in ./fix as examples of the bias correction coefficients. The details of the radiance data bias correction are not included in this document, but users are advised to check the bias correction for radiance data assimilation carefully.
For this case, the GDAS bias correction files were downloaded and saved in the observation directory. To use these files, change the following lines in the run script:

SATANGL=${FIX_ROOT}/global_satangbias.txt
cp ${FIX_ROOT}/ndas.t06z.satbias.tm03 ./satbias_in

Once these links are set, we are ready to run the case.
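Returning to the thinning setup described above, the dthin/dmesh selection can be sketched as follows (illustrative Python, not GSI code; the dmesh values mirror the namelist excerpt):

```python
# dmesh values mirror the &OBS_INPUT excerpt above; units are km.
dmesh = {1: 120.0, 2: 60.0, 3: 60.0, 4: 60.0, 5: 120.0}

def thinning_mesh(dthin):
    """Return the thinning-mesh size (km) selected by a dthin entry.
    dthin <= 0 is taken here to mean the data type is not thinned."""
    if dthin <= 0:
        return None
    return dmesh[dthin]

# NOAA-17 AMSU-A uses dthin(30)=2, i.e. the 60 km mesh:
mesh = thinning_mesh(2)
```

Changing a data type's thinning mesh therefore only requires pointing its dthin entry at a different dmesh slot, or adding a new dmesh value.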

5.2.2: Run GSI and check run status

The process for running GSI is the same as described in section 5.1.2. Once run_gsi.ksh has been submitted, move to the run directory to check the GSI analysis results. The run directory will look exactly the same as for the conventional data case, with the exception of the three links to the radiance BUFR files used here. Following the same steps as in section 5.1.2, check the stdout file to see if GSI ran through each part of the analysis process successfully. In addition to the information outlined for the conventional run, the radiance BUFR files should have been read in and distributed to each sub-domain:

3:OBS_PARA: ps
3:OBS_PARA: t
3:OBS_PARA: q
3:OBS_PARA: uv
3:OBS_PARA: sst
3:OBS_PARA: pw
3:OBS_PARA: hirs4 metop-a
3:OBS_PARA: amsua n15
3:OBS_PARA: amsua n18
3:OBS_PARA: amsua metop-a
3:OBS_PARA: amsub n17
3:OBS_PARA: hirs4 n19
3:OBS_PARA: amsua n19

When comparing this output to the content in step 3 of section 5.1.3, it can be seen that 7 new radiance data types have been read in: HIRS4 from METOP-A and NOAA-19; AMSU-A from NOAA-15, NOAA-18, NOAA-19, and METOP-A; and AMSU-B from NOAA-17. The output also shows that most of the radiance data read in for this case are AMSU-A from NOAA-15 and NOAA-18.

5.2.3: Diagnose GSI analysis results

5.2.3.1: Check file fort.207

The file fort.207 is the statistics file for radiance data, similar to file fort.203 for temperature. This file contains important details about the radiance data analysis; its contents were explained in greater detail in an earlier section. Below are some values from fort.207 to give a quick look at the radiance assimilation in this case study. The fort.207 file contains lines such as the following.

For O-B, the stage before the first outer loop:

o-g 01 rad n15 amsua
o-g 01 rad n17 amsub

For O-A, the stage after the second outer loop:

o-g 03 rad n15 amsua
o-g 03 rad n17 amsub

From the above information, it can be seen how many AMSU-A observations from NOAA-15 fell within the analysis time window and domain. After thinning, a subset was left for this data type, but only 9647 were used in the analysis. The penalty for these data decreased after 2 outer loops. It is also very interesting to see that the number of AMSU-A observations in the O-A calculation increased to 37309, 4 times larger than the number in the O-B calculation. It can also be seen that, of the AMSU-B observations from NOAA-17 within the analysis time window and domain, only 254 were left after thinning, and none of them were used in the analysis.

The statistics for each channel can be viewed in the fort.207 file as well; the 12 channels of AMSU-A NOAA-15 are listed there for both O-B (before the first outer loop) and O-A (after the second outer loop). In each channel line, the second column is the channel number and the last column is the standard deviation for that channel. It can be seen that most of the channels fit the observations more closely after the analysis, but the standard deviation increased for channels 7, 8, 9, 10, and 12. This increase seems to occur because many more observations are used in the O-A calculation, not because the fit has gotten worse.

5.2.3.2: Check analysis increment

The same methods for checking the optimal minimization as demonstrated earlier can be used for radiance assimilation. Similar minimization features to the conventional assimilation case should be seen. The figures below show detailed information on how the radiance data impact the analysis results. Using the same NCL

script as before, analysis increment fields are plotted comparing the analysis with both radiance and conventional data to the analysis with conventional data only. The difference is shown at levels 49 and 6, which represent the maximum temperature increment level (49) and the maximum moisture increment level (6).

Figure 5.4: Analysis increment fields compared to the analysis with PrepBUFR only, at level 49

Figure 5.5: Analysis increment fields compared to the analysis with PrepBUFR only, at level 6

In order to fully understand the analysis results, the following needs to be known:

1. The weighting functions of each channel and the data coverage at the analysis time. There are several sources on the Internet that show the weighting functions of the AMSU-A channels. Channel 1 is a moisture channel, while the others are mainly temperature channels (channels 2, 3, and 15 also have large moisture signals). Because a model top of 10 mb was specified for this case study, the actual impact should come from the channels peaking below the model top.

2. The usage of each channel, which is specified in the file satinfo in the run directory. The first two columns show the observation type and platform of each channel, and the third column indicates whether the channel is used in the analysis. Because a lot of amsua_n15 and amsua_n18 data were used, they should be checked in detail. In this case, channels 11 and 14 from amsua_n15 and channels 9 and 14 from amsua_n18 were turned off.

3. Thinning information: a quick look at the namelist gsiparm.anl in the run directory shows that both amsua_n15 and amsua_n18 use thinning grid 2, which is 60 km. In this case the grid spacing is 30 km, which means the satellite observations are used roughly every two grid points; this may be a little dense.

4. Bias correction: radiance bias correction was discussed previously; it is very important for a successful radiance data analysis. The run scripts can only link to the old bias correction coefficients provided as an example in ./fix:

SATANGL=${FIX_ROOT}/global_satangbias.txt
cp ${FIX_ROOT}/ndas.t06z.satbias.tm03 ./satbias_in

Users can also download the operational bias correction coefficients for their experiment period as a starting point for calculating coefficients suitable for their own experiments. Radiance bias correction for regional analyses is a difficult issue because of the limited coverage of radiance data.
This topic is out of the scope of this document, but it should be considered and understood when using GSI for radiance applications.

5.3 Assimilating GPS Radio Occultation Data with GSI

5.3.1: Run script

The addition of GPS radio occultation (RO) data to the GSI analysis is similar to the addition of radiance data. In the example below, the RO data are used as refractivities. There is also an option to use the data as bending angles. The same run scripts used in the previous sections can be used, with the addition of the following link to the observations:

ln -s ${OBS_ROOT}/gdas1.t12z.gpsro.tm00.bufr_d gpsrobufr

For this case study, the GPS RO BUFR file was downloaded and saved in the OBS_ROOT directory. The file is linked to the name gpsrobufr, following the namelist section &OBS_INPUT:

dfile(10)='gpsrobufr',dtype(10)='gps_ref',dplat(10)='',dsis(10)='gps',dval(10)=1.0,dthin(10)=0,

This indicates that GSI expects a GPS refractivity BUFR file named gpsrobufr. In the following example, GPS RO and conventional observations are both assimilated. Change the run directory name in the run script to reflect this test:

WORK_ROOT=/ptmp/test/gsiprd_${ANAL_TIME}_gps_prepbufr

5.3.2: Run GSI and check the run status

The process of running GSI is the same as described in section 5.1.2. Once run_gsi.ksh has been submitted, move to the working directory, gsiprd_ _gps_prepbufr, to check the GSI analysis results. The run directory will look exactly the same as with the conventional data, with the exception of the link to the GPS RO BUFR file used in this case. Following the same steps as in section 5.1.3, check the stdout file to see if GSI ran through each part of the analysis process successfully. In addition to the information outlined for the conventional run, the GPS RO BUFR file should have been read in and distributed to each sub-domain:

3:OBS_PARA: ps
3:OBS_PARA: t
3:OBS_PARA: q
3:OBS_PARA: uv
3:OBS_PARA: sst
3:OBS_PARA: pw
3:OBS_PARA: gps_ref

Comparing this output to the content in section 5.1.3, it can be seen that the GPS RO refractivity data have been read in and distributed to the four sub-domains successfully.

5.3.3: Diagnose GSI analysis results

5.3.3.1: Check file fort.212

The file fort.212 contains the fit of GPS data as fractional differences. It has the same structure as the fit files for conventional data. Below is a quick look to confirm that the GPS RO data were used:

Observation - Background (O-B):

o-g 01 all count
o-g 01 all bias
o-g 01 all rms

Observation - Analysis (O-A):

o-g 03 all count
o-g 03 all bias
o-g 03 all rms

Only levels above 400 hPa are shown above, while all levels are shown in the standard output. It can be seen that most of the GPS RO data are located in the upper levels, with a total of 8709 observations used in the analysis during the 1st outer loop. The bias of the data was reduced to 0.01 after the analysis, and the rms was reduced from 0.66 to 0.48. From these statistics it can be concluded that the analysis with GPS RO data looks reasonable.

5.3.3.2: Check analysis increment

The same methods for checking the minimization as in the previous cases can be used for the GPS RO assimilation. The following figure gives detailed information on how the new data impact the analysis result. Using the same NCL script as before, analysis increment fields are plotted comparing the analysis with PrepBUFR and GPS RO data to the analysis with PrepBUFR data only, for level 48, which represents the maximum temperature increment.
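The count, bias, and rms entries reported in the fit files above are simple aggregates of the departures. A sketch of how such statistics could be computed (illustrative only, with hypothetical departure values, not the GSI diagnostic code):

```python
import math

def fit_stats(departures):
    """count, bias (mean), and rms of a set of O-B or O-A departures."""
    count = len(departures)
    bias = sum(departures) / count
    rms = math.sqrt(sum(d * d for d in departures) / count)
    return count, bias, rms

# Hypothetical fractional refractivity departures:
count, bias, rms = fit_stats([0.5, -0.3, 0.4, -0.2])
```

A bias moving toward zero and an rms decreasing from O-B to O-A, as seen in fort.212 here, is the expected signature of a healthy analysis.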

Figure 5.6: Analysis increment fields compared to the analysis with PrepBUFR only, at level 48

Summary

This chapter applied the knowledge from the previous chapters to demonstrate how to set up, run, and analyze GSI for various applications. It is important to always check for successful completion of the GSI analysis, as running to completion does not always indicate a successful run. Using the tools and methods described in this chapter, a complete picture of the GSI analysis can be obtained.

Chapter 6: GSI Theory and Code Structure

6.1 GSI Theory

GSI is a three-dimensional variational (3DVAR) data assimilation system. As a reference to help users understand the GSI analysis procedure, a brief summary of the 3DVAR mathematical theory and the minimization steps used in GSI is given in this section.

6.1.1 3DVAR equations used by GSI

J(x) = 1/2 (x - xb)^T B^-1 (x - xb) + 1/2 [H(x) - yo]^T R^-1 [H(x) - yo] + Jc    (1)

where:
x   : analysis fields
xb  : background fields
B   : background error covariance matrix
H   : observation operator
yo  : observations
R   : observation error covariance
Jc  : constraint terms (e.g., dynamical constraint, moisture constraint)

Define the analysis increment Δx = x - xb; then equation (1) becomes:

J(Δx) = 1/2 Δx^T B^-1 Δx + 1/2 [H(xb + Δx) - yo]^T R^-1 [H(xb + Δx) - yo] + Jc    (2)

Assuming H is a linear operator, equation (2) can be written as:

J(Δx) = 1/2 Δx^T B^-1 Δx + 1/2 [HΔx - (yo - H(xb))]^T R^-1 [HΔx - (yo - H(xb))] + Jc    (3)

Define the observation innovation o = yo - H(xb); then equation (3) becomes:

J(Δx) = 1/2 Δx^T B^-1 Δx + 1/2 (HΔx - o)^T R^-1 (HΔx - o) + Jc    (4)

6.1.2 Iterations to find the optimal results

GSI preconditions its minimization algorithm by defining a new variable y = B^-1 Δx (so that Δx = By); then equation (4) becomes:

J(y) = 1/2 y^T B y + 1/2 (HBy - o)^T R^-1 (HBy - o) + Jc    (5)

The gradients of the background and observation parts of the cost function, taken from (4) with respect to Δx and from (5) with respect to y, have the form:

grad_Δx J = B^-1 Δx + H^T R^-1 (HΔx - o)    (6)
grad_y J = Δx + B H^T R^-1 (HΔx - o) = B grad_Δx J    (7)

To reach the optimal result, the following iterative minimization steps are used. First assume:

Δx_0 = 0,  y_0 = 0,  Dir^x_0 = 0,  Dir^y_0 = 0

Then iterate over n:

Dir^x_n = -grad_y J(y_n) + β_n Dir^x_(n-1)
Dir^y_n = -grad_Δx J(Δx_n) + β_n Dir^y_(n-1)
Δx_(n+1) = Δx_n + α_n Dir^x_n
y_(n+1) = y_n + α_n Dir^y_n

until the maximum iteration count is reached or the gradient is sufficiently minimized. During the above iteration, β_n is calculated in subroutine dprodx, and the stepsize α_n is calculated in subroutine stpcalc.

6.1.3 Analysis variables

There are seven analysis variables used in the GSI analysis. They are:

stream function
unbalanced velocity potential
unbalanced virtual temperature
unbalanced surface pressure
pseudo relative humidity (qoption=1) or normalized relative humidity (qoption=2)
ozone mixing ratio (only for global GSI)
cloud condensate mixing ratio (only for global GSI)
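The preconditioned iteration above can be illustrated on a tiny synthetic problem. The sketch below uses explicit matrices and an exact-line-search descent step; the real GSI uses a conjugate-gradient scheme and applies B through recursive filters rather than as a stored matrix, so everything here is a toy stand-in:

```python
import numpy as np

# Toy 3-variable problem; all matrices are synthetic stand-ins.
B = np.diag([1.0, 2.0, 0.5])        # background error covariance
R = np.diag([0.5, 0.5])             # observation error covariance
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])     # linear observation operator
d = np.array([1.0, -0.5])           # innovation o = yo - H(xb)

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
A = Binv + H.T @ Rinv @ H           # Hessian of the quadratic cost function

x = np.zeros(3)                     # analysis increment, start from zero
for _ in range(100):
    grad = Binv @ x + H.T @ Rinv @ (H @ x - d)   # gradient w.r.t. x, cf. eq. (6)
    p = -B @ grad                                # preconditioned direction, cf. eq. (7)
    alpha = -(grad @ p) / (p @ A @ p)            # exact stepsize (role of stpcalc)
    x = x + alpha * p

# Direct solution of the same quadratic problem, for comparison:
x_exact = np.linalg.solve(A, H.T @ Rinv @ d)
```

Preconditioning by B means descent directions are taken in the y-space gradient, which greatly improves convergence when B is far from the identity.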

6.2 GSI Code Structure

This section introduces the basic code structure of GSI. Section 6.2.1 describes the main processes of GSI using three main routines. The following sections introduce the code related to four important parts of GSI: background IO, observation ingestion, observation innovation calculation, and minimization iteration.

6.2.1 Main process

gsimain.f90 — main steps in each call:

call gsimain_initialize
    MPI initialize
    Initialize defaults of variables in modules
    Read in user input from namelist
    4DVAR setup if it is true (not currently supported)
    Check user input for consistency among parameters for given setups
    Optionally read in namelist for single observation run
    Write namelist to standard out
    If this is a WRF regional run, run the interface with WRF: call convert_regional_guess
    Initialize variables, create/initialize arrays

call gsimain_run
    Call the main GSI driver routine: call gsisub(mype) (see the next page for the steps in gsisub)

call gsimain_finalize
    Deallocate arrays
    MPI finalize

GSI main process (continued)

subroutine gsisub (gsisub.f90) — high-level driver for GSI:

If not ESMF:
    Allocate grid arrays
    Get date, grid, and other information from background files
    Set communicators between subdomain and global/horizontal slabs
End if not ESMF
If single observation test, create prep.bufr file with single obs in it
Read in Level 2 radar winds and create superob file
Read info files for assimilation of various observations
Compute random number for precipitation forward model
Complete setup and execute external and internal minimization loops:
    if (lobserver) then
        call observer_init
        call observer_run
        call observer_finalize
    else
        call glbsoi(mype)
    endif
Deallocate arrays

Note: lobserver = if true, calculate departure vectors only.

subroutine glbsoi (glbsoi.f90) — driver for GSI:

Initialize timer for this procedure
If l_hyb_ens is true, initialize machinery for hybrid ensemble 3DVAR
Initialize observer
Check GSI options against available number of guess time levels
Read observations and scatter
Create/setup background error and background error balance
If l_hyb_ens is true, read in ensemble perturbations
If 4DVAR, read output from previous minimization
Set error (variance) for predictors (only use guess)
Set errors and create variables for dynamical constraint
Main outer analysis loop:
    do jiter=jiterstart,jiterlast
        Set up right-hand side of analysis equation: call setuprhsall
        Set up right-hand side of adjoint of analysis equation
        Inner minimization loop:
            if (lsqrtb) then
                call sqrtmin
            else
                call pcgsoi
            endif
        Save information for next minimization
        Save output of adjoint of analysis equation
    end do ! jiter
Deallocate arrays
Write updated bias correction coefficients
Finalize observer
Finalize timer for this procedure
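The outer/inner loop structure of glbsoi can be summarized in a few lines of Python pseudocode (a structural sketch only; the stand-in functions and the scalar "state" are hypothetical, not GSI routines):

```python
def glbsoi_sketch(guess, observations, setup_rhs, minimize,
                  jiterstart=1, jiterlast=2):
    """Each outer loop recomputes departures around the latest guess
    (cf. setuprhsall), then an inner loop minimizes the linearized cost
    function (cf. pcgsoi / sqrtmin) and the increment updates the guess."""
    for jiter in range(jiterstart, jiterlast + 1):
        departures = setup_rhs(guess, observations)
        increment = minimize(departures)
        guess = guess + increment
    return guess

# Toy stand-ins: scalar "observation" of the state, damped correction.
final = glbsoi_sketch(0.0, 1.0,
                      setup_rhs=lambda g, o: o - g,
                      minimize=lambda dep: 0.5 * dep)
```

The point of re-running setuprhsall each outer loop is that departures, quality control, and any nonlinearity in the observation operator are re-evaluated around the improved guess.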

GSI background IO (for 3DVAR)

Input background. For regional runs, the background file is first converted to an internal format (driven by convert_regional_guess) and then read in and distributed (driven by read_guess):

Background file   Convert to internal format   Read in and distribute
NMM NetCDF        convert_netcdf_nmm           read_wrf_nmm_netcdf_guess
NMM binary        convert_binary_nmm           read_wrf_nmm_binary_guess
ARW NetCDF        convert_netcdf_mass          read_wrf_mass_netcdf_guess
ARW binary        convert_binary_mass          read_wrf_mass_binary_guess
Twodvar           convert_binary_2d            read_2d_guess
nems_nmmb         convert_nems_nmmb            read_nems_nmmb_guess
CMAQ              -                            read_cmaq_guess

read_bias reads the bias correction fields. For the global GFS background:

if (use_gfs_nemsio) then read_nems and read_nems_chem, else read_gfs and read_gfs_chem

Output analysis result (driven by write_all). For regional runs, write_regional_analysis calls:

Analysis file     Write routine
NMM NetCDF        wrwrfnmma_netcdf (update_netcdf_nmm)
NMM binary        wrwrfnmma_binary
ARW NetCDF        wrwrfmassa_netcdf (update_netcdf_mass)
ARW binary        wrwrfmassa_binary
Twodvar           wr2d_binary
nems_nmmb         wrnemsnmma_binary
CMAQ              write_cmaq

For the global GFS analysis: if (use_gfs_nemsio) then write_nems, else write_gfs; write_bias writes the bias correction fields.
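The regional portion of the table above is essentially a dispatch from background format to conversion and read routines. A sketch in Python (the routine names are taken from this section; the dictionary itself is purely illustrative):

```python
# Map: background format -> (converter routine, reader routine), per the
# table above. The names are GSI subroutine names used here as strings.
BACKGROUND_IO = {
    "NMM NetCDF": ("convert_netcdf_nmm",  "read_wrf_nmm_netcdf_guess"),
    "NMM binary": ("convert_binary_nmm",  "read_wrf_nmm_binary_guess"),
    "ARW NetCDF": ("convert_netcdf_mass", "read_wrf_mass_netcdf_guess"),
    "ARW binary": ("convert_binary_mass", "read_wrf_mass_binary_guess"),
    "Twodvar":    ("convert_binary_2d",   "read_2d_guess"),
    "nems_nmmb":  ("convert_nems_nmmb",   "read_nems_nmmb_guess"),
}

def io_routines(background_format):
    """Return (converter, reader) for a regional background format."""
    return BACKGROUND_IO[background_format]

conv, reader = io_routines("ARW NetCDF")
```

This mirrors how convert_regional_guess and read_guess branch on the configured background type.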

Observation ingestion

Data type (ditype)   Observation type (obstype)                  Reading subroutine
conv                 t, uv, q, ps, pw, spd, mta_cld, gos_ctp     read_prepbufr
                     sst (from modsbufr)                         read_modsbufr
                     sst (not from modsbufr)                     read_prepbufr
                     srw                                         read_superwinds
                     tcp                                         read_tcps
                     lag                                         read_lag
                     rw (radar winds Level-2)                    read_radar
                     dw (lidar winds)                            read_lidar
                     rad_ref                                     read_radarref_mosaic
                     lghtn                                       read_lightning
                     larccld                                     read_nasa_larc
                     pm2_5                                       read_anowbufr
rad (satellite       amsua, amsub, msu, mhs, hirs4/3/2, ssu
radiances)             (platform not AQUA)                       read_bufrtovs (TOVS 1b data)
                     airs, amsua, hsb (platform AQUA)            read_airs (AIRS data)
                     iasi                                        read_iasi
                     sndr, sndrd1, sndrd2, sndrd3, sndrd4        read_goesndr (GOES sounder data)
                     ssmi                                        read_ssmi
                     amsre_low, amsre_mid, amsre_hig             read_amsre
                     ssmis, ssmis*                               read_ssmis
                     goes_img                                    read_goesimg
                     seviri                                      read_seviri
                     avhrr_navy                                  read_avhrr_navy
                     avhrr                                       read_avhrr
ozone                sbuv2, omi, gome, o3lev                     read_ozone
co                   mopitt                                      read_co
pcp                  pcp_ssmi, pcp_tmi, pcp_amsu, pcp_stage3     read_pcp
gps                  gps_ref, gps_bnd                            read_gps
aero                 modis                                       read_aerosol

Note: This table is based on subroutine read_obs in read_obs.f90. The data type is saved in array ditype and the observation type in array obstype; in the namelist, the observation type is dtype. Each observation type uses one or more processors to read in the data and then writes the data into a file called obs_input.*, where * is the ID of a processor used to read that observation type. Then, in subroutine obs_para (obs_para.f90), each processor reads all obs_input.* files and saves the observations within its subdomain into a file called pe*.obs_setup, where * is the 4-digit processor ID.
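As a conceptual sketch of the distribution step performed by obs_para (not the actual algorithm or domain decomposition used by GSI), assigning each observation to the subdomain containing its grid location might look like:

```python
def subdomain_id(x, y, nx=100, ny=100):
    """Return 0..3 for a hypothetical 2x2 decomposition of an nx-by-ny grid,
    given an observation's grid-relative (x, y) location."""
    col = 0 if x < nx / 2 else 1
    row = 0 if y < ny / 2 else 1
    return 2 * row + col

# Toy observations as (x, y) grid locations, bucketed per subdomain,
# analogous to writing pe*.obs_setup files:
obs = [(10.0, 10.0), (80.0, 20.0), (30.0, 90.0), (70.0, 60.0)]
per_pe = {}
for x, y in obs:
    per_pe.setdefault(subdomain_id(x, y), []).append((x, y))
```

The payoff of this arrangement is that the innovation calculation that follows can run on each processor using only locally held observations and background fields.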

Observation innovation calculation

Data type (ditype)   Observation type (obstype)                  Innovation subroutine
conv                 t                                           setupt
                     uv                                          setupw
                     q                                           setupq
                     ps                                          setupps
                     pw                                          setuppw
                     spd                                         setupspd
                     sst                                         setupsst
                     srw                                         setupsrw
                     tcp                                         setuptcp
                     lag                                         setuplag
                     rw (radar winds Level-2)                    setuprw
                     dw (lidar winds)                            setupdw
                     pm2_5                                       setuppm2_5
rad (satellite       amsua, amsub, msu, mhs, hirs4/3/2, ssu,
radiances)           airs, hsb (AQUA), iasi, sndr, sndrd1-4,
                     ssmi, amsre_low/mid/hig, ssmis, ssmis*,
                     goes_img, seviri, avhrr_navy, avhrr         setuprad
ozone                sbuv2, omi, gome                            setupoz
                     o3lev                                       setupo3lv
pcp                  pcp_ssmi, pcp_tmi, pcp_amsu, pcp_stage3     setuppcp
co                   mopitt                                      setupco
gps                  gps_ref                                     setupref
                     gps_bnd                                     setupbend

Note: This table is based on subroutine setuprhsall in setuprhsall.f90. The data type is saved in array ditype and the observation type in array obstype. The observation departure from the background for each outer loop is calculated in subroutine setuprhsall. An array (rdiagbuf) that holds observation innovation information for diagnostics is generated in each setup routine (also see A2).

The index of the data array from the reading routine for T is listed below:

index  name     content
1      ier      obs error
2      ilon     grid-relative obs location (x)
3      ilat     grid-relative obs location (y)
4      ipres    pressure
5      itob     T observation
6      id       station id
7      itime    observation time in data array
8      ikxx     observation type
9      iqt      flag indicating if moisture obs available
10     iqc      quality mark
11     ier2     original-original obs error ratio
12     iuse     use parameter
13     idomsfc  dominant surface type
14     iskint   surface skin temperature
15     iff10    10 meter wind factor
16     isfcr    surface roughness
17     ilone    longitude (degrees)
18     ilate    latitude (degrees)
19     istnelv  station elevation (m)
20     iobshgt  observation height (m)
21     iptrb    T perturbation
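The index table above can be mirrored as named constants, making it easy to pull a field out of one observation's data row (illustrative Python; GSI itself is Fortran with 1-based array indexing, which is why the helper subtracts 1):

```python
# 1-based indices for the T-observation data array, per the table above.
T_INDEX = {
    "ier": 1, "ilon": 2, "ilat": 3, "ipres": 4, "itob": 5, "id": 6,
    "itime": 7, "ikxx": 8, "iqt": 9, "iqc": 10, "ier2": 11, "iuse": 12,
    "idomsfc": 13, "iskint": 14, "iff10": 15, "isfcr": 16, "ilone": 17,
    "ilate": 18, "istnelv": 19, "iobshgt": 20, "iptrb": 21,
}

def field(row, name):
    """row is one observation's data array (0-based Python storage)."""
    return row[T_INDEX[name] - 1]

# Hypothetical row with only pressure and the T observation filled in:
row = [0.0] * 21
row[T_INDEX["ipres"] - 1] = 850.0
row[T_INDEX["itob"] - 1] = 287.4
p = field(row, "ipres")
```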

Inner iteration

The inner iteration of the GSI variational analysis is performed in subroutine pcgsoi (pcgsoi.f90), inside the following loop:

inner_iteration: do iter=0,niter(jiter)
end do inner_iteration

For the detailed steps, advanced developers are encouraged to read through the code and send questions to gsi_help@ucar.edu.

Chapter 7: Observation and Background Error Statistics

7.1 Conventional Observation Errors

Each observation type has its own observation errors. In this section, we introduce several topics related to conventional observation error processing in GSI.

7.1.1 Getting original observation errors

For the global GSI analysis, when oberrflg (a namelist option in section &obsqc) is true, observation errors are generated from an external observation error table according to the type of observation; otherwise, observation errors are read in from the PrepBUFR file. For regional GSI runs, GSI forces the use of the external observation error table no matter how oberrflg is set (oberrflg is set to true for regional runs in gsimod.f90).

The external observation error table (for example, nam_errtable.r3dv) is saved in the same directory as the other fixed files, such as the background error covariances (~/comgsi_v3/fix). It includes observation errors for all observation types. For each observation type, the table has 6 columns and 33 levels, covering pressure levels from 1100 hPa to 0 hPa:

OBSERVATION TYPE
  (33 levels x 6 columns of error values)

The columns are as follows:

Column #   Content   Unit
1          Pressure  hPa
2          T         degree C
3          q         percent/10
4          UV        m/s
5          Ps        mb
6          Pw        kg/m^2 (or mm)

This table basically prescribes the observation errors for temperature (T), moisture (q), horizontal wind components (UV), surface pressure (Ps), and total column precipitable water (Pw). The missing value indicator is E+10. The entire contents of this file are

read in during subroutine converr_read and stored in the 3D array etabl. When a certain type of conventional data is read in from the PrepBUFR file (read_prepbufr.f90), the corresponding observation error is assigned by vertical interpolation within the etabl array. Thus, for each conventional observation type, a matrix (obserr) is generated that holds the observation errors for the basic five variables. The contents of the obserr array, obtained by vertically interpolating the corresponding values from the etabl array, are:

obserr(1,k)=Ps, obserr(2,k)=q, obserr(3,k)=T, obserr(5,k)=UV, obserr(7,k)=Pw

where k is the index of the observation level.

7.1.2 Observation error adjustment and gross error check within GSI

The observation error extracted from the etabl array into the obserr array is finally adjusted based on the observation's quality, vertical sigma location, density, time, etc. The adjustment occurs in read_prepbufr.f90 and its main steps are as follows:

1. Observation errors are limited by corresponding lower limits. Currently, these lower limits are hard-coded and assigned in read_prepbufr: the limits for temperature, moisture, wind, surface pressure, and total precipitable water are set in the variables terrmin=1., qerrmin=0.1, werrmin=1.0, perrmin=0.5, and pwerrmin=1.0, respectively.

2. Observation errors are adjusted based on the quality marks from the PrepBUFR data files. If a quality mark is larger than a threshold value (lim_qm in read_prepbufr.f90), the observation error is set to a very large number (1.0x10^6), which marks a bad observation that will not have any impact on the analysis results. The values of lim_qm are listed elsewhere in this guide. If an observation quality mark is either 3 or 7, the error is inflated by setting inflate_error to true.
The inflation factor may in principle depend on the observation type; currently, however, it is fixed.

3. For certain observation types (e.g., T), the observation error is amplified by a factor of 1.2 if the observation is located above 100 hPa.

During the observation innovation calculation (e.g., in the subroutine setupt described in the previous chapter), GSI performs gross error checks and observation error tuning (if obserror_tune is set to true). Users can adjust the threshold of the gross check for each data type within the convinfo file. The header line of convinfo lists, among others, the following columns:

!otype type sub iuse twindow numgrp ngroup nmiter gross ermax ermin var_b var_pg ithin

The gross check for each data type is controlled by the three columns gross, ermax, and ermin. If an observation has observation error obserror, then a ratio is calculated:

ratio = (Observation - Background) / max(ermin, min(ermax, obserror))

If ratio > gross, the observation fails the gross check and is listed as a rejection in the fit files.

7.2 Background Error Covariance

The background error covariance plays a very important role in determining the quality of a variational analysis for NWP models. It controls what percentage of the innovation becomes the analysis increment, how each observation impacts a broad area, and the balance among different analysis variables.

The background error covariance matrix (B) is defined as the covariance of the forecast error [forecast (x) minus truth (xtruth)]. Since the actual state of the atmosphere (the truth) is not known, the forecast error must be estimated. The most common estimation methods in use are the NMC method and the ensemble method. In the NMC method, the forecast error is estimated from the difference between two forecasts (typically 12- and 24-hour) valid at the same time. In the ensemble method, the forecast error is estimated from ensemble perturbations (ensemble minus ensemble mean).

Because the background error is used as a covariance matrix, the size of B is extremely large: typically on the order of 10^6 x 10^6, which in its raw form cannot be stored on any computer. This problem is simplified by designing an ideal set of variables for which the analysis is performed, generally referred to as analysis control variables. The control variables are designed so that the cross-correlations between them are minimal, leading to very small off-diagonal terms in B; the cross dependency is removed with pre-computed regression coefficients. Further, the forecast error is modeled with Gaussian error characteristics, using pre-computed estimates of the variance and lengthscale parameters for each analysis control variable.
Typically, the statistics for the desired regression coefficients, variances, and lengthscale parameters are computed offline with a sufficiently large data set, usually covering a period of at least one month. For this purpose, a separate utility called gen_be is available with the latest WRF data assimilation (WRFDA) release; details about this utility are available in Rizvi et al. (2010). This section briefly introduces how GSI processes these pre-computed background error statistics and applies them in the GSI analysis.

7.2.1 Processing of background error statistics

The GSI package has several files in ~/comgsi_v3/fix/ that hold pre-computed background error statistics for regional and global domains with different grid configurations. Since the GSI code has a built-in mechanism to interpolate the input background error statistics to any desired analysis grid, the following two files are mainly used to specify the B matrix for regional applications:

nam_nmmstat_berror.f77 : regional background errors based on the NAM model, covering the Northern Hemisphere
nam_glb_berror.f77 : global background errors based on the GFS model, covering the globe

All parameters of the global background error statistics are latitude dependent. For the regional background error statistics, the regression coefficients of velocity potential and the variances and horizontal lengthscales of all variables are latitude dependent; the remaining parameters, such as the regression coefficients for unbalanced surface pressure and temperature and the vertical lengthscales of all fields, do not vary with latitude.

The background error statistics are initially read on their original sigma levels and interpolated vertically, in log(sigma) coordinates, to the desired vertical sigma levels. The subroutines performing this interpolation for global and regional applications reside in the modules m_berror_stats.f90 and m_berror_stats_reg.f90, respectively. In subroutines prewgt and prewgt_reg, the lengthscale (horizontal and vertical) and variance information is read by calling berror_read_wgt and berror_read_wgt_reg for global and regional applications, respectively.
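A sketch of the vertical interpolation just described, mapping a background error profile from its original sigma levels to analysis levels linearly in log(sigma) (illustrative Python with hypothetical lengthscale values, not the m_berror_stats_reg code):

```python
import math

def interp_log_sigma(sig_in, val_in, sig_out):
    """Linear interpolation in ln(sigma); sig_in must decrease (1 -> 0).
    Values outside the input range are held constant (no extrapolation)."""
    lnsig = [math.log(s) for s in sig_in]            # decreasing sequence
    out = []
    for s in sig_out:
        t = math.log(s)
        if t >= lnsig[0]:                            # below the lowest input level
            out.append(val_in[0])
        elif t <= lnsig[-1]:                         # above the highest input level
            out.append(val_in[-1])
        else:
            k = next(i for i in range(len(lnsig) - 1)
                     if lnsig[i] >= t >= lnsig[i + 1])
            w = (lnsig[k] - t) / (lnsig[k] - lnsig[k + 1])
            out.append((1 - w) * val_in[k] + w * val_in[k + 1])
    return out

# Hypothetical horizontal lengthscales (km) on 3 input sigma levels,
# interpolated to 3 analysis sigma levels:
scales = interp_log_sigma([1.0, 0.5, 0.1], [100.0, 150.0, 300.0],
                          [0.8, 0.5, 0.2])
```

Interpolating in log(sigma) rather than sigma itself gives roughly uniform resolution in height, which suits the near-exponential decrease of pressure with altitude.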
The following is the list of arrays into which the original background error statistics are read in the various subroutines discussed above:

Category: Balance (horizontal regression coefficients)
  agvi   (0:mlat+1,1:nsig,1:nsig)   regression coefficients between stream function and temperature
  wgvi   (0:mlat+1,1:nsig)          regression coefficients between stream function and surface pressure
  bvi    (0:mlat+1,1:nsig)          regression coefficients between stream function and velocity potential

Category: Horizontal and vertical influence scales
  hwll   (0:mlat+1,1:nsig,1:nc3d)   horizontal lengthscales for stream function, unbalanced velocity potential, unbalanced temperature, and relative humidity
  hwllp  (0:mlat+1,nvars-nc3d)      horizontal lengthscale for unbalanced surface pressure
  vz     (1:nsig,0:mlat+1,1:nc3d)   vertical lengthscales for stream function, unbalanced velocity potential, unbalanced temperature, and relative humidity

Category: Variance
  corz   (1:mlat,1:nsig,1:nc3d)     square root of variance for stream function, unbalanced velocity potential, unbalanced temperature, and relative humidity
  corp   (1:mlat,nc2d)              square root of variance for unbalanced surface pressure

Note:
  mlat  = number of latitudes in the original coefficient domain
  nsig  = number of vertical levels in the analysis grid
  nc3d  = number of 3-dimensional analysis variables
  nvars = number of 3-dimensional and 2-dimensional analysis variables

Horizontal interpolation of the regression coefficients onto the desired grid is done in subroutines prebal and prebal_reg (residing in module balmod.f90) for global and regional applications, respectively. The horizontally interpolated regression coefficients are stored in the arrays bvz, agvz, wgvz (global) and bvk, agvk, wgvk (regional). These regression coefficients are used in subroutine balance to build the balanced parts of the velocity potential, temperature, and surface pressure fields.

In subroutines prewgt_reg and prewgt, the horizontal and vertical lengthscale (hwll, hwllp, vz) and variance (corz, corp) information is horizontally interpolated and adjusted with the corresponding input tuning parameters (vs, hzscl, hswgt, as3d, and as2d) supplied through gsiparm.anl and anavinfo.txt. The desired information is finally processed and transformed into new arrays such as slw, sli, dssv, and dssvs, which are subsequently used in the recursive filter applications in both the horizontal and vertical directions. The variance array dssv is allocated for 3D variables with dimensions (lat, lon, nsig, variables); dssvs is allocated for 2D variables with dimensions (lat, lon, variables). For both arrays, the allocation of variables is decided by the input parameters supplied via anavinfo.txt.

7.2.2 Apply background error covariance

According to the variational equations used in GSI, the background error covariance is used to calculate the gradient of the cost function with respect to y from the gradient of the cost function with respect to x, which, following section 6.1.1, can be represented as:

grady = B gradx    (subroutine bkerror(gradx,grady))

Because B is very complex and has very large dimensions in most data analysis domains, in practice it must be decomposed into several sub-matrices that fulfill its function step by step.
In GSI, the B matrix is decomposed into the following form:

   B = B_balance V B_Z (B_x B_y B_y B_x) B_Z V B_balance^T

The function of each sub-matrix is explained in the following table:

   Sub-matrix     Function                                    Subroutine   GSI file
   B_balance      balance among different variables           balance      balmod.f90
   B_balance^T    adjoint of the balance equation             tbalance     balmod.f90
   V              square root of variance                     bkgvar       bkgvar.f90
   B_Z            vertical smoother                           frfhvo       smoothzrf.f90
   B_x, B_y       self-adjoint smoothers in the West-East     smoothrf     smoothzrf.f90
                  (B_x) and South-North (B_y) directions
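To make the decomposition concrete, here is a small self-contained sketch (toy matrices with invented sizes and values, not the GSI operators) in which each smoother is a first-order recursive filter applied forward then backward, so every factor is self-adjoint and the composed B comes out symmetric, as a covariance must be:

```python
import numpy as np

def rf_matrix(n, alpha):
    """Toy self-adjoint recursive filter as an explicit matrix: the forward
    sweep y(i) = alpha*y(i-1) + (1-alpha)*x(i) is the lower-triangular L,
    and sweeping backward afterwards applies L^T, so L^T @ L is symmetric."""
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            L[i, j] = (1.0 - alpha) * alpha ** (i - j)
    return L.T @ L

n = 8
rng = np.random.default_rng(1)
T = np.eye(n); T[1:, 0] = 0.2          # toy balance operator B_balance
V = np.diag(rng.uniform(0.5, 1.5, n))  # square root of variances
Z = rf_matrix(n, 0.3)                  # vertical smoother B_Z
X = rf_matrix(n, 0.5)                  # West-East smoother B_x
Y = rf_matrix(n, 0.5)                  # South-North smoother B_y

# B = B_balance V B_Z (B_x B_y B_y B_x) B_Z V B_balance^T
B = T @ V @ Z @ (X @ Y @ Y @ X) @ Z @ V @ T.T
grady = B @ rng.standard_normal(n)     # schematically, bkerror(gradx, grady)
print(np.allclose(B, B.T))             # prints True: B is symmetric
```

Note the palindromic ordering of the factors: each operator to the left of the central smoothers reappears transposed on the right, which is exactly what guarantees the symmetry of B.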

To help users understand how the background error covariance is applied in GSI, the above process, carried out by calling bkerror, can be described in the following three steps:

Step 1. The adjoint of the balance equation (B_balance^T) is applied by calling tbalance.

Step 2. The square root of the variances and the vertical and horizontal parts of the background error correlation are applied by calling subroutine bkgcov, which:
   - multiplies by the square root of the background error variances (V) by calling bkgvar
   - applies the vertical smoother (B_Z) by calling frfhvo
   - converts from subdomains to the full horizontal field distributed among processors by calling sub2grid
   - applies the self-adjoint smoothers in the West-East (B_x) and South-North (B_y) directions by calling smoothrf. Smoothing in the horizontal is achieved by calling ryxyyx at each vertical sigma level in a loop over the number of vertical sigma levels (nlevs). Smoothing for the three horizontal scales is done with the corresponding weighting factors (hswgt) and horizontal length scale tuning factors (hzscl). The horizontal field is then transformed back to the respective subdomains by calling grid2sub
   - applies the vertical smoother (B_Z) again by calling frfhvo
   - multiplies by the square root of the background error variances (V) again by calling bkgvar

Step 3. The balance equation (B_balance) is applied by calling balance. In this step the balanced part of the velocity potential, temperature, and surface pressure is computed from the stream function using the corresponding regression coefficients, as follows:

   velocity potential = unbalanced velocity potential + bv x stream function
   temperature        = unbalanced temperature + agv x stream function
   surface pressure   = unbalanced surface pressure + wgv x stream function

7.3 Bias Correction for Satellite Radiance Observations

Two files are used in the bias correction for satellite radiance observations. One is called satbias_angle and is located in ~/comgsi_v3/fix; the other is called satbias_in and is provided as an example under ./fix as ndas.t06z.satbias.tm03.
Both are used by the bias correction processing in radiance data assimilation. The following descriptions of these two files come from the subroutine radinfo_read in radinfo.f90.

1. satbias_angle

The satbias_angle file contains the angle-dependent part of the brightness temperature bias for each channel/instrument/satellite. Also included in this file is the mean temperature lapse rate for each channel, weighted by the weighting function for the given channel/instrument. These coefficients are calculated from the previous cycle's radiance information outside of GSI.

2. satbias_in

The satbias_in file contains the coefficients for the predictive (air mass) part of the satellite radiance data bias correction. These coefficients are calculated from the previous cycle's radiance information inside of GSI.
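The predictive (air-mass) correction amounts to a linear model in a handful of predictors per channel. The sketch below illustrates the idea only; the predictor set and coefficient values are invented for illustration and are not read from a real satbias_in file:

```python
import numpy as np

# Hedged sketch of an air-mass (predictive) bias correction: the
# brightness-temperature bias for one channel is modeled as a linear
# combination of predictors. All names and numbers here are assumptions.
predictors = np.array([1.0,    # constant offset predictor
                       0.15,   # scan-angle based predictor (assumed)
                       5.2,    # weighted temperature lapse rate (assumed)
                       0.8])   # e.g. a cloud liquid water predictor (assumed)
coeffs = np.array([0.30, -0.02, 0.01, 0.05])  # per-channel, from a previous cycle

tb_observed = 250.4
bias = float(coeffs @ predictors)   # modeled bias for this channel
tb_corrected = tb_observed - bias   # bias-corrected brightness temperature
print(round(bias, 4), round(tb_corrected, 4))
```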

Chapter 8: BUFR and PrepBUFR

The GSI expects observations to be encoded in BUFR (Binary Universal Form for the Representation of meteorological data). BUFR is a self-descriptive, table-driven code form. It offers great advantages in flexibility and expandability compared with traditional alphanumeric code forms, as well as packing to reduce message sizes. The term "self-descriptive" means that the form and content of the data contained within a BUFR message are described within the BUFR message itself. BUFR is one of the code forms that the World Meteorological Organization (WMO) recommends for the representation and exchange of observational data, and it is recommended for use in all present and future WMO applications.

PrepBUFR is the National Centers for Environmental Prediction (NCEP) term for "prepared" or QC'd data in BUFR format (an NCEP convention/standard). Please note that a PrepBUFR file is still a BUFR file, but with more QC information. NCEP uses PrepBUFR files to organize conventional observations and satellite retrievals, as well as other related information (such as quality marks), into single files. The BUFRLIB software and BUFR tables are needed for processing BUFR files (including PrepBUFR files).

In this chapter, we introduce the basic procedure/methodology to encode, decode, and append BUFR files through several simple BUFR examples based on the NCEP BUFRLIB software documentation in Section 8.1. Operational BUFR/PrepBUFR files are introduced in Section 8.2. Finally, BUFR/PrepBUFR data resources available to community users are listed in Section 8.3.

8.1 BUFR File Process (encode, decode, and append)

To give users an idea of how a BUFR file is organized and to provide a starting point for processing BUFR files, the community GSI package includes a set of simple Fortran programs under ./util/bufr_tools/, which perform basic BUFR functions such as encoding (write), decoding (read), and appending.
We will use these examples to explain how to work with a BUFR file. Please note that all of these examples are based on the NCEP BUFRLIB; online documentation on BUFRLIB and BUFR tables is listed later in this chapter.

To understand the BUFR file structure, it is important to remember the following hierarchy of a BUFR file, in top-down fashion: "A BUFR file contains one or more BUFR messages, each containing one or more BUFR data subsets, each containing one or more BUFR data values."

Decoding/reading data from a simple BUFR file

The following is from the code bufr_decode_sample.f90, which shows how to read specific observation values (among a large variety) from a BUFR file named sample.bufr.

program bufr_decode_sample
!
! example of reading observations from bufr
!
  implicit none
  character(80):: hdstr='XOB YOB DHR'
  character(80):: obstr='TOB'
  real(8) :: hdr(3),obs(1,10)
  integer :: ireadmg,ireadsb
  character(8) subset
  integer :: unit_in=10
  integer :: idate,iret,num_message,num_subset
!
! decode
  open(unit_in,file='sample.bufr',action='read',form='unformatted')
  call openbf(unit_in,'IN',unit_in)
  call datelen(10)
  num_message=0
  msg_report: do while (ireadmg(unit_in,subset,idate) == 0)
    num_message=num_message+1
    num_subset = 0
    write(*,'(I10,I4,a10)') idate,num_message,subset
    sb_report: do while (ireadsb(unit_in) == 0)
      num_subset = num_subset+1
      call ufbint(unit_in,hdr,3,1,iret,hdstr)
      call ufbint(unit_in,obs,1,10,iret,obstr)
      write(*,'(2I5,4f8.1)') num_subset,iret,hdr,obs(1,1)
    enddo sb_report
  enddo msg_report
  call closbf(unit_in)
end program

The structure of the Fortran code matches the top-down hierarchy of a BUFR file. To better illustrate this structure, the code can be reduced to four different levels, as shown in the following skeleton:

  open(unit_in,file='sample.bufr',action='read',form='unformatted')
  call openbf(unit_in,'IN',unit_in)
    msg_report: do while (ireadmg(unit_in,subset,idate) == 0)
      sb_report: do while (ireadsb(unit_in) == 0)
        call ufbint(unit_in,hdr,3,1,iret,hdstr)
        call ufbint(unit_in,obs,1,10,iret,obstr)
      enddo sb_report
    enddo msg_report
  call closbf(unit_in)

The three RED lines are the first-level (file level) statements, which open/close a BUFR file for decoding. The two BLUE lines are the second-level (message level) statements, which read BUFR messages from the BUFR file. Each loop iteration reads in one message until the last message in the file is reached. The two GREEN lines are the third-level (subset level) statements, which read BUFR data subsets from a BUFR message. Each loop iteration reads in one subset until the last subset in the message is reached. The BLACK lines are the fourth-level (data level) statements, which read user-picked data values into user-defined arrays from each BUFR subset.

All BUFR encode, decode, and append programs have the same structure as listed here. The message loop (msg_report) and subset loop (sb_report) are needed only if there are multiple messages in a file and multiple subsets in a message, which is the case for most types of observations.

Now, let's check the usage of each of the BUFRLIB subroutines/functions used in the code:

1. First level (file level): open a BUFR file

  open(unit_in,file='sample.bufr',action='read',form='unformatted')
  call openbf(unit_in,'IN',unit_in)
  call closbf(unit_in)

The open command: a Fortran command to link a file with a logical unit. Here the action is 'read' because we want to decode (read) only.

openbf: CALL OPENBF ( LUBFR, CIO, LUNDX )
  Input arguments:
    LUBFR   INTEGER    Logical unit for BUFR file
    CIO     CHAR*(*)   'IN' or 'OUT' or 'APX'
    LUNDX   INTEGER    Logical unit for BUFR tables

For decoding, LUBFR=LUNDX is the logical unit in the Fortran open command, and CIO is 'IN'. Every BUFR file must have a BUFR table file associated with it. These tables may be defined within a separate ASCII text file (see Description and Format of NCEP BUFR Tables for more info.)
or, in the case of an existing BUFR file, the tables may be embedded within the BUFR messages of the file itself.

closbf: CALL CLOSBF ( LUBFR )
  Input argument:
    LUBFR   INTEGER    Logical unit for BUFR file

It is also worth noting that CLOSBF will, before returning, actually execute a Fortran CLOSE on logical unit LUBFR, whereas subroutine OPENBF did not itself handle the Fortran OPEN of the same LUBFR.

2. Second level (message level): read in messages

  msg_report: do while (ireadmg(unit_in,subset,idate) == 0)
  enddo msg_report

Function ireadmg: IRET = IREADMG ( LUBFR, CSUBSET, IDATE )
  Input argument:
    LUBFR     INTEGER    Logical unit for BUFR file
  Output arguments:
    CSUBSET   CHAR*(*)   Table A mnemonic (name/type) for BUFR message
    IDATE     INTEGER    Section 1 date-time for BUFR message
    IRET      INTEGER    Return code:  0 = normal return
                                      -1 = no more BUFR messages in LUBFR

Note: CSUBSET returns information about the data type.

3. Third level (subset level): read in data subsets

  sb_report: do while (ireadsb(unit_in) == 0)
  enddo sb_report

Function ireadsb: IRET = IREADSB ( LUBFR )
  Input argument:
    LUBFR     INTEGER    Logical unit for BUFR file
  Output argument:
    IRET      INTEGER    Return code:  0 = normal return
                                      -1 = no more BUFR data subsets in current BUFR message

4. Fourth level (data level): read in picked data values

This is the level where observation values are read into user-defined arrays. To understand how to read observations from a BUFR subset, the following two questions need to be addressed:

1) How do I know what kind of data are included in the subset (or a BUFR file)?

This question can be answered by checking the content of a BUFR table and its mnemonics. A detailed definition and discussion of BUFR tables and mnemonics can be found in the NCEP documents listed later in this chapter. Here we illustrate how to use the BUFR table to solve the problem directly. As an example, an excerpt from the BUFR table in sample.bufr for the message type ADPUPA is shown below. We will use this table information to illustrate how to track observation variables in ADPUPA (the upper-air data type):

  MNEMONIC   NUMBER   DESCRIPTION
  ADPUPA     A48102   UPPER-AIR (RAOB, PIBAL, RECCO, DROPS) REPORTS
  AIRCAR     A48103   MDCRS ACARS AIRCRAFT REPORTS

  MNEMONIC   SEQUENCE
  ADPUPA     HEADR SIRC {PRSLEVEL} <SST_INFO> <PREWXSEQ> {CLOUDSEQ}
  ADPUPA     <CLOU2SEQ> <SWINDSEQ> <AFIC_SEQ> <TURB3SEQ>
  HEADR      SID XOB YOB DHR ELV TYP T29 TSB ITP SQN PROCN RPT
  HEADR      TCOR <RSRD_SEQ>

  SID        STATION IDENTIFICATION
  XOB        LONGITUDE
  YOB        LATITUDE
  DHR        OBSERVATION TIME MINUS CYCLE TIME
  ELV        STATION ELEVATION
  TYP        PREPBUFR REPORT TYPE

  MNEMONIC   SCAL   REFERENCE   BIT   UNITS
  SID        ...    ...         ...   CCITT IA5
  XOB        ...    ...         ...   DEG E
  YOB        ...    ...         ...   DEG N
  DHR        ...    ...         ...   HOURS
  ELV        ...    ...         ...   METER
  TYP        ...    ...         ...   CODE TABLE

The four colors here separate the different parts of the BUFR table, which can also be marked as Part 1 (red), Part 2 (blue), Part 3 (yellow), and Part 4 (green) in the order they are listed above.

As discussed before, IREADMG reads in a message with three output arguments. The first output argument is CSUBSET, the Table A mnemonic for the BUFR message, which returns the message type (also called the data type). This message type is the starting point for learning what types of observations are included in the message. The descriptions of message types can be found in the first section of a BUFR table, for example, Part 1 (red) in the sample BUFR table. Here, if CSUBSET has the value ADPUPA, the contents of this message, and of all subsets (third level) read in from it, are upper-air reports (like rawinsondes).

There are many message types listed in the first section of the BUFR table, and each of them represents a sequence. For example, a search for ADPUPA in the BUFR table returns the first two lines of Part 2 (blue), in which ADPUPA is followed by a sequence of items like HEADR SIRC {PRSLEVEL}. If we then search for HEADR in the same file, we find the last two lines of Part 2 (blue), in which HEADR leads another sequence containing SID XOB YOB DHR ELV TYP.

If we then search for SID XOB YOB DHR ELV TYP in the same file, we find the definitions of these items in Part 3 (yellow). Clearly, the message type ADPUPA includes variables like station ID, observation location (longitude, latitude), observation time, etc. These are important variables for describing an observation. If we keep searching for other items under ADPUPA, we find that many more observation variables are included in ADPUPA. Please note that a complete list of all variables in a message type can be very long and complex, but we don't need to learn about all of them - we only need to know what is needed for our specific application. The last part of the BUFR table (Part 4, green) includes useful unit information for each variable; for example, the unit of XOB is DEG (degrees) and the unit of DHR is HOURS. Users will not likely need to make use of the scale, reference, and bit information.

Two more notes may help in understanding the BUFR table:

MNEMONIC: a name for a message type or sequence (like ADPUPA or HEADR), or for an observation value (like YOB or DHR).

{ }: indicates that the enclosed mnemonic is replicated between 0 and 255 times. For example, {PRSLEVEL} shows that the sequence PRSLEVEL is replicated to hold multi-level rawinsonde observations.

There are many other details to BUFR tables, but the above information should be sufficient for most BUFR file processing applications using the NCEP BUFRLIB software.

2) How do I tell BUFRLIB to read in only specific data?

In this example, a temperature observation, along with its longitude, latitude, and observation time, is used to illustrate how to answer this question. From the BUFR table for the message type ADPUPA, the names of longitude, latitude, and time in the BUFR table are 'XOB YOB DHR' within the sequence HEADR.
Similarly, the name of the temperature observation can be found as 'TOB' in the sequence {PRSLEVEL} (not shown in the example BUFR table). Most conventional message types contain such observation information. In the example code, the first several lines define the information we want to read:

  character(80):: hdstr='XOB YOB DHR'
  character(80):: obstr='TOB'
  real(8) :: hdr(3),obs(1,10)

Here hdstr is a string of blank-separated names (mnemonics) associated with the array hdr, while obstr is another string associated with the array obs. Please note that these arrays (hdr and obs) have to be defined as REAL*8 arrays. When we run the following two BUFRLIB subroutines, longitude (XOB), latitude (YOB), and observation time (DHR) are read into the array hdr, and the values of the temperature observations (TOB) are read into the array obs:

  call ufbint(unit_in,hdr,3,1,iret,hdstr)
  call ufbint(unit_in,obs,1,10,iret,obstr)

ufbint: CALL UFBINT ( LUBFR, R8ARR, MXMN, MXLV, NLV, CMNSTR )
  Input arguments:
    LUBFR     INTEGER    Logical unit for BUFR file
    CMNSTR    CHAR*(*)   String of blank-separated mnemonics associated with R8ARR
    MXMN      INTEGER    Size of first dimension of R8ARR
    MXLV      INTEGER    Size of second dimension of R8ARR, OR number of levels of data values to be written to the data subset
  Input or output argument (depending on the context of LUBFR):
    R8ARR(*,*)  REAL*8   Data values written/read to/from the data subset
  Output argument:
    NLV       INTEGER    Number of levels of data values written/read to/from the data subset

The correspondence between CMNSTR and the REAL*8 values listed within the first dimension of R8ARR is one-to-one. This means that CMNSTR, a string of blank-separated mnemonics, defines which observation variables are read in, and the array R8ARR holds the values of those variables. The 2nd dimension of R8ARR is a replication of CMNSTR, usually representing vertical observation levels. In this case, after these two calls, the array contents are:

  hdr(1)    - longitude
  hdr(2)    - latitude
  hdr(3)    - time
  obs(1,1)  - temperature observation at the 1st level (single level)
  obs(1,2)  - temperature observation at the 2nd level for a multi-level observation
  obs(1,3)  - temperature observation at the 3rd level for a multi-level observation
  ...

Because these two calls are inside the message and subset loops, we can get the temperature observations, with location and time, from all observations in the BUFR file. If some subsets do not contain data, the values in the array are set to 10.0E10, which is the missing value.

Now, only one BUFRLIB subroutine left in the code needs to be explained:

datelen: CALL DATELEN ( LEN )
  Input argument:
    LEN    INTEGER    Length of Section 1 date-time values to be output by message-reading subroutines such as READMG, READERME, etc.
                       8 = YYMMDDHH (i.e. 2-digit year)
                      10 = YYYYMMDDHH (i.e.
4-digit year)

This subroutine allows the user to specify the format of the IDATE output argument returned by READMG.
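The one-to-one correspondence between the mnemonic string and the first dimension of the data array, together with BUFRLIB's 10.0E10 missing value, can be mimicked with a small Python mock (ufbint_mock is a hypothetical stand-in written for illustration; it is not part of BUFRLIB):

```python
BMISS = 10.0e10  # BUFRLIB missing-value flag

def ufbint_mock(subset, mnstr, mxmn, mxlv):
    """Hypothetical stand-in for UFBINT: fill an mxmn-by-mxlv array from a
    dict of mnemonic -> per-level values, padding absent data with BMISS."""
    mnemonics = mnstr.split()
    assert len(mnemonics) <= mxmn
    arr = [[BMISS] * mxlv for _ in range(mxmn)]
    nlv = 0
    for i, mn in enumerate(mnemonics):
        values = subset.get(mn, [])
        nlv = max(nlv, len(values))
        for lv, val in enumerate(values[:mxlv]):
            arr[i][lv] = val
    return arr, nlv

# One ADPUPA-like subset: single-level header, multi-level temperature
subset = {'XOB': [75.0], 'YOB': [30.0], 'DHR': [-0.1],
          'TOB': [287.1, 280.5, 272.9]}
hdr, _ = ufbint_mock(subset, 'XOB YOB DHR', 3, 1)
obs, nlv = ufbint_mock(subset, 'TOB', 1, 10)
print(hdr, nlv, obs[0][3] == BMISS)
```

As in the Fortran code, the header mnemonics each fill one row of hdr, the three temperature levels fill the second dimension of obs, and the unused levels remain at the missing value.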

Encoding/writing data into a simple BUFR file

The following is from the program bufr_encode_sample.f90, which shows how to write a few observation variables into a new BUFR file.

program bufr_encode_sample
!
! example of writing one value into a bufr file
!
  implicit none
  character(80):: hdstr='XOB YOB DHR'
  character(80):: obstr='TOB'
  real(8) :: hdr(3),obs(1,1)
  character(8) subset
  integer :: unit_out=10,unit_table=20
  integer :: idate,iret
!
! set data values
  hdr(1)=75.;hdr(2)=30.;hdr(3)=-0.1
  obs(1,1)=287.15        ! example temperature value
  idate=2010050700       ! YYYYMMDDHH
  subset='ADPUPA'        ! upper-air reports
!
! encode
  open(unit_table,file='table_prepbufr.txt')
  open(unit_out,file='sample.bufr',action='write' &
              ,form='unformatted')
  call datelen(10)
  call openbf(unit_out,'OUT',unit_table)
  call openmb(unit_out,subset,idate)
  call ufbint(unit_out,hdr,3,1,iret,hdstr)
  call ufbint(unit_out,obs,1,1,iret,obstr)
  call writsb(unit_out)
  call closmg(unit_out)
  call closbf(unit_out)
end program

Here we can see that the BUFR encode procedure has the same structure as the decode procedure: file level, message level, and subset level, marked in the same colors as the decode example. The major differences between encode and decode are highlighted in bold in the code and explained below:

  open(unit_table,file='table_prepbufr.txt')

To encode observation values into a new BUFR file, a pre-existing BUFR table file is necessary and needs to be opened.

  open(unit_out,file='sample.bufr',action='write',form='unformatted')

The action in the Fortran open command has to be 'write'.

  call openbf(unit_out,'OUT',unit_table)

The second input parameter is set to 'OUT', and the third parameter is the logical unit linked to the BUFR table file.

  call openmb(unit_out,subset,idate)

openmb: CALL OPENMB ( LUBFR, CSUBSET, IDATE )
  Input arguments:
    LUBFR     INTEGER    Logical unit for BUFR file
    CSUBSET   CHAR*(*)   Table A mnemonic for type of BUFR message to be opened
    IDATE     INTEGER    Date-time to be stored within Section 1 of BUFR message

This subroutine opens and initializes a new BUFR message for eventual output to LUBFR, using the arguments CSUBSET and IDATE to indicate the type and time of the message to be encoded.

  call writsb(unit_out)

writsb: CALL WRITSB ( LUBFR )
  Input argument:
    LUBFR     INTEGER    Logical unit for BUFR file

This subroutine is called to indicate to the BUFRLIB software that all necessary data values for this subset have been stored and are ready to be encoded and packed into the current message for the BUFR file associated with logical unit LUBFR. Before this subroutine, we see two consecutive calls to the subroutine ufbint, just as in the decode example. This time, however, the strings hdstr and obstr tell ufbint that the array hdr holds longitude, latitude, and observation time, and that the array obs holds temperature observations; these are written into a new subset of the BUFR file when subroutine writsb is called.

Appending data to a simple BUFR file

The following is from the program bufr_append_sample.f90, which shows how to append a few observation variables to an existing BUFR file.

program bufr_append_sample
!
! sample of appending one observation into bufr file
!
  implicit none
  character(80):: hdstr='XOB YOB DHR'
  character(80):: obstr='TOB'
  real(8) :: hdr(3),obs(1,1)
  character(8) subset
  integer :: unit_out=10,unit_table=20
  integer :: idate,iret
!
! set data values
  hdr(1)=85.0;hdr(2)=50.0;hdr(3)=0.2
  obs(1,1)=300.0
  idate=2010050700       ! YYYYMMDDHH
  subset='ADPSFC'        ! surface land reports
!
! get bufr table from existing bufr file
  open(unit_table,file='table_prepbufr_app.txt')
  open(unit_out,file='sample.bufr',status='old',form='unformatted')
  call openbf(unit_out,'IN',unit_out)
  call dxdump(unit_out,unit_table)
  call closbf(unit_out)
!
! append
  open(unit_out,file='sample.bufr',status='old',form='unformatted')
  call datelen(10)
  call openbf(unit_out,'APN',unit_table)
  call openmb(unit_out,subset,idate)
  call ufbint(unit_out,hdr,3,1,iret,hdstr)
  call ufbint(unit_out,obs,1,1,iret,obstr)
  call writsb(unit_out)
  call closmg(unit_out)
  call closbf(unit_out)
end program

Compared to the encode example, there are only two small differences in setup, which are highlighted in bold in the code and explained below:

- In the Fortran open command, the status has to be set to 'old' because appending requires an existing BUFR file.
- In the subroutine openbf, the second input parameter has to be set to 'APN'.

Another key point that needs special attention for appending: appending has to use the exact same BUFR table as the existing BUFR file. To ensure this, the following three lines are added to the code in order to extract the BUFR table from the existing BUFR file:

  call openbf(unit_out,'IN',unit_out)
  call dxdump(unit_out,unit_table)
  call closbf(unit_out)

This extraction is done at the file level: a new subroutine, dxdump, is called to read the BUFR table from the existing BUFR file linked to logical unit unit_out, and then to write the extracted BUFR table as an ASCII file linked to logical unit unit_table.

Examples for GSI PrepBUFR file processing

After understanding how to use the NCEP BUFRLIB to encode, decode, and append simple observation information, users can check the other examples designed to work with GSI PrepBUFR files under the directory ./util/bufr_tools:

  prepbufr_decode_all.f90       read the BUFR table from an existing PrepBUFR file; read all observation information used by the GSI analysis from an existing PrepBUFR file
  prepbufr_encode_surface.f90   write a surface observation into a new PrepBUFR file
  prepbufr_encode_upperair.f90  write an upper-air observation into a new PrepBUFR file
  prepbufr_append_upperair.f90  read the BUFR table from an existing PrepBUFR file; append an upper-air observation to an existing PrepBUFR file
  prepbufr_append_surface.f90   read the BUFR table from an existing PrepBUFR file; append a surface observation to an existing PrepBUFR file
  prepbufr_append_retrieve.f90  read the BUFR table from an existing PrepBUFR file; append retrieved data to an existing PrepBUFR file
  bufr_decode_radiance.f90      read the BUFR table from an existing radiance BUFR file; read radiance data from an existing radiance BUFR file

These files have the same structure and call the same BUFRLIB subroutines/functions described in the previous sections to process PrepBUFR files. The only difference is that the mnemonic lists used with the PrepBUFR files are much longer, which is usually the case for real observation data ingested by GSI. The following gives a brief explanation of some of the strings and arrays used in processing PrepBUFR files:

  Mnemonic string   Corresponding array   Content
  hdstr             hdr                   observation header information
  obstr             obs                   observation values
  qcstr             qcf                   observation quality markers
  oestr             oer                   observation errors

Users are encouraged to check the meaning of each mnemonic against a real-case BUFR table.

Practice with examples

Each of the above-mentioned sample codes has to be compiled after a successful GSI compilation, because these examples need to be linked to the BUFR library (BUFRLIB) generated during that compilation. Under the directory ./util/bufr_tools/, users can simply type:

  ./make

to compile the code. After compilation, the following 10 executables for the 10 examples appear in the same directory:

  bufr_append_sample.exe
  bufr_decode_radiance.exe
  bufr_decode_sample.exe
  bufr_encode_sample.exe
  prepbufr_append_retrieve.exe
  prepbufr_append_surface.exe
  prepbufr_append_upperair.exe
  prepbufr_decode_all.exe
  prepbufr_encode_surface.exe
  prepbufr_encode_upperair.exe

No existing BUFR/PrepBUFR files are needed to test these executables; however, the executables with names like prepbufr_* can also work with a real PrepBUFR file in this directory to illustrate their function. The only existing file required to test the code is a BUFR table, which must be present in this directory.

Practice using the sample codes:

bufr_encode_sample.exe
The BUFR table prepobs_prep.bufrtable in the directory is needed by this sample code. As a result, a new BUFR file named sample.bufr is created in the directory. It includes one temperature sounding observation with its location and observation time.

bufr_decode_sample.exe
This sample code reads (decodes) the single observation from the BUFR file sample.bufr created by bufr_encode_sample.exe in the directory. Running this executable prints the content of the BUFR file: the cycle time (idate), the message type (subset, here ADPUPA), and the longitude (XOB), latitude (YOB), observation time (DHR), and temperature value (TOB).

bufr_append_sample.exe
This sample code appends a surface temperature to the existing BUFR file sample.bufr in the directory.

bufr_decode_sample.exe
Running this sample again shows that there are now two observations in the file, one from encoding (ADPUPA) and one from appending (ADPSFC).

Practice using the sample codes for PrepBUFR:

prepbufr_encode_surface.exe and prepbufr_append_upperair.exe
We can run prepbufr_encode_surface.exe to encode two surface observations into a PrepBUFR file named prepbufr, and then call prepbufr_append_upperair.exe to append two sounding observations to it.

prepbufr_encode_upperair.exe and prepbufr_append_surface.exe
Same as above, we can run these two samples to encode two sounding observations into a PrepBUFR file named prepbufr and then append two surface observations to it.

prepbufr_append_retrieve.exe
This sample appends a retrieved conventional observation to a PrepBUFR file named prepbufr.

prepbufr_decode_all.exe

COMMUNITY VERSION 3.3. User s Guide. June Developmental Testbed Center

COMMUNITY VERSION 3.3. User s Guide. June Developmental Testbed Center COMMUNITY VERSION 3.3 User s Guide June 2014 Developmental Testbed Center National Center for Atmospheric Research National Centers for Environmental Prediction, NOAA Global Systems Division, Earth System

More information

GSI Fundamentals (2) Run and Namelist

GSI Fundamentals (2) Run and Namelist 2 GSI Community Tutorial June 29-July, 2, Boulder, CO GSI Fundamentals (2) Run and Namelist Ming Hu and Hui Shao Developmental Testbed Center Outlines GSI fundamentals (): Setup and Compilation GSI fundamentals

More information

COMMUNITY VERSION 3.4. User s Guide. July Developmental Testbed Center

COMMUNITY VERSION 3.4. User s Guide. July Developmental Testbed Center COMMUNITY VERSION 3.4 User s Guide July 2015 Developmental Testbed Center National Center for Atmospheric Research National Centers for Environmental Prediction, NOAA Global Systems Division, Earth System

More information

GSI Setup, Run and Namelist

GSI Setup, Run and Namelist GSI Setup, Run and Namelist Hui Shao GSI Community Tutorial, June 28-30, 2010, Boulder, CO Observation Error Observation Observation processing and assimilation PrepBUFR and BUFR processing: 06/29, Tue

More information

GSI Setup, Run and Namelist

GSI Setup, Run and Namelist GSI Setup, Run and Namelist Hui Shao GSI Community Tutorial, June 28-30, 2010, Boulder, CO Outline Installing GSI Running GSI Basic runtime options This talk is tailored based on GSI Community Release

More information

Gridpoint Statistical Interpolation (GSI)

Gridpoint Statistical Interpolation (GSI) Gridpoint Statistical Interpolation (GSI) Version 3.2 User s Guide Developmental Testbed Center National Center for Atmospheric Research National Centers for Environmental Prediction, NOAA Global Systems

More information

GSI Fundamentals (2) Run and Namelist

GSI Fundamentals (2) Run and Namelist 24 GSI Community Tutorial July 4-6, 24, NCAR, Boulder GSI Fundamentals (2) Run and Namelist Ming Hu and Hui Shao Developmental Testbed Center Outlines GSI fundamentals (): Setup and Compilation GSI fundamentals

More information

User s Guide Version 3.7

User s Guide Version 3.7 User s Guide Version 3.7 November 2018 Ming Hu, Guoqing Ge National Oceanic and Atmospheric Administration (NOAA)/Earth System Research Laboratory Cooperative Institute for Research in Environmental Sciences

More information

GSI Fundamentals (1): Setup and Compilation

GSI Fundamentals (1): Setup and Compilation 2012 GSI Summer Tutorial, Boulder, CO GSI Fundamentals (1): Setup and Compilation Donald Stark Na-onal Center for Atmospheric Research (NCAR) The Developmental Testbed Center (DTC) Wednesday 21 August,

More information

User s Guide Version 3.5

User s Guide Version 3.5 User s Guide Version 3.5 August 2016 Ming Hu National Oceanic and Atmospheric Administration (NOAA)/Earth System Research Laboratory Cooperative Institute for Research in Environmental Sciences (CIRES)

More information

EnKF Fundamentals (2b): Applications

EnKF Fundamentals (2b): Applications 2015 EnKF Community Tutorial August 13-14, 2015. Boulder, CO EnKF Fundamentals (2b): Applications Kathryn Newman Ming Hu, and Chunhua Zhou Developmental Testbed Center (DTC) Outline EnKF fundamentals (1):

More information

EnKF Fundamentals (1): Configuration and Run

EnKF Fundamentals (1): Configuration and Run 2017 Joint DTC-EMC-JCSDA GSI-EnKF Tutorial July 11-14, 2017, NCWCP, College Park, MD EnKF Fundamentals (1): Configuration and Run Chunhua Zhou * Kathryn Newman * and Ming Hu ** Developmental Testbed Center

More information

GSI Fundamentals (1): Setup and Compilation. Donald Stark, National Center for Atmospheric Research (NCAR), the Developmental Testbed Center (DTC). Tuesday 11 August, 2015.

User's Guide Version 1.3, compatible with GSI community release v3.7. November 2018. Hui Liu, National Center for Atmospheric Research (NCAR); Ming Hu and Guoqing Ge, National Oceanic and Atmospheric Administration (NOAA).

NOAA Ensemble Kalman Filter Beta Release v1.0 User's Guide, compatible with GSI community release v3.3. January 2015. Developmental Testbed Center; National Center for Atmospheric Research; National Centers for Environmental Prediction.

GSI Fundamentals (5): Review and Applications. 2017 GSI/EnKF Community Tutorial, July 11-14, 2017, College Park, MD. Jeff Beck, Ming Hu, Hui Shao, Chunhua Zhou, and Kathryn Newman, Developmental Testbed Center.

User's Guide Version 1.2, compatible with GSI community release v3.6. September 2017. Hui Liu, National Center for Atmospheric Research (NCAR); Ming Hu, National Oceanic and Atmospheric Administration (NOAA)/Earth System Research Laboratory.

The inclusion of cloudy radiances in the NCEP GSI analysis system. Min-Jeong Kim, Fuzhong Weng, and John Derber; NOAA/NESDIS/STAR, Joint Center for Satellite Data Assimilation (JCSDA), and CIRA.

GSI Fundamentals (1): Setup and Compilation. Mark Potts, Environmental Modeling Center (EMC), NOAA Center for Environmental Prediction.

GSI fundamentals (4): Background Error Covariance and Observation Error. 2017 GSI Community Tutorial, July 12, 2017, College Park, MD. Ming Hu, Developmental Testbed Center.

GSI Software Design. Donald Stark, National Center for Atmospheric Research (NCAR), the Developmental Testbed Center (DTC). 24 July, 2010. A tour of the directory structure and the build system.

User's Guide for the NMM Core of the Weather Research and Forecast (WRF) Modeling System, Version 3. Chapter 2: Software Installation. Covers the required compilers and scripting languages.

USERS GUIDE for the Community release of the GFDL Vortex Tracker. November 2011, Version 3.3b. The Developmental Testbed Center. Shaowu Bao, NOAA/ESRL/GSD and CIRES/CU; Donald Stark, NCAR/RAL/JNT; Ligia Bernardet.

Collaborative Development. Ricardo Todling, NASA GMAO. 2013 Joint DTC-EMC-JCSDA GSI Workshop. Traces the origins of NCEP's SSI (late 1980s), its spectral formulation of the background error covariance, and direct assimilation of radiances.

Converging Remote Sensing and Data Assimilation through Data Fusion. Benjamin T. Johnson, Ph.D., AER at NOAA/STAR/JCSDA (benjamin.t.johnson@noaa.gov), with Sid Boukabara, Kevin Garrett, Eric Maddy, and Ling Liu.

WRF Data Assimilation System: Software and Compilation. Michael Kavulich, Jr. July 24-26, 2017, National Center for Atmospheric Research, Boulder, CO.

NOAA Technical Memorandum OAR GSD-47 (doi:10.7289/v5/tm-oar-gsd-47): COMMUNITY HWRF USERS GUIDE v3.8a. November 2016, The Developmental Testbed Center. Mrinal K. Biswas, Laurie Carson, Kathryn Newman, et al.

Community Tools: gen_be. Syed RH Rizvi, National Center for Atmospheric Research, NCAR/ESSL/MMM, Boulder, CO 80307, USA (rizvi@ucar.edu).

AAPP status report and preparations for processing METOP data. Nigel C Atkinson (Met Office, Exeter, UK); Pascal Brunel, Philippe Marguinaud, and Tiphaine Labrot (Météo-France, Centre de Météorologie Spatiale).

NOAA Technical Memorandum OAR GSD-51 (http://doi.org/10.7289/v5/tm-oar-gsd-51): COMMUNITY HWRF USERS GUIDE v3.9a. October 2017, The Developmental Testbed Center. Mrinal K. Biswas, Laurie Carson, Kathryn Newman.

Glossary of acronyms: API (Application Programming Interface), AR5 (IPCC Fifth Assessment Report), ASCII (American Standard Code for Information Interchange), BUFR (Binary Universal Form for the Representation of meteorological data), URL (Uniform Resource Locator), WAN (Wide Area Network), WCRP (World Climate Research Programme), CMIP (Coupled Model Intercomparison Project).

Instituting an observation database capability in the NCEP GSI. Tom Hamill, Jeff Whitaker, and Scott Gregory, NOAA/ESRL Physical Sciences Division. Presentation to DAOS, Exeter, England, April 2016.

Instituting an observation database (ODB) capability in the GSI. Jeff Whitaker, Scott Gregory, and Tom Hamill, NOAA/ESRL Physical Sciences Division. Presentation to Blueprints for Next-Generation Data Assimilation.

Generation of WRF-ARW Background Errors (BE) for GSI. Syed RH Rizvi, Zhiquan Liu, and Xiang-Yu Huang, NCAR/ESSL/MMM, Boulder, CO.

WRF-NMM Standard Initialization (SI). Matthew Pyle, 8 August 2006. An overview of the WRF-NMM SI package and a more detailed look at its individual program components.

McIDAS-V: a powerful data analysis and visualization tool for multi- and hyperspectral environmental satellite data. Thomas Achtor, Thomas Rink, Thomas Whittaker, David Parker, and David Santek, Space Science and Engineering Center.

Beginner's Guide for UK IBM systems. Basic guidelines for users who already have some programming experience with high-level languages (e.g., Fortran).

Bias correction of satellite data at ECMWF. Thomas Auligne, Dick Dee, Graeme Kelly, and Tony McNally. ITSC XV, October 2006.

Analysis Methods in Atmospheric and Oceanic Science (AOSC 652). HDF and NetCDF files; regression; file compression and data access.

About the SPEEDY model (from the Miyoshi PhD thesis). The SPEEDY model (Molteni 2003) is an atmospheric general circulation model (AGCM) with a spectral primitive-equation core.

WRF & WPS: Compilation Process. Kelly Werner, NCAR/MMM, January 2018. Covers system requirements and installation steps; WRF runs on generally any 32- or 64-bit hardware running a UNIX-like operating system.

HARMONIE ATOVS data assimilation and coordinated impact study. ALADIN/HIRLAM 23rd Workshop / All-Staff Meeting, Reykjavik, Iceland, 15-19 April 2013. Magnus Lindskog, Mats Dahlbom, Sigurdur Thorsteinsson, et al.

Progress in the assimilation of GPS-RO at ECMWF. Sean Healy, Mike Rennie, Milan Dragosavac, Paul Poli, Carla Cardinali, and Peter Bauer.

ARPS-3dvar Program. The ARPS variational data assimilation program 3dvar assimilates observations into the ARPS model, taking an ARPS forecast as the background along with surface and multi-level data.

Exercise: Calling LAPACK. Uses the same conventions and commands as the batch computing exercise; refer back to that exercise's description for details.

NWP SAF SSMIS UPP Averaging Module: Technical Description, Version 1.0. 19 November 2010. Developed within the context of the EUMETSAT Satellite Application Facility on Numerical Weather Prediction (NWP SAF).

NCAR Computation and Information Systems Laboratory (CISL) Facilities and Support Overview. NCAR ASP 2008 Summer Colloquium on Numerical Techniques for Global Atmospheric Models, June 2, 2008. Mike Page.

Batch Systems: Running calculations on HPC resources. What a batch system is, how to interact with it, job submission scripts, interactive jobs, and common batch systems.

McIDAS-V Tutorial: Installation and Introduction. Updated September 2015 (software version 1.5). McIDAS-V is a free, open-source visualization and data analysis software package.

GRAS SAF Report 06: Levenberg-Marquardt minimisation in ROPP (ref: SAF/GRAS/METO/REP/GSR/006). Huw Lewis, Met Office, UK. 4 February 2008. Web: www.grassaf.org

Introduction to running C-based MPI jobs on COGNAC. Paul Bourke, November 2006. A practical introduction to running parallel MPI jobs on COGNAC, the SGI Altix machine (160 Itanium2 CPUs).

The Architecture and the Application Performance of the Earth Simulator. Ken'ichi Itakura (JAMSTEC, http://www.jamstec.go.jp). ICTS-TIFR Discussion Meeting, 15 December 2011.

Model Evaluation Tools Version 3.0 (METv3.0) User's Guide 3.0.2. Developmental Testbed Center, Boulder, Colorado, USA. January 2011.

BUFR/PrepBUFR User's Guide, Version 1.0 Draft. Developmental Testbed Center, December 2011. BUFR (Binary Universal Form for the Representation of meteorological data) is a table-driven data representation format.

HWRF: Setup and Run. Shaowu Bao, CIRES/CU and NOAA/ESRL/GSD. WRF For Hurricanes Tutorial, Boulder, CO, 28 April 2011.

Distributed Oceanographic Match-up Service (DOMS) User Interface Design. Version 3, updated 10 March. Shawn R. Smith, Jocelyn Elya, Adam Stallard, Thomas Huang, Vardis Tsontos, Benjamin Holt, Steven Worley, and Zaihua Ji.

Job Manager for remote execution of QuantumATK scripts. Covers a single remote machine: settings, environment, resources, notifications, diagnostics, and saving and testing the new machine.

Appendix A: Glossary. SYS-ED/Computer Education Techniques, Inc. Shell special parameters: $# is the number of arguments passed to a script; $@ holds the arguments and, unlike $*, keeps them separate; $* holds the arguments as a single string.
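The glossary entry above describes the shell's special parameters $#, $@, and $*. A minimal sketch of their behavior (the args_demo function is hypothetical, used only for illustration):

```shell
#!/bin/sh
# Demonstrates the shell special parameters listed in the glossary entry.
# args_demo is a hypothetical function used only for illustration.
args_demo() {
    echo "count: $#"        # $# : number of arguments
    for a in "$@"; do       # "$@": arguments kept as separate words
        echo "arg: [$a]"
    done
    echo "joined: $*"       # $* : arguments joined into a single string
}
args_demo one "two three"
```

Quoting "$@" preserves an argument containing spaces ("two three") as a single word in the loop, which is why it is preferred over $* when forwarding arguments.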

The Lidar-Radar Open Software Environment (LROSE): Progress and Plans. Michael M. Bell, Colorado State University; Wen-Chau Lee and Mike Dixon, NCAR. Research supported by NSF SI2-SSI Award ACI-1661663.

Practical: a sample code. Alistair Hart, Cray Exascale Research Initiative Europe. The aim of this practical is to examine, compile, and run a simple, pre-prepared OpenACC code.

AASPI Software Structure. The AASPI software comprises a rich collection of seismic attribute generation, data conditioning, and multiattribute machine-learning analysis tools.

Atmospheric Model Evaluation Tool (AMET) Installation Guide. Prepared under Work Assignment 3-04 of U.S. EPA Contract EP-W-05-045, Operation of the Center for Community Air Quality Modeling and Analysis.

Observation feedback archiving in MARS. Piotr Kuchta, Data and Services Section, ECMWF (P.Kuchta@ecmwf.int). Acknowledgements: Manuel Fuentes, Baudouin Rault, Erik Andersson, Anne Fouilloux, Lars Isaksen, et al.

NWPSAF 1D-Var User Manual, Software Version 1.1.1 (NWPSAF-MO-UD-032). Fiona Smith, Met Office, Exeter, UK.

An Overview of ROMS Code. Kate Hedstrom, ARSC, January 2011. Outlines the code, cpp and cppdefs.h, modules, ocean.in, and compiling ROMS.

WRF-Var System. WRF-Var Tutorial, July 21-22, 2008. Xin Zhang, Syed RH Rizvi, and Michael Duda. WRF-Var in the WRF modeling system: software and code overview.

Effective Use of CCV Resources. Mark Howison, User Services & Support. Assumes some familiarity with a Unix shell; provides examples and best practices for typical usage of CCV systems.

Distributed Online Data Access and Analysis. Ruixin Yang, George Mason University. Slides from SIESIP partners and from Glenn K. Rutledge (US NCDC), NOMADS PI.

CAM Tutorial: configure, build & run. Dani Coleman (bundy@ucar.edu), 27 July 2009. CAM is the atmospheric component of CCSM.

Introduction to Cheyenne. 12 January 2017. Brian Vanderwende, Consulting Services Group. Technical specs of the Cheyenne supercomputer and the expanded GLADE file systems.

BUFR/PrepBUFR File Processing. 2011 BUFR Tutorial, 13 December 2011, Boulder, CO. Ruifang Li (NCAR/MMM) and Ming Hu (Developmental Testbed Center). Covers the basic actions and steps of BUFR file processing.

CME 213, Spring 2017: Homework 1. Due Monday April 24, 2017, 11 PM. Implement radix sort and learn about OpenMP, an API which simplifies parallel programming.

Installing WRF & WPS. Kelly Keene, NCAR/MMM, January 2015. Steps: check system requirements, install libraries, download source and datasets, compile WRFV3, and compile WPS.

Installing the Quantum ESPRESSO distribution. Joint ICTP-TWAS Caribbean School on Electronic Structure Fundamentals and Methodologies, Cartagena, Colombia (2012). Coordinator: A. D. Hernández-Nieves.

Meteorology 5344, Fall 2017: Computational Fluid Dynamics (Dr. M. Xue). Computer Problem #1: Optimization Exercises. Due Thursday, September 19.

WELCOME to the PRACTICAL EXERCISES. A tutorial of six lessons, designed for working mostly on your own.

6A.3 Integrating NAWIPS into the New NWS Service Oriented Architecture. Steve Schotz, Jason P. Tuell, Scott Jacobs, David Plummer, Stephen Gilbert, and Ronla Henry, NOAA/NWS/NCEP/NCO.

Orbital Integrator System Manual. Benjamin Sprague. Describes the functionality of the orbital integrator system. Copyright (c) 2006 Benjamin Sprague.

HPC Usage Policies: Obtaining an Account. The IIA High Performance Computing (HPC) system is managed by the Computer Management Committee, which developed the user policies.

Intel C++ Compiler User's Guide with Support for the Streaming SIMD Extensions 2.

Running the model in production mode: using the queue. Codes are executed with run scripts: shell-script text files that set up the individual runs and execute the code.
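The entry above describes run scripts as shell-script text files that set up an individual run and execute the code. A minimal sketch along those lines (the directory name, the namelist file, and model.exe are hypothetical placeholders):

```shell
#!/bin/sh
# Minimal run-script sketch: set up one run directory, write its
# configuration, and (in a real script) launch the executable.
# run_001, namelist.input, and model.exe are hypothetical placeholders.
RUNDIR=run_001
mkdir -p "$RUNDIR"
printf 'nsteps = 10\n' > "$RUNDIR/namelist.input"   # per-run configuration
# (cd "$RUNDIR" && ../model.exe > model.log 2>&1)   # execute the code
echo "run prepared in $RUNDIR"
```

In practice such a script would be submitted to the batch queue rather than run interactively, with the commented-out launch line enabled.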

National Research Council Space Studies Board. Mary E. Kicza, Assistant Administrator for Satellite and Information Services, National Oceanic and Atmospheric Administration. March 9, 2010.

Introduction to NCAR HPC. 25 May 2017. Brian Vanderwende, Consulting Services Group. A technical overview of the HPC systems, the NCAR computing environment, and accessing software on Cheyenne.

McIDAS-X Version 2008 Upgrade Procedure. May 2008. Read the entire document before beginning; it is important to understand the whole procedure first.

2.1 Radiative Transfer and Surface Property Modelling. Working group co-chairs: Louis Garand and Paul van Delst. Web site: http://cimss.ssec.wisc.edu/itwg/groups/rtwg/rtwg.html

Introduction to OpenMP. Lecture 2: OpenMP fundamentals. Basic concepts, the history of OpenMP, and compiling and running OpenMP programs.

Advances in Time-Parallel Four Dimensional Data Assimilation in a Modular Software Framework. Brian Etherton, with Christopher W. Harrop, Lidia Trailovic, and Mark W. Govett, NOAA/ESRL/GSD. 28 October 2016.

Parallelization Challenges for Ensemble Data Assimilation. Helen Kershaw, Institute for Mathematics Applied to Geophysics, National Center for Atmospheric Research (hkershaw@ucar.edu).

RTTOV v11 Top Level Design (NWPSAF-MO-DS-022). James Hocking and David Rundle, Met Office. Developed within the context of the EUMETSAT NWP SAF.

OBAN Class Homework Assignment No. 4. Distributed November 3, 2016; due Thursday, December 1, 2016. The original ANALAB document was written by Huang, Gustafsson and Robertson (2000).

Our new HPC cluster: an overview. Christian Hagen, Universität Regensburg, 15.05.2009. Layout, hardware, software, getting an account, compiling, the queueing system, and parallelization.

cdo Data Processing (and Production). Luis Kornblueh, Uwe Schulzweida, Deike Kleberg, Thomas Jahns, and Irina Fast. Max-Planck-Institut für Meteorologie, DKRZ. September 24, 2014.

HPC Performance Advances for Existing US Navy NWP Systems. Timothy Whitcomb and Kevin Viner, Naval Research Laboratory Marine Meteorology Division, Monterey, CA; Matthew Turner, DeVine Consulting, Monterey, CA.

Unifying Verification through a Python-wrapped Suite of Tools. Tara Jensen, John Halley Gotway, et al. Support for MET is provided by NOAA, the US Air Force, NSF, and NCAR through the Developmental Testbed Center (DTC).

CESM Projects Using ESMF and NUOPC Conventions. Cecelia DeLuca, NOAA ESRL/University of Colorado. CESM Annual Workshop, June 18, 2014.

Flexible Framework for Mining Meteorological Data. Rahul Ramachandran, John Rushing, Helen Conover, Sara Graves, and Ken Keiser, Information Technology and Systems Center, University of Alabama in Huntsville.

Compiling applications for the Cray XC. All applications that will run in parallel on the Cray XC should be compiled with the standard language wrappers (compiler drivers).

Metview BUFR Tutorial. Meteorological Visualisation Section, Operations Department, ECMWF. Tested with Metview version 4.3.0; some features might not work in previous versions.

Comparison of Full-resolution S-NPP CrIS Radiance with Radiative Transfer Model. Xu Liu, NASA Langley Research Center, with W. Wu, S. Kizer, H. Li, D. K. Zhou, and A. M. Larar.

SWASH Implementation Manual, SWASH version 4.01A. The SWASH team, Delft University of Technology, Faculty of Civil Engineering and Geosciences.