National Center for Emerging and Zoonotic Infectious Diseases
PulseNet Updates: Transitioning to WGS for Reference Testing and Surveillance
Kelley Hise, MPH
Enteric Diseases Laboratory Branch
Division of Foodborne, Waterborne and Environmental Diseases
PulseNet/OutbreakNet West Coast Regional Meeting
February 5, 2019
Overview
- Transition Timeline
- Updates on Conversions/Certifications
- Sequencing Prioritization
- Turn Around Times
- Data Analysis Workflow with the National Databases
- Combined Organism Databases
Transition Timeline Updates on Conversions and Certifications
Dates for PulseNet's Transition to WGS as the Gold Standard for Foodborne Surveillance
- January 15, 2018: Listeria
- October 1, 2018: Campylobacter
- March 15, 2019: Salmonella, STEC, Shigella
PulseNet, WGS and Enhanced Epidemiological Capacity
[Map of PulseNet laboratories by conversion and certification status; individual lab labels omitted]
Map legend:
- Converted to BioNumerics 7.6
- Not WGS wet lab certified
- OutbreakNet Enhanced or FoodCORE
- Converted to BioNumerics 7.6 and WGS analysis certified for Listeria, Salmonella, Escherichia and Campylobacter
- Area Laboratories
- PulseNet Central
Totals:
- Converted to BioNumerics 7.6: 31 states, 35 labs
- WGS Analysis Certified: 5 states, 5 labs
Modified: February 1, 2019
Conversion Tips
You MUST clean your local databases
- DATES: must be in correct format
- KEYS: no spaces at the end
- BUNDLES: permanent bundles should be deleted or moved to a different location
- PLUGINS: all plugins must be deactivated. The MLVA plugin is known to cause an issue when converting.
- LIBRARIES: all libraries should be deleted
- COMPARISONS: saved comparisons will be lost in the conversion
For more detailed info: SharePoint PulseNet Documents Database Cleaning Guidelines
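The KEYS and DATES checks above can be sketched as a quick pre-conversion scan of exported entries. This is only an illustration, not a PulseNet or BioNumerics tool: the field names `Key` and `ReceivedDate` are hypothetical, and because the slide does not say what the "correct format" for dates is, ISO `YYYY-MM-DD` is used purely as a placeholder assumption.

```python
from datetime import datetime

# Assumption: the required date format is not stated in the slides;
# ISO "YYYY-MM-DD" is a placeholder. Replace it with the format given
# in the Database Cleaning Guidelines.
ASSUMED_DATE_FORMAT = "%Y-%m-%d"


def find_cleaning_issues(entries, date_fields=("ReceivedDate",)):
    """Return (key, problem) pairs for entries that need cleaning.

    `entries` is a list of dicts; "Key" and the names in `date_fields`
    are hypothetical field names used only for this sketch.
    """
    issues = []
    for entry in entries:
        key = entry.get("Key", "")
        # KEYS: no spaces at the end
        if key != key.rstrip():
            issues.append((key, "key has trailing whitespace"))
        # DATES: must parse in the (assumed) expected format
        for field in date_fields:
            value = entry.get(field, "")
            if not value:
                continue  # empty dates are not judged in this sketch
            try:
                datetime.strptime(value, ASSUMED_DATE_FORMAT)
            except ValueError:
                issues.append((key, f"{field} is not in the assumed date format"))
    return issues
```

Running it over an export before conversion week gives a worklist of entries to fix by hand; bundles, plugins, libraries, and comparisons still have to be handled inside BioNumerics.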
Conversion Tips
- Local IT support should be readily available the week of your conversion
- Review the Prep instructions
- Review training documents on SharePoint so you are ready to upload PFGE patterns as soon as you have converted (WGS tools to be added in March)
  Location: SharePoint PulseNet Documents WGS PHL Upgrade to BioNumerics v7.6 BN7 PFGE Data Management
- Call/email PulseNet with questions related to conversion
- Once converted, email pulsenet@cdc.gov to let them know and to request analysis certification information
Analysis Certifications: Request after Conversion
- PNQ08 has been updated
- WGS Analysis Certification is available for Escherichia, Salmonella, Listeria, and Campylobacter using BioNumerics v7.6
- After requesting, you will receive:
  (1) A certification set assignment: A, B or C (e.g. Listeria Certification Set_A)
  (2) Instructions for accessing fastq files, associated metadata, a bundle file and an analysis submission template via the PulseNetQA FTP site
Analysis Certifications: Tips
- Do not read too much into the quality metrics threshold tables
- No need to list every possible reason a sequence should be repeated
- Pay attention to the metrics in red on the table in PNQ08-6
- Metrics in black type are important and can provide information, but aren't required for data to be uploaded to PulseNet
- Use the metrics listed, not those for wet lab certifications (e.g. median insert size cannot be found in BioNumerics)
- Look at the NCBI submission presentation and SOP to understand NCBI metadata requirements
Sequencing Prioritization and Turn Around Times
Sequencing Prioritization (as of March 15, 2019)
- Listeria and STEC: sequence all isolates
- Salmonella: sequence all isolates if possible
  - prioritize isolates with cluster codes (while PFGE remains)
  - random sequencing, e.g. every other or every third
  - as requested by CDC and/or epi
- Campylobacter and Shigella: sequence all isolates, but prioritize other organisms first*
*unless specifically funded by other projects, like FoodNet
Turn Around Times (TATs)
- Starting January 1, 2019, TAT will be calculated from the date the isolate was received (or recovered) in the PHL to the date of upload to the national database
- Day 1 is the date of receipt of a culture; for CIDTs, day 1 is not until an isolate is recovered
- Should be 7 working days or less for WGS
- Keep track of local TATs
- Track steps along the way to determine areas for improvement
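For local tracking outside BioNumerics, the TAT rule above can be approximated with a short working-day counter. This is a minimal sketch under stated assumptions: working days are Monday through Friday, the count is inclusive because day 1 is the receipt (or recovery) date, and holidays are only excluded if passed in explicitly. The function name and parameters are illustrative, not part of any PulseNet tool.

```python
from datetime import date, timedelta


def working_day_tat(day_one, upload_date, holidays=frozenset()):
    """Count working days from day 1 through the upload date, inclusive.

    `day_one` is the date a culture was received, or for CIDT specimens
    the date an isolate was recovered. Weekends (Sat/Sun) and any dates
    in `holidays` are skipped; both conventions are assumptions.
    """
    if upload_date < day_one:
        raise ValueError("upload_date is earlier than day_one")
    count = 0
    current = day_one
    while current <= upload_date:
        if current.weekday() < 5 and current not in holidays:
            count += 1
        current += timedelta(days=1)
    return count


def meets_wgs_target(day_one, upload_date, target_days=7):
    """True when the TAT is within the 7-working-day WGS target."""
    return working_day_tat(day_one, upload_date) <= target_days
```

For example, an isolate received on Monday, January 7, 2019 and uploaded on Tuesday, January 15, 2019 counts as 7 working days and meets the target; an upload one day later would not.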
Turn Around Times: Calculate in BioNumerics
1. Select entries to calculate TAT
2. Click on clock at top of screen
3. Define parameters
NOTE: Upload_Date is not available. PulseNet_UploadDate is populated in BN7.6 upon upload. Suggest moving contents of Upload_Date to PulseNet_UploadDate for entries prior to conversion.
Data Analysis Workflow with the National Databases
Data Analysis Workflow with National Database
Diagram components: PHL; Raw Sequence Data; Private Raw Sequence Storage; Reference ID Database; Organism-specific Database; PulseNet National Databases; Public Raw Sequence Data Storage
1. Sequence isolates using PulseNet Key number. File naming format: Key-LabID-M###-YYMMDD
2. Save generated sequence files locally, on BaseSpace, or on an external hard drive
3a. Link sequence data to Reference ID database by PulseNet Key name
3b. Submit data to calculation engine (CE) for de novo assembly and species identification (Genus, Species by ANI)
3c. Verify quality
3d. Export entries
4a. Import entries from Reference ID
4b. Add demographic information for entries
4c. Submit to the CE* for allele calls and genotyping results (serotype, AST, virulence)
4d. Verify quality and upload to national database (WGS id automatically assigns)
4e. Upload raw data sequence reads to NCBI
4f. Perform surveillance in BioNumerics
*CE: Calculation Engine
Updated 10/23/2018
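Because step 3a links sequence data by Key name, a quick check of file names against the Key-LabID-M###-YYMMDD format from step 1 can catch naming mistakes before import. The sketch below is an illustration under assumptions the slide does not confirm: `M###` is read as the letter M followed by digits, LabID is assumed to contain no hyphen, and the example name is made up.

```python
import re
from datetime import datetime

# Assumed reading of "Key-LabID-M###-YYMMDD":
#   key        - anything (may itself contain hyphens)
#   lab_id     - no hyphens (assumption)
#   instrument - "M" followed by digits (assumption)
#   run_date   - six digits, YYMMDD
RUN_NAME_PATTERN = re.compile(
    r"^(?P<key>.+)-(?P<lab_id>[^-]+)-(?P<instrument>M\d+)-(?P<run_date>\d{6})$"
)


def parse_run_name(name):
    """Split a sequence file name into its parts, or return None if invalid."""
    match = RUN_NAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        # Reject impossible dates such as month 13
        parts["run_date"] = datetime.strptime(parts["run_date"], "%y%m%d").date()
    except ValueError:
        return None
    return parts
```

A name such as the hypothetical `ISOLATE001-XX-M123-190205` parses into its key, lab ID, instrument, and a run date of February 5, 2019, while a name missing the M### segment returns `None`.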
Calculation Engine (CE)
- CDC Calculation Engine: built to be highly customizable, with easy integration of both custom-made and open source code
- CE Store: server on the CE that states upload their data to; offers temporary storage of sequences
- Functions: QA/QC, trimming, mapping, SNP detection, allele detection
RefID Database to Organism-Specific Database
Diagram labels:
- PHL: Reference ID Database, Organism-specific Database
- CDC: Calculation Engine, Allele Databases
- 3b and c. Submit raw reads, and retrieve assembly with basic QC metrics
- 3d. Export de novo assemblies, QC metrics, taxa ID to correct organism-specific database
Steps:
- Select sequences to submit to the CDC's calculation engine (CE) and retrieve the de novo assemblies and basic QC metrics
- Once assemblies are received, resubmit to the CE and get back the taxonomic identification
- De novo assemblies, QC metrics and taxa ID can then be exported and imported into the organism-specific database based on the genus/species identified. Either create a new entry or link to a previously imported entry (e.g. PFGE already done).
Submission to the Calculation Engine
- Select sequences for analysis
- Choose from a list of algorithms to select for analysis (figure)
- Submit assemblies and raw reads to the CDC calculation engine
Diagram labels:
- PHL: Organism-specific Database
- CDC: Calculation Engine, Allele Databases
- 4c. Submit sequence data for allele calls and genotyping results
Analyzed Results in BioNumerics
- User retrieves jobs from the calculation engine
- Allele calls and additional quality metrics are imported into the user database, and include predicted serotype, resistance, and virulence
Diagram labels:
- PHL: Organism-specific Database
- CDC: Calculation Engine, Allele Databases
- Retrieve allele calls after submission
- Predicted Serotype, Resistance, Virulence
Upload to the PulseNet National Database
- Authenticate to PulseNet firewall
- Select entries and analyzed data to upload to PulseNet national databases
- User can search the national database for close matches to uploaded sequence data and download things like outbreak codes and allele codes
Diagram labels:
- PHL: Organism-specific Database
- CDC: PulseNet National Databases
- 4e. Upload allele calls and metadata
- Download allele code, outbreak code, etc.
Submission to NCBI
- Upload raw sequence data with minimal metadata
- Can create and save templates for upload of biosample, sequence metadata, and fastq files to NCBI
- Can import NCBI-assigned ids back into the user database (e.g. NCBI accession and SRR numbers)
Diagram labels:
- PHL: Organism-specific Database
- Public Raw Sequence Data Storage
Questions?
For more information, contact CDC
1-800-CDC-INFO (232-4636)
TTY: 1-888-232-6348
www.cdc.gov
Telephone: 404-639-4558
E-mail: PulseNet@cdc.gov
#PulseNet
Web: www.cdc.gov/pulsenet
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.