SigMF: The Signal Metadata Format. Ben Hilburn

Problem Statement
How do you make large recordings of samples more practically useful?

Signal Metadata Format
A format for describing recordings of digital samples. Why is this useful?
- You don't need the hardware
- You can work with signals you don't have access to
- Reproducibility (for science!)
- Collaborative processing
- Basically code comments for signal data
- Create feature / characteristic annotations
- Moving data between tools and workflows while retaining meta-information

Example Usage Scenario
A receiver will monitor some piece of the spectrum and record the raw data, which will then be processed by some signal detection & classification analysis engine, the results of which will be visualized by a human operator in a GUI. Historically, this sort of system would need to be monolithic to guarantee that each block could understand the data from the blocks upstream. This limits the operator's ability to update pieces of the system or analyze the data with different tools. It is effectively system lock-in, making the data useless outside of that one system. SigMF solves this problem by defining an open standard for the recording, including both the raw data and the metadata.

The SigMF Standard
- Open source: the standard itself is released under a CC-BY-SA license.
- The specification format is loosely based on IETF RFC formats.
- Metadata is written with JSON, for portability and readability.
- This is not just for GNU Radio. The goal is to create something that allows moving datasets between tools and workflows without loss or corruption of information.

Types of Applications Defined by the Standard
Writers are applications that write metadata. They must be able to do this in a streaming fashion, and without filling up your disk with useless data.
Readers are applications that read metadata and samples. They must be able to do this in a streaming fashion, and must know, unambiguously, what metadata applies to what samples.

Driving Principles
- Make it as straightforward as possible for developers to create applications and translation libraries, and to integrate the format into existing systems.
- Define a core namespace, but allow extensions so users can expand the scope of the metadata. For example, if you are building a detector and want to specify custom fields, you can simply create a `detector` namespace for them (see the sketch following this list).
- Guarantee that metadata written with a compliant writer application is useful in a compliant reader application. This means we must define what compliance means for applications as well as for the data & metadata.
- Keep things simple and easy to understand.
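As a rough sketch of how an extension namespace might look (the `detector:` fields here are invented purely for illustration; only `core:` fields belong to the core namespace defined by the standard), namespaced keys simply sit alongside the core ones in the metadata objects described below:

    {
      "core:sample_rate": 10000000,
      "detector:algorithm": "energy",
      "detector:threshold_db": -60
    }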

A SigMF Recording
A SigMF Recording consists of one flat data file and one flat metadata file. The data file is just samples; the metadata file is just JSON. Recordings can be stored & distributed in an archive format, and archives have a defined directory structure for including multiple Recordings.
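For concreteness, a Recording on disk might look like the sketch below. The file extensions and the tar-based archive reflect the naming used in the spec drafts; the filenames are made up, and the spec on GitHub is authoritative.

    example_capture.sigmf-meta    (the JSON metadata)
    example_capture.sigmf-data    (the raw samples)
    example_captures.sigmf        (archive bundling one or more Recordings)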

High-Level Structure of a SigMF Metadata File
Fundamentally, there are three questions you need to answer to be able to understand the data and metadata:
- How do I understand these files?
- How do I understand these samples?
- How do I understand this metadata?
SigMF has three top-level JSON objects to answer these three questions.
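The skeleton of a metadata file, with the three top-level objects and their contents elided:

    {
      "global": { ... },
      "captures": [ ... ],
      "annotations": [ ... ]
    }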

How do I understand these files?
Global - General information about the file; the minimal information needed to parse the dataset file. Example fields:
- Datatype: How are the samples stored?
- Sample Rate: At what sample rate was this data recorded?
- Version: What version of the SigMF standard is in use?
- Author: Who created these files?
- License: What is the license of this data?
- Hash: A hash of the data, to provide proof of integrity.
- etc.
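A sketch of a global object using core field names along the lines of the public spec draft; the values are invented, and the exact names and required fields should be checked against the spec on GitHub:

    "global": {
      "core:datatype": "cf32_le",
      "core:sample_rate": 10000000,
      "core:version": "1.0.0",
      "core:author": "jane.doe@example.com",
      "core:license": "CC-BY-SA-4.0",
      "core:sha512": "<sha512 of the data file>"
    }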

How do I understand these samples?
Captures - An array of segments that describe the parameters of the capture, each starting at a certain sample index. Example fields:
- Center Frequency: To what frequency was the radio tuned during the capture?
- Timestamp: What is the timestamp of a particular sample index?
- etc.
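A sketch of a captures array describing a single segment that begins at sample index 0 (values invented; field names follow the spec draft):

    "captures": [
      {
        "core:sample_start": 0,
        "core:frequency": 915000000,
        "core:datetime": "2017-04-01T19:20:30.123456Z"
      }
    ]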

How do I understand the metadata?
Annotations - An array of segments that describe features of, or provide comments about, the signal. Annotations are meant to be as flexible as possible:
- "detected interference here"
- "classified modulation as QAM64"
- "cat jumped on antenna"
- etc.
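A sketch of an annotations array with one illustrative entry (values invented; field names follow the spec draft):

    "annotations": [
      {
        "core:sample_start": 100000,
        "core:sample_count": 50000,
        "core:comment": "classified modulation as QAM64"
      }
    ]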

Mapping Metadata to Samples
Each piece of metadata is mapped to samples using the most fundamental unit in the recording: the sample index. Metadata that applies to many samples provides both the starting index at which the metadata applies and the number of samples it applies to.
Active development: annotations in frequency.
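To make the mapping concrete with an invented example: an annotation carrying core:sample_start = 100000 and core:sample_count = 50000 applies to samples 100000 through 149999. If the global datatype were complex 32-bit floats (8 bytes per sample), those samples would occupy bytes 800000 through 1199999 of the data file.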

Continuously Varying Fields
Dealing with fields that are continuously changing can be a significant challenge for metadata.
- Example: If your receiver is in a vehicle, how do you record the changing geolocation in a useful way?
- Example: If your antenna is a spinning radar dish, how do you record the changing azimuth of your aperture?
SigMF makes this easy: these continuously varying fields are just another SigMF Recording! (This feature is still under active development and is not yet reflected in the master spec. Also under development is using this feature for continuously varying sample rates.)

Why didn't you use <X>?
VITA49:
- The packet-header-based format makes it impossible to separate the metadata from the data for modification, inspection, or streaming without also interacting with the data.
- Some fields cannot be continuously varying with VITA49 (e.g., sample rate).
- Everyone seems to have their own implementation in which they have cherry-picked the parts of VITA49 they need, and no one agrees on what VITA49 actually is. This breaks portability of datasets.
HDF5:
- The HDF5 standard is highly generalized, and as a result of its large scope it is much more complex in terms of the standard, readers, writers, and structure of the data.
- Because of that generality, there is no definition or agreement on what it would mean to create a compliant application of metadata for signals; i.e., we could not guarantee portability.

Progress
- Draft of the standard is live on GitHub; we are actively discussing and making changes using Issues and Pull Requests.
- Python validator for SigMF files, plus an alpha web front-end for the validator.
- Already have involvement from numerous other communities, organizations, and authors of non-GNU-Radio tools.
Things to do:
- Give us feedback, ask questions, and help us improve the standard!
- GNU Radio Source / Sink blocks (good example implementations)
- Static visualization tool
- Make a slick logo!
- Proselytize!

Come get involved! https://github.com/gnuradio/sigmf