HANDLING PUBLICLY GENERATED AIR QUALITY DATA PETE TENEBRUSO & MIKE MATSKO MARCH 8TH, 2017


EXAMPLES OF DEP DATA AND CROWDSOURCING
- Storm Readiness
- Beach Assessments
- Park Closings
- Emergency Management
- Social Media
- Watershed Ambassadors
- ARMS

AIR INFORMATION MANAGEMENT SYSTEM (ARMS)
- There are 286 monitors at these 40 air quality stations, tracking 35 distinct parameters (continuous and non-continuous)
- NOx, NO, SO2, PM2.5, NO2, CO, O3, wind speed, wind direction, and temperature are monitored
- Approximately 150,000,000 continuous minute-level air quality data points are collected every year
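As a rough consistency check on the figures above, the yearly point count can be compared with the number of minutes in a year. This is only a back-of-the-envelope sketch: it assumes a non-leap year and one data point per channel per minute, and the slide does not say how many of the 286 monitors report continuously.

```python
# Sanity check on the yearly data volume quoted on the slide.
MINUTES_PER_YEAR = 365 * 24 * 60      # 525,600 minutes in a non-leap year
POINTS_PER_YEAR = 150_000_000         # figure quoted on the slide
MONITORS = 286                        # figure quoted on the slide

# Number of channels that would need to report every minute, all year,
# to produce the quoted yearly total.
equivalent_channels = POINTS_PER_YEAR / MINUTES_PER_YEAR

# Upper bound if every one of the 286 monitors reported every minute.
upper_bound = MONITORS * MINUTES_PER_YEAR

print(f"{equivalent_channels:.1f} always-on minute channels")
print(f"{upper_bound:,} points if all monitors reported each minute")
```

The two numbers land close together, so the quoted total is about what a network of this size would produce at one reading per minute.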

ARMS SITES & MONITORS

[Diagram: NJDEP Air and Radiation Monitoring System network. Wireless sites (Verizon wireless network, Digi wireless routers), leased-line sites, and phone-line sites feed clustered communication centers and databases, with the primary system at Troop C and the standby system at 401 East State Street. Public data is published at WWW.NJAQINOW.NET, hosted by Envitech and updated via FTP from the ARMS Comm Center. Green devices represent future projects. Prepared by Harry Chen, 12/1/2010.]

CROWDSOURCING
- A specific sourcing model in which individuals or organizations use contributions from Internet users to obtain needed services or ideas
- Examples: Amazon Mechanical Turk, Kickstarter, Wikipedia

BACKGROUND
- Massive data deluge in recent years
- 80% of the world's data is unstructured (images, videos, raw text, etc.)
- Algorithms to fully comprehend unstructured data have not been developed yet
- Many experts believe we are at least several decades away from this goal

CONSIDERATIONS OF USING INFO FROM 'CROWDS'
- Crowds can disseminate both valid and invalid information
- Crowds often have no immediate way to discern truth from falsehood
- Crowds are prone to add opinion to data, and the opinion sometimes sticks more than the credible data themselves
- Separating opinion from credible data through expert interpretation and curation, both centralized and decentralized, is important
- Very few organizational or procedural channels specify how to aggregate and incorporate information in decision making
- Better information is needed, not necessarily more monitors

INTEGRATING EXPERTS, CROWDS, & ALGORITHMS.

CROWD SOURCING CONCERNS
- How to solicit users
- What they can contribute
- How to combine their contributions
- How to manage quality, open versus closed worlds, query semantics, query execution, optimization, and user interfaces
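One possible answer to "how to combine their contributions", sketched under stated assumptions: several people report the same quantity at roughly the same place and time, and a robust statistic is used so that a single bad reading does not dominate. The function name, the cutoff of three scaled deviations, and the sample readings are all hypothetical; the slides do not prescribe a method.

```python
from statistics import mean, median

def combine_reports(readings, max_dev=3.0):
    """Combine several crowd readings of one quantity into a single value.

    Readings farther than max_dev scaled median absolute deviations (MAD)
    from the median are set aside; the remaining readings are averaged.
    Returns (combined_value, rejected_readings).
    """
    if not readings:
        raise ValueError("no readings to combine")
    centre = median(readings)
    mad = median(abs(r - centre) for r in readings)
    if mad == 0:
        # More than half the readings agree exactly; keep only those.
        return centre, [r for r in readings if r != centre]
    # 1.4826 scales the MAD to match a standard deviation for normal data.
    limit = max_dev * 1.4826 * mad
    kept = [r for r in readings if abs(r - centre) <= limit]
    rejected = [r for r in readings if abs(r - centre) > limit]
    return mean(kept), rejected

# Hypothetical PM2.5 readings (ug/m3) from five nearby low-cost sensors.
value, rejected = combine_reports([12.1, 11.8, 12.4, 55.0, 12.0])
print(value, rejected)
```

Rejected readings are returned rather than discarded so that an expert can still review them, in keeping with the curation point made earlier.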

BENEFITS OF MACHINE LEARNING
- Feature extraction: interpreting text to infer the time, location, people, etc. referred to in it
- Classification: classifying, grouping, or tagging information based on explicit or unknown criteria
- Clustering: machines can process vast amounts of data and present correlations and proximities that escape the human eye and brain, sometimes discovering non-obvious correlations between variables
- With large amounts of data available, it is not even necessary to have a deep understanding of the relationships within the data themselves: machines can, on their own, separate the relevant correlations from the noise through successive optimization

MACHINE LEARNING SHORTCOMINGS
- Algorithms are more specific than sensitive, meaning that important signals may be missed (false negatives)
- A combination of algorithms is important to draw different types of events and event features from undifferentiated data; understanding which algorithms to use, through experience, is essential
- Algorithms need to be thoroughly validated, tested, and reassessed
- Algorithms need data to train and feedback to learn; out-of-the-box value is difficult to obtain
- Human factor: a "lazy over time" experience with accepted algorithms, where over-dependency and improper cross-checks of an algorithm's results may result in missed or misinterpreted signals
- Low social acceptance of systems that do not function in a way that is predictable or describable
- Past misuse of machine learning has led users to fear and distrust algorithms that run without some human interaction
- Should an algorithm declare a health emergency, or should it help present data to an expert or authority with 'suggestions' and 'red flags', so that the authority can then declare a health emergency?
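The first point, "more specific than sensitive", can be made concrete with the standard definitions: sensitivity is the share of real events that are detected, and specificity is the share of non-events correctly left alone. The counts below are invented for illustration and do not come from the presentation.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real events the algorithm detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of non-events the algorithm correctly ignored: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)

# Hypothetical evaluation of an event detector on 1,000 labelled periods:
# 50 real pollution events, 950 periods with no event.
tp, fn = 30, 20     # events caught / events missed (false negatives)
tn, fp = 940, 10    # quiet periods left alone / false alarms

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.3f}")
```

With these made-up counts the detector rarely raises a false alarm but misses 20 of 50 real events, which is the kind of gap human cross-checking is meant to catch.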

STANDARDIZATION NEEDED FOR INTEROPERABILITY
- Interoperability challenges with data formats, service interfaces, semantics, and measurement uniformity
- Broad usage of open sensor standards is needed
- The Sensor Web Enablement Initiative (SWE) by the OGC (Open Geospatial Consortium) seeks to provide open standards and protocols for enhanced operability within and between multiple platforms and vendors. They aim to make sensors discoverable, queryable, and controllable over the Internet.
- Currently, the SWE family consists of seven standards:
  - Sensor Model Language (SensorML): XML schemas for defining the geometric, dynamic, and observational properties of a sensor. Accommodates sensor discovery, processing and analysis of the retrieved data, and the geo-location of observed values.
  - Observations & Measurements (O&M)
  - Transducer Model Language (TML): generally speaking, TML can be understood as O&M's counterpart for streaming data, providing a method and message format describing how to interpret raw transducer data.
  - Sensor Observation Service (SOS): provides a service to retrieve measurement results from a sensor or a sensor network.

STANDARDIZATION CONTINUED
- Sensor Planning Service (SPS): provides a standardized interface for collection assets and aims at automating complex information flows in large networks.
- Sensor Alert Service (SAS): interfaces enabling sensors to advertise and publish alerts, including the accompanying metadata.
- Web Notification Service (WNS): enables one-way and two-way message exchanges with other services. This is especially useful when several services are required to comply with a client's request, or when a response is only possible after considerable delay.

SENSOR OBSERVATION SERVICE (SOS)
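A minimal sketch of what retrieving measurements from an SOS can look like, assuming a server that supports the SOS 2.0 key-value-pair (KVP) binding over HTTP GET. The endpoint, offering, and observed-property identifiers are hypothetical, and a real server may require different or additional parameters; its GetCapabilities response is the authority on what it accepts. The code only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_get_observation_url(endpoint, offering, observed_property, start, end):
    """Build an SOS 2.0 GetObservation request URL using KVP encoding."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # Time window of interest, as "valueReference,start/end".
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
    }
    return f"{endpoint}?{urlencode(params)}"

# Hypothetical endpoint and identifiers, for illustration only.
url = build_get_observation_url(
    "https://sos.example.org/service",
    offering="station-01",
    observed_property="PM2.5",
    start="2017-03-08T00:00:00Z",
    end="2017-03-08T01:00:00Z",
)

# Parse the URL back to show the parameters the server would receive.
query = parse_qs(urlparse(url).query)
print(url)
print(query["request"], query["temporalFilter"])
```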

NEED A GOOD PLAN
- What are you trying to do? What's the value of this data?
- What's the approach?
- Selecting location and placement
- Collecting
- Quality control
- Sensor maintenance
- Data review
- Data validation
- Issues (interference and drift)
- Analyze, interpret, communicate results
- QA/QC
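Two of the steps above, data validation and checking for drift, can be sketched in a few lines. This assumes a low-cost sensor placed next to a reference monitor, with both sampled at the same times. The function names, the valid range, and the sample values are hypothetical, and a real QA/QC plan would use thresholds appropriate to the pollutant and instrument.

```python
def in_valid_range(value, lo, hi):
    """Basic validation: is the reading physically plausible?"""
    return lo <= value <= hi

def drift_slope(sensor, reference):
    """Least-squares slope of (sensor - reference) against sample index.

    A slope near zero suggests a stable offset; a steadily growing gap
    between sensor and reference shows up as a non-zero slope.
    """
    if len(sensor) != len(reference):
        raise ValueError("sensor and reference must have the same length")
    residuals = [s - r for s, r in zip(sensor, reference)]
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = (n - 1) / 2
    mean_y = sum(residuals) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(residuals))
    return sxy / sxx

# Hypothetical paired ozone readings (ppb): the reference holds steady
# while the low-cost sensor creeps upward by 1 ppb per sample.
reference = [40.0, 40.0, 40.0, 40.0, 40.0]
sensor = [40.0, 41.0, 42.0, 43.0, 44.0]

slope = drift_slope(sensor, reference)
print(f"drift: {slope:.2f} ppb per sample")
print(in_valid_range(-5.0, lo=0.0, hi=500.0))
```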

SENSOR CONSIDERATIONS
- Low cost
- Varying reliability, quality, and accuracy
- Questionable maintenance and calibration
- Pollutants measured (ozone, PM, volatiles)
- Location and placement: fixed/mobile, inside/outside, below/above ground
- IoT security of devices

DATA MANAGEMENT CONSIDERATIONS
- Several existing repositories; would not want to replicate them
- DEP had experience in managing large sets of data, but not at this potential scale
- Large cost of managing data (infrastructure, tools, etc.)
- Leverage existing real-time and historical APIs
- Separation of local, state, and nationwide data
- Integration and analysis with existing state data