Information to Insight

Similar documents
High Scalability Resource Management with SLURM Supercomputing 2008 November 2008

Optimizing Bandwidth Utilization in Packet Based Telemetry Systems. Jeffrey R Kalibjian

NIF ICCS Test Controller for Automated & Manual Testing

Testing PL/SQL with Ounit UCRL-PRES

Portable Data Acquisition System

METADATA REGISTRY, ISO/IEC 11179

Resource Management at LLNL SLURM Version 1.2

Stereo Vision Based Automated Grasp Planning

Alignment and Micro-Inspection System

Go SOLAR Online Permitting System A Guide for Applicants November 2012

Java Based Open Architecture Controller

Testing of PVODE, a Parallel ODE Solver

In-Field Programming of Smart Meter and Meter Firmware Upgrade

Electronic Weight-and-Dimensional-Data Entry in a Computer Database

PJM Interconnection Smart Grid Investment Grant Update

ALAMO: Automatic Learning of Algebraic Models for Optimization

PJM Interconnection Smart Grid Investment Grant Update

OPTIMIZING CHEMICAL SENSOR ARRAY SIZES

Bridging The Gap Between Industry And Academia

Adding a System Call to Plan 9

ENDF/B-VII.1 versus ENDFB/-VII.0: What s Different?

Kara Greenfield, William Campbell, Joel Acevedo-Aviles

and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

HPC Colony: Linux at Large Node Counts

COMPUTATIONAL FLUID DYNAMICS (CFD) ANALYSIS AND DEVELOPMENT OF HALON- REPLACEMENT FIRE EXTINGUISHING SYSTEMS (PHASE II)

DERIVATIVE-FREE OPTIMIZATION ENHANCED-SURROGATE MODEL DEVELOPMENT FOR OPTIMIZATION. Alison Cozad, Nick Sahinidis, David Miller

Intelligent Grid and Lessons Learned. April 26, 2011 SWEDE Conference

CHANGING THE WAY WE LOOK AT NUCLEAR

FY97 ICCS Prototype Specification

Big Data Computing for GIS Data Discovery

ACCELERATOR OPERATION MANAGEMENT USING OBJECTS*

Integrated Training for the Department of Energy Standard Security System

SmartSacramento Distribution Automation

Advanced Synchrophasor Protocol DE-OE-859. Project Overview. Russell Robertson March 22, 2017

DOE EM Web Refresh Project and LLNL Building 280

NATIONAL GEOSCIENCE DATA REPOSITORY SYSTEM

Technical Information Resources for Criticality Safety

LosAlamos National Laboratory LosAlamos New Mexico HEXAHEDRON, WEDGE, TETRAHEDRON, AND PYRAMID DIFFUSION OPERATOR DISCRETIZATION

REAL-TIME MULTIPLE NETWORKED VIEWER CAPABILITY OF THE DIII D EC DATA ACQUISITION SYSTEM

Protecting Control Systems from Cyber Attack: A Primer on How to Safeguard Your Utility May 15, 2012

@ST1. JUt EVALUATION OF A PROTOTYPE INFRASOUND SYSTEM ABSTRACT. Tom Sandoval (Contractor) Los Alamos National Laboratory Contract # W7405-ENG-36

Development of Web Applications for Savannah River Site

The National Collaborative for Bio-Preparedness

On Demand Meter Reading from CIS

Urika: Enabling Real-Time Discovery in Big Data

GA A22637 REAL TIME EQUILIBRIUM RECONSTRUCTION FOR CONTROL OF THE DISCHARGE IN THE DIII D TOKAMAK

Systems Integration Tony Giroti, CEO Bridge Energy Group

OUTDATED. Policy and Procedures 1-12 : University Institutional Data Management Policy

Cross-Track Coherent Stereo Collections

SEI/CMU Efforts on Assured Systems

Use of the target diagnostic control system in the National Ignition Facility

IBM. IBM i2 Enterprise Insight Analysis User Guide. Version 2 Release 1

Final Report for LDRD Project Learning Efficient Hypermedia N avi g a ti o n

Flux: Practical Job Scheduling

National Counterterrorism Center

Saiidia National Laboratories. work completed under DOE ST485D sponsored by DOE

Does a SAS 70 Audit Leave you at Risk of a Security Exposure or Failure to Comply with FISMA?

Collaborating with Human Factors when Designing an Electronic Textbook

OFFICE OF THE DIRECTOR OF NATIONAL INTELLIGENCE INTELLIGENCE COMMUNITY POLICY MEMORANDUM NUMBER

UNCLASSIFIED. September 24, In October 2007 the President issued his National Strategy for Information Sharing. This

5A&-qg-oOL6c AN INTERNET ENABLED IMPACT LIMITER MATERIAL DATABASE

ESNET Requirements for Physics Reseirch at the SSCL

Risk-based security in practice Turning information into smart screening. October 2014

Site Impact Policies for Website Use

GDPR: An Opportunity to Transform Your Security Operations

Distributed Hybrid MDM, aka Virtual MDM Optional Add-on, for WhamTech SmartData Fabric

Graphical Programming of Telerobotic Tasks

Shrapnel Impact Probability and Diagnostic Port Failure Analysis for LLNL s Explosives Testing Contained Firing Facility (CFF)

Winnebago Industries, Inc. Privacy Policy

Large Scale Test Simulations using the Virtual Environment for Test Optimization

MULTIPLE HIGH VOLTAGE MODULATORS OPERATING INDEPENDENTLY FROM A SINGLE COMMON 100 kv dc POWER SUPPLY

Washington DC October Consumer Engagement. October 4, Gail Allen, Sr. Manager, Customer Solutions

Ringtail Certification Program Guide

Entergy Phasor Project Phasor Gateway Implementation

Displacements and Rotations of a Body Moving About an Arbitrary Axis in a Global Reference Frame

v /4 Quick Short Test Report Technical Publication Transfer Test Hughes Tucson Support Systems Operation MIL-D-28001A (SGML) CTN Test Report AFTB-ID

TUCKER WIRELINE OPEN HOLE WIRELINE LOGGING

Advanced Cyber Risk Management Threat Modeling & Cyber Wargaming April 23, 2018

STRENGTHENING THE CYBERSECURITY OF FEDERAL NETWORKS AND CRITICAL INFRASTRUCTURE

SOC for cybersecurity

Building an Effective Threat Intelligence Capability. Haider Pasha, CISSP, C EH Director, Security Strategy Emerging Markets Office of the CTO

BWXT Y-12 Y-12. A BWXT/Bechtel Enterprise COMPUTER GENERATED INPUTS FOR NMIS PROCESSOR VERIFICATION. Y-12 National Security Complex

Real Time Price HAN Device Provisioning

Clusters Using Nonlinear Magnification

TESTIMONY MAURITA J. BRYANT, ASSISTANT CHIEF FIRST NATIONAL VICE PRESIDENT NATIONAL ORGANIZATION OF BLACK LAW ENFORCEMENT EXECUTIVES, (NOBLE)

Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth

Reduced Order Models for Oxycombustion Boiler Optimization

Integrated Volt VAR Control Centralized

The NMT-5 Criticality Database

Cluster-based 3D Reconstruction of Aerial Video

Extranet Site User Guide for IATA Aircraft Recovery Portal (IARP) (Website Structure)

NEODC Service Level Agreement (SLA): NDG Discovery Client Service

GA A26400 CUSTOMIZABLE SCIENTIFIC WEB-PORTAL FOR DIII-D NUCLEAR FUSION EXPERIMENT

IEEE Electronic Mail Policy

KCP&L SmartGrid Demonstration

GA A22720 THE DIII D ECH MULTIPLE GYROTRON CONTROL SYSTEM

IBM Proventia Management SiteProtector. Scalability Guidelines Version 2.0, Service Pack 7.0

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

Massive Scalability With InterSystems IRIS Data Platform

- Q807/ J.p 7y qj7 7 w SfiAJ D--q8-0?dSC. CSNf. Interferometric S A R Coherence ClassificationUtility Assessment

Transcription:

Information to Insight in a Counterterrorism Context Robert Burleson Lawrence Livermore National Laboratory UCRL-PRES-211319 UCRL-PRES-211466 UCRL-PRES-211485 UCRL-PRES-211467 This work was performed under the auspices of the U.S. Department of Energy by University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48

We must be able to address the analysts requirements Strategic analysis See the big picture and how to counter terrorism Support decision makers in setting policies and priorities Integral to targeting technical and human source collection Tactical analysis Predict and warn of pending attacks Provide an understanding of our adversaries' current intentions and capabilities Allow the United States to act with precision both defensively and offensively Both strategic and tactical analysis require a system capable of fusing information obtained from very diverse sources The Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement (ADVISE) system is being developed for DHS S&T to meet these requirements

ADVISE lets us understand the information that characterizes our national security challenges ~1990 2003 + Viewing, Analysis, & Insight Integration & Correlation Compatible interfaces for viewing, analysis, & insight Template Subgraphs Knowledge Interface Semantic Graph Creating chains of relationships between disjoint information Information Data Scalable, adaptive interface to disparate data sources with unique sensors Ontology Information Interface Imagery Sensors Organizations Sensors Text reports Networks Distribution and Automation Volume and Disparity of Sources

What drives the design Connect the Dots Scaling to massive data volume Ingest information from information sources 100 s of systems Real-time High-throughput Stove-piped by intent Support 100 s of analysts Event notification in near-real time Control access and Protect privacy Responsive to change

What to consider when scaling to massive levels What do we want from the Knowledge Fusion engine? Relations between facts (nodes) Individual facts without relations not particularly useful (might as well keep stovepipes) Relate facts (build the graph) at high ingest rates with results in real-time Responsive to change What is important when scaling to massive sizes? An optimal model Use relations between facts (connectivity) to extract knowledge from data Query performance Key for high-complexity algorithms

Semantic graphs provide the basis for these massive knowledge relationships Called Jon Miller Jenifer Jones Works for Works for Santa Clara Emailed Company Y Located in Located in California Fremont David Smith Vendor to Livermore Uses Works for Subcontractor to Information source Classification sparkie.llnl.gov Time (effectively) Geo-location (if appropriate) Confidence Located in Owned by LLNL Web connection to Company x Owner by www.x.com

The fused graph reveals connections and gaps not immediately apparent Existing search tools can find documents that contain a given connection: Person A & Person B Person A & Country X Person A & City Y Graph identifies connections that span several messages (sources): Previously unknown Middlemen (path traversal)??? Person A Person B WMD Program (Person C) Hidden common connection (unknown nodes) Person B Person A??? (Country X) Financial transactions Material transhipment

Two facts "fuse" when they contain a common node with identical attribute values Alice Alice Traveled to Chicago Sent e-mail to Bob Bob Subscribes to Yahoo.com

ADVISE canonicalizes data to maximize fusion and improve searches Chicago chicago After canonicalization and fusion Chicago windy city Applications can use any of the organization names to get the same result

The ADVISE system model partitions the design Application Layer New applications can utilize the semantic graph, template subgraphs, and ontology to develop complex insights Knowledge Layer The Knowledge Layer fuses facts and relations into a massive-scale, ontologydriven semantic graph. Information Layer The Information Interface supports multiple high throughput distributed information systems that send facts directly to ADVISE. Visualization Simulation Network analysis... Knowledge Interface Template Subgraphs Ontology Semantic Graph Information Interface Dynamic sources... Lists / Files

Creating entities and relationships from free text is critical BAGHDAD, Iraq (CNN) -- A hostage shown in a videotape on an Arabic language satellite TV network Wednesday is the American executive who was kidnapped Monday at a construction site in Baghdad, according to a U.S. Embassy official. Jeffrey Ake, president and chief executive officer of a machine manufacturing firm, was seen in the video being held at gunpoint by militants. HARD Country: Iraq Country: Iraq City: Baghdad, Iraq City: Baghdad, Iraq Location: construction site Location: construction site Person: U.S. Embassy official Person: U.S. Embassy official Person: Jeffrey Ake Person: Jeffrey Ake Relation: LOCATED_IN Relation: LOCATED_IN Locatee: construction site Locatee: construction site Locator: Baghdad, Iraq Locator: Baghdad, Iraq Event: KIDNAPPING Event: KIDNAPPING Victim: Jeffrey Ake Victim: Jeffrey Ake Perpetrator: militants Perpetrator: militants Location: construction site Location: construction site

Integrating knowledge extraction into ADVISE Application Layer Knowledge Layer Information Layer Knowledge Extraction Spud Tagging Assistant Extractor Specific Parser Ontology Translator & Plugins Louisiana Framework Extractor Specific Parser Extraction Engine Extraction Engine Lists / Files

Evaluating extraction engines Qualitative: Show resultant graph to analysts They hate it Quantitative: Compare engine output to an answer key Modified GATE to 1 evaluate extraction engine results against 0.9 one another or against 0.8 a hand-annotated answer key 0.7 Hand-annotated some documents (not fun) 0.6 Can use documents entered via Spud 0.5 0.4 0.3 0.2 0.1 0 All Entities All Links

Current direction for text extraction Integration Improve usability of Louisiana Add graph interactivity to Spud Work on merging results from multiple engines Evaluation Evaluate more engines AeroText and ClearForest on deck Look for applicable pre-tagged document corpora Build graph-comparison capability in ADVISE Collaboration

Graph analysis environment Component Analysis Graph Metrics Community Analysis Pattern Analysis? Strength of Association

Component Analysis assists in the understanding of how graphs fuse We build a semantic graph from various information sources The graph is based on an ontology, which only allows certain relationships Some data will fail to fuse Analyzing resulting components can provide us valuable information about data fusion

Community Analysis partitions the graph into clusters of related nodes Measure the betweenness of each link Eliminate the link with highest betweenness Stopping criterion computed at each iteration to determine ideal partition Our stopping criterion measures the density of links within communities relative to the density of links between communities - iterations stop when this is maximized

Community Analysis partitions the graph into clusters that may facilitate knowledge discovery Key Uses for Graph Analysis: Examining the semantic graph at varying degrees of granularity Trials indicate a tendency to produce semantically homogeneous communities Metrics run on communities provide a local and more detailed analysis of a large semantic graph

Graph Metrics helps in the understanding of what is in the graph Our library of graph metrics allows us to: Analyze high-level content Characterize our graph/ communities Measure knowledge extraction performance Node/Link Type Frequencies Node Degree Distributions Path Analysis Ontology Utilization Metrics High Degree Node Statistics

Pattern Analysis determines potentially valuable information from patterns in the graph Identify rare and common patterns Pattern matching Fuzzy pattern matching?

Strength of Association allows nodes to be ranked according to their relative strength Allow pairs of nodes to be ranked according to their relative strength of association Topological strength (neighborhood) Source-based Weight (quantify source support) Allow multiple paths between two nodes to be ranked according to their relative strength

ADVISE supports scalable knowledge management across multiple missions IA IA Analysis BTS Region Border Ops NBACC BKC Mission Organizations Operational Centers Knowledge Discovery and Distribution TVIS RTAS Texas Knowledge Interface Template Subgraphs Knowledge Interface Template Subgraphs Knowledge Interface Template Subgraphs Ontology Semantic Graph Ontology Semantic Graph Ontology Semantic Graph ADVISE Information Interface Information Interface Information Interface Controlled sources Scalable Knowledge Management and Integration Shared sources

Disclaimer and Auspices This document was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48. UCRL-PRES-211319 UCRL-PRES-211466 UCRL-PRES-211485 UCRL-PRES-211467