Information to Insight in a Counterterrorism Context Robert Burleson Lawrence Livermore National Laboratory UCRL-PRES-211319 UCRL-PRES-211466 UCRL-PRES-211485 UCRL-PRES-211467 This work was performed under the auspices of the U.S. Department of Energy by University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48
We must be able to address the analysts requirements Strategic analysis See the big picture and how to counter terrorism Support decision makers in setting policies and priorities Integral to targeting technical and human source collection Tactical analysis Predict and warn of pending attacks Provide an understanding of our adversaries' current intentions and capabilities Allow the United States to act with precision both defensively and offensively Both strategic and tactical analysis require a system capable of fusing information obtained from very diverse sources The Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement (ADVISE) system is being developed for DHS S&T to meet these requirements
ADVISE lets us understand the information that characterizes our national security challenges ~1990 2003 + Viewing, Analysis, & Insight Integration & Correlation Compatible interfaces for viewing, analysis, & insight Template Subgraphs Knowledge Interface Semantic Graph Creating chains of relationships between disjoint information Information Data Scalable, adaptive interface to disparate data sources with unique sensors Ontology Information Interface Imagery Sensors Organizations Sensors Text reports Networks Distribution and Automation Volume and Disparity of Sources
What drives the design Connect the Dots Scaling to massive data volume Ingest information from information sources 100 s of systems Real-time High-throughput Stove-piped by intent Support 100 s of analysts Event notification in near-real time Control access and Protect privacy Responsive to change
What to consider when scaling to massive levels What do we want from the Knowledge Fusion engine? Relations between facts (nodes) Individual facts without relations not particularly useful (might as well keep stovepipes) Relate facts (build the graph) at high ingest rates with results in real-time Responsive to change What is important when scaling to massive sizes? An optimal model Use relations between facts (connectivity) to extract knowledge from data Query performance Key for high-complexity algorithms
Semantic graphs provide the basis for these massive knowledge relationships Called Jon Miller Jenifer Jones Works for Works for Santa Clara Emailed Company Y Located in Located in California Fremont David Smith Vendor to Livermore Uses Works for Subcontractor to Information source Classification sparkie.llnl.gov Time (effectively) Geo-location (if appropriate) Confidence Located in Owned by LLNL Web connection to Company x Owner by www.x.com
The fused graph reveals connections and gaps not immediately apparent Existing search tools can find documents that contain a given connection: Person A & Person B Person A & Country X Person A & City Y Graph identifies connections that span several messages (sources): Previously unknown Middlemen (path traversal)??? Person A Person B WMD Program (Person C) Hidden common connection (unknown nodes) Person B Person A??? (Country X) Financial transactions Material transhipment
Two facts "fuse" when they contain a common node with identical attribute values Alice Alice Traveled to Chicago Sent e-mail to Bob Bob Subscribes to Yahoo.com
ADVISE canonicalizes data to maximize fusion and improve searches Chicago chicago After canonicalization and fusion Chicago windy city Applications can use any of the organization names to get the same result
The ADVISE system model partitions the design Application Layer New applications can utilize the semantic graph, template subgraphs, and ontology to develop complex insights Knowledge Layer The Knowledge Layer fuses facts and relations into a massive-scale, ontologydriven semantic graph. Information Layer The Information Interface supports multiple high throughput distributed information systems that send facts directly to ADVISE. Visualization Simulation Network analysis... Knowledge Interface Template Subgraphs Ontology Semantic Graph Information Interface Dynamic sources... Lists / Files
Creating entities and relationships from free text is critical BAGHDAD, Iraq (CNN) -- A hostage shown in a videotape on an Arabic language satellite TV network Wednesday is the American executive who was kidnapped Monday at a construction site in Baghdad, according to a U.S. Embassy official. Jeffrey Ake, president and chief executive officer of a machine manufacturing firm, was seen in the video being held at gunpoint by militants. HARD Country: Iraq Country: Iraq City: Baghdad, Iraq City: Baghdad, Iraq Location: construction site Location: construction site Person: U.S. Embassy official Person: U.S. Embassy official Person: Jeffrey Ake Person: Jeffrey Ake Relation: LOCATED_IN Relation: LOCATED_IN Locatee: construction site Locatee: construction site Locator: Baghdad, Iraq Locator: Baghdad, Iraq Event: KIDNAPPING Event: KIDNAPPING Victim: Jeffrey Ake Victim: Jeffrey Ake Perpetrator: militants Perpetrator: militants Location: construction site Location: construction site
Integrating knowledge extraction into ADVISE Application Layer Knowledge Layer Information Layer Knowledge Extraction Spud Tagging Assistant Extractor Specific Parser Ontology Translator & Plugins Louisiana Framework Extractor Specific Parser Extraction Engine Extraction Engine Lists / Files
Evaluating extraction engines Qualitative: Show resultant graph to analysts They hate it Quantitative: Compare engine output to an answer key Modified GATE to 1 evaluate extraction engine results against 0.9 one another or against 0.8 a hand-annotated answer key 0.7 Hand-annotated some documents (not fun) 0.6 Can use documents entered via Spud 0.5 0.4 0.3 0.2 0.1 0 All Entities All Links
Current direction for text extraction Integration Improve usability of Louisiana Add graph interactivity to Spud Work on merging results from multiple engines Evaluation Evaluate more engines AeroText and ClearForest on deck Look for applicable pre-tagged document corpora Build graph-comparison capability in ADVISE Collaboration
Graph analysis environment Component Analysis Graph Metrics Community Analysis Pattern Analysis? Strength of Association
Component Analysis assists in the understanding of how graphs fuse We build a semantic graph from various information sources The graph is based on an ontology, which only allows certain relationships Some data will fail to fuse Analyzing resulting components can provide us valuable information about data fusion
Community Analysis partitions the graph into clusters of related nodes Measure the betweenness of each link Eliminate the link with highest betweenness Stopping criterion computed at each iteration to determine ideal partition Our stopping criterion measures the density of links within communities relative to the density of links between communities - iterations stop when this is maximized
Community Analysis partitions the graph into clusters that may facilitate knowledge discovery Key Uses for Graph Analysis: Examining the semantic graph at varying degrees of granularity Trials indicate a tendency to produce semantically homogeneous communities Metrics run on communities provide a local and more detailed analysis of a large semantic graph
Graph Metrics helps in the understanding of what is in the graph Our library of graph metrics allows us to: Analyze high-level content Characterize our graph/ communities Measure knowledge extraction performance Node/Link Type Frequencies Node Degree Distributions Path Analysis Ontology Utilization Metrics High Degree Node Statistics
Pattern Analysis determines potentially valuable information from patterns in the graph Identify rare and common patterns Pattern matching Fuzzy pattern matching?
Strength of Association allows nodes to be ranked according to their relative strength Allow pairs of nodes to be ranked according to their relative strength of association Topological strength (neighborhood) Source-based Weight (quantify source support) Allow multiple paths between two nodes to be ranked according to their relative strength
ADVISE supports scalable knowledge management across multiple missions IA IA Analysis BTS Region Border Ops NBACC BKC Mission Organizations Operational Centers Knowledge Discovery and Distribution TVIS RTAS Texas Knowledge Interface Template Subgraphs Knowledge Interface Template Subgraphs Knowledge Interface Template Subgraphs Ontology Semantic Graph Ontology Semantic Graph Ontology Semantic Graph ADVISE Information Interface Information Interface Information Interface Controlled sources Scalable Knowledge Management and Integration Shared sources
Disclaimer and Auspices This document was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48. UCRL-PRES-211319 UCRL-PRES-211466 UCRL-PRES-211485 UCRL-PRES-211467