Big Data Computing for GIS Data Discovery
1 Big Data Computing for GIS Data Discovery. Solutions for Today, Options for Tomorrow. Vic Baker 1,2, Jennifer Bauer 1, Kelly Rose 1, Devin Justman 1,3. 1 National Energy Technology Laboratory, 2 MATRIC, 3 AECOM. ESRI User Conference, July 2017
2 Data Discovery Challenges. Data is often unstructured and mixed: spatial and contextual, spread across FTP, WWW, local filesystems, storage area networks, etc. The ways to search for and identify data are convoluted. It is hard to identify all the data, i.e., to see the whole elephant, without falling down the rabbit hole. Image from G. Renee Guzlas (available at
3 Data Discovery Needs. We need tools to assist with or automate aspects of data discovery: parse data silos; improve how we use search engines; use machine learning to correlate relevant information; search for data in new ways (e.g., HTML source). We need infrastructure capable of processing millions of assets or more in order to: extract valuable information; understand complex data relationships on a scale not previously possible; perform more robust spatio-temporal analyses. Error 404's from Izeas and modified from GitHub.
4 NETL's Big Data Discovery Ecosystem (To Date). Data Mining Clients. Data Collection: FTP Recursion, WWW Crawl. Data Analysis: Phrase Generation, Relevance Analysis, Geoprocessing. Metastore (Hive, HBase).
5 Using a Big Data Ecosystem for Data Discovery! NETL's data-driven research requires: lots of data; incorporating different data types and formats; integrating data from multiple locations (web, local, databases). Traditional search methods impede our efforts: search engines limit context to a few terms; searching for data is labor intensive; relevant spatial data is even more difficult to find.
6 Harnessing Big Data for Assistive Discovery: Approach. Ingest a seed corpus of representative documents / web sites: parse a library of documents related to the topic of interest. Assistive identification of search terms: generate phrases and sort them by number of occurrences. Manual filtering of resultant terms and categorization: review phrases and categorization for topic-specific context. Query a search engine with the desired terms: make many queries to Bing and collect the links from the Bing result pages. Perform a crawl (web and/or FTP) of the results and store it in a Hadoop database: crawl the Bing result links and each link found within the crawled content. Post processing / data mining (Solr search and/or contextual cataloging): do useful work with the results! Search the HTML of millions of sites for a map tag; identify new search terms based on the catalog of crawled content; discover relevant documents and spatial information.
7 Harnessing Big Data for Assistive Discovery: Approach. Assistive identification of search terms: run a Spark job to ingest and parse the seed corpus text (library of documents, web sites), identify phrases (1..n words in length), and sort the results by number of occurrences. 1 word: (methane, 724) (gas, 684) (emissions, 455) (natural, 285) (coal, 279). 2 word: (natural gas, 230) (methane emissions, 149) (slip events, 84) (greenhouse gas, 67) (gas emissions, 49). 3 word: (greenhouse gas emissions, 42) (oil and gas, 40) (renewable and sustainable, 33) (coal bed methane, 31) (fossil fuel subsidies, 27).
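The phrase-generation step can be illustrated with a small, single-machine sketch. The slides describe a Spark job; the version below swaps in plain Python with `collections.Counter` so it runs anywhere, and the function names (`tokenize`, `count_phrases`) and the two-sentence seed corpus are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_phrases(documents, max_words=3):
    """Count every phrase of 1..max_words consecutive words.

    Returns a dict mapping phrase length to a Counter of phrases.
    Note: this simple sketch lets phrases span sentence boundaries.
    """
    counts = {n: Counter() for n in range(1, max_words + 1)}
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical seed corpus; the real one is a library of documents and web sites.
seed_corpus = [
    "Natural gas systems leak methane. Methane emissions from natural gas matter.",
    "Coal bed methane and natural gas are fossil fuels.",
]

phrase_counts = count_phrases(seed_corpus)
for n, counter in phrase_counts.items():
    # most_common sorts phrases by number of occurrences, highest first
    print(f"{n} word:", counter.most_common(3))
```

In a Spark setting the same idea would presumably map each document to its phrases and reduce by key; that translation is not shown in the slides and is left out here.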
8 Harnessing Big Data for Assistive Discovery: Approach. Manual filtering of resultant terms and categorization: review the generated terms, keeping the desired discovered terms; create a JSON-based categorization schema: [ { "Term_type": "Contextual", "Category": "Transmission", "Subcategories": "Pipeline, transmission", "terms": "Transmission, midstream, aboveground, belowground, interstate, intrastate, water crossing, interconnects" },
9 Harnessing Big Data for Assistive Discovery: Approach. Sample search engine results page for "oil and gas". We use Spark and Tika to automate data mining from Bing: crawl Bing pages for specific content; extract links from discovered content; recursively mine Bing's /search?q= links to populate the initial crawler queue.
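Link extraction from a result page can be sketched as follows. The slides state that Spark and Tika do this work; the sketch below substitutes Python's standard `html.parser` purely for illustration. The class name `LinkCollector`, the helper `extract_links`, and the sample page are hypothetical; the three link buckets mirror the `http_links`, `relative_links`, and `ftp_links` columns named in the deck.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect anchor hrefs and sort them into http, relative, and ftp buckets."""

    def __init__(self):
        super().__init__()
        self.http_links = []
        self.relative_links = []
        self.ftp_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        lowered = href.lower()
        if lowered.startswith(("http://", "https://")):
            self.http_links.append(href)
        elif lowered.startswith("ftp://"):
            self.ftp_links.append(href)
        elif href.startswith("/"):
            # Site-relative link, e.g. a search engine's /search?q= paging link
            self.relative_links.append(href)


def extract_links(html):
    """Parse one HTML page and return the populated collector."""
    collector = LinkCollector()
    collector.feed(html)
    return collector


# Hypothetical result page content
sample_page = """
<a href="https://example.org/pipelines.html">Pipelines</a>
<a href="ftp://ftp.example.org/data/wells.zip">Wells</a>
<a href="/search?q=oil+and+gas">More results</a>
"""

links = extract_links(sample_page)
print(links.http_links, links.relative_links, links.ftp_links)
```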
10 Harnessing Big Data for Assistive Discovery: Approach. Query the search engine with the desired terms to initialize the crawler queue: ingest categorized terms; generate Bing URLs; execute Bing queries; parse result pages; crawl Bing relative search links. Crawler Table: sourceurl, redirectedurl, header, html, http_links, relative_links, ftp_links, textcontents.
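Turning categorized terms into search URLs might look like the sketch below. Several things here are assumptions rather than details from the slides: the base URL `https://www.bing.com/search`, the idea of pairing each term with a topic phrase, and the function name `build_query_urls`. The JSON reuses the single schema entry shown on the slide, with the array closed after that entry so it parses.

```python
import json
from urllib.parse import urlencode

# Assumed search endpoint; the slides only mention Bing's /search?q= links.
SEARCH_BASE = "https://www.bing.com/search"

# The one schema entry visible on the slide, closed so that it is valid JSON.
schema_json = """
[
  {
    "Term_type": "Contextual",
    "Category": "Transmission",
    "Subcategories": "Pipeline, transmission",
    "terms": "Transmission, midstream, aboveground, belowground, interstate, intrastate, water crossing, interconnects"
  }
]
"""


def build_query_urls(schema, topic):
    """Build one search URL per term, combining the topic with the term.

    Combining topic and term is an illustrative choice, not the authors' method.
    """
    urls = []
    for entry in schema:
        for term in entry["terms"].split(","):
            term = term.strip()
            if term:
                query = urlencode({"q": f"{topic} {term}"})
                urls.append(f"{SEARCH_BASE}?{query}")
    return urls


urls = build_query_urls(json.loads(schema_json), "natural gas")
for url in urls:
    print(url)
```

Each generated URL would then seed the crawler queue once its result page has been fetched and parsed.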
11 Harnessing Big Data for Assistive Discovery: Approach. Perform a crawl (web and/or FTP) of the queue and store it in a Hadoop database: aggregate unique http, relative, and ftp links from the crawler queue; select, crawl, and parse links not previously crawled; optionally restrict the crawler to specific domains; repeat the process until a threshold is reached (number of rows, queue empty, etc.). Crawler Table: select unique links from the aggregation of http, relative, and ftp links; (optionally) filter to restrict to specific domains; open links and parse with Apache Tika.
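The crawl loop described on this slide can be sketched in a few lines. This is a simplified, in-memory stand-in: a Python dict plays the role of the crawler table in the Hadoop database, and the `fetch` callable (hypothetical) stands in for opening a link and parsing it with Apache Tika. The function name `crawl`, its parameters, and the fake web used for the demonstration are all assumptions.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(seed_urls, fetch, allowed_domains=None, max_rows=1000):
    """Breadth-first crawl that never visits the same link twice.

    fetch(url) must return the links found on that page (absolute or relative).
    Stops when the queue is empty or max_rows pages have been stored.
    """
    table = {}      # url -> links found there (stand-in for the crawler table)
    seen = set()    # every link ever queued, so each is crawled at most once
    queue = deque()

    def enqueue(url):
        if url not in seen:
            seen.add(url)
            queue.append(url)

    for url in seed_urls:
        enqueue(url)

    while queue and len(table) < max_rows:
        url = queue.popleft()
        # Optional domain restriction
        if allowed_domains is not None and urlparse(url).hostname not in allowed_domains:
            continue
        # Resolve relative links against the page they were found on
        links = [urljoin(url, link) for link in fetch(url)]
        table[url] = links
        for link in links:
            enqueue(link)
    return table


# Hypothetical link graph used instead of real network access
fake_web = {
    "http://a.example/": ["/data", "ftp://ftp.a.example/files/", "http://b.example/"],
    "http://a.example/data": ["http://a.example/"],
    "ftp://ftp.a.example/files/": [],
    "http://b.example/": ["http://c.example/"],
}

table = crawl(
    ["http://a.example/"],
    lambda url: fake_web.get(url, []),
    allowed_domains={"a.example", "ftp.a.example"},
)
print(sorted(table))
```

With the domain filter in place, `http://b.example/` is discovered but never crawled, so nothing behind it enters the table.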
12 Harnessing Big Data for Assistive Discovery: Approach. Post processing / Data Mining: Solr Search and/or Contextual Cataloging.
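One post-processing task mentioned in the approach is searching the stored HTML of crawled sites for a map tag. A minimal sketch of that idea follows, assuming the tag in question is the HTML `<map>` element and that each crawler-table row exposes `sourceurl` and `html` fields (the column names come from the deck; the row layout, the class `TagFinder`, and the function `pages_with_tag` are hypothetical). The slides point to Solr for search at scale; this sketch only shows the per-page check.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Count occurrences of one tag name in an HTML document."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <MAP> and <map> both match
        if tag == self.wanted:
            self.count += 1


def pages_with_tag(rows, wanted="map"):
    """Return the sourceurl of every row whose html contains the wanted tag."""
    hits = []
    for row in rows:
        finder = TagFinder(wanted)
        finder.feed(row["html"])
        if finder.count:
            hits.append(row["sourceurl"])
    return hits


# Hypothetical crawler-table rows
rows = [
    {
        "sourceurl": "http://a.example/wells.html",
        "html": '<img usemap="#wells"><MAP name="wells"><area href="/w1"></MAP>',
    },
    {
        "sourceurl": "http://a.example/about.html",
        "html": "<p>A site map is available on request.</p>",
    },
]

hits = pages_with_tag(rows)
print(hits)
```

Parsing the markup rather than matching the raw string avoids false hits on pages that merely mention the word "map" in their text.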
13 Use Case: Data Discovery for a Global Oil & Gas Database. Big Data Machine Learning Tool; Training Resources; Worldwide Web. Advantages of using Big Data computing: semi-automated; repeatable; rapid; trainable (learns from each iteration); used to validate and augment manual search results.
14 Use Case: Data Discovery for a Global Oil & Gas Database. Result: Hotspots of Global Oil & Gas Infrastructure Features. Number of features per 111 km²: Low (1) to High (> 60,000).
15 Use Case: FTP Data Mining: Hadoop + ESRI. Problem: how to search data in FTP data silos (millions of files, spatial and contextual). Solution: index the FTP silos using Hadoop and query them using ESRI ArcMap. FTP Sites, Middleware, Client; USGS, WVGISTC.
16 Use Case: FTP Data Mining: Hadoop + ESRI. Demonstration.
17 Next Steps - Improving Discoverability to Drive Analytics & Insights. Begin integrating the offshore data, tools, and models developed into an online, common operating platform, serving: dynamic data; web-based tools and applications; Big Data computational tools and analytics. Find ways to incorporate additional data: Big Data search algorithms to identify additional data sources and sets; potentially integrate data from electronic forms and obtain equipment location information via sensors.
18 Thank you. Vic Baker, Mid-Atlantic Technology, Research & Innovation Center (MATRIC), National Energy Technology Laboratory, Morgantown, West Virginia, USA. For more information on data and tools visit: Kelly Rose, U.S. Dept. of Energy, National Energy Technology Laboratory, Albany, Oregon, USA. Jennifer Bauer, U.S. Dept. of Energy, National Energy Technology Laboratory, Albany, Oregon, USA. Devin Justman, AECOM for U.S. Dept. of Energy, National Energy Technology Laboratory, Albany, Oregon, USA. Acknowledgment: This technical effort was funded in support of the National Energy Technology Laboratory's ongoing research under the RES contract DE-FE. Disclaimer: This project was funded by the Department of Energy, National Energy Technology Laboratory, an agency of the United States Government, through a support contract with AECOM. Neither the United States Government nor any agency thereof, nor any of their employees, nor AECOM, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
BigInsights and Cognos Stefan Hubertus, Principal Solution Specialist Cognos Wilfried Hoge, IT Architect Big Data 2013 IBM Corporation A Big Data architecture evolves from a traditional BI architecture
More informationMicrosoft Exam
Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred
More informationData Management Technology Survey and Recommendation
Data Management Technology Survey and Recommendation Prepared by: Tom Epperly LLNL Prepared for U.S. Department of Energy National Energy Technology Laboratory September 27, 2013 Revision Log Revision
More informationPerformance Comparison of Hive, Pig & Map Reduce over Variety of Big Data
Performance Comparison of Hive, Pig & Map Reduce over Variety of Big Data Yojna Arora, Dinesh Goyal Abstract: Big Data refers to that huge amount of data which cannot be analyzed by using traditional analytics
More informationBring Context To Your Machine Data With Hadoop, RDBMS & Splunk
Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may
More informationHortonworks DataPlane Service
Data Steward Studio Administration () docs.hortonworks.com : Data Steward Studio Administration Copyright 2016-2017 Hortonworks, Inc. All rights reserved. Please visit the Hortonworks Data Platform page
More informationReal-Time Particulate Filter Soot and Ash Measurements via Radio Frequency Sensing
Real-Time Particulate Filter Soot and Ash Measurements via Radio Frequency Sensing 19 th ETH Conference on Combustion Generated Nanoparticles Zurich, Switzerland June 29, 215 Alexander Sappok, Paul Ragaller,
More informationThe NMT-5 Criticality Database
LA-12925-MS The NMT-5 Criticality Database Los Alamos N A T I O N A L L A B O R A T O R Y Los Alamos National Laboratory is operated by the University of California for the United States Department of
More informationMinghai Liu, Rui Cai, Ming Zhang, and Lei Zhang. Microsoft Research, Asia School of EECS, Peking University
Minghai Liu, Rui Cai, Ming Zhang, and Lei Zhang Microsoft Research, Asia School of EECS, Peking University Ordering Policies for Web Crawling Ordering policy To prioritize the URLs in a crawling queue
More information