Welcome. Atlanta R Users Group. HPCC Systems Architecture Overview & R Integration Demo
- Abel Lambert
Transcription
1 Welcome Atlanta R Users Group: HPCC Systems Architecture Overview & R Integration
Arjuna Chala, Architect Integrations, HPCC Systems / LexisNexis
Agenda
- 12:00-12:30pm: Welcome Lunch / Meet & Greet
- 12:30-1:30pm: HPCC Systems Architecture Overview & R Integration Demo
- 1:30-1:50pm: Q&A / Open Discussion
- 1:50-2:00pm: Raffle / Kindle Fire giveaway / Close
Twitter event hashtag: #hpccmeetup
hpccsystems.com
2 Contents
- Introducing HPCC
- How does LexisNexis use HPCC?
- ECL
- R and HPCC: A match made in Heaven?
3 What is HPCC?
4 Thor Architecture
5 Thor Architecture (contd..)
6 Roxie Architecture
- Distributed Architecture
- Highly Concurrent
- Low Latency
- Highly Scalable
- Highly Redundant
7 HPCC Trivia: You have several million records that need to be cleaned, linked and mined. Which HPCC component will you use?
8 To Summarize - Three main HPCC components
HPCC Data Refinery (Thor)
- Massively parallel Extract, Transform and Load (ETL) engine
- Built from the ground up as a parallel data environment
- Leverages inexpensive locally attached storage; doesn't require a SAN infrastructure
- Enables data integration on a scale not previously available: the current LexisNexis person data build process generates 350 billion intermediate results at peak
- Suitable for: massive joins/merges; massive sorts & transformations
- Programmable using ECL
HPCC Data Delivery Engine (Roxie)
- A massively parallel, high throughput, structured query response engine
- Ultra fast, low latency and highly available due to its read-only nature
- Allows indices to be built onto data for efficient multi-user retrieval of data
- Suitable for: volumes of structured queries; full text ranked Boolean search
- Programmable using ECL
Enterprise Control Language (ECL)
- An easy to use, declarative, data-centric programming language optimized for large-scale data management and query processing
- Highly efficient; automatically distributes workload across all nodes
- Automatic parallelization and synchronization of sequential algorithms for parallel and distributed processing
- Large library of efficient modules to handle common data manipulation tasks
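The phrase "automatically distributes workload across all nodes" can be pictured with a small Python sketch of hash partitioning followed by a local count and a merge. This is a generic data-parallel pattern written for illustration only: the function names, the three-node default and the sample records are assumptions, and nothing here is taken from how Thor is actually implemented.

```python
import zlib
from collections import Counter


def node_for(key: str, node_count: int) -> int:
    # Deterministic hash so the same key always lands on the same node.
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(records, key, node_count):
    # Split the records into one partition per (simulated) node.
    partitions = [[] for _ in range(node_count)]
    for record in records:
        partitions[node_for(record[key], node_count)].append(record)
    return partitions


def count_by_key(records, key, node_count=3):
    partitions = distribute(records, key, node_count)
    # Each partition could be counted on a separate machine; here it is a loop.
    local_counts = [Counter(r[key] for r in part) for part in partitions]
    # A key never spans two partitions, so merging is a plain sum.
    total = Counter()
    for counts in local_counts:
        total.update(counts)
    return total


# Hypothetical sample data.
records = [{"state": s} for s in ["GA", "FL", "GA", "NY", "FL", "GA"]]
print(count_by_key(records, "state"))
```

In a declarative language the programmer would state only the count; the partitioning and merging steps shown here are the kind of detail the platform is said to handle.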
9 How does LN use HPCC?
10 Getting Caught in the Act - A LexisNexis Use Case
11 Getting Caught in the Act - A LexisNexis Use Case
12 Where is John Smith Now? - A LexisNexis Use Case
13 Demo Time - SALT
14 Insurance Collusion in Louisiana - A (yet another) LN Use Case
15 Insurance Collusion in Louisiana - A (yet another) LN Use Case BEFORE AFTER HPCC
16 We do have some fun once in a while - A (fun) LN Use Case
17 HPCC Trivia: Name two attributes that make Roxie a great data delivery engine?
18 And Finally..
- 12 million background checks a year
- Big Data
- Supporting 90 percent of the Fortune 500 companies
- 99% of all U.S. auto insurance claims
- Open Source Components
- 4 Petabytes of Data
- 30,000 Data Sources
- 50 billion records
- Several million records daily
- 250 million unique identities
19 ECL
20 ECL is SQL on Steroids
Operation | ECL | SQL
SELECT | persons | Select * from persons
FILTER | persons(firstname='Jim') | Select * from persons where firstname='Jim'
SORT | SORT(persons, firstname) | Select * from persons order by firstname
COUNT | COUNT(Person(firstName='TOM')) | Select COUNT(*) from Person where firstname='TOM'
GROUP | DEDUP(persons, firstname, ALL) | Select * from persons group by firstname
AGGREGATE | SUM(persons, age) | Select SUM(age) from persons
Cross Tab | TABLE(persons, {state; statecount := COUNT(GROUP);}, state) | Select persons.state, COUNT(*) from persons group by state
JOIN | JOIN(persons,state,LEFT.state=RIGHT.code) | Select * from persons,states where persons.state=states.code
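To make the comparison above concrete, here is a minimal Python sketch of what each row computes, run on a tiny in-memory list. It is only an analogy: the sample records and variable names are invented for illustration, and neither ECL nor SQL executes this way on a cluster. The DEDUP row is modelled as keeping one record per first name (here, the first one seen), which is an assumption about the intent of the slide.

```python
from collections import Counter

# Hypothetical sample data; field names follow the slide's examples.
persons = [
    {"firstname": "Jim", "lastname": "Smith", "age": 40, "state": "GA"},
    {"firstname": "Tom", "lastname": "Jones", "age": 35, "state": "FL"},
    {"firstname": "Jim", "lastname": "Brown", "age": 25, "state": "GA"},
]
states = [{"code": "GA", "name": "Georgia"}, {"code": "FL", "name": "Florida"}]

# FILTER: keep only records whose firstname is Jim.
jims = [p for p in persons if p["firstname"] == "Jim"]

# SORT: order records by firstname (stable, so ties keep input order).
by_first = sorted(persons, key=lambda p: p["firstname"])

# COUNT: number of records whose firstname is Tom.
tom_count = sum(1 for p in persons if p["firstname"] == "Tom")

# DEDUP on firstname: keep the first record seen for each firstname.
seen = set()
deduped = []
for p in persons:
    if p["firstname"] not in seen:
        seen.add(p["firstname"])
        deduped.append(p)

# AGGREGATE: sum of the age field.
total_age = sum(p["age"] for p in persons)

# Cross tab: one entry per state with its record count.
state_counts = Counter(p["state"] for p in persons)

# JOIN: inner join on persons.state = states.code.
state_by_code = {s["code"]: s for s in states}
joined = [
    {**p, "state_name": state_by_code[p["state"]]["name"]}
    for p in persons
    if p["state"] in state_by_code
]

print(len(jims), tom_count, len(deduped), total_age, dict(state_counts))
```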
21 ECL for ETL
Basic Data Structure
  IMPORT Std;  // standard library, needed for Std.Str.ToUpperCase
  PersonRec := RECORD
    STRING50 firstname;
    STRING50 lastname;
    UNSIGNED1 age;
  END;
Transformations
  // Upper-case firstname and copy every other field unchanged.
  PersonRec personTransform(PersonRec person) := TRANSFORM
    SELF.firstname := Std.Str.ToUpperCase(person.firstname);
    SELF := person;
  END;
  // Apply the transform to each record of the persons dataset.
  upperPersons := PROJECT(persons, personTransform(LEFT));
  OUTPUT(upperPersons);
Functions
- Used in context of Transformations: PROJECT, ROLLUP, JOIN, ITERATE, NORMALIZE, DENORMALIZE
- All Functions: /community/docs/ecl-languagereference/html/built-in-functions-and-actions
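For readers who do not know ECL, the PROJECT and TRANSFORM pairing above can be read as "apply one record-level function to every record". The following Python sketch mirrors that idea under stated assumptions: `PersonRec`, `person_transform` and `project` are illustrative names chosen to echo the slide, the sample records are invented, and the code says nothing about how HPCC executes the operation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PersonRec:
    firstname: str
    lastname: str
    age: int


def person_transform(person: PersonRec) -> PersonRec:
    """Counterpart of the TRANSFORM: copy every field, upper-casing firstname."""
    return replace(person, firstname=person.firstname.upper())


def project(records, transform):
    """Counterpart of PROJECT: apply the transform to each record independently."""
    return [transform(r) for r in records]


# Hypothetical input dataset.
persons = [PersonRec("jim", "smith", 40), PersonRec("tom", "jones", 35)]
upper_persons = project(persons, person_transform)
print(upper_persons)
```

Because each output record depends only on its own input record, the list could be split into chunks and transformed separately, which is the property that makes this kind of operation easy to parallelize.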
22 Enterprise Control Language (ECL)
- Declarative programming language: Describe what needs to be done and not how to do it.
- Powerful: Unlike Java, high level primitives such as JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available. Higher level code means fewer programmers and shorter time to delivery.
- Extensible: As new attributes are defined, they become primitives that other programmers can use.
- Implicitly parallel: Parallelism is built into the underlying platform. The programmer need not be concerned with it.
- Maintainable: A high level programming language, no side effects and attribute encapsulation provide for more succinct, reliable and easier to troubleshoot code.
- Complete: Unlike Pig and Hive, ECL provides for a complete programming paradigm.
- Homogeneous: One language to express data algorithms across the entire HPCC platform, including data ETL and high speed data delivery.
23 Demo Time - ECL
24 HPCC Trivia: What does ECL stand for? Is ECL meant to be imperative?
25 Finally.. R and HPCC: A match made in Heaven?
26 Seen this Before? "Data don't make any sense, we will have to resort to statistics"
27 And the next thing you know
28 With HPCC and R you can provide an end to end modeling/analytical solution
[Diagram labels: Data Sources; Unstructured Data; Structured Data; RDBMS; DW; Big Data Processing; HPCC; Analyze, Mine, Model; R; Business Intelligence; Visualization; flows labelled ECL, Input Data, Results, Status, JDBC, SQL]
29 Use the power of HPCC in R
30 How did we do it in R? S4 Classes -> Generates ECL code -> Executes on HPCC -> Results back to R
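The "classes generate ECL code" step in that pipeline can be illustrated with a short sketch. The slide describes R S4 classes; the version below uses a Python class instead, and `EclDataset`, its methods and `submit_to_cluster` are hypothetical names invented here, not the presenter's package or any HPCC API. It shows only the code-generation idea, with the cluster call left as a placeholder.

```python
class EclDataset:
    """Hypothetical wrapper that records an ECL expression as text."""

    def __init__(self, expression: str):
        self.expression = expression

    def filter(self, condition: str) -> "EclDataset":
        # ECL filters a dataset by appending a condition in parentheses.
        return EclDataset(f"{self.expression}({condition})")

    def sort(self, *fields: str) -> "EclDataset":
        # Wrap the current expression in a SORT over the given fields.
        return EclDataset(f"SORT({self.expression}, {', '.join(fields)})")

    def to_ecl(self) -> str:
        # Wrap the final expression in an OUTPUT action so it returns results.
        return f"OUTPUT({self.expression});"


def submit_to_cluster(ecl_code: str):
    """Placeholder for the 'Executes on HPCC' step; not implemented here."""
    raise NotImplementedError("cluster submission is outside this sketch")


query = EclDataset("persons").filter("firstname = 'Jim'").sort("lastname")
print(query.to_ecl())  # prints: OUTPUT(SORT(persons(firstname = 'Jim'), lastname));
```

Each method returns a new object rather than mutating the old one, so intermediate queries can be reused; the generated text would then be sent to the cluster and the results read back into the client language.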
31 Q&A Thank You Web: info@hpccsystems.com Contact us: