Aster. Now. Future. Why.
- Joy Parks
- 5 years ago
Transcription
1 Aster. Now. Future. Why. Michael McIntire CTO Teradata Labs, Aster #TDPARTNERS16 GEORGIA WORLD CONGRESS CENTER
2 Who is that McIntire guy anyway? Extreme Scale MPP platforms & complex data systems. The Seven year stretch. Seven years: Geographic Information Systems. Seven years: Teradata in the 90s, DB Architect. Seven years: independent EDW consultant. Seven years: eBay EDW Chief Architect. And some between time... Yahoo, Sears, Prime, EDS...
3 Roles Market
4 Role One: Ideas Business Process Algorithm and Knowledge Finding Business Process Growth Objective Discover and Prove New: Classic and Citizen Data Scientists REVIEW DESIGN PLAN BUILD TEST DRIVE Speed of Ideation Dynamic Connectivity Broad analytic capability Scalable performance EVALUATE
5 Role Two: Operationalization IT Process Given a Known Hypothesis IT - Fitting things together Generalized Architecture Secure, Repeatable, predictable Platform for Production IT Process Control for Cost Architecture Adherence Enterprise Features and Resiliency Fed by the Business process Production Platform
6 Platform view of the market Expanding Compute Engines Narrower Focus More capable point solutions Diverging Storage and Compute APIs are the Enabler Consolidating Storage Engines Rise of Good enough Compute Storage Former Platforms as Engines... Pluggable Infrastructures CEPH Local
7 Ecosystem is the Platform Connecting Multiple Platforms at Runtime Aster Compute within Hadoop Ecosystem First Class Citizen YARN Resource mgmt Native Read/Write Storage MPP Interoperability with External Engines Native Aster and Spark integration Teradata + Aster + Hadoop Accelerate Business Decision making with Platform Interoperability
8 Engine Workflow Integration [Figure: sample decision flowchart with YES/NO branches leading from Inputs to Outcome] Every Analytic Engine will have one Just like visualization First Generation: AppCenter Workflows will Mix Paradigms Set processing + Procedural So What's New: Command Language Implementation Server Side implementation Visual Tools Layered on Commands Data Cooking tools already there Tech issue: Exposing Logic inside a set statement across proc bounds. Predicate Search for example Optimization across loop/branch constructs
9 Aster Next
10 Objective - Aster in the Cloud Aster 6.2x on Appliance Aster on Hadoop (AsterX 7.0) Compute Aster 6.20 on AWS Aster 7.x on Cloud Managed, Public Not likely to see: Aster on Hadoop on the Cloud CEPH Storage Local
11 AsterX Evolution Aster Execution Engine Aster is a Compute Engine Spill to disk temp storage only Compute Non-Persistent Storage Access via Connectors QueryGrid2 CEPH Storage Local
12 6.20 Worker Aster Managed Storage Worker Node vworker Many per Node ASTER - vworker ASTER - vworker Cluster Services 1 per node Map Reduce Engine Graph Engine User Edge Node Queen Exec Relational Engine Aster Aster Compression Replication Aster Aster Local Storage Node Local Storage OS Managed
13 6.50 Worker 100% Storage Hadoop Worker Node vworker m/node ASTER - vworker ASTER - vworker Relational Engine Cluster Services 1 per node Map Reduce Engine Cluster Wide Storage Interface Graph Engine Hadoop YARN Name Node Aster User Edge Node Queen Exec Distributed File System ( on HDP, CDH) Compression Replication Security Management Cluster Cluster Wide Storage: :/aster/vworkerx*y
14 AsterX 7.0 Worker Local Temp Hadoop Worker Node vworker m/node ASTER - vworker ASTER - vworker Relational Engine Cluster Services 1 per node Map Reduce Engine Graph Engine Hadoop YARN Name Node Aster User Edge Node Queen Exec NODE LOCAL Storage: /aster/vworkerx*y Cluster Distributed Persistent Data System Hive, Teradata,
15 Aster 7.x Architecture
16 AsterX 7.0 Cluster Architecture Internally: Daemon based implementation Always on - not per job instantiation vworker Deployment: Cluster Subset vworker count is Static per Instance vworkers can be moved Expand / contract Hadoop Cluster without Aster intervention Architected as a SUBSET User Edge Node App B Queen User Edge Node App A Queen Head Nodes Hadoop Services Name Node Queen edge node (required) Security and Connectivity (++ eliminate bridge) Aster A Aster B Map/ Reduce Libraries on all nodes For Simplicity and Latency reasons Aster B Hive Aster A Worker Nodes
17 YARN Managed Resource Full Hadoop Services Integration Injects Third Party Management Reverse order Worker setup/teardown Yarn Managed Edge Node User ASTER Queen Hadoop Services Ambari Aster Yarn Server Yarn Client 2 1 Zookeeper YARN Aster Cluster still manages State Consul implementation YARN Managed Worker Node 4 ASTER vworker 3 Aster
18 AsterX 7.0 Consul State Management Consul: State management, configurations Simple, always on key-value store Similar to ZooKeeper (Dir/Key structure) Consul User Queen Aster 7.0 use: Common, resilient store port mapping Future use dynamic mapping of ports Dynamic worker movement Consul is required AX7.0 is a Private Implementation Future use of existing Consul possible If not available Aster will not come up Aster Temp Aster Temp
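The directory/key layout and the port-mapping use described on this slide can be illustrated with a small in-memory stand-in. This is only a sketch: it does not use the Consul API, and the class name, the key paths such as `aster/<instance>/ports/<service>`, and the port numbers are all hypothetical.

```python
from typing import Dict, List, Optional


class KeyValueStore:
    """In-memory stand-in for a directory/key style store (not the Consul API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key.strip("/")] = value

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self._data.get(key.strip("/"), default)

    def list_prefix(self, prefix: str) -> Dict[str, str]:
        """Return every key under a 'directory', sorted by key."""
        prefix = prefix.strip("/") + "/"
        return {k: v for k, v in sorted(self._data.items()) if k.startswith(prefix)}


def find_port_conflicts(store: KeyValueStore, prefix: str) -> Dict[str, List[str]]:
    """Map each port claimed by more than one key to the keys that claim it."""
    by_port: Dict[str, List[str]] = {}
    for key, port in store.list_prefix(prefix).items():
        by_port.setdefault(port, []).append(key)
    return {port: keys for port, keys in by_port.items() if len(keys) > 1}


store = KeyValueStore()
# Hypothetical instances and ports, for illustration only.
store.put("aster/instance-a/ports/queen", "2406")
store.put("aster/instance-a/ports/worker", "2407")
store.put("aster/instance-b/ports/queen", "2406")
conflicts = find_port_conflicts(store, "aster")
```

Keeping the mapping in one always-on store is what would let a startup step detect that two instances on the same nodes claim the same port before either comes up.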
19 AsterX 7.0 Cluster Configuration Subset of nodes: explicitly or system decides Exact number of nodes will fit node capacity, i.e., if the nodes are powerful there will be fewer nodes used Alternate maxusage yarn parm for temp/io heavy apps Equivalent of Prepared state, still needs activate Port Configuration startup time: port conflicts can be resolved Re-address when new Cluster SW is installed User Queen No add/remove node functionality Stand up another cluster Point to the data No data migration... Aster Temp Aster Temp
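The sizing rule on this slide (a fixed number of vworkers placed on as many nodes as their capacity requires, so more powerful nodes mean fewer nodes) comes down to a ceiling division. A minimal sketch, assuming a hypothetical helper name and made-up capacity figures; it is not taken from the Aster tooling.

```python
import math


def nodes_needed(total_vworkers: int, vworkers_per_node: int) -> int:
    """Return how many worker nodes a fixed vworker count occupies."""
    if total_vworkers <= 0 or vworkers_per_node <= 0:
        raise ValueError("both counts must be positive")
    # Round up: a partly filled node still counts as a node in use.
    return math.ceil(total_vworkers / vworkers_per_node)


# Hypothetical example: 24 vworkers on small nodes versus larger nodes.
small_nodes = nodes_needed(24, 4)
large_nodes = nodes_needed(24, 8)
```

With these illustrative numbers, doubling per-node capacity halves the node count, which is the behavior the slide describes.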
20 AsterX 7.0 Cluster Startup Install, startup - separate steps Install libraries, basic directories Startup plumbs all connections Setup vworkers Connect Queen and Workers Shutdown - cleans up the workers Temp data removed Reuse temp : Future Optimization Case All via Aster Yarn Client Commands Equivalent to Aster Activate Consul Aster Temp User Queen Aster Temp
21 AsterX 7.0 Worker Local Temp Only Persistence in Hive, hcat Access: Connectors + QueryGrid Read at script Start / Write at script End Objects managed by user Same semantics as a Database Persist for Duration of Cluster No Replication & Compression Redistribution remains Cluster Hadoop YARN Name Node Hadoop Node Hadoop Node Hadoop Aster Connectors Node Query Grid vworker m/node ASTER - vworker ASTER SQL - vworker M/R API Engine Engine NODE LOCAL Storage User Edge Node Queen Exec Graph Engine
22 AsterX Local Storage
23 Advanced Analytics Enabled by SQL (for Data/Business Analysts) Once you know how to use one Aster SQL command you have learned how to use them all! CREATE TABLE complaints_nb_model (PARTITION KEY(token)) AS SELECT token, SUM(crash) AS crash, SUM(no_crash) AS no_crash FROM NaiveBayesText ( ON complaints TEXT_COLUMN ('text_data') CATEGORY_COLUMN ('category') CATEGORIES ('crash', 'no_crash') ) GROUP BY token; ANSI SQL Statement SQL MR Statement Data Source SQL-MR Predicates
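The outer ANSI SQL part of the statement above (`SELECT token, SUM(crash), SUM(no_crash) ... GROUP BY token`) is an ordinary per-token aggregation over whatever rows the SQL-MR function emits. The sketch below mirrors only that aggregation step in Python; the sample rows are invented for illustration and do not reproduce the actual output of `NaiveBayesText`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sum_by_token(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, Dict[str, int]]:
    """Equivalent of: SELECT token, SUM(crash), SUM(no_crash) ... GROUP BY token."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"crash": 0, "no_crash": 0})
    for token, crash, no_crash in rows:
        totals[token]["crash"] += crash
        totals[token]["no_crash"] += no_crash
    return dict(totals)


# Hypothetical (token, crash, no_crash) rows, standing in for function output.
sample_rows = [
    ("brake", 3, 1),
    ("noise", 0, 4),
    ("brake", 2, 0),
]
model = sum_by_token(sample_rows)
```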
24 AsterX 7.0 Storage Examples: Analytic Temp Tables and Hive Perm Tables Foreign Server Read/Write w SQL-H Distributed Storage: Hive, hcat Hive Table Web log Perm Storage Local Spill to disk AX AX AX SQL Temp Temp Queen TEMP Storage
25 AsterX Storage Before AX is running Hive tables: Hive_t1, Hive_t2, Hive_t3 Flat Files: weblog.txt Foreign Server Read/Write w SQL-H Distributed Storage: Hive, hcat Hive T1 Hive T2 Hive T3 Web log Perm Storage Local Spill to disk AX AX AX SQL Temp Temp Queen TEMP Storage
26 AsterX Storage CTAS analytic: Aster_analytic_t1 CTAS Temp: Aster_session_temp_t2 Foreign Server Read/Write w SQL-H Distributed Storage: Hive, hcat Hive T1 Hive T2 Hive T3 Web log Perm Storage Local Spill to disk AX AX AX SQL Temp Temp T1 T2 TEMP Storage Queen Uptime Lifetime Session Lifetime
27 AsterX Storage SQL query phase temp tables Temp_phase_1 (the real name would be like _tmp_ ) Temp_phase_2 Query_output Foreign Server Read/Write w SQL-H Distributed Storage: Hive, hcat Hive T1 Hive T2 Hive T3 Web log Perm Storage Local Spill to disk AX AX AX SQL Temp Temp Queen T1 T2 TP1 TP1 QO Query Lifetime TEMP Storage
28 AsterX Storage CTAS To Hive Alan_dailyreport_06_24_2015 Foreign Server Read/Write w SQL-H Distributed Storage: Hive, hcat Hive T1.. Hive Web Alan T3 log DR Perm Storage Local Spill to disk AX AX AX SQL Temp Temp T1 T2 TEMP Storage Queen
29 AsterX Storage After Aster shutdown Foreign Server Read/Write w SQL-H Distributed Storage: Hive, hcat Hive T1.. Hive Web Alan T3 log DR Perm Storage Local Spill to disk AX AX AX SQL Temp Temp Queen TEMP Storage
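The storage walkthrough on slides 24 to 29 assigns each table a lifetime: query-phase temp tables last for the query, session temp tables for the session, analytic tables for as long as the Queen is up, and Hive tables persist after shutdown. A minimal sketch of that rule, using the table names from the slides; the `Lifetime` enum and the `surviving` helper are hypothetical names introduced only for illustration.

```python
from enum import Enum
from typing import Dict, List


class Lifetime(Enum):
    QUERY = 1         # query-phase temp tables and query output
    SESSION = 2       # session temp tables
    QUEEN_UPTIME = 3  # analytic tables, lost at shutdown or node failure
    PERMANENT = 4     # Hive tables in distributed storage


TABLES: Dict[str, Lifetime] = {
    "Temp_phase_1": Lifetime.QUERY,
    "Temp_phase_2": Lifetime.QUERY,
    "Query_output": Lifetime.QUERY,
    "Aster_session_temp_t2": Lifetime.SESSION,
    "Aster_analytic_t1": Lifetime.QUEEN_UPTIME,
    "Hive_t1": Lifetime.PERMANENT,
    "Hive_t2": Lifetime.PERMANENT,
    "Hive_t3": Lifetime.PERMANENT,
    "Alan_dailyreport_06_24_2015": Lifetime.PERMANENT,
}


def surviving(tables: Dict[str, Lifetime], ended: Lifetime) -> List[str]:
    """Names of tables that outlive the scope that just ended."""
    return sorted(name for name, life in tables.items() if life.value > ended.value)


after_session = surviving(TABLES, Lifetime.SESSION)
after_shutdown = surviving(TABLES, Lifetime.QUEEN_UPTIME)
```

Anything that must outlive the cluster therefore has to be written to Hive with a CTAS before shutdown, as in the `Alan_dailyreport_06_24_2015` example.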
30 AsterX Failure & Recovery
31 Metadata persistence Admin DDL Checkpoint: Saves checkpoint file to disk. Manually done via ncli command. Restart causes checkpoint to be replayed. Checkpoint files are valid on any AsterX instance*. Checkpointed: Users, Roles, Databases, Schemas; foreign server definitions; packaged analytics models and functions; grant privileges on above. Not Checkpointed: Tables, views, constraints, indexes; R scripts installed on the server side; user-installed files and SQL/MR functions; user scripts for vacuum or daily jobs.
32 AsterX 7.0 Failure Recovery Node Failure = loss of analytic tables Edge Node Queen User Worker Node AND/or vworker System will allocate new node Conversation with YARN Move vworkers/node Come to prepared state Activate automatically Before After Restart Aster Aster Temp Temp Temp Edge Node Queen User ALL TEMP Data is LOST. Vworker is treated as node failure Aster Aster Temp Temp Temp
33 AsterX 7.0 Failure Recovery Queen Fails Recovery is Repair DDL Gen Unwind User Edge Node App B Queen User Edge Node App A Queen Head Nodes Hadoop Services Name Node If unrecoverable Delete cluster cluster create Aster A Aster A Map/ Reduce Aster B Hive Aster A Worker Nodes
34 AsterX 7.0 Failure Recovery Other issues Same behaviors, different impact User Edge Node App B Queen User Edge Node App A Queen Head Nodes Hadoop Services Name Node DDL Gen SQL Script of the dictionary. Aster A State is lost Temp data will be deleted on restart Aster A Map/ Reduce Aster B Hive Aster A Worker Nodes
35 AsterX 7.0 Expansion New Instance. Got that? User Edge Node User Edge Node Head Nodes Hadoop Services Create New Aster Instance Setup Foreign Server Constructs App B Queen App A Queen Name Node Go Aster A Reference Existing Persistent Data Aster A Map/ Reduce Aster B Hive Aster A Worker Nodes
36 AsterX 7.x Configuration Options Many, many more options User User Head Nodes Single cluster per workload Or... Xmas sized Cluster... Monthly Term licensing??? Edge Node App B Queen Edge Node App A Queen Hadoop Services Name Node Internal Chargeback? Aster A LOB specific Aster Instance Delegation of administration... Simplified CapEx / OpEx administration Aster B Hive Aster A Aster A Map/ Reduce Worker Nodes
37 Aster Persistent Storage & Access - Query Grid Two
38 Aster 7.10 QueryGrid Two - Next Gen High speed TD, Presto, Hadoop connectivity Cluster to Cluster connectivity Point to Point model not hub (Kafka is a Hub) Common Framework included in each product Communications, State, Error Management, Data Conversion Network Protocol, Parallelism, Distribution and more Single cost implementation Simple set of Get/Set operations specific to the implementation TD TD TD Uses full matrix communications in first release Blocks of Tuples are distributed round robin Full Communication Matrix Session Data is MultiPlexed Multiple sessions use same communications channel QG2 TD TD TD Aster Aster Aster Aster Current Connectors
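The statement that blocks of tuples are distributed round robin can be sketched as follows. This is an illustration of the general round-robin technique only, not the QueryGrid implementation; the block size, receiver names, and function names are assumptions.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Sequence


def make_blocks(tuples: Iterable, block_size: int) -> Iterator[List]:
    """Group a stream of tuples into fixed-size blocks (the last may be short)."""
    it = iter(tuples)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def distribute_round_robin(
    tuples: Iterable, block_size: int, receivers: Sequence[str]
) -> Dict[str, List[List]]:
    """Assign block i to receiver i mod N, so every receiver gets a similar share."""
    assignment: Dict[str, List[List]] = {r: [] for r in receivers}
    for i, block in enumerate(make_blocks(tuples, block_size)):
        assignment[receivers[i % len(receivers)]].append(block)
    return assignment


# Hypothetical receivers standing in for the nodes of the target system.
plan = distribute_round_robin(range(10), block_size=2, receivers=["td0", "td1", "td2"])
```

Round robin spreads data evenly without inspecting tuple contents, so any placement by key has to happen on the receiving side.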
39 Aster - Foreign Server Syntax Support DML syntax - external objects Teradata's Foreign Server Syntax Aster & source: Bi-directional data movement Load_from_Hcatalog, Teradata, etc Load_to_Hcatalog, Teradata, etc Use: SEL,INS,Views and CTAS Query pushdown, Query time special & override of parameters also supported Grant & Revoke USAGE & EXECUTE privileges CREATE FOREIGN SERVER name USING server( ') port('1234') DO IMPORT WITH Load_from_XYZ USING DO EXPORT WITH Load_to_XYZ USING SELECT * FROM table@foreignserver; INSERT INTO table@foreignserver SELECT id, value FROM astertable; WITH FOREIGN SERVER fsalias as (foreignserver using username('foo') password('bar') ) SELECT * FROM table1@fsalias, table2@fsalias ;
40 AsterX 7.0 Scripting Pattern Changes Existing Customers - Implementing persistence in AX Best practices in script writing Disable Failure mode until after DDL commands Truncate Tables (delete from all) Create Tables inline (keeps code in one place, enables operator to drop table and not have to change production code) Cascading Insert/Selects/CTAS Pour over tables for failure/locking latency Option of creating a cluster sized just for this workload
41 Aster 7.x Other Cool Stuff
42 AsterX 7.0 Planner Changes Improve plans for external table (ET) queries External tables are the norm in AX (exception in AD) 7.0 Planner Hive Meta-store to get table size AD 6.50 planner view of ETs AX 7.0 planner view of ETs??? 3,000,000,000 6,000 4 Region Sales Store Region Store Sales
43 AsterX 7.0 Planner Changes Planner Hive Meta-store to get base table stats Only table rowcount and size. No columns/histograms Recognize small ETs and replicate as dim tables Save on costly data repartitioning Improve join order optimization Avoid early theta joins and dataflow multipliers Better skew avoidance Avoid partitioning on low cardinality columns
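The rule "recognize small ETs and replicate as dim tables" can be sketched as a size test on the row counts the planner reads from the Hive metastore. A rough sketch only: the threshold value is hypothetical, and pairing the row counts from the previous slide (3,000,000,000, 6,000 and 4) with Sales, Store and Region is an assumed reading of the flattened diagram.

```python
from typing import Dict

# Hypothetical cutoff; the real planner's criterion is not given on the slides.
REPLICATE_MAX_ROWS = 100_000


def plan_distribution(
    row_counts: Dict[str, int], replicate_max_rows: int = REPLICATE_MAX_ROWS
) -> Dict[str, str]:
    """Replicate small external tables to every vworker; partition large ones."""
    return {
        table: "replicate" if rows <= replicate_max_rows else "partition"
        for table, rows in row_counts.items()
    }


# Assumed mapping of the slide's row counts to table names.
external_tables = {"Sales": 3_000_000_000, "Store": 6_000, "Region": 4}
plan = plan_distribution(external_tables)
```

Replicating the two small tables lets the join with the large table run locally on each vworker, which avoids the costly repartitioning the slide mentions.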
44 AsterX 7.0 Multi-Tenancy User Edge Node App B Queen User Edge Node App A Queen Head Nodes Hadoop Services Name Node Hadoop is an Execution Environment Aster must conform to Hadoop's capabilities Hadoop supports Sessions and Aster supports Sessions Ergo how does Aster run inside Hadoop... Aster is a Daemon based architecture... Aster A Aster B Aster B Hive Aster A Aster A Map/Reduce Worker Nodes Multi-Tenancy in AX7.0 Co-exist with other Hadoop Applications Port Mapping is the largest single problem
45 Thank You Questions/Comments Follow Me DataOcean Rate This Session # 598 with the PARTNERS Mobile App Remember To Share Your Virtual Passes 45
46 Aster 7.10 What's NEXT (where's the cool graphic???)
47 Aster 7.10 Containers (using Docker) Objective: Architecture using Containers (what processes go where) - Hadoop Major Impact to Startup, Distribution, Process Management Reality: Extraordinarily difficult on Hadoop Required complete rewrite of all process management Theory issue problems Process Allocation and management Foundation of AsterX on other platforms - GCP, AWS, Azure... Open decision on long term Hadoop implementation
48 Aster 7.10 Planner Improvements Pushdown predicate to Hive Automatic in 7.10 (manual in 7.0) Reduces data movement Utilizes store format filters Cuts down IO & CPU Filter Increase dependence on Stats Foreign System Get/Set Scan Scan + Filter
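The difference between "Scan" followed by a local filter and "Scan + Filter" at the foreign system can be sketched by counting how many rows cross between the systems in each case. This is a generic illustration of predicate pushdown, not the Aster planner; the function names and sample rows are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def scan_then_filter(
    source_rows: List[Row], predicate: Callable[[Row], bool]
) -> Tuple[List[Row], int]:
    """No pushdown: every row is moved, then filtered on the receiving side."""
    moved = list(source_rows)
    result = [row for row in moved if predicate(row)]
    return result, len(moved)


def scan_with_pushdown(
    source_rows: List[Row], predicate: Callable[[Row], bool]
) -> Tuple[List[Row], int]:
    """Pushdown: the source applies the filter, so only matching rows are moved."""
    moved = [row for row in source_rows if predicate(row)]
    return moved, len(moved)


# Hypothetical source table: 2 of 8 rows match the predicate.
rows = [{"region": "east" if i < 2 else "west", "sales": i} for i in range(8)]
is_east = lambda row: row["region"] == "east"

plain_result, plain_moved = scan_then_filter(rows, is_east)
pushed_result, pushed_moved = scan_with_pushdown(rows, is_east)
```

Both paths return the same rows; only the amount of data moved differs, which is where the IO and CPU savings on the slide come from.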
49 Aster 7.10 Planner Improvements Planner pushdown sub-queries to Hive Intelligent push down (semantics, type, size) Minimal data movement Better stats & data distribution utilization NPath NPath T4 GB T4 ET1 ET2 ET3 GB ET1 ET2 ET3
50 Aster 7.10 Aster Spark Utilize Spark execution framework for Aster Aster query operator as Spark functions / scripts Uses Spark MLlib analytics libraries Customers write functions in Spark using Uses familiar SQL/MR language framework Support multiple Spark clusters (ex: same query) Parallel data transfer (Sockets or ) Spark Job Monitoring
51 Aster 7.10 Spark Aster Aster table/queries use Spark Data frame API Read Aster tables/queries in parallel Can cache data on disk sqlcontext.readastertable ( <table-name>, <cache-on-disk>, ) sqlcontext.readasterusingquery ( <query>, <cache-on-disk>, ) Write Data frames to Tables in parallel (overwrite / append mode) <dataframe>.writetoastertable( <table-name>, <mode>, )
52 Existing Framework (Analytic flow) Batch-mode Processing Training Data Queries (Test Data) Appropriate Action Analytics (Model Builder) Model Analytics (Predictor) Prediction Requests Prediction Response Score ASTER FRAMEWORK
53 Proposal: Split the processing Real-time Scoring Aster Platform - Prediction analytics isolated from training (modeling) RealTime Platform -Asynchronous feedback between the two frameworks. Training Data Queries (Test Data) Appropriate Action Analytics (Model Builder) Model Prediction Requests Analytics (Predictor) Prediction Response Score
54 Aster Model Language Generator Generates AML File from Model Table Training Data DRIVER FUNCTION Real-time Scoring Queries (Test Data) Appropriate Action Analytics (Model Builder) Model AML Generator Prediction Requests Analytics (Predictor) Prediction Response Score
55 Scorer Execution Flow AML File Your Real Time Framework Model Type Model Definition Model Data Request Parameters Request Definition Transport as Request Prediction Requests Appropriate Action Java JAR file Response Configurator Score Prediction Response
Swimming in the Data Lake Presented by Warner Chaves Moderated by Sander Stad Thank You microsoft.com hortonworks.com aws.amazon.com red-gate.com Empower users with new insights through familiar tools
More informationMap-Reduce. Marco Mura 2010 March, 31th
Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of
More informationDesigning your BI Architecture
IBM Software Group Designing your BI Architecture Data Movement and Transformation David Cope EDW Architect Asia Pacific 2007 IBM Corporation DataStage and DWE SQW Complex Files SQL Scripts ERP ETL Engine
More informationSAS, Sun, Oracle: On Mashups, Enterprise 2.0 and Ideation
SAS, Sun, Oracle: On Mashups, Enterprise 2.0 and Ideation Charlie Garry, Director, Product Manager, Oracle Corporation Charlie Garry, Director, Product Manager, Oracle Corporation Paul Kent, Vice President,