Big Data with Hadoop Ecosystem

Diógenes Pires. Big Data with Hadoop Ecosystem. Hands-on (HBase, MySQL and Hive + Power BI)

Internet Live Stats: http://www.internetlivestats.com/

Introduction

Business Intelligence

Business Intelligence Process

Some tools

Sources for Big Data
Data Warehouse; RDBMS; Web server log files; Social media content; Business reports; Texts of consumer emails to the company; Macroeconomic indicators; Satisfaction surveys; IoT; CRM

Examples

Example

Main Concepts

Definitions
Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.
Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.
Business analytics is comprised of solutions used to build analysis models and simulations to create scenarios, understand realities and predict future states. Business analytics includes data mining, predictive analytics, applied analytics and statistics, and is delivered as an application suitable for a business user.
Source: Gartner.

Other Concepts
Cognitive Computing; Data Discovery; Data Lake; Data Science; Machine Learning; Self-Service BI; Fast Data

The competitive advantages
Identification of patterns; Competitor analysis; Product development; Data-driven marketing; Measuring customer dissatisfaction

Big Data

Landscape

Big Data is not Bitcoin

Google File System (GFS or GoogleFS)
Google File System (GFS or GoogleFS) is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware. A new version of Google File System, code-named Colossus, was released in 2010.
Source: Wikipedia.
Timeline: 2003 GFS; 2004 MapReduce; 2006 Bigtable

Apache Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Source: Apache Hadoop.

Apache Hadoop
The project includes these modules:
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Other Hadoop Projects

Distributions

Architecture

Hadoop Architecture

MapReduce
A programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster.
Source: IBM.

MapReduce
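
The MapReduce paradigm can be illustrated with a small word count. This is a minimal single-process sketch, not Hadoop's API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are hypothetical, and a real cluster would run the map and reduce steps in parallel on many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (illustrative only)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: each string stands for one line of a large file.
    sample = ["big data with hadoop", "hadoop and hive", "big data"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across servers, which is where the scalability comes from.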

Frameworks

Apache Hive
The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
Source: hive.apache.org.
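
The idea that structure can be projected onto data already in storage (often called schema-on-read) can be sketched in plain Python. This is an illustration under stated assumptions, not Hive code: RAW_FILE, SCHEMA, project_schema and select_where are hypothetical names, and the sample records are invented for the example.

```python
import csv
import io

# Hypothetical raw file as it might already sit in distributed storage:
# plain delimited text with no schema attached.
RAW_FILE = "1,Ana,Salvador\n2,Bruno,Recife\n"

# Hypothetical schema, declared only at read time: (column name, type).
SCHEMA = [("id", int), ("name", str), ("city", str)]


def project_schema(raw_text, schema):
    """Apply the schema to each raw record while reading it."""
    reader = csv.reader(io.StringIO(raw_text))
    for fields in reader:
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def select_where(rows, column, value):
    """Rough equivalent of SELECT * ... WHERE column = value."""
    return [row for row in rows if row[column] == value]


if __name__ == "__main__":
    rows = list(project_schema(RAW_FILE, SCHEMA))
    print(select_where(rows, "city", "Recife"))
```

The stored file is never rewritten; only the reader changes, so the same data can be read later with a different schema.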

Apache Hive Architecture

Cloudera Impala
Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Cloudera Impala query UI in Hue) as Apache Hive.
Source: Cloudera.

Impala Architecture

NoSQL
NoSQL is a term used to describe high-performance non-relational databases. NoSQL databases use a variety of data models, including documents, graphs, key-values, and columnar data.
Source: Amazon.

NoSQL
No pain, no gain. NoSQL, no join.

HBase
HBase is an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java.
Source: Hive.apache.org.

HBase Example
Relational view vs. column family view
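
The difference between the two views can be sketched with plain Python dictionaries. This is a hedged illustration, not the HBase client API: the sample row, the family names (personal, address) and the helper to_column_family_view are all hypothetical. It assumes the usual HBase addressing of a value by row key, column family and column qualifier, and leaves out cell timestamps for brevity.

```python
# Relational view: one flat row with a fixed set of columns.
relational_row = {
    "id": 1,
    "name": "Ana",
    "email": "ana@example.com",
    "city": "Salvador",
    "state": "BA",
}

# Hypothetical grouping of columns into column families.
FAMILIES = {
    "personal": ["name", "email"],
    "address": ["city", "state"],
}


def to_column_family_view(row, key_column, families):
    """Re-express a flat row as row key -> family -> qualifier -> value."""
    row_key = str(row[key_column])
    cells = {}
    for family, qualifiers in families.items():
        # Missing or null columns are simply not stored (sparse rows).
        cells[family] = {
            qualifier: str(row[qualifier])
            for qualifier in qualifiers
            if row.get(qualifier) is not None
        }
    return {row_key: cells}


hbase_view = to_column_family_view(relational_row, "id", FAMILIES)

if __name__ == "__main__":
    print(hbase_view)
```

In the column family view a lookup goes through the row key first, for example hbase_view["1"]["address"]["city"], which matches the "no join" style of access in the NoSQL slides.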

Hands-on

Communication channels
Hands-on material at www.bilivre.com.br; facebook.com/bilivre (BI Livre)