Acquiring Big Data to Realize Business Value

Similar documents
<Insert Picture Here> Introduction to Big Data Technology

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA

Oracle NoSQL Database Overview Marie-Anne Neimat, VP Development

Big Data The end of Data Warehousing?

Big Data For Oil & Gas

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Oracle NoSQL Database and Cisco- Collaboration that produces results. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Evolving To The Big Data Warehouse

Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

Building an Integrated Big Data & Analytics Infrastructure September 25, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle

Stages of Data Processing

Embedded Technosolutions

Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

Big Data with Hadoop Ecosystem

@Pentaho #BigDataWebSeries

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

Microsoft Exam

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

CHAPTER. Overview of Oracle NoSQL Database and Big Data

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Fast Innovation requires Fast IT

Oracle Big Data Connectors

Microsoft Big Data and Hadoop

Capture Business Opportunities from Systems of Record and Systems of Innovation

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor

Oracle Exadata Statement of Direction NOVEMBER 2017

Challenges for Data Driven Systems

Top Five Reasons for Data Warehouse Modernization Philip Russom

A Review Paper on Big data & Hadoop

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine

Informatica Enterprise Information Catalog

2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice

Chapter 6 VIDEO CASES

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Introduction to Big-Data

Oracle Exadata: Strategy and Roadmap

The Hadoop Paradigm & the Need for Dataset Management

5 Fundamental Strategies for Building a Data-centered Data Center

Big Data in Research: Research Analytics Industry Solution. Stuart Long CTO - Oracle Systems Asia Pacific and Japan

Big Data Hadoop Stack

Analyze Big Data Faster and Store It Cheaper

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

Oracle #1 RDBMS Vendor

Integrating Oracle Databases with NoSQL Databases for Linux on IBM LinuxONE and z System Servers

Oracle Exadata: The World s Fastest Database Machine

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING

Modern Database Concepts

Evolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo

The Technology of the Business Data Lake. Appendix

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Safe Harbor Statement

Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions

New Approaches to Big Data Processing and Analytics

BIG DATA TESTING: A UNIFIED VIEW

Strategic Briefing Paper Big Data

Oracle Big Data Fundamentals Ed 1

Modern Data Warehouse The New Approach to Azure BI

REGULATORY REPORTING FOR FINANCIAL SERVICES

Building a Data Strategy for a Digital World

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Certified Big Data and Hadoop Course Curriculum

Data-Intensive Distributed Computing

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,

When, Where & Why to Use NoSQL?

Netezza The Analytics Appliance

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.

Security and Performance advances with Oracle Big Data SQL

2013 AWS Worldwide Public Sector Summit Washington, D.C.

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

A Survey on Big Data

Webinar Series TMIP VISION

Private Cloud Database Consolidation Name, Title

EXTRACT DATA IN LARGE DATABASE WITH HADOOP

How Insurers are Realising the Promise of Big Data

Big Data Facebook

Oracle Big Data Fundamentals Ed 2

Accelerate your SAS analytics to take the gold

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Top Trends in DBMS & DW

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

ACCELERATE YOUR ANALYTICS GAME WITH ORACLE SOLUTIONS ON PURE STORAGE

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Swimming in the Data Lake. Presented by Warner Chaves Moderated by Sander Stad

Prototyping Data Intensive Apps: TrendingTopics.org

From Data Challenge to Data Opportunity

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

ELTMaestro for Spark: Data integration on clusters

Warehouse- Scale Computing and the BDAS Stack

Progress DataDirect For Business Intelligence And Analytics Vendors

1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

In-Memory Computing EXASOL Evaluation

Cloud Computing Techniques for Big Data and Hadoop Implementation

Transcription:

Acquiring Big Data to Realize Business Value

Agenda What is Big Data? Common Big Data technologies <Insert Picture Here> Use Case Examples Oracle Products in the Big Data space In Summary: Big Data Takeaways 2

What is Big Data? Big Data is the convergence of vast quantities of both structured and unstructured information sourced internally and externally to the business. New technologies are used along with traditional RDBMS systems to drive revenue or lower costs for an organization. Big Data is the Convergence of the 3 V s 1 Velocity Moves at very high rates Valuable in its temporal, high velocity state Volume Fast-moving data creates massive historical archives Valuable for mining patterns, trends and relationships Variety Structured (logs, business transactions) Semi-structured and unstructured Companies are swimming in data TBs/day of data arrival is here to stay 3 V s of Big Data Velocity Volume Variety 1. The Data Warehousing Institute

The Data Universe Structured Data Row and column structure Related Unstructured Data Enormous transactional data Enormous unstructured information Too big for relational databases alone New tools are needed 4

Velocity Every 60 Seconds on the Net 5

Tapping into Diverse Data Sets Information Architectures Today: Decisions based on classical transactional data Video and Images Big Data: Decisions based on all your data Documents Social Data Machine-Generated Data 6

Why Do Big Data? Opportunity Threat Big data can improve your top line today! Big data can make you much more agile Provides an edge over your competitors Big data is here now Your competitors will not miss out on the opportunity Act now! Start building a big data platform for your organization 7

Big Claims of Big Data 1 Industry New Data What s Possible Why? Communication Product Innovation, Operational Efficiency, Strategic Positioning Customer Usage Data, Mobile Social Media Streams New product innovation, pricing mix, cross marketing oppty Product Differentiation, New revenue potential Retail One size fits all marketing Weblog, click streams Micro-segmentation, recommendations Increase net margin by 60% Banking Fraud detection; risk analysis, Predictive Market Analysis Weblogs, transaction systems, fraud reports Semantic discovery; pattern detection Billions of Dollars lost in bank fraud annually Location-Based Services Based on home zip code Personal location data Geo-advertising, traffic, local search, more. Increase revenue for providers by $100B+ Utilities Resilient and adaptable grid Smart meter reading, call center data Realtime and predictive utilization analysis Energy use expected to grow by 22 percent by 2030 Healthcare Improve Quality and Efficiency Practitioner s notes; machine statistics Best practices, reduced hospitalization Increase industry value by $300 B per year Source: * McKinsey Global Institute: Big Data The next frontier for innovation, competition and productivity (May 2011)

Why Should the Business Care? There is a major industry shift underway Big Data is Disruptive The inclusion of vast amounts and variety of data is giving firms competitive advantages New technologies allow us to leverage this convergence Firms can act in the short term for value Oracle is releasing core products around Big Data The business is looking for direction Big Data is here now!

Common Big Data Technologies NoSQL DB MapReduce Not Only SQL - Transactional data that does not have a row and column structure, is inconsistent, or sparse Dynamic and rapidly changing schema Predictable and bounded low latency store Break problem up into smaller sub-problems Distribute across thousands of nodes Can be exposed via SQL and in SQL-based BI tools Hadoop Leading MapReduce implementation (from Apache) Highly scalable batch processing Highly customizable infrastructure HDFS Hadoop Distributed File System Low cost, distributed, highly scalable storage Write once, read many times 10

Common Big Data Technologies (cont d) Apache Hive HBASE Open Source R PIG Sqoop A query processor built on top of Hadoop Much more convenient than writing MapReduce logic Support limited SQL constructs and UDFs Non-relational database modeled after Google s BigTable on top of HDFS Provides a fault-tolerant way of storing large quantities of sparce data Programming language for statistical analysis Language support for generalized linear models, cluster analysis, and other advanced statistical techniques A new programming paradigm to simplify MapReduce programming Developed at Yahoo runs 30% of their jobs SQL-to-Hadoop Parallel data import/export tool from Cloudera Tools to connect Hadoop to Relational Databases Export MapReduce results back to RDBMS end users 11

Process Flow of Big Data Acquire Organize Analyze Low, predictable Latency High Transaction Count Flexible Data Structures High Throughput In-Place Preparation All Data Sources/Structures Deep Analytics Agile Development Massive Scalability Real Time Results 12

Use Cases: Putting the Pieces Together Hive HDFS Map Reduce Analytics Data Mining R Hadoop NoSQL 13

A Common Use Case: Predictive Ad and Content Generation Low Latency Lookup user profile Real-time: Determine best ad to place on page for this user Add user if not present NoSQL DB Input into Expert System HDFS Web logs NoSQL DB Profiles High scale data reductions Predictions on browsing Actual ads served BI and Analytics Billing Batch 14

Data Warehousing @ Facebook using Hive & Hadoop 15

Looks like this 4800 cores, Storage capacity of 5.5 PetaBytes Disks Node Disks Node Disks Node Disks Node Disks Node Disks Node 16

Hadoop & Hive Cluster @ Facebook Hadoop/Hive Warehouse the new generation 4800 cores 12 TB per node, Storage capacity of 5.5 PetaBytes Two level network topology 1 Gbit/sec from node to rack switch 4 Gbit/sec to top level rack switch Statistics per day: 4 TB of compressed data added per day 135TB of compressed data scanned per day 7500+ Hive jobs per day 17

Hadoop & Hive Usage @ Facebook Types of Applications: Reporting Eg: Daily/Weekly aggregations of impression/click counts Measures of user engagement Microstrategy dashboards Ad hoc Analysis Eg: how many group admins broken down by state/country Machine Learning (Assembling training data) Predictive Optimization for Advertising Eg: User Engagement as a function of user attributes Many others 18

Large Bank A - Big Data Analytics Strategy Offering Hadoop as a fully engineered shared service 5 of 7 lines of business are actively using Hadoop Building data scientist team A new role in most companies Has about 50, hiring & training more Trying to influence the evolution of Big Data tools 19

How Bank A Uses Big Data Customer Intelligence Data Mining Sentiment Analysis Competitive Edge Liquidity Analysis $$ Increase Revenue $$ $$ $$ $$ Reduce Cost Data Processing Fraud Intelligence IT Risk Management Operational Efficiency Self Service 20

Bank A Use Case: Data Mining Analysis of large datasets (web logs, transactions, social media) Empowering Data Scientists to bypass data modeling process Exploration of data across business lines to identify useful intelligence Retail Bank & IT Infrastructure Fraud Prevention Investment Management Trade quality analysis Analysis across structured and unstructured data 21

Recorded Futures The Web is Loaded with Predictive Signals Silicon Valley executives head to Vail, Colo. next week for the Drought and malnutrition hinder next annual Pacific Crest Technology year s development plans in Yemen... Leadership Forum The carrier may select partners to set up a new carrier as early as next month 2010 is the year when Iran will kick out Islam. Ya Ahura we will.... Dr Sarkar says the new facility will be operational by March 2014......opposition organizers plan to meet on Thursday to protest... Excited to see Mubarak speak this weekend... Strange new Russian worm set to unleash botnet on 4/1/2012... 6/6/2012 22 According to TechCrunch China s new 4G network will be deployed by mid- 2010

Recorded Futures: Selling Big Data as a Service Predicting Predicting trading trading volumes volume Predicting Predicting stock stock returns returns from by sentiment company sentiment Predict Predicting volatility volatility by future events by future events Single blog Single blog impact analysis impact analysis 23

Banking Analyze complex semantic graphs Data coming in from multiple input streams Unstructured text Schema-based data Identify relationships between data items Translate bits and bytes into higher level facts Rationalize data across silos Query large graphs Consolidate data from multiple streams Extract entities Generate RDF/OWL capturing relationships between entities Identify fraud: Users with multiple identities 24

Utilities Spatial analysis Data feeds coming in from multiple types of smart meters Data in multiple vendorspecific formats Semi-structured feeds Aggregate results based on geographic service areas Perform what-if analysis on the distribution network Extract relevant location, time and consumption values from raw data feeds Transform data into a standard spatial format Analyze and display service areas with high or low utilization based on smart meter data 25

Oracle Products in the Big Data Space Bridging the Chasm Between Structured/Unstructured Data 26

Total Data Solution Spectrum Data Variety Unstructured Schema-less Distributed File Systems Transaction (Key-Value) Stores MapReduce Solutions NoSQL Flexible Specialized Developer Centric Structured DBMS (OLTP) ETL DBMS (DW) Advanced Analytics SQL Trusted Secure Administered Information Density Acquire Organize Analyze 27

Oracle Engineered Solutions Data Variety Unstructured Schema-less Structured Big Data Appliance Hadoop HDFS Hadoop NoSQL Database Oracle Loader Oracle for Loader hadoop Oracle for Hadoop NoSQL DB Oracle Data Integrator Oracle Data Integrator Oracle Database (OLTP) Oracle Exadata Oracle Database (DW) Image XML Text Semantic Graphs Spatial In-DB Analytics Exalytics Image Speed XML of Thought Text Analytics Graph Spatial Oracle BI EE NoSQL Flexible Specialized Developer Centric SQL Ad hoc query Secure Rich visualization Information Density Acquire Organize Analyze 28

Oracle Big Data Appliance Usage Model Oracle Big Data Appliance Oracle Exadata Oracle Exalytics InfiniBand InfiniBand Stream Acquire Organize Analyze & Visualize 29

Oracle Big Data Appliance Optimized for acquiring, organizing, and loading unstructured data into Oracle Database 18 Intel CPU Servers 216 cores 864GB memory 432 TB storage 40 Gb p/sec InfiniBand Oracle Big Data Appliance includes: Oracle NoSQL to capture and manage data Open source Hadoop to transform data Application Adapter for Hadoop to load an Oracle Database Open source distribution of R Oracle Enterprise Linux 30

In Summary: Big Data Takeaways Its here! Can t be ignored The technology is maturing We need to be prepared to advise our clients The 3Vs: Volume, Velocity, and Variety Adds value to business decision making process Will lead to a competitive advantage We need to be prepared to advise our customers, both internal and external 31

Q & A 32