Big Data in Research: Research Analytics Industry Solution. Stuart Long CTO - Oracle Systems Asia Pacific and Japan

Information Architecture Capability Model (Oracle Enterprise Architecture Framework / Oracle Information Architecture Framework)
Capabilities:
- Data technology management
- Information sharing & delivery
- Business intelligence & data warehousing
- Data security
- Data governance, quality & lifecycle management
- Enterprise data model
- Master data management
- Data integration
- Content management
Data realms: unstructured data, analytical data, big data, transaction data, master data, reference data, metadata

The Information Architecture Spectrum: Evaluating Economic and Architecture Tradeoffs

Master data, transactions, analytical data, metadata
  Structure: Structured
  Volume: Medium - High
  Security: Database, app, & user access
  Storage & retrieval: RDBMS / SQL
  Modeling: Pre-defined relational or dimensional modeling
  Processing/integration: ETL/ELT, CDC, replication, message
  Consumption: BI & statistical tools, operational applications

Reference data
  Structure: Structured and semi-structured
  Volume: Low - Medium
  Security: Platform security
  Storage & retrieval: XML / XQuery
  Modeling: Flexible & extensible
  Processing/integration: ETL/ELT, message
  Consumption: System-based data consumption

Documents and content
  Structure: Unstructured
  Volume: High
  Security: File system based
  Storage & retrieval: File system / search
  Modeling: Free form
  Processing/integration: OS-level file movement
  Consumption: Content mgmt

Big data (weblogs, sensors, social media)
  Structure: Structured, semi-structured, unstructured
  Volume: High
  Security: File system & database
  Storage & retrieval: Distributed FS / NoSQL
  Modeling: Flexible (key value)
  Processing/integration: Hadoop, MapReduce, ETL/ELT, message
  Consumption: BI & statistical tools

Big Data
- The LOFAR radio interferometer is producing 1.6 TB/sec (138 PB/day), setting new frontiers for radio astronomy.
- The volume of earth-observation data from the European Space Agency's satellites passed 3 PB in 2007, and the projection for 2020 is seven-fold.
- In genomics, the cost of sequencing is dropping by 50% every 5 months; analysis, not sequencing, will be the main expense hurdle (Cambridge University, UK).
- The volume of worldwide climate data is expanding rapidly, creating challenges for both physical archiving and sharing, and for ease of access to relevant information in a multidisciplinary environment.
- In high energy physics, the data recorded by each of the big experiments at the Large Hadron Collider will be enough to fill around 100,000 DVDs every year.
J T Overpeck et al. Science 2011;331:700-702

[Figure: Evolution of ESA's EO data archives between 1986-2007 and future estimates (up to 2020). Vertical axis: total archive in terabytes (TB). Horizontal axis: year. Series: future data estimates; LANDSAT 2-4 MSS (75-Dec 93); AQUA Modis (April 03-today); ENVISAT LR (March 02-today); ENVISAT HR (March 02-today); TERRA Modis (June 01-today); QUICK SCATT (01-today); /PROBA (May 02-today); LANDSAT 7 ETM (April 99-Dec 03); SEA STAR SeaWifs (Apr 98-today); ERS 2 HR (May 95-today); ERS 2 LBR (May 95-today); JERS SAR/OPS VNIR (92-Sep 98); ERS 1 HR (Jul 91-Mar 00); ERS 1 LBR (Jul 91-Mar 00); SPOT 1-4 HRV (87-today); MOS 1, 1b MESSR (87-Oct 93); NOAA 9-17 AVHRR (86-today); LANDSAT 5 TM (April 84-today); NIMBUS 7 (Nov 78-May 86), SEASAT (Jun-Oct 78). Courtesy of BERIS.]
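The volume figures quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-envelope check, assuming decimal units (1 PB = 1000 TB) and a steady halving of sequencing cost; the function names are illustrative and not part of the presentation.

```python
# Back-of-envelope checks for the data volumes quoted on the slide.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TB_PER_PB = 1000                # assumption: decimal units


def daily_volume_pb(rate_tb_per_sec: float) -> float:
    """Convert a sustained rate in TB/s into PB/day."""
    return rate_tb_per_sec * SECONDS_PER_DAY / TB_PER_PB


def projected_archive_pb(base_pb: float, growth_factor: float) -> float:
    """Scale a base archive size by a growth factor."""
    return base_pb * growth_factor


def relative_cost(months: float, halving_period_months: float = 5.0) -> float:
    """Fraction of the original cost left after `months`,
    assuming the cost halves once per halving period."""
    return 0.5 ** (months / halving_period_months)


lofar_pb_per_day = daily_volume_pb(1.6)       # LOFAR: 1.6 TB/s
esa_2020_pb = projected_archive_pb(3, 7)      # ESA: 3 PB in 2007, seven-fold by 2020
cost_after_5_years = relative_cost(60)        # sequencing cost after 60 months

print(f"LOFAR: {lofar_pb_per_day:.1f} PB/day")
print(f"ESA 2020 projection: {esa_2020_pb} PB")
print(f"Sequencing cost after 5 years: {cost_after_5_years:.6f} of today's cost")
```

Under these assumptions, 1.6 TB/s works out to roughly 138 PB/day, matching the slide, and a seven-fold increase on 3 PB gives 21 PB.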

The Challenges of Big Data
- Volume: very large quantities of data
- Velocity: extremely fast streams of data
- Variety: wide range of data type characteristics
- Value: high potential value if harnessed correctly

Intel Xeon 5500 Series: First Platform with End-to-End HW Virtualization
Intel Virtualization Technology:
- Processor: Intel VT-x
- Chipset: Intel VT for Directed I/O (Intel VT-d)
- Network: Intel VT for Connectivity (Intel VT-c)
Holistic, platform-centric approach for virtualization usages

Oracle: First Platform with Data Embedded Instructions
Oracle Enabling Technology:
- SQL Smart Cache Low Latency
- Data Processing Unit (DPU)
- Data Aware Storage
- Data Defined Network
Optimised for Data Processing and Database

Economies of Real Time Analytics: Waiting for Data
- Today's research applications are increasingly held back by slow storage
- When requesting data, the server spends most of its time waiting for storage
- Application performance remains sluggish regardless of the server's CPU horsepower
- The traditional remedy of adding more DRAM or short-stroking HDDs is both expensive and inefficient
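The claim that CPU horsepower does not help a storage-bound application can be illustrated with a simple additive time model. This is a minimal sketch, not a measurement: the 10 s / 90 s split is a hypothetical workload chosen for illustration, and the function name is an assumption.

```python
def total_time(cpu_seconds: float, io_wait_seconds: float,
               cpu_speedup: float = 1.0, io_speedup: float = 1.0) -> float:
    """Elapsed time for a job whose compute and storage-wait phases
    do not overlap; each phase shrinks by its own speedup factor."""
    return cpu_seconds / cpu_speedup + io_wait_seconds / io_speedup


# Hypothetical job: 10 s of computation and 90 s waiting on storage.
baseline = total_time(10, 90)                      # no changes
faster_cpu = total_time(10, 90, cpu_speedup=2)     # CPU twice as fast
faster_storage = total_time(10, 90, io_speedup=2)  # storage twice as fast

print(f"baseline:       {baseline:.0f} s")
print(f"2x CPU:         {faster_cpu:.0f} s")
print(f"2x storage:     {faster_storage:.0f} s")
```

In this hypothetical case, doubling CPU speed saves 5 seconds out of 100, while doubling storage speed saves 45, which is the imbalance the slide describes.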

Big Data inside the Research Lifecycle
Oracle's Engineered Systems Solution: Oracle Big Data Appliance, Oracle Exadata, Oracle Exalytics, InfiniBand
Stages: Acquire, Organize, Analyze, Visualize

The Research Industry Solutions
The Research Enterprise:
- Research Analytics
- Research Data Management
- Research Administration & Control
Our goal: to support researchers, their communities and their organizations in doing better research by providing cost-effective, reliable and open solutions

Oracle Research Analytics
A platform that enables researchers to:
- Work collaboratively on extremely large data sets, with the performance and innovative ways needed to exploit the data
- Build workflows that best support science and the operations of complex research
- Run applications and adapt them to different scientific loads and challenges

Challenges to address
- Exponential growth in data and the ability to access critical information
- Enterprise infrastructure ability to quickly accommodate new data sources
- Evolve from data analysis to predictive science
- Ability to translate raw data into information and knowledge
- Managing resources across workloads and platforms

Oracle Differentiators
- Process high-volume, low-density information
- Support flexible data structures
- In-database deep analytics
- Perform analysis on big data
- Parallel execution for efficient processing
- Deep, rich set of analytics for extracting maximum business value
Research areas: Research Data Management, Research Mission, Research Infrastructure, Research Ecosystem, Research Administration

Research Analytics Flow: Organization, Discovery, Visualization, Sharing

Oracle Research Analytics: overview
Flow: Organization, Discovery, Visualization, Sharing
Key capabilities:
- High-velocity loading and organization of information
- Ability to optimize workloads and system operations
- Ingest a wide range of data types
- Data integration
- Map reduction
- Statistical tools
- Analyze data across a wide variety of data characteristics using deep analytics
- Represent analysis findings
- Transform big data into something easy to analyze
- Load data quickly
- Ability to work on extremely large data sets, allowing researchers new ways to exploit data
- Ensure trust and security
- Interoperable access to distributed repositories of data
- Open standards-based environment
Key benefits:
- Minimize development time and effort
- Ensures appropriate levels of access
- Lower cost of research
- Facilitate innovative approach to discovery and results
- Support deep, rich set of analytics
- Reduce time-to-discovery
- Enables new science
- Facilitate manipulation of extremely large data sets
- Maximize analytic performance and achieve faster results
- Access to the latest investigative methods & tools
- Enables cross-disciplinary science & discovery
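"Map reduction" appears among the capabilities on this slide without further detail. As a rough illustration of the general map/shuffle/reduce pattern (plain Python on one machine, not Hadoop and not Oracle's implementation), the sketch below counts tokens in a few made-up log lines; all names and sample data are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (key, 1) pair for every token in a record."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence over in-memory records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for weblog records.
sample_logs = ["GET /data GET /index", "POST /data"]
counts = map_reduce(sample_logs)
print(counts)
```

In a distributed framework the map and reduce steps would run in parallel across many nodes; here they run sequentially so the data flow is easy to follow.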

Oracle's Integrated Big Data Solution Stack
People. Process. Portfolio.

Oracle Integrated Solution for Big Data
Stages: Acquire, Organize, Analyze, Decide
Components: HDFS, Oracle NoSQL Database, Enterprise Applications, Hadoop, Oracle Big Data Connectors, Data Warehouse, In-Database Analytics, Analytic Applications, Interactive Discovery

Oracle's Big Data solution
Stages: Acquire, Organize & Discover, Analyze, Decide
- Oracle Big Data Appliance: Cloudera Hadoop, Oracle NoSQL, Open-Source R, Big Data Connectors, Oracle Data Integrator
- Oracle Exadata: Oracle Advanced Analytics, Oracle Spatial and Graph, Oracle Database
- Oracle Exalytics: Oracle Business Intelligence, Endeca Information Discovery
- InfiniBand links between the systems

Oracle Big Data Appliance: Engineered Systems for Big Data
- Pre-configured and optimized for Big Data processing
- 18 servers, 864 GB RAM, 648 TB storage per rack; easy rack expansion
- NoSQL, Cloudera Hadoop, Oracle R
- Oracle Loader, Oracle Data Integrator, HDFS Connector for integration
- Integrates into your existing architecture
- Streams data into Exadata at 15 TB/hour
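The rack figures on this slide lend themselves to a quick sizing calculation. The sketch below only divides the quoted numbers; it assumes resources are spread evenly across servers and that the 15 TB/hour rate is sustained, neither of which the slide states.

```python
# Figures quoted on the slide for one Oracle Big Data Appliance rack.
SERVERS_PER_RACK = 18
RAM_GB_PER_RACK = 864
STORAGE_TB_PER_RACK = 648
STREAM_RATE_TB_PER_HOUR = 15  # quoted rate into Exadata

# Assumption: resources are distributed evenly across the servers.
ram_gb_per_server = RAM_GB_PER_RACK / SERVERS_PER_RACK
storage_tb_per_server = STORAGE_TB_PER_RACK / SERVERS_PER_RACK


def hours_to_stream(volume_tb: float,
                    rate_tb_per_hour: float = STREAM_RATE_TB_PER_HOUR) -> float:
    """Hours needed to move `volume_tb` at a sustained rate."""
    return volume_tb / rate_tb_per_hour


# Hypothetical worst case: moving a volume equal to a rack's full raw capacity.
full_rack_hours = hours_to_stream(STORAGE_TB_PER_RACK)

print(f"RAM per server:     {ram_gb_per_server:.0f} GB")
print(f"Storage per server: {storage_tb_per_server:.0f} TB")
print(f"Hours to stream one rack's raw capacity: {full_rack_hours:.1f}")
```

On these assumptions each server carries 48 GB of RAM and 36 TB of storage, and streaming a volume equal to one rack's raw capacity would take about 43 hours.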

Oracle Exadata: Engineered Systems for Systems of Record
- Fastest Data Warehouse & OLTP: 10X-20X faster load and query times
- 10X storage savings, 80% less power, and a lot less space
- Optimized for In-Database Analytics: model functions execute in storage
- Optimized for Network Throughput: network connections in from Big Data capture and out to In-Memory Analytics
- 1/5th to 1/8th the cost of other alternatives

Oracle Advanced Analytics: Advanced In-Database Predictive Analytics
- Comprehensive predictive analytics platform built inside the database
- Data mining, text mining
- Statistical analysis (based on R)
- Built for data analysts / scientists
- Scalable and parallel: analyzes huge volumes of data
- Tightly integrated with SQL, enabling broad usage
- Works inside Exadata and Big Data Appliance

Oracle Exalytics: In-Memory Engineered System for Analytics
Exalytics In-Memory Machine
- Spans relational, multi-dimensional, and unstructured analysis, combined with financial & operational planning
- In-memory optimized hardware
- In-memory Oracle BI, TimesTen, Essbase, and Endeca
- Several in-memory software innovations
- Tightly integrated with Exadata

Oracle Information Discovery: In-Memory Unstructured & Semi-Structured Analysis
Exalytics In-Memory Machine
- Hybrid in-memory search / analytic engine
- Combines unstructured/structured and internal/external data (big data)
- Enables search, navigation, and discovery of data and correlations
- Highly interactive UI for discovery/exploration
Features: Unified Search, Information Discovery, Faceted Navigation, Interactive Exploration, Data Mashup, Unified Indexing, Text Analysis
Uses: Social Media Analytics, Customer 360 Analysis, Competitive Intelligence

Customer Success in Big Data Architecture People. Process. Portfolio.

Customer Success: Erasmus Medical Centre
Challenges
- Complex data processing and analysis
- Load huge volumes of data in minimum time
- Store these data and the genomic DNA research results on disk
- Have an efficient system that delivers query performance
Results
Thanks to an Exadata-based solution, Erasmus Medical Centre achieved:
- An 11-minute query improved to 1 second, a major advantage for researchers who need immediate results
- Smart Scan and Flash Card: better performance in analyzing data
- Hybrid Columnar Compression: the ability to manipulate terabytes of data (compression from 133 GB to 11 GB) with increased performance
- Oracle Database 11g features such as partitioning: more performance in manipulating and quantifying data obtained through the study of various genomes
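The Erasmus results are easier to compare when expressed as ratios. The following sketch derives them from the two pairs of figures quoted above (11 minutes to 1 second, and 133 to 11 in the slide's storage units); nothing here is an additional measurement.

```python
# Query time before and after, as quoted on the slide.
query_before_seconds = 11 * 60
query_after_seconds = 1
query_speedup = query_before_seconds / query_after_seconds

# Data size before and after Hybrid Columnar Compression (same unit for both).
size_before = 133
size_after = 11
compression_ratio = size_before / size_after
space_saved_fraction = 1 - size_after / size_before

print(f"Query speedup:     {query_speedup:.0f}x")
print(f"Compression ratio: {compression_ratio:.1f}:1")
print(f"Space saved:       {space_saved_fraction:.1%}")
```

That is a 660-fold reduction in query time and roughly a 12:1 compression ratio for this particular data set.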

Customer Success: Oregon State University's COAS
COAS: College of Oceanic and Atmospheric Sciences
Challenges
- To expand its infrastructure to support its leading-edge scientific research on the ocean and atmosphere's influence on the Earth's climate
- To meet the data-intensive demands of its scientific research and foster an environment that will address current and future workflows
Results
- With Oracle, COAS has an easy-to-manage, integrated system that delivers the flexibility and scalability necessary to address the exponential data increases associated with its leading-edge research, as well as quickly adjust to ever-changing data availability requirements.
- As a result of extending its infrastructure with Oracle, COAS has improved data movement and performance by approximately 3 to 4 times, reduced system administration and management time, and unified research silos to gain a holistic view of integrated data sets.
- Additionally, COAS can now manage its unusually large input/output (I/O) loads, enabling the computation, storage, analysis and visualization of massive data flows.

Customer Success: Indiana University
Challenges
- To provide researchers with a first-class database environment that is secure, reliable and easy to use
- To gain insight into the data rapidly and effectively by building and managing research-oriented, data-intensive applications
- To provide the tools, templates and plug-ins researchers need to easily leverage research data, enhance their findings and increase productivity
Results
- Enable research and effective data analysis in different fields
- Provide and run a robust, secure and cost-effective research environment that protects data and ensures researchers have access to state-of-the-art technology
- For additional insight into research data, Indiana University gives researchers access to Oracle Data Mining, Oracle Spatial and Oracle OLAP, delivering its Database-as-a-Service to researchers both within Indiana University and at other universities around the country