Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

Similar documents
Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud

Oracle Big Data Connectors

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

DATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud

An Oracle White Paper November Primavera Unifier Integration Overview: A Web Services Integration Approach

Oracle GoldenGate for Big Data

Oracle Exadata Statement of Direction NOVEMBER 2017

Creating Custom Project Administrator Role to Review Project Performance and Analyze KPI Categories

Oracle CIoud Infrastructure Load Balancing Connectivity with Ravello O R A C L E W H I T E P A P E R M A R C H

Generate Invoice and Revenue for Labor Transactions Based on Rates Defined for Project and Task

Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph ORACLE WHITE PAPER NOVEMBER 2016

Oracle Grid Infrastructure 12c Release 2 Cluster Domains O R A C L E W H I T E P A P E R N O V E M B E R

October Oracle Application Express Statement of Direction

Load Project Organizations Using HCM Data Loader O R A C L E P P M C L O U D S E R V I C E S S O L U T I O N O V E R V I E W A U G U S T 2018

Loading User Update Requests Using HCM Data Loader

RAC Database on Oracle Ravello Cloud Service O R A C L E W H I T E P A P E R A U G U S T 2017

Migration Best Practices for Oracle Access Manager 10gR3 deployments O R A C L E W H I T E P A P E R M A R C H 2015

Repairing the Broken State of Data Protection

August 6, Oracle APEX Statement of Direction

Correction Documents for Poland

Oracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE

August Oracle - GoldenGate Statement of Direction

Oracle Grid Infrastructure Cluster Domains O R A C L E W H I T E P A P E R F E B R U A R Y

JD Edwards EnterpriseOne Licensing

Big Data The end of Data Warehousing?

Acquiring Big Data to Realize Business Value

An Oracle Technical White Paper October Sizing Guide for Single Click Configurations of Oracle s MySQL on Sun Fire x86 Servers

Oracle Database Appliance X6-2S / X6-2M ORACLE ENGINEERED SYSTEMS NOW WITHIN REACH FOR EVERY ORGANIZATION

An Oracle White Paper October The New Oracle Enterprise Manager Database Control 11g Release 2 Now Managing Oracle Clusterware

Oracle NoSQL Database For Time Series Data O R A C L E W H I T E P A P E R D E C E M B E R

Automatic Data Optimization with Oracle Database 12c O R A C L E W H I T E P A P E R S E P T E M B E R

Extreme Performance Platform for Real-Time Streaming Analytics

A Distinctive View across the Continuum of Care with Oracle Healthcare Master Person Index ORACLE WHITE PAPER NOVEMBER 2015

Oracle Data Masking and Subsetting

Tutorial on How to Publish an OCI Image Listing

Oracle Event Processing Extreme Performance on Sparc T5

Key Features. High-performance data replication. Optimized for Oracle Cloud. High Performance Parallel Delivery for all targets

Oracle Clusterware 18c Technical Overview O R A C L E W H I T E P A P E R F E B R U A R Y

Veritas NetBackup and Oracle Cloud Infrastructure Object Storage ORACLE HOW TO GUIDE FEBRUARY 2018

Oracle WebLogic Server Multitenant:

Configuring Oracle Business Intelligence Enterprise Edition to Support Teradata Database Query Banding

Siebel CRM Applications on Oracle Ravello Cloud Service ORACLE WHITE PAPER AUGUST 2017

Sun Fire X4170 M2 Server Frequently Asked Questions

An Oracle White Paper June Enterprise Database Cloud Deployment with Oracle SuperCluster T5-8

Achieving High Availability with Oracle Cloud Infrastructure Ravello Service O R A C L E W H I T E P A P E R J U N E

Oracle WebLogic Portal O R A C L E S T A T EM EN T O F D I R E C T IO N F E B R U A R Y 2016

Automatic Receipts Reversal Processing

An Oracle White Paper June Exadata Hybrid Columnar Compression (EHCC)

Differentiate Your Business with Oracle PartnerNetwork. Specialized. Recognized by Oracle. Preferred by Customers.

Oracle Cloud Applications. Oracle Transactional Business Intelligence BI Catalog Folder Management. Release 11+

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

An Oracle White Paper December, 3 rd Oracle Metadata Management v New Features Overview

VISUAL APPLICATION CREATION AND PUBLISHING FOR ANYONE

An Oracle White Paper December Oracle Exadata Database Machine Warehouse Architectural Comparisons

Your New Autonomous Data Warehouse

<Insert Picture Here> Introduction to Big Data Technology

CONTAINER CLOUD SERVICE. Managing Containers Easily on Oracle Public Cloud

Differentiate Your Business with Oracle PartnerNetwork. Specialized. Recognized by Oracle. Preferred by Customers.

April Understanding Federated Single Sign-On (SSO) Process

Installation Instructions: Oracle XML DB XFILES Demonstration. An Oracle White Paper: November 2011

Working with Time Zones in Oracle Business Intelligence Publisher ORACLE WHITE PAPER JULY 2014

ORACLE SERVICES FOR APPLICATION MIGRATIONS TO ORACLE HARDWARE INFRASTRUCTURES

Bulk Processing with Oracle Application Integration Architecture. An Oracle White Paper January 2009

Oracle Secure Backup. Getting Started. with Cloud Storage Devices O R A C L E W H I T E P A P E R F E B R U A R Y

Autonomous Data Warehouse in the Cloud

Oracle Fusion Middleware

StorageTek ACSLS Manager Software Overview and Frequently Asked Questions

Oracle DIVArchive Storage Plan Manager

ORACLE DATABASE LIFECYCLE MANAGEMENT PACK

An Oracle White Paper September Security and the Oracle Database Cloud Service

Big Data Architect.

Deploying Custom Operating System Images on Oracle Cloud Infrastructure O R A C L E W H I T E P A P E R M A Y

An Oracle White Paper June StorageTek In-Drive Reclaim Accelerator for the StorageTek T10000B Tape Drive and StorageTek Virtual Storage Manager

Oracle Mobile Application Framework

Oracle NoSQL Database Enterprise Edition, Version 18.1

Benefits of an Exclusive Multimaster Deployment of Oracle Directory Server Enterprise Edition

Oracle NoSQL Database Enterprise Edition, Version 18.1

Improve Data Integration with Changed Data Capture. An Oracle Data Integrator Technical Brief Updated December 2006

An Oracle White Paper September, Oracle Real User Experience Insight Server Requirements

STORAGE CONSOLIDATION AND THE SUN ZFS STORAGE APPLIANCE

Oracle Fusion Configurator

An Oracle White Paper October Oracle Social Cloud Platform Text Analytics

Security and Performance advances with Oracle Big Data SQL

Oracle Database 12c Release 2 for Data Warehousing and Big Data O R A C L E W H I T E P A P E R N O V E M B E R

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fences

An Oracle White Paper October Minimizing Planned Downtime of SAP Systems with the Virtualization Technologies in Oracle Solaris 10

Oracle TimesTen Scaleout: Revolutionizing In-Memory Transaction Processing

Oracle Hyperion Planning on the Oracle Database Appliance using Oracle Transparent Data Encryption

Oracle Warehouse Builder 10g Release 2 Integrating Packaged Applications Data

Subledger Accounting Reporting Journals Reports

Oracle Data Provider for.net Microsoft.NET Core and Entity Framework Core O R A C L E S T A T E M E N T O F D I R E C T I O N F E B R U A R Y

NOSQL DATABASE CLOUD SERVICE. Flexible Data Models. Zero Administration. Automatic Scaling.

Oracle Database Vault

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

ORACLE SNAP MANAGEMENT UTILITY FOR ORACLE DATABASE

Integrating Oracle SuperCluster Engineered Systems with a Data Center s 1 GbE and 10 GbE Networks Using Oracle Switch ES1-24

Cloud Operations for Oracle Cloud Machine ORACLE WHITE PAPER MARCH 2017

Using Oracle In-Memory Advisor with JD Edwards EnterpriseOne

Oracle JD Edwards EnterpriseOne Object Usage Tracking Performance Characterization Using JD Edwards EnterpriseOne Object Usage Tracking

Transcription:

Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration WHITE PAPER / JANUARY 25, 2019

Table of Contents Introduction... 3 Harnessing the power of big data beyond the SQL world... 4 Big data technologies what you need to know... 4 Challenges posed by big data... 5 Best technology approaches for integrating big data... 5 Integrated Design Tools: Improve Productivity... 5 Oracle Data Integrator 12c (ODI12c): Loading and Transforming Big Data... 6 Oracle Data Integrator Application Adapters for Hadoop... 6 Integrated Platform: Simplify and Optimize... 7 Real-time Business Analytics: Extends the Value of Big Data... 7 Conclusion... 8 2 WHITE PAPER / Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

INTRODUCTION Volume, Velocity, Variety - Big Data is all the vogue. Why? Enterprises know that there is a treasure trove of information they should be tapping into. This information is predominantly less structured data consisting of weblogs, social media, email, sensors, and photographs that can be mined for useful information. Whether it is healthcare, financial, manufacturing, government or retail, Big Data presents a big opportunity. A recent McKinsey study (see figure 1) illustrates that the US retail industry could realize a 60% increase in operating margins just by fully exploiting Big Data. And this is just the beginning. Big Data turns traditional information architecture on its head, putting into question commonly accepted notions of where and how data should be aggregated processed, analyzed, and stored. This is where low cost options of Hadoop and NoSQL come in new technologies which solve new problems for managing unstructured data. But one shouldn t forget what s already running in today s IT. Today s Business Analytics, Data Warehouses, Business Applications (ERP, CRM, SCM, HCM), and even many social, mobile, cloud applications still rely almost exclusively on structured data. This dilemma is what today s IT leaders are up against: what are the best ways to bridge enterprise data with Big Data through a common set of unified data integration tools? And what are the best strategies for dealing with the complexities of these two unique worlds? Figure 1: Real World Use Cases for Big Data Health Care Improve Quality And Efficiency Retail One Size Fits All Marketing Banking Fraud Detection; Risk Analysis NEW DATA WHAT IS POSSIBLE WHY Practitioner s Notes; Machine Statistics Weblogs; Click Streams Weblogs, Transaction, Systems, Fraud Reports Best Practices; Reduced Hospitalization Micro-Segmentation; Recommendations Semantic Discovery; Pattern Detection Increase industry value by $300Bn per year Increase net margin by 60% Billions of dollars lost annually in bank fraud The advantage of Big Data comes into play when you have the ability to correlate Big Data with your existing enterprise data. There s an implicit product requirement here in consolidating these various architecture principles into a single integration solution. The advantages of a single solution allow you to address not only the complexities of mapping, accessing, and loading Big Data but also combining it with your enterprise data and this correlation requires integrating across mixed application environments. The correlation is key to taking full advantage of Big Data and requires a single unified tool that can straddle both environments. Chai Pykmikulla Senior Director, Product Management Oracle Corporation Location Based Services Based On Home Zip Code Personal Location Data Geo- Advertising, Traffic, Local Search Increase Revenue For Providers By $100B+ Utilities Resilient And Adaptable Grid Smart Meter Reading, Call Center Data Real Time And Predictive Utilization Analysis Energy use expected to increase by 22% by 2030. 3 WHITE PAPER / Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

HARNESSING THE POWER OF BIG DATA BEYOND THE SQL WORLD Big data is having a tremendous impact on the data integration space. This has to do primarily with the fact that Big Data poses new questions for the best ways to process volumes and varieties of data at higher speeds and at faster velocities while keeping operating costs to a minimum. But before we look at the impact in more detail, let s first look at the current state of data integration. Data integration in a traditional sense solves the issues of bulk data movement, replication, synchronization, virtualization, transformation, data quality, and data services. These capabilities serve as a key technology component for moving data between Data Warehouses, Business Analytics, Master Data Management, Enterprise Applications, and Custom Applications. But now there s more to be moved. Much, much more. In fact, it s not just the scale, but it s the complexity of new technologies. Data Integration tools have evolved to support integration to and interoperation with technology offerings like Hadoop Distributed File System ( HDFS ) using innovative techniques like MapReduce and NoSQL. Oozie, Sqoop and new access methodologies like Kafka, Spark, Spark Streaming and HiveQL have all merged the realms of Data Integration into the Big Data world. And in many cases to effectively leverage the power of this technology effectively it has to run natively in the technology. Much like E-LT runs natively on relational database systems, so too data integration technology needs to run its transformation, loading natively in Big Data environments BIG DATA TECHNOLOGIES WHAT YOU NEED TO KNOW Kafka Apache Kafka is an open-source platform for stream-processing of real-time data feeds. Spark is an open-source cluster computing framework to implement iterative algorithms (e.g. training algorithms for machine learning) for repeated data queries. Spark Streaming is an extension of Spark that enables scalable, high-throughput processing of live data streams. It lets you ingest data from Kafka (and others like: Flume, Twitter, ZeroMQ, etc.) and pushes data out to filesystems, databases and live dashboards. Hadoop - is an open source software project that supports data-intensive distributed applications. It enables applications to work with thousands of computational independent computers and petabytes of data. MapReduce - is a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or standalone computers Hive is a query language similar to standard SQL statements. The query language is called Hive Query Language (HQL). Hive is executed on Hadoop clusters based on MapReduce operations. NoSQL is a term used for a wide variety of database management systems whose one thing in common is that they don t use SQL. Oracle NoSQL Database, for example, is a key value database, where data is stored under and accessed by a unique key, and variable schemas are easily supported. Hadoop Distributed File System (HDFS) - is a distributed, scalable, and portable file system written in Java for the Hadoop framework. In order to scale a Hadoop cluster to hundreds and thousands of nodes, HDFS is critical. 4 WHITE PAPER / Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

MongoDB is a NoSQL open source Database designed for scale, flexible data aggregation and to store files of any size. It has rich querying, high availability and full indexing support and is fast being adopted by many businesses. HBase is an Apache open source, distributed, versioned project that provides real - time read/write access to Big Data atop clusters of commodity hardware. Apache HBase provides these capabilities on top of Hadoop and HDFS. CHALLENGES POSED BY BIG DATA A shift from familiar Data Integration tools to new technologies to deal with Big Data problems will impact productivity of the organization and increase costs to invest in new skills and tools. Performance constraints of the Data Integration tools to deal with larger and more complex data within ever shortening time windows and real time analytics. Costly network traffic in moving data within and outside the business and storage areas, and lastly Redundancy of existing costly IT investment which become obsolete if they are not Big Data Enabled. Oracle s approach to Big Data solves each of these concerns. Oracle Data Integrator 12c (ODI12c) addresses productivity and skill requirements without leaving familiar programming and IT environments. Oracle Data Integrator for Big Data helps bring the data together while Oracle s Integrated Platform delivers optimal Big Data flow ensuring not just data size and complexity, but also data speed and delivers Fast Data. BEST TECHNOLOGY APPROACHES FOR INTEGRATING BIG DATA Integrated Design Tools: Improve Productivity The advantage of Big Data comes into play when you have the ability to correlate Big Data with your existing enterprise data. There s an implicit product requirement here in consolidating these various architecture principles into a single integration solution. The advantages of a single solution allow you to address not only the complexities of mapping, accessing, and loading Big Data but also combining it with your enterprise data and this correlation requires integrating across mixed application environments. The correlation is key to taking full advantage of Big Data and requires a single unified tool that can straddle both environments. In addition, Big Data sources consist of many different types and in many different forms. How can anyone be sure of the quality of that data? In the Big Data scenario, Data Quality is important because of the multitude of data sources. Multiple data sources make it difficult to trust the underlying data. Being able to quickly and easily identify and resolve any data discrepancies, missing values, etc in an automated fashion is beneficial to the applications and systems that use this information. Oracle Enterprise Data Quality integrated with Oracle Data Integrator 12c (ODI12c) enables data analysts to investigate, identify and resolve data quality issues by discovering and analyzing anomalies. 5 WHITE PAPER / Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

Oracle Data Integrator 12c (ODI12c): Loading and Transforming Big Data Oracle Data Integrator 12c (ODI12c) leverages unified tooling for both Big Data and enterprise data which translates into a faster learning curve as well as seamless usability so that the data scientist or data analyst can focus on integration versus usability. Flow based declarative designs in Oracle Data Integrator 12c (ODI12c) helps build complex expressions that are easy to maintain and support. Two representations of the same ELT mappings, the logical representation and the physical representation, provide customized working environments for business analysts and data scientists. Figure 2: Loading and Transforming Big Data using Oracle Data Integrator 12c Oracle Data Integrator Application Adapters for Hadoop As a component of Oracle Big Data Connectors, Oracle Data Integrator Application Adapter for Hadoop provides native Hadoop integration together with Oracle Data Integrator 12c (ODI12c). Knowledge Modules can then be used to build Hadoop metadata within Oracle Data Integrator 12c (ODI12c), load data into Hadoop, transform data within Hadoop, and load data easily and directly into Oracle Database utilizing Oracle Loader for Hadoop. Once the data is processed and organized on the Hadoop cluster, Oracle Data Integrator 12c (ODI12c) loads the data directly into Oracle Database utilizing the Oracle Loader for Hadoop. Oracle Data Integrator Application Adapter for Hadoop simplifies data loading and movement between Hadoop and an Oracle Database through Oracle Data Integrator 12c (ODI12c) s easy to use prebuilt interfaces. Oracle Data Integrator Application Adapter for Hadoop simplifies data integration from Hadoop and an Oracle Database through Oracle Data Integrator 12c (ODI12c) s easy to use interface. By providing efficient connectivity between Oracle Database and Hadoop, Oracle Data Integrator for Big Data enables analysis of all data, both structured and unstructured, in enterprise data warehouses. Additionally, enterprises that are already using a Hadoop solution integrate data from HDFS using Oracle Data Integrator for Big Data as a standalone software solution as well. 6 WHITE PAPER / Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

Integrated Platform: Simplify and Optimize Taking all the miscellaneous technologies around Big Data which are new to many organizations - and making them each work with one another is challenging. Making them work together in a production- grade environment is even more daunting. Integrated systems can help an organization radically simplify their Big Data architectures by integrating the necessary hardware and software components to provide fast and cost-efficient access, and mapping, to NoSQL and HDFS. Combined hardware and software systems can be optimized for redundancy with mirrored disks, optimized for high availability with hot-swappable power, and optimized for scale by adding new racks with more memory and processing power. Take it one step further and you can use these same systems to build out more elastic capacity to meet the flexibility requirements Big Data demands. Figure 3: Oracle s Big Data Story Oracle is uniquely qualified to combine everything needed to meet the Big Data challenge including software and hardware into one engineered system. The Oracle Big Data Appliance is an engineered system that combines optimized hardware with the most comprehensive software stack featuring both the Cloudera Distribution including Apache Hadoop and specialized solutions developed by Oracle to deliver a complete, easy-to-deploy solution for acquiring, organizing and loading Big Data into Oracle Database. It is designed to deliver extreme analytics on all data types, with enterpriseclass performance, availability, supportability and security. With Oracle Big Data Connectors and Oracle Data Integrator 12c (ODI12c), the solution is tightly integrated with Oracle Exadata and Oracle Database, so that data can be analyzed data with extreme performance. Real-time Business Analytics: Extends the Value of Big Data One of the key expectations for Big Data is to yield real-time analytics that improve business insights. But how timely is the data is a question that will still need to be answered for both Big Data or traditional enterprise data. While Big Data on its own has no means of applying traditional change data capture [since there are no log files in NoSQL or HDFS], it s still an important requirement to implement real-time solutions in conjunction with big data. Otherwise the speed advantage to indexing realms of Big Data will be undone by sluggish ETL processing that it s dependent on. Big Data can be processed at high volume with high velocity. In fact, this is the entire strategy behind the invention of MapReduce providing 7 WHITE PAPER / Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

search responses of the highest quality in close to the blink of a human eye. Combine this power with real-time solutions in replication, change data capture, synchronization, and the integration to business analytics tooling, and you have what amounts to the compelling advantages of real-time business analytics. The combination of Oracle Data Integrator and Oracle GoldenGate pack a powerful punch when it comes to achieving true real-time business analytics. Oracle GoldenGate is an integral part of the realtime business analytics use case to accomplish real- time data replication and capture ensuring applications have the data they need immediately. CONCLUSION Big data continues to be the center of attention - take away the hype and there's a key takeaway. For years, companies have been running their critical business infrastructure and building business insights based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of less structured data: web and call logs, social media, email, sensors, and photographs [and more] that can be mined for useful information. Companies that are seeking ways to capitalize on the hidden potential of Big Data need to consider data integration technologies to help bridge the gap and correlate that data across the enterprise. Bridging the two worlds of Big Data and enterprise data means considering solutions that are complete, based on traditional as well as emerging Hadoop technologies, and are poised for success through integrated design tools, integrated platforms, and real-time analytics. Leveraging these best practices ensures improved productivity, lowered TCO, IT optimization, and better business insights. Only Oracle provides the most complete, integrated, and real-time solution data integration that helps bridge the two worlds of Big Data and enterprise data to accelerate adoption and yield greater returns for these emerging technologies. For more information on Oracle Data Integrator go to our website: www.oracle.com/goto/dataintegration. 8 WHITE PAPER / Big Data and Enterprise Data, Bridging Two Worlds with Oracle Data Integration

ORACLE CORPORATION Worldwide Headquarters 500 Oracle Parkway, Redwood Shores, CA 94065 USA Worldwide Inquiries TELE + 1.650.506.7000 + 1.800.ORACLE1 FAX + 1.650.506.7200 oracle.com CONNECT WITH US Call +1.800.ORACLE1 or visit oracle.com. Outside North America, find your local office at oracle.com/contact. blogs.oracle.com/oracle facebook.com/oracle twitter.com/oracle Copyright 2019, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission. This device has not been authorized as required by the rules of the Federal Communications Commission. This device is not, and may not be, offered for sale or lease, or sold or leased, until authorization is obtained. (THIS FCC DISLAIMER MAY NOT BE REQUIRED. SEE DISCLAIMER SECTION ON PAGE 2 FOR INSTRUCTIONS.) Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0119 White Paper Title January 2017 Author: [OPTIONAL] Contributing Authors: [OPTIONAL]