Narration Script for the ODI Adapter for Hadoop eStudy


MODULE 1: Overview of Oracle Big Data

1. Title
Hello, and welcome to this Oracle self-study course entitled Oracle Data Integrator Application Adapter for Hadoop. My name is Richard Green. I am a curriculum developer at Oracle, and I have been helping educate customers on Oracle products for a number of years. I will be your tour guide for the next hour of interactive lectures and demos. The aim of this course is to introduce you to the Oracle Data Integrator Application Adapter for Hadoop, one of the Oracle Big Data Connectors. After completing this course, you should be able to appreciate the features of this adapter, including an understanding of the business and technical benefits of this product. Because it focuses on the ODI Application Adapter for Hadoop only, this course is not intended as an overall introduction to all of the features of Oracle Data Integrator. A case study involving Simon Howell, a data integration project manager at a fictitious company, brings out the business benefits of the ODI Application Adapter for Hadoop.

2. Using the Player
Before we begin, now might be a good time to take a look at some of the features of this Flash-based course player. Feel free to skip this slide and start the lecture if you've attended similar Oracle self-study courses in the past. To your left, you will find a hierarchical course outline. This course enables and even encourages you to go at your own pace, which means you are free to skip over topics you already feel confident about, jump right to a feature that really interests you, or go back and review topics that were already covered. Simply click a course section to expand its contents, and then select an individual slide. However, note that by default we will automatically walk you through the entire course without requiring you to use the outline. Also to your left is a panel containing any additional reference notes for the current slide. Feel free to read these reference notes at the conclusion of the course, or, if you prefer, you can pause and read them as we go along. Standard Flash player controls are also found at the bottom of the player, including pause, previous, and next buttons. There is also an interactive progress bar to fast-forward or rewind the current slide. Interactive slides may have additional controls and buttons, along with instructions on how to use them. Various handouts may be available from the Attachments button, including the audio narration scripts for this course. The course will now pause, so feel free to take some time and explore the interface. Then, when you're ready to continue, click the NEXT button below, or alternatively click the Module 1 slide in the course outline at left.

3. About This Course
Introduction: So, having been given a quick introduction to the goals of this course, you may still be asking yourself, "Am I in the right place?" To help you answer this question, you can access information here regarding the course's objectives, target audience, and prerequisites.
Overview: This course introduces you to the features of Oracle Data Integrator Application Adapter for Hadoop, ODIAAH for short. Using lecture materials and product demos, it provides an introduction to the architecture and capabilities of ODIAAH. This course does not serve as a general introduction to the broad set of features and functionality of Oracle Data Integrator, and it is most useful if you have prior hands-on experience with Oracle Data Integrator. For broad or detailed information about the entire range of Oracle data integration capabilities, refer to the documentation and Oracle University courses for data integration and data warehousing.
Course Objectives: After completing this course, you should be able to:
- Describe the Oracle approach to Big Data
- Explain the main features and capabilities of ODIAAH
- Explain how to use ODIAAH to load data from files into Hive
- Explain how to use ODIAAH to transform files in Hive
- Explain how to use ODIAAH to move data from Hive into Oracle
What are the Prerequisites? To get the most from this course, you should have thorough familiarity, preferably hands-on, with Oracle Data Integrator.

4. Course Road Map
This course consists of five modules:
1. The first module is an overview of the Oracle approach to big data.
2. The second module is an introduction to Oracle Data Integrator Application Adapter for Hadoop.
3. The third module shows how to use ODIAAH to load data from files into Hive.
4. The fourth module shows how to use ODIAAH to transform and validate files that are in Hive.
5. The fifth module shows how to use ODIAAH to load data from Hive into an Oracle database.

5. Module 1 Title Slide: Overview of Oracle Big Data
Let us now proceed with the first module, Overview of Oracle Big Data.

6. Module Topics
Before we dive into our examination of the Oracle Data Integrator Application Adapter for Hadoop, we need to step back and consider the so-called big data environment in which it operates. This module's topics include:
- What does the term big data mean?
- What are the Oracle Big Data products?
- An introduction to the Hadoop environment and MapReduce programming

7. Big Data Is About
Today the term big data draws a lot of attention, but there is a simple story behind the hype. For decades, companies have been making business decisions based on transactional data that is stored in relational databases. The usual information-gathering process involves extracting the data that resides in a database. That process restricts you to retrieving mostly structured and transactional data. The Oracle big data solution eliminates this restriction by tapping into diverse data sets, finding relationships in the data, and using the data for different business purposes. Beyond that critical data, however, is a potential treasure trove of nontraditional, less structured data, such as weblogs, social media, email, sensors, and photographs, which can be mined for useful information. Decreases in the cost of both storage and compute power have made it feasible to collect this data, which would have been thrown away only a few years ago. As a result, more and more companies want to include both nontraditional data and traditional enterprise data in their business intelligence analysis.

8. How Did Big Data Evolve?
As more and more people use the Internet and new technologies like smartphones, greater volumes of data are generated all over the world. Not only is this data voluminous, but it is also generated in various formats. Some examples of big data sources are: social networks, banking and financial services, e-commerce services, web-centric services, Internet search indexes, scientific searches, document searches, medical records, and weblogs. Greater volumes of data are being generated by:
- Traditional enterprise applications, such as transactional ERP data, web store transactions, and general ledger data
- Vast amounts of machine-generated/sensor data, including Call Detail Records, weblogs, smart meters, manufacturing sensors, equipment logs (often referred to as "digital exhaust"), and trading systems data
- An explosion of social data, including customer feedback streams, microblogging sites like Twitter, and social media platforms like Facebook
Because traditional databases cannot handle this data or process it instantly, a different approach to storing data became necessary.

9. Big Data: Infrastructure Requirements
Let's examine the three steps for converting data into decision-making information. First, capture or acquire raw data with the Hadoop Distributed File System (HDFS) and key-value stores (NoSQL databases). Second, use a programming paradigm called MapReduce to interpret and refine the data. Third, feed the refined and organized data into a relational database (SQL databases) to enable proper analysis, and then base business decisions on the data.

10. Oracle Integrated Software Solution
This diagram depicts, at a high level, Oracle's integrated software solution for conducting the Acquire, Organize, and Analyze & Decide phases of data conversion. Business decisions are derived from the analyzed data using your choice of tools. This Oracle solution handles data that can be classified as ranging from high information density to high data variety. This data might be organized in schemas, it might be schema-less, or it might be entirely unstructured. The software that is the focus of this self-study is Oracle Data Integrator Application Adapter for Hadoop.

11. Oracle Big Data Appliance: Hardware Components
An Oracle Big Data Appliance rack consists of these components:
- 18 Sun Fire X4270 M2 servers
- 1 Sun Rack II 1242 base
- 2 NM2-GW Sun Network QDR InfiniBand Gateway switches
- 1 NM2-36 Sun Datacenter InfiniBand Switch 36
- 1 Cisco Catalyst 4948 Ethernet switch
- 1 KVM switch
- 2 power distribution units

12. Oracle Big Data Appliance: Software Components
Oracle Big Data Appliance includes open-source components, which are packaged as system software with the appliance, and Oracle software, which is packaged as Big Data Connectors. Oracle Big Data Appliance is preinstalled and preconfigured for large-scale big data management. It uses Cloudera's Distribution including Apache Hadoop (CDH) and Oracle NoSQL Database as data management capabilities, and runs on Oracle Linux and the Oracle HotSpot JVM. It includes Cloudera Manager for cluster-wide administration and monitoring of CDH. For deep analysis of big data, it also includes an open-source distribution of the statistical environment R. In contrast to building a big data system from the ground up, the Big Data Appliance eliminates the time-consuming efforts of choosing and configuring hardware, determining the proper open-source components and versions, and integrating and tuning the overall configuration. The entire solution is preinstalled and preconfigured out of the box for high performance and high availability. The foundation software residing on Oracle Big Data Appliance includes:
- Oracle Linux
- Oracle Java VM
- The open-source Apache Hadoop distribution from Cloudera
- The open-source R distribution
The application software includes:
- Oracle NoSQL Database Enterprise Edition
- Oracle Loader for Hadoop
- Oracle Data Integrator Application Adapter for Hadoop
The focus of this self-study is Oracle Data Integrator Application Adapter for Hadoop (ODIAAH). Note that the position of blocks in the diagram is not meant to suggest that ODIAAH operates on top of Oracle Loader for Hadoop.

13. Oracle Big Data Connectors Software
Where Oracle Big Data Appliance makes it easy for organizations to acquire and organize new types of data, Oracle Big Data Connectors enables an integrated data set for analyzing all data. You can install Oracle Big Data Connectors on Oracle Big Data Appliance or on a generic Hadoop cluster. There are four Oracle Big Data Connectors:
- Oracle Loader for Hadoop uses MapReduce processing to load data efficiently into Oracle Database 11g.
- Oracle R Connector for Hadoop gives R users native, high-performance access to HDFS and the MapReduce programming framework.
- Oracle Direct Connector for Hadoop Distributed File System enables the Oracle Database SQL engine to access data seamlessly from HDFS; it allows Oracle Database to access big data on a Hadoop cluster without loading the data.
- Oracle Data Integrator Application Adapter for Hadoop enables Oracle Data Integrator to generate Hadoop MapReduce programs that extract, transform, and load big data into partitioned tables in Oracle Database, through an easy-to-use graphical interface. ODIAAH is the focus of this training.

14. Hadoop Architecture
With Hadoop MapReduce, you can easily develop applications that process massive volumes of data in parallel on large clusters of an engineered system, with fault tolerance and reliability. The framework sorts the output of the maps, which is then input to the reduce tasks. Job input and output are stored in a file system. The framework also schedules tasks, monitors them, and re-executes the failed tasks. In a single Oracle Big Data Appliance rack, Hadoop distributes the files and workload across the 18 servers in a cluster. The characteristics of the Hadoop architecture are:
- A distributed file system with redundant storage
- The Map/Reduce programming paradigm
- Highly scalable data processing
- A cost-effective model for high-volume, low-density data

15. What Is MapReduce?
MapReduce is a framework that enables distributed computation on large clusters. MapReduce is a set of code and infrastructure for parsing and building large data sets. A map function generates a key/value pair from the input data, and this data is then reduced by a function that combines all values associated with equivalent keys. The cluster has a single JobTracker, and each node in the cluster has a TaskTracker. The JobTracker schedules jobs. The jobs are broken down into component tasks that are submitted to, and executed by, the TaskTrackers. The JobTracker monitors and re-executes any failed tasks.

16. Example of a Word Count MapReduce Program
This diagram shows an example of a MapReduce program for performing a word count. The Mapper maps input key/value pairs to a set of intermediate key/value pairs. Maps are the individual tasks that transform input data sets into intermediate data sets. The transformed intermediate data sets do not have to be of the same type as the input data sets. A given input pair may map to zero or many output pairs.

17. MapReduce Sessionization Example
Sessionization is a distinctive MapReduce use case in web data analysis. The diagram displays a scenario for dividing the available click streams into sessions within a specific range of time, and then finding different patterns in each session. You can use the session log for marketing strategies, gauging familiarity with a particular product, spotting irrelevant transactions, and so on. The MapReduce process for sessionization involves steps that map, shuffle and sort, and reduce. The input file is the session information log, which contains the userid, pageid, and time stamp details for each login. Based on this log, you can identify the number of sessions that are active at a particular time, including the duration of each session. The input files are mapped using the Mapper. The mapped pairs are sorted and shuffled, and redundancy is eliminated. These sorted pairs are further reduced to extract the required output. Note: You can repeat this MapReduce process based on the size of the input data.

18. What Is Hive?
Hive is a data warehouse infrastructure for Hadoop. It facilitates easy data summarization, ad hoc queries, and analysis of large data sets that are stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and to query the data using a SQL-like language called HiveQL. Hive is not designed for online transaction processing (OLTP) workloads and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of append-only data, such as web logs. Key benefits of Hive are scalability (scale out with more machines added dynamically to the Hadoop cluster), extensibility (with the MapReduce framework and custom scalar functions [UDFs], aggregations [UDAFs], and table functions [UDTFs]), fault tolerance, and loose coupling with its input formats.
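To make the HiveQL idea concrete, here is a minimal sketch (not from the slides; the table name, column name, and HDFS path are invented for illustration). It projects a relational structure onto text files already in HDFS and then expresses the word count from slide 16 as a single query, which Hive compiles into map and reduce tasks behind the scenes:

    -- Project a table structure onto raw text files already stored in HDFS
    CREATE EXTERNAL TABLE docs (line STRING)
    LOCATION '/user/demo/docs';

    -- The word count from slide 16, expressed in HiveQL;
    -- Hive turns this query into MapReduce jobs automatically
    SELECT word, COUNT(1) AS cnt
    FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w
    GROUP BY word;

Running a query like this launches the same kind of word-count MapReduce job described earlier, without any hand-written Java programming.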

19. Review
In this module, you should have learned:
- What the term big data means
- What the Oracle Big Data products are
- The basics of the Hadoop environment and MapReduce programming

20. Quiz

MODULE 2: Oracle Data Integrator Application Adapter for Hadoop

21. Title Slide
Welcome to Module 2, Oracle Data Integrator Application Adapter for Hadoop. In this module, you are first introduced to the Oracle Data Integrator product. You then learn how the ODI Application Adapter for Hadoop can serve as an efficient alternative to manual MapReduce programming.

22. Module Topics
In this module, you learn:
- What is Oracle Data Integrator?
- What are the main components of ODI?
- The benefits of the ODI declarative design
- The flexibility and efficiency of ODI knowledge modules
- What is the ODI Application Adapter for Hadoop?
- What are the knowledge modules specifically designed for Hadoop?

23. Should We Switch from Manual MapReduce Programming?
Let's meet Simon Howell, who manages a data integration project for a fictitious company. Simon wants to know whether Oracle's ODI Application Adapter for Hadoop can replace his team's manual MapReduce programming. Simon needs to learn enough about ODIAAH to be able to assess whether it can provide a more efficient alternative to manual MapReduce programming.

24. What Is Oracle Data Integrator?
Oracle Data Integrator (ODI) is a tool for designing, deploying, and executing jobs for data movement and data transformation among data systems. ODI can read from and write to a number of different sources and targets. ODI provides predefined connectors and application adapters to connect to specific source applications, databases, and legacy systems. Oracle Data Integrator is a widely used data integration software toolset that provides:
- A declarative design approach to defining data transformation and integration processes
- Faster and simpler development and maintenance
- A unique ELT architecture (extract, then load, then transform)
- The most cost-effective solution
- A unified infrastructure to streamline data and application integration projects

25. ODI Components
Let's look at the components of ODI. The ODI repository is a database containing all of the ODI metadata. It is made up of two parts: the Master Repository and one or more Work Repositories. The Master Repository is shared among different projects and users. It contains security information related to users and their access rights, and topology information about the different technologies with which ODI can interface. By technology, we mean platforms that ODI can read from or write to, or platforms that ODI can use as an engine, such as Oracle, DB2, or flat files. The technology that is specified determines what kind of code ODI will generate. For example, the generated code can include outer joins if the specified technology supports outer joins. You can create a number of Work Repositories that sit on top of the Master Repository. This is where you define interfaces that map the movement and transformation of data from source to target. You also model your schema definitions in Work Repositories, and you organize your work as projects in Work Repositories. Your projects contain definitions of such things as business rules, packages, procedures, and knowledge module templates for connecting to specific source technologies. Work Repositories also store information about the success or failure of your job executions. The Agent is the program that schedules and runs all your interfaces of source-to-target mappings. The agent orchestrates when and where the code is executed. The interface code is compiled and runs as what is called a scenario. The agent takes the scenario from the repository and sends it to the machine on which it will run. The agent starts the job and gets log information, allowing you to see which task is taking place. The agent retrieves return codes, messages, and statistics (such as the number of rows processed and execution time), which it writes back to the Work Repository. ODI Studio is the graphical user interface in which you define everything. You can define your source-to-target interfaces, package them, manage executions, and so on. You import the application adapters, such as the Hadoop adapter, using ODI Studio. The application adapters are stored in the ODI Repository; they are not shipped as part of ODI Studio.

26. ODI Declarative Design
A powerful feature of ODI is its declarative design paradigm. You define what you want to do. You pass it through one or more of ODI's predefined templates, which describe how the job is done. These templates come either from the standard library of knowledge module templates that ships with ODI, or from one of the application adapters, like the adapter for Hadoop, which you can import into ODI. ODI's code generator automatically defines code specifically for that task, written for the particular source and target technologies, such as Hadoop, Oracle, or DB2. The screenshot shows the editor for an ODI interface, presenting a logical representation of source objects in the upper-left panel, target objects in the upper-right panel, and the properties of the target object in the panel at the bottom.

27. ODI Knowledge Modules
ODI ships with a comprehensive library of knowledge modules, which are like templates for generating technology-specific code. Knowledge modules (KMs) are at the core of the ODI architecture. They make all ODI processes modular, flexible, and extensible. They implement the actual data flows and define the templates for generating code across the multiple systems that are involved in each process. The red chevrons at the top of the slide show the six types of knowledge modules. For example, in ODI, you can define a flat file definition and define the target table in a database. You specify what you want to accomplish by mapping the flat file fields to the table columns. But at that point, you are not actually saying how you want to load the data into the table. You can now choose the Oracle SQL*Loader knowledge module, and the resulting code will push the data into the table using SQL*Loader. Or you could use the External Tables knowledge module, and the resulting code will push the data into the table using an external table. You choose either approach for how to perform the work, without changing the initial mapping of what you want done.

28. ODI Components
This is the ODI components diagram that we examined earlier. Now we add the Oracle Big Data Appliance symbol to indicate which ODI components are included in the Big Data Appliance, and which lie outside it. The ODI Repository is inside a MySQL instance. There is one Master Repository and one Work Repository. The Hive JDBC connection is predefined. ODI Studio is not included in the Big Data Appliance; if you use ODI Studio, you will need to define a connection to the ODI Repository. The application adapters are licensed with the Big Data Appliance. In ODIAAH, the agent runs the interfaces that you created and passes the work to the JobTracker. ODIAAH accesses Hive through Java Database Connectivity (JDBC). If you do a query through JDBC, such as an Insert, ODIAAH generates MapReduce code, which the agent sends to the JobTracker. The JobTracker, in turn, sends the MapReduce code to a TaskTracker.

29. ODI and Hadoop
Oracle Data Integrator Application Adapter for Hadoop is a Big Data Connector. ODI with ODIAAH is used to orchestrate the following data integration functions around the Hadoop environment:
- Moving data into Hadoop
- Transforming the data while it is inside Hadoop
- Extracting data from Hadoop
The diagram shows how ODIAAH interacts with the Hadoop environment. The HiveQL language provides a relational projection of files that are stored in HDFS, and the Hive-specific KMs in ODIAAH enable ODI to generate Hive code. Hive takes the query, generates a set of MapReduce programs, and executes them on the stored files, producing a new set of files stored in HDFS that have a relational presentation.

30. What Is ODI Application Adapter for Hadoop?
ODIAAH is a Big Data Connector that allows data integration developers to easily integrate and transform data within Hadoop by using Oracle Data Integrator. It has preconfigured, Hive-specific ODI KMs. The main functions of ODIAAH include:
- Loading data into Hadoop from local file systems and HDFS
- Performing validation and transformation of data within Hadoop
- Loading processed data from Hadoop into Oracle for further processing and report generation
The screenshot displays the KMs for Hadoop in the Projects section under the ODI Designer view.

31. ODIAAH Knowledge Modules for Hadoop
To facilitate the MapReduce implementation, ODI Application Adapter for Hadoop provides the following Hive-specific knowledge modules:
- RKM Hive can be used to reverse-engineer table and view definitions from Hive into ODI, so that they can be used in an ODI interface.
- IKM File to Hive loads data from local or HDFS files into Hive tables.
- The next two knowledge modules perform transformations within Hadoop. IKM Hive Control Append applies SQL-like transformations on the data, utilizing the Hive function library. IKM Hive Transform integrates data into a Hive target table after the data is transformed by a custom, user-defined script.
- CKM Hive can be used to define constraints to validate data being loaded into Hive. You can apply common constraints such as not null, foreign key, unique key, and primary key (see the sketch after this list).
- IKM File/Hive to Oracle (OLH) uses Oracle Loader for Hadoop to load data from Hadoop into an Oracle database. This KM will use the Delimited File InputFormat, the Hive InputFormat, or a user-defined InputFormat (Java class) to stream the data. Different output modes can be chosen (JDBC, Direct Path, Data Pump).
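As an illustration of the kind of check CKM Hive performs, here is a hedged HiveQL sketch of constraint validation, with invented table and column names (src_customer, e_customer, cust_id): rows that fail a not-null rule are routed to an error table instead of being silently loaded.

    -- Illustrative only: route rows violating a NOT NULL rule to an error table
    INSERT OVERWRITE TABLE e_customer   -- hypothetical error table
    SELECT *
    FROM src_customer                   -- hypothetical staging table
    WHERE cust_id IS NULL;

The actual checks are generated by the KM from the constraints you declare in ODI, so you describe the rule once and let the KM produce the validation code.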

32. Review
In this module, you should have learned:
- What Oracle Data Integrator is
- What the main components of ODI are
- The benefits of the ODI declarative design
- The flexibility and efficiency of ODI knowledge modules
- What the ODI Application Adapter for Hadoop is
- What knowledge modules are specifically designed for Hadoop

33. Quiz

MODULE 3: Loading Unstructured Data from File into Hive

34. Title Slide
Welcome to Module 3, Loading Unstructured Data from File into Hive. In this module, you learn how to use a Hive-specific knowledge module of ODI Application Adapter for Hadoop to move data from a file into a Hive server.

35. Module Topics
In this module, you learn:
- The capabilities of the File to Hive knowledge module
- The steps for defining a data source in Oracle Data Integrator
- How to define and test the ODI connection to a Hive server
- How to reverse-engineer data structures using the RKM Hive knowledge module
- And, through a demo, how to load data using the File to Hive knowledge module

36. Does ODIAAH Automate Loading Files into Hive?
Simon first wants to learn how ODIAAH automates the loading of unstructured data into Hadoop HDFS. Simon will be examining the ODIAAH File to Hive knowledge module for automating the task of loading data into Hadoop.

37. Demos in This Self-Study
The knowledge modules for Oracle Data Integrator Application Adapter for Hadoop were developed to simplify the processing of unstructured and structured data on Hadoop. The knowledge modules were developed to facilitate the following processes:
- Loading data into Hive (which is the subject of our first demo)
- Transforming and validating data in Hive
- Loading processed data from Hive into Oracle

38. Steps for Defining Data Sources
The Oracle Big Data Appliance provides a preconfigured ODI environment. However, beyond the predefined and preconfigured objects, you must still perform the steps shown on this slide to point to the location of your data sources.
1. Create the data server metadata, which describes the source or target data store. Data stores are used in the interface to specify the source and target.
2. Create the model metadata that contains the data stores and the association with the logical schema.
3. Define execution contexts to associate the logical and physical architecture. During execution, depending on the selected context, the logical schema is mapped to the appropriate physical schema, so that you can switch the context from development to test to deployment.
After defining your data sources, you perform the following steps to define and execute your data integration jobs:
4. Design the interface. The interface specifies the source, target, mappings, rules, and KMs. Executing the interface with a specified context migrates the data from the source physical location to the target physical location.
5. Design the package. The package enables you to create a process to execute interfaces, procedures, and other logic, as required.

39. Testing the ODI Connection to a Hive Server
ODI uses JDBC to connect to Hive. The panel on the right side of this screenshot shows the location that was specified for a JDBC driver and a JDBC URL. When you test the connection, you are prompted to select a physical agent. In this example, the local agent is selected. However, in production environments, you will probably choose a standalone agent that you defined and deployed to a remote data source. If you have the Oracle Big Data Appliance, these connection definitions pointing to the included Hive server will be predefined for you; you will only need to test the connections.

40. Import the ODIAAH Knowledge Modules
Licensed users of Oracle Data Integrator Application Adapter for Hadoop need to import the six Hive-specific knowledge modules into their ODI project.

41. Reverse-Engineer Hive Structures into ODI
You can use the RKM Hive knowledge module to reverse-engineer table and view definitions from Hive into ODI, so that they can be used in an ODI interface.

42. Loading Data into Hadoop
The ODI agent submits tasks to the Hive server via JDBC. The Hive server generates MapReduce jobs for local files outside of the Hadoop environment and/or HDFS files within the Hadoop environment. The MapReduce jobs create SQL-like relational representations of the data as Hive tables, even though the data remains stored in HDFS files.

43. Loading a Weblog File into Hive
One typical application of big data analysis involves loading massive amounts of weblog data along with a relational representation that can be queried. In this slide, the image at the upper-left corner shows an ODI package managing a series of five interfaces. The first interface loads a weblog file into Hive. The code box on the right shows the Hive code that ODI generates by way of the knowledge modules that are part of the ODI Application Adapter for Hadoop. It is a simple Hive query that creates a table, parses the source data fields, and loads the data into the Hadoop Distributed File System (HDFS). The result is shown at the bottom of the slide. Not only is the data loaded into the HDFS file system, but a Hive table representation of the data is also projected on top of the HDFS file.
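The generated code itself is not reproduced in this script, so here is a hedged HiveQL sketch of the kind of statements IKM File to Hive produces for a weblog load; the table name, columns, and file path below are invented for illustration.

    -- Illustrative sketch of IKM File to Hive output (names and path are hypothetical)
    CREATE TABLE IF NOT EXISTS weblog_raw (
      ip      STRING,
      ts      STRING,
      page    STRING,
      status  INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    -- Move the source file from HDFS into the table's storage location
    LOAD DATA INPATH '/user/demo/weblog.tsv' INTO TABLE weblog_raw;

After the load, the file's contents are queryable through the weblog_raw projection while remaining ordinary HDFS storage underneath.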

44. File to Hive (LOAD_DATA) Mapping
We now examine an ODI interface for loading data from a file to a Hive server, using the File to Hive knowledge module. This screenshot shows the Mapping view of the interface: a logical-level view with the source file represented on the left, showing all of its fields. On the right is the target datastore inside Hive, with a relational representation.

45. File to Hive (LOAD_DATA) Flow
This screenshot shows the Flow view of the same interface, showing the more physical aspects of how the mapping will be performed from source to target. Note that the IKM File to Hive knowledge module is chosen in the IKM Selector field. This choice of an ODIAAH knowledge module influences ODI's generation of Hive code. The right-hand side describes what this knowledge module performs, along with its requirements and restrictions. The left-hand side lists all of the options available within the knowledge module.

46. Demonstration of How to Load Data into Hive
Please click the link to run the first demo.

47. Review
In this module, you should have learned:
- The capabilities of the File to Hive knowledge module
- The steps for defining a data source in Oracle Data Integrator
- How to define and test the ODI connection to a Hive server
- How to reverse-engineer data structures using the RKM Hive knowledge module
- And, through a demo, how to load data using the File to Hive knowledge module

48. Quiz

MODULE 4: Transforming and Validating Data on Hive

49. Title Slide
Welcome to Module 4, Transforming and Validating Data on Hive. In this module, you learn how ODIAAH can automate the transformation and validation of data that is in HDFS.

50. Module Topics
In this module, you learn:
- How to use an ODIAAH knowledge module that utilizes the Hive function library to transform and validate data in Hive
- How to use an ODIAAH knowledge module that supports user-defined scripts to transform data in Hive
- And, through a demo, you see these Hive-specific knowledge modules being used in ODI to perform data transformation and validation using a predefined knowledge module

51. Can ODIAAH Transform and Validate the Data Loaded into Hive?
The previous module showed how ODIAAH automates the loading of data files into Hadoop. In this module, Simon learns how ODIAAH can automate the transformation and validation of data that is in HDFS.

52. Demos in This Self-Study
Previously, we learned how to use ODIAAH to load data from files into Hive. The next ODIAAH process that we examine is transforming and validating data in Hive.

53. Processing Data Inside Hadoop
There are two ODIAAH knowledge modules by which the ODI agent can interact with the Hive server to submit MapReduce jobs that transform the data once it is inside the Hadoop environment. The Hive Control Append knowledge module enables SQL-like transformations of the data, utilizing the standard Hive function library. The Hive Transform knowledge module enables you to pass in your own user-defined scripts to transform the data.

54. Hive Control Append KM Mapping
We now examine an ODI interface for transforming data within a Hive server, which is a Hive-to-Hive mapping, using the Hive Control Append knowledge module. This screenshot shows the Mapping view of the interface, with two datastores on the left and the target datastore on the right. In the target datastore, notice that there are some Hive functions available to use, such as case/when, concatenate, and cast. This example shows use of the case function: when the Customer dear field equals 0, the value is "Ms."; when the Customer dear field equals 1, the value is "Mr."

55. Hive Control Append KM Flow
This screenshot shows the Flow view of the same interface, showing the more physical aspects of the mapping from the two sources to the target, all within the same Hive environment. Note that the Hive Control Append knowledge module is chosen in the IKM Selector field.
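Expressed directly in HiveQL, the case/when rule from slide 54 would come out roughly as follows. This is a hedged sketch: the dear values (0 and 1) come from the narration, while the table and column names (src_customer, tgt_customer, cust_id, first_name, last_name) are invented for illustration.

    -- Illustrative sketch of the HiveQL that IKM Hive Control Append might generate
    INSERT OVERWRITE TABLE tgt_customer
    SELECT
      cust_id,
      CASE WHEN dear = 0 THEN 'Ms.'
           WHEN dear = 1 THEN 'Mr.'
      END AS title,                                 -- the case/when rule from the mapping
      CONCAT(first_name, ' ', last_name) AS full_name
    FROM src_customer;

In ODI you only type the CASE expression into the target column mapping; the surrounding INSERT statement is produced by the knowledge module.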

56. Can ODIAAH Also Use Customized Transformation Scripts?
Simon and his team have just learned how ODIAAH works with Hive to transform data that is stored in HDFS, using the standard SQL transformations and the library of Hive expressions. Next, the team learns how it can plug in a customized, user-defined transformation script that accepts a stream of input, transforms it, and outputs a stream.

57. Hive Transform KM Mapping
We now examine a different ODI interface for transforming data within a Hive server, using the Hive Transform knowledge module. This screenshot shows the Mapping view of the interface, with two datastores on the left and the target datastore on the right.

58. Hive Transform KM Flow
This screenshot shows the Flow view of the same interface, showing the more physical aspects of the mapping from a preprocessed weblog to a sessionized weblog. Note that the Hive Transform knowledge module is chosen in the IKM Selector field. Also note the TRANSFORM_SCRIPT_NAME and TRANSFORM_SCRIPT options for specifying your custom user-defined script for the transformation.
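This style of KM builds on Hive's TRANSFORM clause, which streams rows through an external script. The following is a minimal hedged sketch, assuming a hypothetical sessionization script named sessionize.py and invented table and column names:

    -- Illustrative sketch: stream rows through a user-defined script
    ADD FILE /tmp/sessionize.py;        -- hypothetical script, shipped to the cluster nodes

    INSERT OVERWRITE TABLE weblog_sessionized
    SELECT TRANSFORM (ip, ts, page)
           USING 'python sessionize.py'
           AS (session_id, ip, ts, page)
    FROM weblog_preprocessed;

The script reads tab-separated input rows on standard input and writes transformed rows to standard output, so it can be written in any language available on the cluster.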

59. Demonstration of How to Transform and Validate Data in Hive Using ODIAAH
Please click the link to run the second demonstration.

60. Review
In this module, you should have learned:
- How to use an ODIAAH knowledge module that utilizes the Hive function library to transform and validate data in Hive
- How to use an ODIAAH knowledge module that supports user-defined scripts to transform data in Hive
- And, through a demo, you saw these Hive-specific knowledge modules being used in ODI to perform data transformation and validation using a predefined knowledge module

61. Module 4 Quiz

MODULE 5: Loading Processed Data in Hive into Oracle

62. Title Slide
Welcome to Module 5, Loading Processed Data in Hive into Oracle. In this module, you learn how to use ODIAAH to move the processed data from HDFS into an Oracle database.

63. Module Topics
In this module, you learn:
- How to use an ODIAAH knowledge module for loading processed Hive data into an Oracle database
- And, through a demo, you see how this Hive-specific knowledge module moves Hive data into an Oracle database

64. How Do We Move the Processed Hive Data into an Oracle Database?
Previous modules showed how ODIAAH automated the loading and transformation of data into Hadoop. In this module, Simon and his team learn how to use ODIAAH to move the processed data from HDFS into an Oracle database.

65. Demos in This Self-Study
We have examined ODIAAH processes for loading data into Hive and transforming data that is in Hive. The final ODIAAH process that we examine is loading processed data in Hive into Oracle.

66. ODI Loading Data into Oracle Using OLH
To load data from Hadoop into Oracle, the ODI agent submits the jobs to the Oracle Loader for Hadoop JobClient, which then creates MapReduce jobs. The File/Hive to Oracle knowledge module uses Oracle Loader for Hadoop to load the data from Hadoop into an Oracle database.

67. File/Hive to Oracle (OLH) KM Mapping
We now examine a different ODI interface for moving the data from either an HDFS file or a Hive source into an Oracle environment, using the File/Hive to Oracle knowledge module. This knowledge module utilizes Oracle Loader for Hadoop to load the data into Oracle.

68. File/Hive to Oracle (OLH) KM Flow
This screenshot shows the Flow view of the same interface. It shows that the File/Hive to Oracle knowledge module has been selected. The knowledge module's options are listed on the left-hand side. The first option listed indicates that Data Pump will be used to copy the data over to Oracle. The description of the knowledge module is on the right-hand side.

69. Packaging the Jobs Together
You can tie your job executions together, such as a series of interfaces, by placing them in an ODI package.
1. In this package, the first interface takes a weblog and loads it into the Hadoop HDFS file system, using the File to Hive knowledge module.
2. The dates in the weblogs were not in a format that could be sorted, so the second interface transforms them to an ISO date format.
3. The third interface takes the preprocessed data and sessionizes it. That means we take these logs of web events and reconstruct who visited the site, what pages the visitors clicked and in what order, and how long they spent there. We take multiple mouse clicks and group them by specific IP address and specific timeframe.
4. Next, there is a lot of data in these weblogs, and we don't want to send all of it to Oracle, so the fourth interface filters out anything not necessary for the intended analysis (see the sketch after this list). For this step, and the previous two steps, the interfaces make use of knowledge modules that produce transformation code that runs in Hadoop.
5. Finally, the fifth interface uses the File/Hive to Oracle (OLH) knowledge module to move the data into Oracle tables.
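As a rough idea of the filtering step (step 4 above), here is a hedged HiveQL sketch with invented table and column names; in practice, the interface is built graphically in ODI and the code is generated by the KM.

    -- Illustrative sketch of the filtering interface (step 4); names are hypothetical
    INSERT OVERWRITE TABLE weblog_for_oracle
    SELECT session_id, ip, ts, page
    FROM weblog_sessionized
    WHERE page NOT LIKE '%.gif'        -- drop image requests
      AND page NOT LIKE '%.css';       -- drop stylesheet requests

Because this filtering runs as MapReduce inside Hadoop, only the reduced result set has to cross the wire into Oracle.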

70. Monitoring the Job Executions
The ODI Operator Navigator allows you to monitor the progress of your executions. In this example, the details of a session task are examined.

71. Demonstration
Please click the control to start the third and final demonstration.

72. Review
In this module, you should have learned:
- How to use an ODIAAH knowledge module for loading processed Hive data into an Oracle database
- And, through a demo, you saw how this Hive-specific knowledge module moved Hive data into an Oracle database

73. Quiz

74. What Have Simon Howell and His Team Decided to Do?
After careful investigation, Simon and his team have concluded that the ODI Application Adapter for Hadoop is indeed a more efficient alternative to manual MapReduce programming. They have decided to use ODIAAH in their first big data project!

75. Course Summary
In this self-study, you should have learned how to:
- Explain the concepts of Oracle's approach to Big Data
- Describe the main features and functions of Oracle Data Integrator Application Adapter for Hadoop
- Use ODIAAH to load unstructured data from a file (from either a local file system or an HDFS source) into Hive
- Use ODIAAH to transform and validate data that is in a Hive server
- Use ODIAAH to load processed data in Hive into an Oracle database

76. For Further Information About Oracle Application Adapter for Hadoop
For more information about Oracle Application Adapter for Hadoop and related technologies, you can use the links provided on this slide. Thank you for taking this self-study course on ODI Application Adapter for Hadoop!

End of ODIAAH self-study


More information

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera, How Apache Hadoop Complements Existing BI Systems Dr. Amr Awadallah Founder, CTO Cloudera, Inc. Twitter: @awadallah, @cloudera 2 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS

More information

Databases 2 (VU) ( / )

Databases 2 (VU) ( / ) Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Exadata Database Machine Administration Workshop

Exadata Database Machine Administration Workshop Exadata Database Machine Administration Workshop Duration : 32 Hours This course introduces you to the Oracle Exadata Database Machine. You'll learn about the various Exadata Database Machine features

More information

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018 Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/

More information

Oracle Data Integration and OWB: New for 11gR2

Oracle Data Integration and OWB: New for 11gR2 Oracle Data Integration and OWB: New for 11gR2 C. Antonio Romero, Oracle Corporation, Redwood Shores, US Keywords: data integration, etl, real-time, data warehousing, Oracle Warehouse Builder, Oracle Data

More information

Apache Hive for Oracle DBAs. Luís Marques

Apache Hive for Oracle DBAs. Luís Marques Apache Hive for Oracle DBAs Luís Marques About me Oracle ACE Alumnus Long time open source supporter Founder of Redglue (www.redglue.eu) works for @redgluept as Lead Data Architect @drune After this talk,

More information

Big Data with Hadoop Ecosystem

Big Data with Hadoop Ecosystem Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process

More information

Oracle R Technologies

Oracle R Technologies Oracle R Technologies R for the Enterprise Mark Hornick, Director, Oracle Advanced Analytics @MarkHornick mark.hornick@oracle.com Safe Harbor Statement The following is intended to outline our general

More information

Verarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016

Verarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016 Verarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016 Hans Viehmann Product Manager EMEA ORACLE Corporation 12. Mai 2016 Safe Harbor Statement The following

More information

Data Lake Based Systems that Work

Data Lake Based Systems that Work Data Lake Based Systems that Work There are many article and blogs about what works and what does not work when trying to build out a data lake and reporting system. At DesignMind, we have developed a

More information

Securing the Oracle BDA - 1

Securing the Oracle BDA - 1 Hello and welcome to this online, self-paced course titled Administering and Managing the Oracle Big Data Appliance (BDA). This course contains several lessons. This lesson is titled Securing the Oracle

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

Speech 2 Part 2 Transcript: The role of DB2 in Web 2.0 and in the IOD World

Speech 2 Part 2 Transcript: The role of DB2 in Web 2.0 and in the IOD World Speech 2 Part 2 Transcript: The role of DB2 in Web 2.0 and in the IOD World Slide 1: Cover Welcome to the speech, The role of DB2 in Web 2.0 and in the Information on Demand World. This is the second speech

More information

Shark: Hive (SQL) on Spark

Shark: Hive (SQL) on Spark Shark: Hive (SQL) on Spark Reynold Xin UC Berkeley AMP Camp Aug 21, 2012 UC BERKELEY SELECT page_name, SUM(page_views) views FROM wikistats GROUP BY page_name ORDER BY views DESC LIMIT 10; Stage 0: Map-Shuffle-Reduce

More information

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance

More information

VMware vsphere Big Data Extensions Administrator's and User's Guide

VMware vsphere Big Data Extensions Administrator's and User's Guide VMware vsphere Big Data Extensions Administrator's and User's Guide vsphere Big Data Extensions 1.1 This document supports the version of each product listed and supports all subsequent versions until

More information

Talend Open Studio for Data Quality. User Guide 5.5.2

Talend Open Studio for Data Quality. User Guide 5.5.2 Talend Open Studio for Data Quality User Guide 5.5.2 Talend Open Studio for Data Quality Adapted for v5.5. Supersedes previous releases. Publication date: January 29, 2015 Copyleft This documentation is

More information

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals

More information

Oracle Big Data Appliance

Oracle Big Data Appliance Oracle Big Data Appliance Software User's Guide Release 4 (4.6) E77518-02 November 2016 Describes the Oracle Big Data Appliance software available to administrators and software developers. Oracle Big

More information

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Blended Learning Outline: Cloudera Data Analyst Training (171219a) Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills

More information

Course Description. Audience. Prerequisites. At Course Completion. : Course 40074A : Microsoft SQL Server 2014 for Oracle DBAs

Course Description. Audience. Prerequisites. At Course Completion. : Course 40074A : Microsoft SQL Server 2014 for Oracle DBAs Module Title Duration : Course 40074A : Microsoft SQL Server 2014 for Oracle DBAs : 4 days Course Description This four-day instructor-led course provides students with the knowledge and skills to capitalize

More information

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype? Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/

More information

Microsoft Big Data and Hadoop

Microsoft Big Data and Hadoop Microsoft Big Data and Hadoop Lara Rubbelke @sqlgal Cindy Gross @sqlcindy 2 The world of data is changing The 4Vs of Big Data http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data 3 Common

More information

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours) Bigdata Fundamentals Day1: (2hours) 1. Understanding BigData. a. What is Big Data? b. Big-Data characteristics. c. Challenges with the traditional Data Base Systems and Distributed Systems. 2. Distributions:

More information

Data Storage Infrastructure at Facebook

Data Storage Infrastructure at Facebook Data Storage Infrastructure at Facebook Spring 2018 Cleveland State University CIS 601 Presentation Yi Dong Instructor: Dr. Chung Outline Strategy of data storage, processing, and log collection Data flow

More information

Hadoop. copyright 2011 Trainologic LTD

Hadoop. copyright 2011 Trainologic LTD Hadoop Hadoop is a framework for processing large amounts of data in a distributed manner. It can scale up to thousands of machines. It provides high-availability. Provides map-reduce functionality. Hides

More information

Hyperion Interactive Reporting Reports & Dashboards Essentials

Hyperion Interactive Reporting Reports & Dashboards Essentials Oracle University Contact Us: +27 (0)11 319-4111 Hyperion Interactive Reporting 11.1.1 Reports & Dashboards Essentials Duration: 5 Days What you will learn The first part of this course focuses on two

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 24: MapReduce CSE 344 - Winter 215 1 HW8 MapReduce (Hadoop) w/ declarative language (Pig) Due next Thursday evening Will send out reimbursement codes later

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

DATA SCIENCE USING SPARK: AN INTRODUCTION

DATA SCIENCE USING SPARK: AN INTRODUCTION DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data

More information

Going beyond MapReduce

Going beyond MapReduce Going beyond MapReduce MapReduce provides a simple abstraction to write distributed programs running on large-scale systems on large amounts of data MapReduce is not suitable for everyone MapReduce abstraction

More information

Exadata Database Machine Administration Workshop NEW

Exadata Database Machine Administration Workshop NEW Exadata Database Machine Administration Workshop NEW What you will learn: This course introduces students to Oracle Exadata Database Machine. Students learn about the various Exadata Database Machine features

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training:: Module Title Duration : Cloudera Data Analyst Training : 4 days Overview Take your knowledge to the next level Cloudera University s four-day data analyst training course will teach you to apply traditional

More information

Big Data Hadoop Course Content

Big Data Hadoop Course Content Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux

More information

Lecture 7 (03/12, 03/14): Hive and Impala Decisions, Operations & Information Technologies Robert H. Smith School of Business Spring, 2018

Lecture 7 (03/12, 03/14): Hive and Impala Decisions, Operations & Information Technologies Robert H. Smith School of Business Spring, 2018 Lecture 7 (03/12, 03/14): Hive and Impala Decisions, Operations & Information Technologies Robert H. Smith School of Business Spring, 2018 K. Zhang (pic source: mapr.com/blog) Copyright BUDT 2016 758 Where

More information

Quick Deployment Step-by-step instructions to deploy Oracle Big Data Lite Virtual Machine

Quick Deployment Step-by-step instructions to deploy Oracle Big Data Lite Virtual Machine Quick Deployment Step-by-step instructions to deploy Oracle Big Data Lite Virtual Machine Version 4.11 Last Updated: 1/10/2018 Please note: This appliance is for testing and educational purposes only;

More information

HADOOP FRAMEWORK FOR BIG DATA

HADOOP FRAMEWORK FOR BIG DATA HADOOP FRAMEWORK FOR BIG DATA Mr K. Srinivas Babu 1,Dr K. Rameshwaraiah 2 1 Research Scholar S V University, Tirupathi 2 Professor and Head NNRESGI, Hyderabad Abstract - Data has to be stored for further

More information

Oracle Big Data SQL brings SQL and Performance to Hadoop

Oracle Big Data SQL brings SQL and Performance to Hadoop Oracle Big Data SQL brings SQL and Performance to Hadoop Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data SQL, Hadoop, Big Data Appliance, SQL, Oracle, Performance, Smart Scan Introduction

More information

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela

More information

April Copyright 2013 Cloudera Inc. All rights reserved.

April Copyright 2013 Cloudera Inc. All rights reserved. Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and the Virtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here April 2014 Analytic Workloads on

More information

Drawing the Big Picture

Drawing the Big Picture Drawing the Big Picture Multi-Platform Data Architectures, Queries, and Analytics Philip Russom TDWI Research Director for Data Management August 26, 2015 Sponsor 2 Speakers Philip Russom TDWI Research

More information