Real-time Data Engineering in the Cloud Exercise Guide


Jesse Anderson
Copyright 2017 Smoking Hand LLC. All Rights Reserved.
Version 1.12.a

Contents

1. Lab Notes
2. Kafka HelloWorld
3. Streaming ETL
4. Advanced Streaming
5. Spark Data Analysis
6. Real-time Dashboard

EXERCISE 1
Lab Notes

These notes will help you work through and understand the labs for this course.

1.1 General Notes

Copying and pasting from this document may not work correctly in all PDF readers. We suggest you use Adobe Reader.

1.2 Command Line Examples

Most labs contain commands that must be run from the command line. These commands will look like:

$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
2001:4800:7810:0512:e2aa:bc1f:ff04:badc cdh5-cm-vm cdh5-cm-vm cdh5-cm-vm01

When running this command, you will not type in everything shown. You will only type in the portion after the $ prompt. In this example, you would only type in cat /etc/hosts. The rest of the listing contains the output of the command.

Sometimes a listing will contain multiple commands:

$ chkconfig --list iptables
iptables 0:off 1:off 2:on 3:on 4:on 5:on 6:off
$ service iptables stop

iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]

There are two different commands to run in this listing. Look for each $ prompt to find every command to run. In this example, the two commands are chkconfig --list iptables and service iptables stop.

Other times a command will span multiple lines:

$ hadoop fs -put \
movies.dat /user/root/movielens/movies/

This command is too long to fit on one line in the lab manual and needs to be on two lines. In this example, you would type in hadoop fs -put \, then hit <enter>, and finish off the command with movies.dat /user/root/movielens/movies/.

1.3 VirtualBox Notes

If your class is using a VirtualBox virtual machine, you can make certain changes to make it run faster or to share the host's file system.

If you have enough RAM, you can allocate more RAM to the virtual machine. By default, the VM uses 1 GB of RAM. Adding 2 or more GB will make the virtual machine perform faster.

VirtualBox can share a folder with the guest VM. Once the folder is shared, you can mount the directory with the following command:

$ sudo mount -t vboxsf -o rw,uid=1001,gid=1001 \
shareddirectory ~/guestvmdirectory

To always mount the directory in the guest, place this line in /etc/fstab:

shareddirectory /home/vmuser/guestvmdirectory vboxsf rw,uid=1000,gid=

Then run the command:

$ sudo mount /home/vmuser/guestvmdirectory/

VirtualBox has other advanced integrations, such as a shared clipboard. This allows you to copy and paste information between the host and guest operating systems' clipboards. See the VirtualBox documentation for more information.

1.4 Maven Offline Mode

Maven is configured to be in offline mode. All dependencies for the class have already been loaded. If you add a new dependency, you may see a message like:

Failed to retrieve org.slf4j:slf4j-api
Caused by: Cannot access confluent-repository ( in offline mode and the artifact org.slf4j:slf4j-api:jar: has not been downloaded from it before.

To take Maven out of offline mode, run the maven_online.sh script that is on the path. Once you're done, you can put Maven back into offline mode by running the maven_offline.sh script that is on the path. You can learn more about Maven offline mode in the Maven documentation.

EXERCISE 2
Kafka HelloWorld

2.1 Objective

This 45 minute lab uses Kafka to ingest data. We will:

- Create a producer to import data
- Create a consumer to read the data

Project Directory: helloworld

2.2 Starting Kafka

Kafka is installed on your virtual machine, but the server processes aren't started, to keep memory usage low.

1. Start the ZooKeeper service.

$ sudo service zookeeper start

2. Start the Kafka Broker (Kafka Server) service.

$ sudo service kafka-server start

3. Optionally, start the Kafka REST service. Start this service if you are going to use the REST interface for Kafka.

$ sudo service kafka-rest start

4. Optionally, start the Schema Registry service. Start this service if you are going to use Avro for messages.

$ sudo service schema-registry start

Shutdown Services

Once you are done with Kafka, you will need to shut down the services to regain memory.

$ sudo service schema-registry stop
$ sudo service kafka-rest stop
$ sudo service kafka-server stop
$ sudo service zookeeper stop

2.3 Kafka HelloWorld

Create a KafkaProducer with the following characteristics:

- Reads and sends the playing_cards_datetime.tsv dataset
- Connects to localhost:9092
- Sends messages on the hello_topic topic
- Sends all messages as Strings

Create a Consumer Group with the following characteristics:

- Consumes messages sent on the hello_topic topic
- Connects to ZooKeeper on localhost
- Consumes all data as Strings
- Outputs the contents of the messages to the screen

When running, start your consumer first and then start the producer. A sketch of a possible producer and consumer appears at the end of this exercise.

2.4 Advanced Optional Steps

- Add a command line producer/consumer
- Use the REST API with a scripting language to send out the playing_cards_datetime.tsv dataset
- Use Avro with Kafka to send binary objects between the producer and consumer
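For reference, here is a minimal sketch of a producer along the lines described in Section 2.3, using the Kafka Java client. The class name is a free choice, and the dataset path is assumed from the later exercises; your solution may be structured differently.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;

public class HelloWorldProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Broker location from the exercise requirements
        props.put("bootstrap.servers", "localhost:9092");
        // Send both keys and values as Strings
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Assumed dataset location; adjust to your VM's path
        List<String> lines = Files.readAllLines(
                Paths.get("/home/vmuser/training/datasets/playing_cards_datetime.tsv"));

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String line : lines) {
                // One message per line on the hello_topic topic
                producer.send(new ProducerRecord<>("hello_topic", line));
            }
        }
    }
}

The exercise's consumer group connects through ZooKeeper, which implies the older ZooKeeper-based consumer API. As an alternative, a sketch using the newer KafkaConsumer, which connects to the broker directly, would look like this (the group id is a made-up example):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Collections;
import java.util.Properties;

public class HelloWorldConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "helloworld-group"); // hypothetical group name
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("hello_topic"));
            while (true) {
                // Poll for new messages and print each value to the screen
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }
}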

EXERCISE 3
Streaming ETL

3.1 Objective

This 60 minute lab uses Spark Streaming to ETL data. We will:

- Create an RDD from a socket
- ETL the data
- Do a simple real-time count on the data

Project Directory: sparkstreamingetl

3.2 Cards Dataset

For your Spark Streaming program, you will be working with the playing card dataset. The file is on the local filesystem at:

/home/vmuser/training/datasets/playing_cards.tsv

The data in the playing_cards.tsv file is made up of a card number, a tab separator, and a card suit:

6  Diamond
3  Diamond
4  Club

For this exercise, we won't be reading the file directly. We'll be using a pre-written Python script that writes out the file to a socket.

3.3 Streaming Program

Create a Spark Streaming program with the following characteristics:

- Sets the master to local[2] or more threads
- Microbatches for 10 seconds
- Binds to localhost and port 9998
- ETLs the incoming data into a Tuple2 of the suit and the card number
- Sums the cards by suit
- Saves the sums to a realtimeoutput directory
- Prints out the first 10 elements

A sketch of such a program appears at the end of this exercise.

3.4 Starting the Socket Input

Before starting to test your program, you will need to start the program that provides the data. You can start it with:

$ ./streamfile.py ~/training/datasets/playing_cards.tsv

Once the program is started, run your Spark program.

Log4J Output Levels

log4j.properties is set to WARN. Change it to INFO for more output and debugging.

3.5 Advanced Optional Steps

1. Save the ETL'd RDD out to disk
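For reference, here is a minimal sketch of one way to structure the program described in Section 3.3 using the Java API. It assumes each line arrives as a number, a tab, and a suit; the class name and output prefix are free choices.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class StreamingETL {
    public static void main(String[] args) throws InterruptedException {
        // local[2] or more: one thread for the receiver, at least one for processing
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("StreamingETL");
        // 10 second microbatches
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Read lines from the socket the Python script writes to
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9998);

        // ETL each "number<TAB>suit" line into a (suit, number) Tuple2
        JavaPairDStream<String, Integer> pairs = lines.mapToPair(line -> {
            String[] fields = line.split("\t");
            return new Tuple2<>(fields[1], Integer.parseInt(fields[0]));
        });

        // Sum the card numbers by suit
        JavaPairDStream<String, Integer> sums = pairs.reduceByKey(Integer::sum);

        // Save each batch's sums and print the first 10 elements
        sums.dstream().saveAsTextFiles("realtimeoutput/sums", "txt");
        sums.print();

        jssc.start();
        jssc.awaitTermination();
    }
}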

EXERCISE 4
Advanced Streaming

4.1 Objective

This 60 minute lab uses Spark to process data in Kafka. We will:

- Consume data from Kafka
- ETL the incoming data
- Count the cards per game ID

Project Directory: sparkstreamingadvanced

4.2 Starting Services

To save memory, the services needed by Kafka are not started.

1. You will need to start the ZooKeeper service.

$ sudo service zookeeper start

2. After letting the ZooKeeper service start, you will need to start the Kafka service.

$ sudo service kafka-server start

If your programs report an error connecting to Kafka, you can check the services' status with:

$ sudo service zookeeper status

or:

$ sudo service kafka-server status

If the processes crash consistently, your laptop may not have enough memory to run the various processes.

You can view Kafka's log by running:

$ tail /var/log/kafka/kafka-server.out

4.3 Dataset

This exercise will use a more complex playing card dataset. The file is on the local filesystem at:

/home/vmuser/training/datasets/playing_cards_datetime.tsv

The data in the playing_cards_datetime.tsv file is made up of a timestamp, a GUID to identify a game, the type of game, the suit, and the card. Each piece of data is tab separated. The cards are no longer solely numeric and include Jacks, Queens and Kings. Here is an example of the data:

:00:00  1ea7fc17-7cf0-486d-8b8b-ad905e0d7a7a  PaiGow  Club  Queen
:00:00  1ea7fc17-7cf0-486d-8b8b-ad905e0d7a7a  PaiGow  Club
:00:00  1ea7fc17-7cf0-486d-8b8b-ad905e0d7a7a  PaiGow  Heart  7

This dataset will not be read from the local filesystem. It will be read from a Kafka topic. The Kafka topic is cardsdatetime. Each message will be an individual line from the file. The key will be playing_cards_datetime and the value will be the line.

4.4 Starting the Producer

Start the CardProducer class in the common package. That is the program that will read the file and produce it into Kafka.

4.5 Reading from Kafka

Create a Spark Streaming program with the following characteristics:

- Uses Spark Streaming with Kafka with a batch of 2 seconds

- Creates a Kafka consumer on the cardsdatetime topic
- ETLs the data by sending the GUID (game id) as the key and the number as the value
- If the number is non-numeric, doesn't process that event
- Sums the card numbers for a game
- Prints out the first 10 elements

A sketch of such a program appears at the end of this exercise.

4.6 Advanced Optional Steps

Spark Streaming lacks a built-in way of producing into Kafka. Use the foreachRDD and foreachPartition methods to manually produce the data in an RDD to Kafka. Produce both the ETL'd RDD and the counts RDD to Kafka. Produce the ETL RDD to the cardsetl topic and the counts RDD to the cardscounts topic.

You can use the built-in Kafka command line utilities to view the output. To view the ETL:

$ kafka-console-consumer --bootstrap-server localhost:9092 --new-consumer \
  --property print.key=true --topic cardsetl

To view the counts:

$ kafka-console-consumer --bootstrap-server localhost:9092 --new-consumer \
  --property print.key=true --topic cardscounts
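For reference, here is a minimal sketch of the consumer plus the advanced produce-back step. It assumes the spark-streaming-kafka-0-10 integration; the course VM may pin a different integration version, and the group id is a made-up example. The sketch produces only the counts to cardscounts; the ETL'd stream can be produced to cardsetl the same way.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import scala.Tuple2;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class AdvancedStreaming {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("AdvancedStreaming");
        // 2 second microbatches
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "advanced-streaming"); // hypothetical group id
        kafkaParams.put("auto.offset.reset", "earliest");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("cardsdatetime"), kafkaParams));

        // ETL: each value is timestamp<TAB>guid<TAB>game type<TAB>suit<TAB>card;
        // keep only numeric cards, keyed by the game's GUID
        JavaPairDStream<String, Integer> etl = stream
                .map(record -> record.value().split("\t"))
                .filter(fields -> fields.length == 5 && fields[4].matches("\\d+"))
                .mapToPair(fields -> new Tuple2<>(fields[1], Integer.parseInt(fields[4])));

        // Sum the card numbers per game and print the first 10 elements
        JavaPairDStream<String, Integer> sums = etl.reduceByKey(Integer::sum);
        sums.print();

        // Advanced step: produce each partition of each batch's RDD back into Kafka
        sums.foreachRDD(rdd -> rdd.foreachPartition(partition -> {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            // One producer per partition, created on the executor side
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                while (partition.hasNext()) {
                    Tuple2<String, Integer> t = partition.next();
                    producer.send(new ProducerRecord<>("cardscounts", t._1(), t._2().toString()));
                }
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}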

EXERCISE 5
Spark Data Analysis

5.1 Objective

This 60 minute lab uses Spark, Spark SQL, or Apache Hive to analyze data. We will:

- Move the data from Kafka to the file system
- Prepare the data to be queried
- Query the data using our analytics tool of choice

Project Directory: sparkanalysis

Memory Limits

This exercise will push the memory limits of the VM. We highly suggest you increase the VM's memory limit. If you still don't have enough memory, you may need to use a cloud resource with more memory.

5.2 Cards Dataset

This exercise will use a more complex playing card dataset. The file is on the local filesystem at:

/home/vmuser/training/datasets/playing_cards_datetime.tsv

The data in the playing_cards_datetime.tsv file is made up of a timestamp, a GUID to identify a game, the type of game, the suit, and the card. Each piece of data is tab separated. The cards are no longer solely numeric and include Jacks, Queens and Kings. Here is an example of the data:

:00:00  1ea7fc17-7cf0-486d-8b8b-ad905e0d7a7a  PaiGow  Club  Queen
:00:00  1ea7fc17-7cf0-486d-8b8b-ad905e0d7a7a  PaiGow  Club
:00:00  1ea7fc17-7cf0-486d-8b8b-ad905e0d7a7a  PaiGow  Heart  7

This dataset will be in Kafka in the cardsdatetime topic. If you did the advanced level for streaming, you will also have an ETL'd topic named cardsetl.

5.3 Moving Data From Kafka

You will need to move your data from the Kafka topic and place it into your local file system. To do this, you can use Kafka Connect. Kafka Connect allows you to move data from a Kafka topic into another system. This course doesn't focus on Kafka Connect. You can learn more about it in the Kafka Connect documentation.

1. Change directories to the sparkanalysis directory.
2. Run:

$ connect-standalone /etc/kafka/connect-standalone.properties \
  file-sink.properties

3. Let the connect-standalone process run for a few minutes.
4. Press Ctrl+C to stop the process.
5. Verify there is a file named cardsdatetime.txt and check that its contents look like the example data above.

5.4 Choosing an Analytics Framework

Now that you've moved the data to the file system, you'll need to choose a technology for querying the data. You have access to technologies like Apache Spark, Hadoop MapReduce, Spark SQL, Apache Hive, and Apache Impala on the VM to perform these analytics. Choose a framework that you are familiar with.
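For reference, the file-sink.properties used in step 2 of Section 5.3 is a standard Kafka Connect FileStreamSink configuration. A minimal sketch, with the connector name assumed, might look like:

name=cards-file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
topics=cardsdatetime
file=cardsdatetime.txt

The topics and file values line up with the cardsdatetime topic and the cardsdatetime.txt output file that step 5 asks you to verify.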

5.5 Analyzing the Data

Once you've chosen your analytics framework, you can start querying the data. When querying and analyzing data, you're looking for interesting patterns or information that will make a dashboard useful. As you're writing these queries, ask yourself:

- How will this data be consumed by others?
- What will people need to know every day?
- Is there anything anomalous in the data? (hint: there is)

As you find interesting queries or realizations, make notes about what you've found. We're going to be using these ideas in the next exercise while creating the dashboard. A sketch of one possible starting query appears below.

Note: You may need to turn off some services you aren't using to do this analysis.
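If you pick Spark SQL, a minimal sketch for loading the exported TSV and asking a first question might look like the following, assuming a Spark 2.x SparkSession; the column names are illustrative, not prescribed by the course.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CardsAnalysis {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("CardsAnalysis")
                .getOrCreate();

        // Load the tab-separated file written by Kafka Connect;
        // the column names here are assumptions for illustration
        Dataset<Row> cards = spark.read()
                .option("sep", "\t")
                .csv("cardsdatetime.txt")
                .toDF("ts", "game_id", "game_type", "suit", "card");

        cards.createOrReplaceTempView("cards");

        // Example question: how many cards were dealt per game type?
        spark.sql("SELECT game_type, COUNT(*) AS cards_dealt "
                + "FROM cards GROUP BY game_type ORDER BY cards_dealt DESC")
             .show();
    }
}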

EXERCISE 6
Real-time Dashboard

6.1 Objective

This 120 minute lab uses Spark Streaming, Kafka, and D3.js to create a real-time dashboard. We will:

- Create real-time analytics
- Consume the analytics
- Display the analytics on a web page with a chart

Project Directory: realtimedashboard

Memory Limits

This exercise will push the memory limits of the VM. We highly suggest you increase the VM's memory limit. If you still don't have enough memory, you may need to use a cloud resource with more memory.

6.2 Cards Dataset

This exercise will use a more complex playing card dataset. The file is on the local filesystem at:

/home/vmuser/training/datasets/playing_cards_datetime.tsv

The data in the playing_cards_datetime.tsv file is made up of a timestamp, a GUID to identify a game, the type of game, the suit, and the card. Each piece of data is tab separated. The cards are no longer solely numeric and include Jacks, Queens and Kings. Here is an example of the data:

:00:00  1ea7fc17-7cf0-486d-8b8b-ad905e0d7a7a  PaiGow  Club  Queen
:00:00  1ea7fc17-7cf0-486d-8b8b-ad905e0d7a7a  PaiGow  Club
:00:00  1ea7fc17-7cf0-486d-8b8b-ad905e0d7a7a  PaiGow  Heart  7

This dataset will not be read from the local filesystem. It will be read from a Kafka topic. The Kafka topic is cardsdatetime. Each message will be an individual line from the file. The key will be playing_cards_datetime and the value will be the line.

6.3 Writing a Real-time Analysis

Write your analytics using the framework of your choice. These analytics should be a real-time representation of the ad-hoc analysis you did in the previous exercise. Publish the results of your analytics back into Kafka.

For ease of ETL and moving data between RDDs, the common package has a Card class that can represent the data coming in. If you are using Spark, use the RDDProducer.produceValues helper method in the Common package to produce an RDD to Kafka. The parameter type for the RDD should be JavaPairDStream<String, String>.

When converting the analytics to a string, we suggest you output as JSON. This will make it easier for the web page's AJAX calls and chart rendering. The output of the JSON string will vary depending on the analytics, but should look something like:

[{"gametype":"paigow","count":3, "sum":10}]

A sketch of building such a string appears below.

6.4 Starting the CardProducer

When you are running the analytics and dashboard code, make sure that you have the CardProducer running to add new data to Kafka. The CardProducer class is located in the Common package of the sparkstreamingadvanced project directory.
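Since the expected JSON shape is flat, you can build it with plain string formatting rather than a JSON library. A minimal sketch, assuming your analytics produce (gameType, (count, sum)) pairs; the class and method names are illustrative, not the solution's:

import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

public final class JsonFormatting {
    private JsonFormatting() {}

    // Convert (gameType, (count, sum)) pairs into the dashboard's expected JSON,
    // keyed by game type, so the records can be produced to Kafka as Strings
    public static JavaPairDStream<String, String> toJson(
            JavaPairDStream<String, Tuple2<Integer, Integer>> stats) {
        return stats.mapToPair(t -> new Tuple2<>(
                t._1(),
                String.format("[{\"gametype\":\"%s\",\"count\":%d,\"sum\":%d}]",
                        t._1(), t._2()._1(), t._2()._2())));
    }
}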

6.5 Running the Spark Analysis and CardProducer

To keep resource usage down, you can run the CardProducer from the command line. You can run it with Maven with:

$ mvn exec:java -Dexec.mainClass="path.to.MainClass"

You can pass in arguments to the program with:

$ mvn exec:java -Dexec.mainClass="path.to.MainClass" -Dexec.args="myargs"

6.6 Writing the Dashboard

The dashboard will be written using HTML and JavaScript. Depending on your familiarity with these technologies, you may or may not write this yourself.

6.6.1 Unfamiliar with HTML and JavaScript

If you aren't familiar with HTML and JavaScript, you may just write the Spark side of things and use the solution's code to visualize the data. Please note that the output of your JSON will need to match the solution's exactly.

6.6.2 Familiar with HTML and JavaScript

If you are familiar with both, we have written some helper functions to make it easier to interact with Kafka's REST interface. Start off by importing the helper JavaScript module:

<script src="kafkaresthelper.js"></script>

In your code, you will need to instantiate the helper. After that, you can call the createconsumerinstance method and pass in the correct information. The last parameter is a number corresponding to your time interval. This interval will serve as the amount of time between calls of the callback function.

var kafkaresthelper = new KafkaRESTHelper();
kafkaresthelper.createconsumerinstance("mygroupname",
    "mytopicname", mycallbackfunction, 10000);

The callback function has a parameter for the data that was retrieved from Kafka over the REST interface. The data object will be an array containing all of the events in the time between the last callback and the current time.

function bygametype(data) {
    // Do something with the data
}

As shown in the Spark section, this code is expecting data to be passed as JSON. All data is automatically coalesced and base 64 decoded for you. The JSON written out by the Spark analysis program should look like:

[{"gametype":"paigow","count":3, "sum":10}]

6.7 Running the Dashboard

When running the dashboard, you will need several services running.

1. Start the Kafka REST service.

$ sudo service kafka-rest start

2. Start the web server. This should be started from the root of the realtimedashboard directory. This web server serves up the files and, more importantly, is a proxy for the Kafka REST service. To learn more about why a proxy is needed, read this article on CORS.

$ ws --rewrite '/kafkarest/* ->

3. Finally, start your browser and go to the dashboard page.

Unexpected value NaN Message

If you see this message in the console:

Unexpected value NaN parsing x attribute.

You can usually ignore it. This happens when a count is 0.

6.8 Deploying to the Cloud

Once you have tested everything locally, you will need to deploy to the Cloud. Before you do this, take the following steps:

1. Make sure that two people aren't using the same topic name. Please do the following things:
   - Prefix all topics with your name.
   - Make the topic names a parameter that is passed in, instead of hard coded. This includes the CardProducer program.
2. Change the broker DNS name to be a parameter that is passed in, in all programs.
3. Use SCP to transfer your code (but not your binaries in the target directory!).
4. Build your code using Maven.
5. Start the programs with the correct topic names and broker DNS name.
6. Start your browser and go to the instance's DNS name and port.
7. Optionally, increase the volume of data for the CardProducer program to get more data going through the system. Do this by:
   - Changing the Thread.sleep(500); to be a parameter.
   - Decreasing the sleep amount to something in the 50 to 100 ms range.

A sketch of how these parameters might be read appears below.
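As a sketch of steps 1, 2, and 7, reading the broker, topic prefix, and sleep interval from command line arguments might look like this. The argument order, defaults, and class name are all illustrative assumptions, not the solution's interface.

// Illustrative sketch: parameterizing a producer for cloud deployment.
// Argument order and defaults are assumptions, not the solution's interface.
public class ProducerSettings {
    public final String broker;      // replaces the hard-coded broker DNS name
    public final String topicPrefix; // e.g. "yourname_" so topics become "yourname_cardsdatetime"
    public final long sleepMs;       // replaces the hard-coded Thread.sleep(500)

    public ProducerSettings(String[] args) {
        this.broker = args.length > 0 ? args[0] : "localhost:9092";
        this.topicPrefix = args.length > 1 ? args[1] : "";
        this.sleepMs = args.length > 2 ? Long.parseLong(args[2]) : 500L;
    }

    // Prepend the per-student prefix to a base topic name
    public String topic(String baseName) {
        return topicPrefix + baseName;
    }
}

With something like this, the Maven invocation from Section 6.5 could pass the values in, for example: -Dexec.args="broker-dns:9092 yourname_ 75".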
