WPC010 Introduction to Azure Data Lake. Andrea Uggetti Microsoft Francesco Diaz Insight

Size: px
Start display at page:

Download "WPC010 Introduction to Azure Data Lake. Andrea Uggetti Microsoft Francesco Diaz Insight"

Transcription

1 WPC010 Introduction to Azure Data Lake P R E S E N T A Andrea Uggetti Microsoft Francesco Diaz Insight

2 Agenda Data Lake concepts Introduction to Azure Data Lake DEMO Q/A

3 Data Lake concepts

4 Data Lake definition A data lake is a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files. info@wpc2016.it

5 Data Staging Areas

6 But... "The main challenge is not creating a data lake, but taking advantage of the opportunities it presents." info@wpc2016.it

7 Part of Cortana Intelligence Suite

8 Azure Data Lake Service that enable developers, data scientists and analysts to store data of any size, shape and speed. Perform analysis and processing across platforms and languages

9 Azure Data Lake Store

10 Data Lake Store A highly scalable, distributed, parallel file system in the cloud specifically designed to work with multiple analytic frameworks Devices ADL Analytics ADL Store HDInsight R Spark Machine Learning

11 Data Lake Store key concept Store any data in its native format Without schema and transformation Unstructured Semi-Structured and Structured Hadoop file system (HDFS) for the cloud Built on Yarn/WebHDFS Expose WebHDFS via compatible Rest API Support for file/folder objects and operations Integrate with HDInsight, Hortonworks, Cloudera Accessible to all HDFS-compliant project (Spark, Storm, R, )

12 Enterprise Grade High Available: automate replicate data No limits to Scale Unlimited account sizes, From bytes to petabytes Data Lake Store key concept Optimized for analytic workload Large throughput Optimized for parallel computation Automatically optimizes for any throughput

13 Auditing Audit logs for all operations Stored in text format (JSON) Logs accessible via U-SQL scripts Data Lake Store Manage and data security Access Control POSIX-compliant Access Control List (ACL) Built-in groups (Owner, Contributor, Reader) Read, Write, Execute permission ACL on files and folders Integrated with Azure Active Directory IP Filter Encryption at rest Transparent server-side encryption Azure-managed (Azure Key Vault) Customer Managed Keys

14 Data Lake Store Ingress Data can be ingested into Azure Data Lake Store from a variety of sources SQL Apache Flume Azure SQL DB Server logs Built-in copy service Azure SQL DW Table Storage Azure Table Azure Data Factory Apache Sqoop ADL Store.NET SDK JavaScript CLI Azure portal Azure PowerShell Azure Storage blobs On-premises databases Azure Event Hub Custom programs

15 Data Lake Store Egress Data can be exported from Azure Data Lake Store into numerous targets/sinks SQL Apache Flume Azure SQL DB Server logs Azure SQL DW Azure Data Factory Table Storage Azure Table Apache Sqoop ADL Store.NET SDK JavaScript CLI Azure portal Azure PowerShell On-premises databases Custom programs

16 Copying data with Data Lake Sources Sinks Azure Blob Azure SQL Database Azure DocumentDB SQL Server Oracle database DB2 database Sybase database Azure Table Azure SQL Data Warehouse Azure Data Lake Store File system MySQL database Teradata database PostgreSQL database Azure Blob Azure Table Azure SQL Database Azure SQL Data Warehouse Azure DocumentDB Azure Data Lake Store SQL Server

17 Azure Data Factory Copy Activity ADL tool ADLCopy Copying data with Data Lake Open Source Tools Apache Sqoop Runs on HDInsight clusters Transfer data between relational databases and ADLS DistCp Runs on HDInsight clusters Transfer data between Amazon S3, Azure Blob Storage and ADLS

18 Different ingestion way depending by on the source of the data Ad Hoc Data Azure Portal; Azure PowerShell; Azure Cross-Platform CLI; Using Data Lake Tools for Visual Studio; Azure Data Factory, AdlCopy tool, DistCp running on HDInsight Cluster Streamed Data Azure Stream Analytics; Azure HDInsight Storm; EventProcessorHost Relational Data Azure Data Factory; Apache Sqoop Copying data with Data Lake Web server log data Azure Cross-Platform CLI; Powershell; Data Lake Store.NET SDK; Data Factory On-Premises files Data Factory; Powershell+Azure Cross-Platform CLI; Data Lake Store.NET SDK

19 Copying data with Data Lake ADLCopy Download Syntax AdlCopy /Source <Blob or Data Lake Store source> /Dest <Data Lake Store destination> /SourceKey <Key for Blob account> /Account <Data Lake Analytics account> /Unit <Number of Analytics units> /Pattern Account: ADL Analytic account, used to copy Gb/Tb data via Analytic Unit: Data Lake Analytics Units used by AdlCopy, mandatory if use /Account Pattern: RegEx pattern that indicate which blobs/files to copy

20 DEMO Azure Data Lake Store Copy file Security overviewsecurity overview

21 Data Lake Analytics: U-SQL

22 What is U-SQL U-SQL is a query language specifically designed for big data type of workloads. It is the combination of the SQL-like declarative language with the extensibility and programmability provided by C# info@wpc2016.it

23 How U-SQL has born Based on Microsoft experience on SCOPE ( + (T- SQL, ANSI SQL, Hive, C#) The need: analyze massive data sets The Map-Reduce model limitations «The programmer provides a map function that performs grouping and a reduce function that performs aggregation. The underlying run-time system achieves parallelism by partitioning the data and processing different partitions concurrently using multiple machines. However, this model has its own set of limitations. Users are forced to map their applications to the mapreduce model in order to achieve parallelism. For some applications this mapping is very unnatural.» info@wpc2016.it

24 SCOPE and COSMOS ( How U-SQL has born

25 Data Lake Analytics Why U-SQL Integrate strengths of SQL and Big Data programming Declarativity, optimizable and parallelizability of SQL Extensibility, expressiveness and familiarity of C# U-SQL makes it easy for Unstructured and Structured data processing Declarative SQL and customer imperative code (C#) Local and remote queries Increase productivity and agility Using familiar tools (Visual Studio) Leverage your SQL/.NET skills

26 Data Lake Analytics Usage scenarios Achieve the same programming experience in batch or interactive Schematizing unstructured data (Load-Extract-Transform-Store) for analysis Prepare data for other users (LETS & Share) As unstructured data As structured data Large-scale custom processing with custom code Supplement big data with high-value data from where it lives

27 U-SQL Queries: General pattern Read Process Store Azure Storage Blobs EXTRACT OUTPUT Azure Storage Blobs Azure SQL DB SELECT RowSet SELECT FROM WHERE RowSet Azure Data Lake EXTRACT SELECT OUTPUT INSERT

28 Anatomy of a U-SQL query 10 log records by Duration (End time minus Start time). Sort rows in descending order of Duration. Rowset: Conceptually is like an intermediate table REFERENCE ASSEMBLY = EXTRACT UserID string, Start DateTime, End DatetTime, Region string, SitesVisited string, PagesVisited string FROM "swebhdfs://logs/weblogrecords.txt" USING = SELECT UserID, (End.Subtract(Start)).TotalSeconds AS Duration ORDER BY Duration DESC FETCH 10; TO "swebhdfs://logs/results/top10.txt" USING Outputter.Tsv(); U-SQL types are the same as C# types The structure (schema) is first imposed when the data is first extracted/read from the file (schema-on-read) Input is read from this file in ADL Custom function to read from input file C# Expression Output is stored in this file in ADL Built-in function that writes the output in TSV format

29 Logical Expression Tree - U-SQL does not assign the rowset to a variable - The variable is the name for the expression that is being assigned to - U-SQL is not executing each statement sequentially - It composes bigger and bigger statements until there is a big expression tree that can be optimized and executed (lambda expression composition) - Execution in parallel, no intermediate state - Intermediate investigation using declarative framework (e.g. Write to a file) or external workflows info@wpc2016.it

30 U-SQL data types Category Types Numeric Text Complex Temporal Other byte, byte? sbyte, sbyte? int, int? uint, unint? long, long? decimal, decimal? char, char? string MAP<K> ARRAY<K,T> DateTime, DateTime? bool, bool? Guid, Guid? Byte[] short, short? ushort, ushort? ulong, unlong? float, float? double, double? Note: Nullable types have to be declared with a question mark? 30

31 Reading files with custom formats Use built-in extractors to read CSV and TSV files, or create custom extractors for different formats 1 Implement IExtractor Interface 2 Upload and Register Assembly 3 Reference the Assembly and Use using Microsoft.SCOPE.Interfaces; public WebLogExtractor:IExtractor { public override IEnumerable<IRow> Extract( ) { } } CREATE ASSEMBLY WebLogExtAsm /WebLogExtAsm.dll" WITH PERMISSION_SET = RESTRICTED; CREATE EXTRACTOR WebLogExtractor EXTERNAL NAME WebLogExtractor; REFERENCE ASSEMBLY WebLogExtAsm; //now just use it like a built-in extractor SELECT * swebhdfs://logs/webrecords.txt USING WebLogExtractor(); 31

32 WebLogExtractor: Implementation Custom extractor for WebLogRecords.txt file with this format using System.Collections.Generic; using System.IO; using System.Text.RegularExpressions; namespace Demo { [SqlUserDefinedExtractor] public class MyExtractor : IExtractor { public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output) { string line; Regex ItemRegex = new Regex(@"\<(.*?)\>", RegexOptions.Compiled); string[] tokens = { "UserId", "Start", "End", "Region", "SitesVisited", "PagesVisited" }; var reader = new StreamReader(input.BaseStream); while ((line = reader.readline())!= null) { int i = 0; foreach (Match ItemMatch in ItemRegex.Matches(line)){ output.set(tokens[i], ItemMatch.Groups[1]); i++; } yield return output.asreadonly(); } } } } 32

33 Tables CREATE TABLE CREATE TABLE T (col1 int, col2 string, col3 SQL.MAP<string,string>, INDEX idx CLUSTERED (col1 ASC) DISTRIBUTED BY HASH (driver_id) ); Structured Data Built-in Data types only (no UDTs) Clustered Index (needs to be specified): row-oriented Fine-grained distribution (needs to be specified): HASH, DIRECT HASH, RANGE, ROUND ROBIN CREATE TABLE AS SELECT CREATE TABLE T (INDEX idx CLUSTERED ) AS SELECT ; CREATE TABLE T (INDEX idx CLUSTERED ) AS EXTRACT ; CREATE TABLE T (INDEX idx CLUSTERED ) AS mytvf(default); Infer the schema from the query Still requires index and partitioning ALTER TABLE ADD/DROP COLUMN ALTER TABLE T ADD COLUMN eventname string; ALTER TABLE T DROP COLUMN col3; ALTER TABLE T ADD COLUMN result string, clientid string, payload int?; ALTER TABLE T DROP COLUMN clientid, result; Meta-data only operation Existing rows will get Non-nullable types: C# data type default value (e.g., int will be 0) Nullable types: null

34 Query 4 Improve performance with TABLEs Improve performance of Query1! CREATE TABLE LogRecordsTable(UserId int, Start DateTime, End Datetime, Region string INDEX idx CLUSTERED (Region ASC) PARTITIONED BY HASH (Region)); WebLogRecords.txt Populate the table Select only required fields Azure Data Lake Run query directly against the table Top10.Tsv INSERT INTO LogRecordsTable SELECT UserId, Start, End, = SELECT UserID, (End.Subtract(Start)).TotalSeconds AS Duration FROM LogRecordsTable ORDER BY Duration DESC FETCH 10; TO adls://logs/results/top10.tsv USING Outputters.Tsv(); 34

35 U-SQL extensibility Extend U-SQL with C#/.NET in Visual Studio Built-in operators, function, aggregates User-defined operators (UDOs) User-defined functions (UDFs) User-defined aggregates (UDAGGs) C# expressions (in SELECT statements)

36 Federated queries: Query data where it lives Easily query data in multiple Azure data stores without moving it to a single store Benefits Avoid moving large amounts of data across the network between stores Single view of data irrespective of physical location Azure Storage Blobs Minimize data proliferation issues caused by maintaining multiple copies Single query language for all data Each data store maintains its own sovereignty Design choices based on the need U-SQL Query Azure Data Lake Analytics Query Result Azure SQL in VMs Azure SQL DB

37 What can you do in the Azure Portal? Create a new Big Data Analytics account Author U-SQL scripts Submit U-SQL jobs Cancel running jobs Provision users who can submit jobs Visualize usage stats (compute hours) Visualize job management chart

38 Code behind support

39 DEMO Trasform text files Load transformed text files into table Aggregate functions Produce an output table Job execution graph Job execution progress playback

40 Session Recap Data Lake concepts Introduction to Azure Data Lake DEMO Q/A

41 Additional resources Blogs and community page The bible:

42 Q/A

43 GRAZIE!

44 Contatti OverNet Education OverNet Education Tel

Azure Data Lake Store

Azure Data Lake Store Azure Data Lake Store Analytics 101 Kenneth M. Nielsen Data Solution Architect, MIcrosoft Our Sponsors About me Kenneth M. Nielsen Worked with SQL Server since 1999 Data Solution Architect at Microsoft

More information

Azure Data Lake Analytics Introduction for SQL Family. Julie

Azure Data Lake Analytics Introduction for SQL Family. Julie Azure Data Lake Analytics Introduction for SQL Family Julie Koesmarno @MsSQLGirl www.mssqlgirl.com jukoesma@microsoft.com What we have is a data glut Vernor Vinge (Emeritus Professor of Mathematics at

More information

Swimming in the Data Lake. Presented by Warner Chaves Moderated by Sander Stad

Swimming in the Data Lake. Presented by Warner Chaves Moderated by Sander Stad Swimming in the Data Lake Presented by Warner Chaves Moderated by Sander Stad Thank You microsoft.com hortonworks.com aws.amazon.com red-gate.com Empower users with new insights through familiar tools

More information

BIG DATA COURSE CONTENT

BIG DATA COURSE CONTENT BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information

HDInsight > Hadoop. October 12, 2017

HDInsight > Hadoop. October 12, 2017 HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism Big Data and Hadoop with Azure HDInsight Andrew Brust Senior Director, Technical Product Marketing and Evangelism Datameer Level: Intermediate Meet Andrew Senior Director, Technical Product Marketing and

More information

exam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0

exam.   Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0 70-775.exam Number: 70-775 Passing Score: 800 Time Limit: 120 min File Version: 1.0 Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight Version 1.0 Exam A QUESTION 1 You use YARN to

More information

Index. Scott Klein 2017 S. Klein, IoT Solutions in Microsoft s Azure IoT Suite, DOI /

Index. Scott Klein 2017 S. Klein, IoT Solutions in Microsoft s Azure IoT Suite, DOI / Index A Advanced Message Queueing Protocol (AMQP), 44 Analytics, 9 Apache Ambari project, 209 210 API key, 244 Application data, 4 Azure Active Directory (AAD), 91, 257 Azure Blob Storage, 191 Azure data

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Ian Choy. Technology Solutions Professional

Ian Choy. Technology Solutions Professional Ian Choy Technology Solutions Professional XML KPIs SQL Server 2000 Management Studio Mirroring SQL Server 2005 Compression Policy-Based Mgmt Programmability SQL Server 2008 PowerPivot SharePoint Integration

More information

Processing Big Data. with AZURE DATA LAKE ANALYTICS. Sean Forgatch - Senior Consultant. 6/23/ TALAVANT. All Rights Reserved.

Processing Big Data. with AZURE DATA LAKE ANALYTICS. Sean Forgatch - Senior Consultant. 6/23/ TALAVANT. All Rights Reserved. Processing Big Data with AZURE DATA LAKE ANALYTICS Sean Forgatch - Senior Consultant 6/23/2018 2018 TALAVANT. All Rights Reserved. 1 SQL Saturday Iowa 2018 6/23/2018 2018 TALAVANT. All Rights Reserved.

More information

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data

More information

Introduction to U-SQL & Data Lake Alex Whittles

Introduction to U-SQL & Data Lake Alex Whittles Introduction to U-SQL & Data Lake Alex Whittles Alex@PurpleFrogSystems.com PurpleFrogSystems.com PurpleFrogSystems.com/blog @PurpleFrogAlex Alex Whittles SQL Relay Committee SQLRelay.co.uk SQL Bits Committee

More information

Azure Data Factory VS. SSIS. Reza Rad, Consultant, RADACAD

Azure Data Factory VS. SSIS. Reza Rad, Consultant, RADACAD Azure Data Factory VS. SSIS Reza Rad, Consultant, RADACAD 2 Please silence cell phones Explore Everything PASS Has to Offer FREE ONLINE WEBINAR EVENTS FREE 1-DAY LOCAL TRAINING EVENTS VOLUNTEERING OPPORTUNITIES

More information

Introduction to U-SQL & Data Lake Alex Whittles

Introduction to U-SQL & Data Lake Alex Whittles Introduction to U-SQL & Data Lake Alex Whittles Alex@PurpleFrogSystems.com PurpleFrogSystems.com PurpleFrogSystems.com/blog @PurpleFrogAlex Alex Whittles SQL Relay Committee SQLRelay.co.uk SQL Bits Committee

More information

Developing Microsoft Azure Solutions

Developing Microsoft Azure Solutions Developing Microsoft Azure Solutions Duration: 5 Days Course Code: M20532 Overview: This course is intended for students who have experience building web applications. Students should also have experience

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

Exam Questions

Exam Questions Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) https://www.2passeasy.com/dumps/70-775/ NEW QUESTION 1 You are implementing a batch processing solution by using Azure

More information

Data Architectures in Azure for Analytics & Big Data

Data Architectures in Azure for Analytics & Big Data Data Architectures in for Analytics & Big Data October 20, 2018 Melissa Coates Solution Architect, BlueGranite Microsoft Data Platform MVP Blog: www.sqlchick.com Twitter: @sqlchick Data Architecture A

More information

Alexander Klein. #SQLSatDenmark. ETL meets Azure

Alexander Klein. #SQLSatDenmark. ETL meets Azure Alexander Klein ETL meets Azure BIG Thanks to SQLSat Denmark sponsors Save the date for exiting upcoming events PASS Camp 2017 Main Camp 05.12. 07.12.2017 (04.12. Kick-Off abends) Lufthansa Training &

More information

Microsoft Big Data and Hadoop

Microsoft Big Data and Hadoop Microsoft Big Data and Hadoop Lara Rubbelke @sqlgal Cindy Gross @sqlcindy 2 The world of data is changing The 4Vs of Big Data http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data 3 Common

More information

Franck Mercier. Technical Solution Professional Data + AI Azure Databricks

Franck Mercier. Technical Solution Professional Data + AI Azure Databricks Franck Mercier Technical Solution Professional Data + AI http://aka.ms/franck @FranmerMS Azure Databricks Thanks to our sponsors Global Gold Silver Bronze Microsoft JetBrains Rubrik Delphix Solution OMD

More information

20532D: Developing Microsoft Azure Solutions

20532D: Developing Microsoft Azure Solutions 20532D: Developing Microsoft Azure Solutions Course Details Course Code: Duration: Notes: 20532D 5 days Elements of this syllabus are subject to change. About this course This course is intended for students

More information

Techno Expert Solutions

Techno Expert Solutions Course Content of Microsoft Windows Azzure Developer: Course Outline Module 1: Overview of the Microsoft Azure Platform Microsoft Azure provides a collection of services that you can use as building blocks

More information

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases

More information

Overview of Data Services and Streaming Data Solution with Azure

Overview of Data Services and Streaming Data Solution with Azure Overview of Data Services and Streaming Data Solution with Azure Tara Mason Senior Consultant tmason@impactmakers.com Platform as a Service Offerings SQL Server On Premises vs. Azure SQL Server SQL Server

More information

White Paper / Azure Data Platform: Ingest

White Paper / Azure Data Platform: Ingest White Paper / Azure Data Platform: Ingest Contents White Paper / Azure Data Platform: Ingest... 1 Versioning... 2 Meta Data... 2 Foreword... 3 Prerequisites... 3 Azure Data Platform... 4 Flowchart Guidance...

More information

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals

More information

Survey of the Azure Data Landscape. Ike Ellis

Survey of the Azure Data Landscape. Ike Ellis Survey of the Azure Data Landscape Ike Ellis Wintellect Core Services Consulting Custom software application development and architecture Instructor Led Training Microsoft s #1 training vendor for over

More information

Understanding the latent value in all content

Understanding the latent value in all content Understanding the latent value in all content John F. Kennedy (JFK) November 22, 1963 INGEST ENRICH EXPLORE Cognitive skills Data in any format, any Azure store Search Annotations Data Cloud Intelligence

More information

WHITEPAPER. MemSQL Enterprise Feature List

WHITEPAPER. MemSQL Enterprise Feature List WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure

More information

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without

More information

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo Microsoft Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo NEW QUESTION 1 HOTSPOT You install the Microsoft Hive ODBC Driver on a computer that runs Windows

More information

Přehled novinek v SQL Server 2016

Přehled novinek v SQL Server 2016 Přehled novinek v SQL Server 2016 Martin Rys, BI Competency Leader martin.rys@adastragrp.com https://www.linkedin.com/in/martinrys 20.4.2016 1 BI Competency development 2 Trends, modern data warehousing

More information

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo Microsoft Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo NEW QUESTION 1 You have an Azure HDInsight cluster. You need to store data in a file format that

More information

Data 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.

Data 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp. Data 101 Which DB, When Joe Yong (joeyong@microsoft.com) Azure SQL Data Warehouse, Program Management Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020

More information

Azure Development Course

Azure Development Course Azure Development Course About This Course This section provides a brief description of the course, audience, suggested prerequisites, and course objectives. COURSE DESCRIPTION This course is intended

More information

Tour of Database Platforms as a Service. June 2016 Warner Chaves Christo Kutrovsky Solutions Architect

Tour of Database Platforms as a Service. June 2016 Warner Chaves Christo Kutrovsky Solutions Architect Tour of Database Platforms as a Service June 2016 Warner Chaves Christo Kutrovsky Solutions Architect Bio Solutions Architect at Pythian Specialize high performance data processing and analytics 15 years

More information

Developing Microsoft Azure Solutions (70-532) Syllabus

Developing Microsoft Azure Solutions (70-532) Syllabus Developing Microsoft Azure Solutions (70-532) Syllabus Cloud Computing Introduction What is Cloud Computing Cloud Characteristics Cloud Computing Service Models Deployment Models in Cloud Computing Advantages

More information

Developing Microsoft Azure Solutions (70-532) Syllabus

Developing Microsoft Azure Solutions (70-532) Syllabus Developing Microsoft Azure Solutions (70-532) Syllabus Cloud Computing Introduction What is Cloud Computing Cloud Characteristics Cloud Computing Service Models Deployment Models in Cloud Computing Advantages

More information

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. Processing Unstructured Data Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. http://dinesql.com / Dinesh Priyankara @dinesh_priya Founder/Principal Architect dinesql Pvt Ltd. Microsoft Most

More information

Hadoop. Introduction / Overview

Hadoop. Introduction / Overview Hadoop Introduction / Overview Preface We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture Expect 1-4 hour lab segments Expect minimal pretty pictures

More information

Azure Data Factory. Data Integration in the Cloud

Azure Data Factory. Data Integration in the Cloud Azure Data Factory Data Integration in the Cloud 2018 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and

More information

Developing Microsoft Azure Solutions (70-532) Syllabus

Developing Microsoft Azure Solutions (70-532) Syllabus Developing Microsoft Azure Solutions (70-532) Syllabus Cloud Computing Introduction What is Cloud Computing Cloud Characteristics Cloud Computing Service Models Deployment Models in Cloud Computing Advantages

More information

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) The Certificate in Software Development Life Cycle in BIGDATA, Business Intelligence and Tableau program

More information

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,

More information

Introduction to BigData, Hadoop:-

Introduction to BigData, Hadoop:- Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

Hadoop Online Training

Hadoop Online Training Hadoop Online Training IQ training facility offers Hadoop Online Training. Our Hadoop trainers come with vast work experience and teaching skills. Our Hadoop training online is regarded as the one of the

More information

Oracle GoldenGate for Big Data

Oracle GoldenGate for Big Data Oracle GoldenGate for Big Data The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines

More information

Data in the Cloud and Analytics in the Lake

Data in the Cloud and Analytics in the Lake Data in the Cloud and Analytics in the Lake Introduction Working in Analytics for over 5 years Part the digital team at BNZ for 3 years Based in the Auckland office Preferred Languages SQL Python (PySpark)

More information

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop Hadoop Open Source Projects Hadoop is supplemented by an ecosystem of open source projects Oozie 25 How to Analyze Large Data Sets in Hadoop Although the Hadoop framework is implemented in Java, MapReduce

More information

An Introduction to Big Data Formats

An Introduction to Big Data Formats Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION

More information

Certified Big Data and Hadoop Course Curriculum

Certified Big Data and Hadoop Course Curriculum Certified Big Data and Hadoop Course Curriculum The Certified Big Data and Hadoop course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills via implementation

More information

Alexander Klein. ETL in the Cloud

Alexander Klein. ETL in the Cloud Alexander Klein ETL in the Cloud Sponsors help us to run this event! THX! You Rock! Sponsor Gold Sponsor Silver Sponsor Bronze Sponsor You Rock! Sponsor Session 13:45 Track 1 Das super nerdige Solisyon

More information

Oliver Engels & Tillmann Eitelberg. Big Data! Big Quality?

Oliver Engels & Tillmann Eitelberg. Big Data! Big Quality? Oliver Engels & Tillmann Eitelberg Big Data! Big Quality? Like to visit Germany? PASS Camp 2017 Main Camp 5.12 7.12.2017 (4.12 Kick Off Evening) Lufthansa Training & Conference Center, Seeheim SQL Konferenz

More information

Oracle Database: SQL and PL/SQL Fundamentals NEW

Oracle Database: SQL and PL/SQL Fundamentals NEW Oracle University Contact Us: 001-855-844-3881 & 001-800-514-06-97 Oracle Database: SQL and PL/SQL Fundamentals NEW Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals

More information

Take P, R or U. and solve your data quality problems Oliver Engels & Tillmann Eitelberg, OH22

Take P, R or U. and solve your data quality problems Oliver Engels & Tillmann Eitelberg, OH22 Take P, R or U and solve your data quality problems Oliver Engels & Tillmann Eitelberg, OH22 Oliver Engels CEO, oh22data AG @oengels Datamonster from Germany MS Data Platform MVP President of PASS Germany

More information

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Blended Learning Outline: Cloudera Data Analyst Training (171219a) Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills

More information

Developing Microsoft Azure Solutions

Developing Microsoft Azure Solutions Course 20532C: Developing Microsoft Azure Solutions Course details Course Outline Module 1: OVERVIEW OF THE MICROSOFT AZURE PLATFORM This module reviews the services available in the Azure platform and

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

Security and Performance advances with Oracle Big Data SQL

Security and Performance advances with Oracle Big Data SQL Security and Performance advances with Oracle Big Data SQL Jean-Pierre Dijcks Oracle Redwood Shores, CA, USA Key Words SQL, Oracle, Database, Analytics, Object Store, Files, Big Data, Big Data SQL, Hadoop,

More information

Actual4Test. Actual4test - actual test exam dumps-pass for IT exams

Actual4Test.   Actual4test - actual test exam dumps-pass for IT exams Actual4Test http://www.actual4test.com Actual4test - actual test exam dumps-pass for IT exams Exam : 1z1-449 Title : Oracle Big Data 2017 Implementation Essentials Vendor : Oracle Version : DEMO Get Latest

More information

Syncsort DMX-h. Simplifying Big Data Integration. Goals of the Modern Data Architecture SOLUTION SHEET

Syncsort DMX-h. Simplifying Big Data Integration. Goals of the Modern Data Architecture SOLUTION SHEET SOLUTION SHEET Syncsort DMX-h Simplifying Big Data Integration Goals of the Modern Data Architecture Data warehouses and mainframes are mainstays of traditional data architectures and still play a vital

More information

Certified Big Data Hadoop and Spark Scala Course Curriculum

Certified Big Data Hadoop and Spark Scala Course Curriculum Certified Big Data Hadoop and Spark Scala Course Curriculum The Certified Big Data Hadoop and Spark Scala course by DataFlair is a perfect blend of indepth theoretical knowledge and strong practical skills

More information

17/05/2017. What we ll cover. Who is Greg? Why PaaS and SaaS? What we re not discussing: IaaS

17/05/2017. What we ll cover. Who is Greg? Why PaaS and SaaS? What we re not discussing: IaaS What are all those Azure* and Power* services and why do I want them? Dr Greg Low SQL Down Under greg@sqldownunder.com Who is Greg? CEO and Principal Mentor at SDU Data Platform MVP Microsoft Regional

More information

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training:: Module Title Duration : Cloudera Data Analyst Training : 4 days Overview Take your knowledge to the next level Cloudera University s four-day data analyst training course will teach you to apply traditional

More information

Oracle Big Data Fundamentals Ed 2

Oracle Big Data Fundamentals Ed 2 Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

Microsoft Exam

Microsoft Exam Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred

More information

New Features and Enhancements in Big Data Management 10.2

New Features and Enhancements in Big Data Management 10.2 New Features and Enhancements in Big Data Management 10.2 Copyright Informatica LLC 2017. Informatica, the Informatica logo, Big Data Management, and PowerCenter are trademarks or registered trademarks

More information

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc.

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc. Modern ETL Tools for Cloud and Big Data Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc. Agenda Landscape Cloud ETL Tools Big Data ETL Tools Best Practices

More information

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture Big Data Syllabus Hadoop YARN Setup Programming in YARN framework j Understanding big data and Hadoop Big Data Limitations and Solutions of existing Data Analytics Architecture Hadoop Features Hadoop Ecosystem

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

Data Warehouse Design Decisions

Data Warehouse Design Decisions Data Warehouse Design Decisions August 2015 Colleen Barnitz Director, IT Development MVT Services Colleen Barnitz over 20 Years in IT worked with SQL Server since version 6.5 developer and an architect

More information

Hadoop & Big Data Analytics Complete Practical & Real-time Training

Hadoop & Big Data Analytics Complete Practical & Real-time Training An ISO Certified Training Institute A Unit of Sequelgate Innovative Technologies Pvt. Ltd. www.sqlschool.com Hadoop & Big Data Analytics Complete Practical & Real-time Training Mode : Instructor Led LIVE

More information

Approaching the Petabyte Analytic Database: What I learned

Approaching the Petabyte Analytic Database: What I learned Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may

More information

Event Sponsors. Expo Sponsors. Expo Light Sponsors

Event Sponsors. Expo Sponsors. Expo Light Sponsors Event Sponsors Expo Sponsors Expo Light Sponsors IoT for the BI professional David L. Bojsen - Principal Architect What to expect Level 200 session Which basically means PowerPoint and talking Enthusiastic

More information

Microsoft Perform Data Engineering on Microsoft Azure HDInsight.

Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight http://killexams.com/pass4sure/exam-detail/70-775 QUESTION: 30 You are building a security tracking solution in Apache Kafka to parse

More information

Databricks, an Introduction

Databricks, an Introduction Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,

More information

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may

More information

Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp.

Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp. 17-18 March, 2018 Beijing Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020 Today, 80% of organizations

More information

TECHNOLOGY SOLUTION EVOLUTION

TECHNOLOGY SOLUTION EVOLUTION JAR PLATFORM JORVAK TECHNOLOGY SOLUTION EVOLUTION 1990s Build Your Own Time to Production Present Time Highly Configurable Hybrid Platforms Universal Connectivity Application Screens Integrations/Reporting

More information

Designing Database Solutions for Microsoft SQL Server 2012

Designing Database Solutions for Microsoft SQL Server 2012 Designing Database Solutions for Microsoft SQL Server 2012 Course 20465A 5 Days Instructor-led, Hands-on Introduction This course describes how to design and monitor high performance, highly available

More information

Introduction to Hive Cloudera, Inc.

Introduction to Hive Cloudera, Inc. Introduction to Hive Outline Motivation Overview Data Model Working with Hive Wrap up & Conclusions Background Started at Facebook Data was collected by nightly cron jobs into Oracle DB ETL via hand-coded

More information

MapR Enterprise Hadoop

MapR Enterprise Hadoop 2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS

More information

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours) Bigdata Fundamentals Day1: (2hours) 1. Understanding BigData. a. What is Big Data? b. Big-Data characteristics. c. Challenges with the traditional Data Base Systems and Distributed Systems. 2. Distributions:

More information

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance

More information

Transitioning From SSIS to Azure Data Factory. Meagan Longoria, Solution Architect, BlueGranite

Transitioning From SSIS to Azure Data Factory. Meagan Longoria, Solution Architect, BlueGranite Transitioning From SSIS to Azure Data Factory Meagan Longoria, Solution Architect, BlueGranite Microsoft Data Platform MVP I enjoy contributing to and learning from the Microsoft data community. Blogger

More information

Oskari Heikkinen. New capabilities of Azure Data Factory v2

Oskari Heikkinen. New capabilities of Azure Data Factory v2 Oskari Heikkinen New capabilities of Azure Data Factory v2 Oskari Heikkinen Lead Cloud Architect at BIGDATAPUMP Microsoft P-TSP Azure Advisors Numerous projects on Azure Worked with Microsoft Data Platform

More information

Tomasz Libera. Azure SQL Data Warehouse

Tomasz Libera. Azure SQL Data Warehouse Tomasz Libera Azure SQL Data Warehouse Thanks to our partners! About me Microsoft MVP Data Platform Microsoft Certified Trainer SQL Server Developer Academic Trainer datacommunity.org.pl One of the leaders

More information

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved. Gain Insights From Unstructured Data Using Pivotal HD 1 Traditional Enterprise Analytics Process 2 The Fundamental Paradigm Shift Internet age and exploding data growth Enterprises leverage new data sources

More information

Hadoop Development Introduction

Hadoop Development Introduction Hadoop Development Introduction What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand

More information