Transitioning From SSIS to Azure Data Factory. Meagan Longoria, Solution Architect, BlueGranite

Meagan Longoria
Solution Architect, BlueGranite
/meaganlongoria   @mmarie
- Microsoft Data Platform MVP: I enjoy contributing to and learning from the Microsoft data community.
- Blogger: I blog about business intelligence, data visualization, and consulting at DataSavvy.me.
- Contributor to a new book: I had the pleasure of writing a chapter for Let Her Finish Series: Voices from the Data Platform.
- Owner of an English Bulldog: My Twitter account is business intelligence with a side of bulldog.

About You
- SSIS developers?
- Data Factory developers?
- Accidentally walked into the room and decided not to leave?

What Is Integration Services?
- Microsoft Integration Services is a platform for building enterprise-level data integration and data transformation solutions.
- Basically: a data migration and ETL tool.
- It is a component of SQL Server that has existed since SQL Server 2005.
https://docs.microsoft.com/en-us/sql/integration-services/sql-server-integration-services

SSIS Overview

Inside an SSIS Solution
- 1 or more projects containing 1 or more packages
- 0 or more project-level connection managers
- 0 or more project parameters
Hierarchy: Solution > Project(s) > Package(s) > Task(s)

Inside an SSIS Package
Each package contains:
- Control Flow
- Data Flow
- Connection Managers
Diagram: a Package holds a Control Flow made of tasks, including Data Flow Tasks; each Data Flow runs from Source through Transformation to Destination.

Inside an SSIS Package, Continued
The following objects extend the functionality of a package:
- Parameters
- Variables
- Event Handlers
- Configurations
- Logging and Log Providers

Example SSIS Package

ADF Overview

What Is Azure Data Factory?
- ADF is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.
- Basically: a data orchestration tool.
- It is a Platform-as-a-Service offering in Azure that was released in 2015 and updated in 2017.

Inside an Azure Data Factory V1 Solution
- Linked Service: defines connection properties
- Datasets: pointers to the data you want to process or have processed, sometimes defining the schema
- Pipelines: combine datasets and activities and define an execution schedule
- Data Management Gateway: allows ADF to retrieve data from an on-premises data source

Inside an Azure Data Factory V1 Solution
Diagram: within a Pipeline, Dataset > Activity > Dataset > Activity > Dataset
- Activities consume and produce datasets
- Pipelines are logical groupings of activities
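The way these pieces reference each other is easier to see written out. Below is a minimal sketch in Python that builds simplified, V1-style JSON definitions as dictionaries. All names (BlobStore, RawSales, StagedSales, CopySalesPipeline) and dates are hypothetical, and the property sets are trimmed for illustration; treat it as a picture of the relationships, not a complete or authoritative schema.

```python
import json

# Hypothetical names throughout; property sets are abbreviated for illustration.
linked_service = {
    "name": "BlobStore",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {"connectionString": "<connection string>"},
    },
}


def make_dataset(name, linked_service_name, folder):
    """A dataset points at data reachable through a linked service."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": linked_service_name,
            "typeProperties": {"folderPath": folder},
            "availability": {"frequency": "Day", "interval": 1},
        },
    }


raw = make_dataset("RawSales", linked_service["name"], "raw/sales")
staged = make_dataset("StagedSales", linked_service["name"], "staged/sales")

# A pipeline groups activities; each activity consumes and produces datasets.
pipeline = {
    "name": "CopySalesPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyRawToStaged",
                "type": "Copy",
                "inputs": [{"name": raw["name"]}],
                "outputs": [{"name": staged["name"]}],
            }
        ],
        "start": "2017-10-01T00:00:00Z",
        "end": "2017-10-08T00:00:00Z",
    },
}


def referenced_datasets(p):
    """Collect every dataset name that the pipeline's activities touch."""
    names = set()
    for activity in p["properties"]["activities"]:
        for ref in activity["inputs"] + activity["outputs"]:
            names.add(ref["name"])
    return sorted(names)


print(json.dumps(pipeline, indent=2))
print(referenced_datasets(pipeline))
```

Keeping each object as a separate JSON document is what makes the definitions straightforward to diff and source control.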

Example ADF V1 Pipeline

New In Azure Data Factory V2
- Integration Runtime: compute infrastructure used to provide integration capabilities across networks; can be on-premises, managed, or IaaS
- Control Flow: activity dependencies, parameters, foreach loops, activity outputs
- Trigger-based flows: on demand (coming soon) or at a certain time
- Monitoring: pipeline runs rather than just activities (SDK only right now)
- SSIS in Azure
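Activity dependencies in V2 are declared on each activity through a dependsOn list. As a rough illustration of what that declaration implies, the sketch below derives a valid run order from such a list using Python's standard graphlib module (Python 3.9+). The activity names are hypothetical, and this is a toy model of the idea, not how the service itself schedules work.

```python
from graphlib import TopologicalSorter

# Hypothetical activities, trimmed to the fields needed for ordering.
activities = [
    {
        "name": "LoadWarehouse",
        "dependsOn": [
            {"activity": "StageSales", "dependencyConditions": ["Succeeded"]},
            {"activity": "StageCustomers", "dependencyConditions": ["Succeeded"]},
        ],
    },
    {
        "name": "StageSales",
        "dependsOn": [{"activity": "CopyFiles", "dependencyConditions": ["Succeeded"]}],
    },
    {
        "name": "StageCustomers",
        "dependsOn": [{"activity": "CopyFiles", "dependencyConditions": ["Succeeded"]}],
    },
    {"name": "CopyFiles", "dependsOn": []},
]


def execution_order(acts):
    """Return one valid order in which every activity follows its dependencies."""
    graph = {
        act["name"]: {dep["activity"] for dep in act.get("dependsOn", [])}
        for act in acts
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(activities)
print(order)  # CopyFiles comes first, LoadWarehouse last
```

The two staging activities have no dependency on each other, so either may come first; that freedom is what lets independent branches run in parallel.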

A Few Notes on ADF V2
- Brand new SDKs for ADF V2: .NET, PowerShell, Python (future: Java)
- Only available in East US and East US 2
- For now, must be created programmatically; cloud-based GUI designer coming soon

Compare Solutions

Similarities Between SSIS & ADF

Some Things Are The Same
- Both are developed using Visual Studio
- Both can copy data to and from Azure
- Both can fire up HDInsight clusters and run Hive and Pig scripts
- Both use role-based security
- Both can trigger alerts upon encountering an error
- Both have logging
- Both can be automated (Biml, PowerShell, .NET)

Dependencies/Order (side-by-side examples: SSIS, ADF V1, ADF V2)

Orchestration
- ADF V1 is a data orchestration tool
- ADF V2 is a data orchestration and integration tool
- SSIS can be used as a data orchestration tool
- Have you ever seen anyone use SSIS just to execute stored procedures?

Differences Between SSIS & ADF

Not Better, Not Worse, Just Different
- ADF is usually used for ELT (as opposed to ETL)
- ADF V1 is built around the concept of timeslices; V2 has options
- ADF scheduling is in the pipeline; SSIS needs SQL Agent or Azure Automation (or another tool)
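For readers coming from SSIS, the timeslice idea may be the least familiar: in V1 a pipeline's active period is cut into fixed windows, and each window is processed as its own unit of work. The sketch below only illustrates that slicing in Python; the function name, dates, and daily interval are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def timeslices(start, end, interval):
    """Yield (slice_start, slice_end) windows covering the period [start, end)."""
    current = start
    while current < end:
        yield current, min(current + interval, end)
        current += interval


# Hypothetical active period: three days, one slice per day.
period_start = datetime(2017, 10, 1, tzinfo=timezone.utc)
period_end = datetime(2017, 10, 4, tzinfo=timezone.utc)

slices = list(timeslices(period_start, period_end, timedelta(days=1)))
for slice_start, slice_end in slices:
    print(slice_start.isoformat(), "->", slice_end.isoformat())
```

Because each window is independent, a failed day can be rerun on its own without touching the days around it.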

ADF Gaps
- Use SSIS, C#, or Spark for transformations; nothing is built in
- V1 has no built-in error handling; V2 has an On Failure activity
- V1 has no GUI in VS; V2 will have a GUI in Azure but no source control connectivity
- V1 config files in VS cause multiple copies of files per environment
- Can't execute more frequently than every 15 minutes with the native scheduler
- V1: not as many data sources as SSIS. Sharpen your C# skills to get around this!

ADF Strengths
- PaaS requires no infrastructure
- Easier to scale out; great with Big Data
- ADF JSON is easier to source control than the GUI-created XML of SSIS
- Updated more frequently than SSIS (new features!)
- Native support for zip/unzip
- Dynamic partitioning for folder and file name
- Data lineage
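Dynamic partitioning means the folder or file name is computed from the slice being processed instead of being hard-coded. The Python sketch below imitates that substitution locally. The template, the placeholder names, and the use of strftime codes are all assumptions made for illustration; ADF itself uses its own format strings and syntax.

```python
from datetime import datetime, timezone


def resolve_path(template, partitioned_by, slice_start):
    """Fill {Placeholder} tokens in a path template from the slice start time."""
    values = {
        name: slice_start.strftime(fmt) for name, fmt in partitioned_by.items()
    }
    return template.format(**values)


# Hypothetical template and partition definitions.
template = "raw/sales/{Year}/{Month}/{Day}/sales.csv"
partitioned_by = {"Year": "%Y", "Month": "%m", "Day": "%d"}

path = resolve_path(template, partitioned_by, datetime(2017, 10, 5, tzinfo=timezone.utc))
print(path)  # raw/sales/2017/10/05/sales.csv
```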

ADF Lessons Learned

Lessons Learned From Loading SQL DW
- No transforms means get ready to write a bunch of SQL
- Time required for deployment to Azure can vary by a couple of hours
- ADF runs in UTC; be careful with daylight saving time
- In V1, one-time pipelines don't auto-execute on deployment and don't appear in the Monitor & Manage app
- ADF cannot natively move data, only copy it
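The UTC point is easy to underestimate: a schedule fixed in UTC lands at a different local hour once daylight saving time starts. The sketch below shows the shift using Python's zoneinfo module (Python 3.9+, with time zone data available on the machine). The time zone and the 06:00 UTC schedule are assumptions chosen for the example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LOCAL = ZoneInfo("America/Chicago")  # hypothetical business time zone


def local_run_time(year, month, day, utc_hour):
    """Convert a run scheduled at a fixed UTC hour to local wall-clock time."""
    run_utc = datetime(year, month, day, utc_hour, tzinfo=timezone.utc)
    return run_utc.astimezone(LOCAL)


winter = local_run_time(2017, 1, 15, 6)
summer = local_run_time(2017, 7, 15, 6)

print(winter.strftime("%H:%M %Z"))  # 00:00 CST
print(summer.strftime("%H:%M %Z"))  # 01:00 CDT
```

A load meant to start at local midnight therefore starts an hour late in summer unless the UTC schedule is adjusted.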

Lessons Learned From Loading SQL DW
- Be sure to use Service Principal auth with ADLS
- Beware the missing JRE when converting to ORC files
- Deploy with PowerShell so you don't have to re-deploy datasets and affect their availability
- Automate with Biml!
- Check out Gerhard's ADF monitor made in Power BI: https://github.com/gbrueckl/azure.datafactory.powerbimonitor

Lessons Learned: Final Thoughts
- ADF solutions may contain 5 or more different languages. Don't be afraid to mix and match technologies for the best fit.
- For now, embrace the custom activity.
- ADF solutions can better handle different types of data (big/small/tabular/semi-structured)

Thank You
Learn more from Meagan Longoria: @mmarie, DataSavvy.me