Data Analytics at Logitech: Snowflake + Tableau = #Winning


Welcome

#TC18 Data Analytics at Logitech: Snowflake + Tableau = #Winning. Avinash Deshpande

I am a futurist, scientist, engineer, designer, and data evangelist at heart. Avinash Deshpande, Chief Software Architect. Find me at www.linkedin.com/in/avinashpd1

Logitech Data Use Cases
Data velocity: batch to real-time. Data variety: structured, semi-structured, unstructured.
Use cases: Natural Language Processing (NLP), VR gaming, marketing funnel, predictive analytics, sales channel management, IoT, retail data scraping, social media sentiment, security, video analysis, smart home device events, demand forecasting, price violations on retail sites, multi-site ERP, machine learning, data warehousing, text mining.

Analytics at Scale Supporting Our Growing Business

Real-Time, On-Demand Delivery to Your Phone, Desktop, and Dashboard
Executive summaries, customer by product, product by customer, demand/supply updates, market analytics/market share, marketing reports, competitive analysis, sentiment, consumer persona generation, granular consumer segmentation, marketing spend optimization, consumer value management, consumer lifetime value analysis, context-based marketing.

Cloud Empowers IT Organizations to Redefine the Way Data Services Are Produced and Delivered
Scalable and efficient: elastic infrastructure that is simple, secure, robust, and scalable, with pay-as-you-go pricing. Reliable: managed services. Governed: transparency on usage patterns. Breadth of services.

Need for Data Virtualization
Abstract access to disparate data sources; a single semantic repository; optimized real-time data availability to consumers; a centralized, governed, and secured data layer.

Improve the User Experience
User Pain: "Reports are always slower when I want to use them (peak business hours)."
Snowflake can flex up compute power in seconds. Business users can have their own isolated, right-sized compute instance, so performance is always consistent for the work they do and not impacted by what others are doing.

Improve the User Experience
User Pain: "I want access to more historical data than I have today."
Snowflake's low-cost, fast, elastically scalable storage layer removes the limits on adding and keeping more historical data than typical data warehouse solutions allow.

Improve the User Experience
User Pain: "Commonly used reports always seem to be slow."
Snowflake can globally cache the results of commonly used queries sent via Tableau. Commonly used workbooks are almost always served from cache, so end users see extremely fast performance regardless of how many people are running the same workbook.

Improve the User Experience
User Pain: "I want to explore non-traditional data sets that aren't currently available."
Unlike traditional DW solutions, Snowflake treats non-traditional data types like JSON, Avro, and XML as first-class citizens (direct SQL access and fast performance). This makes the data immediately available without complex ETL.
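In Snowflake this is ordinary SQL over a VARIANT column (for example, selecting v:device.id::string from a staging table). As a hedged illustration only, the standalone Python sketch below mimics that colon-path access over raw JSON; the record shape and field names are made up, not from the talk, but it shows why no upfront ETL or fixed schema is needed:

```python
import json

def get_path(record, path):
    """Mimic Snowflake-style v:a.b path access on a JSON document.
    Missing keys yield None, similar to SQL NULL for absent fields."""
    node = json.loads(record) if isinstance(record, str) else record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# A raw, never-transformed event (hypothetical shape)
raw = '{"device": {"id": "kb-042", "type": "keyboard"}, "battery": 87}'
print(get_path(raw, "device.id"))  # kb-042
print(get_path(raw, "battery"))    # 87
print(get_path(raw, "firmware"))   # None (field absent, no load error)
```

Because lookups tolerate missing fields, new event attributes can appear in the data stream without breaking existing queries, which is the practical meaning of "schema on read".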

Improve the User Experience
User Pain: "I'm tired of waiting for new data to be loaded into the system."
Snowflake's architecture lets customers implement new data ingestion processes such as continuous (24/7) loading, so end users see their data in near real time instead of via the traditional nightly batch. Use a Tableau live connection rather than an extract.

EDW Solution Architecture (diagram): Data Producer (EBS on Exadata) to Business Layer to Reporting / Advanced Analytics Layer to Data Consumer (reports), running on AWS.

IoT Solution Architecture (diagram): Edge Compute to Kafka to Snowflake to Denodo to reports (Data Consumer, Business Layer, Reporting / Advanced Analytics Layer).
Use Snowpipe to enable real-time ingestion. Keep raw data in semi-structured JSON format. Create structured objects with cleaned and/or aggregated data. Create business-specific Denodo views for reporting.
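The raw-then-cleaned flow above can be sketched locally. This is not Snowpipe itself, just a toy model under assumed data shapes: stage 1 lands each JSON event untouched (as Snowpipe would copy it into a raw table), and stage 2 derives a structured, aggregated object of the kind the Denodo views would expose:

```python
import json
from collections import defaultdict

def ingest(raw_table, json_lines):
    """Stage 1: land each event as-is, schema-free, like copying raw
    JSON into a staging table at ingestion time."""
    for line in json_lines:
        raw_table.append(json.loads(line))

def build_cleaned_view(raw_table):
    """Stage 2: derive a structured aggregate (event counts per device)
    from the raw events, like a cleaned table built for reporting."""
    counts = defaultdict(int)
    for event in raw_table:
        counts[event.get("device", "unknown")] += 1
    return dict(counts)

raw = []
ingest(raw, ['{"device": "cam", "temp": 21}',
             '{"device": "cam", "temp": 22}',
             '{"device": "hub"}'])
print(build_cleaned_view(raw))  # {'cam': 2, 'hub': 1}
```

Keeping stage 1 schema-free means new device types flow in without pipeline changes; only the reporting views need to evolve.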

SNOWFLAKE BENCHMARK

Other Popular Columnar DB
Architecture/Storage: Traditional shared-nothing architecture. Data lives on EC2 nodes, requiring costly 24/7 uptime even when not in use.
Data Types: Requires additional tools (Hadoop, Mongo, etc.) to ingest and make semi-structured data available.
Scalability: Extended process to resize compute resources to accommodate additional demand.
Concurrency: Published limits of 50 concurrent users/queries, but generally slows down around 15.
Administration/Design: Need to continually manage vacuuming, distribution/sort keys, compression, metadata, indexing, backups, etc. Need to understand the data model in advance.

Snowflake
Architecture/Storage: Multi-cluster, shared-data architecture. Data is stored in S3, allowing multiple EC2 compute clusters to access it simultaneously without contention.
Data Types: Can ingest and query raw JSON, XML, Avro, and Parquet without prior transformation.
Scalability: Storage is decoupled from compute, so compute can be resized instantly and shut down when not in use.
Concurrency: Users can be isolated on separate compute resources to avoid contention. The auto-scale feature scales compute resources horizontally to support concurrent workloads.
Administration/Design: Zero; free up your DBA team for other tasks. Load data in real time without needing a model in advance.

Athena vs. Snowflake
Athena:
Difficult to set up and tune performance; provides no options for the end user to influence performance.
Difficult to manage usage: resource usage over time, queries and data retrieved, and the cost of increasing capacity and support.
Partitions must be added manually.
By default, concurrency limits allow twenty concurrent DDL queries and twenty concurrent SELECT queries at a time, and the query timeout is 30 minutes.
Schema is needed ahead of time; for performance, data needs to be converted to a columnar format.
Snowflake:
Good performance out of the box; advanced tuning with auto-clustering.
Lets you reserve various compute configurations as needed; usage can be segregated at the compute level.
Horizontal and vertical scaling without downtime; cost is consistent.
No need to add partitions.
Default concurrency is 300 (15x) and can be raised if necessary.
Schema on read; columnar format by default.

Spark on Snowflake
It's easier to manage data in tables than in files on S3: if you ever need to dedupe, update, or delete data, you can do that with standard SQL in Snowflake, but you need to write a program to do it on S3. To get good performance on files in S3, you have to optimize the file formats, partition sizes, and so on. If you want to join the data with any other data in Snowflake, you can do it easily. It's easier to manage security in a database using RBAC than on files in S3 using policy documents. Performance is also better running on top of Snowflake thanks to the custom Spark connector's pushdown capability: it pushes part or all of the Spark plan into Snowflake, including filters, projections, joins, and aggregates. This minimizes the amount of data the Spark cluster needs to pull into memory and the amount of work it has to do to process that data.
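The pushdown idea can be shown without a Spark cluster. The sketch below is a toy model under made-up data, not the actual connector: without pushdown, every row crosses the wire and is filtered on the compute side; with pushdown, the filter runs at the source and only matching rows are transferred:

```python
# Hypothetical "warehouse" contents: 2000 rows across two regions.
ROWS = ([{"region": "NA", "sales": s} for s in range(1000)] +
        [{"region": "EU", "sales": s} for s in range(1000)])

def scan_no_pushdown(predicate):
    """Pull every row to the compute side, then filter there."""
    transferred = list(ROWS)                  # all 2000 rows move
    return [r for r in transferred if predicate(r)], len(transferred)

def scan_with_pushdown(predicate):
    """Evaluate the filter at the source; only matches move."""
    transferred = [r for r in ROWS if predicate(r)]
    return transferred, len(transferred)

pred = lambda r: r["region"] == "NA" and r["sales"] >= 990
res1, moved1 = scan_no_pushdown(pred)
res2, moved2 = scan_with_pushdown(pred)
print(len(res1), moved1)  # 10 2000
print(len(res2), moved2)  # 10 10
```

Both scans return the same 10 rows, but pushdown moves 200x less data, which is exactly the saving the connector aims for on filters, projections, joins, and aggregates.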

Unique Snowflake Features
JSON: ingest raw JSON without transformation; query JSON with SQL and correlate it against relational data.
Cloning: instant dev/test environments or point-in-time snapshots.
Time Travel: query data as of any point in time within the past 90 days.
Query caching: instant results for executive dashboards and commonly run reports.
Backups: automatic cross-data-center replication.
Data Sharing: publish or consume data sets to or from external clients without direct system access.
Auto-scaling: dynamic horizontal scaling for concurrency to deliver consistent SLAs.
Central data store: get everyone onto one platform.
Upgrades: weekly system updates with zero downtime.
Security: encryption by default.
Chargeback: monitor business usage to understand how much each user costs you.
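Time Travel is the least familiar item on this list, so here is a toy model of the concept only (Snowflake's real implementation is micro-partition metadata, not full snapshots): every write records a timestamped version, and an "as of" query replays the most recent version at or before the requested time:

```python
import bisect

class TimeTravelTable:
    """Toy model of querying a table AT a past timestamp."""

    def __init__(self):
        self._times = []   # write timestamps, ascending
        self._snaps = []   # table contents after each write

    def write(self, ts, rows):
        """Record the table state as of timestamp ts."""
        self._times.append(ts)
        self._snaps.append(list(rows))

    def at(self, ts):
        """Return the table contents as of timestamp ts."""
        i = bisect.bisect_right(self._times, ts) - 1
        return self._snaps[i] if i >= 0 else []

t = TimeTravelTable()
t.write(100, ["v1"])
t.write(200, ["v1", "v2"])
print(t.at(150))  # ['v1']        (state before the second write)
print(t.at(250))  # ['v1', 'v2']  (current state)
```

The same versioned-state idea underlies zero-copy cloning: a clone is just a new name pointing at an existing version, so no data is copied at clone time.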

Big Data Fabric (Data Virtualization)
Sources: AWS S3, Snowflake, Facebook, Zendesk, PayPal, ShipStation, Google Analytics, Adobe Analytics, Amazon Marketing, NLP, Shopify.

Humanizing Data Insights
Although big data and analytics have made data more accessible to business users, extracting insights still requires human effort. Automation enables a business user (e.g., a sales rep) to post a question in conversational language to a chatbot (e.g., "What are the Q3 sales trends for Product A in North America?") and receive an answer with fully humanized data insights (e.g., "The total Q3 sales for Product A in North America totaled $200.4M, a 15% increase from Q3 last year, but only a 5% increase from last quarter.").
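The "humanized" answer is essentially raw figures rendered through a sentence template. A minimal sketch, with the function and field names invented for illustration (the talk does not describe the actual implementation):

```python
def humanize(product, region, quarter, total_m, yoy_pct, qoq_pct):
    """Render query results as the kind of conversational answer the
    chatbot returns. All parameter names here are illustrative."""
    return (f"The total {quarter} sales for {product} in {region} "
            f"totaled ${total_m}M, a {yoy_pct}% increase from "
            f"{quarter} last year, but only a {qoq_pct}% increase "
            f"from last quarter.")

# Figures from the slide's example answer
print(humanize("Product A", "North America", "Q3", 200.4, 15, 5))
```

A real pipeline would sit between the question and this template: parse the question into a query, run it against the warehouse, then fill the template with the results.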

ANUVAAD
Provides quick answers to supply chain queries asked in English. Enter a question, click Send, and wait about 15 seconds for the result. The screen shows the question asked, the result, and statistics.

Insights

Operations

Retail Pricing

POS

Sentiment Analysis

Video Analysis

Text Analysis

IOT

Please complete the session survey from the Session Details screen in your TC18 app

#TC18 Thank you!