Welcome
TC18: Data Analytics at Logitech. Snowflake + Tableau = #Winning. Avinash Deshpande
I am a futurist, scientist, engineer, designer, and data evangelist at heart. Find me: Avinash Deshpande, Chief Software Architect, www.linkedin.com/in/avinashpd1
Logitech Data Use Cases (data velocity spanning batch to real-time; structured, semi-structured, and unstructured data):
- Natural Language Processing (NLP)
- VR gaming
- Marketing funnel
- Predictive analytics
- Sales channel management
- IoT
- Retail data scraping
- Social media sentiment
- Security
- Video analysis
- Smart home device events
- Demand forecasting
- Price violations on retail sites
- Multi-site ERP
- Machine learning
- Data warehousing
- Text mining
Analytics at Scale Supporting Our Growing Business
Real-Time, On-Demand Delivery to Your Phone, Desktop, and Dashboard
- Executive summaries
- Customer by product / product by customer
- Demand/supply updates
- Market analytics / market share
- Marketing reports
- Competitive analysis
- Sentiment
- Consumer persona generation
- Granular consumer segmentation
- Marketing spend optimization
- Consumer value management
- Consumer lifetime value analysis
- Context-based marketing
Cloud Empowers IT Organizations to Redefine the Way Data Services Are Produced and Delivered
- Scalable: efficient, elastic infrastructure; simple, secure, robust, and scalable; pay as you use
- Reliable: managed services
- Governed: transparency on usage patterns
- Breadth of services
Need for Data Virtualization
- Abstract access to disparate data sources
- A single semantic repository
- Optimized, real-time data availability for consumers
- Centralized, governed, and secured data layer
Improve the User Experience
User Pain: "My report is always slower when I want to use it (peak business hours)."
Snowflake can flex up compute power in seconds. Business users can have their own isolated instance of right-sized compute, so performance is always consistent for the work they do and is not impacted by what others are doing.
Improve the User Experience
User Pain: "I want access to more historical data than I have today."
Snowflake's low-cost, fast, infinitely scalable storage layer removes the limitations on adding and keeping more historical data than typical data warehouse solutions allow.
Improve the User Experience
User Pain: "Commonly used reports always seem to be slow."
Snowflake has the unique ability to globally cache the results of commonly used queries sent via Tableau. Commonly used workbooks are therefore almost always served from cache, and end users see extremely fast performance regardless of how many people are running the same workbook.
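The result-caching behavior described above can be illustrated with a minimal sketch in plain Python. This is not Snowflake's API; `CachingEngine`, `run_query`, and the execution counter are invented stand-ins for the idea that identical query text is served from cache without recomputation.

```python
# Minimal sketch of a query result cache: repeated runs of the same query
# text are served from cache, so dashboard reloads skip recomputation.
# All names here are illustrative, not Snowflake's actual API.

class CachingEngine:
    def __init__(self):
        self.cache = {}          # query text -> cached result
        self.executions = 0      # how many times we actually computed

    def run_query(self, sql, compute):
        key = sql.strip().lower()        # normalize the query text
        if key in self.cache:
            return self.cache[key]       # cache hit: no compute used
        self.executions += 1
        result = compute()               # cache miss: do the work
        self.cache[key] = result
        return result

engine = CachingEngine()
expensive = lambda: sum(range(1_000_000))   # stand-in for a heavy query

r1 = engine.run_query("SELECT SUM(x) FROM t", expensive)
r2 = engine.run_query("select sum(x) from t", expensive)  # same query: cached
assert r1 == r2 and engine.executions == 1
```

The key design point is that the cache is keyed on the query itself, which is why many users opening the same workbook all benefit from one computation.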
Improve the User Experience
User Pain: "I want to explore non-traditional data sets that aren't currently available."
Unlike traditional DW solutions, Snowflake treats semi-structured data types like JSON, Avro, and XML as first-class citizens (direct SQL access with fast performance). Data is immediately available without complex ETL.
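In Snowflake, raw JSON lands in a VARIANT column and nested fields are addressed with path expressions (e.g. `v:device.model` in SQL). The sketch below mimics that path-style access in plain Python; the event shape and the `get_path` helper are invented for illustration.

```python
import json

# Sketch of VARIANT-style path access: query nested JSON fields directly,
# with no upfront ETL into fixed columns. `get_path` is an illustrative
# helper, not Snowflake's API.

raw = ('{"device": {"model": "MX Master", "firmware": "1.2"},'
       ' "events": [{"type": "click"}, {"type": "scroll"}]}')
record = json.loads(raw)  # ingest the raw JSON as-is

def get_path(doc, path):
    """Walk a dotted path like 'device.model' through nested dicts."""
    for part in path.split("."):
        doc = doc[part]
    return doc

assert get_path(record, "device.model") == "MX Master"
# Arrays stay queryable too (Snowflake's FLATTEN plays a similar role):
assert [e["type"] for e in record["events"]] == ["click", "scroll"]
```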
Improve the User Experience
User Pain: "I'm tired of waiting for new data to be loaded into the system."
Snowflake's unique architecture lets customers implement new data ingestion processes such as 24/7 continuous loading, so end users see their data in near real time instead of via the traditional nightly batch. Use a Tableau live connection rather than an extract.
EDW Solution Architecture (diagram): data producers (EBS on Exadata) feed a business layer on AWS, which serves the reporting / advanced analytics layer and delivers reports to data consumers.
IoT Solution Architecture (diagram): edge compute feeds Kafka, with ingestion into the business and reporting / advanced analytics layers for data consumers.
- Use Snowpipe to enable real-time ingestion
- Keep raw data in semi-structured JSON format
- Create structured objects with cleaned and/or aggregated data
- Denodo views: create business-specific views for reporting
- Reports
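The raw-to-cleaned-to-aggregated flow above can be sketched end to end in plain Python. The event shape, field names, and cleaning rule are invented for illustration; in the real pipeline Snowpipe lands the raw JSON and SQL builds the structured objects.

```python
import json
from collections import defaultdict

# Sketch of the IoT pipeline: keep raw device events as JSON, then derive
# a cleaned, aggregated structured view. Event shape is illustrative only.

raw_events = [
    '{"device_id": "d1", "metric": "clicks", "value": 3}',
    '{"device_id": "d1", "metric": "clicks", "value": 2}',
    '{"device_id": "d2", "metric": "clicks", "value": 5}',
    '{"device_id": "d2", "metric": "clicks"}',       # malformed: no value
]

raw_table = [json.loads(e) for e in raw_events]      # raw, semi-structured

clean = [e for e in raw_table if "value" in e]       # cleaning step
totals = defaultdict(int)
for e in clean:                                      # aggregation step
    totals[e["device_id"]] += e["value"]

assert dict(totals) == {"d1": 5, "d2": 5}
```

Keeping the raw JSON around (rather than discarding it after transformation) is what lets new questions be asked of old events later.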
SNOWFLAKE BENCHMARK
Other Popular Columnar DB
- Architecture/Storage: traditional shared-nothing architecture. Data lives on EC2 nodes, requiring costly 24/7 uptime even when not in use.
- Data Types: requires additional tools (Hadoop, Mongo, etc.) to ingest and make semi-structured data available.
- Scalability: extended process to resize compute resources to accommodate additional demand.
- Concurrency: published limits of 50 concurrent users/queries, but generally slows down around 15.
- Administration/Design: need to continually manage vacuuming, distribution/sort keys, compression, metadata, indexing, backups, etc. Need to understand the data model in advance.
Snowflake
- Architecture/Storage: multi-cluster shared-data architecture. Data stored in S3, allowing multiple EC2 compute clusters to access it simultaneously without contention.
- Data Types: ingest and query raw JSON, XML, Avro, and Parquet without prior transformation.
- Scalability: data not coupled to compute, so compute can be resized instantly and shut down when not in use.
- Concurrency: isolate users on separate compute resources to avoid contention; the auto-scale feature scales compute horizontally to support concurrent workloads.
- Administration/Design: zero; free up your DBA team for other tasks. Load data in real time without needing a model up front.
Athena
- Difficult to set up and tune for performance
- Provides no options for the end user to influence performance
- Difficult to manage usage: resource usage over time, queries and data retrieved, cost to increase capacity and support
- Need to add partitions
- By default, concurrency limits allow twenty concurrent DDL queries and twenty concurrent SELECT queries, and query timeout is 30 minutes
- Schema needed ahead of time
- Data must be converted to columnar format for performance
Snowflake
- Performance out of the box; advanced tuning with auto-clustering
- Allows reserving various compute configurations as needed
- Usage can be segregated at the compute level
- Horizontal and vertical scaling without downtime
- Cost is consistent
- No need to add partitions
- Default concurrency is 300 (15x) and can be raised if necessary
- Schema on read
- Columnar format by default
Spark on Snowflake
- It's easier to manage data in tables than in files on S3. If you ever need to dedupe, update, or delete data, you can do that with standard SQL in Snowflake, but you need to write a program to do it on S3.
- To get good performance on files in S3, you have to optimize file formats, partition sizes, etc.
- If you want to join the data with any other data in Snowflake, you can do it easily.
- It's easier to manage security in a database using RBAC than on files in S3 using policy documents.
- Performance is better running on top of Snowflake thanks to the custom Spark connector's pushdown capability. That feature pushes part or all of the Spark plan into Snowflake, including filters, projections, joins, and aggregates, minimizing the amount of data the Spark cluster needs to pull into memory and the work it has to do to process that data.
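The pushdown idea can be sketched without Spark or Snowflake: apply the filter and projection at the data source so fewer rows and columns ever cross the wire to the compute cluster. The rows and function names below are invented for illustration; the real connector does this by rewriting part of the Spark plan into Snowflake SQL.

```python
# Sketch of predicate/projection pushdown: filter and project at the
# source so less data travels to the compute cluster. Illustrative only.

source = [
    {"region": "NA", "product": "A", "sales": 100, "notes": "..."},
    {"region": "EU", "product": "A", "sales": 80,  "notes": "..."},
    {"region": "NA", "product": "B", "sales": 60,  "notes": "..."},
]

def scan_without_pushdown(rows):
    return list(rows)                    # everything crosses the wire

def scan_with_pushdown(rows, predicate, columns):
    # filter + project before transfer, at the source
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

full = scan_without_pushdown(source)
pushed = scan_with_pushdown(source,
                            lambda r: r["region"] == "NA",
                            ["product", "sales"])

assert len(full) == 3        # no pushdown: all rows, all columns transferred
assert pushed == [{"product": "A", "sales": 100},
                  {"product": "B", "sales": 60}]
```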
Unique Snowflake Features
- JSON: ingest raw JSON without transformation; query JSON with SQL and correlate it against relational data
- Cloning: instant dev/test environments or point-in-time snapshots
- Time Travel: query data as of any point in time within the past 90 days
- Query caching: instant results for executive dashboards and commonly run reports
- Backups: automatic cross-data-center replication
- Data sharing: publish or consume data sets to or from external clients without direct system access
- Auto-scaling: dynamic horizontal scaling for concurrency to deliver consistent SLAs
- Central data store: get everyone on one platform
- Upgrades: weekly system updates with zero downtime
- Security: encryption by default
- Chargeback: monitor business usage to understand how much each user costs you
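The cloning feature above rests on copy-on-write: a clone initially shares the parent's storage and only materializes its own data when it is modified, which is why creating one is instant. A minimal sketch of that idea, with an invented `Table` class (Snowflake actually does this at the micro-partition level):

```python
# Sketch of zero-copy cloning via copy-on-write. Illustrative only;
# the class and methods are not Snowflake's API.

class Table:
    def __init__(self, rows):
        self._rows = rows                 # shared storage reference

    def clone(self):
        return Table(self._rows)          # instant: no data copied

    def insert(self, row):
        # copy-on-write: build a new list for this table only,
        # leaving any tables still sharing the old storage untouched
        self._rows = self._rows + [row]

prod = Table([{"id": 1}, {"id": 2}])
dev = prod.clone()                        # instant dev/test environment
dev.insert({"id": 3})                     # dev diverges...

assert len(prod._rows) == 2               # ...production is untouched
assert len(dev._rows) == 3
```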
Big Data Fabric: data virtualization across AWS S3, Snowflake, Facebook, Zendesk, PayPal, ShipStation, Google Analytics, Adobe Analytics, Amazon Marketing, NLP, and Shopify.
Humanizing Data Insights
Although big data and analytics have made data more accessible to business users, extracting insights still requires human effort. Automation enables a business user (e.g., a sales rep) to ask a chatbot a question in conversational language (e.g., "What are the Q3 sales trends for Product A in North America?") and receive an answer with fully humanized data insights (e.g., "The total Q3 sales for Product A in North America totaled $200.4M, a 15% increase from Q3 last year, but only a 5% increase from last quarter.").
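The "humanizing" step above can be sketched as a simple template over computed comparisons: take the raw total plus year-over-year and quarter-over-quarter baselines and render the conversational sentence. The function name and all figures below are invented for illustration, not the slide's actual data.

```python
# Sketch of humanizing a metric into a conversational answer.
# Names and numbers are illustrative only.

def humanize(product, region, quarter, total, yoy_base, qoq_base):
    yoy = (total - yoy_base) / yoy_base * 100   # change vs same quarter last year
    qoq = (total - qoq_base) / qoq_base * 100   # change vs last quarter
    return (f"The total {quarter} sales for {product} in {region} totaled "
            f"${total / 1e6:.1f}M, a {yoy:.0f}% increase from {quarter} "
            f"last year, but only a {qoq:.0f}% increase from last quarter.")

msg = humanize("Product A", "North America", "Q3",
               total=220_000_000, yoy_base=200_000_000, qoq_base=210_000_000)

assert "220.0M" in msg
assert "10% increase" in msg   # (220 - 200) / 200 = 10%
```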
ANUVAAD: provides quick answers to your supply chain queries asked in English. Enter a question, click Send, and wait about 15 seconds for the result; the screen shows the question asked, the result, and statistics.
Insights
Operations
Retail Pricing
POS
Sentiment Analysis
Video Analysis
Text Analysis
IoT
Please complete the session survey from the Session Details screen in your TC18 app
#TC18 Thank you!