Pagely.com implements log analytics with AWS Glue and Amazon Athena using Beyondsoft's ConvergDB
- Vernon Patrick
- 5 years ago
Pagely is the market leader in managed WordPress hosting, and an AWS Advanced Technology, SaaS, and Public Sector partner. We provide various tiers of high-performance WordPress hosting services for enterprise-level customers like BMC, Unicef, Northwestern University, and the City of Boston, offering flexibility in our solutions and the industry's best expert-only, tier-less support. Pagely uses a proprietary tech stack that accelerates WordPress sites through our own ARES Web Application gateway, PressCACHE, and PressCDN technologies, as well as open source tools such as Redis and Nginx.

To answer usage, billing, and other customer questions, our service team requires access to the logs created by the application servers. Historically, we relied on a shell script that gathered basic statistics on demand. The job to process the logs for our largest customer ran for 8 hours or more for a single report, sometimes crashing due to resource limitations. Instead of putting more effort into fixing a legacy process, we decided it was time to implement a proper analytics platform.

Amazon Athena allows us to run SQL queries directly against the logs, which are stored as compressed JSON files in Amazon S3. This approach is great because there is no need to prepare the data: simply define the table and query away.

While JSON is a supported format for Amazon Athena, it is not the most efficient format at query time. JSON files must be read in their entirety, even if you are only using one or two fields from each row of data. Besides not being cost effective, the inefficiency of processing JSON causes longer query times. Querying the logs of our largest customers was not ideal with Athena, as we ran into the 30-minute query timeout limit. This limit can be increased, but the query was already taking longer than we wanted.
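To make the cost of reading whole JSON rows concrete, here is a minimal Python sketch. It compares the bytes touched when every JSON line must be read against the bytes touched if only one field's values were stored together. The records and field names are hypothetical, and the sketch ignores compression and real file encodings, so treat it as an illustration of the idea rather than a model of Athena's actual scan accounting.

```python
import json

# Hypothetical access-log records; field names are illustrative assumptions.
RECORDS = [
    {"ts": 1527811200, "host": "example.com", "path": "/index.php", "status": 200, "bytes": 5120},
    {"ts": 1527811201, "host": "example.com", "path": "/wp-login.php", "status": 302, "bytes": 312},
    {"ts": 1527811202, "host": "example.org", "path": "/feed/", "status": 200, "bytes": 20480},
]


def bytes_read_row_oriented(records):
    """JSON lines: each whole line is read, whichever fields the query uses."""
    # +1 accounts for the newline that terminates each JSON line.
    return sum(len(json.dumps(r).encode("utf-8")) + 1 for r in records)


def bytes_read_column_oriented(records, fields):
    """Column-wise layout: only the values of the requested fields are read."""
    return sum(len(json.dumps(r[f]).encode("utf-8")) for r in records for f in fields)


if __name__ == "__main__":
    row = bytes_read_row_oriented(RECORDS)
    col = bytes_read_column_oriented(RECORDS, ["status"])
    print(f"row-oriented: {row} bytes, status column only: {col} bytes")
```

Even in this toy case, a query that needs only `status` touches a small fraction of the bytes it would have to read from full JSON lines.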
Partitioning and Columnar Formats

The best practice for structuring the data in S3 is twofold: partitioning and a columnar file structure. Partitioning is the process of splitting data into different prefixes or folders on S3 with a naming convention that is best suited to efficient retrieval of data. This allows Athena to skip over data that is not relevant to the particular query being executed.

Apache Parquet is a columnar file format popular with tools in the Hadoop ecosystem. Parquet stores the columns of the data in separate, contiguous regions in the file. Directed by metadata footers, tools like Athena can read only the sections of the file that are needed to fulfill the query, eliminating a large portion of the IO and network transfer.

Reducing IO through partitioning and Parquet files not only increases query performance, it can also dramatically reduce the cost of using Athena.

Engaging Beyondsoft
We knew that we needed to transform our data into partitioned Parquet to make it performant with Athena, but being a lean shop, we didn't have the bandwidth to dive into the technologies. To bridge the gap, we engaged Beyondsoft, an AWS Advanced Partner, to optimize our data lake using their open source tool, ConvergDB.

ConvergDB

ConvergDB is a DevOps-friendly approach to managing serverless data lakes. Tables are defined using technology-agnostic schema definitions, which are then deployed to concrete cloud services (such as Glue and Athena) through HashiCorp Terraform. The schema and deployment definitions provide a single point of management for the structure and behavior of the data as it flows through the cloud. ConvergDB does not require servers to operate; it is used either locally on a user's machine or in a CI/CD pipeline.

The appeal of managing our data with ConvergDB is that we can design our data lake by defining only the important elements. The schema files are used to define tables, including field-level SQL expressions that transform the incoming data as it is being loaded. This makes it easy to derive calculated fields, as well as the fields used for data partitioning. Once the schema is defined, the deployment file allows us to place the tables into an ETL job that manages them. The ETL job schedule is specified in the deployment file, along with optional fields such as the target S3 bucket and the number of Glue DPUs to use at run time.

ConvergDB is a command line binary and does not need to be installed on a server. All of the artifacts are files that can be managed with source control. This makes ConvergDB easy to integrate into CI/CD pipelines created with the tooling of your choice.
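The idea of deriving a partitioning field while data is loaded can be sketched in a few lines of Python. This is not ConvergDB's schema syntax, which is not shown here. The function names, the `ts` field, the table name, and the Hive-style `dt=YYYY-MM-DD` prefix layout are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_date(epoch_seconds: int) -> str:
    """Derive a dt partition value (UTC calendar date) from an epoch timestamp."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def partition_prefix(table: str, record: dict) -> str:
    """Build a Hive-style S3 prefix such as 'gateway_logs/dt=2018-06-01/'."""
    return f"{table}/dt={partition_date(record['ts'])}/"


def group_by_partition(table: str, records: list) -> dict:
    """Group records by target prefix so each partition is written together."""
    groups = defaultdict(list)
    for record in records:
        groups[partition_prefix(table, record)].append(record)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical records: two on 2018-06-01 (UTC) and one on the following day.
    sample = [{"ts": 1527811200}, {"ts": 1527811260}, {"ts": 1527897600}]
    for prefix, rows in group_by_partition("gateway_logs", sample).items():
        print(prefix, len(rows))
```

With data laid out this way, a query that filters on `dt` only needs to read the matching prefixes.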
The ConvergDB binary takes in all of the configuration files, then outputs a Terraform configuration containing all of the artifacts necessary to deploy the data lake, such as ETL scripts, table and database definitions, IAM policies necessary to run the jobs, SNS notification topics, and even a CloudWatch dashboard showing the volume of data processed by ConvergDB ETL jobs.

Speed Bumps

No implementation goes perfectly. The next sections are provided by Jeremy Winters, a Beyondsoft engineer, explaining the problems they ran into and how they were addressed.

Small File Problem

A classic issue encountered with Hadoop ecosystem tools is known as the "small file problem". Processing a large number of small files creates a lot of overhead for the system, causing job execution times to skyrocket and jobs to potentially fail. Pagely had approximately 4TB of history across 30 million files. [...] million of these files represented only 1.2TB of the data in S3.

To analyze this issue, we enabled S3 inventory reporting on the source data bucket. The report is delivered daily in ORC format. From there it is very easy to create an Athena table to analyze the bucket contents with SQL. We used Athena to identify S3 prefixes that were "hot spots", that is, prefixes holding a large number of small files. We identified prefixes with less than 1GB of data that we could consolidate. So [...] million files were consolidated into [...] files.

The following query is a way to identify small file hot spots. The group by expression can be adapted to your data. The example shows a way of grouping by the first folder in the bucket.

    select
       -- we are looking at the first string in a / delimited path
       -- if the key is path_to_data/ json.. it will group on path_to_data
       split_part(key,'/',1) as prefix
       -- calculate the total size in mb for all files in prefix
       ,sum(size)/cast(1024*1024 as double) as mb
       -- count of objects in the prefix
       ,count(*) as object_count
    from
       pagely_gateway_logs
    where
       -- assumes that versioning is disabled
       -- you should use the latest date after
       -- refreshing all partitions
       dt = ' '
    group by 1
    having
       -- only return prefixes with a total size of less than 1 gb
       -- and a file count of 8 or more
       sum(size)/cast(1024*1024 as double) < 1024
       and count(*) >= 8

The results show prefixes in your object paths that can, and should, be consolidated. Anything less than 1GB with 8 or more files can then be consolidated into a single object, replacing the originals.

To perform the actual consolidation, we ran a containerized script using Fargate, the serverless Docker container feature of ECS. Each worker container instance processed the files for a given S3 key prefix. A governor container managed the lifecycle of the workers, limiting concurrency and keeping track of which jobs succeeded. Using Fargate, we were able to perform the consolidation of all the small files for $27.

Historical Data
Daily data volumes for Pagely logs are in the tens of GB, easily handled by the smallest AWS Glue configuration. Transforming the 4TB compressed (~28TB uncompressed) of historical data was a bit more challenging. For example, if you are 20 hours into a data transformation and the job tries to process a file with an incorrect S3 ACL, the entire job will fail, resulting in 20 hours of wasted compute resources.

ConvergDB mitigates the risk of wasting compute resources by batching the data into smaller chunks. If a 20-hour job fails, only the last batch is lost, which amounts to around one hour of compute. ConvergDB uses its own state tracking mechanism to communicate the failure to the next run of the job, which cleans up any mess before trying to process the batch again. Batching is an automatic feature of the ETL job created by ConvergDB, based upon the size of the Glue cluster.

Post-deployment at Pagely

Now that our data lake is in production, our legacy report for a medium-size application, which took 91 seconds with the legacy process, runs in 5 seconds from Athena against the new tables, a gain of 18x. Our largest data set breaks our legacy process and is not performant when querying the JSON directly with Athena, but the new tables enable completion of the analysis in 24 seconds.

                    Legacy Process   Athena with JSON   Athena with Parquet
Medium Customer     1m 31s           1m 6s              5s
Largest Customer    > 8 hours        > 30 min           24s

While these numbers are obviously important, the biggest advantage is that we no longer have to worry about performance and cost, and engineers can focus on solving problems: after 15 minutes of writing queries, the entire team has access to new data. I was able to upgrade the legacy process with queries dispatched to Athena through the AWS SDK. This process can now run on any lightweight machine (like my laptop) while Athena does the heavy lifting.

About Beyondsoft Consulting, Inc.

Beyondsoft Consulting, Inc. is a leading cloud consulting, services, and technology company. Beyondsoft delivers solutions and services globally and across many verticals. Our team of highly skilled professionals, coupled with our focus on customer success, truly separates us as an Amazon Web Services Advanced Partner.
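As a closing illustration of the batching idea described under Historical Data, the following Python sketch shows one way to split a long transformation into size-bounded batches and record which batches finished, so that a rerun after a failure only repeats the batch that was lost. It is not ConvergDB's implementation. The function names, the byte-based batch limit, and the in-memory `done` set standing in for persistent state tracking are all assumptions.

```python
from typing import Callable, Iterable, Iterator, List, Set, Tuple


def make_batches(files: Iterable[Tuple[str, int]], max_batch_bytes: int) -> Iterator[List[str]]:
    """Yield lists of object keys whose combined size stays within the limit.

    A single file larger than the limit still gets a batch of its own.
    """
    batch: List[str] = []
    size = 0
    for key, nbytes in files:
        if batch and size + nbytes > max_batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(key)
        size += nbytes
    if batch:
        yield batch


def run_batches(
    files: List[Tuple[str, int]],
    max_batch_bytes: int,
    process: Callable[[List[str]], None],
    done: Set[int],
) -> None:
    """Process batches in order, skipping those already recorded as done.

    If process() raises, earlier batches stay recorded in `done`, so the
    next call resumes at the failed batch instead of starting over.
    """
    for index, batch in enumerate(make_batches(files, max_batch_bytes)):
        if index in done:
            continue
        process(batch)   # may raise, e.g. on an unreadable object
        done.add(index)  # record success only after the batch completes
```

In a real job the `done` set would need to live somewhere durable, and the batch boundaries would have to be reproducible between runs for the recorded indexes to stay meaningful.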
2018 Database DevOps Survey 2017 DBmaestro 1 Table of Contents Executive Summary... 3 What Percentage of IT Projects in Your Company Use a DevOps Approach?... 4 Integration of DBAs with DevOps Teams...
More information4) An organization needs a data store to handle the following data types and access patterns:
1) A company needs to deploy a data lake solution for their data scientists in which all company data is accessible and stored in a central S3 bucket. The company segregates the data by business unit,
More informationScaling DreamFactory
Scaling DreamFactory This white paper is designed to provide information to enterprise customers about how to scale a DreamFactory Instance. The sections below talk about horizontal, vertical, and cloud
More informationFuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc
Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationSALESFORCE DEVELOPER LIMITS AND ALLOCATIONS QUICK REFERENCE
SALESFORCE DEVELOPER LIMITS AND ALLOCATIONS QUICK REFERENCE Summary Find the most critical limits for developing Lightning Platform applications. About This Quick Reference This quick reference provides
More informationWhite Paper / Azure Data Platform: Ingest
White Paper / Azure Data Platform: Ingest Contents White Paper / Azure Data Platform: Ingest... 1 Versioning... 2 Meta Data... 2 Foreword... 3 Prerequisites... 3 Azure Data Platform... 4 Flowchart Guidance...
More informationMicrosoft Perform Data Engineering on Microsoft Azure HDInsight.
Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight http://killexams.com/pass4sure/exam-detail/70-775 QUESTION: 30 You are building a security tracking solution in Apache Kafka to parse
More informationIntroduction to Database Services
Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational
More informationDocumentation. This PDF was generated for your convenience. For the latest documentation, always see
Management Pack for AWS 1.50 Table of Contents Home... 1 Release Notes... 3 What's New in Release 1.50... 4 Known Problems and Workarounds... 5 Get started... 7 Key concepts... 8 Install... 10 Installation
More informationThe OLX data theory of everything
The OLX data theory of everything Caspar Schönau Head of Global BI Jakub Orłowski Data engineering manager The biggest internet company that you have never heard of Founded 1915 South-Africa Market cap:
More informationIntroduction to AWS GoldBase. A Solution to Automate Security, Compliance, and Governance in AWS
Introduction to AWS GoldBase A Solution to Automate Security, Compliance, and Governance in AWS September 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationPocket: Elastic Ephemeral Storage for Serverless Analytics
Pocket: Elastic Ephemeral Storage for Serverless Analytics Ana Klimovic*, Yawen Wang*, Patrick Stuedi +, Animesh Trivedi +, Jonas Pfefferle +, Christos Kozyrakis* *Stanford University, + IBM Research 1
More informationCloud Analytics and Business Intelligence on AWS
Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse
More information