Scaling Massive Content Stores in the Cloud. CloudExpo New York, June 2016. Alfresco Founder & CTO


Scaling Massive Content Stores in the Cloud
CloudExpo New York, June 2016
John Newton (@johnnewton), Alfresco Founder & CTO

Alfresco Customers
- Government
- Financial Services
- Healthcare
- Manufacturing
- Corporate

Somewhere in a secret underground location

someone is trying to store One Billion Documents!!!
(Image credit: http://www.warnerbros.com/austin-powers-international-man-mystery)

Some have attempted before and failed

Content Use Cases at Scale
- Enterprise Document Library
- Medical & Personnel Records
- Transaction & Logistics Records
- Government Records & Archives
- Claims & Case Processing
- Research & Analysis
- Real-time Video
- Internet of Things
- Discovery & Litigation
- Loans & Policies

Content Management Applications
- Document Library
- Search & Retrieval
- File Sync & Share
- Business Process Management
- Image Management
- Records Management
- Media Management
- Case Management
- Information Archiving

Content vs. Data vs. Files vs. EFSS
- Data
- Files
- EFSS
- Content and ECM

Content Architecture as a Big Data Problem
Diagram labels, in slide order:
- Context
- Access, Create, Manage, Distribute, Use
- Activities, Directory, Relationships, Categories, Types, Search, Security
- Content Object
- People, Processes / Tasks, Rules, APIs
- Indexes, Metadata, Files / Renditions, Semantics
- Solr / ElasticSearch, Database, Distributed FS, Database

Content at Scale in the Enterprise
- Users at Scale
- Geographic Distribution
- Read/Write Throughput
- Concurrency
- Content Count
- Volume Size

The Problem with Traditional Approaches
- Lack of Redundancy
- Lack of Elasticity
- Provisioning and Administration
- Geographic Distribution
- Lack of Agility

Content Management Architecture
- Alfresco Share
- Alfresco Repository
- Activiti Workflow Engine
- Database: RDS
- FS Content Store: EBS or Ephemeral, S3 (or Glacier)
- Alfresco SOLR Indexes: EC2, PIOPS EBS

Scaling in Tiers
Diagram labels, in slide order (repeated boxes indicate multiple instances of a tier):
- Alfresco Share (2 instances)
- Alfresco Activiti Suite
- Alfresco Transformation Server
- Alfresco Repository (2 instances)
- Alfresco Activiti Suite
- Alfresco Transformation Server
- Alfresco Local Repo (Index Tracking) (2 instances)
- Alfresco Solr (2 instances)

Data Meta-Model
- Columns: Model, Metadata, Organization
- Model: Class, Property, Association, Constraint
- Metadata: Type Folder (Property name); Child Association contains; Type Document (Association rendition; Property name, content); Aspect Auditable (Property who by when)
- Organization: Folder A, Folder B, Doc C, Doc D (rendition)
- Legend: Type, Aspect, Child Association
- Scale figures shown: 1 Billion, 15 Billion
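The meta-model on this slide (types, aspects, properties, and child associations) can be illustrated with a minimal Python sketch. This is not Alfresco's implementation or API; the `Node` class, its field names, and the reading that Doc D is a rendition of Doc C are assumptions made only to show how the pieces relate.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical content object: one type, optional aspects, free-form properties."""
    name: str
    type_name: str                                  # e.g. "Folder" or "Document"
    aspects: set = field(default_factory=set)       # e.g. {"Auditable"}
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)    # (association name, child Node) pairs

    def add_child(self, association, child):
        """Attach a child under a named association and return the child."""
        self.children.append((association, child))
        return child


def count_nodes(node):
    """Count a node plus everything reachable through its child associations."""
    return 1 + sum(count_nodes(child) for _, child in node.children)


# Assumed reading of the slide's example: Folder A contains Folder B and Doc C,
# and Doc D hangs off Doc C through a "rendition" association.
folder_a = Node("Folder A", "Folder", properties={"name": "Folder A"})
folder_b = folder_a.add_child("contains", Node("Folder B", "Folder"))
doc_c = folder_a.add_child("contains", Node("Doc C", "Document", aspects={"Auditable"}))
doc_d = doc_c.add_child("rendition", Node("Doc D", "Document"))

print(count_nodes(folder_a))
```

At the scale on the slide, every node carries several properties and associations, which is one way to see why the metadata store grows much faster than the document count.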

Next Generation Relational Architectures
MySQL with standby
- AZ 1, AZ 2
- Primary Instance, Standby Instance
- Amazon Elastic Block Store (EBS), with an EBS mirror in each AZ
- Sequential writes
- Amazon S3
Next Generation DBMS
- AZ 1, AZ 2, AZ 3
- Primary Instance, Replica Instance
- async 4/6 quorum
- Distributed writes
- PiTR
- Amazon S3
Key points
- Highly-available synchronous vs. asynchronous replication
- Significantly more efficient use of network I/O
- Self-healing, Fault-tolerant, Instant crash recovery
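The "4/6 quorum" on this slide can be made concrete with a small Python sketch. It assumes six storage copies spread as two per Availability Zone across three AZs, and a write that succeeds once any four copies acknowledge it; the function names are hypothetical and this models only the counting rule, not the database itself.

```python
from itertools import combinations

AZS = ("AZ1", "AZ2", "AZ3")
COPIES_PER_AZ = 2          # assumption: six copies in total, two per AZ
WRITE_QUORUM = 4           # the "4/6 quorum" from the slide

# Each copy is identified by (availability zone, index within the zone).
copies = [(az, i) for az in AZS for i in range(COPIES_PER_AZ)]


def can_write(failed):
    """A write succeeds if at least WRITE_QUORUM copies are still reachable."""
    alive = [c for c in copies if c not in failed]
    return len(alive) >= WRITE_QUORUM


def survives_any_az_loss():
    """True if writes still succeed after losing every copy in any single AZ."""
    return all(can_write({c for c in copies if c[0] == az}) for az in AZS)


def max_tolerated_copy_failures():
    """Largest k such that writes succeed no matter which k copies fail."""
    for k in range(len(copies) + 1):
        if not all(can_write(set(f)) for f in combinations(copies, k)):
            return k - 1
    return len(copies)


print(survives_any_az_loss())
print(max_tolerated_copy_failures())
```

Under these assumptions a whole AZ can disappear without blocking writes, which the two-AZ primary/standby layout on the left of the slide cannot offer in the same way.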

Index and Search Architecture
- Full-Text Query (x 20 instances)
- Text Extraction
- Term-hit Highlighting
- Results Processing
- Metadata Query
- Facets & Buckets
- Security Filters
- Metadata Injection & Path Processing
- Shingles
- ACL Processing
Credit: Ryan Tobora, ThinkBig, Teradata, http://thinkbig.teradata.com/solrcloud-terminology/
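A sharded full-text query with security filtering, as outlined on this slide, follows a scatter-gather pattern: each index instance answers for its own documents, hits the user may not read are dropped, and the partial results are merged. The Python sketch below is a toy illustration of that pattern only. It does not use Solr or its API, and the document layout (`id`, `text`, `acl`) and the term-count scoring are assumptions.

```python
import heapq


def search_shard(shard, term, user_groups):
    """Return (score, doc_id) hits from one shard that the user is allowed to read."""
    hits = []
    for doc in shard:
        # Toy relevance score: how often the term occurs in the text.
        score = doc["text"].lower().split().count(term.lower())
        # Security filter: keep the hit only if the user shares a group with the ACL.
        if score and user_groups & doc["acl"]:
            hits.append((score, doc["id"]))
    return hits


def search(shards, term, user_groups, limit=10):
    """Scatter the query to every shard, then merge the best hits overall."""
    all_hits = (hit for shard in shards for hit in search_shard(shard, term, user_groups))
    return heapq.nlargest(limit, all_hits)


# Two hypothetical shards standing in for the 20 index instances on the slide.
shards = [
    [{"id": "a", "text": "loan policy loan", "acl": {"finance"}}],
    [{"id": "b", "text": "loan record", "acl": {"hr"}},
     {"id": "c", "text": "loan", "acl": {"finance", "hr"}}],
]

print(search(shards, "loan", {"finance"}))
```

Filtering inside each shard, rather than after the merge, keeps unreadable documents from consuming slots in the top results.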

File Storage Architecture
- Access: APIs, File System Protocols, Direct Streaming
- Metadata: Aurora
- Content: S3, EBS (Storage Layer)
- Amazon Glacier (Archive Layer)
- In Place Content: AWS Import/Export

BM4 Test Execution Environment
- 1.2B Docs
- Simulate 500 Users (Selenium / Firefox)
- 1 hour constant load, 10 sec think time
- UI Test x 20, m3.2xlarge
- ELB
- Alfresco with Share and Repo x 10, c3.2xlarge
- Sharded Solr Cloud x 20, m3.2xlarge
- Simulate AWS Import/Export (in place)
- Aurora x 1, db.r3.xlarge

Dataset:
sites           10,804
folders         1,168,206
files           1,168,206,000
transactions    15,475,064
dbsize (GB)     3,185
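A few ratios follow directly from the figures on this slide. The Python sketch below derives them; the variable and function names are illustrative, the database size is assumed to be in decimal gigabytes, and the request-rate bound assumes each simulated user waits for the full think time between actions.

```python
# Figures copied from the dataset table on the slide.
sites = 10_804
folders = 1_168_206
files = 1_168_206_000
transactions = 15_475_064
db_size_gb = 3_185

files_per_folder = files / folders
files_per_site = files / sites
files_per_transaction = files / transactions
# Assumes decimal gigabytes (10**9 bytes); the slide does not state the unit base.
db_bytes_per_file = db_size_gb * 10**9 / files


def max_request_rate(users, think_time_s, response_time_s=0.0):
    """Upper bound on requests per second for a closed loop of simulated users."""
    return users / (think_time_s + response_time_s)


print(f"files per folder:      {files_per_folder:,.0f}")
print(f"files per site:        {files_per_site:,.0f}")
print(f"files per transaction: {files_per_transaction:,.1f}")
print(f"DB bytes per file:     {db_bytes_per_file:,.0f}")
print(f"max UI requests/sec:   {max_request_rate(500, 10):.0f}")
```

The database footprint works out to under 3 KB per file, and 500 users with a 10 second think time cannot generate more than 50 UI requests per second, less once response time is counted.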

Benchmark Results
- Document load rate: 1000 documents per second (with 10 nodes). 3 Million per Hour!
- Load rate stayed consistent even after passing the 1B document mark
- Sub-second login times and good responses for other actions:
  - Open Library: 4.5s
  - Page Results: 1s
  - Navigate to Site: 2.3s
- Aurora indexes used efficiently at 3.2TB
- No indications of any size-related bottlenecks with 1.1 Billion Documents
- CPU loads:
  - Database: 8-10%
  - Alfresco (each of 10 nodes): 25-30%
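The load-rate figures can be checked with simple arithmetic. The Python sketch below assumes the 1000 documents per second is sustained and evenly spread over the 10 nodes, and takes the total file count from the test environment slide; the variable names are illustrative.

```python
docs_per_second = 1_000
nodes = 10
total_docs = 1_168_206_000   # file count from the test environment slide

docs_per_hour = docs_per_second * 3_600
docs_per_node_per_second = docs_per_second / nodes   # assumes an even spread
load_hours = total_docs / docs_per_hour
load_days = load_hours / 24

print(f"documents per hour:       {docs_per_hour:,}")
print(f"documents per node/sec:   {docs_per_node_per_second:.0f}")
print(f"hours to load full set:   {load_hours:,.1f}")
print(f"days to load full set:    {load_days:,.1f}")
```

A sustained 1000 documents per second is 3.6 million per hour, so the "3 Million per Hour" on the slide may be rounded down or may reflect a somewhat lower sustained rate. At that pace the full data set takes roughly two weeks to load.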

What a Difference
Traditional
- Load Balancer
- ECM x 3
- Search x 3
- FS x 3
- HSM x 3
- Hardware x 3
- DR Plan
- 3-6 Months
- Questionable Scale
- Little Redundancy
- Lots of $$$
Cloud
- ELB
- Alfresco x 3
- Solr x 3
- EBS x 3
- S3
- EC2 x 3 (AZ1, AZ2, AZ3)
- < 30 mins
- 10x Faster
- Fault-Tolerant
- Open, Cost Effective

Well, what am I supposed to do with all this frickin hardware?!!

Thank you john.newton@alfresco.com @johnnewton