EsgynDB Enterprise 2.0 Platform Reference Architecture


This document outlines a Platform Reference Architecture for EsgynDB Enterprise, which is built on the Apache Trafodion (Incubating) implementation with licensed support and extensions. It outlines server configurations for various vendors and describes some of the considerations for sizing an EsgynDB application.

Contents
1 Introduction
2 Architecture
3 Capacity Planning
3.1 Processing Usage
3.2 Memory Usage
3.3 Disk Usage
3.4 Network Usage
4 Reference Architecture Guidance for Production Bare Metal Cluster
4.1 Medium/Large Deployment
4.2 Small Deployment
5 Cloud Deployment
6 Conclusion

1 Introduction

The Apache Trafodion (Incubating) project provides a full transactional SQL database integrated into the Apache Hadoop ecosystem to support operational workloads. EsgynDB Enterprise 2.0, built on Apache Trafodion, offers a fully supported, enterprise-ready version with extensions for additional features, including cross datacenter support in EsgynDB Enterprise Advanced 2.0.

This reference architecture describes a purpose-built EsgynDB Enterprise installation. Specifically, it describes the architecture and provisioning for a cluster whose purpose is running one or more EsgynDB application workloads. It does not describe a configuration where EsgynDB Enterprise is part of a wider Hadoop cluster running other ecosystem applications such as MapReduce. Clusters running mixed workloads can start from the sizing and provisioning information here, but the final sizing and provisioning must also incorporate requirements from the other workloads in the cluster. As such, it is beyond the scope of this document.

2 Architecture

Apache Trafodion provides an enterprise-class, web-scale database engine in the Hadoop ecosystem. In addition, Trafodion enables SQL query language and transactional semantics for native Apache HBase and Apache Hive tables. Trafodion provides transactional support for data stored in HBase.
It supports fully distributed ACID transactions across multiple statements, tables, and rows, which enables EsgynDB Enterprise to support operational workloads that are generally beyond most Hadoop ecosystem components.

06Dec Esgyn Corporation

EsgynDB Enterprise Release 2.0 extends Apache Trafodion by providing additional features such as cross datacenter support, using the architecture depicted below.

The architecture involves one or more clients concurrently using SQL queries to access the data managed by EsgynDB via a driver (ODBC/JDBC/ADO.NET). The driver library provides the connection and session between the application's query and the SQL engine layer; the application might or might not execute on the same cluster. In the SQL engine layer, a master query execution server process prepares and executes each query. Depending on the specifics of the workload, execution might involve a distributed transaction manager, or one or more groups of executor server processes (ESPs) that execute portions of the query plan in parallel. These groups of ESPs (for a given query, there might be zero or more groups) reflect the degree of parallelism for the query. The query can reference native HBase or Hive tables as well.

Ultimately, EsgynDB uses HDFS as the storage layer foundation, with an appropriate replication factor (usually 3, but 2 in some cloud configurations) to provide availability if a node fails.
Significant processes used for query processing include:

Process Name: DCS Master
Description: Initial connection point for locating a session-hosting mxosrvr
Distribution: On one single node
Count: One active per cluster, often configured with a floating IP for high availability.

Process Name: DCS Server
Description: Process that manages status and connection usage for mxosrvr processes
Distribution: On each node
Count: One for each node where mxosrvrs run.

Process Name: Master executor (mxosrvr)
Description: Master executor process that hosts the SQL session, does query compilation and execution of root operator
Distribution: Multiple on all data nodes in instance
Count: Count defines maximum number of concurrent sessions.

Process Name: Executor Server Process (ESP)
Description: Executes parallel fragments of SQL plans
Distribution: Multiple run on all data nodes in cluster, in variable size groups.
Count: Workload dependent: determined by concurrent parallel queries, query plan, and degree of parallelism.

Process Name: DTM
Description: Maintains transactional state and log outcome information for transactions.
Distribution: Runs on all data nodes in the instance.
Count: One per data node.

For the EsgynDB Enterprise 2.0 version, cross datacenter support is implemented via DTM, which communicates with Transaction Manager processes on the peer datacenter clusters to replicate the transactions on both clusters.

The EsgynDB Manager architecture was simplified in the above architecture picture to show its relationship to the query processing engine. The EsgynDB Manager subsystem architecture expands to multiple processes as depicted in the following picture:

EsgynDB Manager processes include:

Process Name: DB Manager
Description: Web application server that the browser connects to
Distribution: On one single node
Count: One per cluster on the first data node

Process Name: OpenTSDB
Description: Lightweight service processes for collecting time-series metrics
Distribution: On each node
Count: One per node

Process Name: TCollectors
Description: Collection scripts that collect time-based metrics at interval
Distribution: Multiple on all data nodes in instance; processes per node vary
Count: System and HBase metrics are collected on each node. EsgynDB metrics are collected cluster-wide from a process on the first data node.

Process Name: REST Server
Description: Process that handles REST requests from on- and off-cluster clients
Distribution: One per cluster
Count: One per cluster on the first data node

In addition to the listed processes used for query processing and manageability, there are other processes that are part of the EsgynDB stack supporting its runtime execution environment. These processes generally use fewer resources and have little material impact on platform sizing and provisioning.

EsgynDB Enterprise is integrated into the Hadoop ecosystem as depicted in the following picture:

The EsgynDB database engine uses HBase for storage services. As such, it relies on HBase configuration and tuning to achieve optimal performance. EsgynDB cluster provisioning must incorporate HBase configuration considerations.

HBase processes can be divided into two classes: control processes and data processes. Control processes are one-offs that are involved in managing the HBase system and managing its metadata. Data processes are processes that are involved in serving the data itself, including reading, updating, and writing (HBase scan, get, and put operations).

HBase control processes include:

Process: HMaster
Description: Metadata and table creation/deletion

Process: ZooKeeper
Description: Not an HBase process, but used for information management and coordination across nodes.

HBase data processes include:

Process: RegionServer
Description: Controls data serving, including servicing get/put, and separation of data into individual regions.

HBase in turn uses HDFS services for scalability, availability, and recovery (replication) within the cluster. As such, EsgynDB cluster provisioning must also incorporate HDFS configuration considerations, including replication. Control processes are singleton processes that manage the HDFS file system. In HDFS, they control the location for individual data blocks. Data processes are involved in reading and accessing that data.

HDFS control processes include:

Process: NameNode
Description: Manages the metadata files that are used to map blocks to individual files and select locations for replication.

Process: Secondary NameNode
Description: Gets a checkpoint of all metadata from the NameNode once per interval (one hour is the default). This data can be used to recreate the block -> file mappings if the NameNode is lost. However, it is not simply a hot backup for the NameNode.

HDFS data processes include:

Process: DataNode
Description: Serves up reads and writes from individual files, and sends periodic "I'm alive" messages, including the files/blocks it is managing, to the NameNode.

In addition to the HBase and HDFS control processes listed above, other control node processes include:

Process: Management Server Process
Description: Ambari, Cloudera Manager, etc. web page node. Some management servers perform detailed database and analytic functions.

In smaller clusters, control processes and data processes might reside on the same node. For larger clusters, management processes have significantly different provisioning requirements and so are often isolated on different nodes. The reference architecture assumes separate control and data nodes.

3 Capacity Planning

This section discusses issues and sizing recommendations to take into consideration when sizing an EsgynDB Enterprise database.

3.1 Processing Usage

When sizing the processing power for an EsgynDB Enterprise cluster, consider the following:

- In a typical high-performance configuration, nodes for management are configured separately from data nodes. The two types are typically provisioned differently for storage (size, configuration) as well as network and memory.
- In a very small or test configuration, the distinction between data nodes and control nodes is blurred, and most management processes are collocated with data processes for both Hadoop/HBase and EsgynDB.
So long as this configuration meets performance and availability objectives, it is valid, especially for basic development and test clusters.

Consider the following factors when assessing the required number of nodes:

- More nodes with fewer cores is preferable to an equivalent number of cores spread over fewer nodes, so long as the number of cores per node is reasonably modern (e.g., 8 or more) for typical production workloads. Scaling out (increasing the number of nodes to achieve the desired number of cores) is preferable to scaling up (increasing the cores per node to achieve the desired number of cores) because:
  o More nodes with fewer cores is typically cheaper than fewer nodes with more cores.
  o The domain of failure is smaller when losing a node or disk on a cluster with more nodes.
  o The available I/O bandwidth and parallelism is higher with more nodes.
- Clusters smaller than 3 nodes are not advised, given HDFS replication requirements for availability and recoverability.
- The number of simultaneous users (concurrency) drives the number of nodes connected to the external corporate network, as does the ingest rate for data arrival/refresh. This number determines the total number of mxosrvr processes. The actual connections are distributed around the cluster based on mxosrvr process distribution. Multiple mxosrvr processes can run on the same node.
- Types of workloads are the other key consideration for the number of nodes. The number of nodes and cores reflects the amount of parallelism available for concurrent users of the applications running on the cluster. If typical workloads are high-concurrency short queries, then thinner nodes might be acceptable. If typical workloads involve large scans, then more processing power is needed. Understand the types, frequency, plans, and typical concurrency for the application, ideally by prototyping the workloads and queries whenever possible.

3.2 Memory Usage

When sizing an EsgynDB Enterprise cluster for memory usage, keep in mind the following considerations:

- Many Hadoop ecosystem processes are Java processes. Due to memory efficiency optimizations in the JVM, there is a significant restriction at a heap size just below 32GB. Crossing this threshold actually results in less usable memory, because the JVM can no longer use compressed object pointers and the internal representation of pointers changes in a way that consumes significantly more space.
- Large memory consumers for data nodes include:
  o HDFS DataNode processes
  o HBase RegionServers
- Among control processes, the large memory consumers are:
  o HDFS NameNode processes
- Plan for these processes to use a heap size of 16-32GB each for optimal performance on a large cluster.
Reducing the memory for these components affects performance significantly, so do careful tuning and analysis before choosing a smaller value.
- The primary users of memory in the EsgynDB database engine are the mxosrvrs. Plan for 512MB (0.5GB) of memory for each concurrent connection on a node.

3.3 Disk Usage

When sizing an EsgynDB Enterprise cluster for disk usage, keep in mind the following considerations:

- For data nodes, SSD is only beneficial for high-concurrency writes. In general, HDD is sufficient.
- For control nodes, SSD is similarly not cost effective; the goal is to have most control information cached in memory.
- For data nodes, configure HDD data disks as direct attached storage in a JBOD (Just a Bunch of Disks) configuration. RAID striping slows down HDFS and actually reduces concurrency and recoverability.
- For control nodes, data disks can be configured as either JBOD or RAID1 or RAID10.

- As with processing power, disks are a unit of parallelism. For a given total disk capacity per node, if workloads include many large scans, it is often most effective to have more smaller disks than fewer larger disks per node on data nodes. The reference architecture assumes that most workloads include large scans.
- HBase SNAPPY or GZ compression is strongly suggested. SNAPPY has less CPU overhead, but GZ compresses better. The degree of compression varies widely depending on the data and workload patterns, but generally accepted calculations suggest around a 30%-40% reduction, depending on data. Compression adds to the path length for reading and writing, which can have an effect on data growth and ingest. Compression happens at the HBase file block level, limiting the amount of decompression required at read time.
- When calculating overall disk space and data disk space per node, be sure to account for working space and anticipated ingest/outflow per node. Also remember that blocks of an HDFS file come with a replication factor (typically set to 3, so 3 copies of the data). That means that each 10GB file actually occupies 30GB on disk.
- Esgyn recommends leaving approximately 33% of disk space free for overhead workspace.

3.4 Network Usage

When sizing an EsgynDB Enterprise cluster for network usage, keep in mind the following considerations:

- In general, 10GigE is the standard for networking for data traffic within an EsgynDB cluster. Using a slower network for data flow can significantly impact performance. Two bonded 10GigE networks provide more throughput for I/O-intensive applications.
- In some cases, a second, slower network is configured for cluster (not Hadoop/HBase) maintenance in order to keep that traffic separate from the operational data workflow.
- Consider failure scenarios when connecting nodes from different racks together. The HDFS block placement algorithm is biased to select nodes on at least 2 different racks for a block's location if the replication factor is 3 or greater.
- If using the cross datacenter feature in EsgynDB Enterprise Advanced 2.0, there must be a high-speed connection between the two data centers.
- If using the cross datacenter feature in EsgynDB Enterprise Advanced 2.0, both clusters must be configured so that the application can actively connect to either peer cluster via EsgynDB drivers when both are running and accessible. This capability ensures that the application can access either cluster exclusively in case of loss of communications with one of the two.

4 Reference Architecture Guidance for Production Bare Metal Cluster

This section contains recommendations for hardware configurations and software provisioning for a bare metal EsgynDB cluster. The recommendations are hardware-independent. Check with your hardware vendor for specific part numbers and availability/timeliness.

The configuration described is for a medium or large EsgynDB installation, with separate control and data nodes. Smaller configurations with all processes on the same nodes are covered in a separate section.

For Data Nodes, the basic hardware recommendation for each node is:

Resource: CPU
Recommendation: Intel XEON or AMD 64-bit processors. 8 <= number of cores per node <= 16.

Resource: Memory
Recommendation: 64GB for overall Hadoop ecosystem and query processing plus usual overhead, plus 0.5GB for each mxosrvr on the node. To calculate the number of mxosrvr processes per node: max number of concurrent connections / number of nodes. 64GB <= memory size <= 128GB. Most common value is 96GB.

Resource: Network
Recommendation: 10GigE, 1GigE, or 2x10GigE Bonded

Resource: Storage
Recommendation: SATA or SAS or SSD, typically TB disks configured in a JBOD configuration

For Control Nodes, the basic hardware recommendation for each node is:

Resource: CPU
Recommendation: Intel XEON or AMD 64-bit processors. 8 <= number of cores per node <= 16.

Resource: Memory
Recommendation: 64GB for overall Hadoop ecosystem and query, plus overhead for swapping and process maintenance as possible/desired. 64GB <= memory size <= 128GB. Most common value is 96GB.

Resource: Network
Recommendation: 10GigE, 1GigE, or 2x10GigE Bonded, plus appropriate switches for off platform to on platform.

Resource: Storage
Recommendation: SATA or SAS or SSD, typically TB disks configured in a RAID1 or RAID10 configuration

4.1 Medium/Large Deployment

A medium or large deployment uses the specifications above, including both control and data nodes. Processes are placed in these nodes as depicted in the following figure:

In the above picture, the control nodes flank the data nodes and are only used for the DCS master process. There is no specific constraint for node naming conventions, including no assumption that nodes are consecutively numbered. The vertical bars represent individual nodes, and the ovals represent processes within the node.

4.2 Small Deployment

For a small (2-3 node, typically less than one rack) deployment, the control nodes are collapsed into the regular node infrastructure as follows:

In the above picture, the control nodes have been removed and control processes run on the same nodes as the functional processes.
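The data node memory guidance in section 4 (a 64GB base plus 0.5GB for each mxosrvr on the node, with the mxosrvr count taken as maximum concurrent connections divided by number of nodes) can be sketched as a small calculation. This is an illustrative sketch only: the function and constant names are hypothetical, and rounding the per-node mxosrvr count up to a whole process is an assumption that the document does not state.

```python
import math

# Figures taken from the data node recommendation in section 4.
BASE_MEMORY_GB = 64.0        # Hadoop ecosystem, query processing, usual overhead
MXOSRVR_MEMORY_GB = 0.5      # per mxosrvr (one per concurrent connection)
MIN_NODE_MEMORY_GB = 64.0    # lower bound of the recommended memory size
MAX_NODE_MEMORY_GB = 128.0   # upper bound of the recommended memory size


def mxosrvrs_per_node(max_concurrent_connections: int, data_nodes: int) -> int:
    """Max concurrent connections divided by number of nodes.

    Rounding up is an assumption made here so that every connection
    has an mxosrvr available.
    """
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    if max_concurrent_connections < 0:
        raise ValueError("max_concurrent_connections must not be negative")
    return math.ceil(max_concurrent_connections / data_nodes)


def data_node_memory_gb(max_concurrent_connections: int, data_nodes: int) -> float:
    """Estimated memory per data node: base plus 0.5GB per mxosrvr."""
    per_node = mxosrvrs_per_node(max_concurrent_connections, data_nodes)
    return BASE_MEMORY_GB + MXOSRVR_MEMORY_GB * per_node


def within_recommended_range(memory_gb: float) -> bool:
    """True if the estimate falls inside the 64GB to 128GB guidance."""
    return MIN_NODE_MEMORY_GB <= memory_gb <= MAX_NODE_MEMORY_GB
```

For example, 256 concurrent connections spread over 8 data nodes gives 32 mxosrvrs per node and an estimate of 80GB per node, which sits inside the 64GB to 128GB range. An estimate above 128GB would suggest adding nodes rather than memory, in line with the scale-out preference in section 3.1.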

5 Cloud Deployment

When deploying EsgynDB in a cloud environment such as Amazon's AWS, use the guidelines above to provision resources. For configuration, use HDFS replication factor 3 if you choose instance local store for the file system; use HDFS replication factor 2 if you use EBS volumes.

6 Conclusion

This EsgynDB Platform Reference Architecture document serves as a starting point for defining the platform to build an EsgynDB cluster where EsgynDB is the primary purpose for the cluster. It is also intended to assist application developers and users in planning the deployment strategy for EsgynDB applications. Esgyn recommends consulting with an Esgyn technical resource to get additional information, training, and guidance.
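As a worked illustration of the disk guidance in section 3.3 and the replication factors in section 5, the following sketch estimates raw disk capacity from logical data size. The names are hypothetical, and the way the factors are combined is an assumption: compression is applied first, then replication, and "leaving approximately 33% free" is read as the replicated data filling at most 67% of raw capacity. Working space and ingest/outflow are not modeled.

```python
# Replication factors from section 5; the dictionary keys are illustrative labels.
REPLICATION_BY_STORAGE = {
    "instance_store": 3,  # instance local store
    "ebs": 2,             # EBS volumes
}


def replicated_gb(stored_gb: float, replication_factor: int) -> float:
    """Space occupied once HDFS keeps replication_factor copies of each block."""
    return stored_gb * replication_factor


def raw_capacity_gb(logical_gb: float,
                    storage: str,
                    compression_reduction: float = 0.0,
                    free_fraction: float = 0.33) -> float:
    """Estimate the raw cluster disk capacity needed for logical_gb of data.

    compression_reduction is the fraction saved by compression (the document
    cites roughly 0.30 to 0.40, depending on data). free_fraction is the
    share of raw capacity kept free for overhead workspace.
    """
    if not 0.0 <= compression_reduction < 1.0:
        raise ValueError("compression_reduction must be in [0, 1)")
    if not 0.0 <= free_fraction < 1.0:
        raise ValueError("free_fraction must be in [0, 1)")
    replication_factor = REPLICATION_BY_STORAGE[storage]
    on_disk_gb = logical_gb * (1.0 - compression_reduction)
    return replicated_gb(on_disk_gb, replication_factor) / (1.0 - free_fraction)
```

With replication factor 3 and no compression, `replicated_gb(10, 3)` reproduces the document's example of a 10GB file occupying 30GB on disk. Passing `storage="ebs"` applies replication factor 2 instead, which lowers the raw capacity estimate by a third for the same data.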


More information

Service Description. IBM DB2 on Cloud. 1. Cloud Service. 1.1 IBM DB2 on Cloud Standard Small. 1.2 IBM DB2 on Cloud Standard Medium

Service Description. IBM DB2 on Cloud. 1. Cloud Service. 1.1 IBM DB2 on Cloud Standard Small. 1.2 IBM DB2 on Cloud Standard Medium Service Description IBM DB2 on Cloud This Service Description describes the Cloud Service IBM provides to Client. Client means the company and its authorized users and recipients of the Cloud Service.

More information

Part 1: Indexes for Big Data

Part 1: Indexes for Big Data JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,

More information

Cisco Tetration Analytics Platform: A Dive into Blazing Fast Deep Storage

Cisco Tetration Analytics Platform: A Dive into Blazing Fast Deep Storage White Paper Cisco Tetration Analytics Platform: A Dive into Blazing Fast Deep Storage What You Will Learn A Cisco Tetration Analytics appliance bundles computing, networking, and storage resources in one

More information

10 Million Smart Meter Data with Apache HBase

10 Million Smart Meter Data with Apache HBase 10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on

More information

Lenovo Database Configuration Guide

Lenovo Database Configuration Guide Lenovo Database Configuration Guide for Microsoft SQL Server OLTP on ThinkAgile SXM Reduce time to value with validated hardware configurations up to 2 million TPM in a DS14v2 VM SQL Server in an Infrastructure

More information

Hedvig as backup target for Veeam

Hedvig as backup target for Veeam Hedvig as backup target for Veeam Solution Whitepaper Version 1.0 April 2018 Table of contents Executive overview... 3 Introduction... 3 Solution components... 4 Hedvig... 4 Hedvig Virtual Disk (vdisk)...

More information

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based

More information

Crossing the Chasm: Sneaking a parallel file system into Hadoop

Crossing the Chasm: Sneaking a parallel file system into Hadoop Crossing the Chasm: Sneaking a parallel file system into Hadoop Wittawat Tantisiriroj Swapnil Patil, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University In this work Compare and contrast large

More information

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. 1. MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. MapReduce is a framework for processing big data which processes data in two phases, a Map

More information

An Oracle Technical White Paper October Sizing Guide for Single Click Configurations of Oracle s MySQL on Sun Fire x86 Servers

An Oracle Technical White Paper October Sizing Guide for Single Click Configurations of Oracle s MySQL on Sun Fire x86 Servers An Oracle Technical White Paper October 2011 Sizing Guide for Single Click Configurations of Oracle s MySQL on Sun Fire x86 Servers Introduction... 1 Foundation for an Enterprise Infrastructure... 2 Sun

More information

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage

More information

Veeam Backup & Replication on IBM Cloud Solution Architecture

Veeam Backup & Replication on IBM Cloud Solution Architecture Veeam Backup & Replication on IBM Cloud Solution Architecture Date: 2018 07 20 Copyright IBM Corporation 2018 Page 1 of 12 Table of Contents 1 Introduction... 4 1.1 About Veeam Backup & Replication...

More information

WHITE PAPER: BEST PRACTICES. Sizing and Scalability Recommendations for Symantec Endpoint Protection. Symantec Enterprise Security Solutions Group

WHITE PAPER: BEST PRACTICES. Sizing and Scalability Recommendations for Symantec Endpoint Protection. Symantec Enterprise Security Solutions Group WHITE PAPER: BEST PRACTICES Sizing and Scalability Recommendations for Symantec Rev 2.2 Symantec Enterprise Security Solutions Group White Paper: Symantec Best Practices Contents Introduction... 4 The

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

WHITEPAPER. Improve Hadoop Performance with Memblaze PBlaze SSD

WHITEPAPER. Improve Hadoop Performance with Memblaze PBlaze SSD Improve Hadoop Performance with Memblaze PBlaze SSD Improve Hadoop Performance with Memblaze PBlaze SSD Exclusive Summary We live in the data age. It s not easy to measure the total volume of data stored

More information

SGI Overview. HPC User Forum Dearborn, Michigan September 17 th, 2012

SGI Overview. HPC User Forum Dearborn, Michigan September 17 th, 2012 SGI Overview HPC User Forum Dearborn, Michigan September 17 th, 2012 SGI Market Strategy HPC Commercial Scientific Modeling & Simulation Big Data Hadoop In-memory Analytics Archive Cloud Public Private

More information

IBM Terms of Use SaaS Specific Offering Terms. IBM DB2 on Cloud. 1. IBM SaaS. 2. Charge Metrics

IBM Terms of Use SaaS Specific Offering Terms. IBM DB2 on Cloud. 1. IBM SaaS. 2. Charge Metrics IBM Terms of Use SaaS Specific Offering Terms IBM DB2 on Cloud The Terms of Use ( ToU ) is composed of this IBM Terms of Use - SaaS Specific Offering Terms ( SaaS Specific Offering Terms ) and a document

More information

Introduction to BigData, Hadoop:-

Introduction to BigData, Hadoop:- Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,

More information

Service Description. IBM DB2 on Cloud. 1. Cloud Service. 1.1 IBM DB2 on Cloud Standard Small. 1.2 IBM DB2 on Cloud Standard Medium

Service Description. IBM DB2 on Cloud. 1. Cloud Service. 1.1 IBM DB2 on Cloud Standard Small. 1.2 IBM DB2 on Cloud Standard Medium Service Description IBM DB2 on Cloud This Service Description describes the Cloud Service IBM provides to Client. Client means the company and its authorized users and recipients of the Cloud Service.

More information

SQL Server 2014 Upgrade

SQL Server 2014 Upgrade SQL Server 2014 Upgrade Case study featuring In-Memory OLTP and Hybrid-Cloud Scenarios Evgeny Ternovsky, Program Manager II, Data Platform Group Bill Kan, Service Engineer II, Data Platform Group Background

More information

Assessing performance in HP LeftHand SANs

Assessing performance in HP LeftHand SANs Assessing performance in HP LeftHand SANs HP LeftHand Starter, Virtualization, and Multi-Site SANs deliver reliable, scalable, and predictable performance White paper Introduction... 2 The advantages of

More information

System Requirements EDT 6.0. discoveredt.com

System Requirements EDT 6.0. discoveredt.com System Requirements EDT 6.0 discoveredt.com Contents Introduction... 3 1 Components, Modules & Data Repositories... 3 2 Infrastructure Options... 5 2.1 Scenario 1 - EDT Portable or Server... 5 2.2 Scenario

More information

Crossing the Chasm: Sneaking a parallel file system into Hadoop

Crossing the Chasm: Sneaking a parallel file system into Hadoop Crossing the Chasm: Sneaking a parallel file system into Hadoop Wittawat Tantisiriroj Swapnil Patil, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University In this work Compare and contrast large

More information

Cloudera Enterprise 5 Reference Architecture

Cloudera Enterprise 5 Reference Architecture Cloudera Enterprise 5 Reference Architecture A PSSC Labs Reference Architecture Guide December 2016 Introduction PSSC Labs continues to bring innovative compute server and cluster platforms to market.

More information

Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication

Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication CDS and Sky Tech Brief Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication Actifio recommends using Dedup-Async Replication (DAR) for RPO of 4 hours or more and using StreamSnap for

More information

Correlation based File Prefetching Approach for Hadoop

Correlation based File Prefetching Approach for Hadoop IEEE 2nd International Conference on Cloud Computing Technology and Science Correlation based File Prefetching Approach for Hadoop Bo Dong 1, Xiao Zhong 2, Qinghua Zheng 1, Lirong Jian 2, Jian Liu 1, Jie

More information

EMC XTREMCACHE ACCELERATES VIRTUALIZED ORACLE

EMC XTREMCACHE ACCELERATES VIRTUALIZED ORACLE White Paper EMC XTREMCACHE ACCELERATES VIRTUALIZED ORACLE EMC XtremSF, EMC XtremCache, EMC Symmetrix VMAX and Symmetrix VMAX 10K, XtremSF and XtremCache dramatically improve Oracle performance Symmetrix

More information

Dell EMC CIFS-ECS Tool

Dell EMC CIFS-ECS Tool Dell EMC CIFS-ECS Tool Architecture Overview, Performance and Best Practices March 2018 A Dell EMC Technical Whitepaper Revisions Date May 2016 September 2016 Description Initial release Renaming of tool

More information

HPE Verified Reference Architecture for Vertica SQL on Hadoop Using Hortonworks HDP 2.3 on HPE Apollo 4200 Gen9 with RHEL

HPE Verified Reference Architecture for Vertica SQL on Hadoop Using Hortonworks HDP 2.3 on HPE Apollo 4200 Gen9 with RHEL HPE Verified Reference Architecture for Vertica SQL on Hadoop Using Hortonworks HDP 2.3 on HPE Apollo 4200 Gen9 with RHEL HPE Reference Architectures April, 2016 Legal Notices The only warranties for Hewlett

More information

Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System

Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System Best practices Roland Mueller IBM Systems and Technology Group ISV Enablement April 2012 Copyright IBM Corporation, 2012

More information

EMC Backup and Recovery for Microsoft SQL Server

EMC Backup and Recovery for Microsoft SQL Server EMC Backup and Recovery for Microsoft SQL Server Enabled by Microsoft SQL Native Backup Reference Copyright 2010 EMC Corporation. All rights reserved. Published February, 2010 EMC believes the information

More information

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,

More information

A BigData Tour HDFS, Ceph and MapReduce

A BigData Tour HDFS, Ceph and MapReduce A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!

More information

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation report prepared under contract with Dot Hill August 2015 Executive Summary Solid state

More information

VERITAS Storage Foundation 4.0 TM for Databases

VERITAS Storage Foundation 4.0 TM for Databases VERITAS Storage Foundation 4.0 TM for Databases Powerful Manageability, High Availability and Superior Performance for Oracle, DB2 and Sybase Databases Enterprises today are experiencing tremendous growth

More information

Exchange 2010 Tested Solutions: 500 Mailboxes in a Single Site Running Hyper-V on Dell Servers

Exchange 2010 Tested Solutions: 500 Mailboxes in a Single Site Running Hyper-V on Dell Servers Exchange 2010 Tested Solutions: 500 Mailboxes in a Single Site Running Hyper-V on Dell Servers Rob Simpson, Program Manager, Microsoft Exchange Server; Akshai Parthasarathy, Systems Engineer, Dell; Casey

More information

EMC Business Continuity for Microsoft Applications

EMC Business Continuity for Microsoft Applications EMC Business Continuity for Microsoft Applications Enabled by EMC Celerra, EMC MirrorView/A, EMC Celerra Replicator, VMware Site Recovery Manager, and VMware vsphere 4 Copyright 2009 EMC Corporation. All

More information

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component

More information

White Paper. Major Performance Tuning Considerations for Weblogic Server

White Paper. Major Performance Tuning Considerations for Weblogic Server White Paper Major Performance Tuning Considerations for Weblogic Server Table of Contents Introduction and Background Information... 2 Understanding the Performance Objectives... 3 Measuring your Performance

More information

Warehouse- Scale Computing and the BDAS Stack

Warehouse- Scale Computing and the BDAS Stack Warehouse- Scale Computing and the BDAS Stack Ion Stoica UC Berkeley UC BERKELEY Overview Workloads Hardware trends and implications in modern datacenters BDAS stack What is Big Data used For? Reports,

More information

Advanced Architectures for Oracle Database on Amazon EC2

Advanced Architectures for Oracle Database on Amazon EC2 Advanced Architectures for Oracle Database on Amazon EC2 Abdul Sathar Sait Jinyoung Jung Amazon Web Services November 2014 Last update: April 2016 Contents Abstract 2 Introduction 3 Oracle Database Editions

More information

Hewlett Packard Enterprise HPE GEN10 PERSISTENT MEMORY PERFORMANCE THROUGH PERSISTENCE

Hewlett Packard Enterprise HPE GEN10 PERSISTENT MEMORY PERFORMANCE THROUGH PERSISTENCE Hewlett Packard Enterprise HPE GEN10 PERSISTENT MEMORY PERFORMANCE THROUGH PERSISTENCE Digital transformation is taking place in businesses of all sizes Big Data and Analytics Mobility Internet of Things

More information

Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini

Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini White Paper Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini February 2015 2015 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page 1 of 9 Contents

More information

Chapter 5. The MapReduce Programming Model and Implementation

Chapter 5. The MapReduce Programming Model and Implementation Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing

More information

Data Protection for Cisco HyperFlex with Veeam Availability Suite. Solution Overview Cisco Public

Data Protection for Cisco HyperFlex with Veeam Availability Suite. Solution Overview Cisco Public Data Protection for Cisco HyperFlex with Veeam Availability Suite 1 2017 2017 Cisco Cisco and/or and/or its affiliates. its affiliates. All rights All rights reserved. reserved. Highlights Is Cisco compatible

More information

Cisco and Cloudera Deliver WorldClass Solutions for Powering the Enterprise Data Hub alerts, etc. Organizations need the right technology and infrastr

Cisco and Cloudera Deliver WorldClass Solutions for Powering the Enterprise Data Hub alerts, etc. Organizations need the right technology and infrastr Solution Overview Cisco UCS Integrated Infrastructure for Big Data and Analytics with Cloudera Enterprise Bring faster performance and scalability for big data analytics. Highlights Proven platform for

More information

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved. Gain Insights From Unstructured Data Using Pivotal HD 1 Traditional Enterprise Analytics Process 2 The Fundamental Paradigm Shift Internet age and exploding data growth Enterprises leverage new data sources

More information

NEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III

NEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III NEC Express5800 A2040b 22TB Data Warehouse Fast Track Reference Architecture with SW mirrored HGST FlashMAX III Based on Microsoft SQL Server 2014 Data Warehouse Fast Track (DWFT) Reference Architecture

More information

Falling Out of the Clouds: When Your Big Data Needs a New Home

Falling Out of the Clouds: When Your Big Data Needs a New Home Falling Out of the Clouds: When Your Big Data Needs a New Home Executive Summary Today s public cloud computing infrastructures are not architected to support truly large Big Data applications. While it

More information

Apache Hadoop 3. Balazs Gaspar Sales Engineer CEE & CIS Cloudera, Inc. All rights reserved.

Apache Hadoop 3. Balazs Gaspar Sales Engineer CEE & CIS Cloudera, Inc. All rights reserved. Apache Hadoop 3 Balazs Gaspar Sales Engineer CEE & CIS balazs@cloudera.com 1 We believe data can make what is impossible today, possible tomorrow 2 We empower people to transform complex data into clear

More information

Hadoop & Big Data Analytics Complete Practical & Real-time Training

Hadoop & Big Data Analytics Complete Practical & Real-time Training An ISO Certified Training Institute A Unit of Sequelgate Innovative Technologies Pvt. Ltd. www.sqlschool.com Hadoop & Big Data Analytics Complete Practical & Real-time Training Mode : Instructor Led LIVE

More information

Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c

Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c White Paper Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c What You Will Learn This document demonstrates the benefits

More information

Distributed Systems 16. Distributed File Systems II

Distributed Systems 16. Distributed File Systems II Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS

More information

Storage for HPC, HPDA and Machine Learning (ML)

Storage for HPC, HPDA and Machine Learning (ML) for HPC, HPDA and Machine Learning (ML) Frank Kraemer, IBM Systems Architect mailto:kraemerf@de.ibm.com IBM Data Management for Autonomous Driving (AD) significantly increase development efficiency by

More information

Dell Reference Configuration for Hortonworks Data Platform 2.4

Dell Reference Configuration for Hortonworks Data Platform 2.4 Dell Reference Configuration for Hortonworks Data Platform 2.4 A Quick Reference Configuration Guide Kris Applegate Solution Architect Dell Solution Centers Executive Summary This document details the

More information