Presented by Nanditha Thinderu

1 Presented by Nanditha Thinderu

2 Enterprise systems are highly distributed and heterogeneous, which makes administration a complex task. Application Performance Management (APM) tools were developed to retrieve information about failure rates and resource utilization. An APM platform for monitoring big data has to work with a tight resource budget and a fast response time.

3 APM refers to monitoring and managing enterprise software systems. The two approaches are the black-box approach and the API-based approach. By capturing every method invocation in an enterprise system, APM tools can generate a vast amount of data.

4 APM data consists of a metric name, a value and a time stamp. In the storage system, queries fall into two major types: single-value lookups that retrieve the most current value, and small scans that retrieve system health information. Record fields: Metric Name, Value, Min, Max, Timestamp, Duration.
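The record layout and the two query types on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any of the benchmarked stores work: the class names `Measurement` and `MetricStore`, the integer units for timestamp and duration, and the rejection of out-of-order appends are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One APM record with the fields listed on the slide."""
    metric: str      # metric name
    value: float
    minimum: float
    maximum: float
    timestamp: int   # assumed unit: seconds
    duration: int    # assumed unit: seconds covered by this measurement


class MetricStore:
    """Hypothetical append-only store keyed by metric name."""

    def __init__(self):
        # metric name -> list of Measurement, ordered by timestamp
        self._series = {}

    def append(self, m):
        series = self._series.setdefault(m.metric, [])
        # Assumption: appends arrive in non-decreasing timestamp order.
        if series and m.timestamp < series[-1].timestamp:
            raise ValueError("out-of-order append")
        series.append(m)

    def latest(self, metric):
        """Single-value lookup: the most current measurement of a metric."""
        return self._series[metric][-1]

    def scan(self, metric, start, end):
        """Small scan: measurements with start <= timestamp <= end."""
        series = self._series.get(metric, [])
        stamps = [m.timestamp for m in series]
        return series[bisect_left(stamps, start):bisect_right(stamps, end)]
```

A lookup such as `store.latest("host1/cpu")` touches one record, while `store.scan("host1/cpu", 10, 20)` returns a short time window, which is the access pattern the benchmark workloads are meant to imitate.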

5 The Yahoo! Cloud Serving Benchmark (YCSB) is designed for the evaluation of key-value stores and is used here with APM properties. We define five workloads (R, RW, W, RS, RSW), as APM data is append only. YCSB comprises a data generator, a workload generator, and drivers for several key-value stores.
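As a rough illustration of what a workload generator does, the sketch below draws a stream of operations according to a fixed mix. The percentages are the ones given on the results slides for R, RW, W, RS and RSW. Two readings are assumptions: the non-write share of RW and W is treated as plain reads, and RS and RSW are read as 47/47/6 and 25/25/50 (read/scan/write). The names `WORKLOADS` and `generate_operations` are hypothetical and are not part of YCSB.

```python
import random

# Operation mixes as fractions of all requests (see the assumptions above).
WORKLOADS = {
    "R":   {"read": 0.95, "scan": 0.00, "write": 0.05},
    "RW":  {"read": 0.50, "scan": 0.00, "write": 0.50},
    "W":   {"read": 0.01, "scan": 0.00, "write": 0.99},
    "RS":  {"read": 0.47, "scan": 0.47, "write": 0.06},
    "RSW": {"read": 0.25, "scan": 0.25, "write": 0.50},
}


def generate_operations(workload, count, seed=None):
    """Return `count` operation names drawn according to the workload mix."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # a seeded generator makes runs repeatable
    operations = list(mix)
    weights = [mix[op] for op in operations]
    return rng.choices(operations, weights=weights, k=count)
```

Calling `generate_operations("W", 1000, seed=42)` yields a list that is almost entirely `"write"`, matching the append-heavy character of APM data.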

6 The goal was not only to get a pure performance comparison but also a broad overview of available solutions. The data stores used can be classified into three categories. Key-value stores: Project Voldemort and Redis. Extensible record stores: HBase and Cassandra. Scalable relational stores: MySQL Cluster and VoltDB.

7 We used HBase v running on top of Hadoop v. Because HBase uses HDFS, it also requires the installation and configuration of Hadoop. Tables in HBase can be accessed through an API.

8 We used the recent rc2 version and the default RandomPartitioner, which distributes the data across the nodes randomly. The implemented Cassandra YCSB client only requires one column family to be set up to store all fields, each of them corresponding to a column. Cassandra is a symmetric system and employs consistent hashing to distribute the values across the nodes.
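Consistent hashing, mentioned on this slide, can be illustrated with a small hash ring. The sketch below is a generic textbook version and is not claimed to match Cassandra's partitioner: the class name `HashRing`, the `vnodes` parameter (several ring positions per node) and the choice of MD5 as a convenient, well-distributed hash are all assumptions for the example.

```python
import bisect
import hashlib


def _hash(key):
    # MD5 is used only for its even distribution, not for security.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping keys to node names."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed at `vnodes` positions on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # The owner is the first ring position at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The useful property is that growing the cluster moves only a fraction of the keys: comparing `HashRing(["n1", "n2", "n3"])` with `HashRing(["n1", "n2", "n3", "n4"])`, every key either keeps its node or moves to `n4`.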

9 We used Project Voldemort with embedded BerkeleyDB storage and the already implemented Voldemort YCSB client. Configuration was easy for the most part. It is a highly scalable storage system with a simpler design than a relational database.

10 We used Redis version 2.4.2, as the cluster version was in an unstable state and could not run a complete test. We updated the default Redis YCSB client to use ShardedJedisPool. For data storage, YCSB uses a hash map as well as a sorted set.
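One plausible reading of the "hash map as well as a sorted set" layout is that each record is stored as a hash of fields, while a sorted index of keys makes range scans possible. The sketch below imitates that idea with plain Python structures, so it needs no Redis server. The class `HashAndSortedSetStore` and its methods are hypothetical and do not reproduce the YCSB client code.

```python
import bisect


class HashAndSortedSetStore:
    """Hypothetical stand-in for a hash-per-record plus sorted-key-index layout."""

    def __init__(self):
        self._records = {}  # key -> {field: value}, plays the role of one hash per record
        self._index = []    # sorted list of keys, plays the role of the sorted set

    def insert(self, key, fields):
        if key not in self._records:
            bisect.insort(self._index, key)  # keep the key index ordered
        self._records.setdefault(key, {}).update(fields)

    def read(self, key):
        """Point lookup: served from the hash alone."""
        return dict(self._records[key])

    def scan(self, start_key, count):
        """Range scan: find the start position in the index, then fetch each hash."""
        begin = bisect.bisect_left(self._index, start_key)
        return [(k, dict(self._records[k])) for k in self._index[begin:begin + count]]
```

Reads never touch the index, which keeps point lookups cheap. Scans pay for one index search plus one record fetch per returned key.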

11 We used VoltDB v2.1.3 and the default configuration. A YCSB client driver for VoltDB that connects to all servers was implemented.

12 We used MySQL v and InnoDB as the storage engine. The RDBMS YCSB client is already implemented and connects to the databases using JDBC.

14 Workload R is the most read intensive, with 95% reads and only 5% writes. We present latencies and throughput using a logarithmic scale. Redis has the highest throughput. HBase has the highest read latency. Cassandra has the highest write latency.

15 In the second experiment, workload RW is used, which has 50% writes. VoltDB achieves the highest throughput for one node, which is slightly lower compared to workload R. In write latency, HBase and MySQL show important differences compared to workload R.

16 Workload W is the one closest to the APM use case. It has a 99% write rate. The throughput results are similar to workload RW. For the read latency, the apparent change is the high latency of HBase. For the write latency, HBase has increased significantly.

17 Workload RS has 47% read, 47% scan, and 6% write operations. MySQL has the best throughput for a single node. Cassandra and HBase obtain a linear increase in throughput with the number of nodes.

18 Workload RSW has 50% reads, of which 25% are scans. Most of the results are similar to RS.

19 In this experiment we used 8 nodes of each system. The results are calculated for workload R. We observe varying latencies for the different key-value stores. The write latencies develop similarly for Cassandra, Voldemort, and Redis.

20 The least efficient system in terms of storage is HBase. Redis and VoltDB are omitted, as they do not store data on disk. Cassandra stores the data most efficiently. The disk usage can be reduced by compression.

21 A series of tests was conducted on cluster D. The throughput increases for all systems with higher ratios. Project Voldemort has the best read latency. HBase has a low write latency, but it is best for workload RW.

23 Cassandra: It achieves the highest throughput for the maximum number of nodes, and its performance is best for high rates.
HBase: Its throughput is the lowest for one node but increases linearly with the number of nodes. It has a low write latency; however, its read latency is much higher than in other systems.
Project Voldemort: The read and write latencies are similar and stay stable at a low level.
MySQL: It achieved high throughput; however, latency decreases with the number of nodes.
Redis: It has a high throughput, which exceeds all other systems for read-intensive workloads. But latencies decrease for both read and write operations.
VoltDB: The performance is high for a single instance, but it never achieved a throughput increase with more than one node.

25 We optimized each system for our workload and tested it with a number of open connections that was 4 times higher than the number of cores in the host CPUs. Higher numbers of connections led to congestion and slowed down the systems considerably, while lower numbers did not fully utilize the systems. This configuration resulted in an average request-processing latency that was much higher than in previously published performance measurements. Since our use case does not have the strict latency requirements that are common in online applications and similar environments, the latencies in most results are still adequate.
