Workload-Aware Data Partitioning in Community-Driven Data Grids
1 Workload-Aware Data Partitioning in Community-Driven Data Grids. Tobias Scholl, Bernhard Bauer, Jessica Müller, Benjamin Gufler, Angelika Reiser, and Alfons Kemper. Department of Computer Science, Germany
2 Should I Split or Replicate? Many challenges and opportunities in e-science for database research: high-throughput data management; correlation of distributed data sources. Community-driven data grids: dealing with data skew and query hot spots; workload-awareness by employing a cost model during partitioning.
3-6 Query Load Balancing via Partitioning
7-9 Query Load Balancing via Replication
10 The AstroGrid-D Project: German Astronomy Community Grid; funded by the German Ministry of Education and Research; part of D-Grid.
11 Upcoming Data-Intensive Applications. Alex Szalay, Jim Gray (Nature, 2006): "Science in an exponential world". Data rates: terabytes a day/night, petabytes a year. Projects shown: LHC, LSST, LOFAR, Pan-STARRS.
12 The Multiwavelength Milky Way
13 Research Challenges: directly deal with terabyte/petabyte-scale data sets; integrate with existing community infrastructures; high throughput for growing user communities.
14 Current Sharing in Data Grids. Data autonomy: policies allow partners to access data; each institution ensures availability (replication) and scalability. Various organizational structures [Venugopal et al. 2006]: centralized, hierarchical, federated, hybrid.
15 Community-Driven Data Grids (HiSbase)
16-19 Distribute by Region, not by Archive!
20 Mapping Data to Nodes
21 Workload-Aware Training Phase. Incorporate query traces during the training phase. Base the partitioning scheme on data load and query load. Challenges: balance query load without losing data load balancing; approximate real query hot spots from a query sample.
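The training phase described on this slide can be illustrated with a small sketch. The slides do not specify the splitting procedure, so everything here is an assumption for illustration: regions are axis-aligned rectangles, the heaviest region is split at the midpoint of its longer side, and `example_weight` is a hypothetical weight that combines data load and query load. The names `train_partitions`, `split`, and `example_weight` are not taken from the talk.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), upper edges exclusive


def contains(r: Rect, p: Point) -> bool:
    """True if point p lies inside rectangle r."""
    return r[0] <= p[0] < r[2] and r[1] <= p[1] < r[3]


def intersects(a: Rect, b: Rect) -> bool:
    """True if rectangles a and b overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def example_weight(region: Rect, points: List[Point], queries: List[Rect]) -> float:
    """Hypothetical weight: objects times (1 + intersecting training queries).

    The '+ 1' is an assumption so that regions without queries still carry
    their data load.
    """
    n_objects = sum(contains(region, p) for p in points)
    n_queries = sum(intersects(region, q) for q in queries)
    return n_objects * (1 + n_queries)


def split(region: Rect) -> Tuple[Rect, Rect]:
    """Split a region in half along its longer side (assumed split rule)."""
    x0, y0, x1, y1 = region
    if x1 - x0 >= y1 - y0:
        mid = (x0 + x1) / 2
        return (x0, y0, mid, y1), (mid, y0, x1, y1)
    mid = (y0 + y1) / 2
    return (x0, y0, x1, mid), (x0, mid, x1, y1)


def train_partitions(
    root: Rect,
    points: List[Point],
    queries: List[Rect],
    n_partitions: int,
    weight: Callable[[Rect, List[Point], List[Rect]], float] = example_weight,
) -> List[Rect]:
    """Greedily split the heaviest region until n_partitions regions exist."""
    leaves = [root]
    while len(leaves) < n_partitions:
        heaviest = max(leaves, key=lambda r: weight(r, points, queries))
        leaves.remove(heaviest)
        leaves.extend(split(heaviest))
    return leaves
```

With a uniform data sample and a query trace concentrated in one corner, this sketch produces smaller regions around the hot spot and larger regions elsewhere, which is the qualitative effect the slide aims for.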
22 Dealing with Query Hot Spots. Query skew is triggered by increased interest in particular subsets of the data. Two well-known query load balancing techniques: data partitioning and data replication. Finding trade-offs between both.
23 When to Split (Partition) or to Replicate. Considers partition characteristics: amount of data (few/many data points); number of queries (few/many queries); extent of regions and queries (small/big queries). Decision table, with columns few queries (small, big) and many queries (small, big): few data points: SPLIT, REPLICATE; many data points: SPLIT, SPLIT, SPLIT, REPLICATE.
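The decision table on this slide can be read as a small rule, sketched below. Two points are assumptions rather than facts from the transcript: the two entries in the "few data points" row are taken to belong to the "many queries" columns, and the case of few data points with few queries, for which the transcript lists no entry, is mapped to a hypothetical `Action.NONE`. The names `Action` and `decide` are illustrative only.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"            # assumption: no entry in the transcript for this case
    SPLIT = "split"
    REPLICATE = "replicate"


def decide(many_data: bool, many_queries: bool, big_queries: bool) -> Action:
    """Choose between splitting and replicating a partition.

    Encodes the 'many data points' row as stated on the slide
    (SPLIT, SPLIT, SPLIT, REPLICATE) and the assumed reading of the
    'few data points' row described above.
    """
    if many_queries and big_queries:
        # Many big queries would each touch several pieces after a split,
        # so the slide recommends replication here.
        return Action.REPLICATE
    if not many_data and not many_queries:
        return Action.NONE  # assumption, see lead-in
    return Action.SPLIT
```

How "few", "many", "small", and "big" are thresholded is not given in the slides, so the sketch takes the classification as boolean inputs.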
24 Region Weight Functions. Data only (#objects in a region). Queries only (#queries in a region). Scaled queries: approximate real extent of hot spot; avoid overfitting to training query set. Heat of a region (#objects * #queries). Extents of regions and queries: replicate when many big queries.
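A minimal sketch of the weight functions listed on this slide follows. The formulas for data, queries, and heat come directly from the slide; the rectangle representation and the way `scale_query` enlarges a query (scaling its side lengths about its center) are assumptions, since the slides do not say whether the scale factor applies to side length or area. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def data_weight(region: Rect, points: List[Point]) -> int:
    """Data only: number of objects inside the region."""
    x0, y0, x1, y1 = region
    return sum(1 for (x, y) in points if x0 <= x < x1 and y0 <= y < y1)


def query_weight(region: Rect, queries: List[Rect]) -> int:
    """Queries only: number of queries overlapping the region."""
    x0, y0, x1, y1 = region
    return sum(
        1 for (qx0, qy0, qx1, qy1) in queries
        if qx0 < x1 and x0 < qx1 and qy0 < y1 and y0 < qy1
    )


def heat_weight(region: Rect, points: List[Point], queries: List[Rect]) -> int:
    """Heat of a region: #objects * #queries."""
    return data_weight(region, points) * query_weight(region, queries)


def scale_query(query: Rect, factor: float) -> Rect:
    """Enlarge a query about its center (assumed: factor scales side lengths)."""
    x0, y0, x1, y1 = query
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) / 2 * factor, (y1 - y0) / 2 * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

Scaling the training queries before counting them lets a small query sample cover the neighborhood of a hot spot, which is one way to read "avoid overfitting to training query set".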
25 Evaluation. Weight functions: data, heat, extent. Data sets (observational, simulation). Workloads (SDSS query log, synthetic). Partitioning scheme properties: load distribution, communication overhead. Throughput measurements: distributed setup, FreePastry simulator. Pobs
26 Load Distribution. Uniform data set from the Millennium simulation. Workload with extreme hot spot. In the following: 1024 partitions; heat of a region (#data * #queries), normalized across all partitioning schemes.
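To compare load distributions across partitioning schemes, the per-region heat values need a common scale. The sketch below assumes that "normalized across all partitioning schemes" means dividing every heat value by the largest heat observed in any scheme; the slides do not state the exact normalization, and `normalize_heat` is a hypothetical name.

```python
from typing import Dict, List


def normalize_heat(heat_by_scheme: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale all heat values into [0, 1] using the global maximum.

    heat_by_scheme maps a partitioning scheme name to the heat
    (#data * #queries) of each of its regions.
    """
    peak = max(
        (h for heats in heat_by_scheme.values() for h in heats),
        default=0.0,
    )
    if peak == 0:
        # No load anywhere: return zeros rather than dividing by zero.
        return {name: [0.0 for _ in heats] for name, heats in heat_by_scheme.items()}
    return {name: [h / peak for h in heats] for name, heats in heat_by_scheme.items()}
```

Using one global maximum keeps the schemes comparable: a scheme whose hottest region reaches only 0.5 has halved the peak load relative to the worst scheme.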
27 Query-unaware Training
28 Training with Scaled Queries (scaled 50x)
29 Training with Scaled Queries (scaled 400x)
30 Heat-based, Extent-based Training
31 Communication Overhead for Pobs
32 Throughput for Pobs
33 Load Balancing During Runtime. Complement workload-aware partitioning with runtime load balancing. Short-term peaks: master-slave approach, load monitoring. Long-term trends: based on load monitoring, histogram evolution.
34 Related Work. On-line load balancing: hundreds of thousands to millions of nodes; reacting fast; treating objects individually. HiSbase
35 Should I Split or Replicate? Many challenges and opportunities in e-science for database research: high-throughput data management; correlation of distributed data sources. Community-driven data grids: dealing with data skew and query hot spots; workload-awareness by employing a cost model during partitioning.
36 Get in Touch. Database systems group, TU München. Web site: The HiSbase project. Thank You for Your Attention
37 Queries Intersecting Multiple Regions
38 Regions Without Queries
39 Throughput for Pobs (300 nodes, sim.)
40 Throughput for Pobs (1000 nodes, sim.)
41 Throughput (Region-Uniform Queries)