HOW INDEX TO STORE DATA DATA

Similar documents
data systems 101 prof. Stratos Idreos class 2

column-stores basics

data systems 101 prof. Stratos Idreos class 2

basic db architectures & layouts

column-stores basics

class 5 column stores 2.0 prof. Stratos Idreos

Models & Intro to DB Architectures

from bits to systems

SQL & intro to db architectures

class 17 updates prof. Stratos Idreos

class 17 updates prof. Stratos Idreos

systems & research project

CMPSCI 645 Database Design & Implementation

complex plans and hybrid layouts

Models & Intro to DB Architectures

class 6 more about column-store plans and compression prof. Stratos Idreos

class 9 fast scans 1.0 prof. Stratos Idreos

class 11 b-trees prof. Stratos Idreos

Locality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example

class 20 updates 2.0 prof. Stratos Idreos

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

class 12 b-trees 2.0 prof. Stratos Idreos

class 10 b-trees 2.0 prof. Stratos Idreos

Course Introduction & History of Database Systems

CAS CS 460/660 Introduction to Database Systems. Fall

CS564: Database Management Systems. Lecture 1: Course Overview. Acks: Chris Ré 1

Internet Scale Storage

CS 245: Principles of Data-Intensive Systems. Instructor: Matei Zaharia cs245.stanford.edu

CSE 544 Principles of Database Management Systems

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

Computer Systems C S Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College

CSE 344 JANUARY 3 RD - INTRODUCTION

Introduction to Data Management CSE 344. Lecture 2: Data Models

ENGIN 112 Intro to Electrical and Computer Engineering

Modern Database Systems CS-E4610

CS61C : Machine Structures

CMPT 354: Database System I. Lecture 1. Course Introduction

Fundamentals of Database Systems

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

Paging! 2/22! Anthony D. Joseph and Ion Stoica CS162 UCB Fall 2012! " (0xE0)" " " " (0x70)" " (0x50)"

CompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy

+ Random-Access Memory (RAM)

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches

Announcements. Using Electronics in Class. Review. Staff Instructor: Alvin Cheung Office hour on Wednesdays, 1-2pm. Class Overview

CS145: Intro to Databases. Lecture 1: Course Overview

Why are you here? Introduction. Course roadmap. Course goals. What do you want from a DBMS? What is a database system? Aren t databases just

Caches II CSE 351 Spring #include <yoda.h> int is_try(int do_flag) { return!(do_flag!do_flag); }

CS639: Data Management for Data Science. Lecture 1: Intro to Data Science and Course Overview. Theodoros Rekatsinas

CMPSC 311- Introduction to Systems Programming Module: Caching

Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin

CS162 Operating Systems and Systems Programming Lecture 13. Caches and TLBs. Page 1

CMPT 354: Database System I. Lecture 7. Basics of Query Optimization

Storing Data: Disks and Files

CS61C : Machine Structures

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju

CS 261 Fall Mike Lam, Professor. Virtual Memory

MEMORY HIERARCHY DESIGN

CSE 544 Principles of Database Management Systems

CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs"

Professor Yashar Ganjali Department of Computer Science University of Toronto.

Computing Curricula 2005

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Database Systems: Concepts, design, and implementation ISE 382 (3 Units)

Disks and Files. Jim Gray s Storage Latency Analogy: How Far Away is the Data? Components of a Disk. Disks

CS / Cloud Computing. Recitation 3 September 9 th & 11 th, 2014

Advanced Operating Systems (CS 202)

Module 1: Basics and Background Lecture 4: Memory and Disk Accesses. The Lecture Contains: Memory organisation. Memory hierarchy. Disks.

Chapter 5. Memory Technology

Lecture-14 (Memory Hierarchy) CS422-Spring

NoSQL database and its business applications

Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri

CTL.SC4x Technology and Systems

class 8 b-trees prof. Stratos Idreos

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015

SYLLABUS. 3. Total estimated time (hours/semester of didactic activities) 3.1 Hours per week 3 Of which: 3.2 course seminar/laboratory1 sem

Modernizing Healthcare IT for the Data-driven Cognitive Era Storage and Software-Defined Infrastructure

Page 1. Goals for Today" TLB organization" CS162 Operating Systems and Systems Programming Lecture 11. Page Allocation and Replacement"

Introduction to Data Management. Lecture #2 (Big Picture, Cont.)

Evolution of Database Systems

Modeling and evaluation on Ad hoc query processing with Adaptive Index in Map Reduce Environment

Principles of Data Management. Lecture #2 (Storing Data: Disks and Files)

Information Security CS526

CISC 360. Cache Memories Exercises Dec 3, 2009

Introduction to Data Management. Lecture 14 (Storage and Indexing)

CompSci 516 Database Systems

CSC Memory System. A. A Hierarchy and Driving Forces

CSC 261/461 Database Systems. Fall 2017 MW 12:30 pm 1:45 pm CSB 601

61A LECTURE 1 FUNCTIONS, VALUES. Steven Tang and Eric Tzeng June 24, 2013

Recent Innovations in Data Storage Technologies Dr Roger MacNicol Software Architect

Topics. History. Architecture. MongoDB, Mongoose - RDBMS - SQL. - NoSQL

Department of Information Technology B.E/B.Tech : CSE/IT Regulation: 2013 Sub. Code / Sub. Name : CS6302 Database Management Systems

/ Cloud Computing. Recitation 3 Sep 13 & 15, 2016

Parallel DBMS. Chapter 22, Part A

Overview of Data Management

Cycle Time for Non-pipelined & Pipelined processors

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies

BBM371- Data Management. Lecture 1: Course policies, Introduction to DBMS

Algorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory I

1/3/2015. Column-Store: An Overview. Row-Store vs Column-Store. Column-Store Optimizations. Compression Compress values per column

USC Viterbi School of Engineering

Transcription:

Stratos Idreos

HOW INDEX DATA TO STORE DATA

ALGORITHMS data structure decisions define the algorithms that access data INDEX DATA

ALGORITHMS unordered [7,4,2,6,1,3,9,10,5,8] INDEX DATA

ALGORITHMS unordered [7,4,2,6,1,3,9,10,5,8] INDEX DATA

ALGORITHMS unordered [7,4,2,6,1,3,9,10,5,8] ordered [1,2,3,4,5,6,7,8,9,10] INDEX DATA

ALGORITHMS INDEX DATA

DATA SYSTEMS ALGORITHMS INDEX DATA

speed COMPUTE DATA MOVEMENT 2018 DATA STRUCTURES DEFINE PERFORMANCE

speed COMPUTE register = this room caches = this city DATA MOVEMENT memory = nearby city disk = Pluto Jim Gray, Turing Award 1998 2018

no perfect structure Update Read amplification Memory

no perfect structure Read Update Read amplification Memory Update Memory

no perfect structure Read point tree Read amplification Update Memory Update differential approximate Memory

no perfect structure Read point tree Hash-Table Array Read Update Memory B-tree amplification Skip-List Update Linked-List differential approximate Memory Sorted Array Trie

How do I make my data system run x times as fast? (sql,nosql,bigdata, )

How do I make my data system run x times as fast? (sql,nosql,bigdata, ) How do I minimize my bill in the cloud?

How do I make my data system run x times as fast? (sql,nosql,bigdata, ) How do I minimize my bill in the cloud? How do I extend the lifetime of my hardware?

How do I make my data system run x times as fast? (sql,nosql,bigdata, ) How do I minimize my bill in the cloud? How do I extend the lifetime of my hardware? How to accelerate statistics computation for data science/ml?

How do I make my data system run x times as fast? (sql,nosql,bigdata, ) How do I minimize my bill in the cloud? How do I extend the lifetime of my hardware? How to accelerate statistics computation for data science/ml? How do I train my neural network x times faster?

NEW APPLICATIONS

NEW APPLICATIONS existing systems need to change too

NEW APPLICATIONS existing systems need to change too WORKLOAD HARDWARE ADAPT

NEW APPLICATIONS existing systems need to change too IMPROVE WORKLOAD HARDWARE WITHIN A BUDGET WHAT WILL BREAK MY SYSTEM? ADAPT REASON

new applications more data continuous need for new storage solutions new h/w

learning outcome fundamental of storage software engineering data-driven startup research data structures, SQL, NoSQL, Big Data, Neural Networks, Statistics, Data Science

There is no such thing as a wrong question/answer!!!! interaction: in and out of class

Each student: 2 reviews per week/1 presentation review and slides should focus on what is the problem why is it important why is it hard why existing solutions do not work what is the core intuition for the solution solution step by step does the paper prove its claims exact setup of analysis/experiments are there any gaps in the logic/proof possible next steps Recent Research Papers * follow a few citations to gain more background

Each student: 2 reviews per week/1 presentation learn to judge constructively review and slides should focus on learn to present learn to prepare slides Recent Research Papers what is the problem why is it important why is it hard why existing solutions do not work what is the core intuition for the solution solution step by step does the paper prove its claims exact setup of analysis/experiments are there any gaps in the logic/proof possible next steps * follow a few citations to gain more background

semester project: due in the end of semester + a midway check in (early March,10%) systems project research project

semester project: due in the end of semester + a midway check in (early March,10%) systems project research project individual project NoSQL, in c/c++

semester project: due in the end of semester + a midway check in (early March,10%) systems project research project groups of three individual project NoSQL, in c/c++ NoSQL, Neural Networks Periodic Table of Data Structures only open to cs165 students (unless proven otherwise)

semester project: due in the end of semester + a midway check in (early March,10%) systems project research project groups of three individual project NoSQL, in c/c++ NoSQL, Neural Networks Periodic Table of Data Structures only open to cs165 students (unless proven otherwise)

ACM Special Interest Group In Data Management (SIGMOD) Undergrad Research Competition first prize in 2016, 2017, 2018 Adaptive Denormalization Evolving Trees Splaying LSM-Trees

ACM Special Interest Group In Data Management (SIGMOD) Undergrad Research Competition first prize in 2016, 2017, 2018 Adaptive Denormalization Evolving Trees Splaying LSM-Trees Design continuums at CIDR 2019, LSM/BTree hybrids in SIGMOD finals

piazza forum classes are recorded (links on class website) all announcements & discussions (link on class website - check out usage guidelines) Project: 40% Midway Check-in:10% Discussion: 20% Presentation: 15% Reviews: 15% NO LAPTOP/PHONE POLICY class is based on participation!

Get familiar with the very basics of traditional database architectures: Architecture of a Database System. By J. Hellerstein, M. Stonebraker and J. Hamilton. Foundations and Trends in Databases, 2007 Get familiar with very basics of modern database architectures: The Design and Implementation of Modern Column-store Database Systems. By D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden. Foundations and Trends in Databases, 2013 Get familiar with the very basics of modern large scale systems: Massively Parallel Databases and MapReduce Systems. By Shivnath Babu and Herodotos Herodotou. Foundations and Trends in Databases, 2013 Check out: syllabus, preparation readings, project 0, systems project, online sections http://daslab.seas.harvard.edu/classes/cs265/

Teaching Fellows: subarna wilson wasay Off class discussions are key! question on readings, ideas, help with code/analysis

questions on logistics?

Next few classes: BASICS of storage Intro to RESEACH topics Discussion phase/presentation as of week 3 ask/answer questions, keep notes/ideas

cheaper faster CPU registers on chip cache on board cache memory disk SRAM DRAM cache miss: looking for something which is not in the cache ~1ns ~10ns ~100ns speed memory wall memory miss: looking for something which is not in memory cpu mem time

cheaper faster CPU registers on chip cache on board cache memory disk SRAM DRAM cache miss: looking for something which is not in the cache ~1ns ~10ns ~100ns speed memory wall memory miss: looking for something which is not in memory cpu mem time

registers my head ~0 2x on chip cache this room 1 min 10x on board cache this building 10 min 100x memory New York 1.5 hours Jim Gray, IBM, Tandem, DEC, Microsoft ACM Turing award ACM SIGMOD Edgar F. Codd Innovations award 100Kx disk Pluto 2 years

CPU need to only read x but have to read all of page 1 registers data value x on chip cache page1 page2 page3 data move on board cache memory disk

query x<5 (size=120 bytes) memory level N memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

query x<5 scan (size=120 bytes) 5 10 6 4 12 memory level N memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

query x<5 scan (size=120 bytes) memory level N 5 10 6 4 12 4 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

40 bytes query x<5 scan (size=120 bytes) memory level N 5 10 6 4 12 4 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

40 bytes query x<5 scan scan (size=120 bytes) memory level N 5 10 6 4 12 2 8 9 7 6 4 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

40 bytes query x<5 scan scan (size=120 bytes) memory level N 5 10 6 4 12 2 8 9 7 6 4 2 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

80 bytes query x<5 scan scan (size=120 bytes) memory level N 5 10 6 4 12 2 8 9 7 6 4 2 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

80 bytes query x<5 (size=120 bytes) 2 8 9 7 6 4 2 memory level N memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

80 bytes query x<5 scan (size=120 bytes) memory level N 7 11 3 9 6 2 8 9 7 6 4 2 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

80 bytes query x<5 scan (size=120 bytes) memory level N 7 11 3 9 6 2 8 9 7 6 4 2 3 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

120 bytes query x<5 scan (size=120 bytes) memory level N 7 11 3 9 6 2 8 9 7 6 4 2 3 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions query x<5 (size=120 bytes) memory level N memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions query x<5 oracle (size=120 bytes) 5 10 6 4 12 memory level N memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions query x<5 oracle (size=120 bytes) memory level N 5 10 6 4 12 4 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions 40 bytes query x<5 oracle (size=120 bytes) memory level N 5 10 6 4 12 4 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions 40 bytes query x<5 oracle oracle (size=120 bytes) memory level N 5 10 6 4 12 2 8 9 7 6 4 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions 40 bytes query x<5 oracle oracle (size=120 bytes) memory level N 5 10 6 4 12 2 8 9 7 6 4 2 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions 80 bytes query x<5 oracle oracle (size=120 bytes) memory level N 5 10 6 4 12 2 8 9 7 6 4 2 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions 80 bytes query x<5 (size=120 bytes) 2 8 9 7 6 4 2 memory level N memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions 80 bytes query x<5 oracle (size=120 bytes) memory level N 7 11 3 9 6 2 8 9 7 6 4 2 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions 80 bytes query x<5 oracle (size=120 bytes) memory level N 7 11 3 9 6 2 8 9 7 6 4 2 3 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

an oracle gives us the positions 120 bytes query x<5 oracle (size=120 bytes) memory level N 7 11 3 9 6 2 8 9 7 6 4 2 3 memory level N-1 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6 page size: 5x8 bytes

when does it make sense to have an oracle how can we minimize the cost e.g., query x<5 5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

CPU DATA MOVEMENT (ENERGY) MEMORY REQUIREMENT ROBUSTNESS SPACE REQUIREMENT

SQL, NoSQL, Graph, Neural Nets, Statistics > Data to Processing CPU DATA MOVEMENT (ENERGY) MEMORY REQUIREMENT ROBUSTNESS SPACE REQUIREMENT

SQL, NoSQL, Graph, Neural Nets, Statistics > Data to Processing CPU DATA MOVEMENT (ENERGY) MEMORY REQUIREMENT ROBUSTNESS SPACE REQUIREMENT TIME CLOUD COSTS

Stratos Idreos