A NEW BEHAVIOUR PROPOSED FOR R15A JAY NELSON

PROCESS-STRIPED BUFFERING WITH GEN_STREAM
A NEW BEHAVIOUR PROPOSED FOR R15A
JAY NELSON
HTTP://WWW.DUOMARK.COM/
@DUOMARK

GENESIS
  WIDEFINDER (TIM BRAY'S* CONCURRENCY CHALLENGE)
  COUNT WEBPAGE VISIT FREQUENCY
  ~10 LINES OF RUBY
  WANTED TO SCALE TO MULTI-CORE WITHOUT EFFORT
  *HTTP://WWW.TBRAY.ORG/ONGOING/WHEN/200X/2007/09/20/WIDE-FINDER

WIDEFINDER RESULTS
  ERLANG FARED POORLY (INITIALLY)
  TEXT I/O PERFORMANCE WAS LACKING
  CONCERTED EFFORT BY ERLANGERS
  RESULT = OVER 350 LINES OF CODE (VINOSKI, CAOYUAN AND OTHERS)

COMMON PATTERN
  PATTERN EMERGED IN ERLANG SUBMISSIONS
  BINARY READ FILE
  FIND LINE BREAKS
  DISTRIBUTE LINES
  SEEMED SIMPLE, INVOLVED HUNDREDS OF SLOC
  MIRRORED MY EARLIER EXPERIMENTS WITH BINARIES
  CAN'T ASSUME BINARY FITS IN MEMORY

WIDEFINDER2 (ASIDE)
  FINAL WIDEFINDER2 SOLUTIONS ARE MOSTLY C
  ULTIMATE WINNER OF WIDEFINDER2
  HTTP://WWW.1024CORES.NET/
  BLOG IS A GOOD READ ON CONCURRENCY ISSUES

GEN_STREAM

CONCEPT
  BUILT ON GEN_SERVER
  MAINTAINS AN INTERNAL MATRIX OF BUFFERS
  EACH COLUMN IS A PROCESS
  EACH CELL IS A BLOCK OF MEMORY
  SERIAL STREAM IS STRIPED ACROSS PROCESSES
  ADJACENT SEGMENTS ARE IN DIFFERENT PROCESSES
  COLUMN REFILLS INTERLEAVE WITH REQUESTS
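
The striping idea can be sketched as index arithmetic. This is only an illustration: the module name stripe_sketch and the round-robin placement rule are assumptions, not taken from the gen_stream source. It shows how adjacent segments of a serial stream could land in different processes.

```erlang
-module(stripe_sketch).
-export([locate/3, layout/3]).

%% Assumed layout (not taken from the gen_stream source): segment N of the
%% serial stream lives in column (process) N rem NumProcs and, inside that
%% process, in slot (N div NumProcs) rem ChunksPerProc.
locate(Segment, NumProcs, ChunksPerProc)
  when is_integer(Segment), Segment >= 0,
       is_integer(NumProcs), NumProcs > 0,
       is_integer(ChunksPerProc), ChunksPerProc > 0 ->
    Column = Segment rem NumProcs,
    Slot = (Segment div NumProcs) rem ChunksPerProc,
    {Column, Slot}.

%% Placement of the first Count segments, in stream order.
layout(Count, NumProcs, ChunksPerProc) when is_integer(Count), Count >= 0 ->
    [{N, locate(N, NumProcs, ChunksPerProc)} || N <- lists:seq(0, Count - 1)].
```

With 4 processes and 3 chunks per process, segments 0, 1, 2, 3 fall in columns 0 to 3, and segment 4 returns to column 0 in the next slot.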

CONCEPT (CONT.)

EXAMPLE API USAGE
  {ok, Pid} = gen_stream:start_link([{stream_type, {binary, BinaryInMemory}},
                                     {num_procs, 4},
                                     {chunks_per_proc, 3}]),
  read_all(Pid).

  read_all(Pid) ->
      case gen_stream:next_block(Pid) of
          {block, Block} -> process_block(Block), read_all(Pid);
          {end_of_stream} -> gen_stream:stop(Pid)
      end.

IMPLEMENTATION
  START / START_LINK STREAM_TYPE OPTIONS (REQ'D)
    {stream_type, {binary, Bin::binary()}}
    {stream_type, {file, FileName::string()}}
    {stream_type, {behaviour, Mod::atom(), ModArgs::list()}}
  DETERMINES SOURCE DATA TYPE
  BUILT-INS USE SUB-BINARIES WHERE POSSIBLE
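
A minimal sketch of what "use sub-binaries where possible" can look like for an in-memory source. The module name subbin_sketch is hypothetical and this is not the gen_stream implementation. binary:part/3 normally yields a sub-binary that refers to the original data instead of copying it.

```erlang
-module(subbin_sketch).
-export([chunks/2]).

%% Split Bin into pieces of at most ChunkSize bytes, in stream order.
%% Each piece comes from binary:part/3, so the payload is referenced
%% rather than copied.
chunks(Bin, ChunkSize)
  when is_binary(Bin), is_integer(ChunkSize), ChunkSize > 0 ->
    chunks(Bin, ChunkSize, 0, byte_size(Bin), []).

chunks(_Bin, _ChunkSize, Pos, Size, Acc) when Pos >= Size ->
    lists:reverse(Acc);
chunks(Bin, ChunkSize, Pos, Size, Acc) ->
    Len = min(ChunkSize, Size - Pos),   % last piece may be short
    Piece = binary:part(Bin, Pos, Len),
    chunks(Bin, ChunkSize, Pos + Len, Size, [Piece | Acc]).
```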

IMPLEMENTATION (CONT.)
  START / START_LINK BUFFER SIZING OPTIONS
    {num_procs, pos_integer()} => concurrency
    {chunks_per_proc, pos_integer()} => stacked buffers
    {chunk_size, pos_integer()} => single buffer size
    {block_factor, pos_integer()} => # records per buffer
  LIMIT MAXIMUM MEMORY USAGE
  ALLOW PACKING OF SMALL DATA
  DEFINE CONCURRENT DATA LOADING
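
How the sizing options bound memory can be sketched as simple arithmetic. The helper below is hypothetical (sizing_sketch is not part of gen_stream) and assumes each buffer holds block_factor chunks of chunk_size bytes. If chunk_size already covers the whole buffer, the result is still an upper bound, since block_factor is at least 1. The defaults of 1 are placeholders, not documented defaults.

```erlang
-module(sizing_sketch).
-export([max_buffer_bytes/1]).

%% Upper bound on buffered payload bytes for a gen_stream-style option list.
%% Assumption: one buffer = block_factor chunks of chunk_size bytes each.
max_buffer_bytes(Opts) when is_list(Opts) ->
    NumProcs      = proplists:get_value(num_procs, Opts, 1),
    ChunksPerProc = proplists:get_value(chunks_per_proc, Opts, 1),
    ChunkSize     = proplists:get_value(chunk_size, Opts, 1),
    BlockFactor   = proplists:get_value(block_factor, Opts, 1),
    NumProcs * ChunksPerProc * ChunkSize * BlockFactor.
```

For example, 4 processes with 3 buffers each and 8192-byte chunks hold at most 4 * 3 * 8192 = 98304 bytes of stream data at a time.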

IMPLEMENTATION (CONT.)
  START / START_LINK REPLAY OPTIONS
    {is_circular, boolean()} => continuous data stream
  START / START_LINK TRANSFORM CHUNK OPTIONS
    {x_mfa, {module(), atom(), list()}}
    {x_fun, fun()}
  CONVERTS DATA CONCURRENTLY AS IT IS LOADING
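
A sketch of how a chunk transform might be applied inside a buffer process. The calling convention is an assumption: the slides give only the option shapes, so here x_fun receives the chunk as its single argument and x_mfa receives the chunk prepended to its argument list. The module xform_sketch and the helper split_on/2 are hypothetical.

```erlang
-module(xform_sketch).
-export([apply_transform/2, split_on/2]).

%% Apply an optional transform to a freshly loaded chunk.
%% Assumed convention: the chunk is the first (or only) argument.
apply_transform({x_fun, Fun}, Chunk) when is_function(Fun, 1) ->
    Fun(Chunk);
apply_transform({x_mfa, {Mod, Fun, Args}}, Chunk)
  when is_atom(Mod), is_atom(Fun), is_list(Args) ->
    apply(Mod, Fun, [Chunk | Args]);
apply_transform(none, Chunk) ->
    Chunk.

%% Example transform usable through x_mfa with one extra argument.
split_on(Chunk, Separator) when is_binary(Chunk), is_binary(Separator) ->
    binary:split(Chunk, Separator, [global]).
```

Because each column is its own process, a transform applied at load time runs concurrently with the other columns and with the client.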

BEHAVIOUR INTERFACE
  behaviour_info(callbacks) ->
      [
       {init, 1},                 % Creates ModState
       {stream_size, 1},          % may be is_circular
       {inc_progress, 2},         % Seen + ThisChunkSize
       {extract_block, 5},
       {extract_final_block, 5},
       {terminate, 2},
       {code_change, 3}
      ];

EXTRACT_BLOCK/5
  MODULE STATE (FROM MODULE:INIT() CALL)
  POSITION (OFFSET FROM START OF STREAM)
  NUMBER OF BYTES TO PRODUCE
  CHUNK SIZE (NUMBER OF BYTES IN A CHUNK)
  BLOCKING FACTOR (E.G., 10 CHUNKS PER BLOCK)

EXTRACT_FINAL_BLOCK/5
  SAME PARAMETERS AS EXTRACT_BLOCK/5
  NUMBER OF BYTES IS CAPPED TO STREAM_SIZE
  GEN_STREAM HANDLES CIRCULARITY
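
A callback module for the behaviour could look roughly like the sketch below, which serves an in-memory binary. Only the callback names, arities and the argument order of extract_block/5 come from the slides. The return shapes, the init argument and the module name bin_stream_sketch are assumptions. The -behaviour line is commented out because gen_stream is a proposal and may not be installed.

```erlang
-module(bin_stream_sketch).
%% -behaviour(gen_stream).   % enable only where gen_stream is available
-export([init/1, stream_size/1, inc_progress/2,
         extract_block/5, extract_final_block/5,
         terminate/2, code_change/3]).

%% Return shapes are assumptions; the slides list only names and arities.
init([Bin]) when is_binary(Bin) ->
    {ok, Bin}.                         % ModState is the binary itself

stream_size(Bin) ->
    byte_size(Bin).

%% "Seen + ThisChunkSize", as the slide comment puts it.
inc_progress(Seen, ThisChunkSize) ->
    Seen + ThisChunkSize.

%% Arguments in the order the slides list them.
%% Assumes 0 =< Pos and Pos + NumBytes =< byte_size(Bin).
extract_block(Bin, Pos, NumBytes, _ChunkSize, _BlockFactor) ->
    binary:part(Bin, Pos, NumBytes).

%% Defensive cap; the slides suggest the caller already caps NumBytes.
extract_final_block(Bin, Pos, NumBytes, ChunkSize, BlockFactor) ->
    Capped = min(NumBytes, byte_size(Bin) - Pos),
    extract_block(Bin, Pos, Capped, ChunkSize, BlockFactor).

terminate(_Reason, _ModState) ->
    ok.

code_change(_OldVsn, ModState, _Extra) ->
    {ok, ModState}.
```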

DYNAMICS
  INIT/1 - INSTANTIATES INTERNAL PROCESSES
  SEND {next_block, self()} TO EACH PROCESS
  CLIENT REQUESTS gen_stream:next_block(Pid)
  CLIENT AND FILL BUFFER REQUESTS INTERLEAVE
  IF BUFFER EMPTY, CLIENT REQUEST IS IMMEDIATE FILL
  FETCH, RETURN AND MESSAGE SELF TO FILL BUFFER
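
The dynamics of a single column can be sketched with a plain process. This is not the gen_stream code. The module buffer_proc_sketch, its message shapes and the Fill fun contract (State in, {block, Block, NewState} or end_of_stream out) are all assumptions made for illustration. It shows refill messages interleaving with client requests, and an immediate fetch when the buffer is empty.

```erlang
-module(buffer_proc_sketch).
-export([start/3, next_block/1, stop/1]).

%% Fill is a hypothetical fun: Fill(State) -> {block, B, State1} | end_of_stream.
start(Fill, Init, Depth)
  when is_function(Fill, 1), is_integer(Depth), Depth > 0 ->
    Pid = spawn(fun() -> loop(Fill, {more, Init}, queue:new(), Depth) end),
    [Pid ! fill || _ <- lists:seq(1, Depth)],   % prime every buffer slot
    Pid.

next_block(Pid) ->
    Ref = make_ref(),
    Pid ! {next_block, self(), Ref},
    receive {Ref, Reply} -> Reply
    after 5000 -> exit(timeout)
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

loop(Fill, Src, Q, Depth) ->
    receive
        fill ->
            case queue:len(Q) < Depth of
                true ->
                    {Q1, Src1} = refill(Fill, Src, Q),
                    loop(Fill, Src1, Q1, Depth);
                false ->
                    loop(Fill, Src, Q, Depth)
            end;
        {next_block, From, Ref} ->
            %% Buffer empty: fetch immediately on the client's behalf.
            {Q1, Src1} = case queue:is_empty(Q) of
                             true  -> refill(Fill, Src, Q);
                             false -> {Q, Src}
                         end,
            case queue:out(Q1) of
                {{value, B}, Q2} ->
                    From ! {Ref, {block, B}},
                    self() ! fill,              % refill after replying
                    loop(Fill, Src1, Q2, Depth);
                {empty, Q2} ->
                    From ! {Ref, end_of_stream},
                    loop(Fill, Src1, Q2, Depth)
            end;
        stop ->
            ok
    end.

refill(_Fill, done, Q) ->
    {Q, done};
refill(Fill, {more, S}, Q) ->
    case Fill(S) of
        {block, B, S1} -> {queue:in(B, Q), {more, S1}};
        end_of_stream  -> {Q, done}
    end.
```

Messages are handled in arrival order, so a client request that arrives between two fill messages is served from whatever is already buffered.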

IMPLICATIONS
  ALWAYS REPRESENTS A SERIAL, ORDERED STREAM
  DESIGNED FOR PULL SEMANTICS (PUSH CAN OVERFLOW)
  EQUIVALENT TO A COMPREHENSION ON EXTERNAL DATA
  CAN IMPLEMENT INFINITE COMPREHENSIONS
  MAIN CONCURRENCY IS OVERLAPPED DATA FETCHES
  SECONDARY CONCURRENCY IN REFILLING BUFFERS
  CONCURRENT ON-THE-FLY TRANSFORMATIONS
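
The "comprehension on external data" view can be sketched as a fold driven by pull requests. The code is hypothetical and self-contained: a continuation-style Next fun stands in for a call such as gen_stream:next_block(Pid), and from_list/1 exists only to supply test data.

```erlang
-module(stream_fold_sketch).
-export([fold/3, from_list/1]).

%% Pull blocks one at a time and fold Fun over them, in stream order.
%% Next is a stand-in source: Next() -> {block, Block, Next1} | end_of_stream.
fold(Fun, Acc, Next) when is_function(Fun, 2), is_function(Next, 0) ->
    case Next() of
        {block, Block, Next1} -> fold(Fun, Fun(Block, Acc), Next1);
        end_of_stream         -> Acc
    end.

%% Build a stand-in source from a list of blocks.
from_list([]) ->
    fun() -> end_of_stream end;
from_list([Block | Rest]) ->
    fun() -> {block, Block, from_list(Rest)} end.
```

Only one block is held by the consumer at a time, so the fold can run over data larger than memory, and the consumer's pace throttles the source.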

USER CHOICES
  STREAM DYNAMICS
  RESOURCES CONSUMED: MEMORY, PROCESSES
  DATA PROCESSING MODEL
  DATA GRANULARITY / ELEMENT BLOCKING
  ARCHITECTURAL CHOKE POINTS
  THROTTLE DATA TIMING / THROUGHPUT
  ADAPTIVELY CONTROLLED ON EACH INSTANTIATION

PROMISE (HOPE?)

EFFICIENT TEXT FILES
  COVERS THE WIDEFINDER CODE EXAMPLES
  BINARY BLOCKS OF TEXT
  ALLOWS VARIABLE CHUNK SIZES
  USER-DEFINED x_mfa OR x_fun TO BREAK BLOCKS
  OPTIONALLY ELIMINATE OR FILTER DATA BLOCKS
  COULD ALSO COMPRESS / DECOMPRESS
  ANY DATA TRANSFORMATION
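
Breaking text blocks into lines has one wrinkle: a block boundary rarely falls on a line break. The sketch below is a hypothetical helper, not part of gen_stream. It returns the complete lines plus the unterminated tail, which the consumer is assumed to carry into the next block.

```erlang
-module(lines_sketch).
-export([split_block/2]).

%% Carry is the unterminated tail left over from the previous block.
%% Returns {CompleteLines, Rest}; pass Rest as Carry for the next block.
split_block(Carry, Block) when is_binary(Carry), is_binary(Block) ->
    Data = <<Carry/binary, Block/binary>>,
    Parts = binary:split(Data, <<"\n">>, [global]),
    %% The last part is whatever follows the final newline (possibly <<>>).
    {Lines, [Rest]} = lists:split(length(Parts) - 1, Parts),
    {Lines, Rest}.
```

At end of stream, a non-empty Rest is the final line without a trailing newline.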

FIXED-SIZE RECORDS
  EXTREMELY EFFICIENT FIXED LENGTH RECORD LOADING
  PREDICTIVE LOCATIONS ALLOW FULL CONCURRENCY
  DATA CAN FLOW THROUGH BUFFERS AS BINARIES
  INDEX GENERATION (RECORDS AND LOCATIONS)
  BULK-LOADING OF VERY SHORT RECORDS
  block_factor LOADS MULTIPLE RECORDS PER CHUNK
  TRANSFORM CAN SPLIT TO A LIST OF SUB-BINARIES
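
For fixed-size records, splitting a block and computing record locations are both one-liners. This is a sketch with hypothetical names (records_sketch, split_records/2, record_offsets/2), of the kind of function that might be supplied as a transform.

```erlang
-module(records_sketch).
-export([split_records/2, record_offsets/2]).

%% Split a block into RecSize-byte records using a binary comprehension.
%% A trailing partial record, if any, is dropped.
split_records(Block, RecSize)
  when is_binary(Block), is_integer(RecSize), RecSize > 0 ->
    [Rec || <<Rec:RecSize/binary>> <= Block].

%% {RecordNumber, ByteOffset} for every whole record in a stream of
%% StreamSize bytes. Offsets are known up front, which is what allows
%% records to be fetched concurrently.
record_offsets(StreamSize, RecSize)
  when is_integer(StreamSize), StreamSize >= 0,
       is_integer(RecSize), RecSize > 0 ->
    [{N, N * RecSize} || N <- lists:seq(0, StreamSize div RecSize - 1)].
```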

BUFFERING
  ORIGINAL GOAL OF THE PATTERN
  ON REFLECTION PROBABLY LEAST USEFUL FEATURE
  I/O ALREADY BUFFERED AT LEAST TWICE
  HTTP://SNA-PROJECTS.COM/KAFKA/DESIGN.PHP
  FROM LINKEDIN'S KAFKA MESSAGING

STREAM IDIOM
  CONCISE, EASY-TO-USE INTERFACE
  BINARY, FILE OR FUNCTIONAL GENERATION
  (FUTURE CONTINUATION-BASED OPTION)
  INFINITE DATA / LAZY DATA GENERATION
  STANDARDIZES ALGORITHMS TO UNITS OF WORK
  ARCHITECTURAL LEVEL COMPREHENSIONS
  EXTENDS MAPPING BEYOND MEMORY SIZE

SEQUENCING EVENTS
  STREAMS CAN BE SEQUENTIALLY ORDERED EVENTS
  REPRODUCIBLE TESTING SCENARIOS
  SCRIPTED EVENTS CAN DRIVE STATE MACHINES
  SCRIPTING AS AN ARCHITECTURAL PATTERN
  POOLED SOURCE OF SLOW-TO-GENERATE SEQUENCES
  BEWARE THAT NEXT_BLOCK MAY TIME OUT

TESTING
  MEMORY EFFICIENT, REPEATABLE EVENTING
  LARGE EXTERNAL SOURCE OF TEST EXAMPLES
  GENERATED TEST CASES VIA A MODULE
  INFINITE STREAMS OF DATA (CIRCULAR OR NOT)
  INFINITE RANDOM SAMPLING FROM A SET
  STRESS TESTING / MEMORY LEAK IDENTIFICATION
  DYNAMICALLY SCRIPTED EVENTING
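
A circular stream over a finite source can be sketched as a read that wraps at the end. The module circular_sketch is hypothetical and does not show how gen_stream implements is_circular. It only illustrates how a small fixed data set can feed an endless, repeatable sequence of test input.

```erlang
-module(circular_sketch).
-export([read/3]).

%% Read NumBytes starting at Pos, wrapping to the start of Bin as needed.
%% Pos may exceed byte_size(Bin); it is reduced modulo the size.
read(Bin, Pos, NumBytes)
  when byte_size(Bin) > 0,
       is_integer(Pos), Pos >= 0,
       is_integer(NumBytes), NumBytes >= 0 ->
    read(Bin, Pos rem byte_size(Bin), NumBytes, []).

read(_Bin, _Pos, 0, Acc) ->
    iolist_to_binary(lists:reverse(Acc));
read(Bin, Pos, Left, Acc) ->
    Take = min(Left, byte_size(Bin) - Pos),     % bytes before the wrap point
    Piece = binary:part(Bin, Pos, Take),
    NextPos = (Pos + Take) rem byte_size(Bin),
    read(Bin, NextPos, Left - Take, [Piece | Acc]).
```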

FEEDBACK
  CODE IS COOKING IN PU ON GITHUB: ERLANG/OTP
  jn/gen_stream (stdlib) (730c7fd)
  WILL BE AVAILABLE AT HTTP://WWW.DUOMARK.COM/
  EASIER TO LOAD FROM THE SHELL
  PLEASE TRY IT, GIVE FEEDBACK -- GOOD OR BAD
  DEMAND ACCEPTANCE FROM YOUR SWEDISH OTP REP!!