functional thinking applying the philosophy of functional programming to system design and architecture

Jed Wesley-Smith @jedws

functional programming has many benefits: better reasoning about programs, composition, refactorability and performance. yet the dominant models and paradigms for software architecture, and for building software systems today, remain rooted in mutation and side effects. many of the ideas and principles of functional programming have already been applied to design problems including security, concurrency, auditing and robustness. it is possible, and desirable, to apply them to all of the systems we build, and to gain practical advantage from doing so.

is the universe mutable?

what is change? what about the past? what is now?

what is functional programming?

programming, with functions!

a function f : A -> B relates each value from its domain, A, to exactly one value from its co-domain, B: always the same (or an equivalent) value, and nothing else!
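A minimal Python sketch of that definition (the names `square` and `next_ticket` are illustrative, not from the talk): a pure function always relates the same input to the same output, while a function that reads hidden mutable state does not, so it is not a function in this sense.

```python
from itertools import count

def square(x: int) -> int:
    """Pure: the result depends only on x; nothing outside is read or changed."""
    return x * x

_tickets = count(1)  # hidden mutable state shared between calls

def next_ticket(prefix: str) -> str:
    """Impure: the same argument gives a different result on every call."""
    return f"{prefix}-{next(_tickets)}"

assert square(4) == square(4) == 16          # same input, same output
assert next_ticket("a") != next_ticket("a")  # "a-1", then "a-2"
```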

programming with values

values
- immutable: values do not change
- shareable, and can be cached forever
- referentially transparent expressions
- the state of a thing at an instant in time
- functions are values too
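A small sketch of these properties in Python, assuming frozen dataclasses as the stand-in for immutable values (`Point`, `distance_sq` and `apply_twice` are hypothetical names): "changing" a value produces a new one, results of a referentially transparent function can be cached forever, and functions can be passed around like any other value.

```python
from dataclasses import dataclass, FrozenInstanceError, replace
from functools import lru_cache

@dataclass(frozen=True)
class Point:
    """An immutable value: equal fields mean equal (and interchangeable) values."""
    x: int
    y: int

p = Point(1, 2)
try:
    p.x = 10  # values do not change, so assignment raises FrozenInstanceError
except FrozenInstanceError:
    pass

moved = replace(p, x=10)  # "changing" a value creates a new value
assert p == Point(1, 2) and moved == Point(10, 2)

# Referential transparency makes caching safe forever: the cached result
# for a given argument can never become stale.
@lru_cache(maxsize=None)
def distance_sq(a: Point, b: Point) -> int:
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2

# Functions are values too: they can be passed to other functions.
def apply_twice(f, v):
    return f(f(v))

assert apply_twice(lambda q: replace(q, y=q.y + 1), p) == Point(1, 4)
```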

what about identities?

identity
- what we think of as the things around us: you, me, the plants and animals, rivers and mountains
- identities are things we name
- we are used to thinking of the world in terms of identities; they are the objects in our world

since the time of Plato and Aristotle, philosophers have posited true reality as timeless, based on permanent substances, while processes are denied or subordinated to timeless substances. if Socrates changes, becoming sick, Socrates is still the same, and change (his sickness) only glides over his substance: change is accidental, whereas the substance is essential.
http://en.wikipedia.org/wiki/process_philosophy

No man ever steps in the same river twice, for it's not the same river and he's not the same man. (Heraclitus)

an identity is a series of values over time
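One way to model this in Python, as a sketch under the assumption that an identity is just a name plus an append-only tuple of values (the `Identity` class is hypothetical, not from the talk): a change appends a new value and returns a new identity, leaving every earlier value intact.

```python
from dataclasses import dataclass
from typing import Generic, Tuple, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Identity(Generic[T]):
    """A named identity whose state is a series of immutable values."""
    name: str
    history: Tuple[T, ...] = ()

    def current(self) -> T:
        # "now" is simply the latest value in the series
        return self.history[-1]

    def evolve(self, new_value: T) -> "Identity[T]":
        # A change overwrites nothing: it appends a value to a new series.
        return Identity(self.name, self.history + (new_value,))

river = Identity("river", ("spring melt",))
river2 = river.evolve("summer low").evolve("autumn flood")

assert river.current() == "spring melt"     # the past is unchanged
assert river2.current() == "autumn flood"
assert river2.history[0] == river.history[0]
```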

reifying time

f : A -> B
f : A -> T -> B

Version 1, Time A
Version 2, Time B
Version 3, Time C

a -> t1 -> X
a -> t2 -> X'
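A minimal sketch of f : A -> T -> B in Python, assuming each identity's versions are kept as an append-only tuple of (time, value) pairs in ascending time order (`value_at` and `record` are hypothetical helpers): looking up "a at time t" returns the latest version recorded at or before t.

```python
from bisect import bisect_right
from typing import Mapping, Tuple

Version = Tuple[int, str]      # (time, value)
History = Tuple[Version, ...]  # ascending by time; versions are only appended

def value_at(store: Mapping[str, History], a: str, t: int) -> str:
    """f : A -> T -> B, the value of identity `a` as of time `t`."""
    versions = store[a]
    times = [vt for vt, _ in versions]
    i = bisect_right(times, t)  # how many versions existed at or before t
    if i == 0:
        raise LookupError(f"{a!r} had no value yet at time {t}")
    return versions[i - 1][1]

def record(store: Mapping[str, History], a: str, t: int, value: str) -> dict:
    """Return a new store with one more version of `a`; `store` is unchanged."""
    return {**store, a: store.get(a, ()) + ((t, value),)}

s1 = record({}, "a", 1, "X")    # a -> t1 -> X
s2 = record(s1, "a", 2, "X'")   # a -> t2 -> X'
assert value_at(s2, "a", 1) == "X"
assert value_at(s2, "a", 2) == "X'"
assert value_at(s1, "a", 2) == "X"  # the older store never learned of X'
```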

change

X + Δ = X'
X' - X = Δ
X' - Δ = X
we can store entire versions, or we can store deltas: they are equivalent, and being in possession of any two allows us to traverse time
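The three equations can be sketched over string-valued dictionaries in Python. One assumption matters: for X' - Δ = X to work, a delta has to remember the old value as well as the new one, so here a delta maps each changed key to an (old, new) pair, with `None` meaning "absent" (so stored values must not be `None`). The helper names are hypothetical.

```python
from typing import Dict, Mapping, Optional, Tuple

Delta = Dict[str, Tuple[Optional[str], Optional[str]]]  # key -> (old, new)

def diff(x: Mapping[str, str], x2: Mapping[str, str]) -> Delta:
    """X' - X = Δ"""
    keys = set(x) | set(x2)
    return {k: (x.get(k), x2.get(k)) for k in keys if x.get(k) != x2.get(k)}

def _set(state: Dict[str, str], key: str, value: Optional[str]) -> None:
    if value is None:
        state.pop(key, None)   # None stands for "key absent"
    else:
        state[key] = value

def apply_delta(x: Mapping[str, str], delta: Delta) -> Dict[str, str]:
    """X + Δ = X'"""
    out = dict(x)              # never modify the input version
    for k, (_, new) in delta.items():
        _set(out, k, new)
    return out

def revert_delta(x2: Mapping[str, str], delta: Delta) -> Dict[str, str]:
    """X' - Δ = X"""
    out = dict(x2)
    for k, (old, _) in delta.items():
        _set(out, k, old)
    return out

X = {"title": "draft", "author": "jed"}
X2 = {"title": "final", "tags": "fp"}
d = diff(X, X2)
assert apply_delta(X, d) == X2 and revert_delta(X2, d) == X
```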

architecture in the Real World

problem: atomic updates

journaling file system
- many writes in a single update:
  - describe the writes in a log
  - perform the writes
  - mark the logged writes as complete
- replay incomplete writes to recover from system failure
- the journal is an append-only, immutable structure containing an audit log of all changes (usually deltas)
- it can be used to revert a system to a previous state
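A toy Python model of those steps, assuming a dictionary stands in for the disk and a list for the journal (the `JournaledDisk` class and its record layout are hypothetical, not any real file system's format). A simulated crash after the commit record leaves the update logged but unapplied, and recovery replays it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class JournaledDisk:
    """Toy journaling model: `disk` changes in place, `journal` is append-only."""
    disk: Dict[int, bytes] = field(default_factory=dict)
    journal: List[Tuple] = field(default_factory=list)

    def update(self, txid: int, writes: Dict[int, bytes], crash: bool = False) -> None:
        # 1. describe every write of the update in the log, then commit it
        self.journal.append(("begin", txid, dict(writes)))
        self.journal.append(("commit", txid))
        if crash:
            return  # simulated failure: logged and committed, but not applied
        # 2. perform the writes
        self.disk.update(writes)
        # 3. mark the logged writes as complete
        self.journal.append(("done", txid))

    def recover(self) -> None:
        """Replay committed but incomplete updates, in log order."""
        logged: Dict[int, Dict[int, bytes]] = {}
        committed, done = set(), set()
        for record in self.journal:
            kind, txid = record[0], record[1]
            if kind == "begin":
                logged[txid] = record[2]
            elif kind == "commit":
                committed.add(txid)
            else:
                done.add(txid)
        for txid, writes in logged.items():
            if txid in committed and txid not in done:
                self.disk.update(writes)  # redoing the same writes is idempotent
                self.journal.append(("done", txid))

fs = JournaledDisk()
fs.update(1, {0: b"a", 1: b"b"})
fs.update(2, {1: b"B", 2: b"c"}, crash=True)
assert fs.disk == {0: b"a", 1: b"b"}  # the crash left update 2 unapplied
fs.recover()
assert fs.disk == {0: b"a", 1: b"B", 2: b"c"}
```

Because the journal itself is never rewritten, running `recover` a second time finds nothing left to replay.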

transactional copy-on-write file system: zfs
- constant-time snapshots of file-system state
- incremental changes create multiple versions that are persistent, revertible and replayable (i.e. copy-on-write)
- high cache efficiency due to the immutability of stored data
- compaction via data de-duplication
- continuous integrity checking and automatic data repair
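A sketch of copy-on-write versions in Python, assuming a toy block store (`FsVersion` is hypothetical and bears no relation to ZFS's on-disk format). A write produces a new version that shares every untouched block with the old one, so a snapshot is nothing more than keeping a reference. For brevity this copies the whole block table on each write; ZFS itself only rewrites the path of block pointers above the changed block.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping

@dataclass(frozen=True)
class FsVersion:
    """One immutable version of a toy block store: block number -> contents."""
    blocks: Mapping[int, bytes]

    def write(self, block_no: int, data: bytes) -> "FsVersion":
        # Copy-on-write: build a new table; unchanged blocks are shared by
        # reference with the previous version, never copied or modified.
        table = dict(self.blocks)
        table[block_no] = data
        return FsVersion(MappingProxyType(table))  # read-only view

empty = FsVersion(MappingProxyType({}))
v1 = empty.write(0, b"boot").write(1, b"home")
snapshot = v1                   # taking a snapshot is constant time
v2 = v1.write(1, b"home, edited")

assert snapshot.blocks[1] == b"home"       # the old version is still intact
assert v2.blocks[1] == b"home, edited"
assert v2.blocks[0] is snapshot.blocks[0]  # the unchanged block is shared
```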

content-addressable storage
- files are stored at an address computed from their content: a content hash
- names are associated with a hash
- retrieval looks up the current hash for a name, then accesses the content stored at that address
- an update adds the new content, then a new (name, hash) pair
- caches only cache content by hash, never by name, so a cached entry can never go stale
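A minimal in-memory content-addressable store in Python, assuming SHA-256 as the content hash (the `ContentStore` class is hypothetical). Content is never overwritten; the name table is the only thing that changes.

```python
import hashlib
from typing import Dict

class ContentStore:
    """Toy content-addressable store; only the name -> hash table is mutable."""

    def __init__(self) -> None:
        self._content: Dict[str, bytes] = {}  # hash -> content, never overwritten
        self._names: Dict[str, str] = {}      # name -> current hash

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._content.setdefault(address, data)  # same content, same address
        return address

    def update(self, name: str, data: bytes) -> str:
        address = self.put(data)     # 1. add the new content
        self._names[name] = address  # 2. then point the name at it
        return address

    def get(self, name: str) -> bytes:
        # look up the current hash for the name, then read that address
        return self._content[self._names[name]]

    def get_by_hash(self, address: str) -> bytes:
        return self._content[address]

store = ContentStore()
h1 = store.update("readme", b"v1")
h2 = store.update("readme", b"v2")
assert store.get("readme") == b"v2"
assert store.get_by_hash(h1) == b"v1"  # anything cached under h1 is still valid
```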

git: version control system
- non-linear development: branching and merging
- distributed development: changes must be shareable between repositories that are not necessarily connected
- cryptographic authentication of history: the ability to uniquely identify the complete development history of any change to the resources in a repository

git: design
- content is stored as a directed acyclic graph (DAG) of objects: content plus meta-data
- content blobs are stored under the hash of their full content, even when the bytes on disk are kept as a delta
- trees store lists of file names, each linked by hash to its content: a blob or another tree
- commits are stored under a hash of their meta-data, including the tree hash, author, date and parent commit(s)
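For blobs, the naming scheme is simple enough to reproduce in a few lines of Python. With git's default SHA-1 object format, an object's id is the hash of a header `"<type> <size>\0"` followed by the raw content; the helper names below are hypothetical, and the result can be checked against `git hash-object`.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Object id as computed by git's default (SHA-1) object format:
    SHA-1 over the header "<type> <size>" plus a NUL byte, then the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

def blob_id(content: bytes) -> str:
    return git_object_id("blob", content)

print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, git's empty blob
```

Trees and commits are named the same way, with their own serialized bodies in place of the file content, which is why identical content always lands at the same address.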

git: file format
- updates add new objects, stored either whole (loose objects) or in a pack file, where they may be delta-compressed
- all old versions are reconstructable
- the same content produces the same hash, so equivalent updates commute
- the data structure is (mostly) immutable
- the mutable part is a pointer to the head of each branch
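A toy Python illustration of why equivalent updates commute, assuming a content-addressed object store modeled as a plain dictionary keyed by a SHA-1 of the bytes (a simplified id without git's header; `add` is hypothetical). Adding the same objects in any order yields the same store, re-adding is a no-op, and the only in-place change is moving a branch pointer.

```python
import hashlib
from typing import Dict, Mapping

def add(store: Mapping[str, bytes], data: bytes) -> Dict[str, bytes]:
    """Return a new store that also contains `data`; the input is untouched."""
    oid = hashlib.sha1(data).hexdigest()  # simplified id (no git object header)
    return {**store, oid: data}

repo1 = add(add({}, b"fix bug"), b"add docs")
repo2 = add(add({}, b"add docs"), b"fix bug")
assert repo1 == repo2                    # equivalent updates commute
assert add(repo1, b"fix bug") == repo1   # re-adding the same content changes nothing

# The mutable piece: a branch pointer, moved to the newest object id.
refs = {"main": hashlib.sha1(b"fix bug").hexdigest()}
old_head = refs["main"]
refs["main"] = hashlib.sha1(b"add docs").hexdigest()
assert repo1[refs["main"]] == b"add docs"
assert repo1[old_head] == b"fix bug"     # the old version is still reconstructable
```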

git: benefits
- presents a mutable view of an immutable structure
- a commit hash covers its parent commits, providing a cryptographic fingerprint of both content and history
- commit and content data are shareable values, enabling distribution between multiple repositories
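The history-fingerprint property can be sketched with a simplified commit format in Python. The body layout in `commit_id` is a hypothetical stand-in, not git's exact commit serialization; what it shares with git is that each id covers the parent's id, so rewriting any earlier commit changes every id after it.

```python
import hashlib
from typing import List, Optional

def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Simplified commit id: hash of the tree id, the parent id and the message."""
    body = f"tree {tree}\nparent {parent or ''}\n\n{message}"
    return hashlib.sha1(body.encode()).hexdigest()

def history_ids(trees: List[str], messages: List[str]) -> List[str]:
    """Ids for a linear history, each commit chained to the one before it."""
    ids: List[str] = []
    parent: Optional[str] = None
    for tree, msg in zip(trees, messages):
        parent = commit_id(tree, parent, msg)
        ids.append(parent)
    return ids

original = history_ids(["t1", "t2", "t3"], ["init", "feature", "fix"])
tampered = history_ids(["t1", "tX", "t3"], ["init", "feature", "fix"])

assert original[0] == tampered[0]  # the untouched prefix keeps its id
assert original[2] != tampered[2]  # rewriting commit 2 changes every later id
```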

lucene
- full-text indexing and search
- needs to maintain a stable, searchable view of an index in the face of concurrent updates

lucene: index
- an index is a collection of Documents
- a document is a collection of Fields and has an ID
- an index is updated by deleting and re-adding documents
- searching is done via a Searcher
- for its lifetime, a searcher sees the state of the index as it was when it was opened

lucene: file format
- an index is made of Segment files
- segments contain documents
- deleting a document adds its ID to a per-segment .del file, i.e. it doesn't modify the segment file directly
- when no searchers reference a segment with many deleted documents, it may be merged with others into a new segment containing the remaining documents, i.e. garbage collection
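A toy Python model of these ideas, assuming immutable segments with a separate set of deleted ids standing in for the .del file. The `Segment`, `Index` and `merge` names are hypothetical and are not Lucene's Java API. A searcher simply holds the index value it opened, so later deletions and new segments never disturb its view.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Segment:
    """An immutable segment; `deleted` plays the role of the .del file."""
    name: str
    doc_ids: Tuple[int, ...]
    deleted: FrozenSet[int] = frozenset()

    def live(self) -> Tuple[int, ...]:
        return tuple(d for d in self.doc_ids if d not in self.deleted)

@dataclass(frozen=True)
class Index:
    segments: Tuple[Segment, ...]

    def delete(self, doc_id: int) -> "Index":
        # Deleting records the id against its segment; documents stay put.
        return Index(tuple(
            Segment(s.name, s.doc_ids, s.deleted | {doc_id}) if doc_id in s.doc_ids else s
            for s in self.segments))

    def add_segment(self, seg: Segment) -> "Index":
        return Index(self.segments + (seg,))

    def search_all(self) -> Tuple[int, ...]:
        return tuple(d for s in self.segments for d in s.live())

def merge(index: Index, name: str) -> Index:
    """Garbage collection: copy only the live documents into one new segment."""
    return Index((Segment(name, index.search_all()),))

idx = Index((Segment("seg1", tuple(range(0, 10))), Segment("seg2", tuple(range(10, 20)))))
searcher1 = idx  # a searcher holds the index value it was opened on
idx = idx.delete(3).delete(8).delete(11).add_segment(Segment("seg3", (20, 21, 22)))
searcher2 = idx

assert 3 in searcher1.search_all()      # searcher1's view is stable
assert 3 not in searcher2.search_all()
assert 22 in searcher2.search_all()
```

Once no searcher refers to an old `Index` value, Python's own garbage collector reclaims it, which loosely mirrors how unreferenced segments become eligible for merging.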

(diagram, built up over several slides: segment 1 holds documents 0-9 and segment 2 holds documents 10-19, and searcher1 is opened on this index. documents 3 and 8 are then marked as deleted in segment 1 and document 11 in segment 2, and a new segment 3 holding documents 20-22 is added. searcher2 is opened on the updated index, while searcher1 still sees the index as it was when it was opened)

netflix
- scale: 30% of last-mile internet traffic, 10k+ AWS instances
- immutable everything, including servers:
  - servers are values; they are not modified
  - new versions are printed and deployed
  - old versions are replaced
- idempotent updates
- RxJava / RxJS (JavaScript) reactive programming model
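A toy Python sketch of "servers are values" with idempotent deployment. The `Server` and `deploy` names are hypothetical and say nothing about Netflix's actual tooling: a deployment never patches a server, it produces a new cluster value in which out-of-date servers are replaced, and deploying the same image twice changes nothing.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Server:
    """A server as a value: an image version plus a slot number."""
    image: str
    slot: int

Cluster = Tuple[Server, ...]

def deploy(cluster: Cluster, image: str) -> Cluster:
    """Return a new cluster running `image`; servers are replaced, never patched.

    Servers already on `image` are kept as-is, so the update is idempotent.
    """
    return tuple(s if s.image == image else Server(image, s.slot) for s in cluster)

old = (Server("app-v1", 0), Server("app-v1", 1))
new = deploy(old, "app-v2")

assert deploy(new, "app-v2") == new  # idempotent: deploying again is a no-op
assert old[0].image == "app-v1"      # the old version still exists, e.g. for rollback
```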

conclusions
- avoid mutation at all costs
- values replace or occlude values
- store change
- apply changes to construct a temporal view
- apply these ideas to your entire system architecture
- profit!

thanks