functional thinking: applying the philosophy of functional programming to system design and architecture
Jed Wesley-Smith @jedws
functional programming has many benefits: programs that are easier to reason about, compose and refactor, and that can perform better
yet the dominant models and paradigms for software architecture and for building software systems today remain rooted in mutation and side-effects
many of the ideas and principles of functional programming have been applied to solve design problems, including security, concurrency, auditing and robustness
it is possible, and desirable, to apply them to all of the systems we build, and to gain practical advantage from doing so
is the universe mutable?
what is change? what about the past? what is now?
what is functional programming?
programming, with functions!
a function f : A -> B relates each value from its domain, A, to exactly one value from its range (or co-domain), B
always the same (or an equivalent) value, and nothing else!
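A minimal Python sketch of the distinction above, assuming nothing beyond the standard library: `square` relates each argument to exactly one result and does nothing else, while `next_id` (a hypothetical helper) reads and mutates hidden state, so it is not a function in this sense.

```python
def square(x: int) -> int:
    # a function in the mathematical sense: the same x always gives the same
    # result, and calling it has no other effect
    return x * x


_counter = 0


def next_id() -> int:
    # not a function in this sense: it mutates hidden state, so two calls
    # with the same (empty) argument list return different values
    global _counter
    _counter += 1
    return _counter


# referential transparency: square(3) can be replaced by 9 anywhere
total = square(3) + square(3)  # always 18, exactly like 9 + 9
```

Replacing `next_id()` with the value it returned once would change the program's meaning; replacing `square(3)` with `9` never does.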
programming with values
values
immutable: values do not change
shareable: they can be cached forever
referentially transparent expressions
the state of a thing at an instant in time
functions are values too
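Each property in the list can be sketched in a few lines of Python. This is an illustration under assumptions, not part of the talk: `Point`, `distance_sq`, `move_right`, `move_up` and `compose` are hypothetical names, and a frozen dataclass stands in for an immutable value.

```python
from dataclasses import dataclass, replace
from functools import lru_cache


@dataclass(frozen=True)
class Point:
    # frozen: assigning to a field raises dataclasses.FrozenInstanceError
    x: int
    y: int


@lru_cache(maxsize=None)
def distance_sq(a: Point, b: Point) -> int:
    # safe to cache forever: a Point can never change after its result is stored
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2


def move_right(pt: Point) -> Point:
    return replace(pt, x=pt.x + 1)  # "changing" a value means making a new one


def move_up(pt: Point) -> Point:
    return replace(pt, y=pt.y + 1)


def compose(f, g):
    # functions are values too: passed in, combined and returned
    return lambda v: g(f(v))


move_diagonal = compose(move_right, move_up)

p = Point(1, 2)
q = move_diagonal(p)  # a new value; p is still Point(x=1, y=2)
```

Because `p` and `q` are both still available after the "move", either can be shared with other code, or used as a cache key, without any coordination.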
what about identities?
identity
what we think of as the things around us: you, me, the plants and animals, rivers and mountains
identities are things we name
we are used to thinking of the world in terms of identities; they are the objects in our world
Since the time of Plato and Aristotle, philosophers have posited true reality as timeless, based on permanent substances, while processes are denied or subordinated to timeless substances. If Socrates changes, becoming sick, Socrates is still the same, and change (his sickness) only glides over his substance: change is accidental, whereas the substance is essential.
http://en.wikipedia.org/wiki/process_philosophy
No man ever steps in the same river twice, for it's not the same river and he's not the same man. (Heraclitus)
an identity is a series of values over time
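One way to model "an identity is a series of values over time" in Python, assuming a hypothetical `Identity` class: the name is stable, each state is an immutable value, and change appends a new value instead of overwriting the old one. The Socrates example from the quote above is reused.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:
    name: str
    healthy: bool


class Identity:
    """Hypothetical sketch: a name attached to an append-only series of (time, value) pairs."""

    def __init__(self, name, initial, at):
        self.name = name
        self._history = [(at, initial)]

    def evolve(self, at, change):
        last_time, last_value = self._history[-1]
        if at <= last_time:
            raise ValueError("time only moves forward")
        # the old value is kept; the identity now also refers to a new value
        self._history.append((at, change(last_value)))

    @property
    def current(self):
        return self._history[-1][1]

    def history(self):
        return tuple(self._history)


socrates = Identity("Socrates", Person("Socrates", healthy=True), at=0)
socrates.evolve(1, lambda p: replace(p, healthy=False))  # Socrates becomes sick
```

Asking "who is Socrates?" returns the identity; asking "what was Socrates like at time 0?" returns a value that can never change.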
reifying time
f : A -> B
f : A -> T -> B, where T is time
[diagram: Version 1 at time A, followed by Version 2 at time B and Version 3 at time C]
a -> t1 -> X
a -> t2 -> X'
the same a, observed at two different times t1 and t2, relates to two different values X and X'
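A sketch of reifying time in Python, under the assumption that every write is kept as a `(time, value)` pair sorted by time (the `HISTORY` table, `as_of` and `f` are hypothetical). Looking a key up "as of" a time is then a pure function of both the key and the time, matching f : A -> T -> B.

```python
from bisect import bisect_right

# hypothetical history: each key maps to (time, value) pairs sorted by time
HISTORY = {
    "a": [(1, "X"), (2, "X'")],
}


def as_of(key, t):
    """Uncurried f : A -> T -> B: the value of key as it was at time t."""
    entries = HISTORY[key]
    times = [time for time, _ in entries]
    i = bisect_right(times, t)  # number of versions written at or before t
    if i == 0:
        raise LookupError(f"{key!r} had no value at time {t}")
    return entries[i - 1][1]


def f(a):
    # curried form, matching f : A -> T -> B
    return lambda t: as_of(a, t)
```

`f("a")(1)` gives `"X"` and `f("a")(2)` gives `"X'"`; asking about t1 again later still gives `"X"`, because the past is never overwritten.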
change
X + Δ = X'
X' - X = Δ
X' - Δ = X
we can store entire versions, or we can store deltas: they are equivalent
being in possession of any two of X, X' and Δ allows us to traverse time
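The three equations can be sketched over Python dicts. One assumption is needed for X' - Δ = X to work: the delta here records both the before and the after value of each changed key (a delta holding only the new values could not be reversed). The names `diff`, `apply_delta`, `invert` and `revert_delta` are hypothetical.

```python
_MISSING = object()  # marks a key that is absent on one side of a change


def diff(old: dict, new: dict) -> dict:
    """X' - X = Δ: for every key that differs, record (before, after)."""
    delta = {}
    for key in old.keys() | new.keys():
        before, after = old.get(key, _MISSING), new.get(key, _MISSING)
        if before != after:
            delta[key] = (before, after)
    return delta


def apply_delta(x: dict, delta: dict) -> dict:
    """X + Δ = X': returns a new dict; x is not modified."""
    result = dict(x)
    for key, (_, after) in delta.items():
        if after is _MISSING:
            result.pop(key, None)  # the key was deleted
        else:
            result[key] = after
    return result


def invert(delta: dict) -> dict:
    # swap before and after, so the same machinery runs backwards in time
    return {key: (after, before) for key, (before, after) in delta.items()}


def revert_delta(x_new: dict, delta: dict) -> dict:
    """X' - Δ = X: apply the inverted delta."""
    return apply_delta(x_new, invert(delta))
```

With `x = {"a": 1, "b": 2}` and `x2 = {"a": 1, "b": 3, "c": 4}`, `diff(x, x2)` touches only `"b"` and `"c"`, and either endpoint can be rebuilt from the other plus the delta.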
architecture in the Real World
problem: atomic updates
journaling file system
many writes in a single update:
describe the writes in a log
perform the writes
mark the logged writes as complete
replay incomplete writes to recover from system failure
the journal is an append-only, immutable structure containing an audit log of all changes (usually deltas)
it can be used to revert a system to a previous state
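A toy model of the steps above, assuming a hypothetical `JournaledStore` rather than any real file system: the journal is an append-only list, completion is recorded by appending a marker rather than editing an entry, and recovery replays any logged writes that never got their marker. A transaction id is simply the position of its intent entry in the journal.

```python
class JournaledStore:
    """Hypothetical sketch: a redo journal in front of a mutable store."""

    def __init__(self):
        self.data = {}      # the "disk": blocks that get overwritten in place
        self.journal = []   # append-only: entries are never edited or removed

    def update(self, writes: dict, crash_after_logging: bool = False):
        txid = len(self.journal)  # position of this transaction's intent entry
        # 1. describe the writes in the log before touching the data
        self.journal.append(("intent", txid, tuple(writes.items())))
        if crash_after_logging:
            return  # simulate a failure between logging and writing
        # 2. perform the writes
        self.data.update(writes)
        # 3. mark the logged writes as complete (by appending, not editing)
        self.journal.append(("done", txid))

    def recover(self):
        """Replay logged writes that were never marked complete."""
        done = {entry[1] for entry in self.journal if entry[0] == "done"}
        for entry in list(self.journal):
            if entry[0] == "intent" and entry[1] not in done:
                _, txid, writes = entry
                self.data.update(writes)
                self.journal.append(("done", txid))

    def state_as_of(self, txid: int) -> dict:
        """Rebuild the state after transaction txid purely from the journal."""
        state = {}
        for entry in self.journal:
            if entry[0] == "intent" and entry[1] <= txid:
                state.update(entry[2])
        return state
```

Running `recover` twice is harmless, since the second pass finds every intent already marked done, and `state_as_of` shows how the same log serves as an audit trail for reconstructing earlier states.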
journaling file system: zfs
constant-time snapshots of file-system state
incremental changes create multiple versions that are persistent, revertible and replayable (i.e. copy-on-write)
high cache efficiency due to the immutability of stored data
storage compaction via data de-duplication
continuous integrity checking and automatic data repair
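Why can a snapshot be constant-time? ZFS applies copy-on-write at the block level; the Python sketch below only imitates the idea with nested read-only mappings, using a hypothetical `write` helper. A write copies the directories along one path and shares everything else, so a snapshot is just a kept reference to an old root.

```python
from types import MappingProxyType

EMPTY = MappingProxyType({})  # a read-only, empty directory


def write(root, path, data):
    """Return a new root with `data` stored at `path`; `root` is never modified.

    Only the directories along `path` are copied; every other subtree is
    shared with the previous version (path copying, a form of copy-on-write).
    """
    head, *rest = path
    children = dict(root)  # shallow copy of this one directory
    if rest:
        children[head] = write(root.get(head, EMPTY), rest, data)
    else:
        children[head] = data
    return MappingProxyType(children)


v1 = write(write(EMPTY, ["etc", "hosts"], b"127.0.0.1"), ["home", "notes"], b"hi")
snapshot = v1  # a constant-time snapshot: keep a reference to the root
v2 = write(v1, ["home", "notes"], b"hello")
```

After the second write, `snapshot` still reads `b"hi"` while `v2` reads `b"hello"`, and the untouched `etc` directory is the very same object in both versions, which is also why such data caches well.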
content-addressable storage
files are stored at an address computed from their content: a content hash
names are associated with a hash
retrieval looks up the current hash for a name, then accesses the content stored at that address
an update adds the new content, then a new (name, hash) pair
caches only cache content at a hash, not at a name
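A small content-addressable store in Python, assuming SHA-256 as the content hash and a hypothetical `ContentAddressableStore` class. Content is never overwritten, names live in an append-only list of (name, hash) pairs, and identical content lands at the same address for free.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical sketch of the scheme described above."""

    def __init__(self):
        self._content = {}  # address (sha256 hex) -> bytes; never overwritten
        self._names = []    # append-only log of (name, address) pairs

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._content.setdefault(address, data)  # identical content is stored once
        return address

    def update(self, name: str, data: bytes) -> str:
        address = self.put(data)               # 1. add the new content
        self._names.append((name, address))    # 2. then a new (name, hash) pair
        return address

    def current_address(self, name: str) -> str:
        for n, address in reversed(self._names):
            if n == name:
                return address
        raise KeyError(name)

    def get(self, name: str) -> bytes:
        return self._content[self.current_address(name)]

    def at(self, address: str) -> bytes:
        # cacheable forever: an address always maps to the same bytes
        return self._content[address]
```

A cache keyed by address never needs invalidating; only the name lookup can change over time, which is why the slide caches content at a hash and not at a name.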
git: version control system
non-linear development: branching and merging
distributed development: changes must be shareable between repositories that are not necessarily connected
cryptographic authentication of history: the ability to uniquely identify the complete development history of any change to the resources in a repository
git: design
content is stored as a directed acyclic graph (DAG) of content objects plus metadata; deltas are only a storage optimisation
content blobs are addressed by the hash of their full content
trees store lists of file names, each linking to content in the form of another tree or a blob hash
commits are addressed by a hash of their metadata, including the tree hash, author, date and parent commit(s)
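The naming scheme can be reproduced in a few lines of Python. In git's default SHA-1 object format, an object's id is the SHA-1 of a header (the object type, a space, the body length in bytes, a NUL byte) followed by the body; a tree body is a sequence of `<mode> <name>\0` entries, each followed by the 20-byte binary id of a blob or tree. The helper names are hypothetical, and the tree helper simplifies git's entry ordering (git compares directory names as if they ended in `/`).

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    # header: "<kind> <length>\0", then the raw body
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return git_object_id("blob", content)


def tree_id(entries) -> str:
    """entries: (mode, name, hex object id) triples, e.g. ("100644", "README", blob).

    Simplification: sorted by plain name, which matches git only when no
    directory name is a prefix of another entry's name.
    """
    body = b"".join(
        f"{mode} {name}\0".encode() + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=lambda e: e[1])
    )
    return git_object_id("tree", body)
```

For the same bytes, `blob_id` should agree with `git hash-object`; the empty blob and the empty tree give git's well-known ids `e69de29...` and `4b825dc...`. Because a tree embeds its children's ids, any change to a file changes every tree above it.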
git: file format
updates add new objects, stored either whole (loose objects) or as deltas inside a pack file
all old versions are reconstructable
the same content produces the same hash, so equivalent updates commute
the data-structure is (mostly) immutable: the exception is the mutable pointer to the head of each branch
git: benefits
presents a mutable view of an immutable structure
the commit hash includes the parent commits, providing a cryptographically secure signature of both content and history
commit and content data are shareable values, enabling distribution between multiple repositories
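The "mutable view of an immutable structure" can be sketched as an append-only object store plus one small table of mutable branch pointers. This is a hypothetical `Repo` class with a deliberately simplified record format, NOT git's real commit format; what it keeps is the essential property that a commit's hash covers its parent's hash.

```python
import hashlib


class Repo:
    """Hypothetical sketch: immutable, hash-addressed commits plus mutable refs."""

    def __init__(self):
        self.objects = {}  # hash -> (content, message, parent); only ever added to
        self.refs = {}     # branch name -> commit hash; the only mutable part

    def commit(self, branch: str, content: str, message: str) -> str:
        parent = self.refs.get(branch)
        record = (content, message, parent)
        # simplified format: the hash covers content, message AND parent hash
        oid = hashlib.sha1(repr(record).encode()).hexdigest()
        self.objects[oid] = record
        self.refs[branch] = oid  # move the branch pointer; nothing else changes
        return oid

    def log(self, branch: str):
        oid = self.refs.get(branch)
        while oid is not None:
            _, message, parent = self.objects[oid]
            yield oid, message
            oid = parent
```

Two repositories that build the same history independently end up with the same head hash, so their objects can be exchanged as plain values; altering any earlier commit changes every hash after it, which is what makes the history tamper-evident.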
lucene
full-text indexing and search
needs to maintain a stable, searchable view of an index in the face of concurrent updates
lucene: index
an index is a collection of Documents
a document is a collection of Fields and has an ID
an index is updated by deleting and re-adding documents
searching is done via a Searcher
for its lifetime, a searcher will see the state of the index as it was when it was opened
lucene: file format
an index is made of Segment files
segments contain documents
deleting a document adds the document ID to a per-segment .del file, i.e. it doesn't modify the segment file directly
when no searchers reference a segment with many deleted documents, it may be merged with others into a new segment containing the remaining documents, i.e. garbage collection
[diagram: segment 1 holds documents 0 to 9 and segment 2 holds documents 10 to 19; searcher1 is opened; documents 3 and 8 (segment 1) and 11 (segment 2) are then marked as deleted, and segment 3 is added with documents 20 to 22; searcher2 is opened on this new state, while searcher1 still sees the index as it was when it was opened]
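The sequence in the diagram can be replayed with a small Python model. Everything here is hypothetical and much simpler than Lucene's real API: segments are immutable values, a deletion produces a new segment value with a larger deletion set (standing in for the .del file), and a searcher captures the tuple of segments that existed when it was opened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    docs: tuple                       # document IDs, never modified once written
    deleted: frozenset = frozenset()  # stands in for the per-segment .del file

    def live(self):
        return [d for d in self.docs if d not in self.deleted]


class Index:
    """Hypothetical model, not Lucene's API: a pointer to a tuple of immutable segments."""

    def __init__(self):
        self.segments = ()

    def add_segment(self, docs):
        self.segments = self.segments + (Segment(tuple(docs)),)

    def delete(self, doc_id):
        # a new Segment value that shares the same docs tuple
        self.segments = tuple(
            Segment(s.docs, s.deleted | {doc_id}) if doc_id in s.docs else s
            for s in self.segments
        )

    def merge(self):
        # garbage collection: rewrite surviving documents into one new segment;
        # searchers holding the old tuple keep working, and the old segments
        # are freed once nothing references them
        self.segments = (Segment(tuple(d for s in self.segments for d in s.live())),)

    def open_searcher(self):
        return Searcher(self.segments)


class Searcher:
    def __init__(self, segments):
        self._segments = segments  # captured at open time; later updates are invisible

    def docs(self):
        return [d for s in self._segments for d in s.live()]


index = Index()
index.add_segment(range(0, 10))
index.add_segment(range(10, 20))
searcher1 = index.open_searcher()
for doc_id in (3, 8, 11):
    index.delete(doc_id)
index.add_segment(range(20, 23))
searcher2 = index.open_searcher()
```

`searcher1` still finds documents 3, 8 and 11 and never sees 20 to 22, while `searcher2` sees the reverse; calling `index.merge()` afterwards changes neither view.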
netflix
scale: 30% of last-mile internet traffic, 10k+ AWS instances
immutable everything, including servers:
servers are values and are never modified
new versions are printed and deployed
old versions are replaced
idempotent updates
RxJava / RxJS (Reactive Extensions) programming model
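"Servers are values" can be illustrated with a hypothetical deployment model in Python; it does not represent Netflix's actual tooling, and `Instance`, `launch` and `deploy` are invented names. Instances are immutable records of the image they were launched from, and a deploy returns a new fleet value in which every out-of-date instance is replaced rather than patched.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)


@dataclass(frozen=True)
class Instance:
    id: int
    image: str  # the machine image this instance was launched from; never patched


def launch(image: str) -> Instance:
    return Instance(next(_next_id), image)


def deploy(fleet: tuple, image: str) -> tuple:
    """Replace, never modify: instances on any other image are swapped for new ones.

    Returns a new fleet value and leaves the old one intact. Deploying the
    same image twice changes nothing, so the update is idempotent.
    """
    return tuple(i if i.image == image else launch(image) for i in fleet)


fleet_v1 = tuple(launch("app-v1") for _ in range(3))
fleet_v2 = deploy(fleet_v1, "app-v2")
```

Retrying `deploy(fleet_v2, "app-v2")` after a partial failure is safe, because instances already on the target image are kept as they are.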
conclusions
avoid mutation at all costs
values replace or occlude values
store change
apply changes to construct a temporal view
apply these ideas to your entire system architecture
profit!
thanks