A Summary of Out of the Tar Pit

Introduction

This document summarises some points from the paper "Out of the Tar Pit" (written by Ben Moseley and Peter Marks, dated 6th February 2006) which are relevant to the development of the RAQUEL DBMS.

The paper is split into 2 parts. The first considers the problem of complexity, "the single major difficulty in the successful development of large-scale software systems", its common causes, and a general approach to eliminating those causes. The second part considers the relational model (not SQL) as an example of the successful application of their general approach.

Since the intention is to build a DBMS that implements the relational model, this summary considers only the first part. A DBMS is a large-scale software system, so it is helpful to summarise the common causes of complexity and how to eliminate them, with a view to applying this understanding to the building of the RAQUEL DBMS.

Essence and Accident

"Out of the Tar Pit" distinguishes between essential difficulty and accidental difficulty, as defined in Frederick P. Brooks' paper "No Silver Bullet: Essence and Accident in Software Engineering". Brooks' definitions are:

Essential Difficulties: Those inherent in the nature of software: the fashioning of the complex conceptual structures that compose the abstract software entity.

Accidental Difficulties: Those that today attend its production but are not inherent: the representation of those abstract entities in programming languages, and the mapping of these onto machine languages within space and speed constraints.

Complexity

Complexity is considered to be the root cause of the vast majority of problems with software today, and its relevance is widely recognised. Complexity is caused by:

State

Quoting Brooks: "From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program."
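Brooks' point can be made concrete with a small sketch (an illustration only, not taken from either paper). Assuming a hypothetical component whose only state is a set of independent boolean flags, every combination of flag values is a distinct state, so each extra flag doubles the number of states that must be enumerated and understood.

```python
from itertools import product

def enumerate_states(flag_names):
    """Return every possible state of a component whose state is just the
    given independent boolean flags (one dict per combination of values)."""
    return [dict(zip(flag_names, values))
            for values in product([False, True], repeat=len(flag_names))]

# Hypothetical flags for, say, a network connection object.
flags = ["open", "authenticated", "buffered", "closing"]
print(len(enumerate_states(flags)))        # 16, i.e. 2 ** 4

# Each additional independent flag doubles the state space.
for n in (1, 4, 8):
    names = [f"flag{i}" for i in range(n)]
    print(n, len(enumerate_states(names)))  # prints 1 2, then 4 16, then 8 256
```

Real components rarely hold purely independent flags, but the multiplicative growth is the same whenever pieces of state can vary separately.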
However, Moseley and Marks believe it is the presence of many possible states which gives rise to the complexity; i.e. the large number of states is itself the causal factor.

Control

Control is about the order in which things happen (and includes concurrency). Programming languages force a far more detailed concern with order than is usually desirable or necessary. Only the relative order needs to be correct, but usually a precise, detailed order must be specified. The relative order may be defined as follows. For a sequence of items s containing items x and y:

$$ \forall x \in s,\ \forall y \in s:\quad (x \;\mathrm{Precede}\; y) \iff (\mathrm{Pos}(x) < \mathrm{Pos}(y)) $$

where Precede and Pos (position) are functions with the obvious semantics.

Code Volume

This is considered to be caused by the state and control problems.

Complexity Breeds Complexity

In other words, complex systems generate more complexity in the attempt to manage the initial complexity.

Simplicity is Hard

This is now increasingly acknowledged.[1]

[1] In passing, it is the author's opinion that not only is simplicity hard to achieve, it is also hard to understand simplicity and what simplicity is to be achieved; moreover, when the resulting simplicity is obvious with hindsight, the hard work is not appreciated by one's peers.

Classical Approaches to Managing Complexity

Object-Orientation

OO is treated as essentially an Abstract Data Type approach to imperative programming, with integrity constraints over an object's state being enforced by its access methods. The following problems arise with integrity constraints. Firstly, if several of an object's procedures access the same part of its state, then they all need to enforce the integrity constraints. Secondly, it is awkward to enforce multi-object integrity constraints (as opposed to single-object constraints).

An object has both intensional identity (i.e. its object identity) and extensional identity (determined by the values of its attributes). This complicates the states to be considered. OO also uses traditional control-flow mechanisms. Therefore OO suffers from both state-derived and control-derived complexity.

Functional Programming

The primary strength of functional programming, in its pure form, is that it avoids state and side effects, and provides referential transparency. This means that a given set of input values to a function always results in the same set of output values from the function. By avoiding state, functional programming avoids all the weaknesses that stem from it. However, functions can simulate state by passing it through themselves as an extra parameter (taking it in and returning an updated version), thereby undermining their statelessness.

The order in which functions appear in expressions imposes control sequencing. However, this can be ameliorated by using higher-order functions (e.g. fold, map) to apply control.
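The functional-programming points above can be illustrated with a short Python sketch. Python is used only for illustration and does not itself enforce purity; the functions `add_to_total`, `next_id` and `allocate_ids` are hypothetical. `add_to_total` is referentially transparent; `next_id` simulates state by threading a counter in through its parameter and back out through its result; and `functools.reduce` (a left fold) and `map` take over the control that an explicit loop would otherwise spell out.

```python
from functools import reduce

# Referentially transparent: the result depends only on the arguments, so any
# call can be replaced by its value without changing the program's meaning.
def add_to_total(total, amount):
    return total + amount

# State simulated by threading: the current counter is passed in, and the
# updated counter is passed back out. Nothing is mutated, but every caller
# must now carry this value along, so state has re-entered in disguise.
def next_id(counter):
    return counter, counter + 1          # (id handed out, new counter)

def allocate_ids(counter, n):
    def step(acc, _):
        ids, current = acc
        new_id, current = next_id(current)
        return ids + [new_id], current
    # reduce is a left fold: it, not hand-written loop code, decides the
    # order in which the steps are applied.
    return reduce(step, range(n), ([], counter))

amounts = [10, 25, 5]
print(reduce(add_to_total, amounts, 0))      # 40
print(list(map(lambda a: a * 2, amounts)))   # [20, 50, 10]
print(allocate_ids(100, 3))                  # ([100, 101, 102], 103)
```

The fold does not remove ordering altogether: a left fold still applies its steps in sequence, but that sequencing is expressed once, inside `reduce`, rather than being repeated in every loop.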

The weakness of functional programming stems from its strength: some systems include state as part of their very nature, and so state must be permitted in the system's program code.

Logic Programming

Pure logic programming specifies what needs to be done (as a set of axioms which describe the problem and the attributes of the solution), and leaves the infrastructure to derive the solution (by using the axioms to prove the solution value). As a result, pure logic programming avoids state. Unfortunately, a good deal of control is built in: the implicit ordering of sub-goals in statements, and the implicit ordering of clause application via the statement sequence. In practice this often needs to be supplemented by extra-logical features (e.g. Prolog's cut facility) to prevent non-termination of the proofs. The logic programming paradigm does not lend itself to the development of many computer systems.

Accidents and Essence

Moseley and Marks use these terms in a similar way to Brooks, but start with the complexity of the user's problem. Thus their terms are defined as follows:

Essential Complexity: That which is inherent to the logic of the user's problem, even in an ideal world. Anything of which the user is unaware cannot be essential complexity.

Accidental Complexity: All the other complexity, which would not exist in an ideal world. In practice it arises from performance problems, sub-optimal programming languages and infrastructure, etc., which the software developers have to deal with to produce a user-acceptable working system.

The complexity concerned is that with which software developers have to contend. Brooks and others suggest that the complexity of software is an essential property, not an accidental one. Moseley and Marks contend that complexity is not necessarily an inherent, essential property of software, and that much of today's complexity is accidental.
Moseley and Marks suggest that systems contain both essential and accidental complexity, and that the goal of software engineering is to eliminate as much accidental complexity as possible and to assist with essential complexity.

Determining Essential and Accidental Complexity

In the ideal world, development starts with a formally specified version of the user's requirements for their system. These are essential requirements. No accidental aspects can be allowed in the formal specification. As a consequence, the specification cannot include any aspect of its execution; i.e. no performance requirements, no ease-of-use requirements, no infrastructure requirements, etc.[2] Ideally one should simply be able to execute the user's (functional) specification.

[2] Traditionally the essential requirements were called Functional Requirements, and any other requirements (performance, ease-of-use, infrastructure, etc.) were called Non-Functional Requirements.

It is hoped that most of the state in a typical system is accidental, and so can be got rid of in the ideal world. Data that is part of the user's requirements is essential, but not all of that data necessarily gives rise to essential state. All data in the system is either input or derived from input. Derived data is either immutable or mutable (it is mutable if the users can update it).

Input Data

User-required input data is essential. It falls into 2 cases:
1. Data which the system may need to refer to in future. This gives rise to essential state.
2. Data which is never referenced in future. Such data need not be kept.

Essential Derived Data (Immutable)

This can always be re-derived from the essential state data that has been input. Ideally it need not be kept. If it is kept in practice, it gives rise to accidental state.

Essential Derived Data (Mutable)

This can be re-derived from the essential state data that has been input, provided the function carrying out the mutation has an inverse function. If the mutating function has no inverse, then changes to the data have to be treated as input; if they are needed in future, they give rise to essential state.

Accidental Derived Data

State which is derived but not in the user's requirements is accidental state.

Data Classifications

Data Essentiality   Data Type   Data Mutability   Classification
Essential           Input       -                 Essential State
Essential           Derived     Immutable         Accidental State
Essential           Derived     Mutable           Accidental State
Accidental          Derived     -                 Accidental State

The classification Accidental State means that in the ideal world the corresponding data can be excluded from the system. The implication of the above is that there are large amounts of accidental state in typical systems, which ideally would be removed.
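The difference between input data and essential derived data can be sketched as follows. The example is hypothetical and not from the paper: temperature readings entered in Celsius are the only stored (essential) state; their average is immutable derived data, recomputed on demand rather than stored; and a Fahrenheit view is mutable derived data whose conversion has an inverse, so a user's edit to it can be turned back into an edit of the input and no Fahrenheit value ever needs to be kept.

```python
class TemperatureLog:
    """Hypothetical example: only the user's input readings are stored."""

    def __init__(self):
        self._celsius = []   # essential state: input the system must refer to later

    def record(self, celsius):
        self._celsius.append(celsius)

    def readings(self):
        return tuple(self._celsius)

    # Essential derived data, immutable: always re-derivable from the input,
    # so it is computed on demand rather than stored (storing it would be
    # accidental state).
    def average_celsius(self):
        return sum(self._celsius) / len(self._celsius)

    # Essential derived data, mutable: the derivation C -> F has an inverse,
    # so a user's edit to the derived Fahrenheit value is mapped back onto the
    # stored Celsius input and no Fahrenheit value is ever kept.
    def fahrenheit(self, index):
        return self._celsius[index] * 9 / 5 + 32

    def set_fahrenheit(self, index, fahrenheit):
        self._celsius[index] = (fahrenheit - 32) * 5 / 9


log = TemperatureLog()
log.record(20.0)
log.record(30.0)
print(log.average_celsius())   # 25.0
print(log.fahrenheit(0))       # 68.0
log.set_fahrenheit(0, 212.0)   # the user edits a derived value...
print(log.readings())          # ...which updates the input: (100.0, 30.0)
```

Had the derivation no inverse (for example, a rounded or aggregated value), the user's edits could not be pushed back onto the input and would themselves have to be recorded as input, i.e. as essential state.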

Ideally the only system state is that which is visible to the user and is part of their specification.

Control in the Ideal World

Control can generally be omitted completely from the ideal world, and hence can be regarded as accidental. This is because it rarely appears in the user's formal specification of their requirements.

Summary

Most complexity is accidental. Therefore it may be possible to reduce the complexity of real large systems significantly. How close is it possible to get to the ideal?

Theoretical and Practical Limitations

There may in practice be a fuzzy boundary between what is essential and what is accidental, but the distinction is still viable and worthwhile. Systems limited to essentials may be too inefficient to be practical, so accidental components may need to be included for efficiency.

Situations can arise where derived data depends on both the initial input values and a later series of user input values. (This is normally true of the value of a DB.) To achieve ease of expression and usage for the user (of the DB, say), it is preferable to maintain the current accidental state and treat it as if it were essential state, even in the ideal world. Thus some accidental complexity may need to be added to provide acceptable performance and ease of use/expression.

The recommended strategy for dealing with complexity is:
1. Avoid. Accept only essential complexity.
2. Separate. Where accidental complexity is required for performance or ease of use, separate it out in order to manage it better.

These recommendations are not new, but they are not typically applied to the development of today's software. It is recommended that the accidental complexity required for performance reasons be put into a completely separate infrastructure that handles performance and is kept apart from the essential complexity. It is recommended that the accidental complexity required for ease of use be treated as essential complexity and separated as discussed below.
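A minimal sketch of the "separate infrastructure for performance" recommendation, assuming Python's standard `functools.lru_cache` stands in for that infrastructure and using a hypothetical `ways_to_climb` function as the essential logic. The logic is written with no thought for performance; the cache is attached afterwards in a separate place, so it can be added, tuned or removed without editing the logic.

```python
from functools import lru_cache

# --- Essential logic: a simple, pure definition with no performance concerns ---
def ways_to_climb(steps):
    """Number of ways to climb `steps` stairs taking 1 or 2 stairs at a time."""
    if steps <= 1:
        return 1
    return ways_to_climb(steps - 1) + ways_to_climb(steps - 2)

print(ways_to_climb(20))   # 10946; fine uncached, but the cost grows exponentially

# --- Separate performance infrastructure (accidental complexity) ---
# Added only once hot-spot analysis has shown this function to be slow.
# Rebinding the module-level name means the recursive calls inside the
# logic also go through the cache, yet the logic's source is untouched.
ways_to_climb = lru_cache(maxsize=None)(ways_to_climb)

print(ways_to_climb(80))   # 37889062373143906, now computed quickly
print(ways_to_climb.cache_info().currsize)  # 81 cached results (steps 0..80)
```

Removing the single rebinding line returns the system to its simple, slow form with identical results, which is the property the separation is meant to preserve.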
Separation takes 3 forms:
1. Separate all complexity of any kind from the pure logic of the system. (Logic is not considered to have anything to do with either state or control, and is therefore not part of the complexity.) This may be called the Logic/State split.
2. Divide the retained complexity into the essential and the accidental. This may be called the Essential/Accidental split.
3. Split the Useful Accidental Complexity component of the system into its state and control parts.

The following table summarises the recommendations:

Complexity                         Type              Recommendation
Essential Complexity               State             Separate
Essential Logic                    -                 Separate
Useful Accidental Complexity       State             Separate
Useful Accidental Complexity       Control           Separate
Not-Useful Accidental Complexity   State / Control   Avoid

The differing natures of the 4 components that are retained, but kept separate from each other, lead to the following relationships between them:

Essential State: This is the foundation of the system and is self-contained. It makes no reference to other parts of the system. Changes here may require changes to other parts of the system, but changes to other parts of the system never require changes here. This component drives the entire system.

Essential Logic: This is the heart of the system. It expresses, in terms of the state, what must be true. It makes no reference to the accidental components of the system. Changes to the Essential State may require changes here. Changes here may require changes to the accidental components of the system, but changes to the accidental components of the system never require changes here.

Accidental State and Control: These components support the essential components. Changes here never affect the essential components. Changes to the essential components may affect these components.

Summary

A system should be split into the 4 non-avoidable components described above. The goals of "avoid" and "separate" must be at the top of the design agenda for a system, not merely desirable constraints. There should be no premature optimisation or "designing for performance"; i.e. the design should be as simple as possible, and only made more complicated when hot-spot analysis of performance reveals what optimisation is actually needed.
Improving the performance of a simple, slow system is far easier than removing complexity from a complex system, which is probably not as fast as intended anyway.
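To make the four-way split concrete, here is a minimal sketch under stated assumptions, not taken from the paper: the essential state is a single hypothetical `employees` relation held as an immutable set of tuples; the essential logic is a set of pure functions (`payroll_by_dept` derives a result, `salaries_positive` is an integrity constraint, and `update` applies that constraint to a proposed new state); the accidental state is a discardable cache (`DerivedCache`); and the only control is the cache's decision about when to recompute. All names are illustrative.

```python
from typing import Dict, FrozenSet, Tuple

Employee = Tuple[str, str, int]          # (name, dept, salary)
Relation = FrozenSet[Employee]

# --- Essential State: self-contained; refers to no other part of the system ---
essential_state: Dict[str, Relation] = {
    "employees": frozenset({("ann", "sales", 50), ("bob", "sales", 40), ("cy", "it", 60)}),
}

# --- Essential Logic: pure functions of the essential state, no accidental parts ---
def payroll_by_dept(employees: Relation) -> Dict[str, int]:
    totals: Dict[str, int] = {}
    for _, dept, salary in employees:
        totals[dept] = totals.get(dept, 0) + salary
    return totals

def salaries_positive(employees: Relation) -> bool:
    """Integrity constraint: what must be true of the state."""
    return all(salary > 0 for _, _, salary in employees)

def update(state: Dict[str, Relation], employees: Relation) -> Dict[str, Relation]:
    """Return a new essential state, rejecting any that violates the constraint."""
    if not salaries_positive(employees):
        raise ValueError("constraint violated: salaries must be positive")
    return {**state, "employees": employees}

# --- Accidental State and Control: support the essential parts, never change them ---
class DerivedCache:
    """Accidental state: a cached derived value that could be discarded at any time."""

    def __init__(self, derive):
        self._derive = derive
        self._source = None
        self._value = None

    def get(self, employees: Relation):
        # Control: recompute only when the essential state has changed.
        if employees != self._source:
            self._value = self._derive(employees)
            self._source = employees
        return self._value


cache = DerivedCache(payroll_by_dept)
print(sorted(cache.get(essential_state["employees"]).items()))
# [('it', 60), ('sales', 90)]
state2 = update(essential_state, essential_state["employees"] | {("dee", "it", 55)})
print(sorted(cache.get(state2["employees"]).items()))
# [('it', 115), ('sales', 90)]
```

Because the cache only ever reads the essential state, it can be deleted or replaced without any change to `essential_state`, `payroll_by_dept`, `salaries_positive` or `update`, which mirrors the one-way dependencies described above.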