The Mercury project. Zoltan Somogyi


The Mercury project
Zoltan Somogyi
The University of Melbourne
Linux Users Victoria, 7 June 2011

Introduction Mercury: objectives

Classic AI languages like Lisp and Prolog are well suited for exploratory programming, where flexibility and quick turn-around time are the most important objectives. Mercury is a declarative programming language, but it has different objectives:
- program reliability
- program maintainability
- scalability to large programs and teams
- programmer productivity
- program efficiency

Introduction This talk

I do not have enough time to give a tutorial on Mercury (one is available on the Mercury web site). I will instead tell you about two very useful technologies that are available in Mercury, but not in imperative languages like C and Java. These two technologies are declarative debugging and automatic parallelization. The purity of Mercury is key in making both of these feasible.

Introduction Purity

Imperative programs are based on side effects. You call a function such as strcat(str1, str2), and it returns a value, but the reason you call it is for its side effect. Purely declarative programs have no side effects. If a predicate has an effect, it has to be a main effect, reflected in its argument list.

hello(S0, S) :-
    io.write_string("hello, ", S0, S1),
    io.write_string("world\n", S1, S).

In purely declarative languages, data structures are immutable. Instead of updating an existing data structure, programs create slight variants of existing data structures, typically reusing almost all their memory.
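
To make the I/O threading concrete, here is a minimal sketch of a complete Mercury program written in the same style (the module name is illustrative; main/2 is the standard entry point, taking a unique input and output state of the world):

:- module hello.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

main(S0, S) :-
    % Each I/O action consumes the current state of the world and
    % produces the next one; there are no hidden side effects.
    io.write_string("hello, ", S0, S1),
    io.write_string("world\n", S1, S).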

Declarative debugging

The execution of the program is represented by a tree: each node corresponds to a procedure call. The children of each node are the calls to functions / predicates / procedures / methods in the body of the parent. A node is valid if its actual result matches its intended result; otherwise, it is erroneous. A bug is an erroneous node with no erroneous children. The declarative debugger searches for a bug by asking questions about the validity of nodes.

Declarative debugging Example

The proof tree for insertion_sort([3,1,2], [1,3]), writing s for insertion_sort and i for insert:

s([3,1,2],[1,3])
    s([1,2],[1])
        s([2],[2])
            s([],[])
            i(2,[],[2])
        i(1,[2],[1])
            1=<2
    i(3,[1],[1,3])
        3>1
        i(3,[],[3])

Declarative debugging Example (continued)

mdb> p
insertion_sort([3, 1, 2], [1, 3])
mdb> dd
insertion_sort([3, 1, 2], [1, 3]) Valid? n
insertion_sort([1, 2], [1]) Valid? n
insertion_sort([2], [2]) Valid? y
insert(1, [2], [1]) Valid? n
The bug occurs during the call insert(1, [2], [1])
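
For reference, code that would behave exactly like this looks something like the following sketch (this reconstruction is not from the talk; the bug is in the first branch of insert, which throws away the tail of the already-sorted list):

:- module sort_demo.            % illustrative module name
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.
:- implementation.
:- import_module int.
:- import_module list.

:- pred insertion_sort(list(int)::in, list(int)::out) is det.

insertion_sort([], []).
insertion_sort([X | Xs], Sorted) :-
    insertion_sort(Xs, SortedXs),
    insert(X, SortedXs, Sorted).

:- pred insert(int::in, list(int)::in, list(int)::out) is det.

insert(X, [], [X]).
insert(X, [Y | Ys], Result) :-
    ( if X =< Y then
        Result = [X]            % BUG: should be [X, Y | Ys]
    else
        insert(X, Ys, Rest),
        Result = [Y | Rest]
    ).

main(!IO) :-
    insertion_sort([3, 1, 2], Sorted),
    io.write(Sorted, !IO),
    io.nl(!IO).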

Declarative debugging Advantages of declarative debugging

- It automates recordkeeping: users need not keep a record in their heads of everything they have done.
- It avoids repeated testing of hypotheses, which puts an upper bound on the effort required to find the bug.
- The computer directs the bug search, not the programmer. This is especially good for novices, who often don't know where to begin.
- The search algorithms programmed into the declarative debugger can automate heuristics followed by humans who are expert debuggers.

Declarative debugging Why is it not widely used? Reason 1

The "Is this valid?" question must describe exactly what "this" is.
- For a pure predicate or function in Mercury, this is given by the values of the input and output arguments.
- For a Mercury predicate that does I/O, it also includes the sequence of executed I/O primitives.
- For a function or method in an imperative program, it also includes the value of every global variable in the program that the function or method has access to, directly or indirectly.

Therefore, for the questions to be of a manageable size, the program must be declarative.

Declarative debugging Why is it not widely used? Reason 2

The tree does not fit in memory for computations that run for more than a second or two. The only option is to keep only the parts of the tree needed right now, recomputing the other parts on demand by re-executing the calls that generate them. This requires rebuilding the state of the computation at the time of the call.

In Mercury, data structures are immutable; you never need to restore them to a previous state. Mercury does need to remember the results of previously executed I/O actions, but since these are relatively rare, this is feasible.

In programs written in imperative languages, every assignment statement destroys earlier state. Recording every assignment in an undo/redo log is possible in theory, but infeasible in practice except for toy programs.

Automatic parallelism Parallelization and side effects

Can you execute these two C calls in parallel?

c = p(a, b);
f = q(d, e);

Answering the question requires computing all the side effects that p may have, and figuring out whether q is affected by any of them. In multi-module programs using libraries, this is very hard to do.

What about these two Mercury calls?

p(a, B, C),    % produces C
q(d, E, F)     % produces F

There is no need for any program analysis, and the answer is yes.

Automatic parallelism Overheads and dependencies

p(a, B, C, G),    % produces C and G
q(C, D, E, F)     % produces F

Can you execute these two calls in parallel? Yes, of course. The compiler will handle the synchronization on C.

Should you execute these two calls in parallel? Answering that is harder.
- First, we need to know how expensive the two calls are. If e.g. the call to q executes in twenty instructions, there is no point in spawning it off, since the spawn operation itself executes a few hundred instructions.
- Second, we need to know when q needs the value of C. There is no point in spawning it off if it will immediately block waiting for C (the usual case).

The main problem in parallelizing declarative programs is too much parallelism, not too little.
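
For reference, a minimal sketch of what the parallel form looks like in Mercury source, using the same placeholder calls as above: the only change is replacing the sequential conjunction operator ',' with the parallel conjunction operator '&'; the compiler inserts the synchronization needed for the shared variable C.

% Sequential conjunction: q starts only after p has finished.
p(a, B, C, G),
q(C, D, E, F)

% Parallel conjunction: q is spawned off immediately, and blocks
% only at the point where it actually needs the value of C.
p(a, B, C, G) &
q(C, D, E, F)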

Automatic parallelism Execution overlap

The overlap between the executions of p and q depends on when the shared variable C is produced and consumed.

[Timing diagrams: two scenarios, each comparing sequential execution ("Seq: p, q") with parallel execution ("Par: p & q"), marked with the points where p produces C and where q consumes it.]

In the scenario on the left, executing p and q in parallel yields the best possible speedup. In the scenario on the right, parallel execution appears to yield a smaller speedup, and may yield a slowdown in practice due to overheads.

Automatic parallelism Estimating execution overlap

The Mercury system has a profiler that can tell us not just the average time taken by calls to a given predicate or function, but also the average time taken by a given call site. The compiler also arranges for programs being profiled to put a description of their own structure into the profiling data file. This includes complete information about where the value of each variable is produced and consumed.

Our automatic parallelization tool can use this information to estimate the times taken by p and q, both in total and up to the production or first consumption of C. From this, one can estimate the speedup from executing p and q in parallel.
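
As a rough illustration of the kind of estimate involved (this formula is a simplification, not the tool's exact cost model): write $T_p$ and $T_q$ for the total times of the two calls, $T_p^{prod}$ for the time at which p produces C, and split $T_q = T_q^{pre} + T_q^{post}$ at the point where q first consumes C. Ignoring spawning and synchronization overheads,

    T_{par} \approx \max\bigl(T_p,\ \max(T_p^{prod}, T_q^{pre}) + T_q^{post}\bigr),
    \text{speedup} \approx (T_p + T_q) / T_{par}.

The earlier p produces C and the later q first needs it, the closer the speedup gets to the ideal (T_p + T_q) / \max(T_p, T_q).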

Automatic parallelism Automatic parallelization workflow

- Compile the program with profiling enabled.
- Run the program on some typical input. This generates a profiling data file.
- Invoke our automatic parallelization tool. This tool reads the profiling data file, finds conjunctions containing two or more expensive calls, and estimates the speedups from parallelizing them in each of the many possible ways (1: p & q & r, 2: (p & q), r, etc; see the sketch below). If the best way yields a nontrivial speedup even when taking estimated overheads into account, it records a recommendation for this parallelization, and puts all the recommendations into a feedback file.
- Compile the program asking for automatic parallelization, specifying the feedback file.
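
The alternatives listed in the third step correspond to different placements of the parallel conjunction operator. Spelled out as Mercury goals (p, q and r stand for the calls in one conjunction):

p & q & r          % all three calls run in parallel
( p & q ), r       % p and q run in parallel; r runs after both finish
p, ( q & r )       % p runs first; q and r then run in parallel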

Automatic parallelism Results

Program      Seq     1 CPU          2 CPUs         3 CPUs         4 CPUs
matrixmult   11.0    14.6 (0.75)    7.5 (1.47)     6.2 (1.83)     5.2 (2.12)
raytracer    22.7    25.1 (0.90)    16.0 (1.42)    11.2 (2.03)    9.4 (2.42)
mandelbrot   33.4    35.6 (0.94)    17.9 (1.87)    12.1 (2.76)    9.1 (3.67)

(Seq is the sequential execution time; the figures in parentheses are speedups relative to it.)

Parallel code needs to use a machine register to point to thread-specific data, so enabling parallel execution but not using it leads to slowdowns. Matrixmult has one memory store for each FP multiply/add pair; its speedup is limited by memory bus bandwidth, which it saturates relatively quickly. Raytracer generates many intermediate data structures; the GC system consumes 40% of the execution time in stop-the-world collections on 4 CPUs (it is parallel, but does not always scale well).

Conclusion

In our experience, programming in Mercury is significantly more productive than programming in imperative languages. Programs written in Mercury are guaranteed to be free of several classes of bugs that can occur in most other languages. Programmers working in Mercury have powerful tools they can use to chase down any bugs not caught by the compiler. We expect that Mercury programmers will soon have tools they can use to help them parallelize programs with relatively little effort.

For more information about the Mercury project, visit http://www.mercury.csse.unimelb.edu.au/.

Any questions?

Reserve slides Powerful type system

The type system used by Mercury is both safe and flexible. It allows programmers to describe their intentions much more closely than the type systems of languages like C and Java.

:- type token
    --->    ident(string)
    ;       int_const(int)
    ;       float_const(float)
    ;       left_paren
    ;       right_paren
    ;       ...

You cannot access e.g. the float without checking that the token is a float_const.
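
For example, the only way to get at the float is to pattern-match on the float_const constructor, either directly in a switch or through an accessor that can fail. A minimal sketch (the function and predicate names here are illustrative):

    % Succeeds only if the token actually is a float_const.
:- func float_value(token) = float is semidet.

float_value(float_const(F)) = F.

:- pred describe(token::in, io::di, io::uo) is det.

describe(Token, !IO) :-
    % The caller is forced to handle the case where the token
    % is some other kind of token.
    ( if F = float_value(Token) then
        io.write_float(F, !IO)
    else
        io.write_string("not a float constant", !IO)
    ).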

Reserve slides Weak type system

typedef enum {
    IDENT, INT_CONST, FLOAT_CONST, LEFT_PAREN, RIGHT_PAREN, ...
} TokenKind;

typedef struct {
    TokenKind   kind;
    char        *name;
    int         int_const;
    float       float_const;
    ...
} Token;

You can access token.float_const without checking that token.kind == FLOAT_CONST.

Reserve slides Some bug types

- Memory allocation errors: cannot be expressed in Mercury.
- Pointer arithmetic errors: cannot be expressed in Mercury.
- Can-be-null vs cannot-be-null confusion: cannot arise in Mercury.
- Uninitialized variables: the Mercury compiler is guaranteed to catch this.
- Uninitialized fields in structures: the Mercury compiler is guaranteed to catch this.
- Uncovered cases in a switch: the Mercury compiler is guaranteed to catch this (see the sketch after this list).
- Incorrect casting (C) or dynamic type tests (Java): casts do not exist, and the need for dynamic type tests can be trivially avoided in almost all cases.
- Use of union or structure fields when not valid: can be trivially avoided in almost all cases.
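
As an illustration of the switch-coverage guarantee, using the token type from the previous reserve slide (and assuming, for the sake of the example, that it has only the five constructors shown there): if a predicate declared det switches on a token but omits a case, the compiler reports a determinism error instead of letting the missing case fail at run time.

:- pred token_kind_name(token::in, string::out) is det.

token_kind_name(ident(_), "identifier").
token_kind_name(int_const(_), "integer constant").
token_kind_name(float_const(_), "float constant").
token_kind_name(left_paren, "left parenthesis").
% Omitting the right_paren case would make this switch semidet,
% which contradicts the `is det' declaration, so the compiler
% would reject the program until the case is added.
token_kind_name(right_paren, "right parenthesis").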

Reserve slides Why is it not widely used?

Some other reasons:

- The search algorithm may ask too many questions. We have implemented better search algorithms that find bugs with fewer questions. Some focus on a user-selected wrong part of the answer, while others use coverage information from successful vs failed test cases to focus on the parts of the program that were executed relatively more frequently during failed test cases.
- The questions asked may be difficult to answer. The Mercury declarative debugger has tools to make answering easier, and it allows users to skip questions if they wish.

Reserve slides Tree materialization on demand

[Diagram contrasting the two strategies: materializing less of the tree uses less memory but more time for recomputation, while materializing more of it uses more memory but less time.]

Reserve slides Updating a tree

[Diagram: updating an immutable binary search tree. The original tree has root d/4, children b/2 and f/6, and leaves a/1, c/3, e/5 and g/7. Adding the node h/8 creates new copies of the nodes on the path from the root (d/4, f/6, g/7) plus the new node h/8, while the unchanged subtree rooted at b/2 and the leaf e/5 are shared between the old and new versions of the tree.]
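
A sketch of the kind of code behind such a diagram (the type and function names are illustrative, using int keys for simplicity, and assuming the enclosing module imports int): updating an immutable binary search tree copies only the nodes on the path from the root down to the updated position, and shares every other subtree with the old version of the tree.

:- type tree
    --->    leaf
    ;       node(tree, int, int, tree).     % node(Left, Key, Value, Right)

:- func set(tree, int, int) = tree.

set(leaf, K, V) = node(leaf, K, V, leaf).
set(node(L, K0, V0, R), K, V) =
    ( if K < K0 then
        node(set(L, K, V), K0, V0, R)       % all of R is shared with the old tree
    else if K > K0 then
        node(L, K0, V0, set(R, K, V))       % all of L is shared with the old tree
    else
        node(L, K0, V, R)                   % both L and R are shared
    ).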