Automatic Memory Management in newlisp


Abstract

ORO (One Reference Only) automatic memory management, developed for newlisp, is a fast and resource-saving alternative to classic garbage collection algorithms in dynamic, interactive programming languages. This article explains how ORO memory management works.

newlisp, like any other interactive language system, constantly generates new memory objects during expression evaluation. The new memory objects are intermediate evaluation results, reassigned memory objects, or memory objects whose content was changed. If newlisp did not delete some of the objects created, it would eventually run out of available memory.
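
As a minimal illustration, evaluating the expression below creates two intermediate results, 6 and 20. Neither is ever referenced by a symbol, so newlisp deletes both as soon as the surrounding addition has consumed them:

(+ (* 2 3) (* 4 5))  ;=> 26 ; the cells holding 6 and 20 are
                     ;      ; created and deleted along the way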

In order to understand newlisp's automatic memory management, it is necessary to first review the traditional methods employed by other languages.

Traditional automatic memory management (Garbage Collection)

In most programming languages, a process registers allocated memory, and another process finds and recycles the unused parts of the allocated memory pool. The recycling process can be triggered by some memory allocation limit or can be scheduled to happen between evaluation steps. This form of automatic memory management is called Garbage Collection.

Traditional garbage collection schemes developed for LISP employed one of two algorithms:¹

(1) The mark-and-sweep algorithm registers each allocated memory object. A mark phase periodically flags each object in the allocated memory pool that is directly or indirectly referenced by a named object (a variable symbol) in the system. The sweep phase then frees the memory of the unmarked objects, which are no longer in use.

(2) A reference-counting scheme registers each allocated memory object together with a count of references to the object. This reference count gets incremented or decremented during expression evaluation. Whenever an object's reference count reaches zero, the object's allocated memory is freed.

Over time, many elaborate garbage collection schemes have been attempted based on these principles. The first garbage collection algorithms appeared in LISP. The inventors of the Smalltalk language used more elaborate garbage collection schemes. The history of Smalltalk-80 is an exciting account of the challenges of implementing memory management in an interactive programming language; see [Glenn Krasner, 1983: Smalltalk-80, Bits of History, Words of Advice]. A more recent overview of garbage collection methods can be found in [Richard Jones, Rafael Lins, 1996: Garbage Collection, Algorithms for Automatic Dynamic Memory Management].

One Reference Only (ORO) memory management

Memory management in newlisp does not rely on a garbage collection algorithm. Memory is not marked or reference-counted. Instead, a decision whether to delete a newly created memory object is made right after the memory object is created. Empirical studies of LISP have shown that most LISP cells are not shared and so can be reclaimed during the evaluation process. Aside from some optimizations for a subset of the built-in functions, newlisp deletes newly created memory objects containing intermediate evaluation results once it reaches a higher evaluation level.

newlisp does this by pushing a reference to each created memory object onto a result stack. When newlisp reaches a higher evaluation level, it removes the last evaluation result's reference from the result stack and deletes the evaluation result's memory object. This should not be confused with one-bit reference counting: ORO memory management does not set bits to mark objects as sticky.

newlisp follows a one-reference-only (ORO) rule: every memory object not referenced by a symbol is obsolete once newlisp reaches a higher evaluation level during expression evaluation. Objects in newlisp (excluding symbols and contexts) are passed by value copy to user-defined functions. As a result, each newlisp object requires only one reference.

newlisp's ORO rule has advantages. It simplifies not only memory management but also other aspects of the newlisp language. For example, while users of traditional LISP have to distinguish between equality of copied memory objects and equality of references to memory objects, newlisp users do not.

newlisp's ORO rule forces newlisp to constantly allocate and then free LISP cells. newlisp optimizes this process by allocating large chunks of cell memory from the host operating system. newlisp takes LISP cells from a free cell list and recycles deleted cells back into that list. As a result, only a few CPU instructions (pointer assignments) are needed to unlink a free cell or to re-insert a deleted cell.

The overall effect of ORO memory management is a faster evaluation time and a smaller memory and disk footprint than traditional interpreted LISPs can offer. Time spent linking and unlinking memory objects is more than compensated for by the lack of processing time used in traditional garbage collection. ORO memory management also avoids the occasional processing pauses seen in languages using traditional garbage collection, as well as the tuning of garbage collection parameters required when running memory-intensive programs. ORO memory management happens synchronously with other processing in the interpreter, which results in deterministic processing times.
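
The value-copy semantics behind the ORO rule are visible directly at the REPL. A minimal sketch (the variable names are illustrative):

(set 'a '(1 2 3))
(set 'b a)      ; b receives its own copy of the list
(push 99 b)     ; modifying b ...
a               ;=> (1 2 3)     ... leaves a untouched
b               ;=> (99 1 2 3)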

In versions before 10.1.3, newlisp employed a classic mark-and-sweep algorithm to free unreferenced cells under error conditions. Starting with version 10.1.3, this has been eliminated and replaced by a proper cleanup of the result stack under error conditions.

Performance considerations with copying parameters

In theory, passing parameters to user-defined functions by value (memory copying) instead of by reference poses a potential disadvantage when dealing with large lists, arrays, or strings. But in practice newlisp performs as fast as or faster than other scripting languages, and it offers language facilities to pass very large memory objects by reference.

Since newlisp version 9.4.5, functions can take list, array, and string parameters as references using default functor namespace ids. Namespaces (called contexts in newlisp) have very little overhead and can be used to wrap functions and data. This allows reference passing of large memory objects into user-defined functions. Since version 10.2, FOOP (Functional Object Oriented Programming) in newlisp also passes the target object of a method call by reference. But even in instances where reference passing and other optimizations are not present, the speed of ORO memory management more than compensates for the overhead required to copy and delete objects.
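
A minimal sketch of this default functor idiom (the names data and change-list are illustrative): the context symbol data is handed to the function, which then works on the original list instead of a copy:

(set 'data:data '(1 2 3 4 5)) ; data:data is the default functor
                              ; of the context (namespace) data

(define (change-list lst)     ; lst receives the context symbol
    (push 999 lst:lst))       ; modifies the original list, no copying

(change-list data)
data:data                     ;=> (999 1 2 3 4 5)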

Optimizations to ORO memory management ²

Since newlisp version 10.1, all lists, arrays, and strings are passed in and out of built-in functions by reference. All built-in functions work directly on memory objects returned by reference from other built-in functions. This has substantially reduced the need for copying and deleting memory objects and increased the speed of some built-in functions. Now only parameters passed into user-defined functions and return values passed out of user-defined functions are ORO managed.

Since version 10.3.2, newlisp checks the result stack before copying LISP cells. This has reduced the number of cells copied by about 83% and has significantly increased the speed of many operations on bigger lists.

Memory and datatypes in newlisp

The memory objects of newlisp strings are allocated from and freed to the host OS, whereas newlisp recycles LISP cells from its own pre-allocated chunks of cell memory. This means that newlisp handles cell memory more efficiently than string memory. As a result, it is often better to use symbols than strings for efficient processing. For example, when handling natural language it is more efficient to handle natural language words as individual symbols in a separate namespace than as single strings. The bayes-train function in newlisp uses this method. newlisp can handle millions of symbols without degrading performance.

Programmers coming from other programming languages frequently overlook that symbols in LISP can act as more than just variables or object references. The symbol is a useful data type in itself, which in many cases can replace the string data type.
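
A short sketch of this technique (the namespace Words and the sample word are illustrative): the built-in sym function interns a string as a symbol in a separate namespace; interning the same string again yields the same symbol, which can also hold a value:

(sym "apple" 'Words)          ;=> Words:apple, interned on first use
(set (sym "apple" 'Words) 42) ; the symbol can hold a value, e.g. a count
(eval (sym "apple" 'Words))   ;=> 42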

Integer numbers and double-precision floating-point numbers are stored directly in newlisp's LISP cells and do not need a separate memory allocation cycle.

For efficiency during matrix operations like matrix multiplication or inversion, newlisp allocates non-cell memory objects for matrices, converts the results back to LISP cells, and then frees the matrix memory objects.

newlisp allocates an array as a group of LISP cells. The LISP cells are allocated linearly, so array indices allow faster random access to the LISP cells. Only a subset of newlisp's list functions can be used on arrays. Automatic memory management in newlisp handles arrays in a manner similar to how it handles lists.

Implementing ORO memory management

The following pseudo code illustrates the algorithm implemented in newlisp in the context of LISP expression evaluation. Only two functions and one data structure are necessary to implement ORO memory management:

function pushResultStack(evaluationResult)
function popResultStack() ; implies deleting

array resultStack[] ; preallocated stack area

The two functions pushResultStack and popResultStack push or pop a LISP object handle on or off a stack; pushResultStack increases the value of resultStackIndex, while popResultStack decreases it. In newlisp, every object is contained in a LISP cell structure. The object handle of that structure is simply the memory pointer to the cell structure. The cell itself may contain pointer addresses of other memory objects, such as string buffers or other LISP cells linked to the original object. Small objects like numbers are stored directly. In this paper, the function popResultStack() also implies that the popped object gets deleted.

The two resultStack management functions described are called by newlisp's evaluateExpression function:³

function evaluateExpression(expr)
    {
    resultStackIndexSave = resultStackIndex

    if typeOf(expr) is BOOLEAN or NUMBER or STRING
        return(expr)

    if typeOf(expr) is SYMBOL
        return(symbolContents(expr))

    if typeOf(expr) is QUOTE
        return(quoteContents(expr))

    if typeOf(expr) is LIST
        {
        func = evaluateExpression(firstOf(expr))
        args = rest(expr)
        if typeOf(func) is BUILTIN_FUNCTION
            result = evaluateFunc(func, args)
        else if typeOf(func) is LAMBDA_FUNCTION
            result = evaluateLambda(func, args)
        }

    while (resultStackIndex > resultStackIndexSave)
        deleteList(popResultStack())

    pushResultStack(result)
    return(result)
    }

The function evaluateExpression introduces the two variables resultStackIndexSave and resultStackIndex and a few other functions:

resultStackIndex is an index pointing to the top element in the resultStack. The deeper the level of evaluation, the higher the value of resultStackIndex.

resultStackIndexSave serves as temporary storage for the value of resultStackIndex upon entry into the evaluateExpression(expr) function. Before exit, the resultStack is popped down to the saved level of resultStackIndex. Popping the resultStack implies deleting the memory objects pointed to by its entries.

resultStack[] is a preallocated stack area for saving pointers to LISP cells, indexed by resultStackIndex.

symbolContents(expr) and quoteContents(expr) extract the contents of symbol or quote-envelope cells.

typeOf(expr) extracts the type of an expression: either a BOOLEAN constant like nil or true, a NUMBER or STRING, a variable SYMBOL holding some contents, a QUOTE serving as an envelope around some other expression, or a LIST expression.

evaluateFunc(func, args) is the application of a built-in function to its arguments. The built-in function is the evaluated first member of the list in expr, and the arguments are the rest of the list in expr. The function func is extracted by calling evaluateExpression(firstOf(expr)) recursively. For example, if the expression expr is (foo x y), then foo is a built-in function and x and y are its arguments or parameters.
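
Because firstOf(expr) is itself evaluated, the head of an expression may be a computed value. A one-line sketch at the newlisp REPL:

((if true + *) 3 4) ;=> 7 ; the head expression evaluates to the built-in +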

evaluateLambda(func, args) works similarly to evaluateFunc(func, args), applying a user-defined function, the evaluated first member of expr, to the arguments in rest(expr). In the case of a user-defined function, the lambda expression holds two kinds of elements: a list of local parameters followed by one or more body expressions, which are evaluated in sequence.

Both evaluateFunc(func, args) and evaluateLambda(func, args) return a newly created or copied LISP cell object, which may be of any of the expression types mentioned above. Since version 10.0, many built-in functions processed by evaluateFunc(func, args) are optimized and return references instead of newly created or copied objects. Except for these optimizations, result values are always newly created LISP cell objects destined to be destroyed at the next higher evaluation level, after the current evaluateExpression(expr) invocation has returned.

Both functions recursively call evaluateExpression(expr) to evaluate their arguments. As recursion deepens, the recursion level of the function increases. Before evaluateExpression(expr) returns, it pops the resultStack, deleting the result values created at deeper levels of evaluation and returned by one of the two functions, evaluateFunc or evaluateLambda. Any newly created result expression is destined to be destroyed later, but its deletion is delayed until a higher (less deep) level of evaluation is reached. This permits results to be used and/or copied by calling functions.

The following example shows the evaluation of a small user-defined LISP function sum-of-squares and the creation and deletion of the associated memory objects:

(define (sum-of-squares x y)
    (+ (* x x) (* y y)))

(sum-of-squares 3 4) => 25

sum-of-squares is a user-defined lambda function calling the built-in functions + and *. The following trace shows the relevant steps when defining the sum-of-squares function and when executing it with the arguments 3 and 4. Lines preceded by the prompt > show the command-line entry.

> (define (sum-of-squares x y) (+ (* x x) (* y y)))

level 0: evaluateExpression( (define (sum-of-squares x y) (+ (* x x) (* y y))) )
level 1: evaluateFunc( define <6598> )
level 1: return( (lambda (x y) (+ (* x x) (* y y))) )

(lambda (x y) (+ (* x x) (* y y)))

> (sum-of-squares 3 4)

level 0: evaluateExpression( (sum-of-squares 3 4) )
level 1: evaluateLambda( (lambda (x y) (+ (* x x) (* y y))), (3 4) )
level 1: evaluateExpression( (+ (* x x) (* y y)) )
level 2: evaluateFunc( +, ((* x x) (* y y)) )
level 2: evaluateExpression( (* x x) )
level 3: evaluateFunc( *, (x x) )
level 3: pushResultStack( 9 )
level 3: return( 9 )
level 2: evaluateExpression( (* y y) )
level 3: evaluateFunc( *, (y y) )
level 3: pushResultStack( 16 )
level 3: return( 16 )
level 2: popResultStack() 16
level 2: popResultStack() 9
level 2: pushResultStack( 25 )
level 2: return( 25 )
level 1: return( 25 )

25

The actual C-language implementation is optimized in some places to avoid pushing the resultStack and to avoid calling evaluateExpression(expr); only the most relevant steps are shown above.

The function evaluateLambda(func, args) does not need to evaluate its arguments 3 and 4 because they are constants, but it will call evaluateExpression(expr) to evaluate the body expression (+ (* x x) (* y y)), which in turn evaluates the two subexpressions (* x x) and (* y y). evaluateLambda(func, args) also saves the environment for the variable symbols x and y, copies the parameters into the local variables, and restores the old environment upon exit. These actions, too, involve the creation and deletion of memory objects. Details are omitted because they are similar to the methods used in other dynamic languages.

References

Glenn Krasner, 1983: Smalltalk-80, Bits of History, Words of Advice. Addison-Wesley Publishing Company.

Richard Jones, Rafael Lins, 1996: Garbage Collection: Algorithms for Automatic Dynamic Memory Management. John Wiley & Sons.

¹ Reference counting and mark-and-sweep algorithms were specifically developed for LISP. Other schemes, like copying or generational algorithms, were developed for other languages such as Smalltalk and later also used in LISP.

² This chapter was added in October 2008 and extended in August 2011.

³ This is a shortened rendition of expression evaluation; it does not include the handling of default functors and implicit indexing. For more information on expression evaluation see: Expression Evaluation, Implicit Indexing, Contexts and Default Functors in the newlisp Scripting Language.

Copyright 2004-2013, Lutz Mueller, http://newlisp.org. All rights reserved.