LuaTeX says goodbye to Pascal

Taco Hoekwater
EUROTEX 2009, MAPS 39

Abstract

LuaTeX 0.50 features a complete departure from Pascal source code. This article explains a little of the why and how of this change.

Introduction

For more than a quarter of a century, all implementations of the TeX programming language have been based on WEB, a literate programming environment invented by Donald Knuth. WEB input files consist of a combination of Pascal source code and TeX explanatory texts, and have to be processed by special tools to create either input that is suitable for a Pascal compiler to create an executable (this is achieved via a program called tangle) or input for TeX82 to create a typeset representation of the implemented program (via a program called weave).

A WEB file is split into numerous small building blocks called modules that consist of an explanatory text followed by source code doing the implementation. The explanations and source are combined in the WEB input in a way that attempts to maximise the reader's understanding of the program when the typeset result is read sequentially. This article will, however, focus on the program source code.

Pascal WEB

The listing that follows shows a small part of the WEB input of TeX82, defining two modules. It implements the function that scans for a left brace in the TeX input stream, for example at the start of a token list.

    @ The |scan_left_brace| routine is called when a left brace is supposed to be
    the next non-blank token. (The term ``left brace'' means, more precisely,
    a character whose catcode is |left_brace|.) \TeX\ allows \.{\\relax} to
    appear before the |left_brace|.

    @p procedure scan_left_brace; {reads a mandatory |left_brace|}
    begin @<Get the next non-blank non-relax non-call token@>;
    if cur_cmd<>left_brace then
      begin print_err("Missing { inserted");
    @.Missing \{ inserted@>
      help4("A left brace was mandatory here, so I've put one in.")@/
        ("You might want to delete and/or insert some corrections")@/
        ("so that I will find a matching right brace soon.")@/
        ("(If you're confused by all this, try typing `I}' now.)");
      back_error; cur_tok:=left_brace_token+"{"; cur_cmd:=left_brace;
      cur_chr:="{"; incr(align_state);
      end;
    end;

    @ @<Get the next non-blank non-relax non-call token@>=
    repeat get_x_token;
    until (cur_cmd<>spacer)and(cur_cmd<>relax)

It would take too much space here to explain everything about WEB programming, but it is necessary to explain a few things. Any @ sign followed by a single character introduces a WEB command that is interpreted by tangle, weave or both. The commands used in the listing are:

@   This indicates the start of a new module. The explanatory text that follows uses special TeX macros but is otherwise standard TeX input. (This command is an @ followed by a space.)

@p  This command starts the Pascal implementation code that will be transformed into the compiler input.

@<  When this command is seen after a @p has already been seen, the text up to the following @> is a reference to a named module that is to be inserted in the compiler output at this spot. If a @p has not been seen yet, then instead it defines or extends that named module.

@.  This defines an index entry up to the following @>, used by weave and filtered out by tangle.

@/  This is a command to control the line breaks in the typeset result generated by weave.

If you are familiar with Pascal, the code above may look a bit odd to you. The main reason for that apparent weirdness is that the WEB system has a macro processor built in. The symbol help4 is actually one of those macros, and it handles the four sets of parenthesized strings that follow. In expanded form, its output would look like this:

    begin help_ptr:=4;
      help_line[3]:="A left brace was mandatory here, so I've put one in.";
      help_line[2]:="You might want to delete and/or insert some corrections";
      help_line[1]:="so that I will find a matching right brace soon.";
      help_line[0]:="(If you're confused by all this, try typing `I}' now.)";
    end

The symbol print_err is another macro, and it expands into this:

    begin if interaction=error_stop_mode then wake_up_terminal;
      print_nl("! "); print("Missing { inserted");
    end

Tangle output

Now let's have a look at the generated Pascal.
    {:381}{403:}procedure scanleftbrace;begin{404:}repeat getxtoken;
    until(curcmd<>10)and(curcmd<>0){:404};
    if curcmd<>1 then begin begin if interaction=3 then;
    if filelineerrorstylep then printfileline else printnl(262);print(671);
    end;begin helpptr:=4;helpline[3]:=672;helpline[2]:=673;helpline[1]:=674;
    helpline[0]:=675;end;backerror;curtok:=379;curcmd:=1;curchr:=123;
    incr(alignstate);end;end;{:403}{405:}procedure scanoptionalequals;

That looks even weirder! Don't panic. Let's go through the changes one by one.

First, tangle does not care about indentation. The result is meant to be read only by the Pascal compiler, so everything is glued together.

Second, tangle has added Pascal comments before and after each module that give the sequence number of that module in the WEB source: those are 403 and 404.

Third, tangle removes the underscores from identifiers, so scan_left_brace became scanleftbrace, and so on.

Fourth, many of the identifiers you thought were present in the WEB source are really macros that expand to constant numbers, which explains the disappearance of left_brace, spacer, relax, left_brace_token, and error_stop_mode.

Fifth, macro expansion has removed print_err, help4 and wake_up_terminal (which expands to nothing).

Sixth, WEB does not make use of Pascal strings. Instead, the strings are collected by tangle and output to a separate file that is read by the generated executable at runtime and then stored in a variable that can be indexed as an array. This explains the disappearance of all the program text surrounded by ".

Seventh, addition of two constants is optimized away in some cases. Since single-character strings in the input have a fixed string index (their ASCII value), left_brace_token+"{" becomes 379 instead of 256+123.

Web2c

Assuming you have generated the Pascal code above via tangle, you now run into a small practical problem: the limited form of Pascal that is used by the WEB program is not actually understood by any of the current Pascal compilers, and has not been for quite some time. The build process of modern TeX distributions therefore makes use of a different program called web2c that converts the tangle output into C code. The result of running web2c on the above snippet is in the next listing.

    void scanleftbrace ( void )
    {
      do {
        getxtoken () ;
      } while ( ! ( ( curcmd != 10 ) && ( curcmd != 0 ) ) ) ;
      if ( curcmd != 1 )
      {
        {
          if ( interaction == 3 )
          ;
          printnl ( 262 ) ;
          print ( 671 ) ;
        }
        {
          helpptr = 4 ;
          helpline [3 ]= 672 ;
          helpline [2 ]= 673 ;
          helpline [1 ]= 674 ;
          helpline [0 ]= 675 ;
        }
        backerror () ;
        curtok = 379 ;
        curcmd = 1 ;
        curchr = 123 ;
        incr ( alignstate ) ;
      }
    }

This output is easier for us humans to read because of the added indentation and the parentheses after procedure calls, but web2c does not significantly alter the code.
Actually, some more tools are used by the build process of modern TeX distributions. For example, a small program splits the enormous source file into a few smaller C source files, and another program converts Pascal write procedure calls into C printf function calls.
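The string handling described in the sixth and seventh points above can be mimicked in a few lines of C. What follows is only a sketch under stated assumptions, not the actual TeX or LuaTeX string pool: the names add_string, get_string and left_brace_token_value are hypothetical, the pool is held in memory instead of being read from a separate file, and the base value 256 for left_brace_token is taken from the "256+123" arithmetic quoted in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature string pool; all names are illustrative and are
   not taken from the TeX or LuaTeX sources.  Numbers 0..255 stand for
   single-character strings (number == character code); longer strings
   are numbered from 256 upward in the order they are registered. */
enum { FIRST_MULTI = 256, MAX_MULTI = 64 };

static const char *pool[MAX_MULTI];   /* the multi-character strings */
static int pool_used = 0;

/* Register a multi-character string.  Returns its number (256, 257, ...)
   or -1 when the pool is full. */
int add_string(const char *s)
{
    if (pool_used >= MAX_MULTI)
        return -1;
    pool[pool_used] = s;
    return FIRST_MULTI + pool_used++;
}

/* Copy string number n into buf, whose capacity (including the
   terminating NUL) is size.  Returns buf. */
const char *get_string(int n, char *buf, size_t size)
{
    assert(buf != NULL && size >= 2);
    assert(n >= 0 && n < FIRST_MULTI + pool_used);
    if (n < FIRST_MULTI) {            /* single character: number == code */
        buf[0] = (char) n;
        buf[1] = '\0';
    } else {
        strncpy(buf, pool[n - FIRST_MULTI], size - 1);
        buf[size - 1] = '\0';
    }
    return buf;
}

/* The token arithmetic from the text: the assumed base of 256 plus the
   character code of '{', which is 123 on an ASCII system. */
int left_brace_token_value(void)
{
    const int left_brace_token = 256;
    return left_brace_token + '{';
}
```

On a fresh pool, the first call to add_string returns 256 and the second 257, much as the tangle output refers to its messages by numbers such as 671 and 672. On an ASCII system left_brace_token_value() returns 379, the constant that appears as curtok:=379 in the generated Pascal.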

Issues with WEB development

Looking at the previous paragraphs, there are a number of issues when using WEB for continuous development.

Compiling WEB source, especially on a modern platform without a suitable Pascal compiler, is a fairly complicated and lengthy process requiring a list of dedicated tools that have to be run in the correct order.

Literate programming environments like WEB never really caught on, so it is hard to find programmers who are familiar with the concepts, and there are nearly no editors that support its use with the editor features that modern programmers have come to depend on, like syntax highlighting.

Only a subset of an old form of Pascal is used in the Knuthian sources, and as this subset is about the extent of Pascal that is understood by the web2c program, that is all that can be used. This makes interfacing with external programming libraries hard, often requiring extra C source files just to glue the bits together.

The ubiquitous use of WEB macros makes the external interface even harder, as these macros do not survive into the generated C source.

Quite a lot of useful debugging information is irretrievably lost in the tangle stage. Of course, TeX82 is so bug-free that this hardly matters, but when one is writing new extensions to TeX, as is the case in LuaTeX, this quickly becomes problematic.

Finally, to many current programmers WEB source simply feels over-documented. Even more important, the general impression is that of a finished book: sometimes it seems like WEB actively discourages development. This is a subjective point, but nevertheless quite an important one.

Our solution

In the winter of 2008-2009, we invested a lot of time in hand-converting the entire LuaTeX code base into a set of C source files that are much closer to current programming practices. The big WEB file has been split into about five dozen pairs of C source files and include headers.
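One of the issues listed above is that WEB's numeric macros vanish before the compiler ever sees them, so the generated code tests curcmd against a bare 1 or 10. As a rough illustration of what is gained when such constants live in the C source itself, here is a minimal sketch. The numeric values are the ones visible in the tangle output shown earlier (relax = 0, left_brace = 1, spacer = 10); the table and the helper functions command_name, command_lookup and is_skipped are hypothetical and are not taken from the LuaTeX sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Command codes as a C enumeration.  The values mirror the constants in
   the tangle output; the real LuaTeX values may well differ. */
typedef enum {
    relax_cmd = 0,
    left_brace_cmd = 1,
    spacer_cmd = 10
} command_code;

/* Hypothetical name table: because the names exist in the C source, they
   can be kept around for debugging output. */
typedef struct {
    command_code code;
    const char *name;
} command_entry;

static const command_entry command_table[] = {
    { relax_cmd, "relax_cmd" },
    { left_brace_cmd, "left_brace_cmd" },
    { spacer_cmd, "spacer_cmd" }
};

enum { COMMAND_COUNT = sizeof command_table / sizeof command_table[0] };

/* Map a numeric code back to its symbolic name, or "unknown". */
const char *command_name(int code)
{
    int i;
    for (i = 0; i < COMMAND_COUNT; i++)
        if ((int) command_table[i].code == code)
            return command_table[i].name;
    return "unknown";
}

/* Map a symbolic name to its numeric code, or -1 if there is none. */
int command_lookup(const char *name)
{
    int i;
    assert(name != NULL);
    for (i = 0; i < COMMAND_COUNT; i++)
        if (strcmp(command_table[i].name, name) == 0)
            return (int) command_table[i].code;
    return -1;
}

/* The loop condition of scan_left_brace: blanks and \relax are skipped. */
int is_skipped(int cur_cmd)
{
    return cur_cmd == spacer_cmd || cur_cmd == relax_cmd;
}
```

With this in place command_name(1) yields "left_brace_cmd", whereas the tangled Pascal offers a reader or a debugger nothing more than curcmd<>1.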
During conversion, quite a bit of effort went into making the source behave more like a good C program should: most of the WEB macros with arguments have been converted into C #defines, most of the numerical WEB macros are now C enumerations, and many of the WEB global variables are now function arguments or static variables. Nevertheless, this part of the conversion process is nowhere near complete yet.

The new implementation of scan_left_brace in LuaTeX looks like this:

    /*
    The |scan_left_brace| routine is called when a left brace is supposed to be
    the next non-blank token. (The term ``left brace'' means, more precisely,
    a character whose catcode is |left_brace|.) \TeX\ allows \.{\\relax} to
    appear before the |left_brace|.
    */

    void scan_left_brace(void)
    {                               /* reads a mandatory |left_brace| */
        /* Get the next non-blank non-relax non-call token */
        do {
            get_x_token();
        } while ((cur_cmd == spacer_cmd) || (cur_cmd == relax_cmd));

        if (cur_cmd != left_brace_cmd) {
            print_err("Missing { inserted");
            help4("A left brace was mandatory here, so I've put one in.",
                  "You might want to delete and/or insert some corrections",
                  "so that I will find a matching right brace soon.",
                  "If you're confused by all this, try typing `I}' now.");
            back_error();
            cur_tok = left_brace_token + '{';
            cur_cmd = left_brace_cmd;
            cur_chr = '{';
            incr(align_state);
        }
    }

I could write down all of the differences from the previously shown C code, but that is not a lot of fun, so I will leave it to the reader to browse back a few pages and spot all the changes. The only thing I want to make a note of is that we have kept all of the WEB explanatory text in C comments, and we are actively thinking about a way to reinstate the ability to create a beautifully typeset source book.

Taco Hoekwater
taco (at) luatex dot org