Modules:Context-Sensitive Keyword

Similar documents
Modules: ADL & Internal Linkage

Modules:Dependent ADL

1 Lexical Considerations

Global Module Fragment is Unnecessary

The Syntax of auto Declarations

Appendix. Grammar. A.1 Introduction. A.2 Keywords. There is no worse danger for a teacher than to teach words instead of things.

A Module Mapper. 1 Background. Nathan Sidwell. Document Number: p1184r1 Date: SC22/WG21 SG15. /

C++ Module TS Issues List Gabriel Dos Reis Microsoft

Merging Modules Contents

Lexical Considerations

Lexical Considerations

Have examined process Creating program Have developed program Written in C Source code

The New C Standard (Excerpted material)

Programming Languages Third Edition. Chapter 9 Control I Expressions and Statements

A Module Mapper. 1 Background. Nathan Sidwell. Document Number: p1184r0 Date: SC22/WG21 SG15. /

fpp: Fortran preprocessor March 9, 2009

1. Describe History of C++? 2. What is Dev. C++? 3. Why Use Dev. C++ instead of C++ DOS IDE?

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #54. Organizing Code in multiple files

Working Draft, Extensions to C++ for Modules

Fundamentals of Programming Session 4

CSE 401 Midterm Exam Sample Solution 11/4/11

2 nd Week Lecture Notes

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

Allowing Class Template Specializations in Associated Namespaces (revision 1)

IPCoreL. Phillip Duane Douglas, Jr. 11/3/2010

Module Names. Nathan Sidwell. The dotted identifier nature causes new users to immediately make incorrect assumptions:

Working Draft, Extensions to C++ for Modules

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Expansion statements. Version history. Introduction. Basic usage

Language Reference Manual simplicity

Gabriel Hugh Elkaim Spring CMPE 013/L: C Programming. CMPE 013/L: C Programming

A Simple Syntax-Directed Translator

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

Code Structure Visualization

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Intermediate Code Generation

8. Control statements

Control flow and string example. C and C++ Functions. Function type-system nasties. 2. Functions Preprocessor. Alastair R. Beresford.

Programming languages - C

Syntax Errors; Static Semantics

Programming for Engineers Introduction to C

QUIZ. What is wrong with this code that uses default arguments?

Wording for lambdas in unevaluated contexts

Semantic constraint matching for concepts

4. An interpreter is a program that

SOFTWARE ARCHITECTURE 5. COMPILER

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Compilers and computer architecture From strings to ASTs (2): context free grammars

Full file at C How to Program, 6/e Multiple Choice Test Bank

Stroustrup Modules and macros P0955r0. Modules and macros. Bjarne Stroustrup

\n is used in a string to indicate the newline character. An expression produces data. The simplest expression

Compiler, Assembler, and Linker

Short Notes of CS201

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

4. Inputting data or messages to a function is called passing data to the function.

CS201 - Introduction to Programming Glossary By

Rule 1-3: Use white space to break a function into paragraphs. Rule 1-5: Avoid very long statements. Use multiple shorter statements instead.

Typescript on LLVM Language Reference Manual

Single-pass Static Semantic Check for Efficient Translation in YAPL

Reference Grammar Meta-notation: hfooi means foo is a nonterminal. foo (in bold font) means that foo is a terminal i.e., a token or a part of a token.

CS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco

IC Language Specification

Languages and Compilers

Fundamentals of Programming. Lecture 3: Introduction to C Programming

Proposal of Bit field Default Member Initializers

Static Semantics. Lecture 15. (Notes by P. N. Hilfinger and R. Bodik) 2/29/08 Prof. Hilfinger, CS164 Lecture 15 1

Fundamental Concepts and Definitions

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

CSC 467 Lecture 3: Regular Expressions

Language Extensions for Vector loop level parallelism

Macros in C/C++ Computer Science and Engineering College of Engineering The Ohio State University. Lecture 33

10. Functions (Part 2)

COP4020 Programming Assignment 1 - Spring 2011

The PCAT Programming Language Reference Manual

Compiler Design Concepts. Syntax Analysis

Coccinelle Usage (version 0.1.7)

A simple syntax-directed

Chapter 2, Part I Introduction to C Programming

Types and Static Type Checking (Introducing Micro-Haskell)

ECE251 Midterm practice questions, Fall 2010

ISO/IEC JTC1/SC22/WG5 N1247

C and C++ 2. Functions Preprocessor. Alan Mycroft

FRAC: Language Reference Manual

CSc 10200! Introduction to Computing. Lecture 2-3 Edgardo Molina Fall 2013 City College of New York

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

Integrating Concepts: Open items for consideration

The New C Standard (Excerpted material)

The New C Standard (Excerpted material)

CSCE 314 Programming Languages. Type System

The development of a CreditCard class

C Syntax Out: 15 September, 1995

C Language Programming

Extended friend Declarations

3. Except for strings, double quotes, identifiers, and keywords, C++ ignores all white space.

Object Oriented Design

The New C Standard (Excerpted material)

Control Flow. COMS W1007 Introduction to Computer Science. Christopher Conway 3 June 2003

UNIT-4 (COMPILER DESIGN)

Transcription:

Document Number: P0924r1 Date: 2018-11-21 To: SC22/WG21 EWG Reply to: Nathan Sidwell nathan@acm.org / nathans@fb.com Re: Merging Modules, p1103r2 Modules:Context-Sensitive Keyword Nathan Sidwell The new module & import keywords presents some difficulties with converting existing code bases that use them as identifiers, particularly in externally publicized interfaces. This paper discusses avenues available for making module & import context-sensitive keywords. 1 Background At Albuquerque 17 I presented two papers that could simplify making module context-sensitive, although that was not their main goal: P0774 Module Declaration Location P0787 Proclaiming Ownership Declarations This paper draws on those, but has the goal of making module context-sensitive, rather than a desirable side-effect of another change. That version of the paper proposed full tentative parsing, in the same manner as Most Vexing Parse, and was presented at the Jacksonville 18 meeting. It was not accepted. Consensus was to proceed with import and module as keywords. Since that time, context has changed. The ATOM proposal, p0947r0, was presented at the same meeting and EWG elected to merge components of ATOM with the TS, thereby producing p1103. Of significance to this paper, p1103r0 had the following: 1. Header units, which have header names. 2. Removal of proclaimed-ownership-declarations. 3. Removal of import declarations from nested export declarations. Also, p0713 was accepted, requiring the global module fragment be introduced by a nameless module declaration. P0924r1:Modules:Context-Sensitive Keyword - 1 - Nathan Sidwell

2 Discussion The restriction of module and import declarations to top-level constructs, and the removal of proclaimed-ownership-declarations, changes the cost of making module and import context-sensitive. 2.1 Full Tentative Parsing TL;DR: This is the proposal of the r0 paper, retained for historical record. This disambiguation rule is that such ambiguities are always parsed as a module-declaration if a (toplevel) declaration could be a module-declaration, it is. This is simple to state. It would change the meaning of the following C++17 source (presume module is a typedef): // implementation unit me, not variable me module me; // interface unit me, not export of variable me export module me; This disambiguation occurs, regardless of whether the interpretation is semantically ill-formed. Function declarations, more complex variable declarations, member declarations and non-globalnamespace-scope declarations would remain with their C++17 meaning: class module { /* Unspecified. */ }; module frob (); // function returning module module *me; // variable pointing to a module namespace bits { export module me; // exporting variable of type module } class thing { module me; // data member of type module }; For completeness, the following uses of module, as an identifier, continue unchanged: class module; // class named module class thing { module m; // field m type module }; int module (); // function called module It is my understanding that the known uses of module do not fall into cases that would be interpreted as module-declarations under this disambiguation. The disambiguation can usually be implemented with minimal look ahead, not requiring full tentative parsing. If the token after the first identifier of a potential module-name is. or ;, the declaration P0924r1:Modules:Context-Sensitive Keyword - 2 - Nathan Sidwell

must be a module-declaration or ill-formed. If the next two tokens are [[, it could be either reduction, and one must skip to the matching ]] to look 1 for a ; indicating a module-declaration. Otherwise it must be some other declaration (or even expression-statement), or ill-formed. Here are some examples: // module-declarations: module m; module m [[ whatever ]]; module m.n; // declarations (ill-formed if module not a type): module *m; module m ( ); module (m); module m[]; module m, n; module m [[ whatever ]], n; All those examples parse the same way if preceded by export. As mentioned above, this was not accepted. 2.2 Top-Level Disambiguation As module and import declarations are only valid at the top level, a simple rule is to always interpret them as identifiers at any other level. That is within inner scopes, or within extended linkage or export declarations. Conveniently, this is equivalent to being outside of any bracing. However, this is insufficiently compatible with C++17, as known sources use module and/or import as global scope types or namespaces. To further disambiguate, we can rely on the module or import declaration syntax begins as: export opt module not-scope-ref export opt import not-scope-ref where not-scope-ref, is any token except ::. Where a declaration does not begin thusly, it should be interpreted as a declaration other than a import or module 2 declaration. Existing code using import or module as a global-scope typename or template name may need adjusting to insert a leading ::. 1 Similar lookahead will work for compiler extensions such as attribute ((...)). 2 Certain module constructs employ module but are not themselves module-declarations. For brevity this paper considers them all syntactically module declarations. P0924r1:Modules:Context-Sensitive Keyword - 3 - Nathan Sidwell

2.3 Tokenization The names of header units are header-names ([lex.header]), which are recognized by the preprocessor in phase 3. The grammar is: header-name: < h-char-sequence > " q-char-sequence " where the h-char & q-char sequences are composed of any member of the source character set except the final character of their respective header names (> and " respectively) or an end of line. It is implementation defined as to any interpretation of escape sequences, some systems use \ as a directory separator. A similar disambiguation rule to that given above could be used within the preprocessor to determine whether the token following import should be lexed as a header name, if it begins with an appropriate character. However, to implement such disambiguation requires examining tokens after macro expansion it is only then that brace nesting can be correctly counted. With header-names the ambiguities are cases such as: template <typename T> class import {}; ::import <int> X; // not a header-name import <int>; // a header-name (with a peculiar name) Unfortunately macro expansion is performed at phase 4, so such an algorithm violates the separation of phases. I had unfortunately misunderstood some of the p1103 semantics, and that this had already been violated, because macro importation from header units appeared to be a path from phase 7 to phase 4. However, that is not the case. P1103 specified that header units were recognized by the preprocessor at phase 4, and the macros therefrom loaded at that point. The header units were again recognized at phase 7 to load their C++ declarations etc. 3 Such a back propagation proposed would not cause implementation difficulties of the 4 compiler developers queried. However the comment was made that a preprocessor might choose to implement phases 1 to 3 as a first pass and then process phase 4. Such an implementation would clearly have difficulty if phase 4 affected the tokenization of phase 3. 2.3.1 Import-Is-Keyword Locations Determining when an import identifier is at a location where it may be a keyword was expressed with two conditions: 1. outside of any brace nesting, and 3 Implementations may well behave in the manner I had inferred though. P0924r1:Modules:Context-Sensitive Keyword - 4 - Nathan Sidwell

2. immediately after a close-brace or semicolon, with a possibly intervening export keyword. Before macro expansion we can neither count braces, nor unambiguously determine if we are at the start of a declaration. Obviously, if the programmer hides export or import inside macro expansions, the phase-3 tokenizer will fail completely. Brace nesting will fail for idioms such as: #define NAMESPACE_BEGIN namespace my_company { #end NAMESPACE_END } NAMESPACE_BEGIN /* */ NAMESPACE_END import <frob>; In general, any identifier could be a macro expansion that ends a declaration. Similarly any ) could be the end of a macro invocation with the same effect. We can be certain we are not at the start of a declaration after a non-identifier cpp-token that is not a close parenthesis, semicolon or close brace. We may be reasonably certain that if we have tokenized an unclosed brace outside of a preprocessing directive, that we are not at the top-level. This heuristic will fail for: #ifndef NO_NAMESPACES namespace foo { #endif /* #1 stuff */ #ifndef NO_NAMESPACES } using namespace foo; #endif However, considering the region #1 as not at the outer level regardless of NO_NAMESPACES is probably non-breaking. The pretend-it s-pascal idiom of: #define begin { #define end } int foo () begin /* stuff */ end will also break brace-counting, and lead to ambiguities of the form: import < lower import >= upper? v = 1 : v = 2; which is a baroque way to express conditional assignment. If we add the following restrictions, phase-3 tokenization becomes simpler: P0924r1:Modules:Context-Sensitive Keyword - 5 - Nathan Sidwell

1. export & import in header-unit declarations must not come from macro expansion. 2. Braces originating in macro expansions might not be counted in brace nesting (braces in preprocessor directives are not counted). 3. Braces not arising from macro expansion must be balanced. 4. The preceding token before a header-unit import declaration must be a non-macro expanded close brace or semicolon. Rule 1 resurrects P0947 s preamble rule, but only for header-unit imports. Heuristic 2 permits implementations to choose to count braces at phase 4, but does not compel them. Rule 3 inhibits arbitrary mix & match of macro-expanded and direct braces. Rule 4 is a modification of P0947 s semicolon rule, applying to the token before the declaration of interest. Generally code formatting tools already have difficulty if Rules 3 & 4 are not followed. Likewise they are often blind to bracing hidden by macro expansions. Thus these rules are likely followed by existing code bases. If these conditions are satisfied, the preprocessor should tokenize the following characters as a headername at phase 3. If they do not begin with a valid header-name initial character, they should be tokenized as normal. If the header-name cannot be lexed (an end of line occurs) an error should be issued. 2.3.2 Macro Expansions Macros that expand to (parts of) import declarations will be problematic, in the same way as macro expanding an include directive or has_include operand: #define NAME <file> import NAME; // #1 #define IMPORT(ARG) import ARG IMPORT(<file>); // #2 #define INIT import <file> INIT; // #3 In all these cases, the <file> fragment is tokenized as <, file, >. The identifier file is subject to macro expansion itself. Were it to be spelled "file", it would be tokenized as a string-literal. Prohibiting macro expansion for an import-declaration s operand may well cause user confusion, particularly as has_include has no such restriction. Allowing such macro expansions, implies the grammar for an import-declaration needs to be extended to permit a string-literal and <, pp-tokens, > sequence. Unlike the has_include case, there is ambiguity in where such a pp-tokens sequence might end. 4 Consider: 4 has_include is treated as-if a macro, therefore the closing argument parenthesis marks the end. P0924r1:Modules:Context-Sensitive Keyword - 6 - Nathan Sidwell

#define NAME <file // #1 missing > import NAME; > // #2 sometime later The expansion of NAME does not include a terminating >, should pp-tokens be concatenated up to >, should it stop at a ; (or [ ), or perhaps it should signal an error if a newline is encountered? For avoidance of doubt I suggest that any retokenization of <, pp-tokens, > occur at phase 4 and not in the C++ parsing phase. 2.3.3 Include Translation Include translation occurs at phase 4. Such translation is explicitly generating a header unit import, and relies on the location of the translation being at the start of an outermost declaration. There is no difficulty generating the header-name token. 3 Proposal As accepted at San Diego 18, both module and import are context-sensitive keywords. They are part of a module or import declaration iff: At the toplevel, outside of any { } nesting, and at the beginning of a declaration, possibly preceded by export, and not immediately followed by ::. Updated p1103 expresses the first two directly in the grammar, and expresses the third in text. The preprocessor tokenizing rules should specify the rules in Section 2.3.1, but allow implementations to feedback from phase 4, if they so choose. The retokenizing of Section 2.3.2 should be added. 4 Revision History R0 Original paper, presented Jacksonville 18 R1 Updated paper, from presentation made in San Diego 18 P0924r1:Modules:Context-Sensitive Keyword - 7 - Nathan Sidwell