Design Principles for a Beginning Programming Language

John T Minor and Laxmi P Gewali
School of Computer Science
University of Nevada, Las Vegas

Abstract: We consider the issue of designing an appropriate programming language for teaching beginning computer science courses. We examine the features of widely used teaching languages, including C++/C#/Java, and show that the constructs of these languages can be significantly improved to make them more effective for classroom teaching. We then propose a set of design principles that can lead to a pedagogically friendly programming language with promising scope for enhanced learning.

Keywords: programming language, pedagogical issues, language design principles

1. Introduction

In this paper we examine the issues of designing a programming language that can be adopted for undergraduate teaching. We propose a set of principles that can be followed as guidelines for developing a student-friendly programming language. These principles have been formulated so that the resulting language can be used by beginners in their first undergraduate programming courses.

Widely used programming languages such as C/C++/C# and Java are not well designed for beginners. Because they were designed to meet requirements such as fast execution, they are not user-friendly for novice programmers. These languages are complicated and can be improved for classroom and laboratory instruction. We examine some of their drawbacks and suggest approaches for making improvements. We show that some of the complicated programming constructs in C#/C++/Java are not needed at all in the first courses in programming.

2. Keep the size of the language small

A "teaching programming language" should be small in its number of reserved words, operators, statement/declaration constructs, and precedence rules.
Unlike C++, for example, the language should be designed with fewer than 18 precedence levels. A number of features in C++/C#/Java can easily be eliminated because they are redundant or rarely used. For instance, the "struct" construct in C++ is redundant: in any situation where a "struct" is needed, a simple class can be used instead. C++ and Java allow comments to be written in two ways: (i) by using slash-star pairs (/* Comment */) or (ii) by using slash-slash (// Comment). The /*..*/ form is redundant. We can similarly examine the need for static variables. Static variables are rarely used in the first courses on programming. They lead to debugging complications and can be eliminated without compromising expressiveness. The inheritance mechanism in C++ can also be simplified for instructional purposes. Single inheritance is adequate for writing programs in beginning courses, and hence multiple inheritance is unnecessary. Programs that do not use multiple
inheritance are easier to read and explain. Qualifiers such as "friend" or "protected" inside class definitions confuse new learners and should be addressed only later. Java relies heavily on interfaces, even in the first course. Interface abstraction is not easily comprehensible to beginning students, and its use is not necessary. Similarly, function-valued parameters (functions passed as arguments) are rarely used and not needed.

3. Natural and consistent syntax

A student-friendly programming language should be internally consistent, with minimal exceptions, and be as natural as possible. The term "internally consistent" means that the language should not allow too many variations in the use of statement constructs. For example, C++/C# do not make it mandatory to initialize variables at declaration. This freedom can lead to errors that are hard to spot. A pedagogically friendly programming language should force the programmer to initialize every variable during its declaration. If a variable is not initialized when declared, the compiler should flag an error.

Initialization at declaration should be required for both scalar and non-scalar objects. When an array is declared, the values at all positions should be initialized. Furthermore, there should be a simple way to initialize all array entries with the same value. Even FORTRAN has simple aggregate array initialization. For objects, initialization by listing values for members should be made available to programmers. The way C++ does this, by using initializer lists, is too cryptic and complicated. C++/C# have no mechanism for initializing the member values directly, which is a source of inconsistency. One is compelled to use constructors, which is not consistent with simple variable initialization.

A pedagogically friendly language should allow user-defined functions to return any built-in primitive or structured type. In C++, arrays cannot be returned explicitly.
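To make the restriction concrete, the following is a minimal C++ sketch of what a beginner runs into. The function names make_squares_wrapped and make_squares_pointer are hypothetical and chosen only for illustration; the use of std::array as the wrapping object is an assumption, since the text does not name a particular wrapper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// int make_squares()[4];   // ill-formed: a function cannot return a built-in array

// Workaround 1: wrap the array in an object. With std::array the size (4)
// is part of the return type, so the caller knows it.
std::array<int, 4> make_squares_wrapped() {
    std::array<int, 4> squares{};
    for (std::size_t i = 0; i < squares.size(); ++i) {
        squares[i] = static_cast<int>(i * i);
    }
    return squares;
}

// Workaround 2: return a pointer to the first element. The size is no longer
// visible in the return type; the caller has to know it by convention.
const int* make_squares_pointer() {
    static const int squares[4] = {0, 1, 4, 9};
    return squares;
}

// Small usage check for both workarounds.
void check_array_returns() {
    assert(make_squares_wrapped().size() == 4);
    assert(make_squares_wrapped()[3] == 9);
    assert(make_squares_pointer()[2] == 4);
}
```

In the second workaround nothing in the signature says how many elements may be read safely, which is the kind of hidden convention a beginner is likely to miss.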
Arrays are returned in C++ either by wrapping them in an object or by returning a pointer to the beginning of the array. This is confusing to beginners. It is natural that functions be able to return the array type directly, with the size of the array explicitly known. Furthermore, assignment between array types and between structure types should be allowed. In C++/Java, assignments between objects are allowed but not between arrays; this is inconsistent. Even earlier versions of FORTRAN allowed assignment between arrays of the same type.

The associativity rules of widely used programming languages are complicated for beginners. In C++, some operators associate left-to-right and others right-to-left, even among binary operators. It is pedagogically necessary to simplify these rules. One suggestion would be to make all binary operators associate left-to-right and all unary operators associate right-to-left. There should be no exceptions to these rules.

Consider the unnatural conventions used in the programming languages C++/Java/C#. One such convention is that functions are recursive by default. Beginners find it hard to learn recursive functions, which cannot be comprehended in a natural way in the first course on programming. It is more appropriate to make functions non-recursive by default. When a recursive function is to be used, it must be tagged "recursive". This will emphasize the fact that special care should be taken while designing recursive functions.

Another example of an unnatural convention in the teaching of programming languages is the placement of
the return type in function definitions. In C++/Java/C#, the return type is written before the function name and parameter list, as in "char max(char a, char b, char c)". It is more natural to put the return type at the end, as in "max(char a, char b, char c) -> char".

The specification of parameter passing in the existing popular programming languages is unnatural and confusing. Take the example of parameter passing in C++: "void findmax(int a, int b, int & c){..}". In this example, the first two parameters a and b are passed by value and the third one, c, is passed by reference. Using the symbol '&' to indicate pass by reference is very cryptic and counterintuitive. The syntax of parameter passing should reflect the intended purpose. It is much more appropriate to use the tags "in/out/in-out" in parameter passing. Here, the tag "in" indicates a parameter whose value is read for computation inside the function, the tag "out" indicates a parameter through which the function returns a computed value, and the tag "in-out" indicates two-way communication. With this convention the findmax function can be written as:

void findmax(int a in, int b in, int c out){..}

Such a construct emphasizes, in clear terms, that the first two parameters are passed read-only and the third one is used to implicitly return the computed value. The chances of unintended errors will be reduced by this convention.

4. No error-prone operators

A number of operators in C++/Java are very confusing to someone who is learning programming. Take the example of the increment operator ( ++ ). This operator is mostly used for increasing the value of an operand by 1. The operand is incremented by 1 whether ++ is applied as a prefix or as a postfix. The main difference between the postfix and prefix increment operators is in the value of the expression ++i or i++. In the prefix application the value of ++i is the new value of i, and in the postfix application the value of i++ is the old value of i.
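The distinction can be seen in a few lines of C++. The following is a minimal sketch; the function names prefix_value, postfix_value and check_increment_semantics are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Returns the value of the expression ++i, given a starting value for i.
int prefix_value(int i) {
    int result = ++i;   // i is incremented first; the expression yields the new value
    return result;
}

// Returns the value of the expression i++, given a starting value for i.
int postfix_value(int i) {
    int result = i++;   // the expression yields the old value; i is incremented afterwards
    return result;
}

// Both forms increment the operand; only the value of the expression differs.
void check_increment_semantics() {
    assert(prefix_value(5) == 6);    // ++i evaluates to the new value
    assert(postfix_value(5) == 5);   // i++ evaluates to the old value
}
```

Starting from the same value, the two functions return different results even though the operand ends up incremented by 1 in both cases.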
We can imagine how confused beginners would be when they have to get buried in such unnecessary detail while they are just starting to comprehend the meaning of elementary statements. The same can be argued for the decrement operator ( -- ) and for extended assignment operators such as +=, -=, *=, etc. It is pedagogically critical to replace the unintuitive i++ by the very intuitive assignment statement i = i+1. Similarly, the cryptic extended assignment x += 3 should be replaced by the simple assignment statement x = x+3. These observations underscore the need for removing complicated side-effect-generating operators, including ++, --, +=, -=, and *=.

The pointer arithmetic operations used in C/C++ are very difficult for new programmers to understand. Errors due to pointer misuse are very difficult to debug. Java corrected this demerit of C/C++ by not allowing explicit pointers at all.

5. Only high-level built-in types

There should be only a small set of built-in types, and these built-in types should all be high-level. The C#/C++/Java programming languages split primitive types into too many cases: integer, short, long, unsigned, etc. This forces the programmer to think about the underlying hardware while writing and analyzing programs, which slows down the learning process. The programmer should be
freed from worrying about low-level hardware detail by leaving the choice of hardware representation to the compiler. Let the compiler select the best representation based on the use stated by the programmer. An appropriate way would be for the programmer to specify the range of an integer variable by writing down its lower and upper bounds. For example, consider a declaration such as int low high m, which indicates that the integer variable m can take integer values between the limits low and high. The compiler can then translate this to the appropriate hardware word size. This eliminates the need for the programmer to distinguish between short, integer, unsigned and long. The same argument applies to float types, etc.

The low-level, pointer-oriented memory allocation of arrays and structures practiced in C#/C++ is not easy to explain and is prone to errors. Such constructs should be replaced with high-level dynamic string and/or list types. Dynamic allocation of space should be done automatically when needed by string or list operations. De-allocation of unused memory should also be done automatically by a garbage collection technique, so that programmers do not have to worry about this problem (or about the dangling reference problem).

6. Proper naming convention

There should be enforced, fixed naming conventions for user-defined identifiers so that anyone scanning or debugging code will immediately recognize the purpose and role of an identifier from its form. A program written without these conventions should be flagged by the compiler. Typical naming conventions could include:

(i) Class names and enum types must start with a capital letter, and the rest must be capital letters, digits, or underscores.
(ii) User-defined functions or methods must start with a capital letter, and the rest must be lower-case letters, digits, or underscores.
(iii) User-defined constants, including const and enum constants, must start with a lower-case letter, and the rest must be lower-case letters, digits, or underscores.
(iv) User-defined type parameters, as used in generic classes or functions, must start with a question mark (?), and the rest must be capital letters, digits, or underscores.
(v) User-defined variables, including non-type parameters, objects and data fields, must start with a question mark (?), and the rest must be lower-case letters, digits, or underscores.

7. Discussion

We outlined only a few principles for designing a teaching programming language. Due to space limitations we did not present a complete list. Additional principles that would be useful in designing a pedagogical programming language could address the selection of programming paradigm, side effects, implicit operands, parametric polymorphism, debugging aids and dynamic error-checking. For example, the language should be flexible and not tied down to a particular programming paradigm. One-paradigm languages (e.g. Java) lead to awkward or complex implementations of those algorithms that do not fit the given model [1,4].
When functions and class definitions are first presented to the students, generic type parameters should also be introduced. Students should be taught that when functions and classes are written as generally as possible, reuse is more likely, and that this is desirable. Templates should NOT be a separate construct introduced independently late in the instruction.

Whether or not to perform dynamic error-checking should be left to the programmer and not decided by the language. Error-recognition capabilities (needed during project development) must be balanced with run-time efficiency (desirable in the final product). Both are useful at different phases of software development.

A complete syntax and specification of a pedagogical programming language based on the principles discussed in this paper is available as a technical report [1].

References

[1] John T. Minor, HIGH-C: A Pedagogical Language Based on High-Level Design Principles, Technical Report (20 pages) CRS-11-001, School of Computer Science, University of Nevada, Las Vegas, 2011.
[2] John T. Minor and Laxmi Gewali, Pedagogical Issues in Programming Languages, Proceedings of the IEEE International Conference on Information Technology, pp. 562-565, April 2004.
[3] Michael Daconta, Kevin Smith, Donald Avondolio, and Clay Richardson, More Java Pitfalls, J. Wiley and Sons, 2003.
[4] Laxmi Gewali and John T. Minor, Multi-Paradigm Approach to Teaching Programming, Proceedings of the 2006 International Conference on Frontiers in Education: Computer Science, pp. 141-146, June 2006.
[5] Michael Daconta, Eric Monk, J. Paul Keller, and Keith Bohnenberger, Java Pitfalls, J. Wiley and Sons, 2000.