Keys to Writing Efficient Embedded Code
by BILL TRUDELL
EMBEDDED SYSTEMS PROGRAMMING, OCTOBER 1997

A key to writing efficient real-time embedded software is to understand clearly your processor's architecture, the programming language, the compiler's features, and the object model used by the compiler. With this understanding, you can identify potentially slow code, make the code faster, and thus write more efficient applications.

Nance Paternoster

I recently had an assignment in which I was responsible for identifying ways to write efficient embedded code. What I discovered isn't rocket science: common mistakes, misunderstandings, or assumptions about the demands made on the compiler, along with overestimating the power of the microprocessor, can adversely impact the execution time of an application. Most of my effort focused on code that doesn't use hardware floating-point operations, but instead relies on the math libraries supplied by the compiler vendor. Examples are presented primarily in C, but compiled as C++. I will leave the analysis of virtual tables and the like to the C++ experts. I hope a good compiler vendor does a reasonable job of implementing such things.

Most of what I learned, though, can be applied to any programming language. Inefficient code seems to be more closely related to the human condition than to the chosen programming language. Slow code is probably slow because that's the way it was written, however unintentionally.

I do believe that it's better to first write code that is correct and then to optimize it. There
will always be another compiler switch, faster clock chip, or newer processor around the bend. Well-written test drivers can prove equality between two implementations.

My general assumption is that to implement efficient embedded software, a developer must be familiar with the code that the compiler generates, as well as with the microprocessor architecture. Writing efficient code, though, can sometimes make it less portable. Code written in Assembler is usually processor-specific and not portable. The code you write today will very likely need to run on a different processor in a year or two. Using Assembler to make the code fast might therefore not be prudent, except for interrupt service routines or frequently used functions.

Analysis of any improvements made to working code is very important. Validate changes to ensure that errors have not been introduced. Make sure that the precision and accuracy of the calculations are maintained, or at least remain adequate. You can easily overlook rounding and truncation errors.

DATA TYPE SPECIFICATIONS

One common oversight is specifying the wrong data type and then allowing the compiler to convert the type automatically. Automatic type conversion is generally taken for granted, but it does chew up valuable processor time. Unless you look at the generated Assembler, you may never notice: the code might compile, link, run, and produce the right output, yet be very inefficient.

Table 1 contrasts two code segments generated for a processor without an FPU. The segment on the left omits the single-precision floating-point specifier f, an easily overlooked mistake. The value 10.0 defaults to double precision and forces the numerator to be converted to double precision before the double-precision divide.
The result of the division, a double-precision value, is then converted back to single precision. Without an FPU, these conversions are implemented in a software math library and are very expensive compared to integer operations. Correctly specifying the divisor as a single-precision value produces the code shown in the right-hand segment in Table 1. A single-precision division is used and the type conversions are avoided.

If the numerator were an integer, a conversion from long to float would be required. This would be a time-consuming operation that could be avoided if the data type were specified as float. (See section A.6 of The C Programming Language by Kernighan and Ritchie [1].)

Single-precision accuracy is probably used more frequently than double precision. Therefore, if double precision is required, a simple comment in the code would remove all doubt as to the developer's intention and design.

AUTOMATIC PROMOTIONS

Implied promotions can easily be taken for granted. In some cases you'll find it desirable or necessary to write and debug the code first in a PC environment, and later port or recompile it for the embedded processor. The clock speed on the PC will usually be much faster than the embedded hardware, and the PC will surely
have a floating-point unit. The implied promotions performed by the compiler for the embedded code might not execute as fast as they did on the workstation.

Standard math routines usually take double-precision inputs and return double-precision outputs. If only single precision is required, the return value should immediately be cast back to single precision, provided that accuracy and overflow conditions are satisfied. If this isn't done, further promotions can be triggered, causing slower execution. Table 2 contrasts the use of automatic promotion using the sqrt() function as is, with the casting of the sqrt() function's return value.

TABLE 1 Floating-point specifiers.

C code omitting float specifier f:          | C code using float specifier f:
float res = val / 10.0;                     | float res = val / 10.0f;
                                            |
move.l -4(a6),-(sp)                         | move.l -4(a6),-(sp)
jsr ftod                                    | move.l # ,-(sp)
clr.l -(sp)                                 | jsr fdiv
move.l # ,-(sp)                             | move.l (sp)+,-8(a6)
jsr ddiv                                    |
jsr dtof                                    | (Less is better.)
move.l (sp)+,-8(a6)                         |

TABLE 2 Using math functions and the effect of automatic promotions. (val is of type float.)

C code, automatic type promotion:           | C code, casting to avoid excessive promotions:
float res =                                 | float res =
  ( 17.0f * sqrt( val ) ) / 10.0f;          |   ( 17.0f * (float)(sqrt( val )) ) / 10.0f;
                                            |
move.l -4(a6),-(sp) // Load val on stack,   | move.l -4(a6),-(sp)
jsr ftod            // Convert it to dbl    | jsr ftod
jsr _sqrt           // Dbl prec. sqrt()     | jsr _sqrt
addq.l #4,sp        // Adjust stack         | addq.l #4,sp
move.l d1,(sp)      // Load sqrt() result   | move.l d1,(sp)      // Load stack with
move.l d0,-(sp)     // d0 & d1 on stack     | move.l d0,-(sp)     // sqrt() result &
clr.l -(sp)         // Load stack with      | jsr dtof            // convert to single
move.l # ,-(sp)     // dbl for 17.0         | move.l # ,-(sp)     // 17.0f
jsr dmul            // Dbl prec. mult.      | jsr fmul            // Single prec.
clr.l -(sp)         // Load stack with      | move.l # ,-(sp)     // 10.0f
move.l # ,-(sp)     // dbl for 10.0         | jsr fdiv            // Single prec.
jsr ddiv            // Dbl prec. divide     | move.l (sp)+,-8(a6) // Save result
jsr dtof            // Double to float      |                     // in res
move.l (sp)+,-8(a6) // Save in res          |
Using the sqrt() function as is forces the other operands to be promoted. Casting the return value of sqrt() replaces the double-precision multiply and divide with single-precision versions, which should execute faster because, in this case, they're implemented in software. If the input to sqrt() were of type double instead of float, the costly call to convert the float to double could be avoided.

REWRITING AND REARRANGING EXPRESSIONS

Rearranging operands and operators in an equation can give the compiler a better chance of pre-evaluating expressions at compile time instead of run time, saving clock cycles for other important operations. The equation used in Table 2 can be rearranged for faster execution without losing readability, as shown in Table 3. Significant savings result because a single-precision division is no longer necessary, as 17.0f/10.0f is equivalent to 1.7f.

In general, for both native instruction sets and floating-point emulation, divides take much longer to execute than multiplies. Therefore, provided that accuracy requirements are met and overflow and underflow conditions are considered, trading a divide for a multiply usually saves time. For example, an algebraic equation can be rewritten for faster execution, as shown in Table 4. Here, the segment on the left takes two divides, whereas the rewrite takes one divide and one multiply, and will be much faster.

Coding algorithms and procedures for the most frequently executed path also contributes to faster overall execution. Assembler branches taken are usually faster than those not taken, so put the evaluation of the most frequently occurring conditions first.
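The rewrites discussed in this section can be sketched in plain C. This is only an illustration under stated assumptions: the function names are hypothetical, `root` stands for a square-root result that has already been cast to float, and whether the compiler actually folds the constants should be confirmed in the generated Assembler for your toolchain.

```c
/* Hypothetical helper names, chosen for illustration only. */

/* 10.0 is a double literal: val is promoted, the divide is done in
   double precision, and the result is converted back to float. */
float tenth_promoted(float val)
{
    return (float)(val / 10.0);
}

/* 10.0f keeps the whole expression in single precision. */
float tenth_single(float val)
{
    return val / 10.0f;
}

/* One multiply and one divide at run time. */
float scale_divide(float root)
{
    return (17.0f * root) / 10.0f;
}

/* Grouping the constants gives the compiler the chance to evaluate
   17.0f / 10.0f at compile time, leaving a single multiply. */
float scale_folded(float root)
{
    return (17.0f / 10.0f) * root;
}

/* Two divides. */
float divide_twice(float a, float b, float c)
{
    return a / b / c;
}

/* One multiply and one divide; check that b * c cannot overflow
   or underflow for the expected input ranges. */
float divide_once(float a, float b, float c)
{
    return a / (b * c);
}
```

Each pair computes the same quantity, so a small test driver can compare the two forms over the expected input range before the faster one replaces the original.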
TABLE 3 Rearranging equations for efficient preprocessing.

C code example:
float num = ( 17.0f / 10.0f ) * (float)(sqrt( val ));

move.l -4(a6),-(sp) // Load stack with float val
jsr ftod            // Convert val from single to double
jsr _sqrt           // Double-precision square root
addq.l #4,sp
move.l d1,(sp)      // Load stack with double result
move.l d0,-(sp)
jsr dtof            // Convert sqrt() double result to single
move.l # ,-(sp)     // Load pre-evaluated 17.0f/10.0f
jsr fmul            // Single-precision multiply
move.l (sp)+,-8(a6) // Pop and store result in variable num

TABLE 4 Rewriting algebraic expressions for better efficiency.

C code, successive divides:                 | C code, multiply instead of divide:
D = A / B / C;                              | D = A / ( B * C );
                                            |
move.l -4(a6),-(sp)                         | move.l -4(a6),-(sp)
move.l -8(a6),-(sp)                         | move.l -8(a6),-(sp)
jsr fdiv                                    | move.l -12(a6),-(sp)
move.l -12(a6),-(sp)                        | jsr fmul
jsr fdiv                                    | jsr fdiv
move.l (sp)+,-16(a6)                        | move.l (sp)+,-16(a6)

Compilers generally have their own optimizations for switch and case statements, which use jump tables and take note of large gaps in the values used in the various cases.

Table 5 shows some different ways to optimize algebraic equations. Look for the repeated use or evaluation of the same expression. In Equation 1, the product B * C is evaluated twice. Defining another variable to hold the product increases code size but avoids the extra multiply. Depending on the processor and data type, this can result in significant time savings. Note the following example:

D = A / (B * C)
E = 1 / (1 + (B * C))
evaluate B * C once, bc = B * C          (1)

LITERAL DEFINITIONS

Specifying common values with #defines or const terms might be pragmatic, but it is also prone to error. Several significant observations can be made about the following example. The value 3.14 is double precision, forcing a double-precision multiply and later a double- to single-precision type conversion call, all of which is time-consuming.
#define TWO_PI 2 * 3.14
...
float c, r;
...
c = 2 * 3.14 * r;

move.l -8(a6),-(sp) // load r
jsr ftod            // Make dbl
move.l # ,-(sp)     // Load
move.l # ,-(sp)     // 2 * PI
jsr dmul            // Multiply 2PI * r
jsr dtof            // Convert to single
move.l (sp)+,-4(a6) // Save in c

The next example shows that the defined value, combined with operator precedence, produces an incorrect result for a circle's radius: the circumference variable c is first divided by two, rather than by the product (2 * PI). The code also shows that the multiplication belonging to 2 * PI occurs at run time everywhere the literal TWO_PI is used:

#define TWO_PI 2 * 3.14f
...
float c, r;
...
r = c / 2 * 3.14f;

move.l -4(a6),-(sp)
move.l # ,-(sp)
jsr fdiv
move.l # ,-(sp)
jsr fmul
move.l (sp)+,-8(a6)

Two correct implementations follow, one using a #define and the other a const variable. The substituted literal definition avoids a run-time multiply because the compiler evaluates the parenthesized constant expression at compile time. Similarly, the const variable is also evaluated at compile time, but requires more memory and some overhead for referencing the variable:

#define TWO_PI (float)(2 * 3.14)
// const float TWO_PI = 2 * 3.14f;
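The precedence problem described above is easy to reproduce. In this small sketch, TWO_PI_UNSAFE and the two function names are hypothetical, and TWO_PI is written with single-precision literals as one possible form of the parenthesized definition:

```c
/* Unparenthesized: "c / TWO_PI_UNSAFE" expands to "c / 2 * 3.14f",
   which the compiler parses as (c / 2) * 3.14f. */
#define TWO_PI_UNSAFE 2 * 3.14f

/* Parenthesized and single precision: a constant expression that the
   compiler can evaluate once, at compile time. */
#define TWO_PI (2.0f * 3.14f)

/* Wrong radius: divides by 2, then multiplies by 3.14f. */
float radius_unsafe(float c)
{
    return c / TWO_PI_UNSAFE;
}

/* Intended radius: a single divide by the constant 6.28f. */
float radius(float c)
{
    return c / TWO_PI;
}
```

For a circumference of 6.28f, radius() gives a value of about 1.0, while radius_unsafe() gives about 9.86 (3.14 * 3.14). Always parenthesizing macro bodies that contain operators avoids this class of error.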
TABLE 5 Algebraic simplifications and the laws of exponents.

Original expression:                        | Optimized expression:
                                            |
a^2 - 3a + 2                                | (a - 1) * (a - 2)
  1 multiply (square term)                  |   2 subtractions
  1 subtraction                             |   1 multiplication
  1 multiply (3a)                           |
  1 addition                                |
                                            |
(a - 1) * (a + 1)                           | a^2 - 1
  1 subtraction                             |   1 multiply (square term)
  1 multiplication                          |   1 subtraction
  1 addition                                |
                                            |
1 / (1 + a / b)                             | b / (b + a)
  2 divides                                 |   1 divide
  1 addition                                |   1 addition
                                            |
a^m * a^n                                   | a^(m + n)
  2 power functions                         |   1 addition
  1 multiply                                |   1 power function
                                            |
(a^m)^n                                     | a^(m * n)
  2 power functions                         |   1 multiply
                                            |   1 power function

INTEGER MATH VS. FLOATING POINT

If inputs are bounded by definition, convention, or data type, there is a chance that integer math with appropriate scaling can be substituted for floating-point computations, as shown in Table 6. The savings in this case may seem unimpressive, but replacing a floating-point subtraction with a left shift and an integer subtraction represents a significant savings in execution time. Don't forget that pushing arguments on the stack, jumping to the subroutine, returning, and adjusting the stack are all overhead in solving the problem.

THE STANDARD MATH LIBRARY

As I've already mentioned, the standard math library generally expects double-precision values. Massive penalties result from converting from single precision to double precision and back again when using floating-point emulation software. However, using double precision as the default data type isn't the solution either, because it consumes more space and time (unless you're using a Pentium Pro, which does everything in double precision but isn't really an embedded processor).

Table 7 shows how a single-precision absolute value can be written. While the alternative generates more code, it's much faster than the type conversion function calls. This alternative can be encapsulated in a macro or inline function. It would be even better if the function abs() were overloaded for all relevant data types.
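One way to package the single-precision absolute value as an inline function is sketched below. The name abs_float is hypothetical, and `static inline` assumes a C99 (or C++) compiler; on older toolchains a macro would play the same role.

```c
/* Hypothetical helper: single-precision absolute value that never
   promotes its argument to double and never calls the math library.
   Note: a negative zero input is returned unchanged, because
   -0.0f < 0.0f is false. */
static inline float abs_float(float x)
{
    return (x < 0.0f) ? -x : x;
}
```

Where a C99 math library is available, fabsf() in <math.h> serves the same purpose; it is still worth checking the generated code to confirm that no double-precision conversion calls appear.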
COMPILER OPTIONS AND ISSUES

Compilers offer many degrees of optimization. Some of these features are related to the programming language or object model, while others use specific knowledge of the processor to make the code execute faster. If developers are expected to write efficient code, they'll need to know which options are enabled or disabled, which default options are being used, and the effect of these options on their application. Some general options supplied by compilers for optimizing code include:

- stack processing
- global flow
- inlining
- local optimization
- instruction scheduling
- code produced for a specific processor
- run-time library function inlining
- allocation of frequently used variables in registers
- code space optimization
- speed optimization
- floating-point emulation

Some processors have separate calculation and execution stages. Because of this separation, instructions can be reordered to take advantage of known latencies in specific instructions, so that any stalling in the instruction pipeline is avoided or minimized. Good compilers have options for enabling these decisions. The code generated for one processor might not be optimal for another processor because of architectural differences, even if they're in the same family. Therefore, be sure to specify the correct processor in the compile switches.

TABLE 6 Integer math instead of floating point.

C code:
unsigned short int input = 12345; // Range 0 to
float output;
output = ( (float)input f ) / f;

// (Slow Code Ahead)
move.l #12345,d4    // Load into register d4
move.l d4,d0        // Load register d0, input to ultof
jsr ultof           // Convert unsigned long to float
move.l # ,-(sp)     // Push f
jsr fsub            // input f
move.l # ,-(sp)     // Push f
jsr fdiv            // (input f) / f
move.l (sp)+,-4(a6) // Save result in output

C code using scaling:
// Using integer data types where possible and scaling numbers by 2^15
output = (float)( ((int)input<<15) ) / ;

move.l d4,d0        // Load reg. d4 into d0 ( input = 12345)
moveq #15,d1        // Load register d1 with shift amount
lsl.l d1,d0         // Left shift register d0 by d1
subi.l # ,d0        // input*2^ * 2^15
jsr ltof            // Convert numerator from long to float
move.l # ,-(sp)     // Load float of int
jsr fdiv            // Single-precision division
move.l (sp)+,-4(a6) // Save result in output variable
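The idea behind Table 6 can be illustrated with a simplified sketch. The constants here are assumptions chosen for the example, not the values from the table: a 16-bit unsigned input is assumed to span 0 to 65535 and is mapped onto the range -1.0 to just under 1.0. The function names are hypothetical, and the integer version moves only the subtraction into integer math rather than reproducing the table's shift-based scaling.

```c
#include <stdint.h>

/* Floating-point version: an integer-to-float conversion, a
   floating-point subtract, and a floating-point divide. */
float normalize_float(uint16_t input)
{
    return ((float)input - 32768.0f) / 32768.0f;
}

/* Integer version: the subtraction is done in integer math, so only
   one conversion and one floating-point divide remain. */
float normalize_int(uint16_t input)
{
    int32_t centered = (int32_t)input - 32768;  /* range -32768..32767 */
    return (float)centered / 32768.0f;
}
```

Because every intermediate value fits exactly in a single-precision mantissa, both versions return identical results across the whole input range, which a test driver can confirm by looping over all 65536 inputs.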
TABLE 7 Floating-point absolute value.

C code:
output = fabs(input);

// (Slow Code Ahead)
move.l -4(a6),-(sp) // Load input on stack
jsr ftod            // Convert it from single to double precision
jsr _fabs           // Double-precision ABS
addq.l #4,sp
move.l d1,(sp)      // Load result
move.l d0,-(sp)
jsr dtof            // Convert result from double to single
move.l (sp)+,-8(a6) // Save result in output

C code, test against zero instead of abs() function:
if ( input < 0 )
    output = - input;
else
    output = input;

move.l -4(a6),-(sp)
clr.l -(sp)
jsr fcmp
bge.s L38
move.l -4(a6),-(sp)
eori.b #128,(sp)    // XOR the sign bit
move.l (sp)+,-8(a6) // Save output
bra.s L39
L38: move.l -4(a6),-8(a6) // Save output
L39:

The inlining of run-time libraries can also help reduce execution time by avoiding function call overhead or by supporting optimized operations. For example, a memcpy routine can be optimized for a small number of bytes and result in an inline expansion. If the size is very large or the data type is user-specified instead of a primitive type, a function call to the memcpy routine might be generated. I found in one particular case that using memset resulted in a function call. While the library function handled the various data organization schemes that could be selected by the user, a quicker inline version might be a better choice if the size is known to be small. Even though the internals of the memset function were implemented in Assembler, the overhead for setting a long word to zero by using a memset was excessive.

For portability, among other reasons, you should give special care to the alignment of data. These choices can affect efficiency and can vary with the processor used.

If your processor has an FPU, make sure the compiler has this switch turned on and that you're not running software to do floating-point calculations. Default options should be well understood because they may have an impact on performance.
For example, Microsoft Visual C++ supports a workaround for Pentium processors with flawed floating-point instructions. The Help Index states:

By default, the workaround is disabled (/QIfdiv-), and the code generator emits code that is unsafe on a flawed Pentium. If the workaround is enabled (/QIfdiv), the code generator emits fatter, safe code that tests for the processor bug and calls run-time routines instead of using the native instructions of the processor to generate correct floating-point results. [2]

So a trade-off exists between accuracy and speed. Therefore, be very careful when generating benchmarks and comparing the accuracy of generated values with specific versions of programs, like a debug build versus a release build. It's important to understand the implications of the switches chosen. Obviously, with a flawed Pentium, the run-time routines will run much slower than native instructions, but they'll be more accurate. A Pentium would not be a typical embedded processor selection, but your processor or compiler may have its own set of quirks.

Some high-end processors have both instruction and data caches. The compiler provides switches for enabling these features. The instruction cache should be enabled; the data cache should only be enabled if sufficient consideration has been given to data synchronization. Multiprocessing will require extra hardware for bus snooping to be sure that the data cache is synchronized between two or more processors or processes.

MEMORY ALLOCATION

For time-critical sections of code, the use of the memory manager should be scrutinized. If the size of an object or data type is small and the scope is sufficiently restricted, the stack might be a better choice for a storage area. This analysis might only be obvious after a design has been implemented. Some assumptions can
then be made regarding the future growth of an object: if the object is small, maybe it's better left on the stack, rather than incurring the overhead of calling memory allocation functions. Another benefit is that keeping the smaller segments out of the heap can lessen fragmentation. In C and C++, using malloc, free, new, and delete doesn't come without some time penalty, yet it can be more flexible than using the stack.

C++

Some mistakes I've frequently made in C++ code have made my programs run more slowly than I would have liked. I found it especially important to know when a constructor is being executed. Trace statements or reference counting can help monitor these events. Most often, it's the copy constructor or assignment operator that is easily overlooked. It was only after stepping through the code that I noticed my design and implementation required the use of a copy constructor. The copy constructor was being called in a tight loop, which cost significant execution time. The design had to be redone.

Whenever possible, use a profiler or incorporate some crude timing elements to judge how efficiently the program is executing. This tactic is handy for benchmarking and for confirming performance changes that are difficult to prove except by empirical methods. Some of the better development environments have a built-in profiler or an add-on component that monitors entries and returns from methods.

Another common mistake is passing an object back on the stack instead of passing its reference. The copy constructor is executed for the temporary object.

After a design has been implemented, tested, and used, a second pass can be made to improve the application. Classes that were originally necessary may become superfluous and can be eliminated or encapsulated in a superclass. This may remove unnecessary
layers of inheritance, which add to the overhead and processing time of an application.

MISCELLANEOUS SETBACKS

A common source of performance degradation is the cutting and pasting of working code as a base for newer features. One bad example, copied and promulgated in the system, significantly degrades performance. Bad examples should always be corrected, or at least commented. Developers will appreciate your honesty and humility.

We often write code with the assumption that it might be used in the future. When the future arrives, the requirements may have changed or may be better understood, such that the old design is insufficient or wrong. Thus, when in doubt, leave it out. Don't code it if it isn't needed. If you're using archival tools, the questionable code can be saved there. It's frustrating to browse through code that is no longer built, looking for code that needs to be optimized, or for common defects in relic code.

RECOGNIZE THE SIGNS

I've discussed some common implementation oversights and mistakes and offered some recommendations regarding implementation details that will optimize code for time. A top-down approach should generally be used to find where the time is being spent; hopefully, you'll find a smoking gun. However, this approach will address only specific instances of code inefficiency. A parallel path can also be taken, in which the cross-reference section of the link map is inspected for unusual references. Double-precision math operations are an example. Most importantly, optimize something that already works: the first step is to get the correct solution.

Some fundamental concepts must first be understood in order to write efficient real-time embedded software. These concepts include knowing your processor's architecture, the programming language used, the features of the compiler, and even the object model used by the compiler.
Inspecting the code generated by the compiler closes the loop and allows you to see how well the compiler has interpreted your coding requests and converted them into executable code. Be ready for some surprises. With this understanding and mindset, you will learn to recognize those warning signs that read "slow code ahead." Then you can speed up your code, and write more efficient applications in the process.

Bill Trudell is a software engineer currently employed by Fisher Rosemount Systems, Inc., a solutions provider for the process control industry. He has extensive experience in the design of real-time multitasking embedded and PC applications in various domains. Bill can be reached via e-mail at billtrudell@msn.com.

REFERENCES

1. Kernighan, Brian W. and Dennis M. Ritchie. The C Programming Language, Second Edition. Englewood Cliffs, NJ: Prentice-Hall.
2. Microsoft Visual C++, Version 4.2, Books On-line. Microsoft Corp., Redmond, WA.
More informationInstruction Set extensions to X86. Floating Point SIMD instructions
Instruction Set extensions to X86 Some extensions to x86 instruction set intended to accelerate 3D graphics AMD 3D-Now! Instructions simply accelerate floating point arithmetic. Accelerate object transformations
More informationFunctions in C C Programming and Software Tools
Functions in C C Programming and Software Tools N.C. State Department of Computer Science Functions in C Functions are also called subroutines or procedures One part of a program calls (or invokes the
More informationMartin Kruliš, v
Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal
More informationCS 241 Honors Memory
CS 241 Honors Memory Ben Kurtovic Atul Sandur Bhuvan Venkatesh Brian Zhou Kevin Hong University of Illinois Urbana Champaign February 20, 2018 CS 241 Course Staff (UIUC) Memory February 20, 2018 1 / 35
More informationReal instruction set architectures. Part 2: a representative sample
Real instruction set architectures Part 2: a representative sample Some historical architectures VAX: Digital s line of midsize computers, dominant in academia in the 70s and 80s Characteristics: Variable-length
More informationAuthor: Steve Gorman Title: Programming with the Intel architecture in the flat memory model
Author: Steve Gorman Title: Programming with the Intel architecture in the flat memory model Abstract: As the Intel architecture moves off the desktop into a variety of other computing applications, developers
More information4. Java Project Design, Input Methods
4-1 4. Java Project Design, Input Methods Review and Preview You should now be fairly comfortable with creating, compiling and running simple Java projects. In this class, we continue learning new Java
More informationHigh Performance Computing and Programming, Lecture 3
High Performance Computing and Programming, Lecture 3 Memory usage and some other things Ali Dorostkar Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden
More informationSection 1.1 Definitions and Properties
Section 1.1 Definitions and Properties Objectives In this section, you will learn to: To successfully complete this section, you need to understand: Abbreviate repeated addition using Exponents and Square
More informationC Language Programming
Experiment 2 C Language Programming During the infancy years of microprocessor based systems, programs were developed using assemblers and fused into the EPROMs. There used to be no mechanism to find what
More informationInstruction Sets: Characteristics and Functions Addressing Modes
Instruction Sets: Characteristics and Functions Addressing Modes Chapters 10 and 11, William Stallings Computer Organization and Architecture 7 th Edition What is an Instruction Set? The complete collection
More informationSemantic Analysis. CSE 307 Principles of Programming Languages Stony Brook University
Semantic Analysis CSE 307 Principles of Programming Languages Stony Brook University http://www.cs.stonybrook.edu/~cse307 1 Role of Semantic Analysis Syntax vs. Semantics: syntax concerns the form of a
More informationG Programming Languages - Fall 2012
G22.2110-003 Programming Languages - Fall 2012 Lecture 4 Thomas Wies New York University Review Last week Control Structures Selection Loops Adding Invariants Outline Subprograms Calling Sequences Parameter
More informationCS 11 C track: lecture 8
CS 11 C track: lecture 8 n Last week: hash tables, C preprocessor n This week: n Other integral types: short, long, unsigned n bitwise operators n switch n "fun" assignment: virtual machine Integral types
More informationPage 1. Stuff. Last Time. Today. Safety-Critical Systems MISRA-C. Terminology. Interrupts Inline assembly Intrinsics
Stuff Last Time Homework due next week Lab due two weeks from today Questions? Interrupts Inline assembly Intrinsics Today Safety-Critical Systems MISRA-C Subset of C language for critical systems System
More informationDivisibility Rules and Their Explanations
Divisibility Rules and Their Explanations Increase Your Number Sense These divisibility rules apply to determining the divisibility of a positive integer (1, 2, 3, ) by another positive integer or 0 (although
More informationComputer Components. Software{ User Programs. Operating System. Hardware
Computer Components Software{ User Programs Operating System Hardware What are Programs? Programs provide instructions for computers Similar to giving directions to a person who is trying to get from point
More informationHeap Arrays. Steven R. Bagley
Heap Arrays Steven R. Bagley Recap Data is stored in variables Can be accessed by the variable name Or in an array, accessed by name and index a[42] = 35; Variables and arrays have a type int, char, double,
More informationCS 31: Intro to Systems Binary Arithmetic. Martin Gagné Swarthmore College January 24, 2016
CS 31: Intro to Systems Binary Arithmetic Martin Gagné Swarthmore College January 24, 2016 Unsigned Integers Suppose we had one byte Can represent 2 8 (256) values If unsigned (strictly non-negative):
More information3. Simple Types, Variables, and Constants
3. Simple Types, Variables, and Constants This section of the lectures will look at simple containers in which you can storing single values in the programming language C++. You might find it interesting
More informationCSIS1120A. 10. Instruction Set & Addressing Mode. CSIS1120A 10. Instruction Set & Addressing Mode 1
CSIS1120A 10. Instruction Set & Addressing Mode CSIS1120A 10. Instruction Set & Addressing Mode 1 Elements of a Machine Instruction Operation Code specifies the operation to be performed, e.g. ADD, SUB
More informationCourse Outline Introduction to C-Programming
ECE3411 Fall 2015 Lecture 1a. Course Outline Introduction to C-Programming Marten van Dijk, Syed Kamran Haider Department of Electrical & Computer Engineering University of Connecticut Email: {vandijk,
More informationProblem with Scanning an Infix Expression
Operator Notation Consider the infix expression (X Y) + (W U), with parentheses added to make the evaluation order perfectly obvious. This is an arithmetic expression written in standard form, called infix
More informationCS102: Variables and Expressions
CS102: Variables and Expressions The topic of variables is one of the most important in C or any other high-level programming language. We will start with a simple example: int x; printf("the value of
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationOptimising for the p690 memory system
Optimising for the p690 memory Introduction As with all performance optimisation it is important to understand what is limiting the performance of a code. The Power4 is a very powerful micro-processor
More informationTSEA44 - Design for FPGAs
2015-11-24 Now for something else... Adapting designs to FPGAs Why? Clock frequency Area Power Target FPGA architecture: Xilinx FPGAs with 4 input LUTs (such as Virtex-II) Determining the maximum frequency
More informationBits, Words, and Integers
Computer Science 52 Bits, Words, and Integers Spring Semester, 2017 In this document, we look at how bits are organized into meaningful data. In particular, we will see the details of how integers are
More informationLecture 2: SML Basics
15-150 Lecture 2: SML Basics Lecture by Dan Licata January 19, 2012 I d like to start off by talking about someone named Alfred North Whitehead. With someone named Bertrand Russell, Whitehead wrote Principia
More informationSection 0.3 The Order of Operations
Section 0.3 The Contents: Evaluating an Expression Grouping Symbols OPERATIONS The Distributive Property Answers Focus Exercises Let s be reminded of those operations seen thus far in the course: Operation
More informationCS 101, Mock Computer Architecture
CS 101, Mock Computer Architecture Computer organization and architecture refers to the actual hardware used to construct the computer, and the way that the hardware operates both physically and logically
More informationIntro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming
Intro to Programming Unit 7 Intro to Programming 1 What is Programming? 1. Programming Languages 2. Markup vs. Programming 1. Introduction 2. Print Statement 3. Strings 4. Types and Values 5. Math Externals
More information2/5/2018. Expressions are Used to Perform Calculations. ECE 220: Computer Systems & Programming. Our Class Focuses on Four Types of Operator in C
University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering ECE 220: Computer Systems & Programming Expressions and Operators in C (Partially a Review) Expressions are Used
More informationThe Design Process. General Development Issues. C/C++ and OO Rules of Thumb. Home
A l l e n I. H o l u b & A s s o c i a t e s Home C/C++ and OO Rules of Thumb The following list is essentially the table of contents for my book Enough Rope to Shoot Yourself in the Foot (McGraw-Hill,
More information1 Epic Test Review 2 Epic Test Review 3 Epic Test Review 4. Epic Test Review 5 Epic Test Review 6 Epic Test Review 7 Epic Test Review 8
Epic Test Review 1 Epic Test Review 2 Epic Test Review 3 Epic Test Review 4 Write a line of code that outputs the phase Hello World to the console without creating a new line character. System.out.print(
More informationEXAMINING THE CODE. 1. Examining the Design and Code 2. Formal Review: 3. Coding Standards and Guidelines: 4. Generic Code Review Checklist:
EXAMINING THE CODE CONTENTS I. Static White Box Testing II. 1. Examining the Design and Code 2. Formal Review: 3. Coding Standards and Guidelines: 4. Generic Code Review Checklist: Dynamic White Box Testing
More informationCS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11
CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Spring 2015 1 Handling Overloaded Declarations Two approaches are popular: 1. Create a single symbol table
More informationChapter 13 Reduced Instruction Set Computers
Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining
More informationMemory Management. Kevin Webb Swarthmore College February 27, 2018
Memory Management Kevin Webb Swarthmore College February 27, 2018 Today s Goals Shifting topics: different process resource memory Motivate virtual memory, including what it might look like without it
More informationDynamic Control Hazard Avoidance
Dynamic Control Hazard Avoidance Consider Effects of Increasing the ILP Control dependencies rapidly become the limiting factor they tend to not get optimized by the compiler more instructions/sec ==>
More informationRationale for TR Extension to the programming language C. Decimal Floating-Point Arithmetic
WG14 N1161 Rationale for TR 24732 Extension to the programming language C Decimal Floating-Point Arithmetic Contents 1 Introduction... 1 1.1 Background... 1 1.2 The Arithmetic Model... 3 1.3 The Encodings...
More informationCS152 Computer Architecture and Engineering VLIW, Vector, and Multithreaded Machines
CS152 Computer Architecture and Engineering VLIW, Vector, and Multithreaded Machines Assigned April 7 Problem Set #5 Due April 21 http://inst.eecs.berkeley.edu/~cs152/sp09 The problem sets are intended
More informationBy, Ajinkya Karande Adarsh Yoga
By, Ajinkya Karande Adarsh Yoga Introduction Early computer designers believed saving computer time and memory were more important than programmer time. Bug in the divide algorithm used in Intel chips.
More informationLast Time. Low-level parts of the toolchain for embedded systems. Any weak link in the toolchain will hinder development
Last Time Low-level parts of the toolchain for embedded systems Ø Linkers Ø Programmers Ø Booting an embedded CPU Ø Debuggers Ø JTAG Any weak link in the toolchain will hinder development Today: Intro
More informationFundamental of Programming (C)
Borrowed from lecturer notes by Omid Jafarinezhad Fundamental of Programming (C) Lecturer: Vahid Khodabakhshi Lecture 3 Constants, Variables, Data Types, And Operations Department of Computer Engineering
More informationControl Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary
Control Instructions Computer Organization Architectures for Embedded Computing Thursday, 26 September 2013 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition,
More informationControl Instructions
Control Instructions Tuesday 22 September 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary Previous Class Instruction Set
More informationCOSC 2P95. Procedural Abstraction. Week 3. Brock University. Brock University (Week 3) Procedural Abstraction 1 / 26
COSC 2P95 Procedural Abstraction Week 3 Brock University Brock University (Week 3) Procedural Abstraction 1 / 26 Procedural Abstraction We ve already discussed how to arrange complex sets of actions (e.g.
More informationLecture 4 CSE July 1992
Lecture 4 CSE 110 6 July 1992 1 More Operators C has many operators. Some of them, like +, are binary, which means that they require two operands, as in 4 + 5. Others are unary, which means they require
More informationChapter 7 The Potential of Special-Purpose Hardware
Chapter 7 The Potential of Special-Purpose Hardware The preceding chapters have described various implementation methods and performance data for TIGRE. This chapter uses those data points to propose architecture
More informationImportant From Last Time
Important From Last Time Embedded C Pros and cons Macros and how to avoid them Intrinsics Interrupt syntax Inline assembly Today Advanced C What C programs mean How to create C programs that mean nothing
More informationCUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions.
Schedule CUDA Digging further into the programming manual Application Programming Interface (API) text only part, sorry Image utilities (simple CUDA examples) Performace considerations Matrix multiplication
More informationFloating Point Considerations
Chapter 6 Floating Point Considerations In the early days of computing, floating point arithmetic capability was found only in mainframes and supercomputers. Although many microprocessors designed in the
More informationPerformance analysis basics
Performance analysis basics Christian Iwainsky Iwainsky@rz.rwth-aachen.de 25.3.2010 1 Overview 1. Motivation 2. Performance analysis basics 3. Measurement Techniques 2 Why bother with performance analysis
More informationC++ for System Developers with Design Pattern
C++ for System Developers with Design Pattern Introduction: This course introduces the C++ language for use on real time and embedded applications. The first part of the course focuses on the language
More informationComputer Components. Software{ User Programs. Operating System. Hardware
Computer Components Software{ User Programs Operating System Hardware What are Programs? Programs provide instructions for computers Similar to giving directions to a person who is trying to get from point
More informationComputer Organization & Assembly Language Programming
Computer Organization & Assembly Language Programming CSE 2312 Lecture 11 Introduction of Assembly Language 1 Assembly Language Translation The Assembly Language layer is implemented by translation rather
More informationSpecial Topics for Embedded Programming
1 Special Topics for Embedded Programming ETH Zurich Fall 2018 Reference: The C Programming Language by Kernighan & Ritchie 1 2 Overview of Topics Microprocessor architecture Peripherals Registers Memory
More informationContents. Slide Set 1. About these slides. Outline of Slide Set 1. Typographical conventions: Italics. Typographical conventions. About these slides
Slide Set 1 for ENCM 369 Winter 2014 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2014 ENCM 369 W14 Section
More informationCPU Pipelining Issues
CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput
More informationFinding Firmware Defects Class T-18 Sean M. Beatty
Sean Beatty Sean Beatty is a Principal with High Impact Services in Indianapolis. He holds a BSEE from the University of Wisconsin - Milwaukee. Sean has worked in the embedded systems field since 1986,
More informationLanguage Basics. /* The NUMBER GAME - User tries to guess a number between 1 and 10 */ /* Generate a random number between 1 and 10 */
Overview Language Basics This chapter describes the basic elements of Rexx. It discusses the simple components that make up the language. These include script structure, elements of the language, operators,
More informationCS107 Handout 13 Spring 2008 April 18, 2008 Computer Architecture: Take II
CS107 Handout 13 Spring 2008 April 18, 2008 Computer Architecture: Take II Example: Simple variables Handout written by Julie Zelenski and Nick Parlante A variable is a location in memory. When a variable
More informationMidterm II CS164, Spring 2006
Midterm II CS164, Spring 2006 April 11, 2006 Please read all instructions (including these) carefully. Write your name, login, SID, and circle the section time. There are 10 pages in this exam and 4 questions,
More informationMajor Advances (continued)
CSCI 4717/5717 Computer Architecture Topic: RISC Processors Reading: Stallings, Chapter 13 Major Advances A number of advances have occurred since the von Neumann architecture was proposed: Family concept
More informationThe Role of Performance
Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture The Role of Performance What is performance? A set of metrics that allow us to compare two different hardware
More informationLESSON 13: LANGUAGE TRANSLATION
LESSON 13: LANGUAGE TRANSLATION Objective Interpreters and Compilers. Language Translation Phases. Interpreters and Compilers A COMPILER is a program that translates a complete source program into machine
More information