Dixie: A Retargetable Binary Instrumentation Tool

Manel Fernandez, Alex Ramírez, Silvia Cernuda and Roger Espasa
Computer Architecture Dept.
Universitat Politecnica de Catalunya-Barcelona
{mfernand,aramirez,scernuda,roger}@ac.upc.es
December 16, 1998

Abstract

Dixie is a toolset that enables computer architecture researchers to instrument and monitor all aspects of the execution of a given binary as it would run in its native environment. Dixie is even able to provide the instructions on the "wrong path" of a program. Moreover, Dixie has cross-platform capabilities, allowing a researcher to trace a binary compiled for ISA 'A' on a workstation with a different ISA 'B'. Beyond that, through its emulation capabilities, Dixie enables researchers to explore extensions to current ISAs and even to design, emulate and evaluate a completely new ISA.

1 Project Goal

The Dixie project seeks to provide a tool that allows flexible instrumentation of program binaries for computer architecture research. The major features of the Dixie tool are listed below:

Full architecture coverage: At any point in time during the execution of the traced binary, the tool is able to provide the value of all the architected registers of the machine, the value of all memory locations and the sequence of user instructions executed.

High accuracy: The tool does not distort in any way the virtual addresses generated by the program or the location of the program itself. That is, the tool accurately reproduces all instruction and data addresses generated during program execution as if the program were being executed in its native environment.

Wrong-path execution: The tool is able to trace the sequence of instructions that follows a control-flow mis-speculation. That is, on a branch misprediction, the tool provides the instructions being fetched, their associated register values and the potentially speculative memory contents.

Multi-ISA: A very important characteristic of Dixie is that it is easily retargetable.
Currently it supports a RISC-style ISA (Alpha) and a vector ISA (Convex), and it is being ported to a CISC-style ISA (x86). The tool should also be able to work on user-defined ISAs, that is, ISAs that have no real processors to run on.

On-the-fly trace processing: The tool supports execution-driven simulation, which avoids storing huge trace files to disk.

Cross-platform execution: Since the tool is based on emulation techniques, it can be used to execute and trace binaries of a certain architecture on top of a completely different architecture. Currently, Convex binaries can be executed on an Alpha workstation.

ATOM-like interface: Due to its widespread use, the user interface provided by the ATOM tool [8] has been chosen as the user interface for Dixie.

2 Project Non-Goals

There are a few topics that Dixie does not try to address. These are:

Operating system traces: At the current stage, there are no provisions to incorporate into Dixie the ability to trace operating system code. If one desires such features, tools like SimOS [7] or SimICS [6] are probably a better choice.

Self-modifying code: At the current stage, binaries that use this technique do not work under the Dixie emulation environment.

3 Dixie Overview

Due to the variety of requirements stated in the previous sections, Dixie is based on a combination of two main techniques:

Binary translation: a technique to translate the object code of an application (binary file) from its native architecture into equivalent object code for a new architecture.

Instruction-set emulation: typically, executing one instruction set (native ISA) in terms of another instruction set (host ISA).

The need for binary emulation follows directly from the requirement of providing wrong-path information. None of the current tools based on instrumenting a binary (as opposed to emulating it) can provide such information (for example, ATOM on Alpha, Pixie on Mips, etc.). Emulation also greatly simplifies meeting the "accuracy" requirement stated in section 1: when using emulation, it is easier to guarantee that the instruction and data addresses generated by the binary in its native environment are faithfully reproduced.

The need for binary translation arises from the wide variety of instruction sets that Dixie aims to instrument. Currently, Dixie can handle ISAs as different as the Alpha ISA (simple and very RISC), the Convex ISA (a vector instruction set with many CISC-like features) and the x86 ISA (much more complex than either of the previous two). Instead of writing an emulator for each of these ISAs, the basic idea behind the Dixie project is to have a single instruction set, the Dixie instruction set, into which all binaries are translated. This Dixie ISA is then emulated, providing all the benefits of emulation stated in the previous paragraph.

The complete Dixie toolset is shown in Figure 1. Next we describe each component in some depth:

Dixie compiler: The Dixie compiler accepts as input several binary formats and performs an instruction-by-instruction translation from the source ISA into the Dixie ISA.
This translation is typically a 1-to-N mapping: since the Dixie ISA is a simple, register-to-register ISA, while source ISAs can be very complex, a single instruction from a source ISA is expected to turn into several Dixie instructions. For example, a string instruction in the x86 ISA translates into a loop of Dixie instructions. The Dixie instruction set has been kept as small as possible, placing the burden of the translation on the writer of the machine descriptions. In order to preserve the semantics of the original program, and to allow user transformations on the Dixie intermediate format, each translated instruction also carries a good deal of extra semantic information, describing the original registers involved in the instruction, how the effective address computation is carried out, etc.

Jango: Once the input binary has been translated into a Dixie binary, the user can proceed to instrument the intermediate binary. Using an interface very similar to that of ATOM, the user will supply a specification file and some instrumentation code. Jango will analyze the Dixie binary following the requests in the specification file and will insert special "breakpoints" in the Dixie code to collect the information required by the user. These breakpoints are in fact instructions that read a certain register and dump its contents through the output channel connected to the user simulator. The values generated by the DVM will be collected by stub routines (automatically generated by Jango) that will call the instrumentation code provided by the user. A very important point in the design of Jango is that the user never interacts with the Dixie intermediate ISA. That is, if a user is instrumenting an Alpha binary, the instrumentation file passed to Jango describes the instrumentation of Alpha instructions, not Dixie instructions.
In order to perform the instrumentation, Jango understands both the semantics of the Alpha instructions and the semantics of the translation of each Alpha instruction into Dixie instructions.

Speedy: Once the Dixie binary has been generated and, possibly, instrumented, the user can optionally choose to optimize it. The Speedy tool analyzes the Dixie binary at the basic block level and performs two basic types of optimizations. First, machine-independent optimizations, such as common subexpression elimination, constant folding, etc. Second, Speedy can select some of the basic blocks and translate them to native machine language, that is, to the ISA of the host on which the DVM itself will run. These translated basic blocks are specially tagged, so that when the DVM loads the binary into memory it realizes that a machine translation exists for them. During execution, if the DVM encounters one of these specially tagged basic blocks, it jumps to the native machine translation rather than interpreting the basic block instruction by instruction.

Dixie virtual machine (DVM): Finally, when the input binary has been translated into a Dixie binary, instrumented if necessary and optimized if desired, the user runs it on the Dixie Virtual Machine (DVM). The virtual machine emulates the behavior of each Dixie instruction and performs all the necessary jacketing and translation for mapping native system calls into host system calls. As already mentioned, the DVM can work in two modes: either interpreting each Dixie instruction in turn or, when encountering a specially marked basic block, jumping into a sequence of host machine instructions that performs the operations specified in the basic block. Of course, the second mode of operation is much more efficient than instruction-by-instruction interpretation. The input user program runs to completion, producing whatever results it should produce, and, meanwhile, the DVM dumps through its output channel all the information requested by the user.

[Figure 1: The Dixie toolset. Binaries for the source ISAs (Alpha, Convex, x86, ...) are translated by the Dixie compiler into the Dixie ISA, instrumented by Jango according to the user specification, optionally translated by Speedy to target ISAs (Mips, Alpha), and run on the DVM (Dixie Virtual Machine) on a 64-bit host, which feeds the user simulator.]

4 Dixie ISA Highlights

In this section we describe the main features of the Dixie ISA. For a full description, refer to [4]. As a reminder, the general user of the Dixie toolset should never need to use, or know anything about, the Dixie ISA. Only a developer porting the toolset to a new ISA needs to be concerned with the inner workings of the Dixie ISA.

The main goals of this intermediate ISA are simplicity, ease of interpretation and sufficient expressive power to accommodate a wide variety of source ISAs. In particular, a variety of instructions has been provided to accommodate the following features:

- big- and little-endian memory architectures,
- 32-bit and 64-bit address spaces,
- vector registers, with the associated vector length, vector stride and vector mask special-purpose registers,
- condition code computation, to accommodate ISAs such as the x86.

The only major limiting assumption made by the Dixie ISA is that the DVM will run on a host that provides floating-point instructions compliant with the IEEE 754 standard. Therefore, emulating ISAs with different floating-point formats would be quite difficult in the Dixie framework. Although it is possible, a developer would have to express the basic floating-point operations of the native format (VAX, for example) in terms of Dixie integer instructions. Aside from being a very tedious and error-prone task, the resulting binaries would run very slowly.

The main features of the Dixie ISA are summarized in the following list:

Load-store architecture: Dixie follows a strict load-store architecture where all arithmetic operations are performed on registers and reading and writing from/to memory is always done through load/store instructions.

Address space: Dixie offers a 64-bit address space with byte granularity that can be read either in little-endian or in big-endian mode. There are different load instructions for reading with the two different endiannesses. Also, despite being a 64-bit architecture, Dixie can generate memory addresses that are only 32 bits in length: using the appropriate versions of the load/store instructions, the effective address is truncated to 32 bits. This greatly simplifies emulation of 32-bit machines on 64-bit hosts.
Single register file: The Dixie architecture offers a single, large register file, each register being 64 bits wide. Unlike most other instruction sets, there is no distinction between floating-point and integer registers. The large number of registers is required to accommodate vector instructions: the register file can be viewed as a collection of vector registers of variable length. For example, when emulating a vector machine with eight 128-element vector registers, the user can set aside 1024 registers of the general-purpose register file to emulate those native vector registers. Thus, without strictly providing vector registers, the Dixie ISA can easily accommodate vector instructions.

128-bit instructions: All Dixie instructions are 128 bits long, with a single format, reserving 32 bits for the opcode and the remaining bits to specify the operands. There are two source operands (15 and 64 bits, respectively) and a destination operand (15 bits). The 64-bit source operand can be either a register (in which case only 15 bits are used) or a 64-bit constant (in two's complement).

Vector support: Coupled with the general-purpose registers, the Dixie ISA offers a special addressing mode, known as increment mode, to support vector execution and to facilitate emulation of vector instructions. Consider a simple instruction such as add r1,r300->r400. If the first and last operands are marked as "increment mode" (yielding add i1,r300->i400), we have a vector instruction. This instruction indicates that registers 1 and 300 must be added and the result placed in register 400. Then, the operation must be repeated as many times as indicated in the VL register (the Vector Length register), incrementing each time the registers marked as "increment". That is, the instruction is executed again, but this time registers 2 and 300 are added and the result is placed in register 401, and so on. To support vector memory operations, a vector stride register is also provided.
Finally, masked operation is also provided through a special vector mask register designator.

Data types: Dixie offers the usual variety of data types. For fixed-point computation, signed and unsigned representations are provided in four data sizes: 8, 16, 32 and 64 bits. For floating-point computation, IEEE 754 single and double precision are available.
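The increment-mode semantics described under Vector support can be made concrete with a small interpreter routine. This is a minimal C sketch, not the DVM's implementation: the DixieState and Opnd types, exec_add and vreg_elem are hypothetical names, the register-file size assumes that 15-bit designators address 2^15 registers, and the sketch assumes that an instruction with no increment-mode operand executes once. Vector mask and stride handling are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register-file size: 15-bit register designators can name
 * up to 2^15 registers. */
#define NREGS (1u << 15)

typedef struct {
    uint64_t r[NREGS]; /* single unified file of 64-bit registers */
    uint64_t vl;       /* VL: the vector length register */
} DixieState;

/* A register operand plus its increment-mode mark: i<n> if inc != 0,
 * r<n> otherwise. */
typedef struct {
    uint32_t reg;
    int inc;
} Opnd;

/* Register holding element e of emulated native vector register v when
 * 128-element native vector registers are laid out back to back starting
 * at `base` (eight of them occupy 1024 Dixie registers). */
uint32_t vreg_elem(uint32_t base, uint32_t v, uint32_t e)
{
    assert(e < 128);
    return base + 128u * v + e;
}

/* add a, b -> d. Assumption: with no increment-mode operand the add runs
 * once; otherwise it runs VL times, and every operand marked as increment
 * advances by one register per repetition. Mask and stride are omitted. */
void exec_add(DixieState *s, Opnd a, Opnd b, Opnd d)
{
    uint64_t n = (a.inc || b.inc || d.inc) ? s->vl : 1;
    for (uint64_t k = 0; k < n; k++) {
        uint64_t ra = a.reg + (a.inc ? k : 0);
        uint64_t rb = b.reg + (b.inc ? k : 0);
        uint64_t rd = d.reg + (d.inc ? k : 0);
        assert(ra < NREGS && rb < NREGS && rd < NREGS);
        s->r[rd] = s->r[ra] + s->r[rb];
    }
}
```

With VL = 3, calling exec_add with operands i1, r300 and i400 writes r1+r300, r2+r300 and r3+r300 into r400, r401 and r402, matching the example in the text. vreg_elem shows how eight 128-element native vector registers can occupy 1024 consecutive Dixie registers.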

5 Dixie Compiler

The Dixie compiler accepts as input a particular binary format and performs an instruction-by-instruction translation from the source ISA into the Dixie ISA. Of course, a different Dixie compiler is required for each source ISA. The toolset provides a convenient framework to easily build a Dixie compiler for a particular ISA. This framework includes a couple of tools that allow the user to easily specify (a) the parsing and semantics of each instruction in the ISA being analyzed and (b) the mapping of system calls between the input binary and the host machine on which the DVM will run. The idea is to provide concise and easy ways to specify the translation of the input ISA into the Dixie ISA.

When porting the Dixie compiler to a new ISA, the user is required to write three main pieces of code that precisely describe the new ISA environment:

A binary file loader, that is, appropriate functions that allow reading and writing the different sections of the input binary.

An architecture description, written in a pseudo-C language, that describes how instructions must be parsed (at the bit level), decoded and translated into Dixie instructions. The language in which this description is written is designed to help the user minimize porting time and also reduce the number of bugs introduced in the translation.

An operating system description, which describes the correspondence between the operating system of the input binary and DixOS, the Dixie Operating System. This mapping between different operating systems is also written in a pseudo-C language, to facilitate translation.

For the last two items, we provide two compilers (md2c and os2c) that take the input descriptions and generate the appropriate C code. From this automatically generated code, plus the binary file loader, plus some routines in the Dixie framework, the desired compiler can be readily constructed. This whole process is shown in Figure 2.
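The machine description language itself is not shown in this document, so, purely as an illustration of the two steps an architecture description must express (bit-level parsing and translation into Dixie instructions), the following C sketch handles a single Alpha instruction, ADDQ. The field layout is the standard Alpha integer operate format; the DixieInsn structure, the function names and the mapping of Alpha register n to Dixie register n are assumptions made for this sketch, which simply rejects the zero register R31 instead of translating it.

```c
#include <assert.h>
#include <stdint.h>

/* Alpha integer operate format:
 *   opcode<31:26> Ra<25:21> Rb<20:16>  SBZ<15:13> 0<12> func<11:5> Rc<4:0>
 *   opcode<31:26> Ra<25:21> LIT<20:13>            1<12> func<11:5> Rc<4:0> */
typedef struct {
    uint32_t opcode, ra, rb, lit, is_lit, func, rc;
} AlphaOperate;

/* Hypothetical Dixie-style instruction: dst <- src1 op src2, where src2 is
 * either a register number or a 64-bit constant. */
typedef struct {
    const char *op;
    uint32_t dst, src1;
    uint64_t src2;
    int src2_is_const;
} DixieInsn;

/* Extract bits hi..lo (inclusive) of an instruction word. */
static uint32_t bits(uint32_t w, unsigned hi, unsigned lo)
{
    assert(lo <= hi && hi < 32);
    return (uint32_t)((w >> lo) & ((1ull << (hi - lo + 1)) - 1));
}

AlphaOperate decode_operate(uint32_t w)
{
    AlphaOperate d;
    d.opcode = bits(w, 31, 26);
    d.ra     = bits(w, 25, 21);
    d.is_lit = bits(w, 12, 12);
    d.rb     = d.is_lit ? 0 : bits(w, 20, 16);
    d.lit    = d.is_lit ? bits(w, 20, 13) : 0;
    d.func   = bits(w, 11, 5);
    d.rc     = bits(w, 4, 0);
    return d;
}

/* Translate ADDQ (opcode 0x10, function 0x20) into one Dixie-style add.
 * Returns 1 on success, 0 for anything this sketch does not cover,
 * including any use of the zero register R31. */
int translate_addq(uint32_t w, DixieInsn *out)
{
    AlphaOperate d = decode_operate(w);
    if (d.opcode != 0x10 || d.func != 0x20)
        return 0;
    if (d.ra == 31 || d.rc == 31 || (!d.is_lit && d.rb == 31))
        return 0;
    out->op = "add";
    out->dst = d.rc;
    out->src1 = d.ra;
    out->src2 = d.is_lit ? d.lit : d.rb;
    out->src2_is_const = (int)d.is_lit;
    return 1;
}
```

For example, 0x40220403 encodes addq r1, r2, r3 and becomes a Dixie-style add with two register sources, while 0x4020B403 (addq r1, 5, r3) uses the literal form, so the second source operand becomes a constant, which the 64-bit source operand of the Dixie format can hold directly.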
6 Jango

Jango provides the interface to instrument the program binary. The user writes a piece of C code that describes, using a fixed set of predefined functions, how the binary has to be instrumented. For example, the user can specify that, just before each memory instruction, a certain user-defined function must be called and passed the PC of the instruction and the effective memory address as parameters. Jango processes this request, locates all memory instructions in the binary, inserts "pseudo-breakpoints", and generates stub routines that collect the PC and effective address generated by the pseudo-breakpoints and forward them to the user routine.

When porting the Dixie toolset to a new ISA, the Jango toolset must be made aware of the particular characteristics of the new ISA. However, a large portion of the interface routines of Jango remains stable across different ISAs. Moreover, the general "architecture concepts" coded inside Jango (the concept of a register, of an effective address, of an instruction PC, etc.) are also very stable. Therefore, when porting Jango to a new ISA, only minimal changes to its internal structure are needed to map these basic concepts onto the new ISA.

7 Speedy

The Speedy tool accepts as input a Dixie binary, analyzes the Dixie instructions, optimizes them and, optionally, produces a translation of the Dixie instructions into a native machine language (currently, only Alpha is supported). The resulting native translation is stored inside the Dixie binary itself (the "fat binary" concept); in fact, a Dixie binary could carry multiple translations for different native ISAs.

Speedy is composed of two main phases: a machine-independent optimizer and a code generator. The optimizer works at the Dixie machine language level and is independent of the target host on which the DVM will be running.
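As an illustration of the machine-independent phase, the following C sketch performs constant propagation and folding (constant folding is one of the optimizations named in section 3) plus NOP removal within one basic block. It is a minimal sketch under stated assumptions, not Speedy's code: the three-address Insn encoding, the OP_* opcode names and fold_block are hypothetical, only registers below MAXREG are tracked, and instructions made dead by folding are not removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum Op { OP_NOP, OP_MOVI, OP_ADD, OP_SUB };

/* Hypothetical three-address Dixie-style instruction:
 *   OP_MOVI:          dst <- imm
 *   OP_ADD / OP_SUB:  dst <- r[src1] op (b_const ? imm : r[src2]) */
typedef struct {
    enum Op op;
    unsigned dst, src1, src2;
    int b_const;
    uint64_t imm;   /* 64-bit two's complement constant */
} Insn;

#define MAXREG 64   /* registers tracked by this sketch */

/* Constant propagation/folding and NOP removal within one basic block,
 * rewriting bb in place. Returns the new instruction count. */
size_t fold_block(Insn *bb, size_t n)
{
    int known[MAXREG] = {0};  /* known[r]: r holds a known constant */
    uint64_t val[MAXREG];     /* its value, valid only if known[r] */
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        Insn x = bb[i];
        if (x.op == OP_NOP)
            continue;                     /* drop NOPs */
        assert(x.dst < MAXREG);
        if (x.op == OP_ADD || x.op == OP_SUB) {
            assert(x.src1 < MAXREG && (x.b_const || x.src2 < MAXREG));
            if (known[x.src1] && (x.b_const || known[x.src2])) {
                uint64_t a = val[x.src1];
                uint64_t b = x.b_const ? x.imm : val[x.src2];
                x.imm = (x.op == OP_ADD) ? a + b : a - b; /* wraps mod 2^64 */
                x.op = OP_MOVI;                           /* folded */
            }
        }
        if (x.op == OP_MOVI) {
            known[x.dst] = 1;
            val[x.dst] = x.imm;
        } else {
            known[x.dst] = 0;             /* value unknown until run time */
        }
        bb[out++] = x;
    }
    return out;
}
```

Given r1 <- 40; nop; r2 <- r1 + 2; r3 <- r2 - r1, the sketch produces r1 <- 40; r2 <- 42; r3 <- 2, while an add that reads a register with no known value is left unchanged. Working on one basic block at a time keeps the analysis simple, because control can only enter the block at its first instruction.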
With minimal information from the code generator, the optimizer performs register allocation of the Dixie virtual registers into real hardware registers. The optimizer also removes some redundant instructions inevitably introduced during the translation process and performs some simple common subexpression elimination.

In the second phase, the code generator selects appropriate machine instructions to translate the Dixie instructions into a native machine instruction set. Code generation always works at the basic block level. Speedy takes a basic block, translates it, and stores in the Dixie binary the translated basic block and a "translation pointer". When the DVM loads the binary into memory, it checks all the translation pointers available. For each translation pointer, the DVM associates the native machine language translation with the first instruction of the basic block.

[Figure 2: Dixie Compiler Components. The machine description (.md3) is compiled by md2c into the decoder (decode.c), and the operating system description (.os) is compiled by os2c into the system call table (os-tab.c). These, together with the native file loader (.aout.c) and the Dixie library, are compiled with cc into the Dixie compiler (dixiec), which translates a native binary into a Dixie binary.]

Not all basic blocks can or must be translated. Therefore, at program load time, the DVM will see that some instructions have a pointer to machine language code and others do not. As the program is emulated, the DVM has two choices for executing an instruction: if the instruction does not have an associated machine translation, it is interpreted; if the instruction does have a translation, the DVM jumps to it and resumes after the machine code block is done. As expected, the native translations are much faster than interpretation.

The interesting property of this hybrid translated/interpreted approach is that Speedy can be developed in a very gradual and smooth manner. Not all basic blocks need to be translated, and Speedy can potentially focus only on those that are really important in terms of execution time. This mixed strategy will also enable Speedy to act, in the future, as a Just-in-Time compiler.

8 Dixie Virtual Machine

After the original binary has been translated, instrumented using Jango and optimized using Speedy, it is run on the Dixie Virtual Machine. The virtual machine emulates the Dixie instruction set and maps all the operating system services required by the binary to the host operating system services. The internal operation of the DVM can be divided into three phases: initialization and program load, instruction decode, and emulation.

In the initialization phase, all the relevant information is read from the binary.
For each section in the binary that should be loaded into memory, the DVM computes the exact original address where the section would have been loaded if run on its original native machine. The DVM then mmaps that section into memory and loads it with the initial values in the binary file. The idea is that all memory sections are located at exactly the same addresses as in the original binary, thus eliminating the need to translate (relocate) each emulated memory access. Of course, this also implies that the DVM must be ready to relocate itself if it conflicts with any of the memory segments of the binary to be emulated.

In the second phase, each instruction in the binary is read and predecoded into a format more suitable for interpretation. In essence, the instruction is translated into a structure that contains pointers to its operands, pointers to the destination register(s) and a pointer to a block of code that implements its semantics. This block of code, as already mentioned, can be either a C routine that interprets the instruction or a sequence of native machine instructions. For instrumentation purposes, the structure for each Dixie instruction also contains the PC of the native instruction it relates to.

The third phase is the actual execution of the program. The DVM sets up two global variables, IR and PC, that contain, respectively, a pointer to the structure describing the current instruction and the PC of the current instruction. The execution loop accesses the structure pointed to by IR and jumps to its semantic code (either a C function or a block of native instructions). When the code returns, the DVM increments the PC (which, by the way, might have been modified by the instruction just emulated), computes the corresponding IR and starts over.

9 Design Wins

The initial thrust behind the Dixie project was our research group's need for execution traces of a vector processor. Since then, the project has been expanded to solve other issues that have become more important. In particular, Dixie addresses several needs of our research group.

First, following our initial design goal, Dixie allows research on vector machines. There are very few, if any, tools available for tracing programs on a vector machine. The Cray systems provide an instruction tracing tool that, unfortunately, is not able to provide the exact values of the vector length and vector stride used in each vector instruction. Our group did develop a pixie-like tool to do tracing on a vector machine.
This primitive tool was targeted to a Convex vector machine [3] and later became the ancestor of the Dixie tool described in this paper (see Appendix A).

Second, Dixie is able to do cross-platform instrumentation and analysis. Closely related to the previous item is the question of machine availability to run experiments. Not every research group and organization has access to the same hardware, especially when this hardware is a multi-million dollar vector machine. Therefore, Dixie has been designed to allow researchers to analyze any type of binary on their platform of choice. That is, a researcher needs only limited access to a certain machine, say, a Cray T90, to generate the binaries of the benchmarks he is interested in. Then he transfers the binaries to his workstation of choice and there, rather than on the original T90 machine, he emulates the binaries, generates on-the-fly traces and feeds his favorite simulator. This cross-platform ability is the direct reason for choosing instruction translation and emulation (through the DVM) as the main technology on which Dixie is based.

Third, Dixie provides both register contents and wrong-path information. The latest generation of simulators/emulators for computer architecture research provides these capabilities (see Shade [2] and SimpleScalar [1]). Our previous tracing tool for the Convex machine did provide some register information, but only for the vector length and vector stride registers. Dixie is able to provide the contents of any register at any point during the execution of a program (even if it is a vector register). Also, Dixie is able to analyze and provide information when the processor mispredicts a branch and enters mis-speculative mode.

Fourth, Dixie is highly accurate, that is, it faithfully reproduces the exact sequence of instruction addresses, data addresses and register values generated by the original binary when run in its original native environment.
Fifth, Dixie allows research on new instruction set architectures. For a long time, computer architecture research has stayed away from studies on instruction set design. The reason was that, after the consolidation of the RISC architectures and the absolute market dominance of the x86 line, little value was perceived in trying to push new instruction sets. However, at present we face two new domains that can benefit from new instruction set designs. First, virtual machines are gaining wide acceptance due to the impulse of Java [5]. However, there has been little research on high-performance software instruction sets for virtual machines. Dixie represents an excellent opportunity to explore this new domain. Second, the computer architecture field is slowly recognizing the importance of the embedded microprocessor market. Studies on high-performance designs with low power consumption are gaining interest in the community. The flexibility of Dixie will allow studying embedded instruction sets (which have little support and few tracing tools) as well as small extensions to existing instruction sets.

10 Current Status

We finally describe the current status of the Dixie project. We distinguish three different pieces of the toolset: the basic software tools, the optimizer and the machine descriptions for the different input ISAs.

Base Software Components: Dixie, DVM and Jango

The Dixie compiler and the DVM are complete and run on both Alpha and Mips 64-bit environments. We have completed tests running both 32- and 64-bit binaries and both big- and little-endian binaries on top of the DVM. Figure 3 shows the speed at which Alpha and Convex binaries run on top of the DVM on an AlphaStation 600 5/266 with Digital Unix OSF/1 4.0, revision 5.64 (this workstation is equipped with an Alpha microprocessor running at 266 MHz). The first thing to note is that these results are obtained without any optimization at all; that is, Speedy has not processed these binaries in any way. The results are clearly too slow. However, we expect that, once Speedy is operational, compiling basic blocks to native code will reduce the slowdown to a factor of at most 5 to 10.

We also note that the difference between the Convex and Alpha results is mainly due to the change of endianness. While Alpha is little-endian, Convex is big-endian. Therefore, when running a Convex binary on the DVM on an Alpha chip, each load and store must be converted back and forth between the two byte orders.

Finally, Table 1 summarizes the legal combinations of input ISAs and hosts on which the DVM can be run. As can be seen, the major current restriction is that the DVM host must have at least as many virtual address bits as the input ISA. Currently, the DVM is limited to 64-bit hosts, but it can emulate any ISA (32- or 64-bit, big- or little-endian).

The Jango component has not been implemented yet. Only its initial design has been completed, and some features required by Jango have been incorporated into the Dixie compiler.
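The per-access byte-order conversion described above for Convex binaries running on the Alpha can be sketched as follows. This is a minimal C sketch, not DVM code: bswap64, guest_load_u64, guest_store_u64 and needs_swap are hypothetical names, only 64-bit accesses are shown, and the host byte order is detected at run time rather than fixed at build time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 64-bit value (portable, no intrinsics). */
uint64_t bswap64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xFFu);
        x >>= 8;
    }
    return r;
}

/* Nonzero if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint16_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* A swap is needed exactly when guest and host byte orders differ. */
static int needs_swap(int guest_big_endian)
{
    int host_little = host_is_little_endian();
    return guest_big_endian ? host_little : !host_little;
}

/* Load a 64-bit guest value stored at p in the guest's byte order. */
uint64_t guest_load_u64(const void *p, int guest_big_endian)
{
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);          /* raw read in host order */
    return needs_swap(guest_big_endian) ? bswap64(v) : v;
}

/* Store v at p in the guest's byte order (the inverse conversion). */
void guest_store_u64(void *p, uint64_t v, int guest_big_endian)
{
    assert(p != NULL);
    if (needs_swap(guest_big_endian))
        v = bswap64(v);
    memcpy(p, &v, sizeof v);
}
```

On a little-endian host such as the Alpha, every load and store of a big-endian guest goes through bswap64, which is the extra per-access work the text identifies as the main cause of the gap between the Convex and Alpha results; a little-endian guest on the same host takes the plain memcpy path.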
Optimizer: Speedy

Implementation of the Speedy optimizer has started but is still far from completion. Currently, Speedy performs register allocation, eliminates NOP instructions and some PC-relative computations, and generates code for a subset of all possible instructions. In particular, Speedy can translate basic blocks that contain load/store instructions and add/sub instructions. If a basic block contains any other instruction, Speedy ignores the basic block and does not translate it.

ISA Descriptions

Three different ISA descriptions are currently under way: Alpha, Convex and x86. Alpha and Convex are complete and in the testing stage. As shown in Figure 3, a couple of SPEC95 programs already run correctly to completion. The x86 description is in an early stage of development. It can already run programs up to the main() routine, and also runs the classic "Hello World!" program.

[Figure 3: Slowdown of Alpha and Convex binaries when run on top of the DVM on an AlphaStation/266 MHz, for compress95, go95, hanoi, life and qsort. Slowdown is computed with respect to the execution time of the native Alpha binary when run on the Alpha workstation. For example, the compress program compiled for a Convex machine runs 80 times slower on the DVM than the compress program run natively on the Alpha workstation.]

References

[1] D. Burger, T. Austin, and S. Bennett. Evaluating Future Microprocessors: the SimpleScalar Tool Set. Technical Report CS-TR, Computer Science Department, University of Wisconsin-Madison.
[2] R. F. Cmelik and D. Keppel. Shade: A Fast Instruction-set Simulator for Execution Profiling. In Proceedings of the '94 ACM SIGMETRICS Conference, pages 128-137, May 1994.
[3] R. Espasa and X. Martorell. Dixie: a trace generation system for the C3480. Technical Report CEPBA-RR-94-08, Universitat Politecnica de Catalunya.
[4] M. Fernandez and R. Espasa. Dixie Architecture Reference Manual: Version 1.0. UPC-CEPBA, first edition, September.
[5] T. Lindholm and F. Yellin. The Java Virtual Machine Specification. Addison-Wesley, Massachusetts, September. The Java Series.
[6] P. S. Magnusson and B. Werner. Efficient Memory Simulation in SimICS. In 28th Annual Simulation Symposium, April.
[7] M. Rosenblum, S. Herrod, E. Witchell, and A. Gupta. Complete Computer System Simulation: the SimOS Approach. IEEE Parallel and Distributed Technology.
[8] A. Srivastava and A. Eustace. ATOM: a system for building customized program analysis tools. In Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 196-205, Orlando, Florida, June 20-24, 1994. SIGPLAN Notices, 29(6), June 1994.

A History and Credits

The first major version of Dixie, referred to generically as Dixie-II, was a pixie-like tool that allowed collecting traces on a Convex C34 machine and later processing them on a Unix workstation. The traces obviously included memory and instruction addresses, but also every value stored in the vector length register and the vector stride register. The ability to know the exact contents of the vector length and vector stride registers at any point in time greatly increased the accuracy of our original tool over tools available on other vector machines (such as Cray, for example). Dixie-II was written by Roger Espasa, starting off from a piece of code written by Xavier Martorell that parsed Convex instructions.

Dixie-II was later ported to the newest Convex machine, the C4 series, yielding what is known as Dixie-III. This port was far from trivial, since the C4 series had an almost completely new instruction set, with more scalar registers, more vector registers, many new opcodes, and a completely new instruction format.
Indeed, the C4 machines can be viewed as having two instruction sets: the format7 set, which includes all instructions up to the C3, and the format8 set, which includes one instruction per format7 instruction plus many new opcodes. The Dixie-III port was done by Francisca Quintana.

In parallel with the development of Dixie-III, a new project was started to look at portable binary instrumentation tools. The first attempt was to write a mini-compiler that would ease the task of writing binary instrumentation tools. This tool was developed as a proof of concept that the differences between ISAs could be described relatively easily using a special-purpose language. The tool was targeted to the Cray ISA and was implemented by Francisco-Javier Martin as his final-year undergraduate project under the direction of Roger Espasa. This tool, although never fully operational on the Cray, was known as Dixie-IV.

With the lessons learned in the development of Dixie-IV, the Dixie-V project was started. Dixie-V is the tool described in this paper. Although externally we simply use the name "Dixie", internally we use the "Dixie-V" nomenclature. The major departure from the previous tools was that Dixie-V had to use binary emulation. This was motivated by the fact that, over time, all the major vector machines we had access to had been either replaced or simply unplugged, so we were running out of machines on which to compile and run our benchmarks. A binary emulator would solve this problem by allowing the original binaries to run on any Unix workstation.

Once binary emulation was decided, it was clear that binary translation would be a great plus. Instead of writing a virtual machine for each vector ISA we were interested in, we would write a few binary translators (which Dixie-IV proved we knew how to do) that would translate any ISA into our Dixie ISA. Then, with a single virtual machine, we could emulate multiple instruction sets. We started by focusing on two ISAs: Convex and Alpha. Convex was our real motivation for writing the tool, while Alpha, with the availability of ATOM, was the ideal candidate for debugging and training our set of tools. Alex Ramirez wrote the ISA compiler (the Dixie Compiler described in this paper) and a significant portion of the Alpha machine description. The first DVM implementation was also written by Alex. Six months later, Manel Fernandez took over the job and added all the OS-to-OS mapping capabilities to Dixie. Manel also greatly improved the Dixie compiler and completed the Alpha and Convex machine descriptions. Recently, a port to the x86 architecture has been started by Silvia Cernuda as her final-year undergraduate project.

Table 1: Combinations of ISAs currently supported by Dixie.

  Input ISA 32-bit (big or little endian), DVM running on a 32-bit host:
    This combination is not currently supported. Yet, it poses no significant challenges and could be ready as soon as the DVM is ported to a 32-bit machine.

  Input ISA 64-bit (big or little endian), DVM running on a 32-bit host:
    This combination implies a major rewrite of the DVM. Many fields in the internal structures of the DVM are declared as 64-bit variables and are expected to be 64 bits wide. Moreover, an application running on a 64-bit system might have OS dependencies that are simply impossible to hide. For example, if the application deals with files whose offsets do not fit in 32 bits, the underlying 32-bit OS might not be able to process certain system calls.
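The file-offset example in Table 1 can be made concrete with a small sketch of the check a DVM would need when forwarding a 64-bit guest's system call to a 32-bit host. This is a hypothetical illustration, not Dixie code: the helper `narrow_guest_offset`, the assumption that the host offset type is a signed 32-bit integer, and the use of `EOVERFLOW` (the POSIX error for a value too large for its data type) are all our own assumptions.

```python
import errno
from typing import Tuple

# Assumption: the 32-bit host represents file offsets as a signed 32-bit
# integer (e.g. a 32-bit off_t without large-file support).
HOST_OFF_MIN = -(2 ** 31)
HOST_OFF_MAX = 2 ** 31 - 1


def narrow_guest_offset(guest_off: int) -> Tuple[int, int]:
    """Hypothetical DVM helper: convert a 64-bit guest file offset into a
    32-bit host offset before forwarding a call such as lseek.

    Returns (0, host_off) when the offset fits on the host, or
    (errno.EOVERFLOW, 0) when the host cannot represent it, in which case
    the emulated system call has to fail instead of silently truncating.
    """
    if HOST_OFF_MIN <= guest_off <= HOST_OFF_MAX:
        return 0, guest_off
    return errno.EOVERFLOW, 0


# Usage: a 4 KiB offset is forwarded unchanged; an 8 GiB offset is rejected.
print(narrow_guest_offset(4096))                             # (0, 4096)
print(narrow_guest_offset(2 ** 33) == (errno.EOVERFLOW, 0))  # True
```

Failing with an explicit error rather than truncating keeps the emulated program from silently seeking to the wrong position; it is also why such a dependency cannot be hidden from the application, only reported to it.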


More information

COMP2121: Microprocessors and Interfacing. Instruction Set Architecture (ISA)

COMP2121: Microprocessors and Interfacing. Instruction Set Architecture (ISA) COMP2121: Microprocessors and Interfacing Instruction Set Architecture (ISA) http://www.cse.unsw.edu.au/~cs2121 Lecturer: Hui Wu Session 2, 2017 1 Contents Memory models Registers Data types Instructions

More information

Chapter 1 INTRODUCTION SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Chapter 1 INTRODUCTION SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC. hapter 1 INTRODUTION SYS-ED/ OMPUTER EDUATION TEHNIQUES, IN. Objectives You will learn: Java features. Java and its associated components. Features of a Java application and applet. Java data types. Java

More information

CPU Pipelining Issues

CPU Pipelining Issues CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput

More information

Hardware Speculation Support

Hardware Speculation Support Hardware Speculation Support Conditional instructions Most common form is conditional move BNEZ R1, L ;if MOV R2, R3 ;then CMOVZ R2,R3, R1 L: ;else Other variants conditional loads and stores nullification

More information

Instructions: Language of the Computer

Instructions: Language of the Computer CS359: Computer Architecture Instructions: Language of the Computer Yanyan Shen Department of Computer Science and Engineering 1 The Language a Computer Understands Word a computer understands: instruction

More information

Instruction Set Principles. (Appendix B)

Instruction Set Principles. (Appendix B) Instruction Set Principles (Appendix B) Outline Introduction Classification of Instruction Set Architectures Addressing Modes Instruction Set Operations Type & Size of Operands Instruction Set Encoding

More information

Parallelism of Java Bytecode Programs and a Java ILP Processor Architecture

Parallelism of Java Bytecode Programs and a Java ILP Processor Architecture Australian Computer Science Communications, Vol.21, No.4, 1999, Springer-Verlag Singapore Parallelism of Java Bytecode Programs and a Java ILP Processor Architecture Kenji Watanabe and Yamin Li Graduate

More information

Real-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo

Real-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo Real-Time Scalability of Nested Spin Locks Hiroaki Takada and Ken Sakamura Department of Information Science, Faculty of Science, University of Tokyo 7-3-1, Hongo, Bunkyo-ku, Tokyo 113, Japan Abstract

More information

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1 CSE P 501 Compilers Java Implementation JVMs, JITs &c Hal Perkins Winter 2008 3/11/2008 2002-08 Hal Perkins & UW CSE V-1 Agenda Java virtual machine architecture.class files Class loading Execution engines

More information

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Objective PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Explain what is meant by compiler. Explain how the compiler works. Describe various analysis of the source program. Describe the

More information

Multi-Version Caches for Multiscalar Processors. Manoj Franklin. Clemson University. 221-C Riggs Hall, Clemson, SC , USA

Multi-Version Caches for Multiscalar Processors. Manoj Franklin. Clemson University. 221-C Riggs Hall, Clemson, SC , USA Multi-Version Caches for Multiscalar Processors Manoj Franklin Department of Electrical and Computer Engineering Clemson University 22-C Riggs Hall, Clemson, SC 29634-095, USA Email: mfrankl@blessing.eng.clemson.edu

More information

Chapter 1. Preliminaries

Chapter 1. Preliminaries Chapter 1 Preliminaries Chapter 1 Topics Reasons for Studying Concepts of Programming Languages Programming Domains Language Evaluation Criteria Influences on Language Design Language Categories Language

More information

COSC 6385 Computer Architecture. Instruction Set Architectures

COSC 6385 Computer Architecture. Instruction Set Architectures COSC 6385 Computer Architecture Instruction Set Architectures Spring 2012 Instruction Set Architecture (ISA) Definition on Wikipedia: Part of the Computer Architecture related to programming Defines set

More information

Outline. Computer Science 331. Information Hiding. What This Lecture is About. Data Structures, Abstract Data Types, and Their Implementations

Outline. Computer Science 331. Information Hiding. What This Lecture is About. Data Structures, Abstract Data Types, and Their Implementations Outline Computer Science 331 Data Structures, Abstract Data Types, and Their Implementations Mike Jacobson 1 Overview 2 ADTs as Interfaces Department of Computer Science University of Calgary Lecture #8

More information