Dixie: A Retargetable Binary Instrumentation Tool

Manel Fernandez, Alex Ramirez, Silvia Cernuda and Roger Espasa
Computer Architecture Dept., Universitat Politecnica de Catalunya-Barcelona
{mfernand,aramirez,scernuda,roger}@ac.upc.es
December 16, 1998

Abstract

Dixie is a toolset that enables computer architecture researchers to instrument and monitor all aspects of a given binary when run in its native environment. Dixie is even able to provide the instructions on the "wrong path" of a program. Moreover, Dixie has cross-platform capabilities, allowing a researcher to trace a binary specified in ISA `A' on a workstation with a different ISA `B'. In addition, through its emulation capabilities, Dixie enables researchers to explore extensions to current ISAs and even to design, emulate and evaluate a completely new ISA.

1 Project Goal

The Dixie project seeks a tool that allows flexible instrumentation of program binaries to perform computer architecture research. The major features of the Dixie tool are listed below:

Full architecture coverage: The tool is able to provide, at any point during the execution of the traced binary, the value of all the architected registers of the machine, the value of all memory locations and the sequence of user instructions executed.

High accuracy: The tool does not distort in any way the virtual addresses generated by the program or the location of the program itself. That is, the tool accurately reproduces all instruction and data addresses generated during program execution, as if the program were being executed in its native environment.

Wrong-path execution: The tool is able to trace the sequence of instructions that follow a control flow mis-speculation. That is, on a wrong branch prediction, the tool provides the instructions being fetched, their associated register values and the potentially speculative memory contents.

Multi-ISA: A very important characteristic of Dixie is that it is easily retargetable.
Currently it supports a RISC-style ISA (Alpha) and a vector ISA (Convex), and is being ported to a CISC-style ISA (x86). The tool should also be able to work on user-defined ISAs, that is, ISAs that do not have real processors to run on.

On-the-fly trace processing: The tool can feed execution-driven simulators directly, which avoids storing huge trace files on disk.

Cross-platform execution: Since the tool is based on emulation techniques, it can be used to execute and trace binaries of a certain architecture on top of a completely different architecture. Currently, Convex binaries can be executed on an Alpha workstation.

ATOM-like interface: Due to its widespread use, the user interface provided by the ATOM tool [8] has been chosen as the user interface for Dixie.

2 Project Non-Goals

There are a few topics that Dixie does not try to address. These are:

Operating system traces: At the current stage, there are no provisions for tracing operating system code with Dixie. If one needs such features, tools like SimOS [7] or SimICS [6] are probably a better choice.

Self-modifying code: At the current stage, binaries that use this technique do not work under the Dixie emulation environment.

3 Dixie Overview

Due to the variety of requirements stated in the previous sections, Dixie is based on a combination of two main techniques:

Binary translation: a technique to translate the object code of an application (binary file) from its native architecture into equivalent object code for a new architecture.

Instruction-set emulation: typically, executing one instruction set (the native ISA) in terms of another instruction set (the host ISA).

The need for emulation follows from the requirement of providing wrong-path information. None of the current tools based on instrumenting a binary (as opposed to emulating it) can provide such information (for example, ATOM on Alpha, Pixie on Mips, etc.). Emulation also greatly simplifies the "accuracy" requirement stated in section 1. When using emulation, it is easier to guarantee that the instruction and data addresses generated by the binary in its native environment are faithfully reproduced.

The need for binary translation arises from the wide variety of instruction sets that Dixie aims to instrument. Currently, Dixie can handle ISAs as different as the Alpha ISA (very RISC and simple), the Convex ISA (a vector instruction set with many CISC-like features) and the x86 ISA (much more complex than either of the previous two). Instead of writing an emulator for each of these ISAs, the basic idea behind the Dixie project is to have a single instruction set, the Dixie instruction set, to which all binaries are translated. This Dixie ISA is then emulated, providing all the benefits of emulation stated in the previous paragraph.

The complete Dixie toolset can be seen in figure 1. Next we describe each component in some depth:

Dixie compiler: The Dixie compiler accepts several binary formats as input and performs an instruction-by-instruction translation from the source ISA into the Dixie ISA.
This translation is typically a 1-to-N mapping, since the Dixie ISA is a simple, register-to-register ISA, while the source ISAs can be very complex. A single instruction from a source ISA can therefore turn into several Dixie instructions. For example, a string instruction in the x86 ISA translates into a loop of Dixie instructions. The Dixie instruction set has been kept as small as possible, placing the burden of the translation on the writer of the machine descriptions. In order to preserve the semantics of the original program, and to allow user transformations on the Dixie intermediate format, each translated instruction also carries a good deal of extra semantic information. This information describes the original registers involved in the instruction, how the effective address computation is carried out, etc.

Jango: Once the input binary has been translated into a Dixie binary, the user can proceed to instrument the intermediate binary. Using an interface very similar to that of ATOM, the user supplies a specification file and some instrumentation code. Jango analyzes the Dixie binary following the requests in the specification file and inserts special "breakpoints" in the Dixie code to collect the information required by the user. These breakpoints are in fact instructions that read a certain register and dump its contents through the output channel connected to the user simulator. The values generated by the DVM are collected by stub routines (automatically generated by Jango) that call the instrumentation code provided by the user. A very important point in the design of Jango is that the user never interacts with the Dixie intermediate ISA. That is, if a user is instrumenting an Alpha binary, the instrumentation file passed to Jango describes the instrumentation of Alpha instructions, not Dixie instructions.
In order to perform the instrumentation, Jango understands both the semantics of the Alpha instructions and the semantics of the translation of each Alpha instruction into Dixie instructions.

Speedy: Once the Dixie binary has been generated and, possibly, instrumented, the user can optionally choose to optimize it. The Speedy tool analyzes the Dixie binary at the basic block level and performs two basic types of optimizations. First, it applies machine-independent optimizations, such as common subexpression elimination, constant folding, etc. Second, Speedy can select some of the basic blocks and translate them to native machine language, that is, to the ISA on which the DVM itself will run. These translated basic blocks are specially tagged, so that the DVM, when it loads the binary into memory, realizes that a machine translation exists for them. During execution, if the DVM encounters one of these specially tagged basic blocks, it jumps to the native machine translation rather than interpreting the basic block instruction by instruction.

Dixie virtual machine (DVM): Finally, when the input binary has been translated into a Dixie binary, instrumented if necessary and optimized if desired, the user runs it on the Dixie Virtual Machine (DVM). The virtual machine emulates the behavior of each Dixie instruction and performs all the necessary jacketing and translation for mapping native system calls into host system calls. As already mentioned, the DVM can work in two modes: either interpreting each Dixie instruction in turn or, when encountering a specially marked basic block, jumping into a sequence of host machine instructions that perform the operations specified in the basic block. Of course, the second mode of operation is much more efficient than instruction-by-instruction interpretation. The input user program runs to completion, producing whatever results it should produce, while the DVM dumps through its output channel all the information requested by the user.

Figure 1: The Dixie toolset. (Diagram components: source ISAs Alpha, Convex, x86, ...; DIXIE; Dixie ISA; JANGO with the user specification; SPEEDY with target ISAs Mips and Alpha; DVM (Dixie Virtual Machine) on a 64-bit host; user simulator.)

4 Dixie ISA Highlights

In this section we describe the main features of the Dixie ISA. For a full description, refer to [4]. As a reminder, the general user of the Dixie toolset should never need to use or know anything about the Dixie ISA. Only a developer
porting the toolset to a new ISA needs to be concerned with the inner workings of the Dixie ISA.

The main goals of this intermediate ISA are simplicity, ease of interpretation and sufficient expressive power to accommodate a wide variety of source ISAs. In particular, a variety of instructions has been provided to accommodate the following features: big- and little-endian memory architectures; 32-bit and 64-bit address spaces; vector registers, with the associated vector length, vector stride and vector mask special-purpose registers; and condition code computation, to accommodate ISAs such as the x86.

The only major limiting assumption made by the Dixie ISA is that the DVM will run on a host that provides floating point instructions compliant with the IEEE 754 standard. Therefore, emulating ISAs with different floating point formats would be quite difficult in the Dixie framework. Although possible, it would require a developer to express the basic floating point operations of the native format (VAX, for example) in terms of Dixie integer instructions. Aside from being a very tedious and error-prone task, the resulting binaries would be very slow.

The main features of the Dixie ISA are summarized in the following list:

Load-store architecture: Dixie follows a strict load-store architecture, where all arithmetic operations are performed on registers and all reads and writes from/to memory are done through load/store instructions.

Address space: Dixie offers a 64-bit address space with byte granularity that can be read in either little-endian or big-endian mode. There are different load instructions for reading with the two byte orders. Also, despite being a 64-bit architecture, Dixie can generate memory addresses that are only 32 bits in length: using the appropriate versions of the load/store instructions, the effective address is truncated to 32 bits. This greatly simplifies the emulation of 32-bit machines on 64-bit hosts.
Single register file: The Dixie architecture offers a single, large register file, with each register being 64 bits wide. There is no distinction between floating point and integer registers, as is usually the case in most other instruction sets. The large number of registers is required to accommodate vector instructions. The register file can be viewed as a collection of vector registers of variable length. For example, when emulating a vector machine with eight 128-element vector registers, the user can set aside 1024 registers of the general-purpose register file to emulate those native vector registers. Thus, without strictly providing vector registers, the Dixie ISA can easily accommodate vector instructions.

128-bit instructions: All Dixie instructions are 128 bits long and share a single format, which reserves 32 bits for the opcode and the remaining bits for the operands. There are two source operands (15 and 64 bits respectively) and a destination operand (15 bits). The 64-bit source operand can be either a register (in which case only 15 bits are used) or a 64-bit constant (in two's complement).

Vector support: Coupled with the general-purpose registers, the Dixie ISA offers a special addressing mode, known as increment mode, to support vector execution and to facilitate the emulation of vector instructions. Consider a simple instruction such as add r1,r300->r400. If the first and last operands are marked as "increment mode" (yielding add i1,r300->i400), we have a vector instruction. This instruction indicates that registers 1 and 300 must be added and the result placed in register 400. Then the operation must be repeated as many times as indicated in the VL register (the vector length register), incrementing each time the registers marked as "increment". That is, the instruction is executed again, but this time registers 2 and 300 are added and the result is placed in register 401. To support vector memory operations, a vector stride register is also provided.
Finally, masked operation is also provided using a special vector mask register designator.

Data types: Dixie offers the usual variety of data types. For fixed point computation, signed and unsigned representations are provided in four data sizes: 8, 16, 32 and 64 bits. For floating point computation, IEEE 754 single and double precision are available.
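
The increment mode described under "Vector support" above can be illustrated with a small C sketch. This is not code from the Dixie toolset: the names operand_t, regs and dixie_add are hypothetical, the register file size of 2^15 entries is an assumption derived from the 15-bit register designators, and the rule that an instruction without increment operands executes exactly once is also an assumption. Vector stride and vector mask handling are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 15-bit register designators allow at most 2^15 registers. */
#define NUM_REGS 32768u

/* Hypothetical operand descriptor: a register number plus an
 * "increment mode" flag (written i<n> instead of r<n> in the text). */
typedef struct {
    uint32_t reg;
    int increment;
} operand_t;

/* The single register file: no separate integer, floating point or
 * vector registers. Unsigned, so that addition wraps on overflow. */
static uint64_t regs[NUM_REGS];

/* Emulate "add src1,src2->dst". If any operand is in increment mode,
 * the operation is repeated vl times (vl models the VL register) and
 * the marked register numbers advance by one on each repetition.
 * Otherwise (assumption) the add is executed exactly once. */
void dixie_add(operand_t src1, operand_t src2, operand_t dst, uint64_t vl)
{
    int is_vector = src1.increment || src2.increment || dst.increment;
    uint64_t count = is_vector ? vl : 1;

    for (uint64_t i = 0; i < count; i++) {
        uint64_t a = src1.reg + (src1.increment ? i : 0);
        uint64_t b = src2.reg + (src2.increment ? i : 0);
        uint64_t d = dst.reg + (dst.increment ? i : 0);
        assert(a < NUM_REGS && b < NUM_REGS && d < NUM_REGS);
        regs[d] = regs[a] + regs[b];
    }
}
```

With VL set to 3, the call dixie_add((operand_t){1, 1}, (operand_t){300, 0}, (operand_t){400, 1}, 3) corresponds to add i1,r300->i400: it adds register 300 to registers 1, 2 and 3 in turn and writes the results to registers 400, 401 and 402.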

5 Dixie Compiler

The Dixie compiler accepts as input a particular binary format and performs an instruction-by-instruction translation from the source ISA into the Dixie ISA. Of course, each source ISA requires a different Dixie compiler. The toolset provides a convenient framework to easily build a Dixie compiler for a particular ISA. This framework includes a couple of different tools that allow the user to easily specify (a) the parsing and semantics of each instruction in the ISA being analyzed and (b) the mapping of system calls between the input binary and the host machine on which the DVM will run. The idea is to provide concise and easy ways to specify the translation of the input ISA into the Dixie ISA.

When porting the Dixie compiler to a new ISA, the user is required to write three main pieces of code that precisely describe the new ISA environment:

A binary file loader, that is, appropriate functions that allow reading and writing the different sections of the input binary.

An architecture description, written in a pseudo-C language, that describes how instructions must be parsed (at the bit level), decoded and translated into Dixie instructions. The language in which this description is written is designed to help the user minimize porting time and to reduce the number of bugs introduced in the translation.

An operating system description, which describes the correspondence between the operating system of the input binary and DixOS, the Dixie Operating System. This mapping between different operating systems is also written in a pseudo-C language, to facilitate translation.

For the last two items, we provide two compilers that take the input descriptions and generate the appropriate C code. From this automatically generated code, plus the binary file loader and some routines in the Dixie framework, the desired compiler can be readily constructed. This whole process is shown in Figure 2.
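
The 1-to-N nature of the instruction-by-instruction translation can be sketched in C as follows. Everything in this sketch is hypothetical: the two-instruction source ISA, the opcode names DX_ADD and DX_LOAD64, the scratch register numbers and the translate function are illustrative assumptions, not the Dixie ISA or the output of the machine description compiler. The sketch only mirrors two points made in the text: a complex source instruction expands into several simple register-to-register instructions, and every translated instruction keeps the PC of the native instruction it came from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source ISA: one simple and one complex instruction. */
typedef enum {
    SRC_ADD_REG,  /* dst = dst + src              */
    SRC_ADD_MEM   /* dst = dst + mem[base + disp] */
} src_opcode_t;

typedef struct {
    src_opcode_t op;
    uint64_t pc;             /* address of the native instruction */
    uint16_t dst, src, base; /* native register numbers */
    int64_t disp;            /* displacement for SRC_ADD_MEM */
} src_insn_t;

/* Hypothetical Dixie-like instruction: a register source, a second
 * source that is either a register or a 64-bit constant, and a
 * destination register. */
typedef enum { DX_ADD, DX_LOAD64 } dx_opcode_t;

typedef struct {
    dx_opcode_t op;
    uint16_t src1;
    int64_t src2;
    int src2_is_const;
    uint16_t dst;
    uint64_t native_pc;      /* extra semantic information */
} dx_insn_t;

/* Scratch registers reserved for the translator (assumption). */
enum { TMP_ADDR = 1000, TMP_DATA = 1001 };

static dx_insn_t dx(dx_opcode_t op, uint16_t src1, int64_t src2,
                    int src2_is_const, uint16_t dst, uint64_t pc)
{
    dx_insn_t insn = { op, src1, src2, src2_is_const, dst, pc };
    return insn;
}

/* Translate one source instruction into out[]; returns how many
 * Dixie-like instructions were emitted. */
size_t translate(const src_insn_t *in, dx_insn_t *out, size_t cap)
{
    size_t n = 0;

    switch (in->op) {
    case SRC_ADD_REG:        /* 1-to-1 */
        assert(cap >= 1);
        out[n++] = dx(DX_ADD, in->dst, in->src, 0, in->dst, in->pc);
        break;
    case SRC_ADD_MEM:        /* 1-to-3: address, load, add */
        assert(cap >= 3);
        out[n++] = dx(DX_ADD, in->base, in->disp, 1, TMP_ADDR, in->pc);
        out[n++] = dx(DX_LOAD64, TMP_ADDR, 0, 0, TMP_DATA, in->pc);
        out[n++] = dx(DX_ADD, in->dst, TMP_DATA, 0, in->dst, in->pc);
        break;
    }
    return n;
}
```

In the real toolset this per-instruction logic is not written by hand in C; according to the text it is generated from the pseudo-C architecture description.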
6 Jango

Jango provides the interface to instrument the program binary. The user writes a piece of C code that describes, using a fixed set of predefined functions, how the binary has to be instrumented. For example, the user can specify that, just before each memory instruction, a certain user-defined function must be called and passed as parameters the PC of the instruction and the effective memory address. Jango processes this request, locates all memory instructions in the binary, inserts "pseudo-breakpoints", and generates stub routines that collect the PC and effective address generated by the pseudo-breakpoints and forward them to the user routine.

When porting the Dixie toolset to a new ISA, the Jango toolset must be made aware of the particular characteristics of the new ISA. However, a big portion of the interface routines of Jango remains stable across different ISAs. Moreover, the general "architecture concepts" coded inside Jango (the concept of a register, of an effective address, of an instruction PC, etc.) are also very stable. Therefore, when porting Jango to a new ISA, only minimal changes have to be made to its internal structure to adapt these basic concepts to the new ISA.

7 Speedy

The Speedy tool accepts as input a Dixie binary, analyzes the Dixie instructions, optimizes them and, optionally, produces a translation of the Dixie instructions to a native machine language (currently, only Alpha is supported). The resulting native translation is stored inside the Dixie binary itself (the "fat binary" concept) and, in fact, a Dixie binary could carry multiple translations for different native ISAs.

Speedy is composed of two main phases: a machine-independent optimizer and a code generator. The optimizer works at the Dixie machine language level and is independent of the target host on which the DVM will be running.
With minimal information from the code generator, the optimizer performs register allocation of the Dixie virtual registers into real hardware registers. The optimizer also removes some redundant instructions inevitably introduced during the translation process and performs some simple common subexpression elimination. In the second phase, the code generator selects appropriate machine instructions to translate the Dixie instructions into some native machine instruction set. Code generation always works at the basic block level. Speedy takes a basic block, translates it, and stores in the Dixie binary the translated basic block and a "translation pointer". When the DVM loads the binary into memory, it checks all the translation pointers available. For each translation pointer, the DVM associates the native machine language translation with the first instruction of the basic block. Not all basic blocks can or must be translated. Therefore, at program load time, the DVM will see that some instructions have a pointer to machine language code and some others do not. As the program is emulated, the DVM has two choices for executing an instruction: if the instruction does not have an associated machine translation, it is interpreted; if the instruction does have a translation, the DVM jumps to it and resumes after the machine code block is done. As expected, the native translations are much faster than interpretation.

The interesting property of this hybrid translated/interpreted approach is that Speedy can be developed in a very gradual and smooth manner. Not all basic blocks need be translated, and Speedy can potentially focus only on those that are really important in terms of execution time. This mixed strategy will also allow Speedy, in the future, to act like a Just-in-Time compiler.

Figure 2: Dixie Compiler Components. (Diagram components: the machine description (.md3) is processed by the md2c compiler into the decoder (decode.c); the operating system description (.os) is processed by the os2c compiler into the system call tables (os-tab.c); these, the native file loader (.aout.c) and the Dixie library are compiled with cc into the Dixie compiler (dixiec), which turns a native binary into a Dixie binary.)

8 Dixie Virtual Machine

After the original binary has been translated, instrumented using Jango and optimized using Speedy, it is run on the Dixie Virtual Machine. The virtual machine emulates the Dixie instruction set and maps all the operating system services required by the binary to the host operating system services. The internal operation of the DVM can be divided into three phases: initialization, program load and decode, and emulation. In the initialization phase, all the relevant information is read from the binary.
For each section in the binary that should be loaded in memory, the DVM computes the exact original address where the section would have been loaded if run on its original native machine. Then the DVM mmap's that section in memory and loads it with the initial values in the binary file. The idea is that all memory sections are located at exactly the same address as in the original binary, thus eliminating the need to translate (relocate) each emulated memory access. Of course, this also implies that the DVM must be ready to relocate itself if it conflicts with any of the
memory segments of the binary to be emulated.

In the second phase, each instruction in the binary is read and predecoded into a format more suitable for interpretation. In essence, the instruction is translated into a structure that contains pointers to its operands, pointers to the destination register(s) and a pointer to a block of code that implements its semantics. This block of code, as already mentioned, can be either a C routine that interprets the instruction or a sequence of native machine instructions. For instrumentation purposes, the structure for each Dixie instruction also contains the PC of the native instruction that it relates to.

The third phase is the actual execution of the program. The DVM sets up two global variables, IR and PC, that contain, respectively, a pointer to the structure describing the current instruction and the PC of the current instruction. The execution loop accesses the structure pointed to by IR and jumps to its semantic code (either a C function or a block of native instructions). When the code returns, the DVM increments the PC (which, by the way, might have been modified by the instruction just emulated), computes the corresponding IR and starts over.

9 Design Wins

The initial thrust behind the Dixie project was the need of our research group to have execution traces of a vector processor. Since then, the project has been expanded to solve other issues that have become more important. In particular, Dixie addresses several needs of our research group.

First, following our initial design goal, Dixie allows research on vector machines. There are very few, if any, available tools for tracing programs on a vector machine. The Cray systems provide an instruction tracing tool that, unfortunately, is not able to provide the exact values of the vector length and vector stride used in each vector instruction. Our group did actually develop a pixie-like tool to do tracing on a vector machine.
This primitive tool was targeted to a Convex vector machine [3] and later became the ancestor of the Dixie tool described in this paper (see appendix A).

Second, Dixie is able to do cross-platform instrumentation and analysis. Closely related to the previous item is the question of machine availability to run experiments. Not every research group and organization has access to the same hardware, especially when this hardware is a multi-million dollar vector machine. Therefore, Dixie has been designed to allow researchers to analyze any type of binary on their platform of choice. That is, a researcher needs only some limited access to a certain machine, say, a Cray T90, to generate the binaries of the benchmarks he is interested in. Then he transfers the binaries to his workstation of choice and there, not on the original T90 machine, he emulates the binaries, generates on-the-fly traces and feeds his favorite simulator. This cross-platform ability is the direct reason for choosing instruction translation and emulation (through the DVM) as the main technology on which Dixie is based.

Third, Dixie provides both register contents and wrong-path information. The latest generation of simulators/emulators for computer architecture research provides these capabilities (see Shade [2] and SimpleScalar [1]). Our previous tracing tool for the Convex machine did provide some register information, but only for the vector length and vector stride registers. Dixie is able to provide the contents of any register at any point during the execution of a program (even if it is a vector register). Also, Dixie is able to analyze and provide information when the processor mispredicts a branch and enters mis-speculative mode.

Fourth, Dixie is highly accurate, that is, it faithfully reproduces the exact sequence of instruction addresses, data addresses and register values generated by the original binary when run in its original native environment.
Fifth, Dixie allows research on new instruction set architectures. For a long period of time, computer architecture research has stayed away from studies on instruction set design. The reason was that, after the consolidation of the RISC architectures and the absolute market dominance of the x86 line, little value was perceived in trying to push new instruction sets. However, at the present time we face two new domains that can benefit from new instruction set designs. First, virtual machines are gaining wide acceptance due to the Java impulse [5]. However, there has been little research on high performance software instruction sets for virtual machines. Dixie represents an excellent opportunity to explore this new domain. Second, the computer architecture field is slowly recognizing the importance of the embedded microprocessor market. Studies on high performance designs that have low power consumption are gaining interest in the community. The flexibility of Dixie will allow studying embedded instruction sets (which have little support and few tracing tools) as well as small extensions to existing instruction sets.
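
Returning to the emulation technology on which these design wins rest, the predecoded instruction structure and the execution loop of Section 8 can be sketched in C. This is a minimal illustration under stated assumptions, not the DVM source: the names decoded_t, sem_add, sem_bnz, sem_halt and dvm_run are hypothetical, the PC is simplified to an index into the predecoded program rather than a native address, and a "native" block is represented by an ordinary C function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Predecoded instruction: pointers to operands and destination, a
 * pointer to the code implementing its semantics, and the PC of the
 * native instruction it relates to (kept for instrumentation). */
typedef struct decoded decoded_t;
struct decoded {
    uint64_t *src1, *src2, *dst;
    void (*semantics)(const decoded_t *);
    uint64_t native_pc;
};

static uint64_t regs[8];       /* tiny register file for the sketch */
static const decoded_t *IR;    /* structure of the current instruction */
static uint64_t PC;            /* simplification: index into the program */
static int halted;

void sem_add(const decoded_t *d) { *d->dst = *d->src1 + *d->src2; }

/* Branch if src1 != 0. The loop increments PC after every instruction,
 * so the branch stores target - 1 (unsigned wrap-around is well
 * defined and is undone by the increment). */
void sem_bnz(const decoded_t *d)
{
    if (*d->src1 != 0)
        PC = *d->src2 - 1;
}

void sem_halt(const decoded_t *d) { (void)d; halted = 1; }

/* Execution loop: fetch the structure for PC, jump to its semantic
 * code, then increment PC (which the instruction may have modified). */
void dvm_run(const decoded_t *prog, size_t n)
{
    PC = 0;
    halted = 0;
    while (!halted) {
        assert(PC < n);
        IR = &prog[PC];
        IR->semantics(IR);
        PC++;
    }
}
```

Because the loop only sees a function pointer, replacing the interpreter routine of the first instruction of a basic block with a pointer to translated code would not change the loop itself, which is consistent with the gradual translated/interpreted strategy described for Speedy.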

10 Current Status

We finally describe the current status of the Dixie project. We distinguish three different pieces in the toolset: first, the status of the basic software tools; second, the status of the optimizer; and finally, the status of the machine descriptions for the different input ISAs.

Base Software Components: Dixie, DVM and Jango

The Dixie compiler and the DVM are completed and run on both Alpha and Mips 64-bit environments. We have completed tests running both 32- and 64-bit binaries and both big- and little-endian binaries on top of the DVM. Figure 3 shows the speed at which Alpha and Convex binaries run on top of the DVM on an AlphaStation 600 5/266 with Digital Unix OSF/1 4.0, revision 5.64 (this workstation is equipped with an Alpha microprocessor running at 266 MHz). The first thing to note is that these results are without any optimization at all. That is, Speedy has not processed these binaries in any way. The results are clearly too slow. However, we expect that, once Speedy is operational, compiling basic blocks to native machine language will reduce the slowdown to a factor of 5 to 10 at most. We also note that the difference between the Convex and Alpha results is mainly due to the change in endianness. While Alpha is little endian, Convex is big endian. Therefore, when running a Convex binary on a DVM hosted on an Alpha chip, the data of each load and store must be converted between the two byte orders.

Finally, Table 1 summarizes the legal combinations of input ISAs and possible hosts on which the DVM can be run. As can be seen, the major current restriction is that the DVM host must have at least as many virtual address bits as the input ISA. Currently, the DVM is limited to 64-bit hosts, but can emulate any ISA (32 or 64 bit, big or little endian).

The Jango component has not been implemented yet. Only its initial design has been completed, and some features required by Jango have been incorporated into the Dixie compiler.
Optimizer: Speedy

Implementation of the Speedy optimizer has started but is still far from completion. Currently, Speedy performs register allocation, eliminates NOP instructions and some PC-relative computations, and generates code for a subset of all possible instructions. In particular, Speedy can translate basic blocks that contain only load/store instructions and add/sub instructions. If a basic block contains any other instruction, Speedy ignores the basic block and does not translate it.

ISA Descriptions

Three different ISA descriptions are currently under way: Alpha, Convex and x86. Alpha and Convex are completed and are in the testing stage. As shown in figure 3, a couple of Spec95 programs already run correctly to completion. The x86 description is in an early stage of development. It can already run programs up to the main() routine, and also runs the classic Hello World! code.

Figure 3: Slowdown of Alpha and Convex binaries when run on top of the DVM on an AlphaStation/266 MHz. (Bar chart: slowdown of the Alpha and Convex versions of compress95, go95, hanoi, life and qsort.) Slowdown is computed with respect to the execution time of the native Alpha binary when run on the Alpha workstation. For example, the compress program compiled for a Convex machine runs 80 times slower on the DVM than the compress program run natively on the Alpha workstation.

References

[1] D. Burger, T. Austin, and S. Bennett. Evaluating Future Microprocessors: the SimpleScalar Tool Set. Technical Report CS-TR, Computer Science Department, University of Wisconsin-Madison.

[2] R. F. Cmelik and D. Keppel. Shade: A Fast Instruction-set Simulator for Execution Profiling. In Proceedings of the '94 ACM SIGMETRICS Conference, pages 128-137, May.

[3] R. Espasa and X. Martorell. Dixie: a trace generation system for the C3480. Technical Report CEPBA-RR-94-08, Universitat Politecnica de Catalunya.

[4] M. Fernandez and R. Espasa. Dixie Architecture Reference Manual: Version 1.0. UPC-CEPBA, first edition, September.

[5] T. Lindholm and F. Yellin. The Java Virtual Machine Specification. Addison-Wesley, Massachusetts, September. The Java Series.

[6] P. S. Magnusson and B. Werner. Efficient Memory Simulation in SimICS. In 28th Annual Simulation Symposium, April.

[7] M. Rosenblum, S. Herrod, E. Witchel, and A. Gupta. Complete Computer System Simulation: the SimOS Approach. IEEE Parallel and Distributed Technology.

[8] A. Srivastava and A. Eustace. ATOM: a system for building customized program analysis tools. In Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 196-205, Orlando, Florida, June 20-24. SIGPLAN Notices, 29(6), June.

A History and Credits

The first major version of Dixie, referred to generically as Dixie-II, was a pixie-like tool that allowed collecting traces on a Convex C34 machine and later processing them on a Unix workstation. The traces included memory and instruction addresses, as expected, but also every value stored in the vector length register and the vector stride register. The ability to know the exact contents of the vector length and vector stride registers at any point in time greatly increased the accuracy of our original tool over the tools available on other vector machines (such as Cray, for example). Dixie-II was written by Roger Espasa, starting off from a piece of code that parsed Convex instructions written by Xavier Martorell.

Dixie-II was later ported to the newest Convex machine, the C4 series, yielding what is known as Dixie-III. This port was far from trivial, since the C4 series included an almost completely new instruction set, with more scalar registers, more vector registers, many new opcodes and a completely new instruction format.
Indeed, the C4 machines can be viewed as having two instruction sets: the format7 set, which includes all instructions up to the C3, and the format8 set, which includes one instruction per format7 instruction plus many new opcodes. The Dixie-III port was done by Francisca Quintana.

In parallel with the development of Dixie-III, a new project was started to look at portable binary instrumentation tools. The first attempt was to write a mini-compiler that would ease the task of writing binary instrumentation tools. This tool was developed as a proof of concept that the differences between ISAs could be described relatively easily using a special-purpose language. The tool was targeted to the Cray ISA and was implemented by Francisco-Javier Martin as his final year undergraduate project under the direction of Roger Espasa. This tool, although never fully operational on the Cray, was known as Dixie-IV.

With the lessons learned in the development of Dixie-IV, the Dixie-V project was started. Dixie-V is the tool described in this paper. Although externally we simply use the name "Dixie", internally we use the "Dixie-V" nomenclature. The major departure from the previous tools was that Dixie-V had to use binary emulation. The motivation was that, over time, all the major vector machines that we had access to had been either replaced or simply unplugged. Thus we were running out of machines on which to compile and run our benchmarks. A binary emulator would solve this problem by allowing us to run the original binaries on any Unix workstation.

Once binary emulation was decided upon, it was clear that binary translation would be a great plus. Instead of writing one virtual machine for each vector ISA we were interested in, we would write a few binary translators (which Dixie-IV proved we knew how to do) that would translate any ISA into our Dixie ISA. Then, with a single virtual machine, we could emulate multiple instruction sets.

We started by focusing on two ISAs: Convex and Alpha. Convex was our real motivation for programming the tool, yet Alpha, with the availability of ATOM, was the ideal candidate to debug and train our set of tools. Alex Ramirez wrote the ISA compiler (the Dixie compiler as described in this paper) and a significant portion of the Alpha machine description. The first DVM implementation was also written by Alex. Six months later, Manel Fernandez took over the job and added all the OS-to-OS mapping capabilities to Dixie. Manel also greatly improved the Dixie compiler and completed the Alpha and Convex machine descriptions. Recently, a port to the x86 architecture has been started by Silvia Cernuda as her final year undergraduate project.

Table 1: Combinations of ISAs currently supported by Dixie.

Input ISA: 32-bit, big endian or little endian
  DVM runs on a 32-bit host: This combination is not currently supported. Yet it poses no significant challenges and could be ready as soon as a port of the DVM to a 32-bit machine is done.
  DVM runs on a 64-bit host: Running.

Input ISA: 64-bit, big endian or little endian
  DVM runs on a 32-bit host: This combination implies a major rewrite of the DVM. Many fields in the internal structures of the DVM are declared as 64-bit variables and are expected to be 64 bits. Moreover, an application running on a 64-bit system might have OS dependencies that are simply impossible to hide. For example, if the application is dealing with files having offsets larger than 32 bits, the underlying 32-bit OS might not be able to process certain system calls.
  DVM runs on a 64-bit host: Running.
