CO200 - Computer Organization and Architecture - Assignment One

Note: A team may contain not more than 2 members. Format the assignment solutions in a LaTeX document. E-mail the assignment solutions PDF to basavaraj@nitk.edu.in. Submit the printed copy by 4pm, Friday, August 22. One submission per team. Late submission will carry negative credit. Use flowcharts, algorithms, diagrams, graphs, tables, screenshots, and illustrations wherever necessary.

Textbook codes: PH4e: Patterson and Hennessy, Computer Organization and Architecture, 4ed.

Common Questions to all Teams

CQ 1 Indicate, for each question, the percentage of the task completed by each team member. Present your data in the form of a table.

CQ 2 Represent the alphabetical names of the members of this team in ASCII format. Use one byte per character. Present the binary translation in hexadecimal form.

CQ 3 Represent the alphabetical names of the members of this team in Unicode format. Use 16 or 32 bits per character as required. You may choose your mother tongue to write your name and then encode its individual characters in Unicode. If the characters of your mother tongue are not included in Unicode, you may use the Devanagari script. Present the binary translation in hexadecimal form.

CQ 4 Use the answers from CQ 2 and CQ 3 for this question. Consider the first four bytes of each name (member one and member two) in each encoding (ASCII and Unicode). You now have 4 sets of 4 bytes. Draw a diagram to show each of these words (a word consists of 4 bytes) stored in memory locations 0x100, 0x200, 0x300, and 0x400. The memory is byte addressable. Answer the following:
   (a) What is the decimal number represented by the word at address 0x100 if the machine is Little Endian? Consider two's complement encoding.
   (b) What is the decimal number represented by the word at address 0x200 if the machine is Big Endian? Consider one's complement encoding.
   (c) What is the value of the real number stored at address 0x300 if the machine is Little Endian? Assume IEEE 754 single precision format.
   (d) What is the value of the real number stored at address 0x400 if the machine is Big Endian? Assume IEEE 754 single precision format.

CQ 5 Use the last two digits of your roll numbers for this question. If your roll numbers are 12CO001 and 12CO022, let x = 1 and y = 22. Represent the following in IEEE 754 single precision format: x.y and -y.x. In this example, the team will have to convert 1.22 and -22.1 into IEEE 754 single precision floating point numbers.

Team 1

1. ... and 1952 in Princeton Institute of Advanced Studies by von Neumann and his team. The IAS computer is the prototype of all subsequent general purpose computers.
   (a) The IAS is one of the first computers to demonstrate which fundamental computer architecture concept?
   (b) Using the IAS instruction set, write the sequence of instructions to demonstrate the following: Let A = A(1), A(2),..., A(1000) and B = B(1), B(2),..., B(1000) be two vectors comprising 1000 numbers each that are to be added to form an array C such that C(I) = A(I) + B(I) for I = 1, 2,..., 1000.
2. ... forth the salient features of the first generation of computers?
3. What are the important factors that influence buying a laptop computer with today's technology? Scan the advertisements from vendors and vendor catalogs to list the characteristics and their importance.
4. PH4e: Exercise 1.2, 2.15.

Team 2

1. Write IAS assembly language code for stack push and pop operations assuming the following:
   (a) all stack elements ... to high memory addresses,
   (c) register R29 is used as the stack pointer and always contains the memory address just after the element which is on top of the stack,
   (d) for a push operation, register R1 holds the data to be pushed,
   (e) for a pop operation, the element from top of stack is to be popped into register R1.
2. ... forth the salient features of the second generation of computers?
3.
Explain the binary coded decimal standard with integer and floating point examples.
4. PH4e: Exercise 1.3, 2.16.

Team 3

1. Show how a rotate left operation, R1 ← R2 rotate left by 7 bits, can be implemented using the IAS instruction set. ... in the destination register R1. Note: R1 and R2 are identifiers of any two available registers in the system.
2. ... forth the salient features of the third generation of computers?
3. Explain the purpose, philosophy and architecture of the Difference Engine designed by Charles Babbage.
4. PH4e: Exercise 1.4, 2.17.
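
Illustration for the rotate left questions (Team 3, question 1, and the same question for other instruction sets): the sketch below shows only the intended semantics of "R1 ← R2 rotate left by 7 bits" using shifts and a bitwise OR. It is not a solution in any of the named instruction sets. The function name `rotate_left` and the 32-bit default width are assumptions made for the example; pass the word size of the target machine as `width`.

```python
def rotate_left(value: int, shift: int, width: int = 32) -> int:
    """Rotate the low `width` bits of `value` left by `shift` positions.

    `width` is an assumed register size; it is not taken from the assignment.
    """
    mask = (1 << width) - 1      # keeps results inside the register width
    value &= mask
    shift %= width               # rotating by the full width changes nothing
    # Bits shifted out at the top re-enter at the bottom.
    return ((value << shift) | (value >> (width - shift))) & mask


r2 = 0x80000001                  # source register, left unmodified
r1 = rotate_left(r2, 7)          # destination register receives the result
print(hex(r2), hex(r1))
```

The same shift-left, shift-right, OR pattern is what an instruction set without a native rotate instruction would typically have to build from its own shift and logical instructions.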
Team 4

1. ... and 1952 in Princeton Institute of Advanced Studies by von Neumann and his team. The IAS computer is the prototype of all subsequent general purpose computers.
   (a) The IAS is one of the first computers to demonstrate which fundamental computer architecture concept?
   (b) Using the IAS instruction set, write the sequence of instructions to demonstrate the following: Let A = A(1), A(2),..., A(1000) be a vector of non-negative integers. Sort the elements of the vector in non-decreasing order in place.
2. ... forth the salient features of the fourth generation of computers?
3. Write the sequence of instructions involved in the evaluation of the following equation on a Register-Register machine model based system: b = a^2 + 3a + 7.
4. PH4e: Exercise 1.5, 2.18.

Team 5

1. ... and 1952 in Princeton Institute of Advanced Studies by von Neumann and his team. The IAS computer is the prototype of all subsequent general purpose computers.
   (a) The IAS is one of the first computers to demonstrate which fundamental computer architecture concept?
   (b) Using the IAS instruction set, write the sequence of instructions to calculate the factorial of a non-negative number stored in the location A.
2. ... forth the salient features of the current (fifth) generation of computers?
3. Explain the CRAY floating point data type with examples.
4. PH4e: Exercise 1.6, 2.19.

Team 6

1. ... and 1952 in Princeton Institute of Advanced Studies by von Neumann and his team. The IAS computer is the prototype of all subsequent general purpose computers.
   (a) The IAS is one of the first computers to demonstrate which fundamental computer architecture concept?
   (b) Using the IAS instruction set, write the sequence of instructions to demonstrate the following: Let A = A(1), A(2),..., A(1000) be a vector of non-negative integers. Find the largest integer in the vector and store it in a variable named L.
2. Explain the Intel extended precision 80-bit floating point data type with examples.
3. Use the MIPS I instruction set for this question.
Write a code sequence to find the sum of all the elements of an array. You may assume that the array contains 1024 elements. Each of the elements is 32 bits long.
4. PH4e: Exercise 1.7, 2.20.

Team 7

1. Use the IAS instruction set for this question. Write a code sequence to find the sum of all the elements of an array. You may assume that the array contains 1024 elements. Each of the elements is 32 bits long.
2. What are the different sizes for storing integers and floating point numbers that have been used over the generations of computers?
3. Compilers translate the program into machine language so that it can be executed by hardware. Some systems use an interpreter rather than a compiler. How is this environment different?
4. PH4e: Exercise 1.8, 2.21.

Team 8

1. Use the IA-32 instruction set to write the sequence of instructions ..., A(1000) and B = B(1), B(2),..., B(1000) be two vectors ...
2. The Hello World program is a program that prints the string Hello World to the output. Write such a program in C and compile it to generate an executable file, a.out, on a Linux system. What is the size of this a.out file? What do the first 16 bytes of this a.out file contain?
3. What are weighted codes? Represent the integers 0 to 9 in (6, 4, 2, -3) weighted code.
4. PH4e: Exercise 1.9, 2.22.

Team 9

1. Write IA-32 assembly language code for stack push and pop operations assuming the following:
   (a) all stack elements ... to high memory addresses,
   (c) register R29 is used as the stack pointer and always contains the memory address just after the element which is on top of the stack,
   (d) for a push operation, register R1 holds the data to be pushed,
   (e) for a pop operation, the element from top of stack is to be popped into register R1.
2. Write a C program which checks whether the machine it is running on supports the IEEE Floating Point representation. Your program should also verify whether the machine handles special forms (such as infinity) correctly.
3.
Use (i) 2's complement and (ii) 1's complement arithmetic to perform the following operations: (a) 1011010_2 - 10101_2 (b) 10101_2 - 1011010_2.
4. PH4e: Exercise 1.10, 2.23.

Team 10

1. Show how a rotate left operation, R1 ← R2 rotate left by 7 bits, can be implemented using the IA-32 instruction set.
... in the destination register R1. Note: R1 and R2 are identifiers of any two available registers in the system.
2. Write the sequence of instructions involved in the evaluation of the following equation on a Register-Memory machine model based system: b = a^2 + 3a + 7.
3. Which are the addressing modes used by the IA-32 ISA?
4. PH4e: Exercise 1.11, 2.24.

Team 11

1. Use the IA-32 instruction set and write the sequence of instructions ..., A(1000) be a vector of non-negative integers. Sort the elements of the vector in non-decreasing order in place.
2. Write a C program that can be used to identify whether the machine it is running on uses the Little Endian or Big Endian byte ordering convention.
3. A 16-bit register contains the following: 0100100101010111. Interpret the contents as:
   (a) a BCD number
   (b) a binary number
   (c) an excess-3 number
   (d) two ASCII characters.
4. PH4e: Exercise 1.12, 2.25.

Team 12

1. Use the IA-32 instruction set to write the sequence of instructions to calculate the factorial of a non-negative number stored in the location A.
2. Represent the following in a 16-bit register: (a) (356)_10, (b) (356)_BCD, (c) (A1)_ASCII.
3. Use the ARM instruction set to write the sequence of instructions ..., A(1000) and B = B(1), B(2),..., B(1000) be two vectors ...
4. PH4e: Exercise 1.13, 2.26.

Team 13

1. Use the IA-32 instruction set to write the sequence of instructions ..., A(1000) be a vector of non-negative integers. Find the largest integer in the vector and store it in a variable named L.
2. What are the different steps involved in the compilation of a C program? Use the GNU C Compiler as an example to illustrate your answer. Which programs does GCC use to accomplish its intermediate tasks? Explain with an example.
3. What are weighted codes? Represent the integers 0 to 9 in (2, 4, 2, 1) weighted code.
4. PH4e: Exercise 1.14, 2.27.

Team 14

1. Use the IA-32 instruction set for this question. Write a code sequence to find the sum of all the elements of an array.
You may assume that the array contains 1024 elements. Each of the elements is 32 bits long.
2. What are weighted codes? Represent the integers 0 to 9 in (6, 3, -1, 1) weighted code. Do you observe any unique property of this code?
3. How many total bits are required for a direct mapped cache with 16KB of data and 4-word blocks, assuming a 32-bit address?
4. PH4e: Exercise 1.15, 2.28.

Team 15

1. The IBM System/360 was one of the most important members ... instruction set to write the sequence of instructions ..., A(1000) and B = B(1), B(2),..., B(1000) be two vectors ...
2. Which are the addressing modes used by the MIPS64 ISA?
3. Write ARM assembly language code for stack push and pop operations assuming the following:
   (a) all stack elements ... to high memory addresses,
   (c) register R29 is used as the stack pointer and always contains the memory address just after the element which is on top of the stack,
   (d) for a push operation, register R1 holds the data to be pushed,
   (e) for a pop operation, the element from top of stack is to be popped into register R1.
4. PH4e: Exercise 1.16, 2.29.

Team 16

1. Write IBM System/360 assembly language code for stack push and pop operations assuming the following:
   (a) all stack elements are of size 1 halfword,
   (b) the stack grows from low memory to high memory addresses,
   (c) one register is used as the stack pointer and always contains the memory address just after the element which is on top of the stack,
   (d) for a push operation, a certain register holds the data to be pushed,
   (e) for a pop operation, the element from top of stack is to be popped into a chosen register.
2. Which are the addressing modes used by the ARM ISA?
3. Show how a rotate left operation, R1 ← R2 rotate left by 7 bits, can be implemented using the ARM instruction set. ... in the destination register R1. Note: R1 and R2 are identifiers of any two available registers in the system.
4. PH4e: Exercise 2.1, 2.30.

Team 17

1.
Show how a rotate left operation, R1 ← R2 rotate left by 7 bits, can be implemented using the IBM System/360 instruction set. Note that we require the value present in general purpose register R2 to be rotated left by 7 bits,
with source register R2 not being modified, but the rotated value to be available in the destination register R1. Note: R1 and R2 are identifiers of any two available registers in the system.
2. Find a large program written in C (for example, gcc, from http://gcc.gnu.org) and compile the program twice, once with optimizations (use -O3) and once without. Compare the compilation time and run time of the program.
3. Introduce the Sign-Magnitude representation of integers.
4. PH4e: Exercise 2.2, 2.31.

Team 18

1. The IBM System/360 was one of the most important members ... instruction set and write the sequence of instructions ..., A(1000) be a vector of non-negative integers. Sort the elements of the vector in non-decreasing order in place.
2. Assume you are in a company that markets a certain IC chip. The fixed costs, including R&D, fabrication, equipment, and so on, add up to $500000. The cost per wafer is $6000, and each wafer can be diced into 1500 dies. The die yield is 50%. Finally, the dies are packaged and tested, at a cost of $10 per chip. The test yield is 90%; only those chips that pass the test will be sold to customers. If the retail price is 40% more than the cost, at least how many chips have to be sold to break even?
3. PH4e: Exercise 2.3, 2.32.

Team 19

1. The IBM System/360 was one of the most important members ... instruction set to write the sequence of instructions to calculate the factorial of a non-negative number stored in the location A.
2. We wish to compare the performance of two different machines: M1 and M2. The measurements made on the machines are shown in the table below. Which machine is faster for each program and by how much?

   Program   Time on M1   Time on M2
   1         10 seconds   5 seconds
   2         3 seconds    4 seconds

3. PH4e: Exercise 2.4, 2.33.

Team 20

1. (a) What are CISC and RISC CPU architectures?
   (b) Typically one CISC instruction takes more time to complete than a RISC instruction.
Assume that a certain task needs P CISC instructions and 2P RISC instructions, and that one CISC instruction takes 8T ns to complete, and one RISC instruction takes 2T ns. Under this assumption, which one has the better performance?
2. The IBM System/360 was one of the most important members ... instruction set to demonstrate the following: Let A = A(1), A(2),..., A(1000) be a vector of non-negative integers. Find the largest integer in the vector and store it in a variable named L.
3. Consider the data obtained after executing a program on two machines (M1 and M2), shown in the table below. Program execution time and total instructions executed by the program are shown. What is the instruction execution rate (instructions per second) for each of the machines?

             On M1                          On M2
   Program   Time   Instructions executed   Time   Instructions executed
   1         10 s   200 x 10^6              5 s    160 x 10^6

4. PH4e: Exercise 2.5, 2.34.

Team 21

1. Use the IBM System/360 instruction set for this question. Write a code sequence to find the sum of all the elements of an array. You may assume that the array contains 1024 elements. Each of the elements is 32 bits long.
2. What is a von Neumann machine? Explain with an illustration.
3. PH4e: Exercise 2.6, 2.35.
4. Consider two implementations, M1 and M2, of the same instruction set. There are 4 classes of instructions (A, B, C, and D) in the instruction set. M1 and M2 operate at a clock rate of 500 MHz and 750 MHz, respectively. The average number of clock cycles for each instruction class on M1 and M2 is given in the table below.

   Class   CPI on M1   CPI on M2
   A       1           2
   B       2           2
   C       3           4
   D       4           4

   Assume that peak performance is defined as the fastest rate at which a machine can execute an instruction sequence chosen to maximize that rate. What are the peak performances of M1 and M2 expressed as instructions per second?

Team 22

1. ..., A(1000) and B = B(1), B(2),..., B(1000) be two vectors ...
2.
Write the sequence of instructions involved in the evaluation of the following equation on an accumulator based machine: b = a^2 + 3a + 7.
3. For this question, use the data from questions 19.5 and 20.5, wherever required. Assume the clock rates of the machines M1 and M2 are 200 MHz and 300 MHz respectively. Find the CPI for program 1 on both machines.
4. PH4e: Exercise 2.8, 2.36.

Team 23

1. Write MIPS I assembly language code for stack push and pop operations assuming the following:
   (a) all stack elements ... to high memory addresses,
   (c) one register is used as the stack pointer and always contains the memory address just after the element which is on top of the stack,
   (d) for a push operation, a certain register holds the data to be pushed,
   (e) for a pop operation, the element from top of stack is to be popped into a chosen register.
2. PH4e: Exercise 2.9, 2.37.
3. Consider two different implementations, M1 and M2, of the same instruction set. There are 3 classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 400 MHz, and M2 has a clock rate of 200 MHz. The average number of cycles for each instruction class on M1 and M2 is given in the following table:

   Class   CPI on M1   CPI on M2   C1 usage   C2 usage   3rd-party usage
   A       4           2           30%        30%        50%
   B       6           4           50%        20%        30%
   C       8           3           20%        50%        20%

   The table also contains a summary of how three different compilers use the instruction set. C1 is a compiler produced by the makers of M1, C2 is a compiler produced by the makers of M2, and the other compiler is a third-party product. Assume that each compiler uses the same number of instructions for a given program but that the instruction mix is as described in the table. Using C1 on both M1 and M2, how much faster can the makers of M1 claim that M1 is compared with M2? Using C2 on both M2 and M1, how much faster can the makers of M2 claim that M2 is compared with M1? If you purchase M1, which compiler would you use? Which machine would you purchase if we assume that all the other criteria are identical, including costs?

Team 24

1.
Show how a rotate left operation, R1 ← R2 rotate left by 7 bits, can be implemented using the MIPS I instruction set. ... in the destination register R1.
2. PH4e: Exercise 2.10, 2.38.
3. We are interested in two implementations of a machine, one with and one without special floating-point hardware. Consider a program, P, with the following mix of operations:

   Operation                 Share of mix
   Floating-point Multiply   10%
   Floating-point Add        15%
   Floating-point Divide     5%
   Integer Instructions      70%

   Machine MFP (Machine with Floating Point) has floating-point hardware and can therefore implement the floating-point operations directly. It requires the following number of clock cycles for each instruction class:

   Instruction class         Clock cycles
   Floating-point Multiply   6
   Floating-point Add        4
   Floating-point Divide     20
   Integer Instructions      2

   Machine MNFP (Machine with No Floating Point) has no floating-point hardware and so must emulate the floating-point operations using integer instructions. The integer instructions all take 2 clock cycles. The number of integer instructions needed to implement each of the floating-point operations is as follows:

   Operation                 Integer instructions
   Floating-point Multiply   30
   Floating-point Add        20
   Floating-point Divide     50

   Both machines have a clock rate of 1000 MHz. Find the native MIPS (Millions of Instructions Per Second) ratings for both machines.

Team 25

1. Use the MIPS I instruction set and write the sequence of instructions ..., A(1000) be a vector of non-negative integers. Sort the elements of the vector in non-decreasing order in place.
2. If the machine MFP in Question 24.3 needs 300 million instructions for this program, how many integer instructions does the machine MNFP require for the same program?
3. PH4e: Exercise 2.11, 2.39.

Team 26

1. ... to calculate the factorial of a non-negative number stored in the location A.
2. (a) The internal representation of floating point numbers in IA-32 is 80 bits wide. This contains a 16-bit exponent. However, it also advertises a 64-bit significand. How is this possible?
   (b) While the IA-32 allows 80-bit floating point numbers internally, only 64-bit floating point numbers can be loaded or stored. Starting with only 64-bit numbers, how many operations are required before the full range of the 80-bit exponent is used? Give an example.
3. PH4e: Exercise 2.12, 2.40.
4. You are the lead designer of a new processor. The processor design and compiler are complete, and now you must decide whether to produce the current design as it stands or spend additional time to improve it. You discuss this problem with your hardware engineering team and arrive at the following options.
   (a) Leave the design as it stands. Call this base machine Mbase. It has a clock rate of 500 MHz, and the following measurements have been made using a simulator.

   Instruction Class   CPI   Frequency
   A                   2     40%
   B                   3     25%
   C                   3     25%
   D                   5     10%

   (b) Optimize the hardware. The hardware team claims that it can improve the processor design to give it a clock rate of 600 MHz. Call this machine Mopt. The following measurements were made using a simulator for Mopt:

   Instruction Class   CPI   Frequency
   A                   2     40%
   B                   2     25%
   C                   3     25%
   D                   4     10%

   What is the CPI for each machine?

Team 27

1. ..., A(1000) be a vector of non-negative integers. Find the largest integer in the vector and store it in a variable named L.
2. Show a sequence of MIPS I instructions that can be used to implement the multiplication operation R1 ← R2 × R3, where we require the destination to contain the value of the product if it fits in 32 bits, or 0 if the product does not fit in 32 bits.
3. Use the data from Question 26.5 for this question. What are the native MIPS ratings for Mbase and Mopt? How much faster is Mopt than Mbase?
4. PH4e: Exercise 2.13.

Team 28

1. ..., A(1000) be a vector of non-negative integers. Find the largest integer in the vector and store it in a variable named L.
2. Write the sequence of instructions involved in the evaluation of the following equation on a Stack based machine: b = a^2 + 3a + 7.
3. Define the following terms with respect to computer organization and architecture: accumulator, instruction set architecture, integrated circuit, sequential systems, combinational systems.
4. PH4e: Exercise 2.14.
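
Illustration for the CPI and MIPS questions: several questions above ask for an average CPI from an instruction mix and for a native MIPS rating from a clock rate. The sketch below shows one way to set up that arithmetic, using the Mbase and Mopt measurements given under Team 26. The helper names `average_cpi` and `native_mips` are assumptions made for the example, and the usual definitions are assumed: average CPI is the frequency-weighted sum of per-class CPIs, and native MIPS is clock rate divided by (CPI x 10^6).

```python
def average_cpi(mix):
    """Frequency-weighted average CPI.

    `mix` is a list of (cpi, frequency) pairs whose frequencies sum to 1.
    """
    return sum(cpi * freq for cpi, freq in mix)


def native_mips(clock_hz, cpi):
    """Millions of instructions per second at the given clock rate and CPI."""
    return clock_hz / (cpi * 1e6)


# Instruction classes A, B, C, D as (CPI, frequency), from the Team 26 tables.
mbase_mix = [(2, 0.40), (3, 0.25), (3, 0.25), (5, 0.10)]
mopt_mix = [(2, 0.40), (2, 0.25), (3, 0.25), (4, 0.10)]

cpi_base = average_cpi(mbase_mix)
cpi_opt = average_cpi(mopt_mix)
mips_base = native_mips(500e6, cpi_base)   # Mbase runs at 500 MHz
mips_opt = native_mips(600e6, cpi_opt)     # Mopt runs at 600 MHz

print(f"Mbase: CPI = {cpi_base:.2f}, MIPS = {mips_base:.1f}")
print(f"Mopt:  CPI = {cpi_opt:.2f}, MIPS = {mips_opt:.1f}")
print(f"Mopt / Mbase MIPS ratio = {mips_opt / mips_base:.2f}")
```

The MIPS ratio equals the speedup only because both machines are assumed to execute the same instruction count for the program; when instruction counts differ, compare execution times instead.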