Hakam Zaidan Stephen Moore

1 Hakam Zaidan Stephen Moore

2 Outline Vector Architectures (Properties, Applications); History (Westinghouse Solomon, ILLIAC IV, CDC STAR-100, Cray-1, other Cray vector machines); Vector Machines Today.

3 Introduction A vector processor is a CPU that can run one instruction on an entire vector of data. Far fewer instructions need to be fetched. Vector processors achieve data parallelism in large scientific and multimedia applications.
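
As a rough, hedged illustration of "one instruction on an entire vector" (the vec8 type and the 8-element width below are illustrative assumptions, not from the slides), this C sketch contrasts an element-by-element scalar loop with a single whole-vector add written with GCC/Clang vector extensions:

```c
/* Sketch: scalar loop vs. a vector-style operation (GCC/Clang vector extensions).
 * The 8-element width is an illustrative choice. */
#include <stdio.h>
#include <string.h>

typedef double vec8 __attribute__((vector_size(64)));   /* 8 x 64-bit doubles */

int main(void) {
    double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    double b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    double c[8];

    /* Scalar style: one fetched instruction sequence per element. */
    for (int i = 0; i < 8; i++)
        c[i] = a[i] + b[i];

    /* Vector style: conceptually a single add over the whole vector. */
    vec8 va, vb, vc;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vc = va + vb;                       /* one vector add */
    memcpy(c, &vc, sizeof c);

    printf("c[0]=%g c[7]=%g\n", c[0], c[7]);
    return 0;
}
```

On a vector machine, the second form corresponds to one fetched instruction doing the work of the whole loop.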

4 Styles of Vector Architectures Based on how operands are fetched, vector processors fall into two categories: memory-memory architectures, where vector operands are read from and written back to memory directly, and vector-register architectures, where operands pass through vector registers (sketched below).
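
The difference between the two styles can be sketched around a simple a*X + Y loop; the vector mnemonics in the comments below are illustrative pseudo-instructions (assumed names, not from any real ISA), while the C function itself is just the loop both styles would execute:

```c
/* Y = a*X + Y for n elements -- the kind of loop both styles target. */
#include <stddef.h>

void daxpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Memory-memory style (illustrative pseudo-instructions):
 *   VMULS tmp(mem), X(mem), a        ; operands streamed from/to memory
 *   VADD  Y(mem),   tmp(mem), Y(mem)
 *
 * Vector-register style (illustrative pseudo-instructions):
 *   VLOAD  V1, X          ; load a register-length chunk of X
 *   VLOAD  V2, Y
 *   VMULS  V3, V1, a      ; vector * scalar
 *   VADD   V4, V3, V2
 *   VSTORE Y, V4          ; only loads/stores touch memory
 */
```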

5 Vector Processor Elements Vector registers: fixed length, each holding a single vector, with ports for reading and writing; usually 8 to 32 registers of 64 or 128 elements. Vector functional units (FUs): usually 4-8 units, e.g. FP multiply, FP add, and FP divide, plus integer add and logical shift. Vector load-store units (LSUs). Scalar registers. Crossbar interconnect.
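
For concreteness, here is a hedged toy model of these elements in C; NREGS, VLEN, vreg_t, and the function names are illustrative assumptions, not taken from the slides or any real machine:

```c
/* Toy model of vector-processor elements: a register file, one FU,
 * and a vector-length register.  All sizes and names are illustrative. */
#include <stdio.h>

#define NREGS 8      /* number of vector registers */
#define VLEN  64     /* elements per vector register */

typedef struct { double e[VLEN]; } vreg_t;

static vreg_t vrf[NREGS];   /* vector register file */
static int    vl = VLEN;    /* vector length register */

/* Vector load-store unit: unit-stride load into a vector register. */
static void vload(int vd, const double *mem) {
    for (int i = 0; i < vl; i++) vrf[vd].e[i] = mem[i];
}

/* FP add functional unit: vd = vs1 + vs2, elementwise. */
static void vadd(int vd, int vs1, int vs2) {
    for (int i = 0; i < vl; i++)
        vrf[vd].e[i] = vrf[vs1].e[i] + vrf[vs2].e[i];
}

int main(void) {
    double a[VLEN], b[VLEN];
    for (int i = 0; i < VLEN; i++) { a[i] = i; b[i] = 2.0 * i; }

    vl = VLEN;          /* operate on the whole register */
    vload(0, a);
    vload(1, b);
    vadd(2, 0, 1);      /* one "instruction" -> VLEN additions */

    printf("v2[10] = %g\n", vrf[2].e[10]);   /* expect 30 */
    return 0;
}
```

A real machine adds more functional units, strided and gather/scatter load-stores, and a crossbar between the register file and the units; the point is only that one vadd call stands in for one vector instruction.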

6 Vector Processor Properties Results within a vector instruction are independent of one another. The memory access pattern of a vector instruction is known in advance. Branches, and the pipeline problems they cause, are reduced. A single vector instruction specifies a large amount of work (e.g., an entire loop).
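
The "known memory access pattern" property is what unit-stride and constant-stride loops look like in practice. In the hedged sketch below (the array size and stride are arbitrary illustrative choices), every address is determined by a base, a stride, and a length, so the hardware can schedule the whole access stream up front:

```c
/* Unit-stride and constant-stride accesses: the addresses are fully
 * determined by a base, a stride, and a length -- no branches needed. */
#include <stdio.h>

#define N      64
#define STRIDE 4          /* e.g., walking down a column of a 4-wide matrix */

int main(void) {
    double a[N * STRIDE], sum = 0.0;
    for (int i = 0; i < N * STRIDE; i++) a[i] = 1.0;

    /* Unit stride: addresses base, base+8, base+16, ... (8-byte doubles) */
    for (int i = 0; i < N; i++) sum += a[i];

    /* Constant stride: addresses base, base+STRIDE*8, base+2*STRIDE*8, ... */
    for (int i = 0; i < N; i++) sum += a[i * STRIDE];

    printf("sum = %g\n", sum);   /* expect 128 */
    return 0;
}
```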

7 Disadvantages Relatively slow when executing scalar instructions. Precise exceptions are difficult to implement. High cost for on-chip vector memory systems. Code complexity.

8 Applications Lossy compression. Lossless compression. Multimedia Processing. Standard benchmarking kernels. Handwriting recognition. Speech recognition. Cryptography. Operating system and networking. Databases. Support of language run time.

9 History In 1962, the Illinois Automatic Computer series of supercomputers: ILLIAC I, ILLIAC II, ILLIAC III, and ILLIAC IV (with 64 ALUs). In 1973, TI's Advanced Scientific Computer (ASC). In 1975, the Cray-1 was the first supercomputer to have vector registers instead of keeping data in memory. CRAY X-MP, CRAY Y-MP, NEC SX/2, CRAY C90, NEC SX/4, CRAY J90, CRAY T90, NEC SX/5 (from 1976 to 1999).

10 Westinghouse Solomon Project Used an array of processing elements (PEs). Applied the same instruction to all processors, with different data per processor. Research contract with the US Air Force. Prototype built in 1964. Development ended after the contract expired.

12 ILLIAC IV Parallel Machine One Control Unit (CU) controlled the PEs. Only one of a predicted four CUs was built. 64 PEs available per CU. Each PE had a private memory unit. Expected 1000 MFLOPS but achieved well under that in practice. Fastest machine until 1981.

14 CDC STAR-100 Designed to operate at 100 MFLOPS. Long pipelines and long vector setup time. Vectors needed at least 50 elements for it to be faster than competitors. Scalar performance was slow.

15 Cray-1 Supercomputer built in 1976. About 138 MFLOPS sustained, or 250 MFLOPS for bursts. Fast vector and scalar computation. Smaller than other computers.

17 Architecture Uses registers to increase speed: 8 24-bit address (A) registers, 64 24-bit address-save (B) registers, 8 64-bit scalar (S) registers, 64 64-bit scalar-save (T) registers, and 8 vector (V) registers of 64 64-bit words each. Chains together functional units: address, scalar, vector, and floating point.
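
Chaining means the result stream of one functional unit feeds directly into the next, so a multiply and a dependent add overlap rather than waiting for whole vectors. The C sketch below is only an analogy for that multiply-into-add chain (it models nothing of the actual Cray-1 hardware):

```c
/* Chaining analogy: the product for element i can enter the add unit while
 * element i+1 is still being multiplied.  This C model just interleaves the
 * two operations per element; real chaining overlaps them in hardware. */
#include <stdio.h>

#define VLEN 64

int main(void) {
    double x[VLEN], y[VLEN], a = 3.0;
    for (int i = 0; i < VLEN; i++) { x[i] = i; y[i] = 1.0; }

    for (int i = 0; i < VLEN; i++) {
        double prod = a * x[i];   /* FP multiply unit */
        y[i] = prod + y[i];       /* result chained into the FP add unit */
    }

    printf("y[2] = %g\n", y[2]);  /* expect 7 */
    return 0;
}
```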

19 Other Cray Computers Cray X-MP (1983): shared memory, faster clock, more memory bandwidth, 2 CPUs. Cray-2 (1985): new architecture, fast memory, 1.9 GFLOPS. Cray Y-MP (1988): 2, 4, or 8 vector processors, 2.67 GFLOPS. Cray X1 (2003): unification of multiple architectures, 12.8 GFLOPS; not financially successful.

20 Vector Supercomputers

21 Vector Machines Today Very expensive to build. Smaller speedup compared to using multiple processors. Processors with many conventional cores are preferred. Vector-machine concepts are still used: IBM's ViVA (Virtual Vector Architecture) groups multiple functional units so that they act as a vector processor.

22 Vector Intelligent RAM (VIRAM) Architecture developed at UC Berkeley. Full vector microprocessor and DRAM on a single chip. Memory latency up to 5-10X lower and memory bandwidth substantially higher. High bandwidth for I/O. Energy efficiency improved 2X-4X, since there is no off-chip memory bus. Adjustable memory size. Lower cost and power than traditional vector supercomputers.

23 Clustered Organization for Decoupled Execution (CODE) Developed at UC Berkeley. CODE is a proposed vector architecture that overcomes the disadvantages and limitations of conventional vector processors. It organizes the vector registers into clusters of 4-8 registers each. It allows partial completion of an instruction in case of an exception and supports precise exceptions using a history buffer. It can hide communication latency.

24 Conclusion Vector supercomputers are not practical due to their high cost. To improve cost/performance, vector supercomputers are adopting commodity technology such as SMT. Superscalar microprocessor designs began to absorb some of the techniques made popular in earlier vector computer systems (e.g., the Intel MMX extensions). Vector processors are useful for embedded and multimedia applications, which require low power, small code size, and high performance.
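
As a hedged example of vector ideas living on in commodity SIMD extensions, the sketch below uses SSE2 intrinsics (assuming an x86 compiler with SSE2 enabled) to perform a saturating add of 16 pixel-like bytes in a single instruction, a typical multimedia kernel:

```c
/* Multimedia-style SIMD: saturating add of 16 unsigned bytes in one
 * SSE2 instruction -- the kind of operation MMX/SSE borrowed from the
 * vector-processing tradition.  Requires an x86 target with SSE2. */
#include <emmintrin.h>
#include <stdio.h>

int main(void) {
    unsigned char a[16], b[16], c[16];
    for (int i = 0; i < 16; i++) { a[i] = 200; b[i] = 100; }

    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_adds_epu8(va, vb);   /* saturates at 255 instead of wrapping */
    _mm_storeu_si128((__m128i *)c, vc);

    printf("c[0] = %d\n", c[0]);          /* expect 255 */
    return 0;
}
```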

25 References
C. Kozyrakis and D. Patterson, "Overcoming the Limitations of Conventional Vector Processors," in ISCA, 2003.
C. Kozyrakis and D. Patterson, "Vector vs. Superscalar and VLIW Architectures for Embedded Multimedia Benchmarks," in MICRO, 2002.
W. J. Bouknight et al., "The Illiac IV System," Proceedings of the IEEE, Vol. 60, No. 4, April 1972.
R. M. Russell, "The Cray-1 Computer System," Communications of the ACM, Vol. 21, No. 1, Jan. 1978.
J. Gebis et al., "Improving Memory Subsystem Performance using ViVA: Virtual Vector Architecture," in ARCS '09: Proceedings of the 22nd International Conference on Architecture of Computing Systems, 2009.
D. L. Slotnick et al., "The Solomon Computer," Westinghouse Electric Corporation, Baltimore, MD, 1962.

26 Questions?
