Computer Architecture Development: From the BESM-6 Computer to Supercomputers

Similar documents
Instruction Register. Instruction Decoder. Control Unit (Combinational Circuit) Control Signals (These signals go to register) The bus and the ALU

Unit 9 : Fundamentals of Parallel Processing

Lecture 1: Course Introduction and Overview Prof. Randy H. Katz Computer Science 252 Spring 1996

Lecture 1: Introduction

Lecture 9: MIMD Architecture

COMPUTER - GENERATIONS

Lecture 9: MIMD Architectures

Vector an ordered series of scalar quantities a one-dimensional array. Vector Quantity Data Data Data Data Data Data Data Data

Complexity and Advanced Algorithms. Introduction to Parallel Algorithms

EECS4201 Computer Architecture

Fundamentals of Quantitative Design and Analysis

Copyright 2012, Elsevier Inc. All rights reserved.

3.1 Description of Microprocessor. 3.2 History of Microprocessor

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE) UNIT-I

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

Parallel Processors. Session 1 Introduction

Issues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

MaanavaN.Com CS1202 COMPUTER ARCHITECHTURE

PIPELINE AND VECTOR PROCESSING

Portland State University ECE 588/688. Cray-1 and Cray T3E

Fundamental of digital computer

Trends in HPC (hardware complexity and software challenges)

Pipeline and Vector Processing 1. Parallel Processing SISD SIMD MISD & MIMD

Supercomputers. Alex Reid & James O'Donoghue

Chapter 2 Logic Gates and Introduction to Computer Architecture

What are Clusters? Why Clusters? - a Short History

Computer Architecture. Fall Dongkun Shin, SKKU

Lecture 7: Parallel Processing

1 Digital tools. 1.1 Introduction

Computer System architectures

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization

Advanced Topics in Computer Architecture

Marching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J.

Honorary Professor Supercomputer Education and Research Centre Indian Institute of Science, Bangalore

Operating Systems. studykorner.org

SIMD Parallel Computers

Parallel Computing: Parallel Architectures Jin, Hai

Chapter Seven Morgan Kaufmann Publishers

Author: Dr. Atul M. Gonsai Bhargavi Goswami, Kar Uditnarayan

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam

Computer Evolution. Budditha Hettige. Department of Computer Science

Hakam Zaidan Stephen Moore

CS370 Operating Systems

Computer Evolution. Computer Generation. The Zero Generation (3) Charles Babbage. First Generation- Time Line

SRM ARTS AND SCIENCE COLLEGE SRM NAGAR, KATTANKULATHUR

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Lecture 8: RISC & Parallel Computers. Parallel computers

VHDL-MODELING OF A GAS LASER S GAS DISCHARGE CIRCUIT Nataliya Golian, Vera Golian, Olga Kalynychenko

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

Lecture 9: MIMD Architectures

Parallelism and Concurrency. COS 326 David Walker Princeton University

CHETTINAD COLLEGE OF ENGINEERING AND TECHNOLOGY COMPUTER ARCHITECURE- III YEAR EEE-6 TH SEMESTER 16 MARKS QUESTION BANK UNIT-1

SSRVM Content Creation Template

TU Wien. Fault Isolation and Error Containment in the TT-SoC. H. Kopetz. TU Wien. July 2007

The Memory Hierarchy Part I

INTELLIGENCE PLUS CHARACTER - THAT IS THE GOAL OF TRUE EDUCATION UNIT-I

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization

Computer Organization and Assembly Language

Parallel Programming Concepts. Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04. Parallel Background. Why Bother?

QUESTION BANK UNIT-I. 4. With a neat diagram explain Von Neumann computer architecture

omputer Design Concept adao Nakamura

1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7.

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

UNIT I BASIC STRUCTURE OF COMPUTERS Part A( 2Marks) 1. What is meant by the stored program concept? 2. What are the basic functional units of a

Overview. CS 472 Concurrent & Parallel Programming University of Evansville

Instructor Information

Module 5 Introduction to Parallel Processing Systems

High performance computing. Memory

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

Module 1: Introduction

Fujitsu s Approach to Application Centric Petascale Computing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Cell Broadband Engine. Spencer Dennis Nicholas Barlow

Introduction to OS (cs1550)

CSC 453 Operating Systems

Figure 1-1. A multilevel machine.

Operating Systems: Internals and Design Principles. Chapter 2 Operating System Overview Seventh Edition By William Stallings

System Verification of Hardware Optimization Based on Edge Detection

1. Fundamental Concepts

CS370 Operating Systems

Computer Organization and Design, 5th Edition: The Hardware/Software Interface

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

Reduced Instruction Set Computers

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

Introduction. Operating Systems. Introduction. Introduction. Introduction


A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

The Architecture of a Homogeneous Vector Supercomputer

Design and Implementation of an AHB SRAM Memory Controller

Module 1: Introduction. What is an Operating System?

A Data-Parallel Genealogy: The GPU Family Tree. John Owens University of California, Davis

CS550. TA: TBA Office: xxx Office hours: TBA. Blackboard:

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

Syllabus for Computer Science General Part I

A taxonomy of computer architectures

Computer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley

Today's Outline. In the Beginning - ENIAC

Transcription:

Computer Architecture Development: From the BESM-6 Computer to Supercomputers Yuri I. Mitropolski Institute of System Analysis Russian Academy of Sciences mitr@imvs.ru Abstract. This work outlines the architecture and main features of the BESM-6 computer and data processing system AS-6. It also discusses the development of the Electronika SS BIS-1 supercomputer system as well as the principle results of the research on multi-architecture (heterogeneous) supercomputer systems. Keywords: Supercomputer architecture, heterogeneous multicomputer systems, pipelined vector processors, multi-architecture supercomputer systems, scalable heterogeneous chip multiprocessors 1. Introduction The BESM-6 computer became the milestone of domestic computer development in the USSR. The chief designer of BESM-6 was academician S.A. Lebedev; the vicechief designer was academician V.A. Melnikov. This computer was oriented towards the solution of large-scale scientific and engineering problems. This fact was the major influence on the architecture, the logical design, and the construction of the machine. The logical circuits were based on diode gates and transistor current switch amplifiers with a floating power supply. Due to the original construction with short wires between modules, it achieved a very high clock frequency of 10 MHertz. It used a pipeline on clock frequency in control and arithmetic units. The interleaving of banks of the main memory and non-addressed buffer memory with associative search maintained the proper speed of access to the memory. In fact, this design was similar to the cache memory of the IBM System 360 Model 85, which was introduced two years later. The BESM-6 computer had a simple and effective one-address instruction set, a floating-point arithmetic unit and registers for address modification. For multiprogramming, it used paged virtual addressing, a processor interruption system, a privileged instruction set, and a multi-user operational system. There were seven selector channels for magnetic drum, disk and tape memory, and multiplexer channel for input-output devices based on CPU multiprogramming. The original design methodology had two features: a formula description of logics and a table system with a table for each module describing all its wire connections [4].

2. Developments The beginning of the production was 1967. The BESM-6 computer set two records: the duration of production of fifteen years and the duration of use for more than twenty years. The impact of this computer on many generations of engineers and programmers was based mainly on the fruitful ideas of its original design. The next major development produced under the leadership of chief designer V.A. Melnikov was the AS-6 data processing system. The installation and maintenance of the BESM-6 at the computer centers that processed large volumes of data from many abonents, in particular from the Space Flight Control Center, served as a stimulus for the AS-6 system development. A weak point in the BESM-6 design was the small number of external devices available for use and the low throughput of the input-output channels. At the first stage the main task was the design of the interface unit (in Russian Аппаратура Сопряжения - АС-6) between the BESM-6 machine and AS-6 machine with the capability of attaching a large number of telephone and telegraph channels, channels for telemetric data, and also increasing the number of peripheral devices. However, after experimental work with at the first stage it became clear that the system must contain more powerful means of processing data; most of all, it must be a scalable system with the ability of attaching additional computers and devices. In the end, the task was the design of multicomputer system with an advanced means for system reconfiguration. 3. The AS-6 System The fundamental ideas were the creation of application-specific subsystems and devices, and a unification of channels for the whole system. Beside the BESM-6 the system included an AS-6 central processing unit, a peripheral computer PM-6 consisting of a peripheral multichannel processor, a multichannel exchange unit (channeler), additional units of main memory, magnetic disks controllers, and a telemetric controller. They connected all these components by a unified first-level channel. Since 1973, the AS-6 system was in use for experimental purposes. At the same time its development was continued. In 1975, the system was in use during works on the joint Soviet-American Apollo Soyuz mission. The complete system was ready in 1979. In the AS-6 system design, they realized for the first time new ideas that became the basis for supercomputer development and research on advanced high-performance computer architecture. To begin, it is necessary to highlight the following features: o The AS-6 was a heterogeneous multicomputer system; o The AS-6 central processing unit was problem-orientated for the tasks of complex objects control and effective translation; o The PM-6 peripheral computer was function-specific for input-output control; o The system used several specially designed unifying channels.

4. The BESM-10 and the Electronika During the period of development and operation of the system, it became obvious that new architectural ideas and old electronic technology were not corresponding one to the other. In 1973, they worked out the BESM-10 system. They planned to develop the advanced computer system on high-speed ECL LSI chips. The Ministry of Radio Industry of the USSR did not support this project. Further development in this direction under the leadership of academician V.A. Melnikov was the creation of the Electronika SS BIS-1 supercomputer system. The aim of this development was to increase the performance of the new computer by two orders of magnitude. This led to the development of a new technology and appropriate architecture [5]. It was planned on the first stage of design that the system would include the following problem-oriented subsystems: a main computer with a pipelined vector processor, a matrix computer, and a logical data processing machine. In addition to this, they planned the development of the following function-specific subsystems: a peripheral computer, an external solid-state memory controller, a magnetic disk controller, control computers, and front-end computers. Due to the shortage of resources, they decided to postpone the development of the matrix, logical and peripheral machines. The two choices of architecture for the main computer came under review; namely, the AS-6 central processor as enhanced in BESM-10 project and the pipelined vector processor. The second one was more appropriate due to much higher performance and better utilization of synchronous pipelines. The feature of external solid-state memory was the development of intellectual controller intended for the realization of different access methods for the main computer and for joining up two vector machines in a single platform. The peak performance of two-processor system achieved 500 MFLOPS. The software included operational systems of main and front-end computers, programming systems for macro assembler, and the Fortran 77, Pascal, and C programming languages [6]. In 1991, they tested the Electronika SS BIS-1 system. Four sets of system hardware were delivered to customers. In the same year, they developed the Electronika SS BIS-2 multiprocessor system. The planned peak performance was 10 GFLOPS. Beside multiprocessor main machines, the system has included monitor machines for task preparation and system control as well as massively parallel subsystems. However, in 1993, they made the decision to discontinue all work on this effort. 5. Reflections The experience gained during the development of these systems became the basis for the research on heterogeneous supercomputer systems. They showed that the most effective system was one with subsystems oriented toward different forms of parallelism, that is, those corresponding to different forms of parallelism of parts belonging to the very large problems. The processing of large data sets is connected with data level parallelism. This form of parallelism is used most effectively in vector

processors. Another form of parallelism is parallelism of tasks, which is present in problems with great number of independent or weakly coupled subtasks. In this case, the best choice is a multiprocessing subsystem. At the first stage of research, they developed the concept of heterogeneous supercomputer systems. In particular, they proposed to integrate in a single platform a vector uniprocessor and a multiprocessor, both having the access to the common memory [7]. The whole system could have several such integrated platforms based on large system memory [1]. The next stage of research concerned the analysis of advanced ULSI technologies with more than one billion transistors on a chip. They developed the original architecture of a scalable multi-pipelined vector processor. This processor had many parallel pipelined branches each having a chain of simple vector processor with several functional units and a local memory. A program having complex vector operations achieved the best results. The performance of such scalable processor was up to 1024 floating-point operations per cycle [8]. In accordance with these concepts, the research project of a multi-architecture supercomputer system had been developed [9]. The system consisted of a computing subsystem, monitor-simulating subsystem, system memory and external subsystem. The computing subsystem includes a vector multiprocessor, a scalar multiprocessor, and a monitor computer. The productivity of this system can be very high, up to 90% of its peak performance. The performance depends on the level of technology and it could reach up to 10 PFLOPS. 6. Conclusion The comparison of this project with foreign researches and developments shows that in theoretical and conceptual sense it is ahead of published results [10]. On March 20 of 2006, Cray Inc. announced plans to develop supercomputers that will take the concept of heterogeneous computing to an entirely new level by integrating a range of processing technologies into a single platform. The second phase will result in a fully integrated multi-architecture system [3]. The same concepts were published in 1995 [7]. The term multi-architecture was proposed in 2003 [9]. The design of the cell chip by Sony, IBM, and Toshiba began in 2001. Cell is a heterogeneous chip multiprocessor that consists of an IBM 64-bit Power Architecture core, augmented with eight specialized co-processors based on a novel single-instruction multiple data [2]. In a paper published in 1998 [8], one reads the following: The chains and parallel branches of the modules provide high performance due to their topology with the shortest links between functional units. In a maximum configuration, the uniprocessor might have up to 1024 modules and attain up to 4 thousands floating-point operations per cycle or 4 TFLOPS.

References 1. Anohin, A.V., L.M. Lengnik, Yu. I. Mitropolski, I.I. Puchkov, The architecture of Heterogeneous Supercomputer System, Proc. Fifth International Seminar The Distributed Information Processing, Novosibirsk, 1995, pp. 22-27 (In Russian). 2. Cell Architecture, The; http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html. 3. Cray Will Leverage an "Adaptive Supercomputing" Strategy to Deliver the Next Major Productivity Breakthrough, http://www.cray.com. 4. Kuzmichev, D.A., V.A. Melnikov, Yu. I. Mitropolski, V.I. Smirnov, The Principles of Design of Large-scale Computers, Proc. Jubilee Scientific Conference Devoted to 25 Years of Institute for Precise Mechanics and Computer Engineering USSR Academy of Sciences, Moscow, 1975, pp. 3-16 (In Russian). 5. Melnikov, V.A., Yu. I. Mitropolski, A.I. Malinin, V.M. Romankov, Requirements for Hardware Design of High Performance Computers and Problems of Their Development, Proc. Cybernetics Problems, Complex Design of Electronics and Mechanical Construction of Supercomputers, ed. by V.A. Melnikov and Yu. I. Mitropolski, Scientific Counsel on Cybernetics of the USSR Academy of Sciences, Moscow, 1988, pp. 3-10 (In Russian). 6. Melnikov, V.A., Yu. I. Mitropolski, G.V. Reznikov, Designing the Electronica SS BIS Supercomputer, IEEE Trasactions on Components, Packaging, and Manufacturing Technology, Part A, June 1996, V.19, # 2, pp. 151-156. 7. Mitropolski, Yu.I., The Concept of Heterogeneous Supercomputer Systems Development, Proc. Fifth International Seminar The Distributed Information Processing, Novosibirsk, 1995, pp. 42-46 (In Russian). 8. Mitropolski, Yu.I., Architecture of Multipipe-lined Module Scalable Uniprocessor, Proc. Sixth International Seminar The Distributed Information Processing, Novosibirsk, 1998, pp. 30-34 (In Russian). 9. Mitropolski, Yu.I., Multiarchitecture Supercomputer System, Proc. First All-Russian Scientific Conf. Methods and Means of Information Processing, Moscow State University, Moscow, 2003, pp.131-136 (In Russian). 10. Mitropolski, Yu.I., Mutiarchitecture New Paradigm for Supercomputers, Electronics: Science, Technology, Business, 2005, # 3, pp. 42-47 (In Russian).