AN INTEGRATED DEVELOPMENT / RUN-TIME ENVIRONMENT

William Cave & Robert Wassmer - May 12, 2012

INTRODUCTION

The need for an integrated software development / run-time environment is motivated by requirements to significantly improve productivity and run-time speed on parallel processors. Trying to meet such objectives using current programming languages is a challenge that software developers no longer need to confront. This paper describes an approach to building software that follows from engineering principles. Experience with this system (known as VisiSoft) makes it clear that the design of the development environment must be integrated with that of the run-time environment. The approach described here achieves the following objectives.

- Substantially simplify software development for parallel processors.
- Create software that runs much faster on single as well as parallel processors.
- Control the growing complexity of a software system as it is expanded.
- Create modules that can be changed with minimal effects on the rest of the system.

SOFTWARE-HARDWARE ENVIRONMENT - FUNCTIONAL REQUIREMENTS

Figure 1 illustrates a decomposition of the software-hardware environment from application requirements to results. To address the objectives in the introduction, we must consider the individual requirements in the chain of elements in the software-hardware environment. These are described below; see also [1].

[Figure 1. Overall software-hardware environment. Block labels: Application Requirements; Design Architecture & Produce Code; Software-Hardware Environment; Run-Time Environment; Development Environment; Application Software; Run-Time System (RTS); Virtual Parallel Operating System (VPOS); Hardware; Results.]

The authors are with Visual Software International.

Software Architecture Page 1

Application Requirements

This approach addresses large, complex parallel processor applications requiring a team effort. For the purposes of this paper, applications are divided into three types:

1. Embarrassingly Parallel - Applications that may be split into independent tasks that run concurrently with effectively no exchange of information.
2. Partially Independent - A single task that may be split into independent modules that must exchange information during the task, but where the processing time for information exchanges is small compared to the processing time inside the modules.
3. Effectively Sequential - A single task where most instructions follow from the prior ones, providing little chance for concurrent processing.

This paper addresses partially independent applications, i.e., those with a reasonable amount of inherent parallelism, that may be run effectively on a parallel processor to meet stringent run-time speed requirements. Examples are real-time planning and control systems used in large manufacturing plants, or simulations of many platforms (e.g., aircraft) exchanging information by radio that affects their future behavior. The applications addressed also require high reliability, rapid enhancement to support new features, and potential for growth of complexity.

Architectural Design

In the case of applications to be run on a parallel processor, the architect must decompose the application into sets of relatively independent modules that represent the inherent parallelism of the system. By this we mean that processing within modules far exceeds communication between modules, a consequence of the inherent parallelism in the system (partially independent). These modules may then be placed on separate processors to run efficiently.
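As a rough illustration of the "partially independent" criterion above, the following minimal Python sketch compares time spent processing inside modules with time spent exchanging information between them. The `ModuleProfile` type, the `classify` function, the `min_ratio` threshold of 10, and the "communication-bound" label are all assumptions made for illustration; the paper does not specify a numeric threshold or a measurement method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleProfile:
    """Measured (or estimated) times for one candidate module, in seconds."""
    name: str
    compute_s: float   # processing inside the module
    exchange_s: float  # exchanging information with other modules


def classify(profiles: List[ModuleProfile], min_ratio: float = 10.0) -> str:
    """Label a decomposition by its compute-to-exchange ratio (assumed threshold)."""
    compute = sum(p.compute_s for p in profiles)
    exchange = sum(p.exchange_s for p in profiles)
    if exchange == 0.0:
        return "embarrassingly parallel"
    if compute / exchange >= min_ratio:
        return "partially independent"
    # Hypothetical label: exchanges are too costly for this decomposition.
    return "communication-bound"


profiles = [
    ModuleProfile("radar", compute_s=9.0, exchange_s=0.3),
    ModuleProfile("planner", compute_s=12.0, exchange_s=0.5),
]
label = classify(profiles)
```

A decomposition that comes out "communication-bound" under this sketch would suggest redrawing module boundaries so that more of the exchanged data stays inside a single module.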
For complex systems requiring special skills to produce a design (e.g., systems requiring detailed engineering knowledge, special experience, or historic statistical knowledge), subject area experts must be able to understand both software architectures and code with minimal help from programmers. They must also be able to help design architectures that take full advantage of the inherent parallelism in the application system, since only they may have that knowledge.

Development Environment

The development environment must support high productivity to minimize the time and cost of development, validation, and testing. This implies rapid translation of application requirements into software architectures that reflect the inherent parallelism in the system. This is particularly true during post-development upgrades and support. It also implies that architectures can be easily inspected visually, using engineering drawings of connectivity (they are not flow charts), to maintain full control over the design, and that the language effectively supports this architectural breakout. In addition, the language must be easily read directly by subject area experts, so they can understand and validate complex algorithms representing the system as well as the architectural breakout.

The development environment must also produce the information needed by the run-time environment to ensure that full advantage is taken of the architectural characteristics of the application software. This is especially true when trying to achieve high run-time speeds on a parallel processor while minimizing the machine resources required to achieve that speed. This information includes designation of the independent modules that may be assigned to separate processors. It must also include the connectivity properties between modules, so that modules that communicate may be located on physically adjacent processors to minimize communication delays. When the use of these connectivity properties is nonstationary (modules may communicate with different modules as they operate), modules may be migrated during run time to reduce communication delays.

To support the above, the development environment must produce the application software object code in segments, corresponding to the independent modules produced by the architecture. Similarly, it must produce the database describing the independent module architecture, along with management software to interface with the OS. Given this information, the OS can take maximum advantage of a (potentially simplified) hardware architecture.

Application Software

It is essential that the resulting application software be able to run fast on a single or parallel processor while using minimum machine resources. This implies that the machine code is organized such that hardware resource management, and in particular memory management, is simplified. It also implies that the code to be managed is organized into a minimum number of well-defined chunks. This is another architectural design problem that depends heavily on the language used in the development environment to describe the databases.
Run-Time System (RTS)

The run-time system must provide the translation of architectural information from the development environment into calls to the OS during run time. It is the architectural design that minimizes the movement of instruction memory as well as data memory at run time. As indicated above, architectural information can be used to optimize processor allocation so as to minimize memory boundary crossing delays. Use of this information by the run-time system is critical to effective use of parallel processors.

Virtual Parallel Operating System (VPOS)

VPOS must be designed to take full advantage of the information provided by the run-time system. Specifically, it must be designed to allocate and assign hardware machine resources to make maximum effective use of this information. This includes minimizing overhead and memory sharing delays to achieve maximum run-time speed. This can only be achieved by allocating processors and memory to independent modules based upon the architectural information, including the possible migration of partially independent modules when the time constants of nonstationary inter-module communications permit.
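The idea of using connectivity information to place communicating modules on adjacent processors can be sketched as follows. This is an illustrative Python example under stated assumptions, not the VPOS allocation algorithm: processors are assumed to lie on a line so that distance is the difference in slot index, the `traffic` weights are invented, and the exhaustive search over permutations is only practical for a handful of modules.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple

Traffic = Dict[Tuple[str, str], float]


def placement_cost(order: Sequence[str], traffic: Traffic) -> float:
    """Sum of traffic weight times distance; order[i] is the module in slot i."""
    slot = {module: i for i, module in enumerate(order)}
    return sum(w * abs(slot[a] - slot[b]) for (a, b), w in traffic.items())


def best_placement(modules: Sequence[str], traffic: Traffic) -> Tuple[str, ...]:
    """Exhaustively pick the slot ordering with the lowest communication cost."""
    return min(permutations(modules), key=lambda order: placement_cost(order, traffic))


modules = ("A", "B", "C", "D")
# Hypothetical connectivity: A talks heavily to B and D, B talks lightly to C.
traffic = {("A", "B"): 10.0, ("A", "D"): 8.0, ("B", "C"): 1.0}

naive_cost = placement_cost(modules, traffic)   # A, B, C, D in declaration order
chosen = best_placement(modules, traffic)
chosen_cost = placement_cost(chosen, traffic)
```

For nonstationary connectivity, the same cost function could be re-evaluated with updated `traffic` weights to decide whether migrating a module is worthwhile; how often to do so would depend on the time constants mentioned above.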

Hardware

In applications where run-time speeds are critical and parallel processors are required to support a single task, the hardware design must support the run-time system and corresponding OS requirements. In general, one typically trades memory for speed, duplicating instruction sets and stationary databases on separate processors to avoid swapping and paging. With the approach to architecture described here, hardware designers can focus on the essentials of minimizing overhead and memory sharing delays to achieve maximum speeds on a parallel processor, with little concern for the inherent architecture of an application software system. This is because full knowledge of the inherent parallelism of the system is embedded in the architectural design and automatically transferred to the run-time system.

With the integrated approach described here, the software development environment directly impacts the design of the run-time environment, including the OS. This, in turn, can be used to simplify design of multi-core chips. Specifically, the combination of language facilities and architecture eliminates the need for the hardware facilities listed below, opening up chip real estate for better use, e.g., more memory.

- Cache coherency
- Thread synchronization
- Stack facilities
- Special instruction swapping facilities

In the case where parallel processors may be dedicated to algorithm-intensive or memory-intensive applications that consume substantial processor time, they may be connected to server chips via shared memory as illustrated in Figure 2. When properly housed within a shared memory server environment, parallel processor chips need not interface directly with disks, communication channels, graphics, workstations, etc. One-way memory transfers to and from the server replace the need for special DMA channels or device interfaces.
ACHIEVING SPEED INCREASES

Design of the language for VisiSoft was driven by speed and accuracy for discrete event simulations of physical systems, typically with a high degree of inherent parallelism. The principal requirement was to develop a language that made it easy to build complex software for parallel processors and that subject area experts could easily understand.

The first step in the design was to separate data from instructions at the coding level. Known as the Separation Principle, this makes it simpler to track which sets of instructions share which data sets. Minimizing the number of data elements to be tracked requires the ability to support large hierarchical data structures. Similarly, one wants large hierarchical rule sets within a single process (a group of assembler instructions).

Given that blocks of data are separated from blocks of instructions at the language level, one can easily build independent modules that map into the inherent parallelism of an application. As a by-product, this provides the ability to visualize the design using engineering drawings showing the connectivity of blocks of instructions with blocks of data (they are not flow charts). These are represented by icons that are grouped into hierarchical modules that form an independent module at the top layer; see Figure 3.
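To make the Separation Principle concrete, here is a minimal Python sketch in which data blocks and instruction blocks are declared separately and each instruction block names the data it is connected to. This is not VisiSoft syntax: the `Resource` and `Process` classes, the `update_position` rule, and all field names are hypothetical, chosen only to show how an explicit connectivity map falls out of the separation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Resource:
    """A block of data only (hypothetical stand-in for a data icon on a drawing)."""
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class Process:
    """A block of instructions only; it names the resources it is connected to."""
    name: str
    resources: Tuple[str, ...]
    rule: Callable[..., None]

    def run(self, pool: Dict[str, Resource]) -> None:
        # The rule receives only the resources listed in its connectivity.
        self.rule(*(pool[r].data for r in self.resources))


def update_position(state: dict, params: dict) -> None:
    """Example rule: advance a position using a fixed velocity and time step."""
    state["x"] += params["v"] * params["dt"]


pool = {
    "STATE": Resource("STATE", {"x": 0.0}),
    "PARAMS": Resource("PARAMS", {"v": 2.0, "dt": 0.5}),
}
move = Process("MOVE", ("STATE", "PARAMS"), update_position)
move.run(pool)

# The connectivity map is the information a process/resource drawing would show.
connectivity = {move.name: move.resources}
```

Because each process declares its resources up front, the question of which instructions share which data can be answered from `connectivity` alone, without reading the rule bodies.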

[Figure 2. Server environment with parallel processors. Block labels: General Parallel Processor Facility; Master Controller & Backups; Servers; Master OS; Server OS-1 through Server OS-6; Run-Time Master_1 through Run-Time Master_3; Parallel_Processors.]

[Figure 3. Illustration of editing processes and resources on the drawing. Recoverable drawing labels: PROPAGATION_PREDICTION, FPPS.]

Software Decomposition - Creating Independent Modules

The decomposition of a software system into independent modules implies drawing boundaries around the elements in a system that comprise a specified module. Any system can be decomposed into a set of modules. Furthermore, as modules get large, they can be decomposed hierarchically into submodules, etc. Creating modules that can run concurrently on a parallel processor places explicit requirements on module design. Two modules can run concurrently only if they are independent. This implies that they share no data; otherwise they risk incoherent use of that data. The independence property also contributes to the other requirements stated above.

To determine the independence of modules, one needs a map of the data shared between the processes (groups of instructions). This leads to the concept of software architecture as shown in Figure 3, a totally new approach to software design.

Multipliers On The Speed Multipliers

Because large data structures can be easily defined and referenced, as illustrated in Figure 3, they may be moved using a single instruction fetch into another shared structure that defines the details of all of the elements. This provides significant increases in speed when working with algorithms requiring large state vectors or databases. This has been borne out by a substantial number of case histories and experiments. Given the speed multipliers that VisiSoft has generated on single processors, one may expect to use fewer processors (by as much as a factor of 10) simply by using the VisiSoft environment to build the software.

Using the architectural features of VisiSoft, one may create larger independent modules that will run faster (using less overhead), provided that each processor has sufficient adjacent memory. This new architectural approach affords speed increases that require fewer processors to achieve the same speed multiplier.
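The independence test described under Software Decomposition (two modules are independent only if they share no data) can be sketched as a grouping problem over the map of which processes use which data. The Python example below is an illustrative sketch, not the VisiSoft tooling: the `uses` map and the process and resource names are invented, and a union-find structure is one assumed way to compute the groups.

```python
from typing import Dict, List, Set


def independent_groups(uses: Dict[str, Set[str]]) -> List[List[str]]:
    """Group processes that are linked, directly or transitively, by shared data.

    uses maps each process name to the set of resource names it touches.
    Processes in different returned groups share no resource.
    """
    parent = {p: p for p in uses}

    def find(p: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    first_user: Dict[str, str] = {}
    for proc, resources in uses.items():
        for res in resources:
            if res in first_user:
                # Sharing a resource merges the two processes' groups.
                parent[find(proc)] = find(first_user[res])
            else:
                first_user[res] = proc

    groups: Dict[str, List[str]] = {}
    for proc in uses:
        groups.setdefault(find(proc), []).append(proc)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical map of shared data: P1 and P2 share B, P3 and P4 share C.
uses = {
    "P1": {"A", "B"},
    "P2": {"B"},
    "P3": {"C"},
    "P4": {"C", "D"},
    "P5": {"E"},
}
groups = independent_groups(uses)
```

Each resulting group is a candidate independent module; any data that must cross between groups would go through the run-time system rather than being shared directly.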
Using fewer processors reduces the distance between processors, further increasing the speed multiplier. This is clearly a nonlinear function, where speed increases with fewer processors. Conversely, speed will decrease nonlinearly with more processors if they increase the overhead. This has been shown to be true in many parallel processor experiments.

Ensuring Data Coherency

When one independent module wants to communicate with another, it simply copies the shared data structure into a similar system data structure that ensures coherency of the data. The system data structure is part of the run-time system, which places interlocks on processes that share the data structure. This includes the timing of thread scheduling within each independent module.
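The copy-based exchange described under Ensuring Data Coherency can be approximated with a small Python sketch. The `SystemStructure` class, its `publish` and `fetch` methods, and the version counter are assumptions made for illustration; they stand in for a run-time-system structure guarded by an interlock and do not reflect the actual VisiSoft run-time interface.

```python
import copy
import threading
from typing import Any, Tuple


class SystemStructure:
    """Hypothetical run-time-system holder for a coherent copy of shared data."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # the interlock on the system structure
        self._data: Any = None
        self._version = 0

    def publish(self, structure: Any) -> None:
        """Copy the sender's whole structure in; the sender keeps its own copy."""
        snapshot = copy.deepcopy(structure)
        with self._lock:
            self._data = snapshot
            self._version += 1

    def fetch(self) -> Tuple[int, Any]:
        """Return the version and a private copy for the receiving module."""
        with self._lock:
            return self._version, copy.deepcopy(self._data)


box = SystemStructure()
sender_state = {"track": {"id": 7, "pos": [1.0, 2.0]}}
box.publish(sender_state)

# Later changes inside the sending module do not leak to the receiver.
sender_state["track"]["pos"][0] = 99.0
version, received = box.fetch()
```

Because both sides work on private copies, neither module ever reads a structure that the other is halfway through updating; the cost is one whole-structure copy per exchange.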

Scheduling Of Threads

Using the integrated approach, a thread must be contained within a single independent module. Threads within an independent module cannot run concurrently, since they are on a single processor. Threads in one independent module may schedule threads in another (or the same) independent module. All threads are controlled by a run-time system scheduler that ensures they do not get out of synchronization, including those on separate processors. Thus, the developer need not be concerned with thread synchronization or the corresponding race conditions. Special facilities exist that allow timing to be out of sync by up to a ΔT when using a discrete event simulation clock, where ΔT is determined by comparing error distributions of simulated results with live test data or single-processor simulations.

SUMMARY

This paper describes the application of concepts and principles derived from engineering to support the design of large complex software systems for parallel processors. These principles include the properties of independence derived from separating data from instructions (the Separation Principle). These properties lead to increasing speed while reducing the effort required to develop software and to support enhancements that increase complexity, particularly when using parallel processors. This approach automates thread synchronization and eliminates the need for hardware (cache) coherency checks.

The independence properties of modular architectures and understandability of complex algorithms have been confirmed on many large software projects. The language is easily read directly by subject area experts who must understand and validate complex algorithms representing the system as well as the architectural breakout. This approach simplifies the development of large software systems, particularly those whose complexity is high and constantly increasing, as well as those requiring the speed of a parallel processor.

REFERENCES

[1] Cave, W.C. et al., Time is of the Essence: Software Engineering for Parallel Processors, Visual Software International, Spring Lake, NJ, Dec

Operating Systems: Internals and Design Principles. Chapter 2 Operating System Overview Seventh Edition By William Stallings

Operating Systems: Internals and Design Principles. Chapter 2 Operating System Overview Seventh Edition By William Stallings Operating Systems: Internals and Design Principles Chapter 2 Operating System Overview Seventh Edition By William Stallings Operating Systems: Internals and Design Principles Operating systems are those

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2016 Lecture 2 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 2 System I/O System I/O (Chap 13) Central

More information

A Disruptive Solution to Parallel Processing

A Disruptive Solution to Parallel Processing A Disruptive Solution to Parallel Processing William C. Cave, Robert E. Wassmer, Kenneth T. Irvine, Henry F. Ledgard Prediction Systems, Inc.; University of Toledo December 6, 2017 - www.visisoft.com GENERIC

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing This document consists of two parts. The first part introduces basic concepts and issues that apply generally in discussions of parallel computing. The second part consists

More information

Introduction to System Design

Introduction to System Design Introduction to System Design Software Requirements and Design CITS 4401 Lecture 8 System Design is a creative process no cook book solutions goal driven we create a design for solving some problem constraint

More information

GSS - A complete simulation environment

GSS - A complete simulation environment GSS - A complete simulation environment LANGUAGE ENVIRONMENT BUILD MODELS VDE ARCHITECTURE ENVIRONMENT SUPPORT ENVIRONMENT MODEL LIBRARY SIMULATION CONTROL LIBRARY INTERACT WITH SIMULATIONS RTG RUN-TIME

More information

Parallel Computing Concepts. CSInParallel Project

Parallel Computing Concepts. CSInParallel Project Parallel Computing Concepts CSInParallel Project July 26, 2012 CONTENTS 1 Introduction 1 1.1 Motivation................................................ 1 1.2 Some pairs of terms...........................................

More information

A Comparison of Two Distributed Systems: Amoeba & Sprite. By: Fred Douglis, John K. Ousterhout, M. Frans Kaashock, Andrew Tanenbaum Dec.

A Comparison of Two Distributed Systems: Amoeba & Sprite. By: Fred Douglis, John K. Ousterhout, M. Frans Kaashock, Andrew Tanenbaum Dec. A Comparison of Two Distributed Systems: Amoeba & Sprite By: Fred Douglis, John K. Ousterhout, M. Frans Kaashock, Andrew Tanenbaum Dec. 1991 Introduction shift from time-sharing to multiple processors

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2018 Lecture 2 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 2 What is an Operating System? What is

More information

PROCESS VIRTUAL MEMORY. CS124 Operating Systems Winter , Lecture 18

PROCESS VIRTUAL MEMORY. CS124 Operating Systems Winter , Lecture 18 PROCESS VIRTUAL MEMORY CS124 Operating Systems Winter 2015-2016, Lecture 18 2 Programs and Memory Programs perform many interactions with memory Accessing variables stored at specific memory locations

More information

Modeling Page Replacement: Stack Algorithms. Design Issues for Paging Systems

Modeling Page Replacement: Stack Algorithms. Design Issues for Paging Systems Modeling Page Replacement: Stack Algorithms 7 4 6 5 State of memory array, M, after each item in reference string is processed CS450/550 Memory.45 Design Issues for Paging Systems Local page replacement

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (10 th Week) (Advanced) Operating Systems 10. Multiprocessor, Multicore and Real-Time Scheduling 10. Outline Multiprocessor

More information

EMC CLARiiON Backup Storage Solutions

EMC CLARiiON Backup Storage Solutions Engineering White Paper Backup-to-Disk Guide with Computer Associates BrightStor ARCserve Backup Abstract This white paper describes how to configure EMC CLARiiON CX series storage systems with Computer

More information

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 13 Virtual memory and memory management unit In the last class, we had discussed

More information

CS 856 Latency in Communication Systems

CS 856 Latency in Communication Systems CS 856 Latency in Communication Systems Winter 2010 Latency Challenges CS 856, Winter 2010, Latency Challenges 1 Overview Sources of Latency low-level mechanisms services Application Requirements Latency

More information

Chapter 1: Introduction. Operating System Concepts 9 th Edit9on

Chapter 1: Introduction. Operating System Concepts 9 th Edit9on Chapter 1: Introduction Operating System Concepts 9 th Edit9on Silberschatz, Galvin and Gagne 2013 Chapter 1: Introduction 1. What Operating Systems Do 2. Computer-System Organization 3. Computer-System

More information

The modularity requirement

The modularity requirement The modularity requirement The obvious complexity of an OS and the inherent difficulty of its design lead to quite a few problems: an OS is often not completed on time; It often comes with quite a few

More information

A unified multicore programming model

A unified multicore programming model A unified multicore programming model Simplifying multicore migration By Sven Brehmer Abstract There are a number of different multicore architectures and programming models available, making it challenging

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

System/370 integrated emulation under OS and DOS

System/370 integrated emulation under OS and DOS System/370 integrated emulation under OS and DOS by GARY R. ALLRED International Business Machines Corporation Kingston, N ew York INTRODUCTION The purpose of this paper is to discuss the design and development

More information

Reconfigurable Multicore Server Processors for Low Power Operation

Reconfigurable Multicore Server Processors for Low Power Operation Reconfigurable Multicore Server Processors for Low Power Operation Ronald G. Dreslinski, David Fick, David Blaauw, Dennis Sylvester, Trevor Mudge University of Michigan, Advanced Computer Architecture

More information

Chapter 17: Parallel Databases

Chapter 17: Parallel Databases Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems

More information

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications September 2013 Navigating between ever-higher performance targets and strict limits

More information

Qlik Sense Enterprise architecture and scalability

Qlik Sense Enterprise architecture and scalability White Paper Qlik Sense Enterprise architecture and scalability June, 2017 qlik.com Platform Qlik Sense is an analytics platform powered by an associative, in-memory analytics engine. Based on users selections,

More information

Lecture 2: September 9

Lecture 2: September 9 CMPSCI 377 Operating Systems Fall 2010 Lecture 2: September 9 Lecturer: Prashant Shenoy TA: Antony Partensky & Tim Wood 2.1 OS & Computer Architecture The operating system is the interface between a user

More information

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID System Upgrade Teaches RAID In the growing computer industry we often find it difficult to keep track of the everyday changes in technology. At System Upgrade, Inc it is our goal and mission to provide

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

The Concurrency Viewpoint

The Concurrency Viewpoint The Concurrency Viewpoint View Relationships The Concurrency Viewpoint 2 The Concurrency Viewpoint Definition: The Concurrency Viewpoint: describes the concurrency structure of the system and maps functional

More information

SOFTWARE ARCHITECTURE FOR PARALLEL PROCESSORS W. C. Cave & R.E. Wassmer - August 17, 2016

SOFTWARE ARCHITECTURE FOR PARALLEL PROCESSORS W. C. Cave & R.E. Wassmer - August 17, 2016 SOFTWARE ARCHITECTURE FOR PARALLEL PROCESSORS W. C. Cave & R.E. Wassmer - August 17, 2016 BACKGROUND In the early Renaissance, artists sketched buildings that represented their imagined plans. Their renderings

More information

Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory - CAPSL. Introduction

Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory - CAPSL. Introduction Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory - CAPSL Introduction CPEG 852 - Spring 2014 Advanced Topics in Computing Systems Guang R. Gao ACM

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Parallelization Strategy

Parallelization Strategy COSC 6374 Parallel Computation Algorithm structure Spring 2008 Parallelization Strategy Finding Concurrency Structure the problem to expose exploitable concurrency Algorithm Structure Supporting Structure

More information

Ch 1: The Architecture Business Cycle

Ch 1: The Architecture Business Cycle Ch 1: The Architecture Business Cycle For decades, software designers have been taught to build systems based exclusively on the technical requirements. Software architecture encompasses the structures

More information

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor

More information

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE System Architecture Date: Tuesday 3rd June 2014 Time: 09:45-11:45 Please answer any THREE Questions from the FOUR questions provided Use a

More information

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off

More information

Achieving Rapid Data Recovery for IBM AIX Environments An Executive Overview of EchoStream for AIX

Achieving Rapid Data Recovery for IBM AIX Environments An Executive Overview of EchoStream for AIX Achieving Rapid Data Recovery for IBM AIX Environments An Executive Overview of EchoStream for AIX Introduction Planning for recovery is a requirement in businesses of all sizes. In implementing an operational

More information

Parallel Programming Models. Parallel Programming Models. Threads Model. Implementations 3/24/2014. Shared Memory Model (without threads)

Parallel Programming Models. Parallel Programming Models. Threads Model. Implementations 3/24/2014. Shared Memory Model (without threads) Parallel Programming Models Parallel Programming Models Shared Memory (without threads) Threads Distributed Memory / Message Passing Data Parallel Hybrid Single Program Multiple Data (SPMD) Multiple Program

More information
