MODELING SHARED- MEMORY MULTIPROCESSOR SYSTEMS WITH AADL

Size: px
Start display at page:

Download "MODELING SHARED- MEMORY MULTIPROCESSOR SYSTEMS WITH AADL"

Transcription

1 MODELING SHARED- MEMORY MULTIPROCESSOR SYSTEMS WITH AADL Stéphane Rubini, Frank Singhoff Lab-STICC, University of Western Brittany (UBO), Brest, France Contact: Pierre Dissaux Ellidiss Technologies, Brest, France Smart and Cheddar Projects Workshop, Valencia, Spain, September 29 th, 2014

2 Monday, September 29, Scheduling Analysis on Multiprocessor Systems The multiprocessor execution platform model CPU1 CPU2 CPU3 in the real-time scheduling theory and a real one on a single chip (TI OMAP 4430).

3 3 Hidden Sharing of some Resources The scheduling goal is to manage the temporal sharing of resources (typically processing unit allocations). But, hardware systems is build to hide some of them. Example of the SPEC program slowdowns when running in parallel with another program on a parallel core sharing of the L2 cache. From Christos Kozyrakis lectures, citation of DasCMP 06

4 4 Abstraction level of the execution platform What is the right level of abstraction? The usage of a complete hardware description is not tractable (cycle accurate simulation), or not available. ADL components more abstract descriptions The execution platform models must be able to express: the hardware capabilities (processing units, IO devices, ); the resource sharing.

5 5 Outline Modeling of Programmable Heterogonous Multiprocesors (PHM) to perform scheduling analysis. Processing units Identification for the allocation of tasks Shared resources Identification for the performance analysis Example IC-INT-VPX3a board modeling Conclusion

6 Resource identification Multi-processing implementations and task scheduling Processors, cores or physical threads may be seen as standard processing resources by the scheduler. 6 Task Task Task Multi-Processor Multi-Core Processor Core Core Multi-Threaded Core But the shared resources are the difference Private Caches Interconnect Network Optional Shared Caches Memory Private caches Interconnect Network Shared Caches Shared / Private Resources Shared Private Caches caches Physical Thread

7 7 Resource identification Processing units Processing unit Processing system (10 processing units)

8 Resource identification Task allocations 8 Global scheduling Partitioned scheduling app is a process, thrs an array of threads, and dedicatedthr a thread Scheduling domain

9 9 Outline Modeling of Programmable Heterogonous Systems (PHM) to perform scheduling analysis. Processing units Identification for the allocation of tasks Shared resources Identification for the performance analysis Example IC-INT-VPX3a board modeling Conclusion

10 Shared resources Hierarchical models 10 The node s children can represent the entities that interact directly with this node (shared resource). Processing units Shared resources Pj Pk Pi My SSr Mx

11 11 Shared resources Node internals and private resources Composite M y implementation Pj Pk Pi My Private processor resource Mx

12 12 Outline Modeling of Programmable Heterogonous Systems (PHM) to perform scheduling analysis. Processing units Identification for the allocation of tasks Shared resources Identification for the performance analysis Example IC-INT-VPX3a board modeling Conclusion

13 Example Board InterfaceConcept IC-INT-VPX3a Board InterfaceConcept IC-INT-VPX3a Processor Intel Core i7 2566LE: Dual cores, 2.2 GHz, 3 levels of caches Technology «Hyper Threading» two physical threads /core 13 Thread state 1 Thread state 1 Thread state 2 Thread state 2 L1 Inst Cache L1 Data Cache L1 Inst Cache L1 Data Cache L2 Cache L2 Cache L3 Cache

14 Example IC-INT-VPX3a execution platform modeling core2 14 pth1 pth2 core1 mem mem

15 15 Example New processing unit arrivals! In mode Hyper-threading pth1 pth2 pth1 pth2 mem mem mem

16 16 Outline Modeling of Programmable Heterogonous Systems (PHM) to perform scheduling analysis. Processing units Identification for the allocation of tasks Shared resources Identification for the performance analysis Example IC-INT-VPX3a board modeling Conclusion

17 17 Conclusion Modeling for scheduling analysis: Processing unit enumeration and partitioning: processor bindings The presented AADL is interpreted by AADLinspector to define the allocation of tasks on processing units. Memory hierarchy effect The cheddar tool deals with cache related preemption delay for single shared instruction cache and uniprocessor platform. Extension to multiprocessor remains to be done. Cache analysis, WCET Modeling for code generation What are the specific requirements?

18 18

19 Shared resources Hyper-threading configuration 19 Processing units activation following the user s configuration.

20 20 Other topics Memory hierarchies are complexes, but hardware entities interconnections too. Examples: SoCs contains a lot of buses with different performances Serial communications allow to multiple the numbers of connections, and their configuration IC VPX board (on 12 lanes PCIE) Need of properties to characterize processing units Capabilities (speed, operations) in Programmable Heterogeneous Multiprocessors (PHM) Implementation: operating systems don t manage similarly physical threads or cores/processors.

21 Allocation strategies System and software architecture Complete system 21 Software architecture Implicit assumption: A process represents an address space the execution platform implements the shared memory paradigm (native or rebuild through Distributed Shared Memory mechanisms)

MODELING OF MULTIPROCESSOR HARDWARE PLATFORMS FOR SCHEDULING ANALYSIS

MODELING OF MULTIPROCESSOR HARDWARE PLATFORMS FOR SCHEDULING ANALYSIS 1 MODELING OF MULTIPROCESSOR HARDWARE PLATFORMS FOR SCHEDULING ANALYSIS Stéphane Rubini, Christian Fotsing, Frank Singhoff, Hai Nam Tran Lab-STICC, University of Western Britany (UBO) Contact: Stephane.Rubini@univ-brest.fr

More information

Update on AADLInspector and Cheddar : new interface and multiprocessors analysis

Update on AADLInspector and Cheddar : new interface and multiprocessors analysis Update on AADLInspector and Cheddar : new interface and multiprocessors analysis P. Dissaux*, J. Legrand*, A. Schach*, S. Rubini+, J. Boukhobza+, L. Lemarchand+, J.P. Diguet+, N. Tran+, M. Dridi+, R. Bouaziz$,

More information

Automatically adapt Cheddar to users need

Automatically adapt Cheddar to users need Automatically adapt Cheddar to users need AADL Standards Meeting, Toulouse A. Plantec +, V. Gaudel +, S. Rubini +, F. Singhoff + P. Dissaux*, J. Legrand* + University of Brest/UBO, LISyC, France *Ellidiss

More information

AADL Subsets Annex Update

AADL Subsets Annex Update AADL Subsets Annex Update V. Gaudel, P. Dissaux, A. Plantec, F. Singhoff, J. Hugues*, J. Legrand University of Brest/UBO, Lab-Sticc, France Ellidiss Technologies, France *Institut Supérieur de l Aéronautique

More information

Automatic Selection of Feasibility Tests With the Use of AADL Design Patterns

Automatic Selection of Feasibility Tests With the Use of AADL Design Patterns Automatic Selection of Feasibility Tests With the Use of AADL Design Patterns V. Gaudel, F. Singhoff, A. Plantec, S. Rubini P. Dissaux*, J. Legrand* University of Brest/UBO, LISyC, France *Ellidiss Technologies,

More information

Executable AADL. Real Time Simulation of AADL Models. Pierre Dissaux 1, Olivier Marc 2.

Executable AADL. Real Time Simulation of AADL Models. Pierre Dissaux 1, Olivier Marc 2. Executable AADL Real Time Simulation of AADL Models Pierre Dissaux 1, Olivier Marc 2 1 Ellidiss Technologies, Brest, France. 2 Virtualys, Brest, France. pierre.dissaux@ellidiss.com olivier.marc@virtualys.com

More information

Multiprocessor scheduling, part 1 -ChallengesPontus Ekberg

Multiprocessor scheduling, part 1 -ChallengesPontus Ekberg Multiprocessor scheduling, part 1 -ChallengesPontus Ekberg 2017-10-03 What is a multiprocessor? Simplest answer: A machine with >1 processors! In scheduling theory, we include multicores in this defnition

More information

AADL performance analysis with Cheddar : a review

AADL performance analysis with Cheddar : a review AADL performance analysis with Cheddar : a review P. Dissaux*, J. Legrand*, A. Plantec+, F. Singhoff+ *Ellidiss Technologies, France +University of Brest/UBO, LISyC, France Talk overview 1. Cheddar project

More information

A DSL for AADL Subsets Specification

A DSL for AADL Subsets Specification A DSL for AADL Subsets Specification V. Gaudel, P. Dissaux*, A. Plantec, F. Singhoff, J. Hugues**, J. Legrand* University of Brest/UBO, Lab-Sticc, France *Ellidiss Technologies, France ** Institut Supérieur

More information

Modeling and verification of memory architectures with AADL and REAL

Modeling and verification of memory architectures with AADL and REAL Modeling and verification of memory architectures with AADL and REAL Stéphane Rubini, Frank Singhoff LISyC - University of Brest - UEB 20, Avenue Le Gorgeu, CS 93837 29238 Brest Cedex 3, France {stephane.rubini,frank.singhoff}@univ-brest.fr

More information

CSC501 Operating Systems Principles. OS Structure

CSC501 Operating Systems Principles. OS Structure CSC501 Operating Systems Principles OS Structure 1 Announcements q TA s office hour has changed Q Thursday 1:30pm 3:00pm, MRC-409C Q Or email: awang@ncsu.edu q From department: No audit allowed 2 Last

More information

AADL Inspector Tutorial. ACVI Workshop, Valencia September 29th, Pierre Dissaux. Ellidiss. Technologies w w w. e l l i d i s s.

AADL Inspector Tutorial. ACVI Workshop, Valencia September 29th, Pierre Dissaux. Ellidiss. Technologies w w w. e l l i d i s s. AADL Inspector Tutorial ACVI Workshop, Valencia September 29th, 2014 Pierre Dissaux Ellidiss Technologies w w w. e l l i d i s s. c o m Independent Technology Provider: Ellidiss Software w w w. e l l i

More information

Multi-core Programming Evolution

Multi-core Programming Evolution Multi-core Programming Evolution Based on slides from Intel Software ollege and Multi-ore Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts, Evolution

More information

Multi-core Computing Lecture 2

Multi-core Computing Lecture 2 Multi-core Computing Lecture 2 MADALGO Summer School 2012 Algorithms for Modern Parallel and Distributed Models Phillip B. Gibbons Intel Labs Pittsburgh August 21, 2012 Multi-core Computing Lectures: Progress-to-date

More information

Manycore Processors. Manycore Chip: A chip having many small CPUs, typically statically scheduled and 2-way superscalar or scalar.

Manycore Processors. Manycore Chip: A chip having many small CPUs, typically statically scheduled and 2-way superscalar or scalar. phi 1 Manycore Processors phi 1 Definition Manycore Chip: A chip having many small CPUs, typically statically scheduled and 2-way superscalar or scalar. Manycore Accelerator: [Definition only for this

More information

AADL committee, Valencia October 2 nd, Pierre Dissaux (Ellidiss) Maxime Perrotin (ESA)

AADL committee, Valencia October 2 nd, Pierre Dissaux (Ellidiss) Maxime Perrotin (ESA) AADL committee, Valencia October 2 nd, 2014 Pierre Dissaux (Ellidiss) Maxime Perrotin (ESA) what is TASTE? A tool-chain targeting heterogeneous, embedded systems, using a model-centric development approach

More information

Hardware/Software Codesign

Hardware/Software Codesign Hardware/Software Codesign SS 2016 Prof. Dr. Christian Plessl High-Performance IT Systems group University of Paderborn Version 2.2.0 2016-04-08 how to design a "digital TV set top box" Motivating Example

More information

NUMA-aware OpenMP Programming

NUMA-aware OpenMP Programming NUMA-aware OpenMP Programming Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group schmidl@itc.rwth-aachen.de Christian Terboven IT Center, RWTH Aachen University Deputy lead of the HPC

More information

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee

More information

Why do we care about parallel?

Why do we care about parallel? Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The

More information

Lecture 27: Multiprocessors. Today s topics: Shared memory vs message-passing Simultaneous multi-threading (SMT) GPUs

Lecture 27: Multiprocessors. Today s topics: Shared memory vs message-passing Simultaneous multi-threading (SMT) GPUs Lecture 27: Multiprocessors Today s topics: Shared memory vs message-passing Simultaneous multi-threading (SMT) GPUs 1 Shared-Memory Vs. Message-Passing Shared-memory: Well-understood programming model

More information

This is an author-deposited version published in: Eprints ID: 9287

This is an author-deposited version published in:   Eprints ID: 9287 Open Archive Toulouse Archive Ouverte (OATAO) OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible. This is an author-deposited

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Juan C. Pichel Centro de Investigación en Tecnoloxías da Información (CITIUS) Universidade de Santiago de Compostela, Spain

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 22: Direct Mapped Cache Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Intel 8-core i7-5960x 3 GHz, 8-core, 20 MB of cache, 140

More information

Outline. 1 Reiteration. 2 Cache performance optimization. 3 Bandwidth increase. 4 Reduce hit time. 5 Reduce miss penalty. 6 Reduce miss rate

Outline. 1 Reiteration. 2 Cache performance optimization. 3 Bandwidth increase. 4 Reduce hit time. 5 Reduce miss penalty. 6 Reduce miss rate Outline Lecture 7: EITF20 Computer Architecture Anders Ardö EIT Electrical and Information Technology, Lund University November 21, 2012 A. Ardö, EIT Lecture 7: EITF20 Computer Architecture November 21,

More information

Cheddar Architecture Description Language

Cheddar Architecture Description Language Cheddar Architecture Description Language Christian Fotsing, Frank Singhoff, Alain Plantec, Vincent Gaudel, Stéphane Rubini, Shuai Li, Hai Nam Tran, Laurent Lemarchand Lab-STICC/UMR CNRS 6285, University

More information

Process Synchronisation (contd.) Deadlock. Operating Systems. Spring CS5212

Process Synchronisation (contd.) Deadlock. Operating Systems. Spring CS5212 Operating Systems Spring 2009-2010 Outline Process Synchronisation (contd.) 1 Process Synchronisation (contd.) 2 Announcements Presentations: will be held on last teaching week during lectures make a 20-minute

More information

The University of Texas at Austin

The University of Texas at Austin EE382 (20): Computer Architecture - Parallelism and Locality Lecture 4 Parallelism in Hardware Mattan Erez The University of Texas at Austin EE38(20) (c) Mattan Erez 1 Outline 2 Principles of parallel

More information

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi University of Kansas 1 Why? High-Performance Multicores for Real-Time Systems

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

HW Trends and Architectures

HW Trends and Architectures Pavel Tvrdík, Jiří Kašpar (ČVUT FIT) HW Trends and Architectures MI-POA, 2011, Lecture 1 1/29 HW Trends and Architectures prof. Ing. Pavel Tvrdík CSc. Ing. Jiří Kašpar Department of Computer Systems Faculty

More information

An Autonomous Dynamically Adaptive Memory Hierarchy for Chip Multiprocessors

An Autonomous Dynamically Adaptive Memory Hierarchy for Chip Multiprocessors ACM IEEE 37 th International Symposium on Computer Architecture Elastic Cooperative Caching: An Autonomous Dynamically Adaptive Memory Hierarchy for Chip Multiprocessors Enric Herrero¹, José González²,

More information

A Study on Optimally Co-scheduling Jobs of Different Lengths on CMP

A Study on Optimally Co-scheduling Jobs of Different Lengths on CMP A Study on Optimally Co-scheduling Jobs of Different Lengths on CMP Kai Tian Kai Tian, Yunlian Jiang and Xipeng Shen Computer Science Department, College of William and Mary, Virginia, USA 5/18/2009 Cache

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,

More information

Lecture: Storage, GPUs. Topics: disks, RAID, reliability, GPUs (Appendix D, Ch 4)

Lecture: Storage, GPUs. Topics: disks, RAID, reliability, GPUs (Appendix D, Ch 4) Lecture: Storage, GPUs Topics: disks, RAID, reliability, GPUs (Appendix D, Ch 4) 1 Magnetic Disks A magnetic disk consists of 1-12 platters (metal or glass disk covered with magnetic recording material

More information

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe

More information

Presentation of the AADL: Architecture Analysis and Design Language

Presentation of the AADL: Architecture Analysis and Design Language Presentation of the AADL: Architecture Analysis and Design Language Outline 1. AADL a quick overview 2. AADL key modeling constructs 1. AADL components 2. Properties 3. Component connection 3. AADL: tool

More information

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012) Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues

More information

Programmable NICs. Lecture 14, Computer Networks (198:552)

Programmable NICs. Lecture 14, Computer Networks (198:552) Programmable NICs Lecture 14, Computer Networks (198:552) Network Interface Cards (NICs) The physical interface between a machine and the wire Life of a transmitted packet Userspace application NIC Transport

More information

Update on AADL Requirements Annex

Update on AADL Requirements Annex Open-PEOPLE Open Power and Energy Optimization PLatform and Estimator Update on AADL Requirements Annex Dominique BLOUIN* *Lab-STICC, Université de Bretagne Sud, Lorient, FRANCE AADL Standards Meeting,

More information

Topics. Digital Systems Architecture EECE EECE Need More Cache?

Topics. Digital Systems Architecture EECE EECE Need More Cache? Digital Systems Architecture EECE 33-0 EECE 9-0 Need More Cache? Dr. William H. Robinson March, 00 http://eecs.vanderbilt.edu/courses/eece33/ Topics Cache: a safe place for hiding or storing things. Webster

More information

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1) Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling

More information

ayaz ali Micro & Macro Scheduling Techniques Ayaz Ali Department of Computer Science University of Houston Houston, TX

ayaz ali Micro & Macro Scheduling Techniques Ayaz Ali Department of Computer Science University of Houston Houston, TX ayaz ali Micro & Macro Scheduling Techniques Ayaz Ali Department of Computer Science University of Houston Houston, TX 77004 ayaz@cs.uh.edu 1. INTRODUCTION Scheduling techniques has historically been one

More information

COMP Parallel Computing. SMM (1) Memory Hierarchies and Shared Memory

COMP Parallel Computing. SMM (1) Memory Hierarchies and Shared Memory COMP 633 - Parallel Computing Lecture 6 September 6, 2018 SMM (1) Memory Hierarchies and Shared Memory 1 Topics Memory systems organization caches and the memory hierarchy influence of the memory hierarchy

More information

A Memory System Design Framework: Creating Smart Memories

A Memory System Design Framework: Creating Smart Memories A Memory System Design Framework: Creating Smart Memories Amin Firoozshahian, Alex Solomatnikov Hicamp Systems Inc. Ofer Shacham, Zain Asgar, http://www.c2s2.org Stephen Richardson, Christos Kozyrakis,

More information

Outline 1 Motivation 2 Theory of a non-blocking benchmark 3 The benchmark and results 4 Future work

Outline 1 Motivation 2 Theory of a non-blocking benchmark 3 The benchmark and results 4 Future work Using Non-blocking Operations in HPC to Reduce Execution Times David Buettner, Julian Kunkel, Thomas Ludwig Euro PVM/MPI September 8th, 2009 Outline 1 Motivation 2 Theory of a non-blocking benchmark 3

More information

Predicting Program Phases and Defending against Side-Channel Attacks using Hardware Performance Counters

Predicting Program Phases and Defending against Side-Channel Attacks using Hardware Performance Counters Predicting Program Phases and Defending against Side-Channel Attacks using Hardware Performance Counters Junaid Nomani and Jakub Szefer Computer Architecture and Security Laboratory Yale University junaid.nomani@yale.edu

More information

Lab-STICC : Dominique BLOUIN Skander Turki Eric SENN Saâdia Dhouib 11/06/2009

Lab-STICC : Dominique BLOUIN Skander Turki Eric SENN Saâdia Dhouib 11/06/2009 Lab-STICC : Power Consumption Modelling with AADL Dominique BLOUIN Skander Turki Eric SENN Saâdia Dhouib 11/06/2009 Outline Context Review of power estimation methodologies and tools Functional Level Power

More information

Deterministic Memory Abstraction and Supporting Multicore System Architecture

Deterministic Memory Abstraction and Supporting Multicore System Architecture Deterministic Memory Abstraction and Supporting Multicore System Architecture Farzad Farshchi $, Prathap Kumar Valsan^, Renato Mancuso *, Heechul Yun $ $ University of Kansas, ^ Intel, * Boston University

More information

AADL resource requirements analysis with Cheddar F. Singhoff, J. Legrand, L. Nana University of Brest, France LYSIC/EA 3883

AADL resource requirements analysis with Cheddar F. Singhoff, J. Legrand, L. Nana University of Brest, France LYSIC/EA 3883 AADL resource requirements analysis with Cheddar F. Singhoff, J. Legrand, L. Nana University of Brest, France LYSIC/EA 3883 SAE AADL wg, oct.'05 1 Introduction and motivations Real time scheduling Analysis

More information

Handling cache in real-time scheduling simulation.

Handling cache in real-time scheduling simulation. Handling cache in real-time scheduling simulation. Master thesis in Information and Communication Technology TRAN HAI NAM Department of Information & Communication Technology University of Science and

More information

Windows NT Server Configuration and Tuning for Optimal SAS Server Performance

Windows NT Server Configuration and Tuning for Optimal SAS Server Performance Windows NT Server Configuration and Tuning for Optimal SAS Server Performance Susan E. Davis Compaq Computer Corp. Carl E. Ralston Compaq Computer Corp. Our Role Onsite at SAS Corporate Technology Center

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

CS4961 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/30/11. Administrative UPDATE. Mary Hall August 30, 2011

CS4961 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/30/11. Administrative UPDATE. Mary Hall August 30, 2011 CS4961 Parallel Programming Lecture 3: Introduction to Parallel Architectures Administrative UPDATE Nikhil office hours: - Monday, 2-3 PM, MEB 3115 Desk #12 - Lab hours on Tuesday afternoons during programming

More information

Context. Hardware Performance. Increasing complexity. Software Complexity. And the Result is. Embedded systems are becoming more complex every day:

Context. Hardware Performance. Increasing complexity. Software Complexity. And the Result is. Embedded systems are becoming more complex every day: Context Embedded systems are becoming more complex every day: Giorgio uttazzo g.buttazzo@sssup.it more functions higher performance higher efficiency Scuola Superiore Sant nna new hardware s Increasing

More information

Context. Giorgio Buttazzo. Scuola Superiore Sant Anna. Embedded systems are becoming more complex every day: more functions. higher performance

Context. Giorgio Buttazzo. Scuola Superiore Sant Anna. Embedded systems are becoming more complex every day: more functions. higher performance Giorgio uttazzo g.buttazzo@sssup.it Scuola Superiore Sant nna Context Embedded systems are becoming more complex every day: more functions higher performance higher efficiency new hardware platforms 2

More information

CS3350B Computer Architecture CPU Performance and Profiling

CS3350B Computer Architecture CPU Performance and Profiling CS3350B Computer Architecture CPU Performance and Profiling Marc Moreno Maza http://www.csd.uwo.ca/~moreno/cs3350_moreno/index.html Department of Computer Science University of Western Ontario, Canada

More information

Pilot: A Platform-based HW/SW Synthesis System

Pilot: A Platform-based HW/SW Synthesis System Pilot: A Platform-based HW/SW Synthesis System SOC Group, VLSI CAD Lab, UCLA Led by Jason Cong Zhong Chen, Yiping Fan, Xun Yang, Zhiru Zhang ICSOC Workshop, Beijing August 20, 2002 Outline Overview The

More information

EECS 750: Advanced Operating Systems. 01/29 /2014 Heechul Yun

EECS 750: Advanced Operating Systems. 01/29 /2014 Heechul Yun EECS 750: Advanced Operating Systems 01/29 /2014 Heechul Yun 1 Administrative Next summary assignment Resource Containers: A New Facility for Resource Management in Server Systems, OSDI 99 due by 11:59

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

CSE 392/CS 378: High-performance Computing - Principles and Practice

CSE 392/CS 378: High-performance Computing - Principles and Practice CSE 392/CS 378: High-performance Computing - Principles and Practice Parallel Computer Architectures A Conceptual Introduction for Software Developers Jim Browne browne@cs.utexas.edu Parallel Computer

More information

Multiprocessors 2007/2008

Multiprocessors 2007/2008 Multiprocessors 2007/2008 Abstractions of parallel machines Johan Lukkien 1 Overview Problem context Abstraction Operating system support Language / middleware support 2 Parallel processing Scope: several

More information

Presentation of the AADL: Architecture Analysis and Design Language

Presentation of the AADL: Architecture Analysis and Design Language Presentation of the AADL: Architecture Analysis and Design Language Outline 1. AADL a quick overview 2. AADL key modeling constructs 1. AADL components 2. Properties 3. Component connection 3. AADL: tool

More information

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence 1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations

More information

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 10: Introduction to Coherence. The Lecture Contains:

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 10: Introduction to Coherence. The Lecture Contains: The Lecture Contains: Four Organizations Hierarchical Design Cache Coherence Example What Went Wrong? Definitions Ordering Memory op Bus-based SMP s file:///d /...audhary,%20dr.%20sanjeev%20k%20aggrwal%20&%20dr.%20rajat%20moona/multi-core_architecture/lecture10/10_1.htm[6/14/2012

More information

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1 SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System

More information

Lecture 1: Parallel Architecture Intro

Lecture 1: Parallel Architecture Intro Lecture 1: Parallel Architecture Intro Course organization: ~13 lectures based on textbook ~10 lectures on recent papers ~5 lectures on parallel algorithms and multi-thread programming New topics: interconnection

More information

AADL : about code generation

AADL : about code generation AADL : about code generation AADL objectives AADL requirements document (SAE ARD 5296) Analysis and Generation of systems Generation can encompasses many dimensions 1. Generation of skeletons from AADL

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

Design Principles for End-to-End Multicore Schedulers

Design Principles for End-to-End Multicore Schedulers c Systems Group Department of Computer Science ETH Zürich HotPar 10 Design Principles for End-to-End Multicore Schedulers Simon Peter Adrian Schüpbach Paul Barham Andrew Baumann Rebecca Isaacs Tim Harris

More information

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010 Parallelizing FPGA Technology Mapping using GPUs Doris Chen Deshanand Singh Aug 31 st, 2010 Motivation: Compile Time In last 12 years: 110x increase in FPGA Logic, 23x increase in CPU speed, 4.8x gap Question:

More information

Comparing Memory Systems for Chip Multiprocessors

Comparing Memory Systems for Chip Multiprocessors Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 12 Mahadevan Gomathisankaran March 4, 2010 03/04/2010 Lecture 12 CSCE 4610/5610 1 Discussion: Assignment 2 03/04/2010 Lecture 12 CSCE 4610/5610 2 Increasing Fetch

More information

Cache Memories. From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6.

Cache Memories. From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6. Cache Memories From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6. Today Cache memory organization and operation Performance impact of caches The memory mountain Rearranging

More information

Fall 2015 COMP Operating Systems. Lab 06

Fall 2015 COMP Operating Systems. Lab 06 Fall 2015 COMP 3511 Operating Systems Lab 06 Outline Monitor Deadlocks Logical vs. Physical Address Space Segmentation Example of segmentation scheme Paging Example of paging scheme Paging-Segmentation

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 24: Cache Performance Analysis Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Last time: Associative caches How do we

More information

LECTURE 11. Memory Hierarchy

LECTURE 11. Memory Hierarchy LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed

More information

SRM-Buffer: An OS Buffer Management Technique to Prevent Last Level Cache from Thrashing in Multicores

SRM-Buffer: An OS Buffer Management Technique to Prevent Last Level Cache from Thrashing in Multicores SRM-Buffer: An OS Buffer Management Technique to Prevent Last Level Cache from Thrashing in Multicores Xiaoning Ding et al. EuroSys 09 Presented by Kaige Yan 1 Introduction Background SRM buffer design

More information

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Advanced Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

Exploration of Cache Coherent CPU- FPGA Heterogeneous System

Exploration of Cache Coherent CPU- FPGA Heterogeneous System Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based

More information

Lecture 19: Memory Hierarchy Five Ways to Reduce Miss Penalty (Second Level Cache) Admin

Lecture 19: Memory Hierarchy Five Ways to Reduce Miss Penalty (Second Level Cache) Admin Lecture 19: Memory Hierarchy Five Ways to Reduce Miss Penalty (Second Level Cache) Professor Alvin R. Lebeck Computer Science 220 Fall 1999 Exam Average 76 90-100 4 80-89 3 70-79 3 60-69 5 < 60 1 Admin

More information

The TASTE MBE development toolchain - update & case-studies

The TASTE MBE development toolchain - update & case-studies The TASTE MBE development toolchain - update & case-studies Julien Delange 18/10/2010 Agenda 1. Overview of the TASTE environment 2. Latest improvements 3. Ongoing projects, conclusion TASTE update & case-studies

More information

CMSC 611: Advanced. Parallel Systems

CMSC 611: Advanced. Parallel Systems CMSC 611: Advanced Computer Architecture Parallel Systems Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems

More information

Compositional Schedulability Analysis of Hierarchical Real-Time Systems

Compositional Schedulability Analysis of Hierarchical Real-Time Systems Compositional Schedulability Analysis of Hierarchical Real-Time Systems Arvind Easwaran, Insup Lee, Insik Shin, and Oleg Sokolsky Department of Computer and Information Science University of Pennsylvania,

More information

Computers and Microprocessors. Lecture 34 PHYS3360/AEP3630

Computers and Microprocessors. Lecture 34 PHYS3360/AEP3630 Computers and Microprocessors Lecture 34 PHYS3360/AEP3630 1 Contents Computer architecture / experiment control Microprocessor organization Basic computer components Memory modes for x86 series of microprocessors

More information

Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW

Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW 2 SO FAR talked about in-kernel building blocks: threads memory IPC drivers

More information

Memory. Lecture 2: different memory and variable types. Memory Hierarchy. CPU Memory Hierarchy. Main memory

Memory. Lecture 2: different memory and variable types. Memory Hierarchy. CPU Memory Hierarchy. Main memory Memory Lecture 2: different memory and variable types Prof. Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford e-research Centre Key challenge in modern computer architecture

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Marble House The Knife (Silent Shout) Before starting The Knife, we were working

More information

CS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics

CS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics CS4230 Parallel Programming Lecture 3: Introduction to Parallel Architectures Mary Hall August 28, 2012 Homework 1: Parallel Programming Basics Due before class, Thursday, August 30 Turn in electronically

More information

ARTIST-Relevant Research from Linköping

ARTIST-Relevant Research from Linköping ARTIST-Relevant Research from Linköping Department of Computer and Information Science (IDA) Linköping University http://www.ida.liu.se/~eslab/ 1 Outline Communication-Intensive Real-Time Systems Timing

More information

The Ocarina Tool Suite. Thomas Vergnaud

The Ocarina Tool Suite. Thomas Vergnaud The Ocarina Tool Suite Motivation 2 ENST is developing a middleware architecture: PolyORB generic, configurable, interoperable enables middleware verification create a tool chain

More information

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Today Non-Uniform

More information

Lecture 2: different memory and variable types

Lecture 2: different memory and variable types Lecture 2: different memory and variable types Prof. Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford e-research Centre Lecture 2 p. 1 Memory Key challenge in modern

More information

Embedded Systems Programming

Embedded Systems Programming Embedded Systems Programming Introduction (Module 1) Yann-Hang Lee Arizona State University yhlee@asu.edu (480) 727-7507 Summer 2014 Course Syllabus Course Goals: fundamental issues as well as practical

More information

Advanced Processor Architecture

Advanced Processor Architecture Advanced Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong

More information

Parallel Programming Multicore systems

Parallel Programming Multicore systems FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have

More information

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Systems Group Department of Computer Science ETH Zürich SMP architecture

More information