On-Chip Interconnect Implications of Shared Memory Multicores


Srini Devadas, Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology

Programming 1000 Cores
MPI has been used to exploit large-scale parallelism in some applications (e.g., 3D rendering), but it requires individual tasks to be large and becomes difficult to apply at a fine-grained level. Paradigms such as MapReduce and shard-based databases have been successful in particular application domains. A shared memory abstraction is required for general-purpose programming and for running an operating system.

The Problem
Will familiar shared memory programming models be feasible at 1000 cores? Two sets of challenges stand in the way: cache coherence and the interconnection network.

Cache coherence problem: existing full-map directory-based protocols do not scale. Their area overhead is O(N²) for N cores (each directory entry has N sharer bits, and the number of entries also grows with N), their energy overhead is proportional to that area, and frequent invalidations create hotspots on the network.

Interconnection network problem: existing meshes and rings are unlikely to scale. They are energy-inefficient because messages traverse multiple routers and links, and latency from one core to another is high.

Directory Cache Coherence: Background
Directory-based protocols need to keep track of which cores are sharing each cache block. Full-map directories maintain one bit for every possible sharer. On an exclusive request, the baseline protocol requires invalidating all read copies (potentially at every core) and collecting acknowledgements (potentially from every core).

Full-Map Directories
A full-map directory maintains a bit for every possible sharer. For 1000 cores, a naïve implementation needs a 1000-bit vector for each 512-bit cache block, so the directory state is nearly twice the size of the data it tracks. Full-map directories consume too much area.
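The area argument can be checked with a little arithmetic. The minimal Python sketch below compares per-block directory storage for a full-map bit vector against k hardware pointers of ceil(log2 N) bits each. It assumes one directory entry per 512-bit block, as in the naïve implementation, and ignores state bits; the function names are hypothetical.

```python
def pointer_bits(num_cores: int) -> int:
    """Bits needed to name any one of num_cores cores: ceil(log2(num_cores))."""
    return max(1, (num_cores - 1).bit_length())

def full_map_bits(num_cores: int) -> int:
    """Full-map directory: one presence bit per possible sharer."""
    return num_cores

def limited_bits(num_cores: int, k: int) -> int:
    """Limited directory: k hardware pointers (state/overflow bits ignored)."""
    return k * pointer_bits(num_cores)

def chip_directory_bits(num_cores: int, blocks_per_core: int) -> int:
    """Total full-map storage: N x blocks entries, N bits each, hence O(N^2)."""
    return num_cores * blocks_per_core * full_map_bits(num_cores)

BLOCK_BITS = 512
for n in (64, 1000):
    print(f"{n} cores: full-map {full_map_bits(n) / BLOCK_BITS:.1%}, "
          f"3 pointers {limited_bits(n, 3) / BLOCK_BITS:.1%} of each block")
```

For 1000 cores this gives roughly 195% overhead for the full-map vector versus under 6% for three 10-bit pointers, and doubling the core count quadruples the total full-map storage.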

Limited Directories
Limited directories keep a limited number (k) of hardware pointers in the sharer list, so each directory entry holds an address, a state, and Sharer 1 through Sharer k.

Dir(k)B allows an unlimited number of sharers, but if the number of sharers exceeds k, it uses a broadcast invalidate on an exclusive request. This requires ACKs from ALL cores: once there are more than k sharers, we broadcast to 1000 cores and wait for acknowledgements from 1000 cores. The interconnection network must handle this traffic efficiently, or else performance suffers.
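The Dir(k)B fallback can be sketched as a small state machine for one directory entry. This is a minimal illustration, not a hardware description: the field and method names are hypothetical, and the sketch assumes the writing core is not counted among the cores that must ACK.

```python
from dataclasses import dataclass, field

@dataclass
class DirKBEntry:
    """Sketch of one Dir(k)B directory entry (field names are hypothetical)."""
    k: int                                        # hardware sharer pointers
    sharers: list = field(default_factory=list)   # at most k tracked cores
    overflow: bool = False                        # more than k sharers seen

    def add_sharer(self, core: int) -> None:
        """Record a read sharer; on pointer overflow, stop tracking identities."""
        if core in self.sharers:
            return
        if len(self.sharers) < self.k:
            self.sharers.append(core)
        else:
            self.overflow = True

    def exclusive_request(self, requester: int, num_cores: int):
        """Return (cores to invalidate, ACKs to wait for) for a write by requester."""
        if self.overflow:
            # Sharer identities are lost: broadcast, and every other core must ACK.
            targets = [c for c in range(num_cores) if c != requester]
        else:
            targets = [c for c in self.sharers if c != requester]
        # Afterwards only the requester holds the block.
        self.sharers, self.overflow = [requester], False
        return targets, len(targets)

entry = DirKBEntry(k=3)
for core in range(1, 6):          # five readers overflow three pointers
    entry.add_sharer(core)
targets, acks = entry.exclusive_request(requester=0, num_cores=1000)
print(acks)  # 999
```

Five sharers are enough to push a 1000-core chip into waiting for 999 acknowledgements, which is exactly the M-to-1 traffic the network has to absorb.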

1-to-M (Broadcast/Multicast) and M-to-1 (ACK) Occurrence
[Figure: how often 1-to-M and M-to-1 communication occurs under AMD HyperTransport and Token Coherence, from 64-core full-system simulations; extracted bar values: 14%, 14%, 2%, 51%, 71%, 47%.]

Why Are 1-to-M and M-to-1 Bad?
They increase bandwidth consumption, by up to M times. They increase network contention: M messages cross the source or destination links, and there are more packets in the network (multicasts for 1-to-M, ACKs for M-to-1), which gets worse as M grows. They increase power consumption. And the traffic is bursty rather than sustained. Can we handle it with an efficient network design?
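To see where the M messages pile up, the following minimal sketch assumes a 2-D mesh with dimension-ordered (X then Y) routing and sends each of the M messages as a separate unicast, counting how many messages cross each directed link. The mesh size, the routing, and the helper names are assumptions for illustration, not the simulated configuration.

```python
from collections import Counter

def xy_route(src, dst):
    """Directed links on the dimension-ordered (X then Y) path from src to dst."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def unicast_load(width, hub, outbound=True):
    """Per-link message counts when one node talks to all others via unicasts.

    outbound=True models 1-to-M (hub sends to every node);
    outbound=False models M-to-1 (every node sends an ACK to hub).
    """
    load = Counter()
    for x in range(width):
        for y in range(width):
            node = (x, y)
            if node == hub:
                continue
            path = xy_route(hub, node) if outbound else xy_route(node, hub)
            load.update(path)
    return load

bcast = unicast_load(4, (0, 0))
acks = unicast_load(4, (0, 0), outbound=False)
print(sum(bcast.values()), max(bcast.values()))  # 48 12
print(sum(acks.values()), max(acks.values()))    # 48 12
```

On a 4x4 mesh with the hub in a corner (M = 15), both the broadcast and the ACK collection make 48 link traversals, and the busiest link next to the hub carries 12 of the 15 messages.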

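Problem: How to Route Broadcasts?
[Figure: a sparse multicast tree versus a broadcast/dense multicast tree.] The multicast tree is constructed dynamically based on the destination set. For a broadcast the destination set is always the same (all nodes), so the tree always has the same structure: its links see contention while the remaining links sit idle.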
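One way to build such a tree is to take the union of the dimension-ordered (X then Y) paths from the source to each destination, so each link is used at most once per multicast. The sketch below assumes a 4x4 mesh and uses hypothetical helper names; it shows that a broadcast always yields the same 15-link tree and leaves 33 of the mesh's 48 directed links idle, while a sparse two-destination multicast needs only 6 links.

```python
def xy_route(src, dst):
    """Directed links on the dimension-ordered (X then Y) path from src to dst."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def multicast_tree(src, dests):
    """Union of X-then-Y paths: each directed link appears once per multicast."""
    tree = set()
    for dst in dests:
        if dst != src:
            tree.update(xy_route(src, dst))
    return tree

def mesh_links(width):
    """All directed links between neighbouring nodes of a width x width mesh."""
    links = set()
    for x in range(width):
        for y in range(width):
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < width and 0 <= ny < width:
                    links.add(((x, y), (nx, ny)))
    return links

nodes = [(x, y) for x in range(4) for y in range(4)]
broadcast = multicast_tree((0, 0), nodes)
sparse = multicast_tree((0, 0), [(3, 0), (3, 3)])
print(len(broadcast), len(mesh_links(4) - broadcast), len(sparse))  # 15 33 6
```

Because the broadcast tree is deterministic, back-to-back broadcasts from the same source compete for the same 15 links while the other 33 carry nothing.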

Network-Centric Approaches
The ATAC architecture and the ACKwise protocol leverage photonics (Agarwal group). They assume a fast, energy-efficient optical network that enables energy-efficient broadcasts and long-distance messages.

ATAC from 10,000 Feet
ATAC is a tiled multicore processor with an optical network overlay: a 2-D array of simple cores connected by an electrical mesh network, plus an optical WDM interconnect. The electrical mesh provides efficient short-range communication, while the optical overlay provides fast broadcast and long-distance communication.
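The division of labour between the two networks can be expressed as a simple selection rule. The sketch below is hypothetical: the hop threshold, the `choose_network` name, and the rule that any multi-destination message goes optical are illustrative assumptions, not ATAC's documented routing policy.

```python
from enum import Enum

class Net(Enum):
    ENET = "electrical mesh"   # efficient short-range communication
    ONET = "optical overlay"   # fast broadcast and long-distance communication

HOP_THRESHOLD = 4  # hypothetical cut-over distance in mesh hops

def choose_network(src, dests, threshold=HOP_THRESHOLD):
    """Pick a network for a message from src to a non-empty list of (x, y) dests."""
    if len(dests) > 1:
        return Net.ONET            # broadcasts/multicasts use the optical overlay
    (dst,) = dests
    hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])  # Manhattan distance on the mesh
    return Net.ENET if hops <= threshold else Net.ONET

print(choose_network((0, 0), [(1, 1)]).value)          # electrical mesh
print(choose_network((0, 0), [(7, 7)]).value)          # optical overlay
print(choose_network((0, 0), [(1, 0), (2, 0)]).value)  # optical overlay
```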

ACKwise Protocol
ACKwise is an extension of the Dir(k)B protocol, designed to leverage the ATAC broadcast network. Each directory entry holds the address, the state, a Global bit, and k sharer pointers. In an ACKwise(3) entry for a block shared by Core-A, Core-B, and Core-C, the Global bit is false and the three pointers name those cores.

When the number of sharers exceeds the number of hardware pointers (k), ACKwise sets the Global bit and tracks the number of sharers instead (e.g., an ACKwise(3) entry with Global = true and a sharer count of 4). If the number of sharers exceeds k, it uses a broadcast invalidate on an exclusive request, but it requires ACKs from ONLY the actual sharers, since it knows how many to expect.
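The difference from Dir(k)B is that the entry keeps counting sharers after the pointers run out, so a broadcast invalidate waits only for the real sharers' ACKs. A minimal sketch, assuming the directory is told about every new sharer (so `add_sharer` is never called twice for the same core) and that the requester is not itself a current sharer; field and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ACKwiseEntry:
    """Sketch of one ACKwise(k) directory entry (field names are hypothetical)."""
    k: int                                        # hardware sharer pointers
    sharers: list = field(default_factory=list)   # identities while not global
    global_bit: bool = False                      # True once sharers exceed k
    num_sharers: int = 0                          # count kept in global mode

    def add_sharer(self, core: int) -> None:
        """Record a new read sharer (assumed not already tracked)."""
        self.num_sharers += 1
        if not self.global_bit and len(self.sharers) < self.k:
            self.sharers.append(core)
        else:
            # Pointer storage is reused to hold the sharer count.
            self.global_bit = True
            self.sharers = []

    def exclusive_request(self, requester: int, num_cores: int):
        """Return (cores to invalidate, ACKs to wait for) for a write by requester."""
        if self.global_bit:
            # Broadcast the invalidate; non-sharers drop it, only sharers ACK.
            targets = [c for c in range(num_cores) if c != requester]
            acks = self.num_sharers
        else:
            targets = list(self.sharers)
            acks = len(targets)
        self.sharers, self.global_bit, self.num_sharers = [requester], False, 1
        return targets, acks

entry = ACKwiseEntry(k=3)
for core in range(1, 6):
    entry.add_sharer(core)
targets, acks = entry.exclusive_request(requester=0, num_cores=1000)
print(len(targets), acks)  # 999 5
```

The broadcast still reaches all 999 other cores, but the directory waits for 5 ACKs instead of 999, which removes most of the M-to-1 burst.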

ATAC Architecture Details
[Figure: (a) 64 optically connected clusters on the ONet; (b) the electrical networks (ENet, StarNet, hub) connecting the 16 cores within a cluster, for 64 x 16 = 1024 cores in total.]
Takeaway: an optical network is necessary but not sufficient for efficient coherence and high performance.

Evaluation Requires a New Toolflow
[Figure: evaluation toolflow.] Inputs are the benchmark, cache models, network models, electrical technology parameters, and optical technology parameters. The Graphite simulator reports completion time along with cache counters, electrical router and link counters, and optical link counters. These counters feed McPAT (cache energy and area), a modified Orion 2.0 (electrical router and link energy and area), and optical models (optical link energy and area), which produce the outputs.
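The key step in this toolflow is turning activity counters into energy. A minimal sketch of that bookkeeping, assuming per-event energies and a leakage power are already available from the models; the event names and all numbers are hypothetical placeholders, not McPAT or Orion outputs.

```python
def total_energy_pj(counters, energy_per_event_pj, leakage_mw, completion_time_ns):
    """Dynamic energy (event counts x per-event energy) plus leakage over the run."""
    dynamic = sum(count * energy_per_event_pj[event]
                  for event, count in counters.items())
    static = leakage_mw * completion_time_ns   # 1 mW x 1 ns = 1 pJ
    return dynamic + static

# Hypothetical counters from a simulated run and hypothetical per-event energies.
counters = {"cache_access": 1000, "router_traversal": 500, "optical_link_traversal": 800}
energy_per_event_pj = {"cache_access": 10.0, "router_traversal": 2.0,
                       "optical_link_traversal": 1.0}
print(total_energy_pj(counters, energy_per_event_pj, leakage_mw=5.0,
                      completion_time_ns=1000.0))  # 16800.0
```

Completion time enters twice: directly as a performance result and indirectly through leakage energy, which is why the simulator's timing output feeds the energy totals.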

Network-Centric Approaches (continued)
Beyond ATAC and ACKwise, a second approach is directoryless coherence via execution migration: migrate threads, as opposed to migrating data, for faster data access. This approach requires a high-bandwidth interconnect network.

Execution Migration Machine (EM²)
EM² performs no data replication and uses a Remote Access (RA) memory organization. The idea is to send the thread to the data on the first access to a remote address: the hardware moves the context (register file, etc.) to the core where the data lives, possibly evicting a context currently at the destination, and executes in its place. Migration happens entirely at the hardware level for speed. This avoids directories and directory protocols but poses challenges in interconnect design.
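The migrate-on-access idea can be sketched as a walk over one thread's access trace. This is a minimal illustration under stated assumptions: the line-striped `home_core` mapping and `LINE_BYTES` are hypothetical, and evicting a context already resident at the destination is not modelled.

```python
LINE_BYTES = 64  # hypothetical: cache lines are striped across cores

def home_core(addr: int, num_cores: int) -> int:
    """Hypothetical address-to-core mapping: each line lives at exactly one core."""
    return (addr // LINE_BYTES) % num_cores

def run_trace(start_core: int, addrs, num_cores: int):
    """Follow one thread through an access trace; return (final core, migrations).

    Every access is served locally: if the data is homed elsewhere, the
    thread's context moves there first (no data is replicated).
    """
    core, migrations = start_core, 0
    for addr in addrs:
        home = home_core(addr, num_cores)
        if home != core:
            core = home        # migrate the context (register file etc.) to the data
            migrations += 1
    return core, migrations

print(run_trace(0, [0, 8, 64, 72, 0, 192], num_cores=4))  # (3, 3)
```

Consecutive accesses to the same home core (addresses 64 and 72 here) cost one migration, which is where execution migration gains over fetching each word remotely.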

EM² Pluses and Minuses
+ One-way data access through thread migration. The core miss and the remote destination core are determined in parallel with the L1 lookup.
+ No replication across on-chip caches, which lowers off-chip memory access rates in comparison to DirCC (directory-based cache coherence).
+ No directories, broadcasts, or multicasts are required.
- The context size of 2-4 Kb is significantly greater than 1 word (RA), and greater than the 512-bit cache block size (directory protocols).
- Contention, especially for reads of shared data (even read-only data is not replicated).
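The context-size cost can be put in traffic terms with a back-of-the-envelope calculation. The sketch below, assuming hypothetical 64-bit addresses and words and taking 2 Kb as 2048 bits, asks how many consecutive reads at one remote core a single migration must amortize before it moves no more payload bits than remote-access round trips. It ignores packet headers, the thread's eventual migration away, and latency, where one-way migration has its real advantage.

```python
import math

def migration_break_even(context_bits: int, word_bits: int = 64, addr_bits: int = 64) -> int:
    """Consecutive reads at one remote core after which a single one-way
    context migration moves no more payload bits than remote-access round trips."""
    ra_bits = addr_bits + word_bits        # request address out, one data word back
    return math.ceil(context_bits / ra_bits)

for ctx in (2048, 4096):                   # 2-4 Kb context sizes from the slide
    print(ctx, migration_break_even(ctx))  # 2048 16, then 4096 32
```

A 2-4 Kb context is also 4 to 8 times the 512-bit block that a directory protocol moves per miss, which is why EM² leans so heavily on network bandwidth.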

AML versus Network Bandwidth
[Figure: average memory latency (AML) as a function of network bandwidth.] EM² needs a high-bandwidth, low-contention network to be competitive.
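A simple analytical model shows why bandwidth matters: each migration must serialize the whole context across a link, and that serialization term shrinks as links widen. The sketch below uses hypothetical values (2-cycle local access, 10% of accesses migrating, 10 hops at 2 cycles per hop, a 2048-bit context) and ignores contention, which would add queueing delay on top.

```python
import math

def migration_latency(hops, cycles_per_hop, context_bits, link_bits_per_cycle):
    """Pipeline latency across the network plus serialization of the context."""
    return hops * cycles_per_hop + math.ceil(context_bits / link_bits_per_cycle)

def average_memory_latency(local_cycles, migration_rate, **migration_params):
    """AML = local access cost + (fraction of accesses that migrate) x migration cost."""
    return local_cycles + migration_rate * migration_latency(**migration_params)

for width in (32, 64, 128, 256):   # hypothetical link widths in bits per cycle
    aml = average_memory_latency(2, 0.1, hops=10, cycles_per_hop=2,
                                 context_bits=2048, link_bits_per_cycle=width)
    print(width, round(aml, 2))
```

Going from 32-bit to 256-bit links cuts the modelled AML from 10.4 to 4.8 cycles, almost entirely by shrinking the serialization term.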

Summary
Network design and implementation will be hugely important in providing shared memory abstractions for multicores, regardless of the particular approach used!


Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold Farrukh Hijaz, Omer Khan University of Connecticut Power Efficiency Performance/Watt Multicores enable efficiency Power-performance

More information

The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith

The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith Review Introduction Optimizing the OS based on hardware Processor changes Shared Memory vs

More information

Derivation of an Analytical Model for Evaluating the Performance of a Multi- Queue Nodes Network Router

Derivation of an Analytical Model for Evaluating the Performance of a Multi- Queue Nodes Network Router Derivation of an Analytical Model for Evaluating the Perforance of a Multi- Queue Nodes Network Router 1 Hussein Al-Bahadili, 1 Jafar Ababneh, and 2 Fadi Thabtah 1 Coputer Inforation Systes Faculty of

More information

Near Light Correction for Image Relighting and 3D Shape Recovery

Near Light Correction for Image Relighting and 3D Shape Recovery Near Light Correction for Iage Relighting and 3D Shae Recovery Anonyous for Review Abstract In this aer, we roose a near-light illuination odel for iage relighting and 3D shae recovery Classic ethods such

More information

Automated Installation Verification of COMSOL via LiveLink for MATLAB

Automated Installation Verification of COMSOL via LiveLink for MATLAB Autoated Installation Verification of COMSOL via LiveLink for MATLAB Michael W. Crowell Oak Ridge National Laboratory, PO Bo 2008 MS6423, Oak Ridge, TN 37831 crowellw@ornl.gov Abstract: Verifying that

More information

A Novel Fast Constructive Algorithm for Neural Classifier

A Novel Fast Constructive Algorithm for Neural Classifier A Novel Fast Constructive Algorith for Neural Classifier Xudong Jiang Centre for Signal Processing, School of Electrical and Electronic Engineering Nanyang Technological University Nanyang Avenue, Singapore

More information

Structural Balance in Networks. An Optimizational Approach. Andrej Mrvar. Faculty of Social Sciences. University of Ljubljana. Kardeljeva pl.

Structural Balance in Networks. An Optimizational Approach. Andrej Mrvar. Faculty of Social Sciences. University of Ljubljana. Kardeljeva pl. Structural Balance in Networks An Optiizational Approach Andrej Mrvar Faculty of Social Sciences University of Ljubljana Kardeljeva pl. 5 61109 Ljubljana March 23 1994 Contents 1 Balanced and clusterable

More information

A Low-Cost Multi-Failure Resilient Replication Scheme for High Data Availability in Cloud Storage

A Low-Cost Multi-Failure Resilient Replication Scheme for High Data Availability in Cloud Storage 216 IEEE 23rd International Conference on High Perforance Coputing A Low-Cost Multi-Failure Resilient Replication Schee for High Data Availability in Cloud Storage Jinwei Liu* and Haiying Shen *Departent

More information

A Case for Fine-Grain Adaptive Cache Coherence George Kurian, Omer Khan, and Srinivas Devadas

A Case for Fine-Grain Adaptive Cache Coherence George Kurian, Omer Khan, and Srinivas Devadas Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2012-012 May 22, 2012 A Case for Fine-Grain Adaptive Cache Coherence George Kurian, Omer Khan, and Srinivas Devadas

More information

A Generic Architecture for Programmable Trac. Shaper for High Speed Networks. Krishnan K. Kailas y Ashok K. Agrawala z. fkrish,

A Generic Architecture for Programmable Trac. Shaper for High Speed Networks. Krishnan K. Kailas y Ashok K. Agrawala z. fkrish, A Generic Architecture for Prograable Trac Shaper for High Speed Networks Krishnan K. Kailas y Ashok K. Agrawala z fkrish, agrawalag@cs.ud.edu y Departent of Electrical Engineering z Departent of Coputer

More information

Redundancy Level Impact of the Mean Time to Failure on Wireless Sensor Network

Redundancy Level Impact of the Mean Time to Failure on Wireless Sensor Network (IJACSA) International Journal of Advanced Coputer Science and Applications Vol. 8, No. 1, 217 Redundancy Level Ipact of the Mean Tie to Failure on Wireless Sensor Network Alaa E. S. Ahed 1 College of

More information

Joint Measurement- and Traffic Descriptor-based Admission Control at Real-Time Traffic Aggregation Points

Joint Measurement- and Traffic Descriptor-based Admission Control at Real-Time Traffic Aggregation Points Joint Measureent- and Traffic Descriptor-based Adission Control at Real-Tie Traffic Aggregation Points Stylianos Georgoulas, Panos Triintzios and George Pavlou Centre for Counication Systes Research, University

More information

Control Message Reduction Techniques in Backward Learning Ad Hoc Routing Protocols

Control Message Reduction Techniques in Backward Learning Ad Hoc Routing Protocols Control Message Reduction Techniques in Backward Learning Ad Hoc Routing Protocols Navodaya Garepalli Kartik Gopalan Ping Yang Coputer Science, Binghaton University (State University of New York) Contact:

More information

A Network-based Seamless Handover Scheme for Multi-homed Devices

A Network-based Seamless Handover Scheme for Multi-homed Devices A Network-based Sealess Handover Schee for Multi-hoed Devices Md. Shohrab Hossain and Mohaed Atiquzzaan School of Coputer Science, University of Oklahoa, Noran, OK 7319 Eail: {shohrab, atiq}@ou.edu Abstract

More information

1 P a g e. F x,x...,x,.,.' written as F D, is the same.

1 P a g e. F x,x...,x,.,.' written as F D, is the same. 11. The security syste at an IT office is coposed of 10 coputers of which exactly four are working. To check whether the syste is functional, the officials inspect four of the coputers picked at rando

More information

Data-driven Hybrid Caching in Hierarchical Edge Cache Networks

Data-driven Hybrid Caching in Hierarchical Edge Cache Networks Data-driven Hybrid Caching in Hierarchical Edge Cache Networks Abstract Hierarchical cache networks are increasingly deployed to facilitate high-throughput and low-latency content delivery to end users.

More information

Efficient file search in non-dht P2P networks

Efficient file search in non-dht P2P networks Available online at www.sciencedirect.co Coputer Counications 3 (28) 34 37 www.elsevier.co/locate/coco Efficient file search in non-dht P2P networks Shiping Chen a, Zhan Zhang b, *, Shigang Chen b, Baile

More information

MAC schemes - Fixed-assignment schemes

MAC schemes - Fixed-assignment schemes MAC schees - Fixed-assignent schees M. Veeraraghavan, April 6, 04 Mediu Access Control (MAC) schees are echaniss for sharing a single link. MAC schees are essentially ultiplexing schees. For exaple, on

More information

Analysis of a Biologically-Inspired System for Real-time Object Recognition

Analysis of a Biologically-Inspired System for Real-time Object Recognition Cognitive Science Online, Vol.3.,.-4, 5 htt://cogsci-online.ucsd.edu Analysis of a Biologically-Insired Syste for Real-tie Object Recognition Erik Murhy-Chutorian,*, Sarah Aboutalib & Jochen Triesch,3

More information

43. Log-structured File Systems

43. Log-structured File Systems 43. Log-structured File Systems Oerating System: Three Easy Pieces AOS@UC 1 LFS: Log-structured File System Proosed by Stanford back in 91 Motivated by: w DRAM Memory sizes where growing. w Large ga between

More information

Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues

Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues Mapping Data in Peer-to-Peer Systes: Seantics and Algorithic Issues Anastasios Keentsietsidis Marcelo Arenas Renée J. Miller Departent of Coputer Science University of Toronto {tasos,arenas,iller}@cs.toronto.edu

More information

Foundations of Computer Systems

Foundations of Computer Systems 18-600 Foundations of Computer Systems Lecture 21: Multicore Cache Coherence John P. Shen & Zhiyi Yu November 14, 2016 Prevalence of multicore processors: 2006: 75% for desktops, 85% for servers 2007:

More information

QUERY ROUTING OPTIMIZATION IN SENSOR COMMUNICATION NETWORKS

QUERY ROUTING OPTIMIZATION IN SENSOR COMMUNICATION NETWORKS QUERY ROUTING OPTIMIZATION IN SENSOR COMMUNICATION NETWORKS Guofei Jiang and George Cybenko Institute for Security Technology Studies and Thayer School of Engineering Dartouth College, Hanover NH 03755

More information

Flynn s Classification

Flynn s Classification Flynn s Classification SISD (Single Instruction Single Data) Uniprocessors MISD (Multiple Instruction Single Data) No machine is built yet for this type SIMD (Single Instruction Multiple Data) Examples:

More information

On the Accuracy of MANET Simulators

On the Accuracy of MANET Simulators On the ccuracy of MNT Siulators avid avin david.cavin@epfl.ch Yoav Sasson yoav.sasson@epfl.ch istributed Systes Laboratory cole Polytechnique Fédérale de Lausanne (PFL) H-115 Lausanne ndré Schiper andre.schiper@epfl.ch

More information

An Ensemble of Adaptive Neuro-Fuzzy Kohonen Networks for Online Data Stream Fuzzy Clustering

An Ensemble of Adaptive Neuro-Fuzzy Kohonen Networks for Online Data Stream Fuzzy Clustering An Enseble of Adative euro-fuzzy Kohonen etworks for Online Data Strea Fuzzy Clustering Zhengbing Hu School of Educational Inforation Technology Central China oral University Wuhan China Eail: hzb@ail.ccnu.edu.cn

More information

An Adaptive Low-latency Power Management Protocol for Wireless Sensor Networks

An Adaptive Low-latency Power Management Protocol for Wireless Sensor Networks An Adaptive Low-latency Power Manageent Protocol for Wireless Sensor Networks Giuseppe Anastasi, Marco Conti*, Mario Di Francesco, Andrea Passarella* Pervasive Coputing & Networking Lab. (PerLab) Departent

More information

Storing and Accessing Live Mashup Content in the Cloud

Storing and Accessing Live Mashup Content in the Cloud Storing and Accessing Live ashup Content in the Cloud Krzysztof Ostrowski Cornell University Ithaca, NY 14853, USA krzys@cs.cornell.edu Ken Biran Cornell University Ithaca, NY 14853, USA ken@cs.cornell.edu

More information