On-Chip Interconnect Implications of Shared Memory Multicores
1 On-Chip Interconnect Implications of Shared Memory Multicores. Srini Devadas, Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology
2 Programming 1000 cores: MPI has been used to exploit large-scale parallelism in some applications (e.g., 3D rendering). Requires individual tasks to be large; becomes difficult to apply at a fine-grained level. Paradigms such as MapReduce and shard-based databases have been successful in particular application domains. A shared memory abstraction is required for general-purpose programming and running an operating system.
3 The Problem: Will familiar shared memory programming models be feasible at 1000 cores?
4 The Problem: Will familiar shared memory programming models be feasible at 1000 cores? Cache Coherence Challenges. Interconnection Network Challenges.
5 The Problem: Will familiar shared memory programming models be feasible at 1000 cores? Cache Coherence Problem: Existing full-map directory-based protocols do not scale. High area overhead [O(N^2)]. Energy overhead proportional to area. Hotspots on networks due to frequent invalidations. Interconnection Network Problem.
6 The Problem: Will familiar shared memory programming models be feasible at 1000 cores? Cache Coherence Problem. Interconnection Network Problem: Existing meshes, rings unlikely to scale. Energy-inefficient due to multiple routers and links. High latency from one core to another.
7 Directory Cache Coherence Background: Directory-based protocols need to keep track of who is sharing a cache block.
8 Directory Cache Coherence Background: Directory-based protocols need to keep track of who is sharing a cache block. Full-map directories maintain a bit for every possible sharer. Baseline protocol requires invalidation of all read copies (potentially at every core) and collection of acknowledgements (potentially from every core).
9 Full-Map Directories: Full-map directories maintain a bit for every possible sharer. For 1000 cores, need a 1000-bit vector for each 512-bit cache block (in naïve implementation). Full-map directories consume too much area.
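The area argument above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming one sharer bit per core and ignoring state and tag bits; the function names and the `blocks_per_core` parameter are illustrative and do not come from the slides.

```python
def full_map_overhead(num_cores: int, block_bits: int = 512) -> float:
    """Sharer-vector bits per data bit for one cache block (one bit per core)."""
    return num_cores / block_bits


def total_directory_bits(num_cores: int, blocks_per_core: int) -> int:
    """Total sharer bits when each core keeps entries for its own blocks.

    Entries scale with num_cores and each entry is num_cores bits wide,
    so the total grows as O(N^2) in the core count.
    """
    return num_cores * blocks_per_core * num_cores


if __name__ == "__main__":
    # 1000-bit vector per 512-bit block: the directory is larger than the data.
    print(f"{full_map_overhead(1000):.2f}x")
```

Doubling the core count in `total_directory_bits` quadruples the result, which is the O(N^2) growth the earlier slide refers to.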
10 Limited Directories: Limited number of hardware pointers (k) in the sharer list. Directory entry fields: Address, State, Sharer 1, Sharer 2, ..., Sharer k.
11 Limited Directories: Limited number of hardware pointers (k) in the sharer list. Dir(k)B: Allow unlimited sharers, but if (# sharers > k), use broadcast invalidate on exclusive request. Requires ACKs from ALL cores. If sharers > k, we are broadcasting to 1000 cores and waiting for acknowledgements from 1000 cores. Interconnect network will need to handle this traffic efficiently, else performance suffers.
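The Dir(k)B behaviour described above can be sketched as a toy directory entry. This is an illustration under simplifying assumptions, not a protocol implementation: the class and method names are hypothetical, and the broadcast is counted as reaching every core (including the requester) to match the slide's "1000 cores" figure.

```python
from dataclasses import dataclass, field


@dataclass
class DirKBEntry:
    """Toy Dir(k)B directory entry: k hardware pointers plus an overflow flag."""
    k: int
    sharers: list = field(default_factory=list)  # at most k core ids
    overflow: bool = False                       # set once sharers exceed k

    def add_sharer(self, core: int) -> None:
        if self.overflow or core in self.sharers:
            return
        if len(self.sharers) < self.k:
            self.sharers.append(core)
        else:
            # Identities beyond k pointers are no longer tracked.
            self.overflow = True

    def exclusive_request(self, num_cores: int) -> tuple:
        """Return (invalidations sent, ACKs awaited), then reset the entry."""
        n = num_cores if self.overflow else len(self.sharers)
        self.sharers.clear()
        self.overflow = False
        return n, n


if __name__ == "__main__":
    entry = DirKBEntry(k=3)
    for core in range(4):          # fourth sharer overflows the 3 pointers
        entry.add_sharer(core)
    print(entry.exclusive_request(num_cores=1000))
```

With three or fewer sharers the entry sends targeted invalidations; one sharer more and both the 1-to-M and M-to-1 message counts jump to the full core count.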
12 1-to-M (Broadcast/Multicast) and M-to-1 (ACKs) occurrence? [Figure labels: 14%, 14%, 2%, 51%, 71%, 47%; AMD HyperTransport, Token Coherence; 64-core full-system simulations.]
13 Why are 1-to-M and M-to-1 bad? Increased bandwidth consumption: up to M times. Increased network contention: M messages at src/dest links; more packets in network. 1-to-M => multicasts; M-to-1 => ACKs. Worse as M grows. Increased power consumption. Bursty, not sustained. Can we handle using an efficient network design?
14 Problem: how to route broadcasts? Sparse Multicast Tree. Broadcast/Dense Multicast Tree. Contention! Idle links! Tree constructed dynamically based on destination set. Same destination set (all nodes) => same tree structure.
15 Network-Centric Approaches: ATAC and ACKwise architecture that leverages photonics (Agarwal group). Assumes a fast, energy-efficient optical network that enables energy-efficient broadcasts and long distance messages.
16 ATAC From 10,000 Feet: Tiled Multicore Processor with Optical Network Overlay. 2-D array of simple cores connected by an electrical mesh network. Electrical network provides efficient short-range communication. Optical overlay network provides fast broadcast and long-distance communication. [Figure labels: Electrical Mesh Interconnect; Optical WDM Interconnect.]
17 ACKwise Protocol: Extension of Dir(k)B protocol. Designed to leverage the ATAC broadcast network. Structure of an ACKwise(3) Directory Entry: Address = addr, State = shared, Global = false, Sharers 1-3 = Core-A, Core-B, Core-C.
18 ACKwise Protocol, Number of Sharers > Number of Hardware Pointers (k): Tracks the number of sharers. If (# sharers > k), use broadcast invalidate on exclusive request. Requires ACKs from ONLY sharers. Structure of an ACKwise(3) Directory Entry: Address = addr, State = shared, Global = true, Sharers 1-3 = 4.
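The two ACKwise(3) entries above can be reproduced with a toy model. This is a sketch under stated assumptions rather than the ATAC implementation: the class, its fields, and the choice to clear the pointers once the global bit is set are illustrative, and callers are assumed not to add the same core twice after overflow.

```python
from dataclasses import dataclass, field


@dataclass
class AckwiseEntry:
    """Toy ACKwise(k) entry: k pointers, or a sharer count once global is set."""
    k: int
    pointers: list = field(default_factory=list)
    global_bit: bool = False
    count: int = 0  # number of sharers, meaningful only when global_bit is set

    def add_sharer(self, core: str) -> None:
        if self.global_bit:
            self.count += 1            # identities unknown, count still exact
        elif core in self.pointers:
            return
        elif len(self.pointers) < self.k:
            self.pointers.append(core)
        else:
            self.count = len(self.pointers) + 1
            self.pointers.clear()
            self.global_bit = True

    def exclusive_request(self, num_cores: int) -> tuple:
        """Return (invalidations sent, ACKs awaited) for an exclusive request."""
        if self.global_bit:
            return num_cores, self.count   # broadcast, but ACKs only from sharers
        return len(self.pointers), len(self.pointers)


if __name__ == "__main__":
    entry = AckwiseEntry(k=3)
    for core in ("Core-A", "Core-B", "Core-C", "Core-D"):
        entry.add_sharer(core)
    print(entry.global_bit, entry.count, entry.exclusive_request(1000))
```

Compared with Dir(k)B, the invalidation is still a broadcast, but the M-to-1 acknowledgement traffic shrinks from every core to the actual number of sharers.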
19 ATAC Architecture Details: (a) 64 Optically-Connected Clusters (ONet). (b) Electrical Networks Connecting 16 cores (Hub, StarNet, ENet). Takeaway: Optical network necessary but not sufficient for efficient coherence and high performance.
20 Evaluation Requires New Toolflow. [Figure labels: Cache Models; Benchmark; Network Models; Inputs; Cache Counters; Electrical Technology Parameters; Graphite; NM; Electrical Router & Link Counters; Optical Link Counters; Optical Technology Parameters; McPAT; Modified Orion 2.0; Optical Models; Tools; Completion Time; Cache Energy & Area; Electrical Router & Link Energy & Area; Optical Link Energy & Area; Outputs.]
21 Network-Centric Approaches: ATAC and ACKwise architecture that leverages photonics (Agarwal group). Assumes a fast, energy-efficient optical network that enables energy-efficient broadcasts and long distance messages. Directoryless coherence via execution migration: Migrate threads as opposed to migrating data for faster data access. Requires high-bandwidth interconnect network.
22 Execution Migration Machine (EM²): No data replication: Remote Access (RA) memory organization. Idea: send thread on 1st memory access; move context (RF etc.) to core where data lives; possibly evict a context currently at destination and execute in its place; migration entirely at hardware level for speed. Avoids directories and directory protocols but poses challenges in interconnect design.
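The migration idea above can be sketched as a small simulation. Several details here are assumptions made only for illustration and are not stated on the slide: the home core is picked by striping 64-byte blocks across cores, each core has a single guest slot, and an evicted thread returns to the core where it was spawned. All names are hypothetical.

```python
def home_core(address: int, num_cores: int, block_bytes: int = 64) -> int:
    """Assumed mapping: consecutive blocks are striped across cores."""
    return (address // block_bytes) % num_cores


class Em2Machine:
    """Toy model: threads move to the core that owns the data they access."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        self.guest = [None] * num_cores  # thread occupying each guest slot
        self.native = {}                 # thread -> core where it was spawned
        self.location = {}               # thread -> core where it runs now

    def spawn(self, thread: str, core: int) -> None:
        self.native[thread] = core
        self.location[thread] = core

    def access(self, thread: str, address: int) -> list:
        """Perform one memory access and return the resulting events."""
        src = self.location[thread]
        dest = home_core(address, self.num_cores)
        if dest == src:
            return ["local"]
        events = []
        if self.guest[src] == thread:
            self.guest[src] = None       # free the guest slot being left
        if dest != self.native[thread]:
            victim = self.guest[dest]
            if victim is not None:
                # Evict the current guest back to its native core (assumption).
                self.location[victim] = self.native[victim]
                events.append(f"evict {victim}")
            self.guest[dest] = thread
        self.location[thread] = dest
        events.append(f"migrate {thread} {src}->{dest}")
        return events


if __name__ == "__main__":
    m = Em2Machine(num_cores=4)
    m.spawn("t0", 0)
    m.spawn("t1", 1)
    print(m.access("t0", 128))  # block 2 lives on core 2
    print(m.access("t1", 128))  # same block: t1 displaces t0
```

No data moves and no sharer list exists anywhere in this model; every non-local access turns into a context transfer instead, which is why the approach leans so heavily on the network.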
23 EM² Pluses and Minuses: + One-way data access through thread migration; determine core miss and remote core destination in parallel with L1 lookup. + No replication across on-chip caches lowers off-chip memory access rates in comparison to DirCC. + No directories, broadcast or multicast required. - Context size of 2-4Kb significantly greater than 1 word (RA), and greater than 512-bit cache block size (Directory protocols). - Contention especially for reads of shared data (even read-only data is not replicated).
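The payload comparison in the first minus point works out as follows. This is a rough sketch: it assumes 1 Kb = 1024 bits and a 32-bit word, neither of which the slide specifies, and it ignores message headers.

```python
CONTEXT_BITS = (2 * 1024, 4 * 1024)  # "2-4Kb" context, assuming 1 Kb = 1024 bits
BLOCK_BITS = 512                     # cache block size from the slide
WORD_BITS = 32                       # assumption: word size is not given


def ratio_range(payload_bits: int) -> tuple:
    """How many times larger a migrated context is than the given payload."""
    low, high = CONTEXT_BITS
    return low / payload_bits, high / payload_bits


if __name__ == "__main__":
    print("vs. cache block:", ratio_range(BLOCK_BITS))
    print("vs. one word (RA):", ratio_range(WORD_BITS))
```

Under these assumptions a migration carries several times the bits of a cache-block transfer and far more than a single remote-access word, which is the cost the slide weighs against the saved directory traffic.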
24 AML versus network bandwidth: Needs high bandwidth, low-contention network to be competitive.
25 Summary: Network design and implementation is going to be hugely important in providing shared memory abstractions for multicores regardless of particular approach used!
Recap Consistent cuts. CS514: Intermediate Course in Operating Systems. What time is it? But what does time mean? Drawing time-line pictures:
CS514: Interediate Course in Oerating Systes Professor Ken iran Vivek Vishnuurthy: T Reca Consistent cuts On Monday we saw that sily gathering the state of a syste isn t enough Often the state includes
More informationPerformance analysis of hybrid (M/M/1 and M/M/m) client server model using Queuing theory
International Journal of Electronic and Couter cience Engineering vailable Online at wwwijeceorg IN- 77-9 erforance analyi of hybrid M/M/ and M/M/ client erver odel uing ueuing theory atarhi Guta, Dr Rajan
More informationA Fail-Aware Datagram Service
A Fail-Aware Datagra Service Christof Fetzer and Flaviu Cristian christof@research.att.co, htt://www.christof.org Abstract In distributed real-tie systes it is often useful for a rocess to know that another
More informationCollaborative Web Caching Based on Proxy Affinities
Collaborative Web Caching Based on Proxy Affinities Jiong Yang T J Watson Research Center IBM jiyang@usibco Wei Wang T J Watson Research Center IBM ww1@usibco Richard Muntz Coputer Science Departent UCLA
More information10. Multiprocessor Scheduling (Advanced)
10. Multirocessor Scheduling (Advanced) Oerating System: Three Easy Pieces AOS@UC 1 Multirocessor Scheduling The rise of the multicore rocessor is the source of multirocessorscheduling roliferation. w
More informationA Fail-Aware Datagram Service
A Fail-Aware Datagra Service Christof Fetzer and Flaviu Cristian christof@research.att.co, htt://www.christof.org Abstract In distributed real-tie systes it is often useful for a rocess Ô to know that
More informationA Cache Coherence Protocol to Implement Sequential Consistency. Memory Consistency in SMPs
6.823, L20--1 A Cache Coherence rotocol to Ipleent Sequential Consistency Laboratory for Coputer Science M.I.T. http://www.csg.lcs.it.edu/6.823 Meory Consistency in SMs CU-1 CU-2 6.823, L20--2 A 100 cache-1
More informationEXTENDED SVD FLATNESS CONTROL. Per Erik Modén and Markus Holm ABB AB, Västerås, Sweden
EXTENDED SVD FLATNESS CONTROL Per Erik Modén and Markus Hol ABB AB, Västerås, Sweden ABSTRACT Cold rolling ills soeties do not see able to control flatness as well as exected, taking into account the nuber
More informationScalable Cache Coherence
arallel Computing Scalable Cache Coherence Hwansoo Han Hierarchical Cache Coherence Hierarchies in cache organization Multiple levels of caches on a processor Large scale multiprocessors with hierarchy
More informationAdaptive Parameter Estimation Based Congestion Avoidance Strategy for DTN
Proceedings of the nd International onference on oputer Science and Electronics Engineering (ISEE 3) Adaptive Paraeter Estiation Based ongestion Avoidance Strategy for DTN Qicai Yang, Futong Qin, Jianquan
More informationMultiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types
Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon
More informationScalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Scalable Cache Coherence Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hierarchical Cache Coherence Hierarchies in cache organization Multiple levels
More informationScalable Cache Coherence
Scalable Cache Coherence [ 8.1] All of the cache-coherent systems we have talked about until now have had a bus. Not only does the bus guarantee serialization of transactions; it also serves as convenient
More informationAn energy-efficient random verification protocol for the detection of node clone attacks in wireless sensor networks
Zhou et al. EURASIP Journal on Wireless Counications and Networking 2014, 2014:163 htt://jwcn.eurasijournals.co/content/2014/1/163 RESEARCH Oen Access An energy-efficient rando verification rotocol for
More informationI n many cases, the SPRT will come to a decision with fewer samples than would have been required for a fixed size test.
STATGRAPHICS Rev. 9/6/3 Sequential Saling Suary... Data Inut... 3 Analysis Otions... 3 Analysis Suary... 5 Cuulative Plot... 6 Decision Nubers... 9 Test Perforance... O. C. Curve... ASN Function... Forulas...
More informationNew method of angle error measurement in angular artifacts using minimum zone flatness plane
Alied Mechanics and Materials Subitted: 04-05-4 ISSN: 66-748, Vols. 599-60, 997-004 Acceted: 04-06-05 doi:0.408/www.scientific.net/amm.599-60.997 Online: 04-08- 04 Trans Tech Publications, Switzerland
More informationEnergy-Efficient Disk Replacement and File Placement Techniques for Mobile Systems with Hard Disks
Energy-Efficient Disk Replaceent and File Placeent Techniques for Mobile Systes with Hard Disks Young-Jin Ki School of Coputer Science & Engineering Seoul National University Seoul 151-742, KOREA youngjk@davinci.snu.ac.kr
More informationClosing The Performance Gap between Causal Consistency and Eventual Consistency
Closing The Perforance Gap between Causal Consistency and Eventual Consistency Jiaqing Du Călin Iorgulescu Aitabha Roy Willy Zwaenepoel EPFL ABSTRACT It is well known that causal consistency is ore expensive
More informationLecture 5: Directory Protocols. Topics: directory-based cache coherence implementations
Lecture 5: Directory Protocols Topics: directory-based cache coherence implementations 1 Flat Memory-Based Directories Block size = 128 B Memory in each node = 1 GB Cache in each node = 1 MB For 64 nodes
More informationATAC: Improving Performance and Programmability with On-Chip Optical Networks
ATAC: Improving Performance and Programmability with On-Chip Optical Networks James Psota, Jason Miller, George Kurian, Nathan Beckmann, Jonathan Eastep, Henry Hoffman, Jifeng Liu, Mark Beals, Jurgen Michel,
More informationScheduling Parallel Real-Time Recurrent Tasks on Multicore Platforms
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL., NO., NOV 27 Scheduling Parallel Real-Tie Recurrent Tasks on Multicore Platfors Risat Pathan, Petros Voudouris, and Per Stenströ Abstract We
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationSIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto
SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES University of Toronto Interaction of Coherence and Network 2 Cache coherence protocol drives network-on-chip traffic Scalable coherence protocols
More informationDYNAMIC ESTIMATION OF BDP IN MANETS FOR EFFECTIVE NEXT NODE SELECTION
www.arpnjournals.co DYNAMIC ESTIMATION OF BDP IN MANETS FOR EFFECTIVE NEXT NODE SELECTION N. Snehalatha 1 and Paul Rodrigues 2 1 School of Coputing, SRM University, Chennai, Tail Nadu, India 2 Departent
More informationAn Efficient Approach for Content Delivery in Overlay Networks
An Efficient Approach for Content Delivery in Overlay Networks Mohaad Malli, Chadi Barakat, Walid Dabbous Projet Planète, INRIA-Sophia Antipolis, France E-ail:{alli, cbarakat, dabbous}@sophia.inria.fr
More informationScalable Multiprocessors
Scalable Multiprocessors [ 11.1] scalable system is one in which resources can be added to the system without reaching a hard limit. Of course, there may still be economic limits. s the size of the system
More informationDSENT A Tool Connecting Emerging Photonics with Electronics for Opto- Electronic Networks-on-Chip Modeling Chen Sun
A Tool Connecting Emerging Photonics with Electronics for Opto- Electronic Networks-on-Chip Modeling Chen Sun In collaboration with: Chia-Hsin Owen Chen George Kurian Lan Wei Jason Miller Jurgen Michel
More informationCache Coherence in Scalable Machines
ache oherence in Scalable Machines SE 661 arallel and Vector Architectures rof. Muhamed Mudawar omputer Engineering Department King Fahd University of etroleum and Minerals Generic Scalable Multiprocessor
More informationModeling Parallel Applications Performance on Heterogeneous Systems
Modeling Parallel Applications Perforance on Heterogeneous Systes Jaeela Al-Jaroodi, Nader Mohaed, Hong Jiang and David Swanson Departent of Coputer Science and Engineering University of Nebraska Lincoln
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604
More informationShortest Path Determination in a Wireless Packet Switch Network System in University of Calabar Using a Modified Dijkstra s Algorithm
International Journal of Engineering and Technical Research (IJETR) ISSN: 31-869 (O) 454-4698 (P), Volue-5, Issue-1, May 16 Shortest Path Deterination in a Wireless Packet Switch Network Syste in University
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationA Low-cost Memory Architecture with NAND XIP for Mobile Embedded Systems
A Low-cost Meory Architecture with XIP for Mobile Ebedded Systes Chanik Park, Jaeyu Seo, Sunghwan Bae, Hyojun Ki, Shinhan Ki and Busoo Ki Software Center, SAMSUNG Electronics, Co., Ltd. Seoul 135-893,
More informationA Scalable SAS Machine
arallel omputer Organization and Design : Lecture 8 er Stenström. 2008, Sally. ckee 2009 Scalable ache oherence Design principles of scalable cache protocols Overview of design space (8.1) Basic operation
More informationControl plane and data plane. Computing systems now. Glacial process of innovation made worse by standards process. Computing systems once upon a time
Classical work Architecture A A A Intro to SDN A A Oerating A Secialized Packet A A Oerating Secialized Packet A A A Oerating A Secialized Packet A A Oerating A Secialized Packet Oerating Secialized Packet
More informationIntegrating fast mobility in the OLSR routing protocol
Integrating fast obility in the OLSR routing protocol Mounir BENZAID 1,2, Pascale MINET 1 and Khaldoun AL AGHA 1,2 1 INRIA, Doaine de Voluceau - B.P.105, 78153 Le Chesnay Cedex, FRANCE ounir.benzaid, pascale.inet@inria.fr
More informationAnalysing Real-Time Communications: Controller Area Network (CAN) *
Analysing Real-Tie Counications: Controller Area Network (CAN) * Abstract The increasing use of counication networks in tie critical applications presents engineers with fundaental probles with the deterination
More informationMeet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors
Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Sandro Bartolini* Department of Information Engineering, University of Siena, Italy bartolini@dii.unisi.it
More informationInvestigation of The Time-Offset-Based QoS Support with Optical Burst Switching in WDM Networks
Investigation of The Tie-Offset-Based QoS Support with Optical Burst Switching in WDM Networks Pingyi Fan, Chongxi Feng,Yichao Wang, Ning Ge State Key Laboratory on Microwave and Digital Counications,
More informationGrading Results Total 100
University of California, Berkeley College of Engineering Departent of Electrical Engineering and Coputer Sciences Fall 2003 Instructor: Dave Patterson 2003-11-19 v1.9 CS 152 Exa #2 Solutions Personal
More informationCache Coherence in Scalable Machines
Cache Coherence in Scalable Machines COE 502 arallel rocessing Architectures rof. Muhamed Mudawar Computer Engineering Department King Fahd University of etroleum and Minerals Generic Scalable Multiprocessor
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Marble House The Knife (Silent Shout) Before starting The Knife, we were working
More informationCache Coherence. Todd C. Mowry CS 740 November 10, Topics. The Cache Coherence Problem Snoopy Protocols Directory Protocols
Cache Coherence Todd C. Mowry CS 740 November 10, 1998 Topics The Cache Coherence roblem Snoopy rotocols Directory rotocols The Cache Coherence roblem Caches are critical to modern high-speed processors
More informationCLOUD computing is quickly becoming an effective and
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 6, JUNE 203 087 Otial Multiserver Configuration for Profit Maxiization in Cloud Couting Junwei Cao, Senior Meber, IEEE, Kai Hwang, Fellow,
More informationData & Knowledge Engineering
Data & Knowledge Engineering 7 (211) 17 187 Contents lists available at ScienceDirect Data & Knowledge Engineering journal hoepage: www.elsevier.co/locate/datak An approxiate duplicate eliination in RFID
More informationMarkov Analysis for Optimum Caching as an Alternative to Belady s Algorithm
arov Analysis for Otiu Caching as an Alternative to Belady s Algorith, Deutsche Teleo, Darstadt, Gerany gerhard.hasslinger@teleo.de Analytic Results on LRU, LFU, Otiu Caching Belady s Princile for Otiu
More informationUtility-Based Resource Allocation for Mixed Traffic in Wireless Networks
IEEE IFOCO 2 International Workshop on Future edia etworks and IP-based TV Utility-Based Resource Allocation for ixed Traffic in Wireless etworks Li Chen, Bin Wang, Xiaohang Chen, Xin Zhang, and Dacheng
More informationCache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues
More informationLecture 3: Snooping Protocols. Topics: snooping-based cache coherence implementations
Lecture 3: Snooping Protocols Topics: snooping-based cache coherence implementations 1 Design Issues, Optimizations When does memory get updated? demotion from modified to shared? move from modified in
More informationCOMP 250. Lecture 4. Array lists. Sept. 15, 2017
COMP 25 Lecture 4 Arra lists Set. 5, 27 Arras in Java int[ ] Ints = new int[5]; Ints[3] = -732; Arra whose eleents have a riitive te 2 Ints int[ ] Ints = new int[5]; Ints[3] = -732; 2 3 : 4-732 : Arras
More informationEnhancing Real-Time CAN Communications by the Prioritization of Urgent Messages at the Outgoing Queue
Enhancing Real-Tie CAN Counications by the Prioritization of Urgent Messages at the Outgoing Queue ANTÓNIO J. PIRES (1), JOÃO P. SOUSA (), FRANCISCO VASQUES (3) 1,,3 Faculdade de Engenharia da Universidade
More informationScalable Cache Coherent Systems Scalable distributed shared memory machines Assumptions:
Scalable ache oherent Systems Scalable distributed shared memory machines ssumptions: rocessor-ache-memory nodes connected by scalable network. Distributed shared physical address space. ommunication assist
More informationPacket-Switched On-Chip FPGA Overlay Networks
Packet-Switched On-Chip FPGA Overlay Networks Thesis by Nachiket Kapre In Partial Fulfillent of the Requireents for the Degree of Master of Science California Institute of Technology Pasadena, California
More informationThe VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing
The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing Mikael Taveniku 2,3, Anders Åhlander 1,3, Magnus Jonsson 1 and Bertil Svensson 1,2
More informationMultiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism
Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,
More informationCache Coherence: Part II Scalable Approaches
ache oherence: art II Scalable pproaches Hierarchical ache oherence Todd. Mowry S 74 October 27, 2 (a) 1 2 1 2 (b) 1 Topics Hierarchies Directory rotocols Hierarchies arise in different ways: (a) processor
More informationSecure Wireless Multihop Transmissions by Intentional Collisions with Noise Wireless Signals
Int'l Conf. Wireless etworks ICW'16 51 Secure Wireless Multihop Transissions by Intentional Collisions with oise Wireless Signals Isau Shiada 1 and Hiroaki Higaki 1 1 Tokyo Denki University, Japan Abstract
More informationCMSC 411 Computer Systems Architecture Lecture 21 Multiprocessors 3
MS 411 omputer Systems rchitecture Lecture 21 Multiprocessors 3 Outline Review oherence Write onsistency dministrivia Snooping Building Blocks Snooping protocols and examples oherence traffic and performance
More informationLecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based)
Lecture 8: Snooping and Directory Protocols Topics: split-transaction implementation details, directory implementations (memory- and cache-based) 1 Split Transaction Bus So far, we have assumed that a
More informationLocality-Aware Data Replication in the Last-Level Cache
Locality-Aware Data Replication in the Last-Level Cache George Kurian, Srinivas Devadas Massachusetts Institute of Technology Cambridge, MA USA {gkurian, devadas}@csail.mit.edu Omer Khan University of
More informationMultipath Selection and Channel Assignment in Wireless Mesh Networks
Multipath Selection and Channel Assignent in Wireless Mesh Networs Soo-young Jang and Chae Y. Lee Dept. of Industrial and Systes Engineering, KAIST, 373-1 Kusung-dong, Taejon, Korea Tel: +82-42-350-5916,
More informationScalable Cache Coherent Systems
NUM SS Scalable ache oherent Systems Scalable distributed shared memory machines ssumptions: rocessor-ache-memory nodes connected by scalable network. Distributed shared physical address space. ommunication
More informationDesign Optimization of Mixed Time/Event-Triggered Distributed Embedded Systems
Design Optiization of Mixed Tie/Event-Triggered Distributed Ebedded Systes Traian Pop, Petru Eles, Zebo Peng Dept. of Coputer and Inforation Science, Linköping University {trapo, petel, zebpe}@ida.liu.se
More informationLecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations
Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations 1 Split Transaction Bus So far, we have assumed that a coherence operation (request, snoops, responses,
More information6.1 Multiprocessor Computing Environment
6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,
More informationMIDA: AN IDA SEARCH WITH DYNAMIC CONTROL
April 1991 UILU-ENG-91-2216 CRHC-91-9 Center fo r Reliable and High-Perforance Coputing MIDA: AN IDA SEARCH WITH DYNAMIC CONTROL Benjain W. Wah Coordinated Science Laboratory College of Engineering UNIVERSITY
More informationEE 364B Convex Optimization An ADMM Solution to the Sparse Coding Problem. Sonia Bhaskar, Will Zou Final Project Spring 2011
EE 364B Convex Optiization An ADMM Solution to the Sparse Coding Proble Sonia Bhaskar, Will Zou Final Project Spring 20 I. INTRODUCTION For our project, we apply the ethod of the alternating direction
More informationPlatforms Design Challenges with many cores
latforms Design hallenges with many cores Raj Yavatkar, Intel Fellow Director, Systems Technology Lab orporate Technology Group 1 Environmental Trends: ell 2 *Other names and brands may be claimed as the
More information10 File System Mass Storage Structure Mass Storage Systems Mass Storage Structure Mass Storage Structure FILE SYSTEM 1
10 File System 1 We will examine this chater in three subtitles: Mass Storage Systems OERATING SYSTEMS FILE SYSTEM 1 File System Interface File System Imlementation 10.1.1 Mass Storage Structure 3 2 10.1
More informationShared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network
Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache
More informationA Novel Architecture for Compiled-type Software CNC System
Key Engineering Materials Online: 2007-05-15 ISSN: 1662-9795, ol. 339, 442-446 doi:10.4028/.scientific.net/kem.339.442 2007 rans ech Pulications, Sitzerland A Novel Architecture for Coiled-tye Softare
More informationMulti Packet Reception and Network Coding
The 2010 Military Counications Conference - Unclassified Progra - etworking Protocols and Perforance Track Multi Packet Reception and etwork Coding Aran Rezaee Research Laboratory of Electronics Massachusetts
More informationRethinking Last-Level Cache Management for Multicores Operating at Near-Threshold
Rethinking Last-Level Cache Management for Multicores Operating at Near-Threshold Farrukh Hijaz, Omer Khan University of Connecticut Power Efficiency Performance/Watt Multicores enable efficiency Power-performance
More informationThe Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith
The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith Review Introduction Optimizing the OS based on hardware Processor changes Shared Memory vs
More informationDerivation of an Analytical Model for Evaluating the Performance of a Multi- Queue Nodes Network Router
Derivation of an Analytical Model for Evaluating the Perforance of a Multi- Queue Nodes Network Router 1 Hussein Al-Bahadili, 1 Jafar Ababneh, and 2 Fadi Thabtah 1 Coputer Inforation Systes Faculty of
More informationNear Light Correction for Image Relighting and 3D Shape Recovery
Near Light Correction for Iage Relighting and 3D Shae Recovery Anonyous for Review Abstract In this aer, we roose a near-light illuination odel for iage relighting and 3D shae recovery Classic ethods such
More informationAutomated Installation Verification of COMSOL via LiveLink for MATLAB
Autoated Installation Verification of COMSOL via LiveLink for MATLAB Michael W. Crowell Oak Ridge National Laboratory, PO Bo 2008 MS6423, Oak Ridge, TN 37831 crowellw@ornl.gov Abstract: Verifying that
More informationA Novel Fast Constructive Algorithm for Neural Classifier
A Novel Fast Constructive Algorith for Neural Classifier Xudong Jiang Centre for Signal Processing, School of Electrical and Electronic Engineering Nanyang Technological University Nanyang Avenue, Singapore
More informationStructural Balance in Networks. An Optimizational Approach. Andrej Mrvar. Faculty of Social Sciences. University of Ljubljana. Kardeljeva pl.
Structural Balance in Networks An Optiizational Approach Andrej Mrvar Faculty of Social Sciences University of Ljubljana Kardeljeva pl. 5 61109 Ljubljana March 23 1994 Contents 1 Balanced and clusterable
More informationA Low-Cost Multi-Failure Resilient Replication Scheme for High Data Availability in Cloud Storage
216 IEEE 23rd International Conference on High Perforance Coputing A Low-Cost Multi-Failure Resilient Replication Schee for High Data Availability in Cloud Storage Jinwei Liu* and Haiying Shen *Departent
More informationA Case for Fine-Grain Adaptive Cache Coherence George Kurian, Omer Khan, and Srinivas Devadas
Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2012-012 May 22, 2012 A Case for Fine-Grain Adaptive Cache Coherence George Kurian, Omer Khan, and Srinivas Devadas