Adaptive Techniques for Home-based Software DSMs
- Reynold Nash
- 5 years ago
Transcription
1 Adaptive Techniques for Home-based Software DSMs
Lauro Whately, Raquel Pinto, Muralidharan Rangarajan, Liviu Iftode, Ricardo Bianchini, Claudio L. Amorim
COPPE / UFRJ, Rutgers University
2 Contents
- Motivation
- Adaptive Protocols
- HAP
- Experimental Results
- Conclusions
3 Motivation
- Problems with home-based protocols
  - Location of the home
    - Sharing between two non-home processes
    - The choice of the home node
  - Coherence
    - Diffs creation
    - Single-writer pages
    - Access faults
- Solution
  - Adapt the location of the home and the coherence protocol according to behavior
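The "diffs creation" cost listed above comes from the twin-and-diff mechanism that multiple-writer LRC protocols generally use: a writer copies the page before its first write (the twin), later compares the page against the twin, and sends only the changed bytes to the home. The following is a minimal sketch of that idea, not the authors' implementation; the page size, function names and byte-level diff encoding are all assumptions made for illustration.

```python
PAGE_SIZE = 16  # hypothetical tiny page so the example is easy to read


def make_twin(page):
    """Snapshot the page before the first write in an interval."""
    return bytes(page)


def make_diff(twin, page):
    """Return (offset, new_byte) pairs for every byte that changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(twin, page)) if old != new]


def apply_diff(home_copy, diff):
    """Merge a writer's diff into the home node's copy of the page."""
    for offset, value in diff:
        home_copy[offset] = value


# One writer modifies two bytes and ships only those bytes to the home.
home = bytearray(PAGE_SIZE)
local = bytearray(home)
twin = make_twin(local)
local[3] = 7
local[10] = 42
diff = make_diff(twin, local)
apply_diff(home, diff)
```

For a page with a single writer the copy and the comparison are pure overhead, which is why the slide singles out single-writer pages.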
4 Adaptive Protocols
- Can adapt between single-writer and multiple-writer protocols
- Can adapt between invalidate-based and update-based coherence
- Can be very successful at reducing communication, coherence and memory overheads in traditional LRC
5 Adaptive Protocols: Benefits to home-based protocols
- Can dynamically assign the home node according to the sharing pattern
- Can use update-based coherence for migratory data inside critical sections and for producer/consumer data
6 HAP
- Proposal: Home-based Adaptive Protocol (HAP)
- HAP = HLRC + ADSM-like adaptiveness
7 HAP: Page Sharing Patterns and Actions
- Falsely shared
  - Multiple writers
  - Twinning and diffing
  - No home migration
- Migratory
  - Single writer
  - Try to avoid twinning and diffing
  - Home migrates to next writer
- Producer/consumer(s)
  - Single writer
  - Avoid twinning and diffing
  - Home moves to producer
  - Update consumers
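The pattern-to-action mapping on this slide can be written down as a small lookup table. This is only a restatement of the slide in code, assuming hypothetical names (`Pattern`, `Actions`, `ACTIONS`); it says nothing about how HAP stores this information internally.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Pattern(Enum):
    FALSELY_SHARED = auto()
    MIGRATORY = auto()
    PRODUCER_CONSUMER = auto()


@dataclass(frozen=True)
class Actions:
    multiple_writers: bool          # False means the page is treated as single-writer
    twin_and_diff: bool             # whether twins and diffs are created
    home_moves_to: Optional[str]    # None means no home migration
    update_consumers: bool          # push updates instead of invalidating


# Hypothetical table mirroring the slide, one entry per sharing pattern.
ACTIONS = {
    Pattern.FALSELY_SHARED: Actions(True, True, None, False),
    Pattern.MIGRATORY: Actions(False, False, "next writer", False),
    Pattern.PRODUCER_CONSUMER: Actions(False, False, "producer", True),
}

for pattern, actions in ACTIONS.items():
    print(pattern.name, actions)
```

The slide says HAP only tries to avoid twinning and diffing for migratory pages, so `twin_and_diff=False` for that entry should be read as the intended outcome rather than a guarantee.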
8 HAP: Single Writer
- Protected by locks
  - Migratory
    - Pages associated with a lock variable
    - Home migrates on acquire operations
- Protected by barriers
  - Producer/consumer(s)
    - Consumers vector
    - Sends updates with write-notices
  - Migratory
    - Migrates the home to the first requester/single writer
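For lock-protected migratory pages, the slide describes two pieces of state: an association between a page and a lock variable, and a home that follows the lock on acquire. A minimal sketch of that bookkeeping is shown here, assuming a single in-memory directory; the class `HomeDirectory` and its methods are hypothetical, and a real DSM would distribute this state and piggyback it on lock messages.

```python
class HomeDirectory:
    """Toy directory tracking page homes and lock-to-page associations."""

    def __init__(self):
        self.home = {}        # page id -> node id currently acting as home
        self.lock_pages = {}  # lock id -> set of migratory page ids

    def associate(self, lock_id, page_id, initial_home):
        """Record that a migratory page is protected by the given lock."""
        self.home[page_id] = initial_home
        self.lock_pages.setdefault(lock_id, set()).add(page_id)

    def on_acquire(self, lock_id, acquirer):
        """Move the home of every page tied to the lock to the acquirer."""
        for page_id in self.lock_pages.get(lock_id, ()):
            self.home[page_id] = acquirer


directory = HomeDirectory()
directory.associate("lock0", page_id=5, initial_home=0)
directory.on_acquire("lock0", acquirer=2)   # node 2 enters the critical section
directory.on_acquire("lock0", acquirer=3)   # then node 3 does
print(directory.home[5])
```

Because the acquirer becomes the home before it writes, its writes land directly in the home copy and no diff has to be sent.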
9 HAP: Pattern Detection
- Detection is done by the home node and at barrier points
- The home node receives modifications and the lock id
- Nodes at a barrier receive write-notices
- A single-writer page goes back to multiple-writer if a modification is received
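The slide does not give the exact detection rule, so the following sketch fills it in with one plausible assumption: at a barrier, a page whose write-notices all name the same node is treated as single-writer, and any other written page as multiple-writer. Only the fallback in `on_modification_received` comes directly from the slide; the function names and the string states are hypothetical.

```python
from collections import defaultdict


def classify_at_barrier(write_notices):
    """Classify pages from (page_id, writer_node) write-notices of one interval."""
    writers = defaultdict(set)
    for page_id, node in write_notices:
        writers[page_id].add(node)
    return {
        page_id: "single-writer" if len(nodes) == 1 else "multiple-writer"
        for page_id, nodes in writers.items()
    }


def on_modification_received(state, page_id):
    """Fallback from the slide: a modification demotes a single-writer page."""
    if state.get(page_id) == "single-writer":
        state[page_id] = "multiple-writer"


notices = [(1, "A"), (1, "A"), (2, "A"), (2, "B")]
state = classify_at_barrier(notices)   # page 1 has one writer, page 2 has two
on_modification_received(state, 1)     # another node's modification reaches page 1's home
print(state)
```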
10 HAP Page States and Transitions
11 Experimental Results
- Environment
  - 8-node cluster
  - 650 MHz Pentium III, 512 MB RAM, 256 KB L2 cache
  - Linux
  - VIA Giganet: one-way latency = 8.2 µs, bandwidth = 10 MB/s
- Workload
  - IS: problem size 2^16 keys, 300 iterations; synchronization: locks and barriers
  - SOR: problem size 256x5120, 100 iterations; synchronization: barriers
  - FFT: problem size [...] elements; synchronization: barriers
12 Experimental Results IS Execution time Breakdown
13 Experimental Results: IS Execution Statistics (average over all nodes)
- Columns: HLRC, H_MI, HAP
- Rows: Messages (k), Data (kb), Access Faults, Page Requests, Diffs
14 Experimental Results SOR Execution time breakdown
15 Experimental Results: SOR Execution Statistics (average over all nodes)
- Columns: HLRC, H_PC, HAP
- Rows: Messages (k), Data (kb), Access Faults (k), Page Requests, Diffs
16 Experimental Results FFT Execution time breakdown
17 Experimental Results: FFT Execution Statistics (average over all nodes)
- Columns: HLRC, H_PC, H_MO, HAP
- Rows: Messages (k), Data (kb), Access Faults (k), Page Requests (k), Diffs
18 Experimental Results: Discussion
- Successful at improving performance for IS (19%)
- Potential for improvements with other patterns (SOR):
  - Communication overhead decreased by 83%
  - Number of pages requested reduced by 86%
- Overhead:
  - Detection of MIGo pages
  - Unnecessary PC updates
19 Conclusion
- HAP detects migratory, producer/consumer and multiple-writer pages
- Adaptation:
  - Dynamic adaptation between multiple-writer and single-writer coherence protocols
  - Dynamic adaptation between invalidation-based and update-based coherence
  - Home migration of single-writer pages to the writing node
- The preliminary implementation of HAP performs well for certain applications, but requires modification for others