2. Futile Stall HTM HTM HTM. Transactional Memory: TM [1] TM. HTM exponential backoff. magic waiting HTM. futile stall. Hardware Transactional Memory:
HTM 2 2 LogTM 72.2% 28.4%

1 Nagoya Institute of Technology, Nagoya, Aichi, Japan
a) tsumura@nitech.ac.jp

1.
Transactional Memory: TM [1] TM Hardware Transactional Memory: HTM HTM HTM HTM exponential backoff magic waiting HTM

2. Futile Stall
futile stall

2.1 Futile Stall

© 2013 Information Processing Society of Japan
[Figure 1: Futile Stall. Timeline (t1 to t7) of thr.1 on Core1, thr.2 on Core2, and thr.3 on Core3, with stalled intervals marked.]

futile stall 1 3 thr.1 3 futile stall futile stall thr.1 thr.2 thr.2 t1 thr.2 thr.1 thr.1 t2 thr.1 thr.2 thr.2 thr.1 thr.2 futile stall thr.2 thr.3 t3 t4 thr.2 futile stall thr.3 t4 thr.2 t5 thr.1 thr.3 thr.1 t6 thr.3 t7 futile stall thr.3 futile stall

2.2 Futile Stall
futile stall 1 thr.3 futile stall 2 A-tx L-inst F-I A-tx F-II L-inst OS L-inst
F-II L-inst S-inst F-II S-inst OS F-II

HTM ID ID R-flags Stx-flags Ltx-flags ID

Abort Counter (A-Counter)
Recurrence flags (R-flags): ID A-Counter A-tx ID
Instruction Counter (I-Counter)
Short Tx flags (Stx-flags): ID I-Counter L-inst ID
Long Tx flags (Ltx-flags): ID I-Counter L-inst ID Stx-flags Ltx-flags I-Counter S-inst ID

[Figure 2: Timeline (t1 to t4) of thr.1 on Core1, thr.2 on Core2, and thr.3 on Core3, with stalled intervals and the annotations "A-Counter 1", "A-Counter 2", "R-flag[X] 1", "I-Counter counts the number of executed instructions", and "Stx-flag[X] 1".]

H/W 10 H/W A-Counter 2 bit I-Counter 9 bit R-flags Stx-flags Ltx-flags 16 bit

2 thr.3 A-tx 2
thr.3 thr.2 t11 A-Counter 1 thr.3 thr.1 t2 A-Counter 2 A-tx R-flag ID R-flag[X] thr.3 R-flag[X] I-Counter I-Counter L-inst ID X Stx-flag[X] t4 I-Counter L-inst Ltx-flag[X] Stx-flag[X]

Sequential flag (S-flag)
ID of Opponent Thread (O-id)

H/W 32 S-flag 1 bit O-id 5 bit H/W 260B futile stall H/W

3 3 thr.1 3

[Figure 3: Timeline (t1 to t9) of thr.1 on Core1, thr.2 on Core2, and thr.3 on Core3, annotated with start, ACK, and Magic Waiting events, the check "S-flag == 0", S-flag values (0, 1), and O-id values (0, 1, 3).]

thr.2 t1 thr.1 thr.3 thr.1 thr.3 S-flag t2 thr.1 thr.3 t1 S-flag thr.1 thr.3 ACK thr.2 ACK thr.2 S-flag t3 thr.1 ACK thr.2 S-flag thr.1 t4 thr.2 thr.1 1 O-id thr.2 thr.2 thr.2 S-flag thr.1 thr.2 thr.1 S-flag t5 thr.2 thr.1 thr.3
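The field widths quoted in the text (A-Counter 2 bit, I-Counter 9 bit, three 16-bit flag vectors, S-flag 1 bit, O-id 5 bit) can be tallied against the 260B figure. The following is a minimal sketch, assuming that every core of the 32-core system holds exactly one copy of each structure; that layout is an assumption made here for illustration and is not stated in the surviving text.

```python
# Field widths in bits, as quoted in the text.
# Assumption (not from the text): one copy of each structure per core.
FIELD_BITS = {
    "A-Counter": 2,
    "I-Counter": 9,
    "R-flags": 16,
    "Stx-flags": 16,
    "Ltx-flags": 16,
    "S-flag": 1,
    "O-id": 5,
}
NUM_CORES = 32  # core count of the simulated system


def total_cost_bytes(field_bits, num_cores):
    """Sum the per-core bits, multiply by the core count, convert to bytes."""
    bits_per_core = sum(field_bits.values())
    return bits_per_core * num_cores / 8


bits_per_core = sum(FIELD_BITS.values())
total_bytes = total_cost_bytes(FIELD_BITS, NUM_CORES)
print(bits_per_core, total_bytes)  # prints: 65 260.0
```

Under this assumption the tally comes to 65 bits per core and 260 bytes in total, which agrees with the quoted figure. A 5-bit O-id is also just wide enough to name any one of 32 threads.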
t6 thr.2 t7 thr.2 O-id thr.1 thr.2 thr.1 t7 thr.2 thr.2 O-id thr.1 thr.3 t8 t

HTM LogTM [2] starving writer?

3.1
HTM Simics [3] GEMS [4] Simics GEMS 32 SPARC V9 OS Solaris10 1 GEMS microbench SPLASH-2 [5] STAMP [6] 12 2

Table 1
  Processor                       SPARC V9
    number of cores               32 cores
    frequency                     1 GHz
    issue width                   single-issue
    issue order                   in-order
    non-memory IPC                1
  D1 cache                        32 KBytes
    ways                          4 ways
    latency                       3 cycles
  D2 cache                        8 MBytes
    ways                          8 ways
    latency                       20 cycles
  Memory                          4 GBytes
    latency                       450 cycles
  Interconnect network latency    14 cycles

Table 2
         GEMS    SPLASH-2  STAMP   all      GEMS    SPLASH-2  STAMP   all
  (S)    8.5%    10.3%     1.7%    7.5%     17.3%   18.7%     1.9%    18.7%
  (F)    31.7%   26.8%     0.9%    19.8%    72.7%   71.5%     1.8%    71.5%
  (H)    36.6%   34.0%     2.1%    28.4%    %       70.4%     3.1%    72.2%

(B) LogTM baseline
(S) Starving writer
(F) Futile stall
(H) Hybrid

2 (B) 1 Non trans Good trans Bad trans / Aborting Backoff Stall Barrier Magic Waiting exponential backoff magic waiting [7] 10 95% starving writer (S) (S) Slist (S) (B) Contention Genome Kmeans Vacation (B) 72.9% Kmeans 17.1% Genome magic waiting Kmeans 0.1% Btree Deque Prioque Barnes Radiosity
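Exponential backoff, which the text names next to the Backoff cycle category, can be illustrated with a short sketch. Everything in it is a hypothetical illustration: the base window, the cap, the randomization, and the function names are assumptions, and it is not the abort handler of LogTM or GEMS.

```python
import random

BASE_WINDOW = 16   # hypothetical initial wait window, in cycles
MAX_WINDOW = 4096  # hypothetical cap on the wait window, in cycles


def backoff_window(aborts):
    """Wait window after `aborts` consecutive aborts: doubles each time, capped."""
    return min(MAX_WINDOW, BASE_WINDOW * 2 ** aborts)


def backoff_delay(aborts, rng=random):
    """Pick a random delay in [0, window) so that retries do not stay in lockstep."""
    return rng.randrange(backoff_window(aborts))


windows = [backoff_window(n) for n in range(10)]
print(windows)
```

The window doubles with each consecutive abort until it reaches the cap, so a transaction that keeps aborting spends a growing share of its cycles waiting rather than executing.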
[Figure 4: Ratio of cycles for GEMS / 31 threads, SPLASH-2 / 31 threads, and STAMP / 16 threads. Bars: (B) traditional LogTM (baseline), (S) relieving starving writer, (F) avoiding futile stall, (H) hybrid model of (S) and (F). Breakdown: Non_trans, Good_trans, Bad_trans, Aborting, Backoff, Stall, Barrier, MagicWaiting.]

Bad trans Aborting Stall Backoff Deque Prioque Magic Waiting writer magic waiting exponential backoff reader 3 Magic Waiting writer Btree starving writer starving writer Backoff Btree (S) 86.8% 1/4 Radiosity (S) 30 magic waiting (S) (B) futile stall (F) Btree Contention Slist Vacation L-inst Cholesky Radiosity Genome Kmeans (B) (F) Deque Prioque Raytrace (F) Deque Prioque Non trans 2 random random Non trans
Non trans random 2 (H) Barnes 2 Barnes (S) (S) Radiosity Genome (H) (S) (F) starving writer (H)

4.
[8], [9], ?? HTM
Titos [10] Eager Lazy Eager Lazy
Yoo [11] HTM adaptive transaction scheduling ATS ATS HTM 97%
Akpinar [12] Eager HTM backoff 15%
Blake [13] Similarity bloom filter Similarity Similarity HTM
Gaona [14] HTM 10% HTM HTM

5.
HTM
2 2 LogTM GEMS microbench SPLASH-2 STAMP 3 2 LogTM 72.2% 28.4% 2 384B 260B 2 1 starving writer starving writer backoff 2 futile stall futile stall A-tx L-inst S-inst

[1] Herlihy, M. and Moss, J. E. B.: Transactional Memory: Architectural Support for Lock-Free Data Structures, Proc. 20th Annual Int'l Symp. on Computer Architecture, pp (1993).
[2] Moore, K. E., Bobba, J., Moravan, M. J., Hill, M. D. and Wood, D. A.: LogTM: Log-based Transactional Memory, Proc. 12th Int'l Symp. on High-Performance Computer Architecture, pp (2006).
[3] Magnusson, P. S., Christensson, M., Eskilson, J., Forsgren, D., Hållberg, G., Högberg, J., Larsson, F., Moestedt, A. and Werner, B.: Simics: A Full System Simulation Platform, Computer, Vol. 35, No. 2, pp (2002).
[4] Martin, M. M. K., Sorin, D. J., Beckmann, B. M., Marty, M. R., Xu, M., Alameldeen, A. R., Moore, K. E., Hill, M. D. and Wood, D. A.: Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset, ACM SIGARCH Computer Architecture News, Vol. 33, No. 4, pp (2005).
[5] Woo, S. C., Ohara, M., Torrie, E., Singh, J. P. and Gupta, A.: The SPLASH-2 Programs: Characterization and Methodological Considerations, Proc. 22nd Annual Int'l Symp. on Computer Architecture (ISCA '95), pp (1995).
[6] Minh, C. C., Chung, J., Kozyrakis, C. and Olukotun, K.: STAMP: Stanford Transactional Applications for Multi-Processing, Proc. IEEE Int'l Symp. on Workload Characterization (IISWC '08) (2008).
[7] Alameldeen, A. R. and Wood, D. A.: Variability in Architectural Simulations of Multi-Threaded Workloads, Proc. 9th Int'l Symp. on High-Performance Computer Architecture (HPCA '03), pp (2003).
[8] Moravan, M. J., Bobba, J., Moore, K. E., Yen, L., Hill, M. D., Liblit, B., Swift, M. M. and Wood, D. A.: Supporting Nested Transactional Memory in LogTM, Proc. 12th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp (2006).
[9] Waliullah, M. M. and Stenstrom, P.: Intermediate Checkpointing with Conflicting Access Prediction in Transactional Memory Systems, Proc. Int'l Symp. on Parallel and Distributed Processing (IPDPS), pp (2008).
[10] Titos, R., Negi, A., Acacio, M. E., García, J. M. and Stenstrom, P.: ZEBRA: A Data-Centric, Hybrid-Policy Hardware Transactional Memory Design, Proc. Int'l Conf. on Supercomputing (ICS '11), pp (2011).
[11] Yoo, R. M. and Lee, H.-H. S.: Adaptive Transaction Scheduling for Transactional Memory Systems, Proc. 20th Annual Symp. on Parallelism in Algorithms and Architectures (SPAA '08), pp (2008).
[12] Akpinar, E., Tomić, S., Cristal, A., Unsal, O. and Valero, M.: A Comprehensive Study of Conflict Resolution Policies in Hardware Transactional Memory, Proc. 6th ACM SIGPLAN Workshop on Transactional Computing (TRANSACT '11) (2011).
[13] Blake, G., Dreslinski, R. G. and Mudge, T.: Bloom Filter Guided Transaction Scheduling, Proc. 17th International Conference on High-Performance Computer Architecture (HPCA), pp (2011).
[14] Gaona, E., Titos, R., Acacio, M. E. and Fernández, J.: Dynamic Serialization Improving Energy Consumption in Eager-Eager Hardware Transactional Memory Systems, Proc. Parallel, Distributed and Network-Based Processing th Euromicro International Conference (PDP '12), pp (2012).
More informationVariability in Architectural Simulations of Multi-threaded
Variability in Architectural Simulations of Multi-threaded threaded Workloads Alaa R. Alameldeen and David A. Wood University of Wisconsin-Madison {alaa,david}@cs.wisc.edu http://www.cs.wisc.edu/multifacet
More informationTradeoffs in Transactional Memory Virtualization
Tradeoffs in Transactional Memory Virtualization JaeWoong Chung Chi Cao Minh, Austen McDonald, Travis Skare, Hassan Chafi,, Brian D. Carlstrom, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab Stanford
More informationA Sharing-Aware Memory Management Unit for Online Mapping in Multi-Core Architectures
A Sharing-Aware Memory Management Unit for Online Mapping in Multi-Core Architectures Eduardo H. M. Cruz 1, Matthias Diener 1, Laércio L. Pilla 2, Philippe O. A. Navaux 1 1 Informatics Institute, Federal
More informationComposable Transactional Objects: a Position Paper
Composable Transactional Objects: a Position Paper Maurice Herlihy 1 and Eric Koskinen 2 1 Brown University, Providence, RI, USA 2 New York University, New York, NY, USA Abstract. Memory transactions provide
More informationManaging Off-Chip Bandwidth: A Case for Bandwidth-Friendly Replacement Policy
Managing Off-Chip Bandwidth: A Case for Bandwidth-Friendly Replacement Policy Bushra Ahsan Electrical Engineering Department City University of New York bahsan@gc.cuny.edu Mohamed Zahran Electrical Engineering
More informationAn Integrated Pseudo-Associativity and Relaxed-Order Approach to Hardware Transactional Memory
An Integrated Pseudo-Associativity and Relaxed-Order Approach to Hardware Transactional Memory ZHICHAO YAN, Huazhong University of Science and Technology HONG JIANG, University of Nebraska - Lincoln YUJUAN
More informationKicking the Tires of Software Transactional Memory: Why the Going Gets Tough
Kicking the Tires of Software Transactional Memory: Why the Going Gets Tough Richard M. Yoo yoo@ece.gatech.edu Bratin Saha bratin.saha@intel.com Yang Ni yang.ni@intel.com Ali-Reza Adl-Tabatabai ali-reza.adltabatabai@intel.com
More informationHardware Transactional Memory Architecture and Emulation
Hardware Transactional Memory Architecture and Emulation Dr. Peng Liu 刘鹏 liupeng@zju.edu.cn Media Processor Lab Dept. of Information Science and Electronic Engineering Zhejiang University Hangzhou, 310027,P.R.China
More informationFlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes
FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes Rishi Agarwal and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu
More informationPERFORMANCE-EFFICIENT DATA SYNCHRONIZATION ARCHITECTURE FOR MULTI-CORE SYSTEMS USING C-LOCK
PERFORMANCE-EFFICIENT DATA SYNCHRONIZATION ARCHITECTURE FOR MULTI-CORE SYSTEMS USING C-LOCK Aswathy Surendran The Oxford College of Engineering, Bangalore aswathysurendran4u@gmail.com Abstract: Data synchronization
More informationTransactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech
Transactional Memory Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech (Adapted from Stanford TCC group and MIT SuperTech Group) Motivation Uniprocessor Systems Frequency
More information