2. Futile Stall HTM HTM HTM. Transactional Memory: TM [1] TM. HTM exponential backoff. magic waiting HTM. futile stall. Hardware Transactional Memory:


1 Nagoya Institute of Technology, Nagoya, Aichi, Japan
a) tsumura@nitech.ac.jp

HTM 2 2 LogTM 72.2% 28.4%

1.

Transactional Memory: TM [1] TM Hardware Transactional Memory: HTM HTM HTM HTM exponential backoff magic waiting HTM

2. Futile Stall

futile stall

2.1 Futile Stall

© 2013 Information Processing Society of Japan

[Figure 1: timeline (t1 to t7) for thr.1 on Core1, thr.2 on Core2, and thr.3 on Core3, showing stalled intervals and an interval labeled "Futile Stall"]

futile stall 1 3 thr.1 3 futile stall futile stall thr.1 thr.2 thr.2 t1 thr.2 thr.1 thr.1 t2 thr.1 thr.2 thr.2 thr.1 thr.2 futile stall thr.2 thr.3 t3 t4 thr.2 futile stall thr.3 t4 thr.2 t5 thr.1 thr.3 thr.1 t6 thr.3 t7 futile stall thr.3 futile stall

2.2 Futile Stall

futile stall 1 thr.3 futile stall 2 A-tx L-inst F-I A-tx F-II L-inst OS L-inst

F-II L-inst S-inst F-II S-inst OS F-II HTM ID ID R-flags Stx-flags Ltx-flags ID

Counter (A-Counter)
Recurrence flags (R-flags): ID A-Counter A-tx ID
Instruction Counter (I-Counter)
Short Tx flags (Stx-flags): ID I-Counter L-inst ID
Long Tx flags (Ltx-flags): ID I-Counter L-inst ID

[Figure 2: timeline (t1 to t4) for thr.1 on Core1, thr.2 on Core2, and thr.3 on Core3, showing stalled intervals; annotations: "A-Counter 1", "A-Counter 2", "R-flag[X] 1", "I-Counter counts the number of executed instructions", "Stx-flag[X] 1"]

Stx-flags Ltx-flags I-Counter S-inst ID

H/W 10 H/W
A-Counter: 2 bit
I-Counter: 9 bit
R-flags, Stx-flags, Ltx-flags: 16 bit

2 thr.3 A-tx 2
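The field widths given above (a 2-bit A-Counter, a 9-bit I-Counter, and 16-bit R-flags, Stx-flags, and Ltx-flags) can be pictured as saturating counters plus bit vectors indexed by ID. The following is a minimal Python sketch under stated assumptions, not the authors' hardware design: the class name `CoreState` and all method names are hypothetical, the counters are assumed to saturate rather than wrap, and each flag vector is assumed to hold one bit per ID. The conditions under which the hardware sets or clears each field are not modeled here.

```python
from dataclasses import dataclass

# Widths taken from the text; everything else in this sketch is an assumption.
A_COUNTER_BITS = 2
I_COUNTER_BITS = 9
FLAG_BITS = 16
FLAG_FIELDS = ("r_flags", "stx_flags", "ltx_flags")


def saturating_inc(value: int, bits: int) -> int:
    """Increment value, clamping at the largest number that fits in `bits` bits."""
    return min(value + 1, (1 << bits) - 1)


@dataclass
class CoreState:  # hypothetical name for the per-core bookkeeping
    a_counter: int = 0   # A-Counter, 2 bit
    i_counter: int = 0   # I-Counter, 9 bit
    r_flags: int = 0     # R-flags, 16 bit, one bit per ID (assumed)
    stx_flags: int = 0   # Stx-flags, 16 bit, one bit per ID (assumed)
    ltx_flags: int = 0   # Ltx-flags, 16 bit, one bit per ID (assumed)

    def tick_instruction(self) -> None:
        # The I-Counter counts the number of executed instructions.
        self.i_counter = saturating_inc(self.i_counter, I_COUNTER_BITS)

    def bump_a_counter(self) -> None:
        self.a_counter = saturating_inc(self.a_counter, A_COUNTER_BITS)

    def set_flag(self, field: str, tx_id: int) -> None:
        setattr(self, field, getattr(self, field) | self._mask(field, tx_id))

    def has_flag(self, field: str, tx_id: int) -> bool:
        return bool(getattr(self, field) & self._mask(field, tx_id))

    @staticmethod
    def _mask(field: str, tx_id: int) -> int:
        # Reject unknown vectors and IDs that do not fit in a 16-bit vector.
        if field not in FLAG_FIELDS:
            raise ValueError(f"unknown flag vector: {field}")
        if not 0 <= tx_id < FLAG_BITS:
            raise ValueError(f"ID {tx_id} does not fit in {FLAG_BITS} bits")
        return 1 << tx_id
```

With these widths the A-Counter tops out at 3 and the I-Counter at 511, so under the saturation assumption any count past 511 executed instructions reads the same.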

thr.3 thr.2 t11 A-Counter 1 thr.3 thr.1 t2 A-Counter 2 A-tx R-flag ID R-flag[X] thr.3 R-flag[X] I-Counter I-Counter L-inst ID X Stx-flag[X] t4 I-Counter L-inst Ltx-flag[X] Stx-flag[X]

Sequential flag (S-flag)
ID of Opponent Thread (O-id)

H/W 32
S-flag: 1 bit
O-id: 5 bit
H/W 260B futile stall H/W

3 3 thr.1 3

[Figure 3: timeline (t1 to t9) for thr.1 on Core1, thr.2 on Core2, and thr.3 on Core3; annotations: "start", "ACK", "Magic Waiting", "S-flag == 0", "S-flag 1", "S-flag 0", "O-id 3", "O-id 1", "O-id 0"]

3 thr.2 t1 thr.1 thr.3 thr.1 thr.3 S-flag t2 thr.1 thr.3 t1 S-flag thr.1 thr.3 ACK thr.2 ACK thr.2 S-flag t3 thr.1 ACK thr.2 S-flag thr.1 t4 thr.2 thr.1 1 O-id thr.2 thr.2 thr.2 S-flag thr.1 thr.2 thr.1 S-flag t5 thr.2 thr.1 thr.3
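The text pairs the number 32 with a 1-bit S-flag and a 5-bit O-id. Five bits is exactly the width needed to name one of 32 threads, which the small Python check below illustrates. This is a sketch under assumptions: the function names are hypothetical, reading 32 as the number of threads that an O-id must distinguish is an assumption, and the packed layout (S-flag above O-id) is chosen only for illustration.

```python
O_ID_BITS = 5  # width stated in the text


def id_bits(num_threads: int) -> int:
    """Smallest number of bits that can name each of `num_threads` threads."""
    if num_threads < 1:
        raise ValueError("need at least one thread")
    return (num_threads - 1).bit_length()


def pack(s_flag: int, o_id: int) -> int:
    """Pack a 1-bit S-flag and a 5-bit O-id into one 6-bit word (assumed layout)."""
    if s_flag not in (0, 1):
        raise ValueError("S-flag is a single bit")
    if not 0 <= o_id < (1 << O_ID_BITS):
        raise ValueError("O-id does not fit in 5 bits")
    return (s_flag << O_ID_BITS) | o_id


def unpack(word: int) -> tuple[int, int]:
    """Inverse of pack: return (s_flag, o_id)."""
    return word >> O_ID_BITS, word & ((1 << O_ID_BITS) - 1)
```

For example, `id_bits(32)` evaluates to 5, while a 33rd thread would need a sixth bit.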

Table 1
Processor                       SPARC V9
  number of cores               32 cores
  frequency                     1 GHz
  issue width                   single-issue
  issue order                   in-order
  non-memory IPC                1
D1 cache                        32 KBytes
  ways                          4 ways
  latency                       3 cycles
D2 cache                        8 MBytes
  ways                          8 ways
  latency                       20 cycles
Memory                          4 GBytes
  latency                       450 cycles
Interconnect network latency    14 cycles

t6 thr.2 t7 thr.2 O-id thr.1 thr.2 thr.1 t7 thr.2 thr.2 O-id thr.1 thr.3 t8 t

HTM LogTM [2] starving writer?

3.1

HTM Simics [3] GEMS [4] Simics GEMS 32 SPARC V9 OS Solaris10 1 GEMS microbench SPLASH-2 [5] STAMP [6] 12 2

Table 2
       GEMS     SPLASH-2   STAMP   all      GEMS     SPLASH-2   STAMP   all
(S)    8.5%     10.3%      1.7%    7.5%     17.3%    18.7%      1.9%    18.7%
(F)    31.7%    26.8%      0.9%    19.8%    72.7%    71.5%      1.8%    71.5%
(H)    36.6%    34.0%      2.1%    28.4%    %        70.4%      3.1%    72.2%

(B) LogTM baseline
(S 3 ) Starving writer
(F 4 ) Futile stall
(H) Hybrid

2 (B) 1 Non trans Good trans Bad trans / ing Backoff Stall Barrier Magic Waiting exponential backoff magic waiting [7] 10 95% starving writer (S) (S) Slist (S) (B) Contention Genome Kmeans Vacation (B) 72.9% Kmeans 17.1% Genome magic waiting Kmeans 0.1% Btree Deque Prioque Barnes Radiosity

[Figure 4: ratio of cycles for (B) traditional LogTM (baseline), (S) relieving starving writer, (F) avoiding futile stall, and (H) hybrid model of (S) and (F); cycle categories: Non_trans, Good_trans, Bad_trans, ing, Backoff, Stall, Barrier, MagicWaiting; panels: GEMS / 31 threads, SPLASH-2 / 31 threads, STAMP / 16 threads]

Bad trans ing Stall Backoff Deque Prioque Magic Waiting writer magic waiting exponential backoff reader 3 Magic Waiting writer Btree starving writer starving writer Backoff Btree (S) 86.8% 1/4 Radiosity (S) 30 magic waiting (S) (B) futile stall (F) Btree Contention Slist Vacation L-inst Cholesky Radiosity Genome Kmeans (B) (F) Deque Prioque Raytrace (F) Deque Prioque Non trans 2 random random Non trans

Non trans random 2 (H) Barnes 2 Barnes (S) (S) Radiosity Genome (H) (S) (F) starving writer (H)

4.

[8], [9],?? HTM Titos [10] Eager Lazy Eager Lazy Yoo [11] HTM adaptive transaction scheduling ATS ATS HTM 97% Akpinar [12] Eager HTM backoff 15% Geoffrey [13] Similarity bloom filter Similarity Similarity HTM Gaona [14] HTM 10% HTM HTM

5.

HTM

2 2 LogTM GEMS microbench SPLASH-2 STAMP 3 2 LogTM 72.2% 28.4% 2 384B 260B 2 1 starving writer starving writer backoff 2 futile stall futile stall A-tx L-inst S-inst

[1] Herlihy, M. and Moss, J. E. B.: Transactional Memory: Architectural Support for Lock-Free Data Structures, Proc. 20th Annual Int'l Symp. on Computer Architecture, pp (1993).
[2] Moore, K. E., Bobba, J., Moravan, M. J., Hill, M. D. and Wood, D. A.: LogTM: Log-based Transactional Memory, Proc. 12th Int'l Symp. on High-Performance Computer Architecture, pp (2006).
[3] Magnusson, P. S., Christensson, M., Eskilson, J., Forsgren, D., Hållberg, G., Högberg, J., Larsson, F., Moestedt, A. and Werner, B.: Simics: A Full System Simulation Platform, Computer, Vol. 35, No. 2, pp (2002).
[4] Martin, M. M. K., Sorin, D. J., Beckmann, B. M., Marty, M. R., Xu, M., Alameldeen, A. R., Moore, K. E., Hill, M. D. and Wood, D. A.: Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset, ACM SIGARCH Computer Architecture News, Vol. 33, No. 4, pp (2005).
[5] Woo, S. C., Ohara, M., Torrie, E., Singh, J. P. and Gupta, A.: The SPLASH-2 Programs: Characterization and Methodological Considerations, Proc. 22nd Annual Int'l Symp. on Computer Architecture (ISCA '95), pp (1995).
[6] Minh, C. C., Chung, J., Kozyrakis, C. and Olukotun, K.: STAMP: Stanford Transactional Applications for Multi-Processing, Proc. IEEE Int'l Symp. on Workload Characterization (IISWC '08) (2008).
[7] Alameldeen, A. R. and Wood, D. A.: Variability in Architectural Simulations of Multi-Threaded Workloads, Proc. 9th Int'l Symp. on High-Performance Computer Architecture (HPCA '03), pp (2003).
[8] Moravan, M. J., Bobba, J., Moore, K. E., Yen, L., Hill, M. D., Liblit, B., Swift, M. M. and Wood, D. A.: Supporting Nested Transactional Memory in LogTM, Proc. 12th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp (2006).
[9] Waliullah, M. M. and Stenstrom, P.: Intermediate Checkpointing with Conflicting Access Prediction in Transactional Memory Systems, Proc. Int'l Symp. on Parallel and Distributed Processing (IPDPS), pp (2008).
[10] Titos, R., Negi, A., Acacio, M. E., García, J. M. and Stenstrom, P.: ZEBRA: A Data-Centric, Hybrid-Policy Hardware Transactional Memory Design, Proc. Int'l Conf. on Supercomputing (ICS '11), pp (2011).
[11] Yoo, R. M. and Lee, H.-H. S.: Adaptive Transaction Scheduling for Transactional Memory Systems, Proc. 20th Annual Symp. on Parallelism in Algorithms and Architectures (SPAA '08), pp (2008).
[12] Akpinar, E., Tomić, S., Cristal, A., Unsal, O. and Valero, M.: A Comprehensive Study of Conflict Resolution Policies in Hardware Transactional Memory, Proc. 6th ACM SIGPLAN Workshop on Transactional Computing (TRANSACT '11) (2011).
[13] Blake, G., Dreslinski, R. G. and Mudge, T.: Bloom Filter Guided Transaction Scheduling, Proc. 17th International Conference on High-Performance Computer Architecture (HPCA ), pp (2011).
[14] Gaona, E., Titos, R., Acacio, M. E. and Fernández, J.: Dynamic Serialization Improving Energy Consumption in Eager-Eager Hardware Transactional Memory Systems, Proc. Parallel, Distributed and Network-Based Processing th Euromicro International Conference (PDP '12), pp (2012).
