CNS Update. José F. Martínez. M3 Architecture Research Group


1. CNS Update
   José F. Martínez
   M3 Architecture Research Group

2. Project's Recent Highlights
   Dynamic multicore reconfiguration/adaptation
     E. İpek, M. Kırman, N. Kırman, and J.F. Martínez. "Core Fusion: Accommodating Software Diversity in Multicore Chips." In Intl. Symp. on Computer Architecture (ISCA), June 2007.
     C.C. LaFrieda, E. İpek, J.F. Martínez, and R. Manohar. "Dynamic Core Coupling for Resilient Multicore Chips." In Intl. Conf. on Dependable Systems and Networks (DSN), June 2007.
     J. Li and J.F. Martínez. "Power-Performance Optimization of Parallel Computing in Multicore Chips." In Intl. Symp. on High Performance Computer Architecture (HPCA), Feb. 2006.

3. Project's Recent Highlights
   Dynamic hardware reconfiguration/adaptation
     E. İpek, M. Kırman, N. Kırman, and J.F. Martínez. "Core Fusion: Accommodating Software Diversity in Multicore Chips." In Intl. Symp. on Computer Architecture (ISCA), June 2007.
     C.C. LaFrieda, E. İpek, J.F. Martínez, and R. Manohar. "Dynamic Core Coupling for Resilient Multicore Chips." In Intl. Conf. on Dependable Systems and Networks (DSN), June 2007.
     J. Li and J.F. Martínez. "Power-Performance Optimization of Parallel Computing in Multicore Chips." In Intl. Symp. on High Performance Computer Architecture (HPCA), Feb. 2006.

4. Challenge: CMPs Lack Flexibility
   In CMPs, core is new transistor
   Must support diverse apps
     Sequential
     Multiprogrammed
     Parallel (coarse- or fine-grain)
     Evolving
   Conflicting requirements
     No. of cores
     Per-core performance

5. Challenge: CMPs Lack Flexibility
   In CMPs, core is new transistor
   Must support diverse apps
     Sequential
     Multiprogrammed
     Parallel (coarse- or fine-grain)
     Evolving
   Conflicting requirements
     No. of cores
     Per-core performance
   [Slide annotations: "HW Mismatch"; "1.6x" (shown twice)]

6. High-ILP, High-TLP Hardware
   Spatial approach: Multiscalar, RAW, Smart Memories, TRIPS
     + Modular, flexible designs
     - Significant software support
   Temporal approach: SMT
     + Tiny overhead on top of base core; quasi-transparent
     - Top-down approach: Large base core
     - Little tolerance for hardware bugs/faults
     - Resource interference
     - Lower parallel efficiency

7. Proposal: Core Fusion
   Run-time CMP synthesis
   High compatibility
     Single execution model
     Backward-compatible ISA
     No sophisticated SW support
   Bottom-up hierarchical design
     Tolerant to hardware bugs/faults
     No interference across base cores
     High parallel efficiency

8. Contributions and Findings
   Run-time fully reconfigurable and distributed
     Front-end + i-cache
     LSQ + d-cache
     ROB
   Thorough evaluation using diverse workload classes
     Sequential
     Parallel
     Multiprogrammed
     Evolving
   Effective
     Always best or 2nd best
     Always best in intermediate parallelization stages
     Others lag significantly in 1+ cases
   Highly compatible

9. Conceptual Organization
   Concept: Add enveloping hardware to enable on-demand core fusion
   [Figure: eight cores, each with its own L1 i-cache and L1 d-cache, and an L2 cache. Not meant to represent actual floorplan.]

10. Core Fusion Operation
    i-cache fusion and reconfiguration
    Collective fetch
    Instruction steering/renaming
    Collective execution
    Distributed memory access
    Collective commit


12. Collective Fetch
    [Figure: four cores, each with its own BTB, branch predictor (BPred), GHR, and RAS.]

13. Collective Fetch
    [Figure: the same four sets of BTB, BPred, GHR, and RAS, with X marks added alongside each BTB and BPred.]

14. Collective Commit I
    [Figure: instructions i0 to i7 shown as two interleaved groups (i0, i2, i4, i6 and i1, i3, i5, i7), with pre-commit and commit stages labeled.]

15. Collective Commit II
    [Figure: instructions i0 to i7 shown as two interleaved groups (i0, i2, i4, i6 and i1, i3, i5, i7), with pre-commit and commit stages labeled.]
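The commit slides survive only as figure residue, so the following is a toy illustration rather than the mechanism presented in the talk. It shows one simple way to retire instructions in program order when they are spread across several per-core reorder buffers: steer them round-robin (an assumption suggested by the i0/i2/i4/i6 and i1/i3/i5/i7 grouping) and always commit the globally oldest ROB head, stopping at the first instruction that has not finished. The function names `steer_round_robin` and `collective_commit` are hypothetical, and the pre-commit stage from the slides is not modeled.

```python
from collections import deque


def steer_round_robin(n_instructions, n_cores):
    """Assumed steering policy: instruction k goes to core k % n_cores."""
    robs = [deque() for _ in range(n_cores)]
    for seq in range(n_instructions):
        robs[seq % n_cores].append(seq)
    return robs


def collective_commit(robs, done):
    """Retire instructions in program order across all per-core ROBs.

    `done` is the set of sequence numbers that have finished executing.
    Commit stops at the oldest instruction that is not yet done.
    """
    committed = []
    while True:
        # Only non-empty ROBs have a head that can be considered.
        candidates = [rob for rob in robs if rob]
        if not candidates:
            break
        oldest_rob = min(candidates, key=lambda rob: rob[0])
        if oldest_rob[0] not in done:
            break  # the globally oldest instruction blocks everything younger
        committed.append(oldest_rob.popleft())
    return committed


robs = steer_round_robin(8, 2)
retired = collective_commit(robs, done={0, 1, 2, 4, 5})
```

In this run, instruction 3 has not finished, so instructions 4 and 5 stay in their ROBs even though they are done; that is the in-order property the sketch is meant to show.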

16. Run-time Reconfiguration
    Run-time control of granularity
      Serial vs. parallel sections
      Variable granularity in parallel sections
    Mechanism: Fusion, fission ISA instruction
      Typically encapsulated in macros or directives (e.g., OpenMP sections)
      Can be safely ignored (single execution model)
    Relatively simple
      Flush pipelines and i-caches
      Reconfigure i-cache tags
      Transfer architectural state as needed
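The reconfiguration steps listed on this slide can be sketched as a small software model. This is a minimal sketch under stated assumptions, not the hardware or ISA interface from the talk: `ToyCMP`, `fuse`, `fission`, and the `fused` wrapper are hypothetical names, and the three logged steps simply mirror the slide's bullets. The `fusion_supported` flag models the "can be safely ignored" property: on a chip without fusion the same program runs unchanged on a single base core.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ToyCMP:
    """Toy model of a chip whose base cores can be fused at run time."""
    n_cores: int = 8
    fusion_supported: bool = True
    group_size: int = 1                       # base cores fused together
    log: list = field(default_factory=list)   # reconfiguration steps taken

    def _reconfigure(self, new_size):
        if new_size == self.group_size:
            return
        # The three steps named on the slide, in the order listed there.
        self.log.append("flush pipelines and i-caches")
        self.log.append("reconfigure i-cache tags")
        self.log.append("transfer architectural state")
        self.group_size = new_size

    def fuse(self, size):
        if not self.fusion_supported:
            return  # instruction is ignored; execution continues unfused
        if not 1 <= size <= self.n_cores:
            raise ValueError("group size must be between 1 and n_cores")
        self._reconfigure(size)

    def fission(self):
        if not self.fusion_supported:
            return
        self._reconfigure(1)


@contextmanager
def fused(chip, size):
    """Hypothetical wrapper standing in for a macro or directive."""
    chip.fuse(size)
    try:
        yield chip
    finally:
        chip.fission()


chip = ToyCMP()
with fused(chip, 4):
    size_in_serial_section = chip.group_size  # serial code runs fused
size_after = chip.group_size                  # back to independent cores
```

Wrapping the pair in a context manager plays the role the slide assigns to macros or directives: the serial section is bracketed by a fuse on entry and a fission on exit, so application code never issues the instructions directly.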

17. Evaluation Nugget: Evolving Apps

18. Issues that Intrigue Me
    Synergistic hardware-software technology
      Virtualization
      OS scheduling
      Multicore compiler mechanisms
      Application programming

19. Acknowledgments
    Outstanding Ph.D. students: E. İpek, M. Kırman, N. Kırman, C. LaFrieda, J. Li
    Generous support
      NSF Award CNS (Darema)
      Other NSF Awards
        - CAREER CCF (Pinkston)
        - CCF (Pinkston)
      IBM Faculty Award
      Intel graduate fellowships (M. Kırman and N. Kırman)
      Intel gifts and equipment donations

20. NGS-CSR Workshop Bullet
    If we forget Amdahl's Law, it will come back to haunt us
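For reference, Amdahl's Law bounds the speedup of a program whose parallel fraction is p when run on n cores: speedup = 1 / ((1 - p) + p / n). The sketch below computes it. The optional `serial_speedup` parameter is an illustrative assumption, not a figure from the talk: it models the serial section running faster, for example on fused cores, and is left at 1.0 for the classic form.

```python
def amdahl_speedup(parallel_fraction, n_cores, serial_speedup=1.0):
    """Speedup over one base core under Amdahl's Law.

    parallel_fraction: share of the original run time that parallelizes.
    n_cores: number of cores used for the parallel share.
    serial_speedup: assumed speedup of the serial share (1.0 = classic law).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_cores < 1 or serial_speedup <= 0:
        raise ValueError("n_cores must be >= 1 and serial_speedup > 0")
    serial_time = (1.0 - parallel_fraction) / serial_speedup
    parallel_time = parallel_fraction / n_cores
    return 1.0 / (serial_time + parallel_time)


classic = amdahl_speedup(0.9, 8)
with_faster_serial = amdahl_speedup(0.9, 8, serial_speedup=2.0)
```

Even with 90% of the work parallelized, the serial 10% keeps the eight-core speedup well below 8, which is why speeding up serial sections matters.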

