Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS
2 Who am I? Education: Master of Technology, NTNU, 2007; PhD, NTNU, title: «Managing Shared Resources in Chip Multiprocessor Memory Systems». Current: Associate Professor in Computer Architecture (since 2012); Head of the CARD research group; Coordinator of the Energy Efficient Computing Systems «fyrtårn» (lighthouse). Research interests: memory systems for multi-core architectures, heterogeneous computer systems, energy efficiency, computer architecture simulation, compilers and system software.
3 Outline 1. Energy Efficient Computing Systems (EECS) 2. Computer Architecture Trends 3. Achieving Energy Efficient Computation 4. The SHMAC Research Project 5. Concluding Remarks
5 Green ICT. Two routes: improving computing technology (big potential impact) and improving energy-intensive processes (massive potential impact). EECS focus.
6 Environmental Motivation. [Figure: global energy consumption, computing systems vs. other; sources: International Energy Agency and SSB.] Global consumer ICT: 800 TWh. Norway total energy: 220 TWh. Research project goal: a 10% reduction, i.e. 80 TWh. Use the energy reduction to turn off coal power plants and save 80 M metric tonnes of CO2. Norway's annual CO2 emissions: 87 M metric tonnes.
7 Technical Motivation. Energy efficiency is becoming the primary design goal across all market segments. Extremely energy-sensitive systems: the lifetime of the system is equal to the battery life; lower energy consumption can open new markets. Mobile systems: energy (users want long battery life) and power (the limited size of the cooling system results in strict power constraints). Desktop computers: fixed power budget due to cooling challenges; cannot improve performance without improving energy efficiency. Data centers and HPC: the energy bill dominates operating cost; power consumption is a significant engineering challenge.
8 EECS Structure. Vertical approach: leverage strong groups working horizontally. Application agnostic: matches the focus of the high-volume international industry; choose demonstrator applications that clearly demonstrate the proposed innovations. People: 6.2 affiliated permanent staff, 10 affiliated PhD students, 5 affiliated researchers/lecturers. (Figure label: SHMAC.)
9 NTNU EECS at a Glance. Key assets: expertise from nano-scale electronics via computer architecture to system software and applications; excellent lab infrastructure for prototype development and significant high-performance computing resources; industry partners including ARM and SiLabs. The SHMAC research infrastructure: Single-ISA Heterogeneous MAny-core Computer, an infrastructure for rapid prototyping of heterogeneous computing systems. Significant EU proposal support: a scientific man-year allocated to EU proposal processes; significant administrative support. 21 researchers from 10 nations. Contact: Assoc. Prof. Magnus Jahre (magnus.jahre@idi.ntnu.no), EECS Coordinator.
11 Historical Computer Performance. Significant performance improvement for each generation. The high-performance processor industry economy relies on these trends. Enabled by: transistor speed scaling, core microarchitecture techniques, cache memory architecture. [Figure: performance relative to VAX 11/ by year, showing single-core and aggregate performance; surviving growth-rate annotations: 52% and 22%, one further value lost in extraction.]
12 Traditional Transistor Speed Scaling. Every generation: transistor density doubles; performance increases by 40%; supply voltage is reduced by 30%, which (together with the smaller capacitance) reduces dynamic energy by 65%. Known as Dennard scaling. Almost too good to be true. Keeping these scaling properties is no longer possible. Key question: how can we continue to deliver performance growth to end users?
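The per-generation numbers on this slide can be reproduced with a small back-of-the-envelope sketch. It assumes the textbook model in which feature size, capacitance and supply voltage all shrink by a factor of about 0.7 and dynamic switching energy is E = C * V^2; the function name and the 0.7 constant are illustrative choices, not taken from the slides.

```python
# Sketch of one generation of classic Dennard scaling.
# Assumption: linear dimensions, capacitance and supply voltage all scale by ~0.7,
# and dynamic switching energy is modeled as E = C * V^2.
SHRINK = 0.7


def dennard_generation(shrink: float = SHRINK) -> dict:
    density = 1.0 / shrink**2        # transistors per unit area (about 2x)
    frequency = 1.0 / shrink         # gate delay shrinks with feature size (about 1.4x)
    energy = shrink * shrink**2      # C scales by shrink, V^2 by shrink^2 (about 0.34x)
    # Power per unit area = transistors/area * switching rate * energy per switch.
    power_density = density * frequency * energy
    return {
        "density": density,
        "frequency": frequency,
        "energy": energy,
        "power_density": power_density,
    }


if __name__ == "__main__":
    for name, value in dennard_generation().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the power density stays constant, which is why the scaling looked almost too good to be true: more and faster transistors at no extra power per unit area.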
13 Architectural Advances and Energy Efficiency
14 Pollack's Rule. Microarchitectural techniques that exploit the growth in available transistors (area) deliver a performance growth that is proportional to the square root of the number of transistors (area). Example: adding techniques that double the area occupied by the processor increases performance by a factor of √2 ≈ 1.4.
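A minimal sketch of the rule as arithmetic, to make the diminishing return explicit. The function names are illustrative, and the model is just the square-root relation stated on the slide.

```python
import math


def pollack_performance(area_factor: float) -> float:
    """Relative performance of a core whose area grows by area_factor (Pollack's rule)."""
    return math.sqrt(area_factor)


def performance_per_area(area_factor: float) -> float:
    """Return on invested area: falls as the core grows."""
    return pollack_performance(area_factor) / area_factor


if __name__ == "__main__":
    for factor in (1, 2, 4, 16):
        print(factor, round(pollack_performance(factor), 2),
              round(performance_per_area(factor), 2))
```

Doubling the area gives about 1.4x performance, and a core with four times the area delivers only twice the performance, so each unit of area is used half as effectively.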
15 DRAM Density and Speed. Good: optimizing for DRAM density gives large memory capacity. Bad: optimizing for DRAM density creates the need to hide latency.
16 Performance Observation: Two thirds of the 1000x improvement from the last 30 years is due to technology
17 Business as usual? Business-as-usual scenario: add more cores and increase clock frequency. The market expects a 30x performance improvement over the next decade. [Figure annotations: dark silicon; practical server limit (about 100 W); practical desktop limit (about 65 W).]
19 ACHIEVING ENERGY EFFICIENT COMPUTATION Based on: Borkar et al., The Future of Microprocessors, CACM 2011
20 Challenge 1: Leakage. A transistor is not a perfect switch. Leakage: the small current that flows through a transistor when it is off. Bad news: leakage increases exponentially as the threshold voltage is reduced, and the increased number of transistors on a chip increases the consequences of this. We cannot reduce the threshold voltage further! Yet threshold voltage reduction is key to reducing energy consumption. [Figure: inverter built from a PMOS and an NMOS transistor.]
21 Challenge 2: Energy limits the number of logic transistors. Power and energy constraints limit the number of cores and the clock frequency of future processors. Energy efficiency is the key metric. Energy-proportional computing will be the goal of both hardware and software. Use each additional joule to deliver more value!
22 Challenge 3: Memory Hierarchies (1/2). [Figure: AMD Barcelona 4-core processor.] Which units do actual computation?
23 Challenge 3: Memory Hierarchies (2/2). Modern processors have large memory hierarchies aimed at hiding latency. Memory orchestration costs energy. The energy consumption of a transfer is proportional to the length of the wire. Moving data at high frequency over long distances leaves little energy for computation.
24 Challenge 4: Software. 1st major challenge: parallel software is needed to exploit the performance gains from technology scaling. 2nd major challenge: the parallel software must be energy efficient. Trade-off: programmability vs. efficiency. The programmer needs help to achieve these goals.
25 New Thinking Needed. Traditional 90/10 optimization: use the maximum number of transistors for the 90% case to improve single-thread performance. Pollack: the return is the square root of the invested resources. Not energy efficient! New strategy, 10/10 optimization: add accelerators for the 10% cases. All accelerators are energy efficient, but for different tasks. Meet the energy budget by carefully choosing which accelerators should be active at a given time. Corollary: most accelerators are off most of the time.
26 Solution 1: Energy Efficient Programming. Trick: exploit that the system is not fully loaded at all times. Sleep modes (application): turn off the parts of the system that you don't need; retaining data vs. not retaining data; very useful in embedded systems. Dynamic Voltage and Frequency Scaling (DVFS) (system software): match the performance to the tasks at hand; very useful for desktops/servers.
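The DVFS idea of matching performance to the task at hand can be sketched as picking the lowest-energy operating point that still meets a deadline. Everything concrete below is an assumption made for illustration: the three voltage/frequency pairs, the effective capacitance, and the simplified model that counts only dynamic energy (C * V^2 per cycle) and ignores leakage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float
    volt: float


# Hypothetical voltage/frequency pairs; real values are platform specific.
POINTS = [
    OperatingPoint(0.5e9, 0.8),
    OperatingPoint(1.0e9, 1.0),
    OperatingPoint(2.0e9, 1.2),
]
C_EFF = 1e-9  # hypothetical effective switched capacitance in farads


def run_time(cycles: float, op: OperatingPoint) -> float:
    return cycles / op.freq_hz


def dynamic_energy(cycles: float, op: OperatingPoint) -> float:
    # Dynamic energy only: C * V^2 per cycle. Leakage is ignored in this sketch.
    return C_EFF * op.volt**2 * cycles


def pick_point(cycles: float, deadline_s: float) -> Optional[OperatingPoint]:
    """Lowest-energy operating point that finishes before the deadline, if any."""
    feasible = [op for op in POINTS if run_time(cycles, op) <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda op: dynamic_energy(cycles, op))


if __name__ == "__main__":
    print(pick_point(1e9, deadline_s=1.5))
    print(pick_point(1e9, deadline_s=3.0))
```

With a relaxed deadline the slowest, lowest-voltage point wins. A real governor would also have to account for leakage, which can make finishing quickly and then sleeping the better choice.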
27 Sleep Modes in EFM32
28 Solution 2: Multiple cores. [Figure: three chip organizations: homogeneous with small cores, homogeneous with large cores, and an asymmetric multi-core processor.] Large core: area = 4, power = 4, performance = 2. Small core: area = 1, power = 1, performance = 1.
29 Example: A Parallel Application. Idea: exploit heterogeneity by using the simple energy-efficient cores in the parallel phase and the fast energy-hungry core for the sequential phase.
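A rough sketch of why the asymmetric design can win, using the core figures from the previous slide (large core: area 4, performance 2; small core: area 1, performance 1). The chip area budget of 16 units, the three configurations and the Amdahl-style timing model are assumptions made for illustration only.

```python
# Each configuration: (performance available to the sequential phase,
#                      aggregate performance available to the parallel phase).
# Assumption: a chip area budget of 16 units.
CONFIGS = {
    "16 small": (1.0, 16.0),
    "4 large": (2.0, 8.0),
    # One large core runs the sequential phase; 12 small cores run the parallel phase.
    "1 large + 12 small": (2.0, 12.0),
}


def exec_time(serial_fraction: float, serial_perf: float, parallel_perf: float) -> float:
    """Amdahl-style execution time, normalized to one small core doing all the work."""
    return serial_fraction / serial_perf + (1.0 - serial_fraction) / parallel_perf


def best_config(serial_fraction: float) -> str:
    return min(CONFIGS, key=lambda name: exec_time(serial_fraction, *CONFIGS[name]))


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.5):
        times = {name: round(exec_time(fraction, *cfg), 3) for name, cfg in CONFIGS.items()}
        print(fraction, times, "->", best_config(fraction))
```

For a perfectly parallel program the small cores win, but as soon as a modest sequential phase appears the asymmetric chip comes out ahead in this model.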
30 Solution 3: Customization. Idea: add specialized units that can execute certain operations in an energy-efficient manner. Potential: 50x to 500x improvement (application dependent). Possible accelerators: SIMD cores or GPUs; fixed-function units (media encoders, crypto, etc.); FPGAs. [Figure: TI OMAP smartphone SoC.]
31 Solution 4: Hybrid Networks. Idea: increase the size of the local storage to keep data as close to the core as possible. Effect: bandwidth demand is reduced, enabling simpler networks.
32 Solution 5: Voltage Scaling (1/2) Reducing supply voltage and frequency increases energy efficiency
33 Solution 5: Voltage Scaling (2/2). Reducing the supply voltage for a powerful core reduces single-thread performance. Small throughput-oriented cores are less sensitive to single-thread performance. An increase in the number of cores offsets the performance reduction. Challenge: voltage reduction exposes production variability. Solution: allow different clock frequencies for different cores.
35 Dark Silicon. Business-as-usual scenario: add more cores and increase clock frequency. [Figure annotations: dark silicon; practical server limit (about 100 W); practical desktop limit (about 65 W).] Dark silicon: the end of Dennard scaling combined with Moore's law creates a situation where only a subset of the transistors can be powered within the power budget.
36 The SHMAC Project. The dark silicon effect makes heterogeneous processors likely. Software for heterogeneous processors is an open research problem. The heterogeneity of off-the-shelf components is limited. Simulators have unlimited heterogeneity but are slow. Solution: SHMAC = Single-ISA Heterogeneous MAny-core Computer.
37 SHMAC Architecture. A tiled multi-core design paradigm describing a class of processor architectures. A common instruction set and architecture model gives software portability across SHMAC instances. SHMAC instances can contain various tile types: processors with different energy/performance characteristics; optimized processors (vector, OOO, etc.); accelerators. [Figure tiles: fast core, core w/ accelerator, core w/ accelerator, vector core.]
38 Design Goal: Software Portability. All processor tiles are functionally equivalent; performance may be very different (different processor classes and accelerators). Uniform architecture: all processing tiles see the same memory map; tile registers are per tile, other memory locations are global. [Figure: SHMAC memory map ending at 0xFFFF FFFF, with regions for system registers, tile registers, scratchpad tile memory (scratchpad tile 1, tile 2, ..., tile k), main memory and the exception table; the remaining addresses were lost in extraction.] Research question: what are the costs associated with the single-ISA abstraction?
39 Leveraging Reconfigurability. [Figure: generic components (benchmarks, operating systems, runtime systems; in-order core, core w/ accelerator, vector core, scratchpad) form a SHMAC configuration; synthesis yields a SHMAC instance running on an FPGA.] Measure, evaluate, repeat.
40 Project Example 1: Integrating Accelerators. [Figure: tightly coupled accelerator vs. loosely coupled accelerator.] Accelerator research topics: tightly vs. loosely coupled accelerators; which accelerators should be included? How can accelerators be leveraged by programmers? The most efficient solution will most likely require both software and hardware changes. Key SHMAC components: accelerator support, processor tiles, memory tiles, system software, benchmarks.
41 Project Example 2: Task Based Parallelism (TBP) for Heterogeneous Systems. TBP: the program is organized as a DAG of tasks (nodes) and dependencies (edges). Task scheduling for energy efficiency: how should tasks be scheduled in a heterogeneous environment? Which hardware feedback mechanisms are needed? SHMAC advantages: efficient software development, large diversity in systems, possibility to add feedback components. Key SHMAC components: OS support, benchmarks, processor tiles, memory tiles.
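To make the scheduling question concrete, here is a small sketch of a task DAG being placed on two kinds of cores by a greedy list scheduler. It is not the SHMAC scheduler: the task graph, the work amounts, the two cores with their speed and power figures, and the greedy policy are all hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: task -> set of tasks it depends on.
DEPS = {"load": set(), "filter": {"load"}, "fft": {"load"}, "merge": {"filter", "fft"}}
WORK = {"load": 2.0, "filter": 4.0, "fft": 8.0, "merge": 2.0}  # abstract work units
# Hypothetical cores: name -> (speed in work units per second, power in watts).
CORES = {"big": (2.0, 4.0), "little": (1.0, 1.0)}


def schedule(deps, work, cores, objective="time"):
    """Greedy list scheduling; objective is 'time' (earliest finish) or 'energy'."""
    core_free = {name: 0.0 for name in cores}  # time at which each core becomes idle
    finish, placement = {}, {}
    energy = 0.0
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    while sorter.is_active():
        # Among the ready tasks, place the largest one first (deterministic order).
        for task in sorted(sorter.get_ready(), key=lambda t: -work[t]):
            ready_at = max((finish[d] for d in deps[task]), default=0.0)

            def end_time(core):
                return max(ready_at, core_free[core]) + work[task] / cores[core][0]

            def task_energy(core):
                speed, power = cores[core]
                return power * work[task] / speed

            chosen = min(cores, key=end_time if objective == "time" else task_energy)
            end = end_time(chosen)
            finish[task] = end
            core_free[chosen] = end
            placement[task] = chosen
            energy += task_energy(chosen)
            sorter.done(task)
    return placement, max(finish.values()), energy


if __name__ == "__main__":
    print(schedule(DEPS, WORK, CORES, objective="time"))
    print(schedule(DEPS, WORK, CORES, objective="energy"))
```

In this toy setup the time-oriented policy spreads work over both cores and finishes sooner, while the energy-oriented policy keeps everything on the low-power core and takes longer. Choosing between such policies at run time is where hardware feedback mechanisms would come in.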
42 Project Example 3: Exploiting Near-/Sub-threshold Technology. [Figure: delay, power, PDP and EDP.] Reducing the supply voltage to near the threshold gives: energy per switching is reduced by one order of magnitude; latency increases by 3-4 orders of magnitude. How can this technology be leveraged at the microarchitecture level? Key SHMAC components: processor tiles, system software, benchmarks. A tape-out is necessary to validate the implementation.
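The trade-off behind the figure labels can be sketched numerically. PDP (power-delay product) is the energy per operation and EDP (energy-delay product) additionally penalizes slow operations. The ratios below take the slide's orders of magnitude at face value (energy divided by 10, latency multiplied by 1000, the lower end of the stated range); the function name is illustrative.

```python
def relative_metrics(energy_ratio: float, delay_ratio: float) -> dict:
    """Metrics of a scaled design relative to the nominal-voltage design.

    energy_ratio: energy per operation relative to nominal (e.g. 0.1)
    delay_ratio:  latency per operation relative to nominal (e.g. 1e3)
    """
    return {
        "power": energy_ratio / delay_ratio,  # same work spread over a longer time
        "pdp": energy_ratio,                  # power * delay = energy per operation
        "edp": energy_ratio * delay_ratio,    # energy * delay
        # Parallel units needed to match nominal throughput (ideal scaling assumed).
        "units_for_equal_throughput": delay_ratio,
    }


if __name__ == "__main__":
    print(relative_metrics(energy_ratio=0.1, delay_ratio=1e3))
```

Under these assumed ratios PDP improves tenfold while EDP gets much worse, which is one way to see why the slide asks how the technology can be exploited at the microarchitecture level rather than simply adopted.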
43 SHMAC Enables Collaboration. SHMAC combines generic components and powerful abstractions. Reimplement or extend the part(s) involved in your research project. SHMAC is best suited for cross-disciplinary projects where hardware and software innovations are combined. Different partners can focus on different parts of the system. EECS is one of seven groups at NTNU that receive special support towards Horizon 2020.
44 Future Directions. Status: minimal set of tiles to support software development. Future hardware: efficient accelerator integration, vector core, out-of-order core. Future software: benchmarks (micro, macro), operating systems (conventional, multikernel), runtime systems. Significant effort: 1 postdoc, 2 PhD students, 15 master students.
45 How can I help save the world? Choose a project/master thesis related to the SHMAC project!
47 Concluding Remarks. EECS motivation: environment (climate change and efficient use of energy) and technology (power/energy consumption limits performance growth across all computing segments). Master and project topics: strong relations to the international high-volume industry; the SHMAC project covers the vertical depth of the strategic research area.
Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory - CAPSL Introduction CPEG 852 - Spring 2014 Advanced Topics in Computing Systems Guang R. Gao ACM
More informationPERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah
PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Sept. 5 th : Homework 1 release (due on Sept.
More informationThe Bifrost GPU architecture and the ARM Mali-G71 GPU
The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our
More informationADVANCED FPGA BASED SYSTEM DESIGN. Dr. Tayab Din Memon Lecture 3 & 4
ADVANCED FPGA BASED SYSTEM DESIGN Dr. Tayab Din Memon tayabuddin.memon@faculty.muet.edu.pk Lecture 3 & 4 Books Recommended Books: Text Book: FPGA Based System Design by Wayne Wolf Overview Why VLSI? Moore
More informationComputer Architecture. Fall Dongkun Shin, SKKU
Computer Architecture Fall 2018 1 Syllabus Instructors: Dongkun Shin Office : Room 85470 E-mail : dongkun@skku.edu Office Hours: Wed. 15:00-17:30 or by appointment Lecture notes nyx.skku.ac.kr Courses
More informationFacilitating IP Development for the OpenCAPI Memory Interface Kevin McIlvain, Memory Development Engineer IBM. Join the Conversation #OpenPOWERSummit
Facilitating IP Development for the OpenCAPI Memory Interface Kevin McIlvain, Memory Development Engineer IBM Join the Conversation #OpenPOWERSummit Moral of the Story OpenPOWER is the best platform to
More informationComputer Architecture!
Informatics 3 Computer Architecture! Dr. Vijay Nagarajan and Prof. Nigel Topham! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors
More information3D systems-on-chip. A clever partitioning of circuits to improve area, cost, power and performance. The 3D technology landscape
Edition April 2017 Semiconductor technology & processing 3D systems-on-chip A clever partitioning of circuits to improve area, cost, power and performance. In recent years, the technology of 3D integration
More informationHigher Level Programming Abstractions for FPGAs using OpenCL
Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*
More informationAR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors Computer Sciences Department University of Wisconsin Madison http://www.cs.wisc.edu/~ericro/ericro.html ericro@cs.wisc.edu High-Performance
More informationSuggested Readings! Lecture 24" Parallel Processing on Multi-Core Chips! Technology Drive to Multi-core! ! Readings! ! H&P: Chapter 7! vs.! CSE 30321!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7!! (Over next 2 weeks)! Lecture 24" Parallel Processing on Multi-Core Chips! 3! Processor components! Multicore processors and programming! Processor
More informationCS654 Advanced Computer Architecture. Lec 1 - Introduction
CS654 Advanced Computer Architecture Lec 1 - Introduction Peter Kemper Adapted from the slides of EECS 252 by Prof. David Patterson Electrical Engineering and Computer Sciences University of California,
More informationLecture 9: MIMD Architecture
Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is
More informationOn-chip Networks Enable the Dark Silicon Advantage. Drew Wingard CTO & Co-founder Sonics, Inc.
On-chip Networks Enable the Dark Silicon Advantage Drew Wingard CTO & Co-founder Sonics, Inc. Agenda Sonics history and corporate summary Power challenges in advanced SoCs General power management techniques
More informationTowards Deploying Decommissioned Mobile Devices as Cheap Energy-Efficient Compute Nodes
Towards Deploying Decommissioned Mobile Devices as Cheap Energy-Efficient Compute Nodes Mohammad Shahrad and David Wentzlaff Monday, July 10, 2017 Growing Dominance of Smartphones 6.3 billion 3.2 Smartphones
More informationMicroprocessors, Lecture 1: Introduction to Microprocessors
Microprocessors, Lecture 1: Introduction to Microprocessors Computing Systems General-purpose standalone systems (سيستم ھای نھفته ( systems Embedded 2 General-purpose standalone systems Stand-alone computer
More informationInternational Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering
IP-SRAM ARCHITECTURE AT DEEP SUBMICRON CMOS TECHNOLOGY A LOW POWER DESIGN D. Harihara Santosh 1, Lagudu Ramesh Naidu 2 Assistant professor, Dept. of ECE, MVGR College of Engineering, Andhra Pradesh, India
More informationMaximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman
Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency with ML accelerators Michael
More informationFast Design Space Subsetting. University of Florida Electrical and Computer Engineering Department Embedded Systems Lab
Fast Design Space Subsetting University of Florida Electrical and Computer Engineering Department Embedded Systems Lab Motivation & Greater Impact Energy & Data Centers Estimated¹ energy by servers data
More informationHPC future trends from a science perspective
HPC future trends from a science perspective Simon McIntosh-Smith University of Bristol HPC Research Group simonm@cs.bris.ac.uk 1 Business as usual? We've all got used to new machines being relatively
More informationParallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples
More informationParallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model
Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy
More informationArchitecture at the end of Moore
Architecture at the end of Moore Stefanos Kaxiras Uppsala University IT Uppsala universitet Conclusions There s a power problem and it seems bad Nothing works really well (e.g., multicores) Heterogeous
More informationAdvances of parallel computing. Kirill Bogachev May 2016
Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being
More informationScaling Throughput Processors for Machine Intelligence
Scaling Throughput Processors for Machine Intelligence ScaledML Stanford 24-Mar-18 simon@graphcore.ai 1 MI The impact on humanity of harnessing machine intelligence will be greater than the impact of harnessing
More informationCenter for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop
Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop http://cscads.rice.edu/ Discussion and Feedback CScADS Autotuning 07 Top Priority Questions for Discussion
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationHPC Technology Trends
HPC Technology Trends High Performance Embedded Computing Conference September 18, 2007 David S Scott, Ph.D. Petascale Product Line Architect Digital Enterprise Group Risk Factors Today s s presentations
More informationStack Machines. Towards Scalable Stack Based Parallelism. 1 of 53. Tutorial Organizer: Dr Chris Crispin-Bailey
1 of 53 Stack Machines Towards Scalable Stack Based Parallelism Tutorial Organizer: Department of Computer Science University of York 2 of 53 Today s Speakers Dr Mark Shannon Dr Huibin Shi 3 of 53 Stack
More informationDVFS Space Exploration in Power-Constrained Processing-in-Memory Systems
DVFS Space Exploration in Power-Constrained Processing-in-Memory Systems Marko Scrbak and Krishna M. Kavi Computer Systems Research Laboratory Department of Computer Science & Engineering University of
More informationParallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010
Parallelizing FPGA Technology Mapping using GPUs Doris Chen Deshanand Singh Aug 31 st, 2010 Motivation: Compile Time In last 12 years: 110x increase in FPGA Logic, 23x increase in CPU speed, 4.8x gap Question:
More information