Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor

Size: px
Start display at page:

Download "Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor"

Transcription

1 1 Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Hari Kannan, Michael Dalton, Christos Kozyrakis Presenter: Yue Zheng Yulin Shi

2 Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 2 2

3 Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 3 3

4 Motivation & Background? Why is the pointer dereference potentially dangerous Dynamic Information Flow Tracking (DIFT) Tag data from untrusted source Track tainted data propagation Check unsafe tainted data usage Vulnerable C Code int idx = tainted_input; buffer[idx] = x; // memory corruption Tainted Pointer Dereference R1 load R2 add R4 store M[R4] TRAP &tainted_input M[R1] R2 + R3 R5 #Re g R1 R2 R3 R4 R5 Data &tainted_input idx (tainted_input) &buffer &buffer + idx x Tag 4 4

5 Why is the pointer dereference potentially dangerous? Buffer Overflow Attack store M[R4] R5 Vulnerable C Code char buf[126]; strcpy(buf,str); TRAP 5 5

6 Motivation & Background Software DIFT Use Dynamic Binary Translation (DBT) to implement DIFT Avoid recompilation Introduce significant overheads Limitation Incompatible with self-modifying and multithreaded programs 6 6

7 Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 7 7

8 Hardware DIFT overview In-core DIFT Minimize runtime overhead Require significant modifications to processor structure 8 8

9 Hardware DIFT overview Offloading DIFT Offer the flexibility of analysis in software Introduce significant overheads halve the throughput, double the power consumption Require pipeline changes 9 9

10 Hardware DIFT overview Off-core DIFT DIFT synchronizes with main core only on system calls Eliminate the changes to processor Allow pairing with multiple processor designs 10 10

11 Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 11 11

12 DIFT Coprocessor Design Security Model R1 Security Model load R2 &tainted_input M[R1] Synchronization Coprocessor Instruction Exception add R4 store M[R4] TRAP R2 + R3 R5 System Call Tainted Pointer Dereference IF ID RE N Main Core R O B EX EX TRAP Commit 12 12

13 DIFT Coprocessor Design - Architecture Main Core Processor VS Coprocessor Main Core Processor Coprocessor Handle data 32 bits Complexity Handle tag propagation and check 4 bits Simple 13 13

14 DIFT Coprocessor Design - Architecture Four-stage pipeline: Decode, Execution, TagCheck, WriteBack Main Core Decoupling Queue Tag Reg File Security Decode Tag ALU Tag Check Logic Writeback Queue Stall DIFT Coprocessor Tag Cache L2 Cache DRAM Tag s Decode Execution TagCheck WriteBack 14 14

15 DIFT Coprocessor Design - Interface Coprocessor Setup Software control security policies Instruction Flow Information PC, Instruction Encoding and Memory Address Decoupling Greater or equal processing rate to avoid full queue Security Exceptions Asynchronous interrupts and run in trusted mode PC, Inst Encoding, Mem Addr Security Decode 15 15

16 Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Design Prototype System Evaluation Conclusion Discussion points 16 16

17 Prototype Hardware SPARC V8 Processor FPGA Board Software Gentoo Linux 2.6 Design Statistics 7.64% area overhead 17 17

18 Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Architecture Prototype System Evaluation Conclusion Discussion points 18 18

19 Evaluation Security Evaluation Wide Low High Independent level workload attacks of range programming language Common Utilities Servers Web apps Kernel code 19 Program Lan g. Attack Detected Vulnerability Gzip C Directory traversal Open tainted directory Tar C Directory traversal Open tainted directory Scry PHP Cross-site scripting Tainted <script> tag Htdig C++ Cross-site scripting Tainted <script> tag Polymorph C Buffer overflow Tainted code ptr dereference Sendmail C Buffer overflow Tainted data ptr dereference Quotactl syscall C User/kernel pointer dereference Tainted pointer to kernelspace SUS C Format string bug Tainted %n in syslog WU_FTPD C Format string bug Tainted %n in vfprintf 19

20 Evaluation Performance Evaluation Execution time 512-byte tag cache 6-entry queue Runtime overhead < 1% 20 20

21 Evaluation Performance Evaluation Scaling the tag cache 21 21

22 Evaluation Performance Evaluation Scaling the decoupling queue 16-byte tag cache Tag miss 22 22

23 Evaluation Processor/Coprocessor Performance Ratio 11.7% 3.8% 16-entry queue Can be paired with various main cores 23 23

24 Outline Motivation & Background Hardware DIFT overview DIFT Coprocessor Architecture Prototype System Evaluation Conclusion Discussion points 24 24

25 Conclusion DIFT: a promising security technique Proposed an off-core, decoupling coprocessor for DIFT Provide the same security features as in-core DIFT Reduce DIFT implementation cost drastically Has low area and performance overheads Developed a full-system prototype Protect real-world Linux applications 25 25

26 Questions? 26 26

27 27 Debate Is a wider-issue coprocessor better than a single-issue coprocessor for 3-way superscalar processors? Is it worth to add a checkpoint scheme to DIFT to provide reliable recovery? ( A checkpoint scheme allows the system to rollback for recovery)

Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor

Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Hari Kannan, Michael Dalton, Christos Kozyrakis Computer Systems Laboratory Stanford University Motivation Dynamic analysis help

More information

Real-World Buffer Overflow Protection in User & Kernel Space

Real-World Buffer Overflow Protection in User & Kernel Space Real-World Buffer Overflow Protection in User & Kernel Space Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University http://raksha.stanford.edu 1 Motivation Buffer

More information

Prototyping Architectural Support for Program Rollback Using FPGAs

Prototyping Architectural Support for Program Rollback Using FPGAs Prototyping Architectural Support for Program Rollback Using FPGAs Radu Teodorescu and Josep Torrellas http://iacoma.cs.uiuc.edu University of Illinois at Urbana-Champaign Motivation Problem: Software

More information

Raksha: A Flexible Information Flow Architecture for Software Security

Raksha: A Flexible Information Flow Architecture for Software Security Raksha: A Flexible Information Flow Architecture for Software Security Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University {mwdalton, hkannan, kozyraki}@stanford.edu

More information

DBT Tool. DBT Framework

DBT Tool. DBT Framework Thread-Safe Dynamic Binary Translation using Transactional Memory JaeWoong Chung,, Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University http://csl.stanford.edu

More information

Filtering Metadata Lookups in Instruction-Grain Application Monitoring

Filtering Metadata Lookups in Instruction-Grain Application Monitoring EDIC RESEARCH PROPOSAL 1 Filtering etadata Lookups in Instruction-Grain Application onitoring Yusuf Onur Kocberber Parallel Systems Architecture Lab (PARSA) Ecole Polytechnique Fédérale de Lausanne Lausanne,

More information

What are Exceptions? EE 457 Unit 8. Exception Processing. Exception Examples 1. Exceptions What Happens When Things Go Wrong

What are Exceptions? EE 457 Unit 8. Exception Processing. Exception Examples 1. Exceptions What Happens When Things Go Wrong 8. 8.2 What are Exceptions? EE 457 Unit 8 Exceptions What Happens When Things Go Wrong Exceptions are rare events triggered by the hardware and forcing the processor to execute a software handler Similar

More information

AFRL-RI-RS-TR

AFRL-RI-RS-TR AFRL-RI-RS-TR-2014-088 A NEW OPERATING SYSTEM FOR SECURITY TAGGED ARCHITECTURE HARDWARE IN SUPPORT OF MULTIPLE INDEPENDENT LEVELS OF SECURITY (MILS) COMPLIANT SYSTEM UNIVERSITY OF IDAHO APRIL 2014 FINAL

More information

EE 457 Unit 8. Exceptions What Happens When Things Go Wrong

EE 457 Unit 8. Exceptions What Happens When Things Go Wrong 1 EE 457 Unit 8 Exceptions What Happens When Things Go Wrong 2 What are Exceptions? Exceptions are rare events triggered by the hardware and forcing the processor to execute a software handler HW Interrupts

More information

Thread-Safe Dynamic Binary Translation using Transactional Memory

Thread-Safe Dynamic Binary Translation using Transactional Memory Thread-Safe Dynamic Binary Translation using Transactional Memory JaeWoong Chung, Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University {jwchung, mwdalton, hkannan,

More information

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory ECE4750/CS4420 Computer Architecture L11: Speculative Execution I Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements Lab3 due today 2 1 Overview Branch penalties limit performance

More information

Sandboxing Untrusted Code: Software-Based Fault Isolation (SFI)

Sandboxing Untrusted Code: Software-Based Fault Isolation (SFI) Sandboxing Untrusted Code: Software-Based Fault Isolation (SFI) Brad Karp UCL Computer Science CS GZ03 / M030 9 th December 2011 Motivation: Vulnerabilities in C Seen dangers of vulnerabilities: injection

More information

High-Performance Parallel Accelerator for Flexible and Efficient Run-Time Monitoring

High-Performance Parallel Accelerator for Flexible and Efficient Run-Time Monitoring High-Performance Parallel Accelerator for Flexible and Efficient Run-Time Monitoring Daniel Y. Deng and G. Edward Suh Computer Systems Laboratory, Cornell University Ithaca, New York 14850 {deng, suh}@csl.cornell.edu

More information

Static Vulnerability Analysis

Static Vulnerability Analysis Static Vulnerability Analysis Static Vulnerability Detection helps in finding vulnerabilities in code that can be extracted by malicious input. There are different static analysis tools for different kinds

More information

Tolerating Malicious Drivers in Linux. Silas Boyd-Wickizer and Nickolai Zeldovich

Tolerating Malicious Drivers in Linux. Silas Boyd-Wickizer and Nickolai Zeldovich XXX Tolerating Malicious Drivers in Linux Silas Boyd-Wickizer and Nickolai Zeldovich How could a device driver be malicious? Today's device drivers are highly privileged Write kernel memory, allocate memory,...

More information

Multithreaded Processors. Department of Electrical Engineering Stanford University

Multithreaded Processors. Department of Electrical Engineering Stanford University Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread

More information

Hardware Enforcement of Application Security Policies Using Tagged Memory

Hardware Enforcement of Application Security Policies Using Tagged Memory Hardware Enforcement of Application Security Policies Using Tagged Memory Nickolai Zeldovich, Hari Kannan, Michael Dalton, and Christos Kozyrakis MIT Stanford University ABSTRACT Computers are notoriously

More information

The Design Complexity of Program Undo Support in a General-Purpose Processor

The Design Complexity of Program Undo Support in a General-Purpose Processor The Design Complexity of Program Undo Support in a General-Purpose Processor Radu Teodorescu and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Instruction Commit The End of the Road (um Pipe) Commit is typically the last stage of the pipeline Anything an insn. does at this point is irrevocable Only actions following

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Instruction Commit The End of the Road (um Pipe) Commit is typically the last stage of the pipeline Anything an insn. does at this point is irrevocable Only actions following

More information

Tainting is Not Pointless

Tainting is Not Pointless Tainting is Not Pointless Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University {mwdalton, hkannan, kozyraki}@stanford.edu Pointer tainting is a form of Dynamic

More information

How to Sandbox IIS Automatically without 0 False Positive and Negative

How to Sandbox IIS Automatically without 0 False Positive and Negative How to Sandbox IIS Automatically without 0 False Positive and Negative Professor Tzi-cker Chiueh Computer Science Department Stony Brook University chiueh@cs.sunysb.edu 1/10/06 Blackhat Federal 2006 1

More information

The Design Complexity of Program Undo Support in a General Purpose Processor. Radu Teodorescu and Josep Torrellas

The Design Complexity of Program Undo Support in a General Purpose Processor. Radu Teodorescu and Josep Torrellas The Design Complexity of Program Undo Support in a General Purpose Processor Radu Teodorescu and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Processor with program

More information

Caches. Hiding Memory Access Times

Caches. Hiding Memory Access Times Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY

More information

M 3 Microkernel-based System for Heterogeneous Manycores

M 3 Microkernel-based System for Heterogeneous Manycores M 3 Microkernel-based System for Heterogeneous Manycores Nils Asmussen MKC, 06/29/2017 1 / 35 Outline 1 Introduction 2 Data Transfer Unit 3 Prototype Platforms 4 M 3 5 Summary 2 / 35 Outline 1 Introduction

More information

CS5460: Operating Systems

CS5460: Operating Systems CS5460: Operating Systems Lecture 2: OS Hardware Interface (Chapter 2) Course web page: http://www.eng.utah.edu/~cs5460/ CADE lab: WEB L224 and L226 http://www.cade.utah.edu/ Projects will be on Linux

More information

Spectre and Meltdown. Clifford Wolf q/talk

Spectre and Meltdown. Clifford Wolf q/talk Spectre and Meltdown Clifford Wolf q/talk 2018-01-30 Spectre and Meltdown Spectre (CVE-2017-5753 and CVE-2017-5715) Is an architectural security bug that effects most modern processors with speculative

More information

data block 0, word 0 block 0, word 1 block 1, word 0 block 1, word 1 block 2, word 0 block 2, word 1 block 3, word 0 block 3, word 1 Word index cache

data block 0, word 0 block 0, word 1 block 1, word 0 block 1, word 1 block 2, word 0 block 2, word 1 block 3, word 0 block 3, word 1 Word index cache Taking advantage of spatial locality Use block size larger than one word Example: two words Block index tag () () Alternate representations Word index tag block, word block, word block, word block, word

More information

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing Mingyu Gao and Christos Kozyrakis Stanford University http://mast.stanford.edu HPCA March 14, 2016 PIM is Coming Back End of Dennard

More information

From Speculation to Security: Practical and Efficient Information Flow Tracking Using Speculative Hardware

From Speculation to Security: Practical and Efficient Information Flow Tracking Using Speculative Hardware From Speculation to Security: Practical and Efficient Information Flow Tracking Using Speculative Hardware Haibo Chen, Xi Wu, Liwei Yuan, Binyu Zang, Pen-chung Yew, and Frederic T. Chong Parallel Processing

More information

Lecture 10. Pointless Tainting? Evaluating the Practicality of Pointer Tainting. Asia Slowinska, Herbert Bos. Advanced Operating Systems

Lecture 10. Pointless Tainting? Evaluating the Practicality of Pointer Tainting. Asia Slowinska, Herbert Bos. Advanced Operating Systems Lecture 10 Pointless Tainting? Evaluating the Practicality of Pointer Tainting Asia Slowinska, Herbert Bos Advanced Operating Systems December 15, 2010 SOA/OS Lecture 10, Pointer Tainting 1/40 Introduction

More information

Caching and reliability

Caching and reliability Caching and reliability Block cache Vs. Latency ~10 ns 1~ ms Access unit Byte (word) Sector Capacity Gigabytes Terabytes Price Expensive Cheap Caching disk contents in RAM Hit ratio h : probability of

More information

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste

More information

A program execution is memory safe so long as memory access errors never occur:

A program execution is memory safe so long as memory access errors never occur: A program execution is memory safe so long as memory access errors never occur: Buffer overflows, null pointer dereference, use after free, use of uninitialized memory, illegal free Memory safety categories

More information

NFSv4 as the Building Block for Fault Tolerant Applications

NFSv4 as the Building Block for Fault Tolerant Applications NFSv4 as the Building Block for Fault Tolerant Applications Alexandros Batsakis Overview Goal: To provide support for recoverability and application fault tolerance through the NFSv4 file system Motivation:

More information

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs CCured Type-Safe Retrofitting of C Programs [Necula, McPeak,, Weimer, Condit, Harren] #1 One-Slide Summary CCured enforces memory safety and type safety in legacy C programs. CCured analyzes how you use

More information

POLITECNICO DI MILANO. Exception handling. Donatella Sciuto:

POLITECNICO DI MILANO. Exception handling. Donatella Sciuto: POLITECNICO DI MILANO Exception handling Donatella Sciuto: donatella.sciuto@polimi.it Interrupts: altering the normal flow of control I i-1 HI 1 program I i HI 2 interrupt handler I i+1 HI n An external

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Koji Inoue Department of Informatics, Kyushu University Japan Science and Technology Agency

Koji Inoue Department of Informatics, Kyushu University Japan Science and Technology Agency Lock and Unlock: A Data Management Algorithm for A Security-Aware Cache Department of Informatics, Japan Science and Technology Agency ICECS'06 1 Background (1/2) Trusted Program Malicious Program Branch

More information

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary

More information

Itanium 2 Processor Microarchitecture Overview

Itanium 2 Processor Microarchitecture Overview Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs

More information

EE108B Lecture 13. Caches Wrap Up Processes, Interrupts, and Exceptions. Christos Kozyrakis Stanford University

EE108B Lecture 13. Caches Wrap Up Processes, Interrupts, and Exceptions. Christos Kozyrakis Stanford University EE108B Lecture 13 Caches Wrap Up Processes, Interrupts, and Exceptions Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements Lab3 and PA2.1 are due ton 2/27 Don t forget

More information

Confinement (Running Untrusted Programs)

Confinement (Running Untrusted Programs) Confinement (Running Untrusted Programs) Chester Rebeiro Indian Institute of Technology Madras Untrusted Programs Untrusted Application Entire Application untrusted Part of application untrusted Modules

More information

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 CHERRY: CHECKPOINTED EARLY RESOURCE RECYCLING José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 1 2 3 MOTIVATION Problem: Limited processor resources Goal: More

More information

Announcements. EE108B Lecture 13. Caches Wrap Up Processes, Interrupts, and Exceptions. Measuring Performance. Memory Performance

Announcements. EE108B Lecture 13. Caches Wrap Up Processes, Interrupts, and Exceptions. Measuring Performance. Memory Performance Announcements EE18B Lecture 13 Caches Wrap Up Processes, Interrupts, and Exceptions Lab3 and PA2.1 are due ton 2/27 Don t forget to study the textbook Don t forget the review sessions Christos Kozyrakis

More information

250P: Computer Systems Architecture. Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019

250P: Computer Systems Architecture. Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019 250P: Computer Systems Architecture Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019 The Alpha 21264 Out-of-Order Implementation Reorder Buffer (ROB) Branch prediction and instr

More information

Syscalls, exceptions, and interrupts, oh my!

Syscalls, exceptions, and interrupts, oh my! Syscalls, exceptions, and interrupts, oh my! Hakim Weatherspoon CS 3410 Computer Science Cornell University [Altinbuken, Weatherspoon, Bala, Bracy, McKee, and Sirer] Announcements P4-Buffer Overflow is

More information

One-Slide Summary. Lecture Outline. Language Security

One-Slide Summary. Lecture Outline. Language Security Language Security Or: bringing a knife to a gun fight #1 One-Slide Summary A language s design principles and features have a strong influence on the security of programs written in that language. C s

More information

CS 537 Lecture 2 Computer Architecture and Operating Systems. OS Tasks

CS 537 Lecture 2 Computer Architecture and Operating Systems. OS Tasks CS 537 Lecture 2 Computer Architecture and Operating Systems Michael Swift OS Tasks What is the role of the OS regarding hardware? What does the OS need from hardware to perform this role? 1 Computer Hardware

More information

are Softw Instruction Set Architecture Microarchitecture are rdw

are Softw Instruction Set Architecture Microarchitecture are rdw Program, Application Software Programming Language Compiler/Interpreter Operating System Instruction Set Architecture Hardware Microarchitecture Digital Logic Devices (transistors, etc.) Solid-State Physics

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Outline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA

Outline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA CS 258 Parallel Computer Architecture Data Speculation Support for a Chip Multiprocessor (Hydra CMP) Lance Hammond, Mark Willey and Kunle Olukotun Presented: May 7 th, 2008 Ankit Jain Outline The Hydra

More information

Motivations for Virtual Memory Virtual Memory Oct. 29, Why VM Works? Motivation #1: DRAM a Cache for Disk

Motivations for Virtual Memory Virtual Memory Oct. 29, Why VM Works? Motivation #1: DRAM a Cache for Disk class8.ppt 5-23 The course that gives CMU its Zip! Virtual Oct. 29, 22 Topics Motivations for VM Address translation Accelerating translation with TLBs Motivations for Virtual Use Physical DRAM as a Cache

More information

Ensuring Critical Data Integrity via Information Flow Signatures

Ensuring Critical Data Integrity via Information Flow Signatures Ensuring Critical Data Integrity via Information Flow Signatures William Healey, Karthik Pattabiraman, Shane Ryoo, Paul Dabrowski, Zbigniew Kalbarczyk, Ravi Iyer and Wen-Mei Hwu ABSTRACT Center for Reliable

More information

Lecture 11 Cache. Peng Liu.

Lecture 11 Cache. Peng Liu. Lecture 11 Cache Peng Liu liupeng@zju.edu.cn 1 Associative Cache Example 2 Associative Cache Example 3 Associativity Example Compare 4-block caches Direct mapped, 2-way set associative, fully associative

More information

CS 61C: Great Ideas in Computer Architecture Virtual Memory. Instructors: John Wawrzynek & Vladimir Stojanovic

CS 61C: Great Ideas in Computer Architecture Virtual Memory. Instructors: John Wawrzynek & Vladimir Stojanovic CS 61C: Great Ideas in Computer Architecture Virtual Memory Instructors: John Wawrzynek & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ 1 Review Programmed I/O Polling vs. Interrupts Booting

More information

Pipelining to Superscalar

Pipelining to Superscalar Pipelining to Superscalar ECE/CS 752 Fall 207 Prof. Mikko H. Lipasti University of Wisconsin-Madison Pipelining to Superscalar Forecast Limits of pipelining The case for superscalar Instruction-level parallel

More information

Virtual Memory Oct. 29, 2002

Virtual Memory Oct. 29, 2002 5-23 The course that gives CMU its Zip! Virtual Memory Oct. 29, 22 Topics Motivations for VM Address translation Accelerating translation with TLBs class9.ppt Motivations for Virtual Memory Use Physical

More information

Virtual Memory. Motivations for VM Address translation Accelerating translation with TLBs

Virtual Memory. Motivations for VM Address translation Accelerating translation with TLBs Virtual Memory Today Motivations for VM Address translation Accelerating translation with TLBs Fabián Chris E. Bustamante, Riesbeck, Fall Spring 2007 2007 A system with physical memory only Addresses generated

More information

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O T R O L ALU CTL ISTRUCTIO FETCH ISTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMOR ACCESS WRITE BACK A D D A D D A L U

More information

secubt Hacking the Hackers with User Space Virtualization

secubt Hacking the Hackers with User Space Virtualization secubt Hacking the Hackers with User Space Virtualization Mathias Payer Mathias Payer: secubt User Space Virtualization 1 Motivation Virtualizing and encapsulating running programs

More information

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue and Register Renaming Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://wwweecsberkeleyedu/~krste

More information

Motivation. Threads. Multithreaded Server Architecture. Thread of execution. Chapter 4

Motivation. Threads. Multithreaded Server Architecture. Thread of execution. Chapter 4 Motivation Threads Chapter 4 Most modern applications are multithreaded Threads run within application Multiple tasks with the application can be implemented by separate Update display Fetch data Spell

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

EECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements

EECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements EECS15 - Digital Design Lecture 11 SRAM (II), Caches September 29, 211 Elad Alon Electrical Engineering and Computer Sciences University of California, Berkeley http//www-inst.eecs.berkeley.edu/~cs15 Fall

More information

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors Milos Prvulovic, Zheng Zhang*, Josep Torrellas University of Illinois at Urbana-Champaign *Hewlett-Packard

More information

Mobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for Security and Energy Efficiency

Mobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for Security and Energy Efficiency Mobilizing the Micro-Ops: Exploiting Context Sensitive Decoding for Security and Energy Efficiency Mohammadkazem Taram, Ashish Venkat, Dean Tullsen University of California, San Diego The Tension between

More information

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Hakim Weatherspoon CS 3410 Computer Science Cornell University Hakim Weatherspoon CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Deniz Altinbuken, Professors Weatherspoon, Bala, Bracy, and Sirer. C practice

More information

Nemesis: Preventing Web Authentication & Access Control Vulnerabilities. Michael Dalton, Christos Kozyrakis Stanford University

Nemesis: Preventing Web Authentication & Access Control Vulnerabilities. Michael Dalton, Christos Kozyrakis Stanford University Nemesis: Preventing Web Authentication & Access Control Vulnerabilities Michael Dalton, Christos Kozyrakis Stanford University Nickolai Zeldovich Massachusetts Institute of Technology Web Application Overview

More information

IsoStack Highly Efficient Network Processing on Dedicated Cores

IsoStack Highly Efficient Network Processing on Dedicated Cores IsoStack Highly Efficient Network Processing on Dedicated Cores Leah Shalev Eran Borovik, Julian Satran, Muli Ben-Yehuda Outline Motivation IsoStack architecture Prototype TCP/IP over 10GE on a single

More information

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications

More information

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

ELE 375 Final Exam Fall, 2000 Prof. Martonosi ELE 375 Final Exam Fall, 2000 Prof. Martonosi Question Score 1 /10 2 /20 3 /15 4 /15 5 /10 6 /20 7 /20 8 /25 9 /30 10 /30 11 /30 12 /15 13 /10 Total / 250 Please write your answers clearly in the space

More information

Flexible Architecture Research Machine (FARM)

Flexible Architecture Research Machine (FARM) Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Branch Prediction & Speculative Execution. Branch Penalties in Modern Pipelines

Branch Prediction & Speculative Execution. Branch Penalties in Modern Pipelines 6.823, L15--1 Branch Prediction & Speculative Execution Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L15--2 Branch Penalties in Modern Pipelines UltraSPARC-III

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Bloom Filtering Cache Misses for Accurate Data Speculation and Prefetching

Bloom Filtering Cache Misses for Accurate Data Speculation and Prefetching Bloom Filtering Cache Misses for Accurate Data Speculation and Prefetching Jih-Kwon Peir, Shih-Chang Lai, Shih-Lien Lu, Jared Stark, Konrad Lai peir@cise.ufl.edu Computer & Information Science and Engineering

More information

CISC 360. Virtual Memory Dec. 4, 2008

CISC 360. Virtual Memory Dec. 4, 2008 CISC 36 Virtual Dec. 4, 28 Topics Motivations for VM Address translation Accelerating translation with TLBs Motivations for Virtual Use Physical DRAM as a Cache for the Disk Address space of a process

More information

C 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA

C 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA CSE 490/590 Computer Architecture Complex Pipelining I Steve Ko Computer Sciences and Engineering University at Buffalo Last time Virtual address caches Virtually-indexed, physically-tagged cache design

More information

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Pipelining to Superscalar Forecast Real

More information

Operating System: Chap13 I/O Systems. National Tsing-Hua University 2016, Fall Semester

Operating System: Chap13 I/O Systems. National Tsing-Hua University 2016, Fall Semester Operating System: Chap13 I/O Systems National Tsing-Hua University 2016, Fall Semester Outline Overview I/O Hardware I/O Methods Kernel I/O Subsystem Performance Application Interface Operating System

More information

virtual memory Page 1 CSE 361S Disk Disk

virtual memory Page 1 CSE 361S Disk Disk CSE 36S Motivations for Use DRAM a for the Address space of a process can exceed physical memory size Sum of address spaces of multiple processes can exceed physical memory Simplify Management 2 Multiple

More information

Impact of Cache Coherence Protocols on the Processing of Network Traffic

Impact of Cache Coherence Protocols on the Processing of Network Traffic Impact of Cache Coherence Protocols on the Processing of Network Traffic Amit Kumar and Ram Huggahalli Communication Technology Lab Corporate Technology Group Intel Corporation 12/3/2007 Outline Background

More information

Securing Software Applications Using Dynamic Dataflow Analysis. OWASP June 16, The OWASP Foundation

Securing Software Applications Using Dynamic Dataflow Analysis. OWASP June 16, The OWASP Foundation Securing Software Applications Using Dynamic Dataflow Analysis Steve Cook OWASP June 16, 2010 0 Southwest Research Institute scook@swri.org (210) 522-6322 Copyright The OWASP Foundation Permission is granted

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 12 -- Virtual Memory 2014-2-27 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: CS 152 L12: Virtual

More information

V. Primary & Secondary Memory!

V. Primary & Secondary Memory! V. Primary & Secondary Memory! Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM)

More information

Fast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names

Fast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names Fast access ===> use map to find object HW == SW ===> map is in HW or SW or combo Extend range ===> longer, hierarchical names How is map embodied: --- L1? --- Memory? The Environment ---- Long Latency

More information

Language Security. Lecture 40

Language Security. Lecture 40 Language Security Lecture 40 (from notes by G. Necula) Prof. Hilfinger CS 164 Lecture 40 1 Lecture Outline Beyond compilers Looking at other issues in programming language design and tools C Arrays Exploiting

More information

Beyond Block I/O: Rethinking

Beyond Block I/O: Rethinking Beyond Block I/O: Rethinking Traditional Storage Primitives Xiangyong Ouyang *, David Nellans, Robert Wipfel, David idflynn, D. K. Panda * * The Ohio State University Fusion io Agenda Introduction and

More information

Lecture 18: Multithreading and Multicores

Lecture 18: Multithreading and Multicores S 09 L18-1 18-447 Lecture 18: Multithreading and Multicores James C. Hoe Dept of ECE, CMU April 1, 2009 Announcements: Handouts: Handout #13 Project 4 (On Blackboard) Design Challenges of Technology Scaling,

More information

Runtime Defenses against Memory Corruption

Runtime Defenses against Memory Corruption CS 380S Runtime Defenses against Memory Corruption Vitaly Shmatikov slide 1 Reading Assignment Cowan et al. Buffer overflows: Attacks and defenses for the vulnerability of the decade (DISCEX 2000). Avijit,

More information

Road Map. Road Map. Motivation (Cont.) Motivation. Intel IXA 2400 NP Architecture. Performance of Embedded System Application on Network Processor

Road Map. Road Map. Motivation (Cont.) Motivation. Intel IXA 2400 NP Architecture. Performance of Embedded System Application on Network Processor Performance of Embedded System Application on Network Processor 2006 Spring Directed Study Project Danhua Guo University of California, Riverside dguo@cs.ucr.edu 06-07 07-2006 Motivation NP Overview Programmability

More information

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011 15-740/18-740 Computer Architecture Lecture 7: Pipelining Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011 Review of Last Lecture More ISA Tradeoffs Programmer vs. microarchitect Transactional

More information

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

Unleashing D* on Android Kernel Drivers. Aravind Machiry

Unleashing D* on Android Kernel Drivers. Aravind Machiry Unleashing D* on Android Kernel Drivers Aravind Machiry (@machiry_msidc) $ whoami Fourth year P.h.D Student at University of California, Santa Barbara. Vulnerability Detection in System software. machiry.github.io

More information

Effective Memory Protection Using Dynamic Tainting

Effective Memory Protection Using Dynamic Tainting Effective Memory Protection Using Dynamic Tainting James Clause Alessandro Orso (software) and Ioanis Doudalis Milos Prvulovic (hardware) College of Computing Georgia Institute of Technology Supported

More information

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1) Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling

More information

15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 10: Runahead and MLP Prof. Onur Mutlu Carnegie Mellon University Last Time Issues in Out-of-order execution Buffer decoupling Register alias tables Physical

More information