Are Your Passwords Safe: Energy-Efficient Bcrypt Cracking with Low-Cost Parallel Hardware

Size: px
Start display at page:

Download "Are Your Passwords Safe: Energy-Efficient Bcrypt Cracking with Low-Cost Parallel Hardware"

Transcription

1 Are Your Passwords Safe: Energy-Efficient Bcrypt Cracking with Low-Cost Parallel Hardware Katja Malvoni (kmalvoni at openwall.com) Solar Designer (solar at openwall.com) Josip Knezovic (josip.knezovic at fer.hr) KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

2 Motivation Bcrypt is: Slow Sequential Designed to be resistant to brute force attacks and to remain secure despite hardware improvements You could almost think why even bother optimizing KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

3 But KatjaAre Malvoni, Solar Designer, Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware Your Passwords Safe: Energy-Efficient August 19, / 21

4 Outline 1 Bcrypt 2 Implementations Parallella/Epiphany ZedBoard and ZC706 3 Experimental Results 4 Future work 5 Takeaways KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

5 Bcrypt Bcrypt Based on Blowfish block cipher Expensive key setup User defined cost setting Pseudorandom memory accesses Blowfish. Photo source: KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

6 Implementations Parallella/Epiphany Architecture Epiphany 16/64 32-bit RISC cores operating at up to 1 GHz/800 MHz Energy-efficient - 2 W maximum chip power consumption 32 KB of local memory per core FPU can be switched to integer mode KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

7 Implementations Parallella/Epiphany Implementation Epiphany John the Ripper prepares data on ARM cores Bcrypt hashes computed on Epiphany Optimized in assembly Each Epiphany core computes two bcrypt hashes Computation overlapped to exploit dual-issue architecture Integer ALU FPU in integer mode KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

8 Implementations ZedBoard and ZC706 Architecture Zynq 7020 and Zynq 7045 Heterogeneous device Dual ARM Cortex-A9 MPCore Advanced low power 28nm programmable logic Zynq times bigger than Zynq 7020 AXI buses used for CPU-FPGA communication KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

9 Implementations ZedBoard and ZC706 Implementation Zynq 7020 and Zynq 7045 John the Ripper prepares data on ARM cores Bcrypt instances compute hash(es) KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

10 Implementations ZedBoard and ZC706 Implementation Number of concurrent instances Number of concurrent instances limited by available BRAM Overlapping multiple bcrypt instances in one module 56, 70 or 112 instances on Zynq 7020 with 140 BRAMs or many more on Zynq 7045 with 545 BRAMs Large communication overhead for low bcrypt cost setting Hardware defects of boards limit optimizations KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

11 Experimental Results Epiphany vs x86 6,000 Performance (c/s) Energy-efficiency (c/s/w) 5,000 4,812 4,876 4,000 3,000 2,000 2,400 1,000 1,207 1, Epiphany 16 T7200 Epiphany 64 i7 2600K KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

12 Experimental Results Performance and efficiency comparison 25,000 Performance (c/s) Energy-efficiency (c/s/w) 20,000 20,583 15,000 10,000 5, ,044 4,571 4,812 3,522 4,116 4,556 2,285 2,400 1, ,246 6,596 5, Epiphany 16 Zynq 7020 Epiphany 64 Zynq 7020 Zynq 7045 HD 7970 FX 8120 Xeon Phi 5110P i7 4770K (emulated with 7045) KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

13 Experimental Results Cost comparison System price (c/s/$) Chip or device price (c/s/$) $99 $75 $199 - $395 $119 $2495 $ $549 $505 $205 - $2649 $650 $350 Epiphany 16 Epiphany 64 Zynq 7020 Zynq 7045 HD 7970 FX 8120 Xeon Phi 5110P i7 4770K KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

14 Experimental Results Derived performance from cost 12 35,000 Performance for cost 12 (c/s) Theoretical performance for cost 5 (c/s) Measured performance for cost 5 (c/s) 30,000 28,462 25,000 20,000 20,538 15,000 10,000 5, ,112 6,313 6,246 6,753 6,596 5,347 4,571 4,556 5,408 4,490 1,207 1, Epiphany 16 Zynq 7020 Zynq 7045 HD 7970 FX 8120 Xeon Phi 5110P i7 4770K KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

15 Experimental Results Theoretical Peak Performance Analysis Theory c/s = N ports f (2 cost ) N reads 16 N ports - number of available read ports to local memory or L1 cache N reads - number of reads per Blowfish round 2 cost number of Blowfish block encryptions in bcrypt hash computation f (in Hz) - clock rate EksBlowfish Ekspensive key schedule Blowfish bcrypt(cost, salt, pwd) Bcrypt (1) 1: state InitState() 2: state ExpandKey(state, salt, key) 3: repeat(2 cost ) 4: state ExpandKey(state, 0, salt) 5: state ExpandKey(state, 0, key) 6: ctext OrpheanBeholderScryDoubt 7: repeat(64) 8: ctext EncryptECB(state, ctext) 9: return Concatenate(cost, salt, ctext) Katja Malvoni and Solar Designer Energy-efficient bcrypt cracking atjaare Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

16 Experimental Results Theoretical Peak Performance Analysis Comparison 35,000 30,000 Theoretical estimate for cost 5 (c/s) Measured performance for cost 5 (c/s) 29,179 25,000 20,000 17,989 20,538 15,000 10,000 5, ,494 11,093 7,496 4,497 4,571 4,812 4,876 6,595 1,207 Epiphany 16 Zynq 7020 Epiphany 64 Zynq 7045 i7 2600K i7 4770K KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

17 Experimental Results Related work F.Wiemer, R. Zimmermann. Speed and Area-Optimized Password Search of bcrypt on FPGAs bcrypt running on ZedBoard at 80 MHz 40 parallel instances 5208 c/s at cost 5, 41.6 c/s at cost 12 Yuri Gonzaga, Google Summer of Code 2011 KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

18 Future work Future work Parallella/Epiphany Using both Epiphany and Zynq 7020 at once Possible to integrate up to 64 chips on a single board Scalability of current implementation is promising 64 * 64 = 4096 cores with theoretical performance of c/s FPGA Zynq 7020 and 7045 optimizations Improve clock rate Reduce communication overhead Targeting bigger FPGAs Targeting multi-fpga boards KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

19 Takeaways Takeaways Many-core low power RISC platforms and FPGAs are capable of exploiting bcrypt peculiarities to achieve comparable performance and higher energy-efficiency Higher energy-efficiency enables higher density More chips per board, more boards per system It doesn t take ASICs to improve bcrypt cracking energy-efficiency by a factor of 45+ Although ASICs would do better yet KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

20 Thanks Thanks Sayantan Datta Steve Thomas Parallella project Google Summer of Code Xilinx Faculty of Electrical Engineering and Computing, University of Zagreb KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

21 Questions Questions? KatjaAre Malvoni, Your Passwords Solar Designer, Safe: Energy-Efficient Josip Knezovic Bcrypt Cracking with Low-Cost Parallel Hardware August 19, / 21

Survey of Codebreaking Machines. Swathi Guruduth Vivekanand Kamanuri Harshad Patil

Survey of Codebreaking Machines. Swathi Guruduth Vivekanand Kamanuri Harshad Patil Survey of Codebreaking Machines Swathi Guruduth Vivekanand Kamanuri Harshad Patil Contents Introduction Motivation Goal Machines considered Comparison based on technology used Brief description of machines

More information

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013 A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

scrypt: A new key derivation function

scrypt: A new key derivation function Doing our best to thwart TLAs armed with ASICs Colin Percival Tarsnap cperciva@tarsnap.com May 9, 2009 Making bcrypt obsolete Colin Percival Tarsnap cperciva@tarsnap.com May 9, 2009 Are you sure your SSH

More information

Chapter 18 - Multicore Computers

Chapter 18 - Multicore Computers Chapter 18 - Multicore Computers Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 18 - Multicore Computers 1 / 28 Table of Contents I 1 2 Where to focus your study Luis Tarrataca

More information

ECE 471 Embedded Systems Lecture 2

ECE 471 Embedded Systems Lecture 2 ECE 471 Embedded Systems Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 7 September 2018 Announcements Reminder: The class notes are posted to the website. HW#1 will

More information

Zynq-7000 All Programmable SoC Product Overview

Zynq-7000 All Programmable SoC Product Overview Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform

More information

ESE532: System-on-a-Chip Architecture. Today. Programmable SoC. Message. Process. Reminder

ESE532: System-on-a-Chip Architecture. Today. Programmable SoC. Message. Process. Reminder ESE532: System-on-a-Chip Architecture Day 5: September 18, 2017 Dataflow Process Model Today Dataflow Process Model Motivation Issues Abstraction Basic Approach Dataflow variants Motivations/demands for

More information

Outline 1 Motivation 2 Theory of a non-blocking benchmark 3 The benchmark and results 4 Future work

Outline 1 Motivation 2 Theory of a non-blocking benchmark 3 The benchmark and results 4 Future work Using Non-blocking Operations in HPC to Reduce Execution Times David Buettner, Julian Kunkel, Thomas Ludwig Euro PVM/MPI September 8th, 2009 Outline 1 Motivation 2 Theory of a non-blocking benchmark 3

More information

HEAD HardwarE Accelerated Deduplication

HEAD HardwarE Accelerated Deduplication HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee Executive Summary A-Z development of deduplication SW version

More information

Cryptographic Hash Functions. Secure Software Systems

Cryptographic Hash Functions. Secure Software Systems 1 Cryptographic Hash Functions 2 Cryptographic Hash Functions Input: Message of arbitrary size Output: Digest (hashed output) of fixed size Loreum ipsum Hash Function 23sdfw83x8mjyacd6 (message of arbitrary

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS Mike Ashworth, Graham Riley, Andrew Attwood and John Mawer Advanced Processor Technologies Group School

More information

COPACOBANA: RECONFIGURABLE COMPUTING IN CRYPTANALYSIS. Ben Johnstone

COPACOBANA: RECONFIGURABLE COMPUTING IN CRYPTANALYSIS. Ben Johnstone COPACOBANA: RECONFIGURABLE COMPUTING IN CRYPTANALYSIS Ben Johnstone Overview Goals Architecture DES Performance Conclusion What is COPACOBANA? Cost Optimized Parallel Code Breaker History Developed at

More information

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei

More information

Side-Channel Countermeasures for Hardware: is There a Light at the End of the Tunnel?

Side-Channel Countermeasures for Hardware: is There a Light at the End of the Tunnel? Side-Channel Countermeasures for Hardware: is There a Light at the End of the Tunnel? 11. Sep 2013 Ruhr University Bochum Outline Power Analysis Attack Masking Problems in hardware Possible approaches

More information

Lecture 41: Introduction to Reconfigurable Computing

Lecture 41: Introduction to Reconfigurable Computing inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following

More information

Zynq Ultrascale+ Architecture

Zynq Ultrascale+ Architecture Zynq Ultrascale+ Architecture Stephanie Soldavini and Andrew Ramsey CMPE-550 Dec 2017 Soldavini, Ramsey (CMPE-550) Zynq Ultrascale+ Architecture Dec 2017 1 / 17 Agenda Heterogeneous Computing Zynq Ultrascale+

More information

A Prototype Storage Subsystem based on PCM

A Prototype Storage Subsystem based on PCM PSS A Prototype Storage Subsystem based on IBM Research Zurich Ioannis Koltsidas, Roman Pletka, Peter Mueller, Thomas Weigold, Evangelos Eleftheriou University of Patras Maria Varsamou, Athina Ntalla,

More information

Ten (or so) Small Computers

Ten (or so) Small Computers Ten (or so) Small Computers by Jon "maddog" Hall Executive Director Linux International and President, Project Cauã 1 of 50 Who Am I? Half Electrical Engineer, Half Business, Half Computer Software In

More information

Introduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013

Introduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013 Introduction to FPGA Design with Vivado High-Level Synthesis Notice of Disclaimer The information disclosed to you hereunder (the Materials ) is provided solely for the selection and use of Xilinx products.

More information

Effective Password Hashing

Effective Password Hashing Effective Password Hashing November 18th, 2015 Colin Keigher colin@keigher.ca ~ @afreak ~ https://afreak.ca ~ https://canary.pw Who am I? I am a Senior Security Analyst at a large Canadian company Actively

More information

There s STILL plenty of room at the bottom! Andreas Olofsson

There s STILL plenty of room at the bottom! Andreas Olofsson There s STILL plenty of room at the bottom! Andreas Olofsson 1 Richard Feynman s Lecture (1959) There's Plenty of Room at the Bottom An Invitation to Enter a New Field of Physics Why cannot we write the

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

Parallelizing Cryptography. Gordon Werner Samantha Kenyon

Parallelizing Cryptography. Gordon Werner Samantha Kenyon Parallelizing Cryptography Gordon Werner Samantha Kenyon Outline Security requirements Cryptographic Primitives Block Cipher Parallelization of current Standards AES RSA Elliptic Curve Cryptographic Attacks

More information

Designing for Performance. Patrick Happ Raul Feitosa

Designing for Performance. Patrick Happ Raul Feitosa Designing for Performance Patrick Happ Raul Feitosa Objective In this section we examine the most common approach to assessing processor and computer system performance W. Stallings Designing for Performance

More information

CS 31: Intro to Systems Threading & Parallel Applications. Kevin Webb Swarthmore College November 27, 2018

CS 31: Intro to Systems Threading & Parallel Applications. Kevin Webb Swarthmore College November 27, 2018 CS 31: Intro to Systems Threading & Parallel Applications Kevin Webb Swarthmore College November 27, 2018 Reading Quiz Making Programs Run Faster We all like how fast computers are In the old days (1980

More information

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

SoC Platforms and CPU Cores

SoC Platforms and CPU Cores SoC Platforms and CPU Cores COE838: Systems on Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

Lecture 4: RISC Computers

Lecture 4: RISC Computers Lecture 4: RISC Computers Introduction Program execution features RISC characteristics RISC vs. CICS Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) represents an important

More information

Lecture 4: RISC Computers

Lecture 4: RISC Computers Lecture 4: RISC Computers Introduction Program execution features RISC characteristics RISC vs. CICS Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation

More information

Multicore Hardware and Parallelism

Multicore Hardware and Parallelism Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3

More information

Architecture without explicit locks for logic simulation on SIMD machines

Architecture without explicit locks for logic simulation on SIMD machines Architecture without explicit locks for logic on machines M. Chimeh Department of Computer Science University of Glasgow UKMAC, 2016 Contents 1 2 3 4 5 6 The Using models to replicate the behaviour of

More information

Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL

Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL Donggyu Kim, Adam Izraelevitz, Christopher Celio, Hokeun Kim, Brian Zimmer, Yunsup Lee, Jonathan Bachrach, Krste Asanović

More information

Chosen-IV Correlation Power Analysis on KCipher-2 and a Countermeasure

Chosen-IV Correlation Power Analysis on KCipher-2 and a Countermeasure Fourth International Workshop on Constructive Side-Channel Analysis and Secure Design (COSADE 2013) Chosen-IV Correlation Power Analysis on KCipher-2 and a Countermeasure Takafumi Hibiki*, Naofumi Homma*,

More information

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS Mike Ashworth, Graham Riley, Andrew Attwood and John Mawer Advanced Processor Technologies Group School

More information

PERFORMANCE MEASUREMENT

PERFORMANCE MEASUREMENT Administrivia CMSC 411 Computer Systems Architecture Lecture 3 Performance Measurement and Reliability Homework problems for Unit 1 posted today due next Thursday, 2/12 Start reading Appendix C Basic Pipelining

More information

Design Choices for FPGA-based SoCs When Adding a SATA Storage }

Design Choices for FPGA-based SoCs When Adding a SATA Storage } U4 U7 U7 Q D U5 Q D Design Choices for FPGA-based SoCs When Adding a SATA Storage } Lorenz Kolb & Endric Schubert, Missing Link Electronics Rudolf Usselmann, ASICS World Services Motivation for SATA Storage

More information

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous

More information

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory Embedded Systems 8. Hardware Components Lothar Thiele Computer Engineering and Networks Laboratory Do you Remember? 8 2 8 3 High Level Physical View 8 4 High Level Physical View 8 5 Implementation Alternatives

More information

Renesas Synergy MCUs Build a Foundation for Groundbreaking Integrated Embedded Platform Development

Renesas Synergy MCUs Build a Foundation for Groundbreaking Integrated Embedded Platform Development Renesas Synergy MCUs Build a Foundation for Groundbreaking Integrated Embedded Platform Development New Family of Microcontrollers Combine Scalability and Power Efficiency with Extensive Peripheral Capabilities

More information

Exploring OpenCL Memory Throughput on the Zynq

Exploring OpenCL Memory Throughput on the Zynq Exploring OpenCL Memory Throughput on the Zynq Technical Report no. 2016:04, ISSN 1652-926X Chalmers University of Technology Bo Joel Svensson bo.joel.svensson@gmail.com Abstract The Zynq platform combines

More information

CS 101, Mock Computer Architecture

CS 101, Mock Computer Architecture CS 101, Mock Computer Architecture Computer organization and architecture refers to the actual hardware used to construct the computer, and the way that the hardware operates both physically and logically

More information

The Mont-Blanc approach towards Exascale

The Mont-Blanc approach towards Exascale http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are

More information

FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES

FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES , suitable for DFA on AES Jonas Krautter, Dennis R.E. Gnad, Mehdi B. Tahoori 10.09.2018 INSTITUTE OF COMPUTER ENGINEERING CHAIR OF DEPENDABLE NANO COMPUTING KIT Die Forschungsuniversität in der Helmholtz-Gemeinschaft

More information

Building and Using the ATLAS Transactional Memory System

Building and Using the ATLAS Transactional Memory System Building and Using the ATLAS Transactional Memory System Njuguna Njoroge, Sewook Wee, Jared Casper, Justin Burdick, Yuriy Teslyar, Christos Kozyrakis, Kunle Olukotun Computer Systems Laboratory Stanford

More information

Probabilistic password generators

Probabilistic password generators Probabilistic password generators (and fancy curves) Simon Marechal (bartavelle at openwall.com) http://www.openwall.com @Openwall December 212 Warning! I am no mathematician Conclusions might be erroneous

More information

A Software Development Toolset for Multi-Core Processors. Yuichi Nakamura System IP Core Research Labs. NEC Corp.

A Software Development Toolset for Multi-Core Processors. Yuichi Nakamura System IP Core Research Labs. NEC Corp. A Software Development Toolset for Multi-Core Processors Yuichi Nakamura System IP Core Research Labs. NEC Corp. Motivations Embedded Systems: Performance enhancement by multi-core systems CPU0 CPU1 Multi-core

More information

金融商品取引アルゴリズムのハードウェアアクセラレ ーションに関する研究 [ 課題研究報告書 ] Description Supervisor: 田中清史, 情報科学研究科, 修士

金融商品取引アルゴリズムのハードウェアアクセラレ ーションに関する研究 [ 課題研究報告書 ]   Description Supervisor: 田中清史, 情報科学研究科, 修士 JAIST Reposi https://dspace.j Title 金融商品取引アルゴリズムのハードウェアアクセラレ ーションに関する研究 [ 課題研究報告書 ] Author(s) 小林, 弘幸 Citation Issue Date 2018-03 Type Thesis or Dissertation Text version author URL http://hdl.handle.net/10119/15214

More information

SysAlloc: A Hardware Manager for Dynamic Memory Allocation in Heterogeneous Systems

SysAlloc: A Hardware Manager for Dynamic Memory Allocation in Heterogeneous Systems SysAlloc: A Hardware Manager for Dynamic Memory Allocation in Heterogeneous Systems Zeping Xue, David Thomas zeping.xue10@imperial.ac.uk FPL 2015, London 4 Sept 2015 1 Single Processor System Allocator

More information

STORING DATA: DISK AND FILES

STORING DATA: DISK AND FILES STORING DATA: DISK AND FILES CS 564- Spring 2018 ACKs: Dan Suciu, Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? How does a DBMS store data? disk, SSD, main memory The Buffer manager controls how

More information

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

More information

EECS150 - Digital Design Lecture 09 - Parallelism

EECS150 - Digital Design Lecture 09 - Parallelism EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization

More information

Introduction to Microprocessor

Introduction to Microprocessor Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device

More information

Porting BLIS to new architectures Early experiences

Porting BLIS to new architectures Early experiences 1st BLIS Retreat. Austin (Texas) Early experiences Universidad Complutense de Madrid (Spain) September 5, 2013 BLIS design principles BLIS = Programmability + Performance + Portability Share experiences

More information

CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces

CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces Zvonimir Z. Bandic, Sr. Director Robert Golla, Sr. Fellow Dejan Vucinic,

More information

Yarn password hashing function

Yarn password hashing function Yarn password hashing function Evgeny Kapun abacabadabacaba@gmail.com 1 Introduction I propose a password hashing function Yarn. This is a memory-hard function, however, a prime objective of this function

More information

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD Leveraging OpenSPARC ESA Round Table 2006 on Next Generation Microprocessors for Space Applications G.Furano, L.Messina TEC- OpenSPARC T1 The T1 is a new-from-the-ground-up SPARC microprocessor implementation

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor

Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor Intel K. K. E-mail: hirokazu.kobayashi@intel.com Yoshifumi Nakamura RIKEN AICS E-mail: nakamura@riken.jp Shinji Takeda

More information

Linux Storage System Bottleneck Exploration

Linux Storage System Bottleneck Exploration Linux Storage System Bottleneck Exploration Bean Huo / Zoltan Szubbocsev Beanhuo@micron.com / zszubbocsev@micron.com 215 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications

More information

Hiding Higher-Order Leakages in Hardware

Hiding Higher-Order Leakages in Hardware Hiding Higher-Order Leakages in Hardware 21. May 2015 Ruhr-Universität Bochum Acknowledgement Pascal Sasdrich Tobias Schneider Alexander Wild 2 Story? Threshold Implementation should be explained? 1 st

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

Protecting Embedded Systems from Zero-Day Attacks

Protecting Embedded Systems from Zero-Day Attacks Protecting Embedded Systems from Zero-Day Attacks Professor Stephen Taylor Thayer School of Engineering at Dartmouth stnh.email@icloud.com (603) 727-8945 MicroArx.com Apiotics.com 1 Research Support Current

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

john-devkit: specialized compiler for hash cracking

john-devkit: specialized compiler for hash cracking john-devkit: specialized compiler for hash cracking Aleksey Cherepanov lyosha@openwall.com May 26, 2015 General john-devkit is an experiment not yet embraced by John the Ripper developer community is a

More information

An FPGA Architecture Supporting Dynamically-Controlled Power Gating

An FPGA Architecture Supporting Dynamically-Controlled Power Gating An FPGA Architecture Supporting Dynamically-Controlled Power Gating Altera Corporation March 16 th, 2012 Assem Bsoul and Steve Wilton {absoul, stevew}@ece.ubc.ca System-on-Chip Research Group Department

More information

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT Manycore and GPU Channelisers Seth Hall High Performance Computing Lab, AUT GPU Accelerated Computing GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate

More information

ARM Cortex-A9 ARM v7-a. A programmer s perspective Part1

ARM Cortex-A9 ARM v7-a. A programmer s perspective Part1 ARM Cortex-A9 ARM v7-a A programmer s perspective Part1 ARM: Advanced RISC Machine First appeared in 1985 as Acorn RISC Machine from Acorn Computers in Manchester England Limited success outcompeted by

More information

Scaling the Peak: Maximizing floating point performance on the Epiphany NoC

Scaling the Peak: Maximizing floating point performance on the Epiphany NoC Scaling the Peak: Maximizing floating point performance on the Epiphany NoC Anish Varghese, Gaurav Mitra, Robert Edwards and Alistair Rendell Research School of Computer Science The Australian National

More information

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089

More information

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer

More information

Developing a Prototyping Board for Emerging Memory

Developing a Prototyping Board for Emerging Memory Developing a Prototyping Board for Emerging Memory 2013. 10. 25 Sungjoo Yoo Embedded System Architecture Lab. POSTECH Introduction scaling problem [ITRS, 2012] Year 2012 2013 2014 2015 2016 2017 2018 2019

More information

An Alternative to GPU Acceleration For Mobile Platforms

An Alternative to GPU Acceleration For Mobile Platforms Inventing the Future of Computing An Alternative to GPU Acceleration For Mobile Platforms Andreas Olofsson andreas@adapteva.com 50 th DAC June 5th, Austin, TX Adapteva Achieves 3 World Firsts 1. First

More information

Software Defined Radio (SDR) for Parallel SDR intro in a simple way Satellite Reception in Mobile/Deployable Ground Segments

Software Defined Radio (SDR) for Parallel SDR intro in a simple way Satellite Reception in Mobile/Deployable Ground Segments Software Defined Radio (SDR) for Parallel SDR intro in a simple way Satellite Reception in Mobile/Deployable Ground Segments Small Satellite Conference, Logan, Utah 11/08/2015 1 2 Dr Mark Bowyer & 1 Dr

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

The Many Dimensions of SDR Hardware

The Many Dimensions of SDR Hardware The Many Dimensions of SDR Hardware Plotting a Course for the Hardware Behind the Software Sept 2017 John Orlando Epiq Solutions LO RFIC Epiq Solutions in a Nutshell Schaumburg, IL EST 2009 N. Virginia

More information

Computer System Architecture

Computer System Architecture CSC 203 1.5 Computer System Architecture Budditha Hettige Department of Statistics and Computer Science University of Sri Jayewardenepura Microprocessors 2011 Budditha Hettige 2 Processor Instructions

More information

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation

More information

Data Encryption Standard

Data Encryption Standard ECE 646 Lecture 7 Data Encryption Standard Required Reading W. Stallings, "Cryptography and Network-Security," 5th Edition, Chapter 3: Block Ciphers and the Data Encryption Standard Chapter 6.1: Multiple

More information

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor

More information

Security Applications

Security Applications 1. Introduction Security Applications Abhyudaya Chodisetti Paul Wang Lee Garrett Smith Cryptography applications generally involve a large amount of processing. Thus, there is the possibility that these

More information

Knowledge Organiser. Computing. Year 10 Term 1 Hardware

Knowledge Organiser. Computing. Year 10 Term 1 Hardware Organiser Computing Year 10 Term 1 Hardware Enquiry Question How does a computer do everything it does? Big questions that will help you answer this enquiry question: 1. What is the purpose of the CPU?

More information

Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm

Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm Engineering Director, Xilinx Silicon Architecture Group Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm Presented By Kees Vissers Fellow February 25, FPGA 2019 Technology scaling

More information

Lecture 8: RISC & Parallel Computers. Parallel computers

Lecture 8: RISC & Parallel Computers. Parallel computers Lecture 8: RISC & Parallel Computers RISC vs CISC computers Parallel computers Final remarks Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation in computer

More information

FACTFILE: GCE DIGITAL TECHNOLOGY

FACTFILE: GCE DIGITAL TECHNOLOGY FACTFILE: GCE DIGITAL TECHNOLOGY AS2: FUNDAMENTALS OF DIGITAL TECHNOLOGY Hardware and Software Architecture 1 Learning Outcomes Students should be able to: describe the internal components of a computer

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Authenticated Storage Using Small Trusted Hardware Hsin-Jung Yang, Victor Costan, Nickolai Zeldovich, and Srini Devadas

Authenticated Storage Using Small Trusted Hardware Hsin-Jung Yang, Victor Costan, Nickolai Zeldovich, and Srini Devadas Authenticated Storage Using Small Trusted Hardware Hsin-Jung Yang, Victor Costan, Nickolai Zeldovich, and Srini Devadas Massachusetts Institute of Technology November 8th, CCSW 2013 Cloud Storage Model

More information

Sri Vidya College of Engineering and Technology. EC6703 Embedded and Real Time Systems Unit IV Page 1.

Sri Vidya College of Engineering and Technology. EC6703 Embedded and Real Time Systems Unit IV Page 1. Sri Vidya College of Engineering and Technology ERTS Course Material EC6703 Embedded and Real Time Systems Page 1 Sri Vidya College of Engineering and Technology ERTS Course Material EC6703 Embedded and

More information

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM I5 AND I7 PROCESSORS Juan M. Cebrián 1 Lasse Natvig 1 Jan Christian Meyer 2 1 Depart. of Computer and Information

More information

Copyright 2016 Xilinx

Copyright 2016 Xilinx Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building

More information

Parallella: A $99 Open Hardware Parallel Computing Platform

Parallella: A $99 Open Hardware Parallel Computing Platform Inventing the Future of Computing Parallella: A $99 Open Hardware Parallel Computing Platform Andreas Olofsson andreas@adapteva.com IPDPS May 22th, Cambridge, MA Adapteva Achieves 3 World Firsts 1. First

More information

Chapter 1: Introduction to Parallel Computing

Chapter 1: Introduction to Parallel Computing Parallel and Distributed Computing Chapter 1: Introduction to Parallel Computing Jun Zhang Laboratory for High Performance Computing & Computer Simulation Department of Computer Science University of Kentucky

More information

FPGA system development What you need to think about. Frédéric Leens, CEO

FPGA system development What you need to think about. Frédéric Leens, CEO FPGA system development What you need to think about Frédéric Leens, CEO About Byte Paradigm 2005 : Founded by 3 ASIC-SoC-FPGA engineers as a Design Center for high-end FPGA and board design. 2007 : GP

More information

Computer System Components

Computer System Components Computer System Components CPU Core 1 GHz - 3.2 GHz 4-way Superscaler RISC or RISC-core (x86): Deep Instruction Pipelines Dynamic scheduling Multiple FP, integer FUs Dynamic branch prediction Hardware

More information

Argon2 for password hashing and cryptocurrencies

Argon2 for password hashing and cryptocurrencies Argon2 for password hashing and cryptocurrencies Alex Biryukov, Daniel Dinu, Dmitry Khovratovich University of Luxembourg 2nd April 2016 Motivation Password-based authentication Keyless password authentication:

More information