Scrypt ASIC Prototyping Preliminary Design Document

Similar documents
CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline

Optimizing latency in Xilinx FPGA Implementations of the GBT. Jean-Pierre CACHEMICHE

Xynergy It really makes the difference!

Advanced FPGA Design Methodologies with Xilinx Vivado

Development an update. Aeroflex Gaisler

A Framework for Rule Processing in Reconfigurable Network Systems

Prototyping NGC. First Light. PICNIC Array Image of ESO Messenger Front Page

AN HI-3200 Avionics Data Management Engine Evaluation Board Software Guide

Fast Scalable FPGA-Based Network-on-Chip Simulation Models

Block Diagram. mast_sel. mast_inst. mast_data. mast_val mast_rdy. clk. slv_sel. slv_inst. slv_data. slv_val slv_rdy. rfifo_depth_log2.

SHA3 Core Specification. Author: Homer Hsing

RiceNIC. Prototyping Network Interfaces. Jeffrey Shafer Scott Rixner

easic Technology & Nextreme Architecture

Simple IP-based utca Control System

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

NetFPGA Hardware Architecture

Bruno Pujos. January 14, 2015

Field Programmable Gate Array (FPGA) Devices

Spiral 3-1. Hardware/Software Interfacing

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Learning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing

Memory Expansion. Lecture Embedded Systems

CAN / RS485. Product Description. Technical Reference Note. Interface Adapter. Special Features

INTRODUCTION TO FPGA ARCHITECTURE

Hardware Design Guidelines for POWERLINK SLAVE on FPGA

ECE 699: Lecture 9. Programmable Logic Memories

EE 459/500 HDL Based Digital Design with Programmable Logic

FPGA VHDL Design Flow AES128 Implementation

Building blocks for custom HyperTransport solutions

Virtex-5 GTP Aurora v2.8

Digital Integrated Circuits

KCU GBASE-KR Ethernet TRD User Guide

Configurable Embedded Systems: Using Programmable Logic to Compress Embedded System Design Cycles

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

eip-24/100 Embedded TCP/IP 10/100-BaseT Network Module Features Description Applications

8. Migrating Stratix II Device Resources to HardCopy II Devices

ECEN 449: Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University

An Efficient Implementation of LZW Compression in the FPGA

Interlaken IP datasheet

Field Program mable Gate Arrays

Outline of Presentation Field Programmable Gate Arrays (FPGAs(

SOCS BASED OPENRISC AND MICROBLAZE SOFT PROCESSORS COMPARISON STUDY CASES: AUDIO IMPLEMENTATION AND NETWORK IMPLEMENTATION BASED SOCS

High-Tech-Marketing. Selecting an FPGA. By Paul Dillien

Preliminary SMT128. User Manual

Early Models in Silicon with SystemC synthesis

UDP1G-IP Introduction (Xilinx( Agenda

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

ETH. Ethernet MAC with Timestamp Extension. TCD30xx User Guide. Revision July 17, 2015

The CompSOC Design Flow for Virtual Execution Platforms

AN4872 Application note

STAR BEMC TDC VME Interface & Programming Reference Preliminary 5/21/01

Half Fast Bitcoin Miner Design Peter Xu (px2117) Patrick Taylor (pat2138) Ben Nappier (ben2113) Matheus Candido (mcc2224)

Security IP-Cores. AES Encryption & decryption RSA Public Key Crypto System H-MAC SHA1 Authentication & Hashing. l e a d i n g t h e w a y

Melon S3 FPGA Development Board Product Datasheet

scrypt: A new key derivation function

UDP1G-IP reference design manual

Digital Circuits Part 2 - Communication

DRF1605H Zigbee Module 1.6km Transfer CC2530 Wireless Module UART to Zigbee

Memory-Mapped SHA-1 Coprocessor

Intellectual Property Macrocell for. SpaceWire Interface. Compliant with AMBA-APB Bus

Embedded Systems: Hardware Components (part I) Todor Stefanov

FPGA RAM (C1) Young Won Lim 5/13/16

DCB1M - Transceiver for Powerline Communication

Use of Embedded FPGA Resources in Implementations of Five Round Three SHA-3 Candidates

INTEGRATION AND IMPLIMENTATION SYSTEM-ON-A- PROGRAMMABLE-CHIP (SOPC) IN FPGA

ECE 545: Lecture 11. Programmable Logic Memories

ECE 545: Lecture 11. Programmable Logic Memories. Recommended reading. Memory Types. Memory Types. Memory Types specific to Xilinx FPGAs

OCB3 Block Specification

Am186ER/Am188ER AMD continues 16-bit innovation

SMCSlite and DS-Link Macrocell Development

LogiCORE IP Serial RapidIO v5.6

Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable Gate Arrays

fleximac A MICROSEQUENCER FOR FLEXIBLE PROTOCOL PROCESSING

LEON4: Fourth Generation of the LEON Processor

TOE10G-IP with CPU reference design

Interfacing a High Speed Crypto Accelerator to an Embedded CPU

SpaceWire IP for Actel Radiation Tolerant FPGAs

Microcontroller basics

Memory Map for the MCU320 board:

Virtex-6 FPGA Embedded Tri-Mode Ethernet MAC Wrapper v1.4

Copyright 2017 Xilinx.

Table 1: Cross Reference of Applicable Products

Advanced FPGA Design Methodologies with Xilinx Vivado

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

27 March 2018 Mikael Arguedas and Morgan Quigley

C O M P A N Y O V E R V I E W

The Lekha 3GPP LTE FEC IP Core meets 3GPP LTE specification 3GPP TS V Release 10[1].

ECE 353 Lab 4. General MIDI Explorer. Professor Daniel Holcomb Fall 2015

Fast implementations of secret-key block ciphers using mixed inner- and outer-round pipelining

Lesson 5 Arduino Prototype Development Platforms. Chapter-8 L05: "Internet of Things ", Raj Kamal, Publs.: McGraw-Hill Education

WiMOD LR Base Host Controller Interface

SRL0 Serial Port Unit

A ONE CHIP HARDENED SOLUTION FOR HIGH SPEED SPACEWIRE SYSTEM IMPLEMENTATIONS

Hardware Design of Safety Arming Mechanism using FPGA

User Manual for FC100

TPMC Channel Serial Interface RS232/RS422. Version 1.0. User Manual. Issue August 2014

ML410 BSB DDR2 Design Creation Using 8.2i SP1 EDK Base System Builder (BSB) April

TSEA44 - Design for FPGAs

SIMULATION DIAGRAM SHOWING ZERO LATENCY ON RECEIVE

On Optimized FPGA Implementations of the SHA-3 Candidate Grøstl

Transcription:

Scrypt ASIC Prototyping Preliminary Design Document 1/13

Revision History Version Date Author Remarks Approved by v0.1 2/13

Contents 1 Scrypt Algorithm... 5 2 Major blocks in a Scrypt core... 6 3 Internal architecture in an FPGA/ASIC... 7 4 External communication with multiple FPGA/ASIC... 8 5 Proposed internal architecture for ASIC... 9 6 Scrypt Hash Computation Complexity... 11 7 Hash rate experimented using Xilinx Virtex 6 FPGA... 12 8 Target Hash rate in ASIC... 13 3/13

List of Figures Figure 1 Scrypt Algorithm...5 Figure 2 Scrypt Core Blocks...6 Figure 3 Internal Architecture of Scrypt Cores...7 Figure 4 External Communication with multiple FPGA/ASIC...8 Figure 5 Proposed Internal Architecture for ASIC...9 Figure 6 Proposed Packet Format... 10 List of Tables No table of figures entries found. 4/13

1 Scrypt Algorithm The following flow chart illustrates the Scrypt algorithm used in the Litecoin module. The input to the algorithm is a 84 byte block data. The algorithm generates a 128 bytes hash, of which MSB 4 bytes is compared with the input target. If the generated hash is less than the target, the module returns the nonce for which the successful hash was generated. Input block (80 + 4) bytes Data (76 bytes) + Nonce (4 bytes) Target (4 bytes) PBKDF2 HMAC SHA256 Increment Nonce 128 bytes Hash SALSA20 using 8 Iterations 128 bytes Hash PBKDF2 HMAC SHA256 32 bytes Hash Hash less than Target? No Yes Return current Nonce Figure 1 Scrypt Algorithm 5/13

2 Major blocks in a Scrypt core The two major blocks in the Scrypt core is shown in the following block diagram. The blocks are similarly placed in the hardware architecture as well. The Scrypt cores require 84 bytes block as input and returns the matched nonce on successful computation of hash that is less than the target. Data (76 bytes) Nonce (4 bytes) PBKDF2 Engine 128 bytes SALSA20 Engine Target (4 bytes) ROM Mix Matched Nonce (4 bytes) SHA256 Transform 128 bytes BLOCK Mix Scrypt core Figure 2 Scrypt Core Blocks 6/13

3 Internal architecture in an FPGA/ASIC The basic ports in a FPGA/ASIC based Litecoin mining unit are clock, reset, serial data input/output and status LEDs. The input block data for mining is received by the FPGA/ASIC via serial communication (UART/SPI) from a microcontroller unit. A deserializer unit extracts the serial data into Blocks, Nonce and Target, and provides it to all the Scrypt cores. Each of the Scrypt core is given a core identification number to divide the nonce uniformly among the cores. After computation of one scrypt hash, each of the cores increments the nonce by itself and re-computes the hash using the new nonce. When the scrypt hash generated in any of the core is less than the target, the current nonce value from that particular core is sent to the microcontroller via serial communication. The block diagram illustrating the internal/external communication of an FPGA/ASIC based Litecoin mining module as follows: Serial Rx Clock Reset UART Receiver Block Data & Nonce Deserializer Unit Scrypt Core 1 Scrypt Core 2 Scrypt Core N Matching Nonce Selection & Hash rate Computation Unit UART Transmitter Status LEDs Serial Tx Figure 3 Internal Architecture of Scrypt Cores 7/13

4 External communication with multiple FPGA/ASIC An FPGA/ASIC may consist of several Scrypt cores. Each of the FPGA/ASIC can be connected to an 8/16/32 bit MCU for distribution of header block to mine on single or multiple cores. A simple block diagram using such an interface is shown in the following figure. FPGA / ASIC #1 FPGA / ASIC #2 USB 8/16/32 Bit MCU FPGA / ASIC #3 ` FPGA / ASIC #N Figure 4 External Communication with multiple FPGA/ASIC 8/13

5 Proposed internal architecture for ASIC In the ASIC implementation of Litecoin module, a total of 128 cores are available for mining. 8 groups are created with 16 cores in each group. The architecture will have the flexibility in communicating each of the cores individually or group wise. This feature will be made available by using an Address decoder/command Parser unit. A Status Register will be used to keep track of the Nonce match/distribution status of the cores. The mining module will communicate with the external devices (computer or microcontroller) via serial communication engine (UART/SPI). A block diagram of the proposed architecture is shown in the following figure. Figure 5 Proposed Internal Architecture for ASIC The proposed packet format for communicating the mining module is shown below. This packet will be passed through the Address Decoder/Command Parser unit for configuring the Scrypt cores for mining. 9/13

Figure 6 Proposed Packet Format Group Addr: Core Addr: Command: Data: For group wise addressing of Scrypt cores Eg: 0x00 to 0x07 for unicast and 0xFF for broadcast For addressing individual Scrypt cores Eg: 0x00 to 0x0F for unicast and 0xFF for broadcast A command for execution in the module Eg: Reset, Disable, Start mining, Update with new data/nonce, Read matched nonce, Specify nonce range, etc. Applicable data depending on the command used 10/13

6 Scrypt Hash Computation Complexity Scrypt blocks Clock cycles per block Total Cycles PBKDF2 (A) SHA256 Transform 258 (B) Serial data In and Out 32 Total = (A + B) 18 (258 + 32) 18 5,220 SALSA20/8 (C) BLOCK Mix (Mem. WR) 2 (32+2) = 68 (D) ROM Mix (Mem. WR) 1024 C = 69,632 (E) BLOCK Mix (Mem. RD) (2 (32+2))+4 = 72 (F) BLOCK Mix (Mem. RD) 1024 E = 73,728 (G) Serial data In and Out 32 2 = 64 Total = D + F + G 69632 + 73728 + 64 143,424 Serial data pre/post processing 678 2 1,356 Per Scrypt hash computation 150,000 11/13

7 Hash rate experimented using Xilinx Virtex 6 FPGA Device: Xilinx Virtex 6 LX240T (ML605) FPGA resources Available 1 core 4 cores Slice LUTs 150,720 13,268 55,123 Slice Registers 301,440 10,955 45,179 Slices 37,680 (12%) 4,670 (49%) 18,628 BRAM (18 Kbit) 832 57 228 Total Memory (M bits) - Full RAM 14.6 1 4 Total I/O (using UART) 600 8 8 Max. clock frequency (MHz): - 200 200 Hash Rate (KH/s) - 1.33 5.33 Note : There are more cores possible in LX240T, however idea is to show case linearity achieved in actual as in theory 12/13

8 Target Hash rate in ASIC Technology: 28nm/45 nm 8 cores 64 cores 128 cores 512 cores Virtex 6 LX240T single 8 16 64 ASIC gates (Million) 1.7 14 28 112 Total memory (Mbits) 8 64 128 512 Hash rate at 200 MHz 11 KH/s 85 KH/s 170 KH/s 680 KH/s Hash rate at 400 MHz 22 KH/s 170 KH/s 340 KH/s 1.36 MH/s Hash rate at 600 MHz 33 KH/s 255 KH/s 510 KH/s 2 MH/s Note : Above figures are estimated based on our experiments done on LX240T present on ML605 13/13