Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading. University of Toronto

Size: px
Start display at page:

Download "Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading. University of Toronto"

Transcription

1 Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading Martin Labrecque Harsh Singh Warren Shum Hans-Arno Jacobsen University of Toronto

2 Algorithm Trading

3 Examples of Financial Strategies & Market Events Market Feed (event) [stock = ABX, TSE ask = 40.04, NYSE ask = 40.05] Investment Strategies (subscription) 1 Classical arbitrage strategy [stock = ABX, TSE ask NYSE ask, ACTION: BUY & SELL] 2 Classical short-sell strategy [stock = ABX, TSE ask 40.04, ACTION: SELL ] [stock = ABX, TSE ask 38.04, ACTION: BUY ]

4 Algorithm Trading Vision Algorithm Trading Key Observation 1 Every 1-millisecond reduction in response-time is estimated to generate the staggering amount of over 100 million a year 2 Millions of market events (and increasing) are expected per second Our Solution We propose a novel FPGA-based event processing platform to significantly speed up algorithm trading computations, namely, market event parsing and market event matching against strategies

5 Why FPGAs 1 Hardware reconfigurability: the ability to be re-configured on-demand into a highly parallel custom hardware circuit 2 Hardware parallelism: eliminating inter-processor signaling and message passing overhead at the program and OS level 3 High throughput packet processing: using multiple high bandwidth (giga-bit) I/O pins to eliminate the OS layer latency overhead in moving data between input and output ports

6 Propagation Data Structure Overview (SIGMOD 01) hash(?) S 1 S 2 hash(?) S 8 S 15 S 20 hash(?) S 5 S i AP(S i ) Hash(AP(S i )) S i hash(?) S 31 S 4 1 Strategies are distributed in disjoint clusters to enables highly parallelizable event matching through custom hardware units 2 In each cluster, strategies are stored as contiguous blocks of memory to enable fast sequential access to improve memory locality

7 Soft-Processor Approach 1 Simplest solution with virtually no deployment effort 2 Identical C program is compiled to execute on FPGA soft-processor

8 Hybrid Approach 1 4 Matching units (custom processors) are ran in parallel 2 Strategies are stored both in off-chip DDR2 and on-chip BRAM 3 DDR2 memory access are batched to reduce hand-shaking latency 4 BRAM memory is accessed during DDR2 hand-shaking phase 5 Scales in order hundred of thousands strategies

9 Hardware-only Approach 1 Each strategy encoded as a matching unit (a custom processor) 2 High rate of matching due to lack of memory access 3 High degree of parallelization, all strategies are executed in parallel 4 Resource exhaustive with respect to the number of strategies 5 Scales in order of thousands of strategies (latest FPGA chip)

10 Verilog Snippet case (CurrentState) IDLE: begin if (Go) NextState = SELECT_CLUSTER_ID; else NextState = IDLE; end SELECT_CLUSTER_ID: begin // Select a valid cluster index if (curcluster > LAST_CLUSTER) // Finished reading last cluster // When all clusters are invalid NextState = WAIT; else NextState = START_ADDRESS; end START_ADDRESS: begin if (curcluster > LAST_CLUSTER) NextState = WAIT; else if (can_take_more_requests) NextState = MEM_BURST_WAIT; else NextState = START_ADDRESS; end NEXT_ADDRESS: begin if (clusterendfound) NextState = SELECT_CLUSTER_ID; else if (can_take_more_requests) NextState = MEM_BURST_WAIT; else NextState = NEXT_ADDRESS; end default NextState = IDLE; endcase // Select cluster address // Finished reading last cluster // When all clusters are invalid // If 'can_take_more_requests' is low, wait // Check data from cluster terminator // Increment curaddr by 16 bytes // If 'can_take_more_requests' is low, wait

11 Verilog Compilation 1 Synthesis: checks syntax and analyzes the design to ensure that it is optimized for the architecture; outputs a design Netlist file 2 Design Translation: merges the input Netlists and design constraints; outputs a NGD file, describing the logical design reduced to gate primitives 3 Mapping: maps an NGD logic into FPGA; outputs a native circuit description (NCD) that represents the design mapped to the FPGA. 4 Place & Route: takes a NCD file and places and routes the design; outputs an NCD file for bitstream generation.

12 Evaluation Testbed 1 Throughput is the maximum sustainable input packet rate, determined through a bisection search, when no packet is dropped 2 Latency is the interval between the time a market event packet leaves the Event Monitor output queue to the time the action is received

13 Experimental Results End-to-end System Latency (µs) Workload PC Soft-Processor Hybrid Hardware-only K N/A 10K , N/A 100K 2, , , N/A System Throughput (market events/sec) Workload PC Soft-Processor Hybrid Hardware-only ,654 14, ,142 1,024,590 1K , ,500 N/A 10K ,779 N/A 100K N/A

14 Event Sender

15 Stock Event Sender

16 Workload Replayer

17 Event Packet Analyzer

18 Lessons Learned 1 Algorithmic trading trends Account for 70% of all trading in equities Cost millions per subsecond response delay 2 Expressive predicate language Support classical arbitrage strategy Support buy-and-hold strategy 3 Reconfigurable hardware (FPGA) Accelerate using custom logic circuit Utilize hardware parallelism 4 Line-rate algorithmic trading Eliminate OS layer latency Leverage on-board packet processing 5

19 Thank You,

Multi-Query Stream Processing on FPGAs. University of Toronto

Multi-Query Stream Processing on FPGAs. University of Toronto Multi-Query Stream Processing on FPGAs Mohammad Sadoghi Rija Javed Naif Tarafdar Harsh Singh Rohan Palaniappan Hans-Arno Jacobsen April 2012 Algorithmic Trading NASDAQ NYSE TSX AMGN=58 HON=24 Market ORCL=12

More information

Adaptive Parallel Compressed Event Matching

Adaptive Parallel Compressed Event Matching Adaptive Parallel Compressed Event Matching Mohammad Sadoghi 1,2 Hans-Arno Jacobsen 2 1 IBM T.J. Watson Research Center 2 Middleware Systems Research Group, University of Toronto April 2014 Mohammad Sadoghi

More information

FPGA Design Flow 1. All About FPGA

FPGA Design Flow 1. All About FPGA FPGA Design Flow 1 In this part of tutorial we are going to have a short intro on FPGA design flow. A simplified version of FPGA design flow is given in the flowing diagram. FPGA Design Flow 2 FPGA_Design_FLOW

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

Faster FAST Multicore Acceleration of

Faster FAST Multicore Acceleration of Faster FAST Multicore Acceleration of Streaming Financial Data Virat Agarwal, David A. Bader, Lin Dan, Lurng-Kuo Liu, Davide Pasetto, Michael Perrone, Fabrizio Petrini Financial Market Finance can be defined

More information

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010 Parallelizing FPGA Technology Mapping using GPUs Doris Chen Deshanand Singh Aug 31 st, 2010 Motivation: Compile Time In last 12 years: 110x increase in FPGA Logic, 23x increase in CPU speed, 4.8x gap Question:

More information

Digital Design with FPGAs. By Neeraj Kulkarni

Digital Design with FPGAs. By Neeraj Kulkarni Digital Design with FPGAs By Neeraj Kulkarni Some Basic Electronics Basic Elements: Gates: And, Or, Nor, Nand, Xor.. Memory elements: Flip Flops, Registers.. Techniques to design a circuit using basic

More information

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Politecnico di Milano & EPFL A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Vincenzo Rana, Ivan Beretta, Donatella Sciuto Donatella Sciuto sciuto@elet.polimi.it Introduction

More information

High-Speed NAND Flash

High-Speed NAND Flash High-Speed NAND Flash Design Considerations to Maximize Performance Presented by: Robert Pierce Sr. Director, NAND Flash Denali Software, Inc. History of NAND Bandwidth Trend MB/s 20 60 80 100 200 The

More information

HYRISE In-Memory Storage Engine

HYRISE In-Memory Storage Engine HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University

More information

LOW LATENCY DATA DISTRIBUTION IN CAPITAL MARKETS: GETTING IT RIGHT

LOW LATENCY DATA DISTRIBUTION IN CAPITAL MARKETS: GETTING IT RIGHT LOW LATENCY DATA DISTRIBUTION IN CAPITAL MARKETS: GETTING IT RIGHT PATRICK KUSTER Head of Business Development, Enterprise Capabilities, Thomson Reuters +358 (40) 840 7788; patrick.kuster@thomsonreuters.com

More information

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Stanley Bak Abstract Network algorithms are deployed on large networks, and proper algorithm evaluation is necessary to avoid

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, Yong Wang, Bo Yu, Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A dominant

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, 1 Yong Wang, 1 Bo Yu, 1 Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

ECE 4514 Digital Design II. Spring Lecture 15: FSM-based Control

ECE 4514 Digital Design II. Spring Lecture 15: FSM-based Control ECE 4514 Digital Design II Lecture 15: FSM-based Control A Design Lecture Overview Finite State Machines Verilog Mapping: one, two, three always blocks State Encoding User-defined or tool-defined State

More information

Using FPGAs as Microservices

Using FPGAs as Microservices Using FPGAs as Microservices David Ojika, Ann Gordon-Ross, Herman Lam, Bhavesh Patel, Gaurav Kaul, Jayson Strayer (University of Florida, DELL EMC, Intel Corporation) The 9 th Workshop on Big Data Benchmarks,

More information

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests Mingxing Tan 1 2, Gai Liu 1, Ritchie Zhao 1, Steve Dai 1, Zhiru Zhang 1 1 Computer Systems Laboratory, Electrical and Computer

More information

What is Xilinx Design Language?

What is Xilinx Design Language? Bill Jason P. Tomas University of Nevada Las Vegas Dept. of Electrical and Computer Engineering What is Xilinx Design Language? XDL is a human readable ASCII format compatible with the more widely used

More information

BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High-dimensional Space. University of Toronto

BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High-dimensional Space. University of Toronto BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High-dimensional Space Mohammad Sadoghi Hans-Arno Jacobsen University of Toronto June 15, 2011 Mohammad Sadoghi (University of

More information

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis Jan 31, 2012 John Wawrzynek Spring 2012 EECS150 - Lec05-verilog_synth Page 1 Outline Quick review of essentials of state elements Finite State

More information

SoftMC A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies

SoftMC A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies SoftMC A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies Hasan Hassan, Nandita Vijaykumar, Samira Khan, Saugata Ghose, Kevin Chang, Gennady Pekhimenko, Donghyuk

More information

LegUp: Accelerating Memcached on Cloud FPGAs

LegUp: Accelerating Memcached on Cloud FPGAs 0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are

More information

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP 1 M.DEIVAKANI, 2 D.SHANTHI 1 Associate Professor, Department of Electronics and Communication Engineering PSNA College

More information

ISE Design Suite Software Manuals and Help

ISE Design Suite Software Manuals and Help ISE Design Suite Software Manuals and Help These documents support the Xilinx ISE Design Suite. Click a document title on the left to view a document, or click a design step in the following figure to

More information

Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery

Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery Java Message Service (JMS) is a standardized messaging interface that has become a pervasive part of the IT landscape

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

Implementing Ultra Low Latency Data Center Services with Programmable Logic

Implementing Ultra Low Latency Data Center Services with Programmable Logic Implementing Ultra Low Latency Data Center Services with Programmable Logic John W. Lockwood, CEO: Algo-Logic Systems, Inc. http://algo-logic.com Solutions@Algo-Logic.com (408) 707-3740 2255-D Martin Ave.,

More information

Chapter 9: Integration of Full ASIP and its FPGA Implementation

Chapter 9: Integration of Full ASIP and its FPGA Implementation Chapter 9: Integration of Full ASIP and its FPGA Implementation 9.1 Introduction A top-level module has been created for the ASIP in VHDL in which all the blocks have been instantiated at the Register

More information

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 Anurag Dwivedi Digital Design : Bottom Up Approach Basic Block - Gates Digital Design : Bottom Up Approach Gates -> Flip Flops Digital

More information

and 32 bit for 32 bit. If you don t pay attention to this, there will be unexpected behavior in the ISE software and thing may not work properly!

and 32 bit for 32 bit. If you don t pay attention to this, there will be unexpected behavior in the ISE software and thing may not work properly! This tutorial will show you how to: Part I: Set up a new project in ISE 14.7 Part II: Implement a function using Schematics Part III: Simulate the schematic circuit using ISim Part IV: Constraint, Synthesize,

More information

Reconfigurable Acceleration of Fitness Evaluation in Trading Strategies

Reconfigurable Acceleration of Fitness Evaluation in Trading Strategies Reconfigurable Acceleration of Fitness Evaluation in Trading Strategies INGRID FUNIE, PAUL GRIGORAS, PAVEL BUROVSKIY, WAYNE LUK, MARK SALMON Department of Computing Imperial College London Published in

More information

Tutorial on Software-Hardware Codesign with CORDIC

Tutorial on Software-Hardware Codesign with CORDIC ECE5775 High-Level Digital Design Automation, Fall 2017 School of Electrical Computer Engineering, Cornell University Tutorial on Software-Hardware Codesign with CORDIC 1 Introduction So far in ECE5775

More information

Evaluation of the Chelsio T580-CR iscsi Offload adapter

Evaluation of the Chelsio T580-CR iscsi Offload adapter October 2016 Evaluation of the Chelsio T580-CR iscsi iscsi Offload makes a difference Executive Summary As application processing demands increase and the amount of data continues to grow, getting this

More information

Laboratory Exercise 7

Laboratory Exercise 7 Laboratory Exercise 7 Finite State Machines This is an exercise in using finite state machines. Part I We wish to implement a finite state machine (FSM) that recognizes two specific sequences of applied

More information

Tutorial: Working with Verilog and the Xilinx FPGA in ISE 9.2i

Tutorial: Working with Verilog and the Xilinx FPGA in ISE 9.2i Tutorial: Working with Verilog and the Xilinx FPGA in ISE 9.2i This tutorial will show you how to: Use Verilog to specify a design Simulate that Verilog design Define pin constraints for the FPGA (.ucf

More information

A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation

A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation Abstract: The power budget is expected to limit the portion of the chip that we can power ON at the upcoming technology nodes. This problem,

More information

Design principles in parser design

Design principles in parser design Design principles in parser design Glen Gibb Dept. of Electrical Engineering Advisor: Prof. Nick McKeown Header parsing? 2 Header parsing? Identify headers & extract fields A???? B???? C?? Field Field

More information

Nikhil Gupta. FPGA Challenge Takneek 2012

Nikhil Gupta. FPGA Challenge Takneek 2012 Nikhil Gupta FPGA Challenge Takneek 2012 RECAP FPGA Field Programmable Gate Array Matrix of logic gates Can be configured in any way by the user Codes for FPGA are executed in parallel Configured using

More information

Overview. Implementing Gigabit Routers with NetFPGA. Basic Architectural Components of an IP Router. Per-packet processing in an IP Router

Overview. Implementing Gigabit Routers with NetFPGA. Basic Architectural Components of an IP Router. Per-packet processing in an IP Router Overview Implementing Gigabit Routers with NetFPGA Prof. Sasu Tarkoma The NetFPGA is a low-cost platform for teaching networking hardware and router design, and a tool for networking researchers. The NetFPGA

More information

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

High Frequency Trading Turns to High Frequency Technology to Reduce Latency

High Frequency Trading Turns to High Frequency Technology to Reduce Latency High Frequency Trading Turns to High Frequency Technology to Reduce Latency For financial companies engaged in high frequency trading, profitability depends on how quickly trades are executed. Now, new

More information

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Performance COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals What is Performance? How do we measure the performance of

More information

Hardware Acceleration for Database Systems using Content Addressable Memories

Hardware Acceleration for Database Systems using Content Addressable Memories Hardware Acceleration for Database Systems using Content Addressable Memories Nagender Bandi, Sam Schneider, Divyakant Agrawal, Amr El Abbadi University of California, Santa Barbara Overview The Memory

More information

Cloud Bursting: Top Reasons Your Organization will Benefit. Scott Jeschonek Director of Cloud Products Avere Systems

Cloud Bursting: Top Reasons Your Organization will Benefit. Scott Jeschonek Director of Cloud Products Avere Systems Cloud Bursting: Top Reasons Your Organization will Benefit Scott Jeschonek Director of Cloud Products Avere Systems Agenda Define Cloud Bursting Benefits of using Cloud Bursting Identify Cloud Bursting

More information

Introduction to Verilog. Mitch Trope EECS 240 Spring 2004

Introduction to Verilog. Mitch Trope EECS 240 Spring 2004 Introduction to Verilog Mitch Trope mtrope@ittc.ku.edu EECS 240 Spring 2004 Overview What is Verilog? Verilog History Max+Plus II Schematic entry Verilog entry System Design Using Verilog: Sum of Products

More information

Optimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications

Optimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications Optimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications K. Vaidyanathan, P. Lai, S. Narravula and D. K. Panda Network Based Computing Laboratory

More information

Designing Next Generation Data-Centers with Advanced Communication Protocols and Systems Services

Designing Next Generation Data-Centers with Advanced Communication Protocols and Systems Services Designing Next Generation Data-Centers with Advanced Communication Protocols and Systems Services P. Balaji, K. Vaidyanathan, S. Narravula, H. W. Jin and D. K. Panda Network Based Computing Laboratory

More information

Flexible Architecture Research Machine (FARM)

Flexible Architecture Research Machine (FARM) Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense

More information

Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search

Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search Jialiang Zhang, Soroosh Khoram and Jing Li 1 Outline Background Big graph analytics Hybrid

More information

Ultra-high-speed In-memory Data Management Software for High-speed Response and High Throughput

Ultra-high-speed In-memory Data Management Software for High-speed Response and High Throughput Ultra-high-speed In-memory Data Management Software for High-speed Response and High Throughput Yasuhiko Hashizume Kikuo Takasaki Takeshi Yamazaki Shouji Yamamoto The evolution of networks has created

More information

POLYMORPHIC ON-CHIP NETWORKS

POLYMORPHIC ON-CHIP NETWORKS POLYMORPHIC ON-CHIP NETWORKS Martha Mercaldi Kim, John D. Davis*, Mark Oskin, Todd Austin** University of Washington *Microsoft Research, Silicon Valley ** University of Michigan On-Chip Network Selection

More information

FPGA Augmented ASICs: The Time Has Come

FPGA Augmented ASICs: The Time Has Come FPGA Augmented ASICs: The Time Has Come David Riddoch Steve Pope Copyright 2012 Solarflare Communications, Inc. All Rights Reserved. Hardware acceleration is Niche (With the obvious exception of graphics

More information

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation report prepared under contract with Dot Hill August 2015 Executive Summary Solid state

More information

Persistent Memory. High Speed and Low Latency. White Paper M-WP006

Persistent Memory. High Speed and Low Latency. White Paper M-WP006 Persistent Memory High Speed and Low Latency White Paper M-WP6 Corporate Headquarters: 3987 Eureka Dr., Newark, CA 9456, USA Tel: (51) 623-1231 Fax: (51) 623-1434 E-mail: info@smartm.com Customer Service:

More information

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,

More information

FPGA-Based Rapid Prototyping of Digital Signal Processing Systems

FPGA-Based Rapid Prototyping of Digital Signal Processing Systems FPGA-Based Rapid Prototyping of Digital Signal Processing Systems Kevin Banovic, Mohammed A. S. Khalid, and Esam Abdel-Raheem Presented By Kevin Banovic July 29, 2005 To be presented at the 48 th Midwest

More information

A Low Latency Solution Stack for High Frequency Trading. High-Frequency Trading. Solution. White Paper

A Low Latency Solution Stack for High Frequency Trading. High-Frequency Trading. Solution. White Paper A Low Latency Solution Stack for High Frequency Trading White Paper High-Frequency Trading High-frequency trading has gained a strong foothold in financial markets, driven by several factors including

More information

MODELING LANGUAGES AND ABSTRACT MODELS. Giovanni De Micheli Stanford University. Chapter 3 in book, please read it.

MODELING LANGUAGES AND ABSTRACT MODELS. Giovanni De Micheli Stanford University. Chapter 3 in book, please read it. MODELING LANGUAGES AND ABSTRACT MODELS Giovanni De Micheli Stanford University Chapter 3 in book, please read it. Outline Hardware modeling issues: Representations and models. Issues in hardware languages.

More information

A Fast Ethernet Tester Using FPGAs and Handel-C

A Fast Ethernet Tester Using FPGAs and Handel-C A Fast Ethernet Tester Using FPGAs and Handel-C R. Beuran, R.W. Dobinson, S. Haas, M.J. LeVine, J. Lokier, B. Martin, C. Meirosu Copyright 2000 OPNET Technologies, Inc. The Large Hadron Collider at CERN

More information

Exploration of Cache Coherent CPU- FPGA Heterogeneous System

Exploration of Cache Coherent CPU- FPGA Heterogeneous System Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based

More information

AN 831: Intel FPGA SDK for OpenCL

AN 831: Intel FPGA SDK for OpenCL AN 831: Intel FPGA SDK for OpenCL Host Pipelined Multithread Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1 Intel FPGA SDK for OpenCL Host Pipelined Multithread...3 1.1

More information

ECE1387 Exercise 3: Using the LegUp High-level Synthesis Framework

ECE1387 Exercise 3: Using the LegUp High-level Synthesis Framework ECE1387 Exercise 3: Using the LegUp High-level Synthesis Framework 1 Introduction and Motivation This lab will give you an overview of how to use the LegUp high-level synthesis framework. In LegUp, you

More information

PERG-Rx: An FPGA-based Pattern-Matching Engine with Limited Regular Expression Support for Large Pattern Database. Johnny Ho

PERG-Rx: An FPGA-based Pattern-Matching Engine with Limited Regular Expression Support for Large Pattern Database. Johnny Ho PERG-Rx: An FPGA-based Pattern-Matching Engine with Limited Regular Expression Support for Large Pattern Database Johnny Ho Supervisor: Guy Lemieux Date: September 11, 2009 University of British Columbia

More information

Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center

Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center Naif Tarafdar, Thomas Lin, Eric Fukuda, Hadi Bannazadeh, Alberto Leon-Garcia, Paul Chow University of Toronto 1 Cloudy with

More information

Ten Reasons to Optimize a Processor

Ten Reasons to Optimize a Processor By Neil Robinson SoC designs today require application-specific logic that meets exacting design requirements, yet is flexible enough to adjust to evolving industry standards. Optimizing your processor

More information

Windowing System on a 3D Pipeline. February 2005

Windowing System on a 3D Pipeline. February 2005 Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

High-Performance Holistic XML Twig Filtering Using GPUs. Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras

High-Performance Holistic XML Twig Filtering Using GPUs. Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras High-Performance Holistic XML Twig Filtering Using GPUs Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras Outline! Motivation! XML filtering in the literature! Software approaches! Hardware

More information

Speaker: Kayting Adviser: Prof. An-Yeu Wu Date: 2009/11/23

Speaker: Kayting Adviser: Prof. An-Yeu Wu Date: 2009/11/23 98-1 Under-Graduate Project Synthesis of Combinational Logic Speaker: Kayting Adviser: Prof. An-Yeu Wu Date: 2009/11/23 What is synthesis? Outline Behavior Description for Synthesis Write Efficient HDL

More information

Performance and Overhead in a Hybrid Reconfigurable Computer

Performance and Overhead in a Hybrid Reconfigurable Computer Performance and Overhead in a Hybrid Reconfigurable Computer Osman Devrim Fidanci 1, Dan Poznanovic 2, Kris Gaj 3, Tarek El-Ghazawi 1, Nikitas Alexandridis 1 1 George Washington University, 2 SRC Computers

More information

Exadata Implementation Strategy

Exadata Implementation Strategy Exadata Implementation Strategy BY UMAIR MANSOOB 1 Who Am I Work as Senior Principle Engineer for an Oracle Partner Oracle Certified Administrator from Oracle 7 12c Exadata Certified Implementation Specialist

More information

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.

More information

Verilog Fundamentals. Shubham Singh. Junior Undergrad. Electrical Engineering

Verilog Fundamentals. Shubham Singh. Junior Undergrad. Electrical Engineering Verilog Fundamentals Shubham Singh Junior Undergrad. Electrical Engineering VERILOG FUNDAMENTALS HDLs HISTORY HOW FPGA & VERILOG ARE RELATED CODING IN VERILOG HDLs HISTORY HDL HARDWARE DESCRIPTION LANGUAGE

More information

8. Best Practices for Incremental Compilation Partitions and Floorplan Assignments

8. Best Practices for Incremental Compilation Partitions and Floorplan Assignments 8. Best Practices for Incremental Compilation Partitions and Floorplan Assignments QII51017-9.0.0 Introduction The Quartus II incremental compilation feature allows you to partition a design, compile partitions

More information

Programmable Logic Devices HDL-Based Design Flows CMPE 415

Programmable Logic Devices HDL-Based Design Flows CMPE 415 HDL-Based Design Flows: ASIC Toward the end of the 80s, it became difficult to use schematic-based ASIC flows to deal with the size and complexity of >5K or more gates. HDLs were introduced to deal with

More information

Secure Function Evaluation using an FPGA Overlay Architecture

Secure Function Evaluation using an FPGA Overlay Architecture Secure Function Evaluation using an FPGA Overlay Architecture Xin Fang Stratis Ioannidis Miriam Leeser Dept. of Electrical and Computer Engineering Northeastern University Boston, MA, USA FPGA 217 1 Introduction

More information

Top-Down Network Design

Top-Down Network Design Top-Down Network Design Chapter Two Analyzing Technical Goals and Tradeoffs Copyright 2010 Cisco Press & Priscilla Oppenheimer 1 Technical Goals Scalability Availability Performance Security Manageability

More information

Graduate Institute of Electronics Engineering, NTU. Lecturer: Chihhao Chao Date:

Graduate Institute of Electronics Engineering, NTU. Lecturer: Chihhao Chao Date: Design of Datapath Controllers and Sequential Logic Lecturer: Date: 2009.03.18 ACCESS IC LAB Sequential Circuit Model & Timing Parameters ACCESS IC LAB Combinational Logic Review Combinational logic circuits

More information

P4 Pub/Sub. Practical Publish-Subscribe in the Forwarding Plane

P4 Pub/Sub. Practical Publish-Subscribe in the Forwarding Plane P4 Pub/Sub Practical Publish-Subscribe in the Forwarding Plane Outline Address-oriented routing Publish/subscribe How to do pub/sub in the network Implementation status Outlook Subscribers Publish/Subscribe

More information

FPGA design with National Instuments

FPGA design with National Instuments FPGA design with National Instuments Rémi DA SILVA Systems Engineer - Embedded and Data Acquisition Systems - MED Region ni.com The NI Approach to Flexible Hardware Processor Real-time OS Application software

More information

CS 268: Computer Networking

CS 268: Computer Networking CS 268: Computer Networking L-6 Router Congestion Control TCP & Routers RED XCP Assigned reading [FJ93] Random Early Detection Gateways for Congestion Avoidance [KHR02] Congestion Control for High Bandwidth-Delay

More information

Creating Safe State Machines

Creating Safe State Machines Creating Safe State Machines Definition & Overview Finite state machines are widely used in digital circuit designs. Generally, when designing a state machine using a hardware description language (HDL),

More information

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema [1] Laila A, [2] Ajeesh R V [1] PG Student [VLSI & ES] [2] Assistant professor, Department of ECE, TKM Institute of Technology, Kollam

More information

CS 856 Latency in Communication Systems

CS 856 Latency in Communication Systems CS 856 Latency in Communication Systems Winter 2010 Latency Challenges CS 856, Winter 2010, Latency Challenges 1 Overview Sources of Latency low-level mechanisms services Application Requirements Latency

More information

Mobile Memory Forum 2011

Mobile Memory Forum 2011 SSD for Mobile Jonathan Hubert Director, Strategic Marketing Micron Technology Mobile Memory Forum 2011 The Third Wave of Compute Platforms Billions of users Broadband Mobile Mobile Compute Platforms Cloud-Based

More information

Architecting Low Latency Cloud Networks

Architecting Low Latency Cloud Networks Architecting Low Latency Cloud Networks As data centers transition to next generation virtualized & elastic cloud architectures, high performance and resilient cloud networking has become a requirement

More information

SUPERNA RPO REPORTING AND BROCADE IP EXTENSION WITH ISILON SYNCIQ

SUPERNA RPO REPORTING AND BROCADE IP EXTENSION WITH ISILON SYNCIQ SUPERNA RPO REPORTING AND BROCADE IP EXTENSION WITH ISILON SYNCIQ Reduce risk and data loss exposure with the Eyeglass RPO Reporting and Brocade 7840 IP Extension solution for Isilon SyncIQ SOLUTION ESSENTIALS

More information

SECURE PARTIAL RECONFIGURATION OF FPGAs. Amir S. Zeineddini Kris Gaj

SECURE PARTIAL RECONFIGURATION OF FPGAs. Amir S. Zeineddini Kris Gaj SECURE PARTIAL RECONFIGURATION OF FPGAs Amir S. Zeineddini Kris Gaj Outline FPGAs Security Our scheme Implementation approach Experimental results Conclusions FPGAs SECURITY SRAM FPGA Security Designer/Vendor

More information

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency with ML accelerators Michael

More information

RiceNIC. Prototyping Network Interfaces. Jeffrey Shafer Scott Rixner

RiceNIC. Prototyping Network Interfaces. Jeffrey Shafer Scott Rixner RiceNIC Prototyping Network Interfaces Jeffrey Shafer Scott Rixner RiceNIC Overview Gigabit Ethernet Network Interface Card RiceNIC - Prototyping Network Interfaces 2 RiceNIC Overview Reconfigurable and

More information

A Hardware Structure for FAST Protocol Decoding Adapting to 40Gbps Bandwidth Lei-Lei YU 1,a, Yu-Zhuo FU 2,b,* and Ting LIU 3,c

A Hardware Structure for FAST Protocol Decoding Adapting to 40Gbps Bandwidth Lei-Lei YU 1,a, Yu-Zhuo FU 2,b,* and Ting LIU 3,c 2017 3rd International Conference on Computer Science and Mechanical Automation (CSMA 2017) ISBN: 978-1-60595-506-3 A Hardware Structure for FAST Protocol Decoding Adapting to 40Gbps Bandwidth Lei-Lei

More information

OCP Engineering Workshop - Telco

OCP Engineering Workshop - Telco OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,

More information

Using FPGAs in Supercomputing Reconfigurable Supercomputing

Using FPGAs in Supercomputing Reconfigurable Supercomputing Using FPGAs in Supercomputing Reconfigurable Supercomputing Why FPGAs? FPGAs are 10 100x faster than a modern Itanium or Opteron Performance gap is likely to grow further in the future Several major vendors

More information

Lab 1: Using the LegUp High-level Synthesis Framework

Lab 1: Using the LegUp High-level Synthesis Framework Lab 1: Using the LegUp High-level Synthesis Framework 1 Introduction and Motivation This lab will give you an overview of how to use the LegUp high-level synthesis framework. In LegUp, you can compile

More information

High-Throughput Publish/Subscribe in the Forwarding Plane

High-Throughput Publish/Subscribe in the Forwarding Plane 1 High-Throughput Publish/Subscribe in the Forwarding Plane Theo Jepsen, Masoud Moushref, Antonio Carzaniga, Nate Foster, Xiaozhou Li, Milad Sharif, Robert Soulé Università della Svizzera italiana (USI)

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

Performance in the Multicore Era

Performance in the Multicore Era Performance in the Multicore Era Gustavo Alonso Systems Group -- ETH Zurich, Switzerland Systems Group Enterprise Computing Center Performance in the multicore era 2 BACKGROUND - SWISSBOX SwissBox: An

More information

Network Support for Multimedia

Network Support for Multimedia Network Support for Multimedia Daniel Zappala CS 460 Computer Networking Brigham Young University Network Support for Multimedia 2/33 make the best of best effort use application-level techniques use CDNs

More information