Computer Systems Colloquium (EE380) Wednesday, 4:15-5:30PM in Gates B01

Adapting Systems by Evolving Hardware
Jim Torresen
Robotics and Intelligent Systems Group, Department of Informatics, University of Oslo, Norway
E-mail: jimtoer@ifi.uio.no
http://www.ifi.uio.no/~jimtoer

Robotics and Intelligent Systems Group
Applications: Make everyday living easier and safer.
Electronics, robotics, biology.
Apply principles from nature.
Robotics and intelligent systems.
The work presented is a collaboration with PhD student Kyrre Glette.

Outline
Background
Introduction to evolvable hardware and reconfigurable logic
Evolving hardware for signal and image classification
Results
Conclusions

Principles of Nature
Evolution: Biological systems develop and change over generations.
Development: A multicellular organism develops through cell division.
Learning: Individuals learn throughout their lifetime.

Evolution
Biological evolution: Life forms adapt to a particular environment over successive generations. Combinations of traits that are better adapted tend to increase their representation in the population.
Mechanisms: selection and crossover, mutation, and survival of the fittest.
Evolutionary Computing (EC): Mimics biological evolution to optimize solutions to a wide variety of complex problems. In every new generation, a new set of solutions is created using bits and pieces of the fittest of the old.
Several algorithms are available, including Genetic Algorithms (GA) and Genetic Programming (GP).

How do we evolve a circuit?
Figure labels: input/output specification, genetic algorithm.

Possibilities of Evolvable Hardware (EHW)
Two main directions: (1) tuning or optimizing the parameters of a circuit or system (also called evolutionary circuit design); (2) evolving a system from scratch.
New features provided: run-time adaptive hardware and self-repairing hardware.
Goal: Make better computing systems than traditional architectures for real-world applications.

EHW Applied to Real-World Applications
Analog circuit synthesis: analog adaptive equalizer, amplifier and filter design.
Parameter tuning: clock timing adjustment, analog filter tuning.
Robot control.
Image processing: image compression, image filtering, image recognition.
Classification/recognition: sonar classification, gene finding, prosthetic hand.

Evolving a Circuit
Define the basic building blocks: digital (gates or higher-level functions) or analog (transistor/resistor/L/C).
Represent each building block in the chromosome by its function and its connections to other building blocks in the circuit.

Evolving a Circuit
1. Initialize a population of circuits.
2. Evaluate the circuits.
3. Sort the circuits based on their fitness.
4. Make new circuits by combining parts from the highest-ranked circuits.
5. Is the best circuit acceptable? If not, the loop repeats; if so, evolution stops.

Evolutionary Operators: Genetic Algorithms (GA)
Parents: circuit x = 00000000000000, circuit y = 11111111111111.
After crossover: 00000001111111 and 11111110000000.
After mutation: 00100001111111 and 11111010000000.
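The bit strings on the operator slide can be reproduced with a one-point crossover and single-bit mutations. The following is a minimal sketch, not the authors' implementation: the function names, the crossover point (7) and the mutated bit positions (2 and 5) are assumptions chosen so that the output matches the strings shown on the slide.

```python
from typing import Tuple


def one_point_crossover(x: str, y: str, point: int) -> Tuple[str, str]:
    """Swap the tails of two equal-length bit strings at `point`."""
    if len(x) != len(y):
        raise ValueError("parents must have equal length")
    return x[:point] + y[point:], y[:point] + x[point:]


def flip_bit(bits: str, index: int) -> str:
    """Mutate a bit string by inverting the bit at `index`."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


# Parent chromosomes from the slide (14 bits each).
circuit_x = "0" * 14
circuit_y = "1" * 14

# Crossover point and mutation positions are assumptions (see lead-in).
child_a, child_b = one_point_crossover(circuit_x, circuit_y, point=7)
mutant_a = flip_bit(child_a, 2)
mutant_b = flip_bit(child_b, 5)

print(child_a, child_b)
print(mutant_a, mutant_b)
```

In a real GA the crossover point and the mutated positions would be drawn at random each generation; they are fixed here only to mirror the slide.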

Hillclimbing Problem in Search (figure)

Fitness
A measure of how well adapted an individual is.
The value determines the probability of being selected for reproduction.
The way the fitness function is constructed is critical for the GA performance.

Computing Circuit Fitness
Decode, then execute/simulate.
$\text{Fitness} = \sum_{p} \sum_{o} (\text{match of output } o \text{ for pattern } p)$

Fitness Computation: OFFLINE (extrinsic)
Figure labels: GA + fitness computation, best circuit.
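The double sum over patterns p and outputs o can be written out directly. This is a minimal sketch under stated assumptions: a "circuit" is modelled as a plain Python function from input bits to output bits, and the target function (AND, XOR) and the candidate (AND, OR) are hypothetical examples, not taken from the talk.

```python
from typing import Callable, Sequence, Tuple

Bits = Tuple[int, ...]
Circuit = Callable[[Bits], Bits]


def circuit_fitness(circuit: Circuit,
                    training_set: Sequence[Tuple[Bits, Bits]]) -> int:
    """Count matching output bits, summed over all patterns and outputs."""
    total = 0
    for inputs, expected in training_set:
        actual = circuit(inputs)
        total += sum(1 for a, e in zip(actual, expected) if a == e)
    return total


# Hypothetical target: outputs (a AND b, a XOR b) for every 2-bit input.
training_set = [((a, b), (a & b, a ^ b)) for a in (0, 1) for b in (0, 1)]


def candidate(inputs: Bits) -> Bits:
    """Hypothetical candidate circuit: (a AND b, a OR b)."""
    a, b = inputs
    return (a & b, a | b)


fitness = circuit_fitness(candidate, training_set)
max_fitness = sum(len(expected) for _, expected in training_set)
print(fitness, "of", max_fitness)
```

The candidate is wrong only on the second output for input (1, 1), so it scores one below the maximum.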

Fitness Computation: ONLINE (intrinsic)
Figure labels: GA, configuration, fitness.

Evolution
Three variants (figure): off-chip evolution, on-chip evolution (our approach), and complete HW evolution.

FPGA (Field Programmable Gate Array) (figure)

Configuration of FPGA (Field Programmable Gate Array)
Figure (a), unconfigured: primary inputs, uninitialized SRAM cells, primary outputs.
Figure (b), configured: a configuration data stream loads the SRAM cells with 0s and 1s.
Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

FPGA for Data Processing
Figure: traditional system (processor, RAM) versus FPGA-based system (FPGA, RAM, FPGA configuration, processor).

Performance Comparison (Nallatech): Microprocessor (P4) versus FPGA (2VP100)
Clock speed: 3.8 GHz versus 180 MHz.
Internal memory bandwidth: 122 GBytes/s versus 7.5 TBytes/s.
Number of floating-point units: 2 versus 146.
Power consumption: > 100 W versus < 10 W.
Peak performance: 7.6 GFLOPS versus 26 GFLOPS.
Sustained performance: 0.76 GFLOPS versus 13 GFLOPS.
I/O / external memory bandwidth: 8.5 GBytes/s versus 67 GBytes/s.

Functions Well Suited to FPGA Acceleration
Searching, sorting, signal processing, audio/video/image manipulation, encryption, error correction, coding/decoding, network packet processing, data analysis (oil, gas, finance).

Processor versus FPGA
Figure labels: processor, program memory, external program memory, SRAM program memory.
Program loaded at startup.
Complete program in internal or external memory.
No swapping to other programs.
Processor technology ~1985, FPGA technology 2008!

Virtual Reconfigurable Circuit (figure)

System-on-Chip Evolution on Field Programmable Gate Array (FPGA)
Figure labels: EHW, GA, Xilinx Virtex-II Pro FPGA.
A hard (PowerPC) or soft (MicroBlaze) processor core is used.
The processor core runs the evolutionary algorithm.
Evaluation of individuals is performed on the EHW module in the same chip.

Prototyping Board: XUP Virtex-II Pro
XC2VP30 Virtex-II Pro FPGA: 13696 slices, 2448 Kb BRAM, 2 PPC405 cores.
DDR SDRAM slot.
CompactFlash slot.
VGA, AC97, PS2, RS232, ...

System-on-Chip
Local Memory Bus: fast access to program memory.
On-chip Peripheral Bus: used to connect the peripherals in the system.

Evolving Systems
Chromosome string representation.
Small and simple problem: 10010110101 (small system).
Large and complex problem: 100101101...10110001001 (large system).

Dividing the Application
Idea: Evolve a system gradually (divide and conquer the application).
Figure labels: partitioned training vectors, partitioned training set.
Training set (inputs x3 x2 x1 => outputs y3 y2 y1):
0 0 0 => 0 0 1
0 0 1 => 1 1 0
0 1 0 => 1 1 1
0 1 1 => 0 1 0
1 0 0 => 1 1 1
1 0 1 => 1 0 1
1 1 0 => 0 0 1
1 1 1 => 0 1 0
Benefits: Simpler and smaller search space for the evolution.

Earlier Architecture: Prosthetic Hand
Figure labels: step 1 evolution, step 2 evolution.

Classification Applications
Face image recognition.
Sonar return classification.
Prosthetic hand sensor signal classification.

System Overview
Figure labels: FPGA, PowerPC (hard core) or MicroBlaze (soft core), classification module.

Category Detection Module (figure)

CDM Functional Unit
f = 0, greater than: O = 1 if I > C, else 0.
f = 1, less than or equal: O = 1 if I ≤ C, else 0.

Application to be presented
AT&T Database of Faces (formerly The ORL Database of Faces).
Classify images of 40 different persons, with 10 different images of each face.

Input Reduction
92x112 8-bit pixels resampled to 8x8 8-bit pixels.
Input pattern: 512 bits.
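The input reduction yields 8 x 8 pixels x 8 bits = 512 bits per pattern. The slides do not say which resampling method was used, so the block averaging below is an assumption, as are the function names and the synthetic test image; the sketch only shows how a 92x112 image (taken here as 92 pixels wide and 112 high) could be reduced to a 512-bit pattern.

```python
from typing import List


def resample_block_average(image: List[List[int]],
                           out_rows: int = 8,
                           out_cols: int = 8) -> List[List[int]]:
    """Shrink an image by averaging the pixels in each output cell's block."""
    in_rows, in_cols = len(image), len(image[0])
    result = []
    for r in range(out_rows):
        r0, r1 = r * in_rows // out_rows, (r + 1) * in_rows // out_rows
        row = []
        for c in range(out_cols):
            c0, c1 = c * in_cols // out_cols, (c + 1) * in_cols // out_cols
            block = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            row.append(sum(block) // len(block))
        result.append(row)
    return result


def to_bit_pattern(pixels: List[List[int]]) -> str:
    """Concatenate all pixels, row by row, as 8-bit binary strings."""
    return "".join(format(p, "08b") for row in pixels for p in row)


# Synthetic stand-in for a face image: 112 rows of 92 8-bit pixels.
image = [[(x + y) % 256 for x in range(92)] for y in range(112)]
small = resample_block_average(image)
pattern = to_bit_pattern(small)
print(len(small), len(small[0]), len(pattern))
```

With 64 pixels in the reduced image, a 6-bit pixel address is enough to select any one of them.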

Example: One FU Row
Evolution determines the selection of pixels and the expressions.
Example row: < 70, > 150, < 100, > 170, combined by AND, with the result sent to a counter.

FU HW Implementation (figure)
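A functional unit compares one selected pixel with an evolved constant, and a row ANDs its FU outputs together. The sketch below illustrates this with the thresholds from the example row. It assumes that f = 1 means "less than or equal" (following the table's description), and the pixel addresses, pixel values and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One FU as (pixel address, function bit f, constant C).
FU = Tuple[int, int, int]


def functional_unit(f: int, pixel: int, constant: int) -> int:
    """f = 0: 1 if pixel > constant. f = 1: 1 if pixel <= constant (assumed)."""
    if f == 0:
        return 1 if pixel > constant else 0
    return 1 if pixel <= constant else 0


def fu_row(pattern: Sequence[int], units: List[FU]) -> int:
    """AND together the outputs of all FUs in the row."""
    return int(all(functional_unit(f, pattern[addr], c) == 1
                   for addr, f, c in units))


# Thresholds from the slide; the pixel addresses are made up.
example_row: List[FU] = [(3, 1, 70), (10, 0, 150), (20, 1, 100), (41, 0, 170)]

# A 64-pixel pattern that satisfies all four comparisons.
matching = [128] * 64
matching[3], matching[10], matching[20], matching[41] = 50, 200, 90, 180

# The same pattern with one pixel changed so that "> 150" fails.
failing = list(matching)
failing[10] = 100

print(fu_row(matching, example_row), fu_row(failing, example_row))
```

Because the row is an AND, a single failing comparison is enough to keep the row from firing.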

Evolution and Fitness Function
FU: pixel address (6 bit), function (1 bit), constant (8 bit).
FU row: FU 1 (15b), FU 2 (15b), ..., FU N (15b).
One FU row can be evolved at a time.
The fitness function emphasizes positive matches of the category being evolved at the moment.
A correct match (1) counts A times more than a correct match (0) for other categories.

Fitness Computation
Figure labels: training set (category 1, ..., category i, ..., category K), CDM i.
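The 15-bit FU encoding and the weighted fitness can be sketched as follows. Several details are assumptions rather than facts from the slides: the field order inside the 15 bits (address, then function, then constant), the function names, and the value A = 3 used in the example. Only the field widths and the "counts A times more" rule come from the source.

```python
from typing import List, Sequence, Tuple

FU_BITS = 15  # 6-bit pixel address + 1-bit function + 8-bit constant


def decode_fu(gene: str) -> Tuple[int, int, int]:
    """Split a 15-bit string into (address, function, constant)."""
    if len(gene) != FU_BITS:
        raise ValueError("an FU gene must be 15 bits long")
    return int(gene[0:6], 2), int(gene[6], 2), int(gene[7:15], 2)


def decode_row(chromosome: str) -> List[Tuple[int, int, int]]:
    """Decode a row chromosome made of N concatenated FU genes."""
    if len(chromosome) % FU_BITS != 0:
        raise ValueError("row length must be a multiple of 15 bits")
    return [decode_fu(chromosome[i:i + FU_BITS])
            for i in range(0, len(chromosome), FU_BITS)]


def weighted_fitness(row_outputs: Sequence[int], labels: Sequence[int],
                     category: int, a_weight: int) -> int:
    """Score a row: A per correct 1 on its own category, 1 per correct 0."""
    score = 0
    for out, label in zip(row_outputs, labels):
        if label == category and out == 1:
            score += a_weight
        elif label != category and out == 0:
            score += 1
    return score


# Two hypothetical genes: (address 3, f = 1, C = 70), (address 10, f = 0, C = 150).
row = decode_row("000011101000110" + "001010010010110")
print(row)

# Row outputs for four training vectors labelled 1, 1, 2, 2; evolving category 1.
print(weighted_fitness([1, 0, 0, 1], [1, 1, 2, 2], category=1, a_weight=3))
```

Weighting the positive matches keeps a row from scoring well simply by outputting 0 for everything, which would otherwise be rewarded by the many vectors from other categories.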

FEM
Can be duplicated for parallel evaluation.

CDM
Flexible N (FUs per row) and flexible M (FU rows).

Results: Architecture Parameters
More FU rows within a CDM give a higher chance of correct classification.
Each FU row evolved with random initial values gives different rules.
The more FU rows in a CDM, the higher the output resolution.
Too few or too many FUs in a row make it more difficult to evolve a classifier.

Results: Classification Accuracy
Using 6 FUs per row and 10 rows per CDM.
Using x training vectors per category and the rest (10-x) as test vectors.
96.25% accuracy using 9 training vectors per category.
Better than the previous offline EHW architecture.
Competitive with Eigenfaces and Fisherfaces; SVM (Support Vector Machine) performs better.
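The remark about output resolution makes sense if each CDM outputs the number of its FU rows that fire and a maximum detector picks the category with the highest count, which is how the slide labels "To counter" and "max. detector" appear to fit together. The sketch below follows that reading. It is an illustration under assumptions: the data layout, the tie-breaking rule (first category wins), the tiny 4-pixel patterns and the category names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

FU = Tuple[int, int, int]  # (pixel address, function bit f, constant C)
Row = List[FU]


def row_fires(pattern: Sequence[int], row: Row) -> bool:
    """A row fires only if every FU comparison in it holds (AND)."""
    for address, f, constant in row:
        pixel = pattern[address]
        holds = pixel > constant if f == 0 else pixel <= constant
        if not holds:
            return False
    return True


def cdm_count(pattern: Sequence[int], rows: List[Row]) -> int:
    """Counter output of one CDM: the number of rows that fire."""
    return sum(1 for row in rows if row_fires(pattern, row))


def classify(pattern: Sequence[int],
             cdms: Dict[str, List[Row]]) -> Tuple[str, Dict[str, int]]:
    """Max detector: return the category whose CDM has the highest count."""
    counts = {name: cdm_count(pattern, rows) for name, rows in cdms.items()}
    best = max(counts, key=lambda name: counts[name])
    return best, counts


# Two toy CDMs with M = 2 rows of N = 1 FU each.
cdms = {
    "metal": [[(0, 0, 100)], [(1, 0, 100)]],
    "rock": [[(0, 1, 100)], [(1, 1, 100)]],
}

print(classify([200, 150, 0, 0], cdms))
print(classify([50, 20, 0, 0], cdms))
```

With M rows a CDM can output M + 1 distinct counts, so adding rows gives the maximum detector finer distinctions between categories and fewer ties.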

HW Implementation
Implemented on a Xilinx Virtex-II Pro XC2VP30.
Evaluation module: 371 slices (2%).
Parallel evaluation (8 individuals): 1393 slices (10%).
Estimated < 1 µs classification time per pattern (@100 MHz), using time multiplexing.

Evolution Speed
Figure labels: 8 parallel evaluations, remaining fitness comp. time, GA without fitness evaluation.
Fitness evaluation alone is 2.12 times faster in HW.
The GA is not fully optimized for the PowerPC.

Input (Sonar)
Figure: sonar spectrum and the corresponding 60 8-bit samples, for metal and rock.
Original data in float values, converted to 8-bit integers.
60 samples, 8 bits per sample => 480 bits per input pattern.
Each pattern is presented to all FUs.
Figure label: max. detector.

Classification Accuracy (Sonar)
M = 20 FU rows: training set 97.3% (avg.), test set 87.8% (avg.).
M = 58 FU rows: training set 99.6% (avg.), test set 91.4% (avg.).
Using 6 FUs per row, average over 10 runs.
104 training vectors, 104 test vectors (total).
Better than the previous offline EHW architecture.
Better than the original ANN implementation (90.4% avg.).
SVM performs better (95.2%).

HW Implementation (Sonar)
Implemented on a Xilinx Virtex-II Pro XC2VP30 (XUP2VP).
Classification module, 20 FU rows: 3866 slices (28% of FPGA).
Classification module, 58 FU rows: 11189 slices (81% of FPGA).
0.5 µs classification time per pattern (@118 MHz), using time multiplexing.
Evaluation module: 371 slices (2%).
Parallel evaluation (8 individuals): 1393 slices (10%).

System Overview (figure)

Evolution Speed (Sonar)
Average number of generations required per row is 853 (limit 1000).
98948 generations for the entire system.
0.54 s per row.
63 s for the entire system.
System is operational after 4.3 s (8 rows).

Partial reconfiguration of the CDMs
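The evolution-speed figures are mutually consistent, which a few lines of arithmetic can confirm. The sketch uses only numbers from the slide; reading the resulting 116 rows as 2 categories (metal, rock) times 58 FU rows is an interpretation, not something the slide states.

```python
avg_generations_per_row = 853
total_generations = 98948
seconds_per_row = 0.54
rows_until_operational = 8

# Number of FU rows implied by the generation counts.
rows_total = total_generations / avg_generations_per_row

# Evolution time for the whole system and for the first 8 rows.
total_seconds = rows_total * seconds_per_row
operational_seconds = rows_until_operational * seconds_per_row

print(f"rows evolved:      {rows_total:.0f}")
print(f"entire system:     {total_seconds:.1f} s")
print(f"operational after: {operational_seconds:.2f} s")
```

The results round to the 63 s and 4.3 s quoted on the slide.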

Conclusions 1
High classification speed: suitable for problems requiring high throughput.
High classification accuracy: comparable to or better than other methods (with the exception of SVM).
Incremental evolution: makes evolution of a large system possible; evolution time is short.
The evaluation module requires few FPGA resources.

Conclusions 2
Run-time reconfigurable architecture: allows for online evolution in an on-chip system; suitable for adaptation to a changing fitness function or training set.
Combination of SW/HW on-chip gives benefits: evolution time can be kept down with HW fitness evaluation; the rest of the GA can run in SW, allowing for easy modifications; a compact system (SoC).

EHW Conferences
International Conference on Evolvable Systems (ICES), Springer LNCS. Next conference: Prague, Czech Republic. Paper deadline: March 19, 2008.
NASA/ESA Conference on Adaptive Hardware and Systems (AHS), IEEE. Next conference: Noordwijk, The Netherlands. Paper deadline: January 31, 2008.

The work presented has been funded by the Research Council of Norway through the project Biological Inspired Design of Systems for Complex Real-World Applications (project no. 160308/V30).

Thank you!
Please feel free to contact me after this talk.
Several PhD positions will be available soon (one in our group).
I'm also available for discussions here at Stanford, Friday January 25.
E-mail: jimtoer@ifi.uio.no
http://www.ifi.uio.no/~jimtoer