Face Tracking Using Convolutional Neural Network Accelerator IP Reference Design


FPGA-RD-02037-1.0, May 2018

Contents

1. Introduction
2. Related Documentation
   2.1. Soft IP Document
   2.2. Diamond Document
3. Software Requirements
4. Hardware Requirements
5. Overview
   5.1. Block Diagram
   5.2. Top Level Blocks
6. CNN Accelerator Engine
7. SD Card Loader
8. AXI Slave and DDR3 Memory Interface
9. CSI2 to DVI Interface
10. Video Processing Module
References
Technical Support Assistance
Revision History

Figures

Figure 1.1. Lattice Embedded Vision Development Kit with MicroSD Card Adapter Board
Figure 5.1. Face Tracking Block Diagram
Figure 6.1. CNN Accelerator IP Core Generation GUI
Figure 6.2. CNN Accelerator Soft IP Inputs and Outputs
Figure 6.3. Command Format
Figure 7.1. Neural Network Compiler Output File Generation and Flow
Figure 10.1. Face Tracking Design *.yml File Code Snippet

Acronyms in This Document

A list of acronyms used in this document.

1. Introduction

This document describes the Face Tracking machine learning neural network reference design. The reference design can be implemented on Lattice's Embedded Vision Development Kit (EVDK), which features the CrossLink and ECP5 FPGAs.

Figure 1.1. Lattice Embedded Vision Development Kit with MicroSD Card Adapter Board

2. Related Documentation

2.1. Soft IP Document

CNN Accelerator IP Core User Guide (FPGA-IPUG-02037)

2.2. Diamond Document

For more information on Lattice Diamond Software, visit the Lattice website at: www.latticesemi.com/products/designsoftwareandip

3. Software Requirements

Lattice Diamond Software Version 3.10
Synplify Pro Synthesis Tool

4. Hardware Requirements

Lattice Embedded Vision Development Kit (LF-EVDK1-EVN)
Mini-USB Cable (included in the Lattice Embedded Vision Development Kit)
12 V Power Supply (included with the kit)
HDMI Cable
HDMI Monitor (1080p60)
MicroSD Card Adapter (MICROSD-ADP-ENV)
MicroSD Card (standard microSD card only)

5. Overview

5.1. Block Diagram

Figure 5.1 shows the block diagram of the Face Tracking reference design.

Figure 5.1. Face Tracking Block Diagram (blocks shown: DDR3 Control (ddr3_ip_inst), AXI Slave (axi2lattice128), CNN Accelerator Engine (lsc_ml_wrap), SD Loader (sd_spi), Video Processing (crop_downscale), and CSI2_to_DVI_top, with external DRAM, microSD card, camera, and HDMI TX)

5.2. Top Level Blocks

The reference design uses the ECP5-85 FPGA and contains the following major blocks:
CNN Accelerator Engine
SD card to SPI interface
AXI slave interface
DDR3 memory interface
CSI2 to DVI interface
Video processing module

6. CNN Accelerator Engine

The Lattice Semiconductor CNN Accelerator IP Core is a calculation engine for deep neural networks with fixed-point or binarized weights. It computes complete neural network layers, including the convolution, pooling, batch normalization, and full connect layers, by executing sequence code with weight values generated by the Lattice Neural Network Compiler tool. The engine is optimized for convolutional neural networks, so it can be used for vision-based applications such as classification or object detection and tracking. The IP core does not require an extra processor; it performs all required calculations by itself.

The design is implemented in Verilog HDL. It can be targeted to ECP5 and ECP5-5G (LFE5U, LFE5UM, and LFE5UM5G) FPGA devices and implemented using the Lattice Diamond Software Place and Route tool integrated with the Synplify Pro synthesis tool.

The key features of the CNN Accelerator IP Core include:
Support for convolution, max pooling, batch normalization, and full connect layers
Configurable weight bit width (16-bit or 1-bit)
Configurable activation bit width (16/8-bit or 1-bit)
Dynamic support for 16-bit and 8-bit activation widths
Configurable number of memory blocks for trading off resources against performance
Configurable number of convolution engines for trading off resources against performance
Optimized 3x3 2D convolution calculation
Dynamic support for 1D convolutions from 1 to 72 taps
Max pooling with overlap (for example, kernel 3 and stride 2)

Engine configuration parameters can be set using the Clarity Designer IP Core Configuration GUI, as shown in Figure 6.1.

Figure 6.1. CNN Accelerator IP Core Generation GUI

Figure 6.2 shows the inputs and outputs of this IP module.

Figure 6.2. CNN Accelerator Soft IP Inputs and Outputs (signals shown: clk, resetn, i_start, i_we, i_mem_sel, i_waddr, i_din, o_rd_rdy, o_we, o_dout, o_status, debug_info, and an AXI bus to external memory)

Input Data Format:
Input data is a sequence of 8-bit or 16-bit values. The memory index and address are determined by the neural network, so an external block must process the raw input data and write it to the Lattice CNN Accelerator IP Core through the input data write interface. Because the CNN Accelerator IP Core has only a 16-bit wide interface, the external block must pack two 8-bit values into each write when 8-bit width is used for the input data layer. For example, a face detection neural network may take 32x32 R, G, and B planes at memory index 0, with address 0x0000 for the red plane, 0x0400 for the green plane, and 0x0800 for the blue plane. An object detection neural network, on the other hand, may take 90x90 R, G, and B planes assigned to memory indexes 0, 1, and 2, respectively. Because the memory assignment is defined by the neural network, the external block must write the raw input data to the proper positions in the internal memory of the CNN Accelerator IP Core. For large input data, it is also possible to write the input data to DRAM and use the Load command to fetch it. The IP core expects data in little-endian order.

Result:
The result, that is, the final blob data of the neural network, can be written to DRAM by command code; in this case, external logic reads the result data from DRAM. The command code can also feed result data directly to external logic through the output interface, which consists of o_we as a valid indicator and o_dout as 16-bit data. The result is usually a single burst of 16-bit values, and it is fully programmable by command code.

Output Data Format:
Output data is a sequence of 16-bit values controlled by commands. The amount of data is also determined by the neural network, that is, by its output blobs. The external block must interpret the output sequence and generate usable information. For example, the face detection design outputs a burst of two consecutive 16-bit values: the first is the confidence of non-face and the second is the confidence of face. Whenever the latter is larger than the former, the conclusion is Face. The IP core outputs data in little-endian order.

Command Format:
A command is a sequence of 32-bit data, with or without additional parameters or weights, as shown in Figure 6.3. It must be loaded at DRAM address 0x0000 before execution. Commands are generated by the Lattice Neural Network Compiler tool. For more information, refer to the Lattice Neural Network Compiler Software User Guide (FPGA-UG-02052).

Figure 6.3. Command Format
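To make the input packing and face/non-face interpretation described above concrete, the following is a minimal host-side Python sketch of the data handling an external block performs. The 32x32 plane base addresses (0x0000, 0x0400, 0x0800) and the two-value output burst come from the text above; the byte order within each packed 16-bit word, the word addressing, and the helper names are illustrative assumptions, not the IP's actual interface.

```python
# Minimal sketch (assumed byte order): pack 8-bit R/G/B planes into 16-bit
# words for the CNN Accelerator input write interface, and interpret the
# two-value non-face/face result burst described above.

def pack_plane_16bit(plane_bytes):
    """Pack a sequence of 8-bit pixels into 16-bit little-endian words.
    Assumes the first pixel occupies the low byte of each word."""
    if len(plane_bytes) % 2:
        plane_bytes = list(plane_bytes) + [0]          # pad odd-length plane
    return [plane_bytes[i] | (plane_bytes[i + 1] << 8)
            for i in range(0, len(plane_bytes), 2)]

# Example: 32x32 R, G, B planes at memory index 0, base addresses per the text.
PLANE_BASE = {"R": 0x0000, "G": 0x0400, "B": 0x0800}

def build_writes(planes):
    """Yield (memory index, word address, 16-bit data) tuples for each plane.
    Word addressing is an assumption for illustration."""
    for name, base in PLANE_BASE.items():
        for offset, word in enumerate(pack_plane_16bit(planes[name])):
            yield 0, base + offset, word               # memory index 0 for the 32x32 network

def classify(result_pair):
    """result_pair = (non-face confidence, face confidence), both 16-bit values."""
    non_face, face = result_pair
    return "Face" if face > non_face else "No face"

if __name__ == "__main__":
    planes = {c: [0] * (32 * 32) for c in "RGB"}       # dummy black image
    writes = list(build_writes(planes))
    print(len(writes), "16-bit writes")                # 3 planes x 512 words
    print(classify((120, 310)))                        # -> Face
```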

7. SD Card Loader

The SD card interface in this design is used to load data into the CNN Accelerator IP. The SD card contains a binary file generated by the Lattice Neural Network Compiler tool. The Lattice Neural Network Compiler tool analyzes and compiles a trained neural network (such as one generated by the Caffe or TensorFlow tools) for use with select Lattice Semiconductor FPGA products. The tool outputs three files:
A hardware configuration file (*.yml) that contains information on the fixed-point converted network and its memory allocation.
A firmware file (*.lscml) that contains the weights from the trained model file.
A binary file (*.bin), generated from the firmware file (*.lscml), for programming into the SD card.

Figure 7.1 shows the output file generation flow of the Neural Network Compiler tool.

Figure 7.1. Neural Network Compiler Output File Generation and Flow
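As a rough illustration of how the hardware configuration file ties into the rest of the flow, the hedged Python sketch below reads a *.yml file and summarizes the input layout using fields quoted later in the Video Processing Module section (Input Size, memblks, ebr_blk_size). The key names, the one-byte-per-pixel assumption, and the file name are illustrative assumptions; the compiler's actual *.yml schema may differ.

```python
# Hedged sketch: read the Neural Network Compiler's hardware configuration
# file and summarize the input layout. Key names follow the fields quoted
# in this document; the actual *.yml schema may differ.
import yaml   # requires PyYAML

def summarize_config(path):
    with open(path) as f:
        cfg = yaml.safe_load(f)

    n, channels, height, width = cfg["Input Size"]      # e.g. [1, 3, 90, 90]
    mem_blocks = cfg["memblks"]                          # memory blocks needed
    blk_bytes = cfg["ebr_blk_size"]                      # bytes per memory block

    input_bytes = n * channels * height * width          # assumes one byte per pixel per plane
    return {
        "input_shape": (n, channels, height, width),
        "input_bytes": input_bytes,
        "memory_blocks": mem_blocks,
        "block_bytes": blk_bytes,
        "fits_in_blocks": input_bytes <= mem_blocks * blk_bytes,
    }

if __name__ == "__main__":
    print(summarize_config("face_tracking.yml"))         # hypothetical file name
```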

8. AXI Slave and DDR3 Memory Interface

The AXI interface allows the command code to be written to DRAM before the CNN Accelerator IP Core starts execution. Input data may also be written to DRAM. The CNN Accelerator IP Core reads the command code from DRAM and performs calculations using its internal sub-execution engines. Intermediate data may also be transferred to and from DRAM per command code.

9. CSI2 to DVI Interface

This module implements a bridge function that converts the camera's MIPI CSI input data to DVI output using the CrossLink FPGA and the SiI1136 HDMI transmitter.

10. Video Processing Module

The crop_downscale module provides all the functions needed to manage the process of inputting data, receiving output data, and generating a composite image for output to the HDMI interface. Three examples are included in the design:
crop_downscale.v crops the input to 32x32
crop_downscale_key.v crops the input to 90x90
crop_downscale_keyl.v crops the input to 224x224

The face tracking demo uses crop_downscale_key.v. Key functions of the code include:
Capturing a downscaled image from the camera input module and saving it to a frame buffer.
Writing the frame buffer data into the CNN accelerator engine during the blanking period.
Buffering the output after the image data processing completes.
Creating a PIP (Picture-in-Picture) bounding box with green borders and outputting the composite image.

Output from the CSI2_to_DVI_top module is a stream of data that reflects the camera image. The input image is downscaled to 90x90 pixels, stored in a frame buffer, and passed to the output. The image data is written from the frame buffer into the CNN accelerator engine before processing starts, and the data is formatted for compatibility with the trained network. The *.yml file provides most of the information needed to understand how the input data should be prepared. A snippet of the *.yml file for the face tracking design is shown in Figure 10.1.

Figure 10.1. Face Tracking Design *.yml File Code Snippet

Input Size: [1, 3, 90, 90] -- Indicates one input array consisting of 3 layers of dimensions 90x90.
memblks: 3 -- Total number of memory blocks needed.
depth_per_mem: 1 -- Number of memory blocks allocated to each memory layer.
frac: 8 -- Number of bits allocated to the fractional component. It is equal to the minimum number of bits needed to represent this number, minus 1. In this case, 3 bits to represent 8-1=7.
num_ebr: 16 -- Number of memory blocks. Note: Despite the variable name, this does not tie directly to the number of Embedded Block RAMs (EBRs) used in the design.
ebr_blk_size: 16384 -- Defines the size of the memory blocks in bytes. Note that the blocks have a width of 16 bits and a variable depth.

The CNN accelerator engine's result ports, o_we and o_dout[15:0], can be used to output any number of results. The designer can add a read command to read any data, based on the neural network design. In the face tracking design, four results are transferred from the CNN accelerator engine to the crop_downscale_key module. These are used to create a green bounding box around the face on the output image stream:
X pixels (from image top left to box center)
Y pixels (from image top left to box center)
Box X (dimension in pixels)
Box Y (dimension in pixels)
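A minimal sketch of how these four result values might be turned into a bounding box on the output video follows. The four fields (X center, Y center, Box X, Box Y, given in 90x90-frame pixels) come from the list above; the scaling to a particular display resolution, the clamping, and the border-drawing helper are illustrative assumptions rather than the RTL behavior of crop_downscale_key.

```python
# Hedged sketch: convert the four CNN results (box center and size in the
# 90x90 network frame) into corner coordinates on a larger display frame
# and draw a green border. The 90x90 -> display scaling and the simple
# border drawing are assumptions for illustration, not the RTL behavior.

def to_display_box(x_c, y_c, box_x, box_y, net_dim=90, disp_w=1280, disp_h=720):
    """Map a box given in network-frame pixels to display-frame pixels."""
    sx, sy = disp_w / net_dim, disp_h / net_dim
    left   = int((x_c - box_x / 2) * sx)
    top    = int((y_c - box_y / 2) * sy)
    right  = int((x_c + box_x / 2) * sx)
    bottom = int((y_c + box_y / 2) * sy)
    clamp = lambda v, hi: max(0, min(v, hi - 1))       # keep the box on screen
    return clamp(left, disp_w), clamp(top, disp_h), clamp(right, disp_w), clamp(bottom, disp_h)

def draw_green_border(frame, box):
    """frame is a height x width list of (r, g, b) pixels; draw a 1-pixel border."""
    left, top, right, bottom = box
    green = (0, 255, 0)
    for x in range(left, right + 1):
        frame[top][x] = green
        frame[bottom][x] = green
    for y in range(top, bottom + 1):
        frame[y][left] = green
        frame[y][right] = green

if __name__ == "__main__":
    # Hypothetical result burst: center (45, 40), box 30 x 36 pixels in the 90x90 frame.
    box = to_display_box(45, 40, 30, 36)
    frame = [[(0, 0, 0)] * 1280 for _ in range(720)]
    draw_green_border(frame, box)
    print("display box:", box)
```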

References

For more information on the FPGA device, visit: http://www.latticesemi.com/products/fpgaandcpld/ecp5.

For complete information on the Lattice Diamond project-based environment, design flow, implementation flow and tasks, as well as the simulation flow, see the Lattice Diamond User Guide.

Technical Support Assistance

Submit a technical support case through www.latticesemi.com/techsupport.

Revision History

Date       Version   Change Summary
May 2018   1.0       Initial release.
