Speed Sign Detection Using Convolutional Neural Network Accelerator IP Reference Design

Speed Sign Detection Using Convolutional Neural Network Accelerator IP FPGA-RD-02035 Version 1.1 September 2018

Contents

Acronyms in This Document
1. Introduction
2. Overview
   2.1. Block diagram
3. Related Documentation
   3.1. Soft IP Document
   3.2. Diamond Document
4. Hardware Requirements
5. CNN Accelerator Engine
6. SD Card Loader
7. AXI Slave and DDR3 Memory Interface
8. CSI2 to DVI Interface
9. Video Processing Module
10. Generating the Firmware File
References
Technical Support Assistance
Revision History

Figures

Figure 1.1. Embedded Vision Development Kit
Figure 2.1. Speed Sign Detection Block Diagram
Figure 5.1. CNN Accelerator IP Core Generation GUI
Figure 6.1. Neural Network Compiler Tool Output File Generation Flow
Figure 9.1. Speed Sign Detection Design *.yml File Snippet
Figure 10.1. SensAI v1.1 Project Settings Part 1
Figure 10.2. SensAI v1.1 Project Settings Part 2
Figure 10.3. SensAI v1.1 Fractional Bit Change

Acronyms in This Document

A list of acronyms used in this document.

Acronym    Definition
AXI        Advanced eXtensible Interface
CNN        Convolutional Neural Network
DRAM       Dynamic Random Access Memory
DVI        Digital Visual Interface
FPGA       Field Programmable Gate Array
GUI        Graphical User Interface
PIP        Picture In Picture
SD Card    Secure Digital (Memory) Card
SPI        Serial Peripheral Interface

1. Introduction

This document describes the Speed Sign Detection machine learning neural network reference design. This reference design can be implemented on Lattice's Embedded Vision Development Kit, which features the Lattice CrossLink and ECP5 FPGA devices.

Figure 1.1. Embedded Vision Development Kit

[Figure labels: Lattice's Embedded Vision Development Kit, a stackable Modular Video Interface Platform (VIP).
- CrossLink Input Bridge Board: LIF-MD6000 pASSP, two Sony IMX214 cameras, 2:1 CSI-2 MUX
- ECP5 Processor Board: ECP5-85 FPGA, image signal processing, sensor interface, IONOS ISP pipeline
- HDMI Output Bridge Board: SiI1136 HDMI ASSP, non-HDCP output
- All-inclusive demo system with video sources, prototyping header, easy programming via USB interface]

2. Overview

2.1. Block diagram

Figure 2.1 shows the block diagram of the Speed Sign Detection reference design.

Figure 2.1. Speed Sign Detection Block Diagram

[Figure labels:
- Inside the ECP5: DDR3 Control (ddr3_ip_inst), AXI Slave (axi2lattice128), CNN Accelerator Engine (lsc_ml_wrap), SD Loader (Sd_spi), CSI2_to_DVI_top, Video Processing (crop_downscale)
- Signals: Frame Data (32x32, 90x90), Result
- External: DRAM, Micro SD Card, Camera, HDMI TX
- Legend: Lattice IP (Clarity), Face Tracking Demo Support Design Modules, External Components]

The reference design uses the ECP5-85 FPGA, which contains the following major blocks:
- CNN accelerator engine
- SD card to SPI interface
- AXI Slave interface
- DDR3 memory interface
- CSI2 to DVI interface
- Video processing module

3. Related Documentation

3.1. Soft IP Document

- CNN Accelerator IP Core User Guide (FPGA-IPUG-02037)

3.2. Diamond Document

For more information on Lattice Diamond Software, visit the Lattice website at www.latticesemi.com/products/designsoftwareandip

4. Hardware Requirements

- Lattice Embedded Vision Development Kit (LF-EVDK1-EVN)
- Mini-USB cable (included in the Lattice Embedded Vision Development Kit)
- 12 V power supply (included with the kit)
- HDMI cable
- HDMI monitor (1080p60)
- Micro-SD card adapter (MICROSD-ADP-ENV)
- Micro-SD card (standard Micro-SD card only)

5. CNN Accelerator Engine

The Lattice Semiconductor CNN Accelerator IP Core can be used through the Diamond Clarity IP Designer. Engine configuration parameters can be set using the Clarity Designer's IP core configuration GUI, as shown in Figure 5.1.

Figure 5.1. CNN Accelerator IP Core Generation GUI

For detailed information about the Lattice Semiconductor CNN Accelerator IP core, such as the input data format, output data format, and command format, refer to the CNN Accelerator IP Core User Guide (FPGA-IPUG-02037). For command generation by the Lattice Neural Network Compiler, refer to the Lattice Neural Network Compiler Software User Guide (FPGA-UG-02052).

6. SD Card Loader

The SD card interface in this design is used to load the command data into the DRAM for execution by the CNN accelerator IP. The SD card contains a file generated by the Lattice Neural Network Compiler Tool.

The Lattice Neural Network Compiler Tool analyzes and compiles a trained neural network, such as one generated by Caffe or TensorFlow, for use with selected Lattice Semiconductor FPGA products. The tool outputs three files:
- A hardware configuration file (*.yml) that contains information on the fixed-point converted network and memory allocation.
- A firmware file (*.lscml) in ASCII format that contains the weights from a trained model file. This file must be converted to binary format before it is loaded onto the SD card.
- A firmware file (*.bin) in binary format that can be loaded directly onto the SD card.

For detailed operating instructions, refer to the Lattice Neural Network Compiler Software User Guide (FPGA-UG-02052).

Figure 6.1 shows the output file generation flow of the Neural Network Compiler Tool.

Figure 6.1. Neural Network Compiler Tool Output File Generation Flow

[Figure labels:
- Trained Model inputs: Caffe (*.proto, *.caffemodel, SampleImage.jpeg), TensorFlow (*.pb, SampleImage.jpeg)
- HW Configuration Generator: *.yml
- Firmware (ASCII): *.lscml
- Hardware Simulation
- Firmware (Binary): *.bin]

7. AXI Slave and DDR3 Memory Interface

The AXI interface allows command code to be written to DRAM before the CNN Accelerator IP Core starts executing. Input data may also be written to DRAM. The CNN Accelerator IP Core reads command code from DRAM and performs calculations using its internal sub-execution engines. Intermediate data may also be transferred to and from DRAM, as directed by the command code.

8. CSI2 to DVI Interface

This module implements a bridge function that converts the camera input MIPI CSI data to DVI output using the Lattice CrossLink device and the SiI1136 HDMI transmitter.

9. Video Processing Module

The crop_downscale module provides the functions needed to manage inputting data, receiving output data, and generating a composite image for output to the HDMI interface. Four examples are included in the design:
- crop_downscale.v crops input to 32 × 32
- crop_downscale_key.v crops input to 90 × 90
- crop_downscale_sign.v crops input to 128 × 128
- crop_downscale_keyl.v crops input to 224 × 224

The Speed Sign Detection demo uses crop_downscale_sign.v. Key functions of the code include:
- Capturing a downscaled image from the camera input module and saving it to a frame buffer.
- Writing the frame buffer data into the CNN accelerator engine during the blanking period.
- Buffering the output after the image data processing completes.
- Creating a Picture In Picture (PIP) bounding box with green borders, and outputting the composite image.

The output of the CSI2_to_DVI_top module is a stream of data that reflects the camera image. The input image is downscaled to 128 × 128 pixels, stored in a frame buffer, and passed to the output. Image data is written from the frame buffer into the CNN accelerator engine before processing starts. The data is then formatted for compatibility with the trained network. The *.yml file provides most of the information needed to understand how the input data should be prepared.

A snippet of the *.yml file for the Speed Sign Detection design is shown in Figure 9.1.
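The crop, downscale, and bounding-box steps described above can be sketched in software. The following Python model is only an illustration and not the logic in crop_downscale_sign.v: the centered crop window, the nearest-neighbour scaling, the 1-pixel border width, and all function names are assumptions made for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), 8 bits per channel
Frame = List[List[Pixel]]      # frame[row][col]

GREEN: Pixel = (0, 255, 0)


def center_crop_square(frame: Frame) -> Frame:
    """Crop the largest centered square (assumed crop window)."""
    height, width = len(frame), len(frame[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in frame[top:top + side]]


def downscale(frame: Frame, size: int) -> Frame:
    """Nearest-neighbour downscale of a square frame to size x size."""
    side = len(frame)
    return [
        [frame[(r * side) // size][(c * side) // size] for c in range(size)]
        for r in range(size)
    ]


def draw_border(frame: Frame, top: int, left: int, size: int) -> Frame:
    """Return a copy of frame with a 1-pixel green box around a square region."""
    out = [list(row) for row in frame]
    bottom, right = top + size - 1, left + size - 1
    for c in range(left, right + 1):
        out[top][c] = GREEN
        out[bottom][c] = GREEN
    for r in range(top, bottom + 1):
        out[r][left] = GREEN
        out[r][right] = GREEN
    return out


# Synthetic 256 x 384 camera frame (hypothetical size, for illustration only).
camera = [[(r % 256, c % 256, 0) for c in range(384)] for r in range(256)]

# 128 x 128 image of the kind written to the CNN accelerator engine.
cnn_input = downscale(center_crop_square(camera), 128)

# Composite output: the full frame with the cropped region outlined in green.
composite = draw_border(camera, top=0, left=64, size=256)
```

In the hardware design these operations run on a pixel stream and a frame buffer rather than on whole frames held in memory; the model only shows the geometry of the crop window, the scaled image, and the overlay.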

Figure 9.1. Speed Sign Detection Design *.yml File Snippet

- Input Size: [1, 3, 128, 128]. Indicates one input array consisting of 3 layers of dimensions 128 × 128.
- memblks: 3. Total number of memory blocks needed.
- depth_per_mem: 1. Number of memory blocks allocated to each memory layer.
- frac: 8. Number of bits allocated to the fractional component. It is equal to the minimum number of bits needed to represent this number minus 1; in this case, 3 bits to represent 8 - 1 = 7.
- num_ebr: 16. Number of memory blocks. Despite the variable name, this does not tie directly to the number of Embedded Block RAM (EBR) blocks used in the design.
- ebr_blk_size: 16384. Size of the memory blocks in bytes. The blocks have a width of 16 bits, and the depth is variable.

The CNN accelerator engine's result ports, o_we and o_dout[15:0], can be used to output any number of results. The designer can add a read command to read any data, depending on the neural network design. In the Speed Sign Detection design, speedsign_post.v accepts the 16-bit output data from the CNN accelerator engine and generates the confidence level for each pre-trained speed limit sign. The results are then overlaid on the left side of the output image stream in a magnified bar chart format.
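The arithmetic implied by these parameters, and the way results appear on o_we and o_dout[15:0], can be checked with a short Python sketch. The parameter values are taken from the description above. The helper names are hypothetical, and treating the 16-bit result words as signed two's-complement values is an assumption that should be verified against the CNN Accelerator IP Core User Guide.

```python
from math import prod

# Values taken from the *.yml description above.
input_size = [1, 3, 128, 128]   # [arrays, layers, height, width]
num_ebr = 16                    # number of memory blocks
ebr_blk_size = 16384            # block size in bytes
WORD_BITS = 16                  # block width stated in the text


def input_elements(shape):
    """Total number of input values described by Input Size."""
    return prod(shape)


def block_depth_words(blk_size_bytes, word_bits=WORD_BITS):
    """Number of word_bits-wide words that fit in one memory block."""
    return (blk_size_bytes * 8) // word_bits


def capture_results(samples):
    """Collect o_dout words on cycles where o_we is asserted.

    samples is a list of (o_we, o_dout) pairs, one pair per clock cycle.
    """
    return [dout & 0xFFFF for we, dout in samples if we]


def to_signed16(word):
    """Interpret a 16-bit word as two's complement (assumed result format)."""
    return word - 0x10000 if word & 0x8000 else word


print(input_elements(input_size))        # 49152
print(block_depth_words(ebr_blk_size))   # 8192
print(num_ebr * ebr_blk_size)            # 262144

# Hypothetical cycle trace: only the two middle cycles carry results.
trace = [(0, 0x0005), (1, 0x0010), (1, 0xFFF0), (0, 0x0007)]
print([to_signed16(w) for w in capture_results(trace)])   # [16, -16]
```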

10. Generating the Firmware File

To generate the Speed Sign Detection firmware file:

1. Using the files located in Software/Data/Tensorflow/VIP_SpeedSignDetectionDemo/VIP_SpeedSignDetectionDemo.ldnn, create a new project in SensAI and apply the following settings, as shown in Figure 10.1.
   - Framework: TensorFlow
   - Device: ECP5
   - Class: CNN
   - Network File: Software/ML_Training_Results/Inference/Tensorflow/speedlimitdet.pb
   - Image/Video/Audio Data: Software/ML_Training_Results/Speed40.jpg

2. Click Next.

Figure 10.1. SensAI v1.1 Project Settings Part 1

3. In the second section, apply the Neural Network engine settings as shown in Figure 10.2:
   - Mean Value for Data Pre-Processing: 0
   - Scale Value for Data Pre-Processing: 1.0

4. Click OK.

Figure 10.2. SensAI v1.1 Project Settings Part 2

5. Note that this reference design requires fractional bit changes to fine-tune the accuracy of the design. After analyzing the network, change the Stored Data Format (User Edit) for the data Blob from 11.4 to 7.0.

6. Figure 10.3 shows the change of the fractional bit.

Figure 10.3. SensAI v1.1 Fractional Bit Change

7. After changing the fractional bit, save the project file and reanalyze the design.

8. Click Compile to generate the firmware file. After compiling, you can find the generated firmware file at: Software/Data/Tensorflow/VIP_SpeedSignDetectionDemo/Impl0/VIP_SpeedSignDetectionDemo.bin.
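To illustrate what a change in fractional bits does to stored values, the following Python sketch quantizes a number under two formats. It assumes that a Stored Data Format string such as 11.4 means integer bits followed by fractional bits of a signed fixed-point value; that reading, the saturation behavior, and the function names are assumptions made for this example and are not taken from the SensAI documentation.

```python
def parse_format(fmt: str):
    """Split an 'I.F' string into (integer_bits, fractional_bits)."""
    int_bits, frac_bits = fmt.split(".")
    return int(int_bits), int(frac_bits)


def resolution(fmt: str) -> float:
    """Smallest step that the format can represent."""
    _, frac_bits = parse_format(fmt)
    return 1.0 / (1 << frac_bits)


def quantize(value: float, fmt: str) -> float:
    """Round value to the nearest representable step, saturating at the range limits."""
    int_bits, frac_bits = parse_format(fmt)
    scale = 1 << frac_bits
    max_code = (1 << (int_bits + frac_bits)) - 1    # largest positive code
    min_code = -(1 << (int_bits + frac_bits))       # most negative code
    code = round(value * scale)
    code = max(min_code, min(max_code, code))
    return code / scale


for fmt in ("11.4", "7.0"):
    print(fmt, resolution(fmt), quantize(3.3, fmt), quantize(200.0, fmt))
```

Under these assumptions, fewer fractional bits give a coarser step and fewer integer bits give a narrower range, which is the trade-off being adjusted when the data Blob format is edited.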

References

- For more information on the FPGA device, visit http://www.latticesemi.com/products/fpgaandcpld/ecp5
- For complete information on the Lattice Diamond Project-Based Environment, Design Flow, Implementation Flow and Tasks, as well as on the Simulation Flow, see the Lattice Diamond User Guide.

Technical Support Assistance

Submit a technical support case through www.latticesemi.com/techsupport.

Revision History

Revision 1.1, September 2018
Section                         Change Summary
Generating the Firmware File    Added this section.

Revision 1.0, May 2018
Section                         Change Summary
All                             Initial release.

7th Floor, 111 SW 5th Avenue, Portland, OR 97204, USA
T 503.268.8000
www.latticesemi.com