Speed Sign Detection Using Convolutional Neural Network Accelerator IP User Guide

Speed Sign Detection Using Convolutional Neural Network Accelerator IP FPGA-RD-02035-1.0 May 2018

Contents Acronyms in This Document... 3 Introduction... 4 Reference Design Overview... 5 Block diagram... 5 CNN Accelerator Engine... 6 SD Card Loader... 7 AXI Salve and DDR3 Memory Interface... 8 CSI2 to DVI Interface... 9 Video Processing Module... 10 Related Documentation... 12 Soft IP Document... 12 Diamond Document... 12 Hardware Requirements... 13 References... 14 Technical Support Assistance... 15 Revision History... 16 Figures Figure 1.1. Embedded Vision Development Kit... 4 Figure 2.1. Speed Sign Detection Reference Design Block Diagram... 5 Figure 3.1. CNN Accelerator IP Core Generation GUI... 6 Figure 4.1. Neural Network Compiler Tool Output File Generation Flow... 7 Figure 7.1. Speed Sign Detection Design *.yml File Snippet... 10 2 FPGA-RD-02035-1.0

Acronyms in This Document A list of acronyms used in this document. Acronym Definition AXI CNN DRAM DVI FPGA GUI passp PIP SD Card SPI Advanced extensible Interface Convolutional Neural Network Dynamic Random Access Memory Digital Visual Interface Field Programmable Gate Array Graphic User Interface Programmable Application Specific Standard Product Picture In Picture Secure Digital (Memory) Card Serial Peripheral Interface FPGA-RD-02035-1.0 3

Introduction This document describes the Speed Sign Detection machine learning neural network reference design. This reference design can be implemented on Lattice s Embedded Vision Development Kit, featuring the Lattice CrossLink passp and ECP5 FPGA. Lattice s Embedded Vision Development Kit Stackable Modular Video Interface Platform (VIP) CrossLink Input Bridge Board LIF-MD6000 passp Two Sony IMX 214 Cameras 2:1 CSI-2 MUX ECP5 Processor Board ECP5-85FGPA Image Signal Processing Sensor Interface IONOS ISP Pipeline All-inclusive demo system with video sources Prototyping header Easy programming via USB interface HDMI Output Bridge Board SiI1136 HDMI assp Non-HDCP Output Figure 1.1. Embedded Vision Development Kit 4 FPGA-RD-02035-1.0

Reference Design Overview Block diagram Figure 2.1 shows the block diagram of the Speed Sign Detection reference design. ECP5 External DRAM DDR3 Control (ddr3_ip_inst) AXI Slave (axi2lattic e128) CNN Accelerator Engine (lsc_ml_wrap) External Micro SD Card SD Loader (Sd_spi) Frame Data (32x32, 90x90) Result External Camera CSI2_to_DVI_top Video Processing (crop_downscale) External HDMI TX Lattice IP (clarity) Face Tracking Demo Support Design Modules External Components Figure 2.1. Speed Sign Detection Reference Design Block Diagram The Reference Design uses ECP5-85 FPGA containing the following major blocks: CNN accelerator engine SD card to SPI interface AXI Salve interface DDR3 memory interface CSI2 to DVI interface Video processing module FPGA-RD-02035-1.0 5

CNN Accelerator Engine Lattice Semiconductor CNN Accelerator IP Core can be used through the Diamond Clarity IP Designer. Engine configuration parameters can be set using the Clarity Designer s IP core configuration GUI, as shown in Figure 3.1. Figure 3.1. CNN Accelerator IP Core Generation GUI For detailed information about Lattice Semiconductor CNN Accelerator IP core, such as input data format, output data format and command format, refer to CNN Accelerator IP Core (FPGA-IPUG-02037). For the command generation by Lattice Neural Network Compiler, refer to Lattice Neural Network Compiler Software (FPGA- UG-02052). 6 FPGA-RD-02035-1.0

TensorFlow Coffe Speed Sign Detection Using Convolutional Neural Network Accelerator IP SD Card Loader SD card interface in this design is used to get the command data into the DRAM for execution by the CNN accelerator IP. The SD card contains a file that is generated by Lattice Neural Network Compiler Tool. Lattice Neural Network Compiler Tool allows analyzing and compiling a trained neural network, such as what is generated by Caffe or TensorFlow tool, to use with selected Lattice Semiconductor FPGA products. Lattice Neural Network Compiler tool outputs three files: A hardware configuration file (*.yml) that contains information on fixed point converted network and memory allocation. A firmware file (*.lscml) in ASCII format that contains weights coming from a trained model file. Firmware file (*.lscml) must be converted to binary format before loading into the SD card. A firmware file (*.bin) in binary format that can be directly loaded into the SD card. For detailed operation instructions, refer to Lattice Neural Network Compiler Software (FPGA-UG-02052). Figure 4.1 shows the output file generation flow of the Neural Network Compiler Tool. *.proto *.coffemodel SampleImage.jpeg Trained Model HW Configuration Generator *.yml *.pb SampleImage.jpeg *.lscml Firmware (ASCII) Hardware Simulation *.bin Firmware (Binary) Figure 4.1. Neural Network Compiler Tool Output File Generation Flow FPGA-RD-02035-1.0 7

AXI Salve and DDR3 Memory Interface AXI interface allows command code to be written in DRAM before execution of CNN Accelerator IP Core. Input data may also be written in DRAM. CNN Accelerator IP Core reads command code from DRAM and performs calculations using internal sub-execution engines. Intermediate data may also be transferred from/to DRAM per command code. 8 FPGA-RD-02035-1.0

CSI2 to DVI Interface This module implements a bridge function that converts the camera input MIPI CSI data to DVI output using Lattice CrossLink passp and SiI1136 HDMI transmitter. FPGA-RD-02035-1.0 9

Video Processing Module The crop_downscale module provides all the necessary functions needed to manage the process of inputting data, receiving output, data and generating a composite image for output to the HDMI interface. Four examples are included in the design: crop_downscale.v crops input to 32 32 crop_downscale_key.v crops input to 90 90 crop_downscale_sign.v crops input to 128 128 crop_downscale_keyl.v crops input to 224 224 The Speed Sign Detection demo uses crop_downscale_sign.v. Key functions of the code include: Capturing a downscaled image from the camera input module and saving it to a frame buffer. Writing the frame buffer data into CNN accelerator engine during the blanking period. Buffering the output after completion of the image data processing. Creating a Picture In Picture (PIP) bounding box with green borders, and outputting the composite image. Output from CSI2_to_DVI_top module is a stream of data that reflects the camera image. Input image is then downscaled to 128 128 pixels, stored in a frame buffer and passed to output. Image data is written from the frame buffer into the CNN acceleration engine prior to the start of the processing. Data is then formatted for compatibility with the trained network. The *.yml file provides majority of the information needed for understanding how the input data should be prepared. A snippet of the code in *.yml file for Speed Sign Detection design is shown in Figure 7.1. Figure 7.1. Speed Sign Detection Design *.yml File Snippet 10 FPGA-RD-02035-1.0

Input Size: [1, 3, 128, 128] indicates one input array consisting of 3 layers of dimensions 128 128 memblks: 3 total number of memory blocks needed depth_per_mem: 1 number of memory blocks allocated to each memory layer frac: 8 number of bits that is allocated to the fractional component. It is equal to the minimum number of bits to represent this number minus 1. In this case, 3 bits to represent 8-1=7. num_ebr: 16 number of memory blocks. Note despite the variable name, this does not tie directly to the number of Embedded Block Ram (EBR) used in the design. ebr_blk_size: 16384 this defines the size of the memory blocks in bytes. Note the blocks have a width of 16 bits and the depth is variable. CNN accelerator engine s ports for results, o_we and o_dout[15:0], can be used to output any number of results. Designer can add a read command to allow reading any data based on the neural network design. In Speed Sign Detection design, the speedsign_post.v accepts the 16-bit output data from CNN accelerator engine, and generates the confidence level for the pre-trained Speed Limit Signs. The results then will be overplayed onto the left side of the output image stream in the magnification bar chart format. FPGA-RD-02035-1.0 11

Related Documentation Soft IP Document CNN Accelerator IP Core (FPGA-IPUG-02037) Diamond Document For more information on Lattice Diamond Software, visit Lattice website at: www.latticesemi.com/products/designsoftwareandip 12 FPGA-RD-02035-1.0

Hardware Requirements Lattice Embedded Vision Development Kit (LF-EVDK1-EVN) Mini-USB Cable (included in the Lattice Embedded Vision Development Kit) 12 V Power Supply (included with the Kit) HDMI Cable HDMI Monitor (1080p60) Micro-SD Card Adapter (MICROSD-ADP-ENV) Micro-SD Card. Standard Micro-SD card only. FPGA-RD-02035-1.0 13

References For more information on FPGA device, visit http://www.latticesemi.com/products/fpgaandcpld/ecp5 For complete information on Lattice Diamond Project-Based Environment, Design Flow, Implementation Flow and Tasks, as well as on the Simulation Flow, see the Lattice Diamond. 14 FPGA-RD-02035-1.0

Technical Support Assistance Submit a technical support case through www.latticesemi.com/techsupport. FPGA-RD-02035-1.0 15

Revision History Revision 1.0, May 2018 First release. 16 FPGA-RD-02035-1.0

7 th Floor, 111 SW 5 th Avenue Portland, OR 97204, USA T 503.268.8000 www.latticesemi.com