Speed Sign Detection Using Convolutional Neural Network Accelerator IP Reference Design

Speed Sign Detection Using Convolutional Neural Network Accelerator IP FPGA-RD-02035 Version 1.1 September 2018

Contents
Acronyms in This Document
1. Introduction
2. Overview
  2.1. Block diagram
3. Related Documentation
  3.1. Soft IP Document
  3.2. Diamond Document
4. Hardware Requirements
5. CNN Accelerator Engine
6. SD Card Loader
7. AXI Slave and DDR3 Memory Interface
8. CSI2 to DVI Interface
9. Video Processing Module
10. Generating the Firmware File
References
Technical Support Assistance
Revision History

Figures
Figure 1.1. Embedded Vision Development Kit
Figure 2.1. Speed Sign Detection Block Diagram
Figure 5.1. CNN Accelerator IP Core Generation GUI
Figure 6.1. Neural Network Compiler Tool Output File Generation Flow
Figure 9.1. Speed Sign Detection Design *.yml File Snippet
Figure 10.1. SensAI v1.1 Project Settings Part 1
Figure 10.2. SensAI v1.1 Project Settings Part 2
Figure 10.3. SensAI v1.1 Fractional Bit Change

Acronyms in This Document
A list of acronyms used in this document.

Acronym    Definition
AXI        Advanced eXtensible Interface
CNN        Convolutional Neural Network
DRAM       Dynamic Random Access Memory
DVI        Digital Visual Interface
FPGA       Field Programmable Gate Array
GUI        Graphic User Interface
PIP        Picture In Picture
SD Card    Secure Digital (Memory) Card
SPI        Serial Peripheral Interface

1. Introduction
This document describes the Speed Sign Detection machine learning neural network reference design. This reference design can be implemented on Lattice's Embedded Vision Development Kit, featuring the Lattice CrossLink and ECP5 FPGA devices.

[Figure 1.1. Embedded Vision Development Kit. The figure shows the stackable modular Video Interface Platform (VIP): the CrossLink Input Bridge Board (LIF-MD6000, two Sony IMX214 cameras, 2:1 CSI-2 MUX), the ECP5 Processor Board (ECP5-85 FPGA, image signal processing, prototyping header, programming via USB interface), and the HDMI Output Bridge Board (SiI1136, non-HDCP output).]

2. Overview
2.1. Block diagram
Figure 2.1 shows the block diagram of the Speed Sign Detection reference design.

[Figure 2.1. Speed Sign Detection Block Diagram. Blocks inside the ECP5: DDR3 Control (ddr3_ip_inst), AXI Slave (axi2lattice128), CNN Accelerator Engine (lsc_ml_wrap), SD Loader (Sd_spi), CSI2_to_DVI_top, and Video Processing (crop_downscale). External components: DRAM, Micro SD card, camera, and HDMI TX. Frame data (32x32, 90x90) and the result pass between Video Processing and the CNN Accelerator Engine. The legend distinguishes Lattice IP (Clarity), demo support design modules, and external components.]

The design uses an ECP5-85 FPGA containing the following major blocks:
- CNN accelerator engine
- SD card to SPI interface
- AXI Slave interface
- DDR3 memory interface
- CSI2 to DVI interface
- Video processing module

3. Related Documentation
3.1. Soft IP Document
- CNN Accelerator IP Core User Guide (FPGA-IPUG-02037)

3.2. Diamond Document
For more information on Lattice Diamond Software, visit the Lattice website at: www.latticesemi.com/products/designsoftwareandip

4. Hardware Requirements
- Lattice Embedded Vision Development Kit (LF-EVDK1-EVN)
- Mini-USB Cable (included in the Lattice Embedded Vision Development Kit)
- 12 V Power Supply (included with the Kit)
- HDMI Cable
- HDMI Monitor (1080p60)
- Micro-SD Card Adapter (MICROSD-ADP-ENV)
- Micro-SD Card. Standard Micro-SD card only.

5. CNN Accelerator Engine
The Lattice Semiconductor CNN Accelerator IP Core can be used through the Diamond Clarity IP Designer. Engine configuration parameters can be set using the Clarity Designer's IP core configuration GUI, as shown in Figure 5.1.

[Figure 5.1. CNN Accelerator IP Core Generation GUI]

For detailed information about the Lattice Semiconductor CNN Accelerator IP core, such as input data format, output data format, and command format, refer to the CNN Accelerator IP Core User Guide (FPGA-IPUG-02037). For command generation by the Lattice Neural Network Compiler, refer to the Lattice Neural Network Compiler Software User Guide (FPGA-UG-02052).

6. SD Card Loader
The SD card interface in this design is used to load the command data into the DRAM for execution by the CNN accelerator IP. The SD card contains a file that is generated by the Lattice Neural Network Compiler Tool.

The Lattice Neural Network Compiler Tool analyzes and compiles a trained neural network, such as one generated by Caffe or TensorFlow, for use with selected Lattice Semiconductor FPGA products. The tool outputs three files:
- A hardware configuration file (*.yml) that contains information on the fixed-point converted network and memory allocation.
- A firmware file (*.lscml) in ASCII format that contains weights coming from a trained model file. The firmware file (*.lscml) must be converted to binary format before loading into the SD card.
- A firmware file (*.bin) in binary format that can be directly loaded into the SD card.

For detailed operation instructions, refer to the Lattice Neural Network Compiler Software User Guide (FPGA-UG-02052). Figure 6.1 shows the output file generation flow of the Neural Network Compiler Tool.

[Figure 6.1. Neural Network Compiler Tool Output File Generation Flow. Inputs: a trained model from Caffe (*.proto, *.caffemodel) or TensorFlow (*.pb), plus a sample image (SampleImage.jpeg). Outputs: the hardware configuration file (*.yml) from the HW Configuration Generator, the ASCII firmware (*.lscml) for hardware simulation, and the binary firmware (*.bin).]

7. AXI Slave and DDR3 Memory Interface
The AXI interface allows command code to be written into DRAM before the CNN Accelerator IP Core starts executing. Input data may also be written into DRAM. The CNN Accelerator IP Core reads command code from DRAM and performs calculations using internal sub-execution engines. Intermediate data may also be transferred to and from DRAM, as directed by the command code.

8. CSI2 to DVI Interface
This module implements a bridge function that converts the camera input MIPI CSI data to DVI output using the Lattice CrossLink device and the SiI1136 HDMI transmitter.

9. Video Processing Module
The crop_downscale module provides the functions needed to manage inputting data, receiving output data, and generating a composite image for output to the HDMI interface. Four examples are included in the design:
- crop_downscale.v crops input to 32 × 32
- crop_downscale_key.v crops input to 90 × 90
- crop_downscale_sign.v crops input to 128 × 128
- crop_downscale_keyl.v crops input to 224 × 224

The Speed Sign Detection demo uses crop_downscale_sign.v. Key functions of the code include:
- Capturing a downscaled image from the camera input module and saving it to a frame buffer.
- Writing the frame buffer data into the CNN accelerator engine during the blanking period.
- Buffering the output after completion of the image data processing.
- Creating a Picture In Picture (PIP) bounding box with green borders, and outputting the composite image.

The output from the CSI2_to_DVI_top module is a stream of data that reflects the camera image. The input image is then downscaled to 128 × 128 pixels, stored in a frame buffer, and passed to the output. Image data is written from the frame buffer into the CNN accelerator engine before processing starts. Data is then formatted for compatibility with the trained network. The *.yml file provides the majority of the information needed to understand how the input data should be prepared.

A snippet of the code in the *.yml file for the Speed Sign Detection design is shown in Figure 9.1.
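The crop and downscale step described in this section can be illustrated in software. The following is a minimal sketch only, not the crop_downscale_sign.v implementation: the centered crop, the box-average filter, and the function names are all assumptions chosen for illustration, since this document does not state how the Verilog module selects or filters pixels.

```python
def center_crop(frame, size):
    """Return the centered size x size region of a 2-D list of pixel values."""
    height, width = len(frame), len(frame[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def downscale(frame, out_size):
    """Box-average a square frame down to out_size x out_size.

    Assumes the frame side is an integer multiple of out_size.
    """
    factor = len(frame) // out_size
    out = []
    for oy in range(out_size):
        row = []
        for ox in range(out_size):
            total = 0
            for dy in range(factor):
                for dx in range(factor):
                    total += frame[oy * factor + dy][ox * factor + dx]
            row.append(total // (factor * factor))
        out.append(row)
    return out


def crop_downscale(frame, out_size=128):
    """Crop the largest centered square that is a multiple of out_size,
    then downscale it to out_size x out_size (hypothetical helper)."""
    side = (min(len(frame), len(frame[0])) // out_size) * out_size
    if side == 0:
        raise ValueError("frame is smaller than the requested output size")
    return downscale(center_crop(frame, side), out_size)
```

Under these assumptions, a hypothetical 1920 × 1080 single-channel frame would be cropped to 1024 × 1024 and averaged in 8 × 8 blocks to produce the 128 × 128 input.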

[Figure 9.1. Speed Sign Detection Design *.yml File Snippet]

The parameters in the snippet have the following meaning:
- Input Size: [1, 3, 128, 128]. One input array consisting of 3 layers of dimensions 128 × 128.
- memblks: 3. Total number of memory blocks needed.
- depth_per_mem: 1. Number of memory blocks allocated to each memory layer.
- frac: 8. Number of bits allocated to the fractional component. It is equal to the minimum number of bits to represent this number minus 1. In this case, 3 bits to represent 8 - 1 = 7.
- num_ebr: 16. Number of memory blocks. Note that despite the variable name, this does not tie directly to the number of Embedded Block RAM (EBR) used in the design.
- ebr_blk_size: 16384. Size of the memory blocks in bytes. Note that the blocks have a width of 16 bits and the depth is variable.

The CNN accelerator engine's result ports, o_we and o_dout[15:0], can be used to output any number of results. The designer can add a read command to read any data, depending on the neural network design. In the Speed Sign Detection design, speedsign_post.v accepts the 16-bit output data from the CNN accelerator engine and generates the confidence level for the pre-trained speed limit signs. The results are then overlaid onto the left side of the output image stream in a magnified bar chart format.
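As a quick sanity check on the numbers in the *.yml parameters, the arithmetic can be written out. This is an illustrative sketch: the dictionary keys and function names are hypothetical, and it makes no claim about how the compiler maps the input onto individual memory blocks.

```python
# Values copied from the Figure 9.1 parameter description.
config = {
    "input_size": [1, 3, 128, 128],
    "num_ebr": 16,
    "ebr_blk_size": 16384,  # bytes per memory block
}

WORD_BYTES = 2  # the memory blocks are 16 bits wide


def words_per_block(cfg):
    """Number of 16-bit words that fit in one memory block."""
    return cfg["ebr_blk_size"] // WORD_BYTES


def input_values(cfg):
    """Total number of input values (product of the Input Size dimensions)."""
    count = 1
    for dim in cfg["input_size"]:
        count *= dim
    return count


def total_block_bytes(cfg):
    """Combined size in bytes of all num_ebr memory blocks."""
    return cfg["num_ebr"] * cfg["ebr_blk_size"]


print(words_per_block(config))    # 8192
print(input_values(config))       # 49152
print(total_block_bytes(config))  # 262144
```

Each 16384-byte block therefore holds 8192 16-bit words, and one 3 × 128 × 128 input contains 49152 values.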

10. Generating the Firmware File
To generate the Speed Sign Detection Firmware file:
1. Using the files located in Software/Data/Tensorflow/VIP_SpeedSignDetectionDemo/VIP_SpeedSignDetectionDemo.ldnn, create a new project in SensAI and apply the following settings as shown in Figure 10.1.
   - Framework: TensorFlow
   - Device: ECP5
   - Class: CNN
   - Network File: Software/ML_Training_Results/Inference/Tensorflow/speedlimitdet.pb
   - Image/Video/Audio Data: Software/ML_Training_Results/Speed40.jpg
2. Click Next.

[Figure 10.1. SensAI v1.1 Project Settings Part 1]

3. In the second section, apply the Neural Network engine settings as shown in Figure 10.2:
   - Mean Value for Data Pre-Processing: 0
   - Scale Value for Data Pre-Processing: 1.0
4. Click OK.

[Figure 10.2. SensAI v1.1 Project Settings Part 2]

5. Note that this reference design requires fractional bit changes to fine-tune the accuracy of the design. After analyzing the network, change the Stored Data Format (User Edit) for the data Blob from 11.4 to 7.0.
6. Figure 10.3 shows the change of the fractional bit.

[Figure 10.3. SensAI v1.1 Fractional Bit Change]

7. After changing the fractional bit, save the project file and reanalyze the design.
8. Click Compile to generate the Firmware file. After compiling, you can find the generated firmware file at: Software/Data/Tensorflow/VIP_SpeedSignDetectionDemo/Impl0/VIP_SpeedSignDetectionDemo.bin.
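To see what the Stored Data Format change in step 5 could mean numerically, the following sketch compares two fixed-point formats. It assumes that a format written as I.F denotes a signed two's-complement value with one sign bit, I integer bits, and F fractional bits; this interpretation is an assumption and should be confirmed against the Lattice Neural Network Compiler Software User Guide (FPGA-UG-02052). The function names are hypothetical.

```python
def fixed_point_info(int_bits, frac_bits):
    """Describe a signed fixed-point format (assumed: 1 sign bit + I + F bits)."""
    step = 2.0 ** -frac_bits
    return {
        "total_bits": 1 + int_bits + frac_bits,
        "step": step,                      # smallest representable increment
        "min": -(2.0 ** int_bits),         # most negative value
        "max": 2.0 ** int_bits - step,     # most positive value
    }


def quantize(value, int_bits, frac_bits):
    """Clamp value to the format's range and round it to the nearest step."""
    info = fixed_point_info(int_bits, frac_bits)
    clamped = min(max(value, info["min"]), info["max"])
    return round(clamped / info["step"]) * info["step"]


print(fixed_point_info(11, 4))  # 16 bits, step 0.0625, range -2048.0 to 2047.9375
print(fixed_point_info(7, 0))   # 8 bits, step 1.0, range -128.0 to 127.0
print(quantize(3.3, 11, 4))     # 3.3125
print(quantize(3.3, 7, 0))      # 3.0
```

Under this assumed interpretation, moving from 11.4 to 7.0 trades fractional resolution and range for a narrower stored word.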

References
- For more information on the FPGA device, visit http://www.latticesemi.com/products/fpgaandcpld/ecp5
- For complete information on the Lattice Diamond Project-Based Environment, Design Flow, Implementation Flow and Tasks, as well as on the Simulation Flow, see the Lattice Diamond User Guide.

Technical Support Assistance
Submit a technical support case through www.latticesemi.com/techsupport.

Revision History

Revision 1.1, September 2018
Section                         Change Summary
Generating the Firmware File    Added this section.

Revision 1.0, May 2018
Section                         Change Summary
All                             Initial release.

7th Floor, 111 SW 5th Avenue, Portland, OR 97204, USA
T 503.268.8000
www.latticesemi.com