How to Accelerate OpenCV Applications with the Zynq-7000 All Programmable SoC using Vivado HLS Video Libraries August 28, 2013



2 OpenCV Overview
- Open Source Computer Vision (OpenCV) is widely used to develop computer vision applications
  - Library of optimized video functions
  - Optimized for desktop processors and GPUs
  - Tens of thousands of users
  - Runs out of the box on the ARM processors in Zynq
- However, HD processing with OpenCV is often limited by external memory
  - Memory bandwidth is a bottleneck for performance
  - Memory accesses limit power efficiency
- Zynq All Programmable SoCs are a great way of implementing embedded computer vision applications
  - High performance and low power

3 Real-Time Computer Vision Applications
Computer Vision Application                 | Real-time Analytics Function
Advanced Driver Assist for Safety           | Lane or pedestrian detection
Surveillance for Security                   | Friend vs. foe recognition
Machine Vision for Quality                  | High-velocity object detection
Medical Imaging for non-invasive surgery    | Tumor detection

4 Real-time Video Analytics Processing
[Figure: video at 4Kx2K, 1080p, 720p or 480p feeds a pixel-based image processing and feature extraction stage, which passes features (F1, F2, F3) to a frame-based feature processing and decision-making stage.]
- Pixel-based image processing and feature extraction: 100s of ops/pixel; 8 MP x 100 ops/frame = 100s of Gops
- Frame-based feature processing and decision making: 10000s of ops/feature; 1000s of features/sec = Mops

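5 Heterogeneous Implementation of Real-time Video Analytics
[Figure: video at 4Kx2K, 1080p, 720p or 480p feeds a pixel-based image processing and feature extraction stage in the hardware domain (FPGA), which passes features (F1, F2, F3) to a frame-based feature processing and decision-making stage in the software domain (ARM).]
- Hardware domain (FPGA), pixel-based image processing and feature extraction: 100s of ops/pixel; 8 MP x 100 ops/frame = 100s of Gops
- Software domain (ARM), frame-based feature processing and decision making: 10000s of ops/feature; 1000s of features/sec = Mops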

6 Xilinx Real-time Image Analytics Implementation: Zynq All Programmable SoC
[Figure: pixel-based image processing and feature extraction (4Kx2K, 1080p, 720p, 480p input) and frame-based feature processing and decision making (features F1, F2, F3), both implemented on the Zynq All Programmable SoC.]
- Pixel-based image processing and feature extraction: 100s of ops/pixel; 8 MP x 100 ops/frame = 100s of Gops
- Frame-based feature processing and decision making: 10000s of ops/feature; 1000s of features/sec = Mops

7 Vivado: Productivity gains for OpenCV functions
- C simulation of HD video algorithm: ~1 fps
- RTL simulation of HD video: 1 frame per hour
- Real-time FPGA implementation: up to 60 fps

8 Accelerating OpenCV Applications
[Figure: target applications arranged around the two building blocks listed below.]
- Frame-level processing: library for PS
- Pixel processing: interfaces and basic functions for analytics (Vivado HLS)
- Applications: Driver Assist, Broadcast Monitor, HD Surveillance, Video Conferencing, Studio Cinema Camera, Cinema Projection, Digital Signage, Office-class MFP, Consumer Displays, Machine Vision, Medical Displays

9 Zynq Video TRD architecture
[Figure: block diagram. DDR3 external memory attaches to the Processing System (DDR memory controller, hardened peripherals, SD card, dual-core Cortex-A9). The AXI interconnect (S_AXI_HP 64-bit, S_AXI_GP 32-bit, AXI4-Stream) links it to the IP cores: HDMI video input, AXI VDMA, HLS-generated pipeline, Xylon display controller, HDMI.]
- Video access to external memory uses the 64-bit High Performance ports
- Control register access uses the 32-bit General Purpose ports
- Video streams are implemented using AXI4-Stream

10 IP Centric Design flow
Accelerated IP Generation and Integration
- C based IP creation: C, C++ or SystemC with C libraries (floating point math.h, fixed point, video) produces VHDL or Verilog plus SW drivers
- IP sources: System Generator for DSP, IP subsystem, Xilinx IP, 3rd party IP, user IP
- User preferred system integration environment: Vivado IP Integrator, Vivado RTL integration


12 Using OpenCV in FPGA designs
- Pure OpenCV Application: Image File Read (OpenCV) -> OpenCV function chain -> Image File Write (OpenCV)
- Integrated OpenCV Application: Live Video Input -> OpenCV function chain -> Live Video Output
- OpenCV Reference: Image File Read (OpenCV) -> OpenCV2AXIvideo -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> AXIvideo2OpenCV -> Image File Write (OpenCV)
- Accelerated OpenCV Application: Live Video Input -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> Live Video Output
Legend: Synthesizable Block, Synthesized Block

13 Pure OpenCV Application
[Figure: the flow Image File Read (OpenCV) -> OpenCV function chain -> Image File Write (OpenCV), overlaid on the Zynq Video TRD block diagram (DDR3 external memory, SD card, Processing System with DDR memory controller, hardened peripherals and dual-core Cortex-A9, AXI interconnect, HDMI video input, AXI VDMA, HLS-generated pipeline, Xylon display controller, HDMI).]


17 Integrated OpenCV Application
[Figure: the flow Live Video Input -> OpenCV function chain -> Live Video Output, overlaid on the Zynq Video TRD block diagram (DDR3 external memory, SD card, Processing System with DDR memory controller, hardened peripherals and dual-core Cortex-A9, AXI interconnect, HDMI video input, AXI VDMA, HLS-generated pipeline, Xylon display controller, HDMI).]

18 OpenCV Reference / Software Execution
[Figure: the flow Image File Read (OpenCV) -> OpenCV2AXIvideo -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> AXIvideo2OpenCV -> Image File Write (OpenCV), overlaid on the Zynq Video TRD block diagram (DDR3 external memory, SD card, Processing System with DDR memory controller, hardened peripherals and dual-core Cortex-A9, AXI interconnect, HDMI video input, AXI VDMA, HLS-generated pipeline, Xylon display controller, HDMI).]

19 OpenCV Reference / In system Test
[Figure: the flow Image File Read (OpenCV) -> OpenCV2AXIvideo -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> AXIvideo2OpenCV -> Image File Write (OpenCV), overlaid on the Zynq Video TRD block diagram, with two data paths marked 1 and 2 at the DDR3 external memory.]

20 Accelerated OpenCV Application
[Figure: the flow Live Video Input -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> Live Video Output, overlaid on the Zynq Video TRD block diagram, with two data paths marked 1 and 2 at the DDR3 external memory.]

21 OpenCV design flow
1) Develop OpenCV application on desktop
2) Run OpenCV application on ARM cores without modification
3) Abstract FPGA portion using I/O functions
4) Replace OpenCV function calls with synthesizable code
5) Run HLS to generate FPGA accelerator
6) Replace call to synthesizable code with call to FPGA accelerator
[Figure: application drawn as a chain of OpenCV Block A, OpenCV Block B, OpenCV Block C and OpenCV Block D.]

22 Partitioned OpenCV Application
[Figure: OpenCV Block A -> opencv2AXIvideo -> AXIvideo2HLS -> HLS Block B -> HLS Block C -> HLS2AXIvideo -> AXIvideo2opencv -> OpenCV Block D. HLS Blocks B and C stand in for OpenCV Blocks B and C. Other labels: Synchronization, Synthesizable.]

23 OpenCV Design Tradeoffs
- OpenCV-based image processing is built around memory frame buffers
  - Poor access locality -> small caches perform poorly
  - Complex architectures for performance -> higher power
  - Likely good enough for many applications
    - Low resolution or frame rate
    - Processing of features or regions of interest in a larger image
- Streaming architectures give high performance and low power
  - Chaining image processing functions reduces external memory accesses
  - Video-optimized line buffers and window buffers are simpler than processor caches
  - Can be implemented with streaming optimizations in HLS
  - Requires conversion of code to be synthesizable
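The line-buffer and window-buffer idea can be sketched in plain C++, outside any HLS tool. This is an illustration under stated assumptions, not the HLS video library's implementation: the function name `box3x3_stream` and the choice of a 3x3 box filter are hypothetical. Each input pixel is read exactly once in raster order, and only two image lines plus a 3x3 window are held in local storage.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical streaming 3x3 box filter. Border pixels of the output stay 0.
std::vector<std::uint8_t> box3x3_stream(const std::vector<std::uint8_t>& in,
                                        int rows, int cols) {
    std::vector<std::uint8_t> out(in.size(), 0);
    // Line buffer: line0 holds row r-2, line1 holds row r-1.
    std::vector<std::uint8_t> line0(cols, 0), line1(cols, 0);
    // Window buffer: the 3x3 neighbourhood currently being filtered.
    std::array<std::array<std::uint8_t, 3>, 3> win{};

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const std::uint8_t pix = in[r * cols + c];  // the only read of this pixel

            // Shift the window one column to the left.
            for (int i = 0; i < 3; ++i) {
                win[i][0] = win[i][1];
                win[i][1] = win[i][2];
            }
            // Fill the new right-hand column from the line buffer and the new pixel.
            win[0][2] = line0[c];
            win[1][2] = line1[c];
            win[2][2] = pix;

            // Age the line buffer for this column.
            line0[c] = line1[c];
            line1[c] = pix;

            // The window now covers rows r-2..r and columns c-2..c,
            // so it is centred on pixel (r-1, c-1).
            if (r >= 2 && c >= 2) {
                int sum = 0;
                for (const auto& row : win)
                    for (std::uint8_t v : row) sum += v;
                out[(r - 1) * cols + (c - 1)] = static_cast<std::uint8_t>(sum / 9);
            }
        }
    }
    return out;
}
```

The local storage is 2 x cols + 9 pixels regardless of image height, which is the property that lets a chain of such filters run without a frame buffer in external memory.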

24 HLS Video Libraries
- OpenCV functions are not directly synthesizable with HLS
  - Dynamic memory allocation
  - Floating point
  - Assumes images are modified in external memory
- The HLS video library is intended to replace many basic OpenCV functions
  - Similar interfaces and algorithms to OpenCV
  - Focus on image processing functions implemented in FPGA fabric
  - Includes FPGA-specific optimizations
    - Fixed point operations instead of floating point
    - On-chip line buffers and window buffers
  - Not necessarily bit-accurate
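To see why a fixed-point replacement may not be bit-accurate, consider a color-to-gray conversion. The sketch below is plain C++ and makes its own assumptions: the 8-bit weights 77/150/29 and the function names are chosen for illustration and are not taken from the HLS video library. It compares a floating-point reference against an integer-only version and measures the largest disagreement.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Floating-point reference using the common 0.299 / 0.587 / 0.114 luma weights.
std::uint8_t gray_float(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>(std::lround(0.299 * r + 0.587 * g + 0.114 * b));
}

// Fixed-point version (assumed weights): each weight is scaled by 256 and
// rounded, 77 + 150 + 29 = 256. Adding 128 before the shift rounds to nearest.
std::uint8_t gray_fixed(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((77u * r + 150u * g + 29u * b + 128u) >> 8);
}

// Largest absolute difference between the two versions over a sampled RGB grid.
int max_abs_diff(int step) {
    int worst = 0;
    for (int r = 0; r <= 255; r += step)
        for (int g = 0; g <= 255; g += step)
            for (int b = 0; b <= 255; b += step) {
                const int d = std::abs(int(gray_float(r, g, b)) - int(gray_fixed(r, g, b)));
                if (d > worst) worst = d;
            }
    return worst;
}
```

With these weights the two results can differ by at most one gray level, so the output is visually equivalent but a byte-for-byte comparison against the OpenCV reference can fail. A testbench that compares results would typically allow a small tolerance.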

25 Xilinx HLS Video Library
Video Data Modeling
  LineBuffer class, Window class
AXI4-Stream IO Functions
  AXIvideo2Mat, Mat2AXIvideo
OpenCV Interface Functions
  cvMat2AXIvideo     AXIvideo2cvMat     cvMat2hlsMat     hlsMat2cvMat
  IplImage2AXIvideo  AXIvideo2IplImage  IplImage2hlsMat  hlsMat2IplImage
  CvMat2AXIvideo     AXIvideo2CvMat     CvMat2hlsMat     hlsMat2CvMat
Video Functions
  AbsDiff       Duplicate                MaxS       Remap
  AddS          EqualizeHist             Mean       Resize
  AddWeighted   Erode                    Merge      Scale
  And           FASTX                    Min        Set
  Avg           Filter2D                 MinMaxLoc  Sobel
  AvgSdv        GaussianBlur             MinS       Split
  Cmp           Harris                   Mul        SubRS
  CmpS          HoughLines2              Not        SubS
  CornerHarris  Integral                 PaintMask  Sum
  CvtColor      InitUndistortRectifyMap  Range      Threshold
  Dilate        Max                      Reduce     Zero
For function signatures and descriptions, see the HLS user guide, UG902.

26 Video Library Functions
- C++ code contained in the hls namespace: #include "hls_video.h"
- Similar interface and equivalent behavior to OpenCV, e.g.
    OpenCV library:     cvScale(src, dst, scale, shift);
    HLS video library:  hls::Scale<>(src, dst, scale, shift);
- Some constructor arguments have corresponding or replacement template parameters, e.g.
    OpenCV library:     cv::Mat mat(rows, cols, CV_8UC3);
    HLS video library:  hls::Mat<ROWS, COLS, HLS_8UC3> mat(rows, cols);
- ROWS and COLS specify the maximum size of an image processed

27 Video Library Core Structures
OpenCV                                  | HLS Video Library
cv::Point_<T>, CvPoint                  | hls::Point_<T>, hls::Point
cv::Size_<T>, CvSize                    | hls::Size_<T>, hls::Size
cv::Rect_<T>, CvRect                    | hls::Rect_<T>, hls::Rect
cv::Scalar_<T>, CvScalar                | hls::Scalar<N, T>
cv::Mat, IplImage, CvMat                | hls::Mat<ROWS, COLS, T>
cv::Mat mat(rows, cols, CV_8UC3);       | hls::Mat<ROWS, COLS, HLS_8UC3> mat(rows, cols);
IplImage* img = cvCreateImage(          | hls::Mat<ROWS, COLS, HLS_8UC3> img(rows, cols);
    cvSize(cols,rows), IPL_DEPTH_8U, 3);| hls::Mat<ROWS, COLS, HLS_8UC3> img;
(no equivalent)                         | hls::Window<ROWS, COLS, T>
(no equivalent)                         | hls::LineBuffer<ROWS, COLS, T>

28 Limitations
- Must replace OpenCV calls with video library functions
- Frame buffer access through pointers is not supported
  - Use VDMA and AXI Stream adapter functions
- Random access is not supported
  - Data read more than once must be duplicated; see hls::Duplicate()
- In-place update is not supported
  - e.g. cvRectangle(img, point1, point2)

Operation | OpenCV                          | HLS Video Library
Read      | pix = cv_mat.at<T>(i,j)         | hls_img >> pix
          | pix = cvGet2D(cv_img,i,j)       |
Write     | cv_mat.at<T>(i,j) = pix         | hls_img << pix
          | cvSet2D(cv_img,i,j,pix)         |
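The read-once behaviour behind these limitations can be modelled with an ordinary FIFO. The sketch below is plain C++ built on `std::queue`; the class `PixelStream` and the free function `duplicate` are hypothetical stand-ins written for this illustration, not the library's `hls::Mat` or `hls::Duplicate`. It shows that a read consumes the pixel, so a second consumer needs its own copy of the stream.

```cpp
#include <cstddef>
#include <queue>
#include <stdexcept>

// Hypothetical model of an image stream: pixels are written and read in
// order, and every read removes the pixel from the stream.
template <typename T>
class PixelStream {
public:
    PixelStream& operator<<(const T& pix) {  // write one pixel
        fifo_.push(pix);
        return *this;
    }
    PixelStream& operator>>(T& pix) {        // read (and consume) one pixel
        if (fifo_.empty()) throw std::runtime_error("read from empty stream");
        pix = fifo_.front();
        fifo_.pop();
        return *this;
    }
    std::size_t size() const { return fifo_.size(); }

private:
    std::queue<T> fifo_;
};

// Data needed by two consumers must be copied explicitly into two streams.
template <typename T>
void duplicate(PixelStream<T>& src, PixelStream<T>& dst0, PixelStream<T>& dst1,
               std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        T pix;
        src >> pix;   // the source pixel is gone after this read
        dst0 << pix;
        dst1 << pix;
    }
}
```

After `duplicate` returns, the source stream is empty and each destination holds the full pixel sequence, which is why an operation such as drawing on an image in place has to be restructured as "read one stream, write another".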

29 OpenCV Code
- One image input, one image output
- Processed by a chain of functions sequentially

test_opencv.cpp:
    // Image Read (OpenCV)
    IplImage* src = cvLoadImage("test_1080p.bmp");
    IplImage* dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);

    // OpenCV function chain
    cvSobel(src, dst, 1, 0);
    cvSubS(dst, cvScalar(100,100,100), src);
    cvScale(src, dst, 2, 0);
    cvErode(dst, src);
    cvDilate(src, dst);

    // Image Write (OpenCV)
    cvSaveImage("result_1080p.bmp", dst);
    cvReleaseImage(&src);
    cvReleaseImage(&dst);

30 Integrated OpenCV Application
- System provides pointers to frame buffers
- Synthesizable code can also be run on ARM

img_filters.c:
    void img_process(ZNQ_S32 *rgb_data_in, ZNQ_S32 *rgb_data_out,
                     int height, int width, int stride, int flag_opencv) {
        // constructing OpenCV interface (Live Video Input)
        IplImage* src_dma = cvCreateImageHeader(cvSize(width, height), IPL_DEPTH_8U, 4);
        IplImage* dst_dma = cvCreateImageHeader(cvSize(width, height), IPL_DEPTH_8U, 4);
        src_dma->imageData = (char*)rgb_data_in;
        dst_dma->imageData = (char*)rgb_data_out;
        src_dma->widthStep = 4 * stride;
        dst_dma->widthStep = 4 * stride;

        // OpenCV function chain
        if (flag_opencv) {
            opencv_image_filter(src_dma, dst_dma);
        } else {
            sw_image_filter(src_dma, dst_dma);
        }

        // Live Video Output
        cvReleaseImageHeader(&src_dma);
        cvReleaseImageHeader(&dst_dma);
    }

31 Accelerated with Vivado HLS video library
- Top level function extracted for HW acceleration

top.h:
    #include "hls_video.h"   // header file of HLS video library
    #include "hls_opencv.h"  // header file of OpenCV I/O

    // typedef video library core structures
    typedef hls::stream<ap_axiu<32,1,1,1> > AXI_STREAM;
    typedef hls::Scalar<3, uchar>           RGB_PIXEL;
    typedef hls::Mat<1080,1920,HLS_8UC3>    RGB_IMAGE;

    void image_filter(AXI_STREAM& src_axi, AXI_STREAM& dst_axi, int rows, int cols);

test.cpp:
    #include "top.h"

    // Image Read (OpenCV)
    IplImage* src = cvLoadImage("test_1080p.bmp");
    IplImage* dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    AXI_STREAM src_axi, dst_axi;

    // OpenCV2AXIvideo
    IplImage2AXIvideo(src, src_axi);

    // AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo
    image_filter(src_axi, dst_axi, src->height, src->width);

    // AXIvideo2OpenCV
    AXIvideo2IplImage(dst_axi, dst);

    // Image Write (OpenCV)
    cvSaveImage("result_1080p.bmp", dst);
    cvReleaseImage(&src);
    cvReleaseImage(&dst);

32 Accelerated with Vivado HLS video library
- HW synthesizable block for FPGA acceleration
- Consists of video library functions and interfaces
- Replace each OpenCV function with the similar function in the hls namespace

top.cpp:
    void image_filter(AXI_STREAM& input, AXI_STREAM& output, int rows, int cols) {
        // Create AXI streaming interfaces for the core
        #pragma HLS RESOURCE variable=input core=axis metadata="-bus_bundle INPUT_STREAM"
        #pragma HLS RESOURCE variable=output core=axis metadata="-bus_bundle OUTPUT_STREAM"
        #pragma HLS RESOURCE variable=rows core=axi_slave metadata="-bus_bundle CONTROL_BUS"
        #pragma HLS RESOURCE variable=cols core=axi_slave metadata="-bus_bundle CONTROL_BUS"
        #pragma HLS RESOURCE variable=return core=axi_slave metadata="-bus_bundle CONTROL_BUS"
        #pragma HLS INTERFACE ap_stable port=rows
        #pragma HLS INTERFACE ap_stable port=cols

        RGB_IMAGE img_0(rows, cols), img_1(rows, cols), img_2(rows, cols);
        RGB_IMAGE img_3(rows, cols), img_4(rows, cols), img_5(rows, cols);
        RGB_PIXEL pix(50, 50, 50);

        #pragma HLS dataflow
        hls::AXIvideo2Mat(input, img_0);      // AXIvideo2Mat
        hls::Sobel<1,0,3>(img_0, img_1);      // HLS video library function chain
        hls::SubS(img_1, pix, img_2);
        hls::Scale(img_2, img_3, 2, 0);
        hls::Erode(img_3, img_4);
        hls::Dilate(img_4, img_5);
        hls::Mat2AXIvideo(img_5, output);     // Mat2AXIvideo
    }

33 Using Linux Userspace API
- Modify the device tree to include the register map:
    {
        compatible = "xlnx,generic-hls";
        reg = <0x400d0000 0xffff>;
        interrupts = <0x0 0x37 0x4>;
        interrupt-parent = <0x1>;
    };
- Call from userspace after mmap():
    XImage_filter xsfilter;
    int fd_uio = 0;
    if ((fd_uio = open("/dev/uio0", O_RDWR)) < 0) {
        printf("uio: Cannot open device node\n");
    }
    xsfilter.Control_bus_BaseAddress = (u32)mmap(NULL, XSOBEL_FILTER_CONTROL_BUS_SIZE,
                                                 PROT_READ | PROT_WRITE, MAP_SHARED, fd_uio, 0);
    xsfilter.IsReady = XIL_COMPONENT_IS_READY;

    // init the configuration for image filter
    XImage_filter_SetRows(&xsfilter, sobel_configuration.height);
    XImage_filter_SetCols(&xsfilter, sobel_configuration.width);
    XImage_filter_EnableAutoRestart(&xsfilter);
    XImage_filter_Start(&xsfilter);
[Figure: Live Video Input -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> Live Video Output]

34 HLS Directives for Video Processing
- Assign input to be an AXI4 stream named INPUT_STREAM
    #pragma HLS RESOURCE variable=input core=axis metadata="-bus_bundle INPUT_STREAM"
- Assign the control interface to an AXI4-Lite interface
    #pragma HLS RESOURCE variable=return core=axi_slave metadata="-bus_bundle CONTROL_BUS"
- Assign rows to be accessible through the AXI4-Lite interface
    #pragma HLS RESOURCE variable=rows core=axi_slave metadata="-bus_bundle CONTROL_BUS"
- Declare that rows will not be changed during the execution of the function
    #pragma HLS INTERFACE ap_stable port=rows
- Enable streaming dataflow optimizations
    #pragma HLS dataflow

35 A more complex OpenCV example: fast-corners
- This code is not streaming and must be rewritten
- Random access and in-place operation on dst

opencv_top.cpp:
    void opencv_image_filter(IplImage* img, IplImage* dst) {
        IplImage* gray = cvCreateImage(cvSize(img->width,img->height), 8, 1);
        cvCvtColor(img, gray, CV_BGR2GRAY);
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat gray_mat(gray,0);
        cv::FAST(gray_mat, keypoints, 20, true);
        int rect = 2;
        cvCopy(img,dst);
        for (int i = 0; i < keypoints.size(); i++) {
            cvRectangle(dst,
                        cvPoint(keypoints[i].pt.x, keypoints[i].pt.y),
                        cvPoint(keypoints[i].pt.x+rect, keypoints[i].pt.y+rect),
                        cvScalar(255,0,0), 1);
        }
        cvReleaseImage(&gray);
    }

36 A more complex OpenCV example: fast-corners
- This code is streaming
- Note that function correspondence is not 1:1!

opencv_top.cpp:
    void opencv_image_filter(IplImage* src, IplImage* dst) {
        IplImage* gray  = cvCreateImage(cvGetSize(src), 8, 1);
        IplImage* mask  = cvCreateImage(cvGetSize(src), 8, 1);
        IplImage* dmask = cvCreateImage(cvGetSize(src), 8, 1);
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat gray_mat(gray,0);
        cvCvtColor(src, gray, CV_BGR2GRAY);
        cv::FAST(gray_mat, keypoints, 20, true);   // these two calls correspond
        GenMask(mask, keypoints);                  // to hls::FASTX
        cvDilate(mask,dmask);
        cvCopy(src,dst);                           // these two calls correspond
        PrintMask(dst,dmask,cvScalar(255,0,0));    // to hls::PaintMask
        cvReleaseImage(&mask);
        cvReleaseImage(&dmask);
        cvReleaseImage(&gray);
    }

37 A more complex OpenCV example: fast-corners
- Synthesizable code
- Note #pragma HLS stream

top.cpp:
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> _src(rows,cols);
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> _dst(rows,cols);
    hls::AXIvideo2Mat(input, _src);
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> src0(rows,cols);
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> src1(rows,cols);
    #pragma HLS stream depth=20000 variable=src1.data_stream
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC1> mask(rows,cols);
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC1> dmask(rows,cols);
    hls::Scalar<3,unsigned char> color(255,0,0);
    hls::Duplicate(_src,src0,src1);
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC1> gray(rows,cols);
    hls::CvtColor<HLS_BGR2GRAY>(src0,gray);
    hls::FASTX(gray,mask,20,true);
    hls::Dilate(mask,dmask);
    hls::PaintMask(src1,dmask,_dst,color);
    hls::Mat2AXIvideo(_dst, output);

38 Streams and Reconvergent paths
- hls::Mat conceptually represents a whole image, but is implemented as a stream of pixels

hls_video_core.h:
    template<int ROWS, int COLS, int T>
    class Mat {
    public:
        HLS_SIZE_T rows, cols;
        hls::stream<HLS_TNAME(T)> data_stream[HLS_MAT_CN(T)];
    };

- Fast-corners contains a reconvergent path
- The stream of pixels for src1 must include enough buffering to match the delay through FASTX and Dilate (approximately 10 video lines * 1920 pixels)
    #pragma HLS stream depth=20000 variable=src1.data_stream
[Figure: CvtColor -> FASTX -> Dilate -> PaintMask, with src1 running in parallel into PaintMask.]
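The buffering figure follows from simple arithmetic: the bypass stream has to hold every pixel that arrives while the other branch is still working. A minimal sketch, assuming the approximate 10-line delay quoted on the slide; the helper name `fifo_depth_needed` and the constant names are made up for this illustration.

```cpp
#include <cstddef>

// Pixels that pile up on the bypass branch while the processing branch
// is delayed by `delay_lines` full video lines.
constexpr std::size_t fifo_depth_needed(std::size_t delay_lines,
                                        std::size_t pixels_per_line) {
    return delay_lines * pixels_per_line;
}

constexpr std::size_t kDelayLines      = 10;     // approximate figure from the slide
constexpr std::size_t kPixelsPerLine   = 1920;   // 1080p line width
constexpr std::size_t kConfiguredDepth = 20000;  // depth given in the stream pragma

// 10 * 1920 = 19200 pixels, which fits in the configured depth of 20000.
static_assert(fifo_depth_needed(kDelayLines, kPixelsPerLine) == 19200,
              "10 lines of 1920 pixels");
static_assert(fifo_depth_needed(kDelayLines, kPixelsPerLine) <= kConfiguredDepth,
              "configured FIFO depth must cover the branch delay");
```

If the FIFO were shallower than the branch delay, the duplicating stage would stall on a full src1 stream before the mask branch produced its first output, and the dataflow pipeline would deadlock.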

39 Performance Analysis
- AXI Performance Monitor collects statistics on memory bandwidth
  - see /mnt/axi_perfmon.log
- Video + fast corners: 1920*1080*60*32 = ~4 Gb/s per stream
    HP0: Read 4.01 Gb/s, Write 4.01 Gb/s, Total 8.03 Gb/s
    HP2: Read 4.01 Gb/s, Write 4.01 Gb/s, Total 8.03 Gb/s
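The per-stream estimate of ~4 Gb/s can be reproduced directly from the frame geometry. The sketch below is plain C++; the function name `stream_bits_per_second` is hypothetical, and reading the factor 32 as bits per pixel on the stream is an assumption based on the 32-bit AXI stream type used earlier in the deck.

```cpp
#include <cstdint>

// Raw bandwidth of one uncompressed video stream, in bits per second.
constexpr std::uint64_t stream_bits_per_second(std::uint64_t width,
                                               std::uint64_t height,
                                               std::uint64_t frames_per_second,
                                               std::uint64_t bits_per_pixel) {
    return width * height * frames_per_second * bits_per_pixel;
}

// 1080p at 60 frames per second with 32 bits per pixel (assumed meaning of "32").
constexpr std::uint64_t k1080p60 = stream_bits_per_second(1920, 1080, 60, 32);

// 1920 * 1080 * 60 * 32 = 3,981,312,000 bits per second, i.e. about 4 Gb/s.
static_assert(k1080p60 == 3981312000ULL, "1080p60 at 32 bits per pixel");
```

Each High Performance port carries one such stream in each direction (one read, one write), so the total per port is roughly twice this figure.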

40 Power Analysis
- Voltage and current can be read from the digital power regulators on the ZC702 board
- Custom, real-time HD video processing in 2-3 Watts total system power
- FASTX is less than 200 mW incremental power
[Figure: power chart. Labels: Active, Idle, Idle + Video, Fast Corners + Video. Series: DDR, PL IO, PL core, PS IO, PS core.]

41 HLS and Zynq accelerate OpenCV apps
- OpenCV functions enable fast prototyping of computer vision algorithms
- Computer vision applications are inherently heterogeneous and require a mix of HW and SW implementation
- The Vivado HLS video library accelerates mapping of OpenCV functions to FPGA programmable fabric
- Zynq offers a power-optimized integrated solution with high-performance programmable logic and embedded ARM

42 Additional OpenCV Collateral at Xilinx.com
- Download XAPP1167 from Xilinx.com
- QuickTake: Leveraging OpenCV and High-Level Synthesis with Vivado


More information

OpenCV. Basics. Department of Electrical Engineering and Computer Science

OpenCV. Basics. Department of Electrical Engineering and Computer Science OpenCV Basics 1 OpenCV header file OpenCV namespace OpenCV basic structures Primitive data types Point_ Size_ Vec Scalar_ Mat Basics 2 OpenCV Header File #include .hpp is a convention

More information

Hardware Acceleration of Feature Detection and Description Algorithms on Low Power Embedded Platforms

Hardware Acceleration of Feature Detection and Description Algorithms on Low Power Embedded Platforms Hardware Acceleration of Feature Detection and Description Algorithms on LowPower Embedded Platforms Onur Ulusel, Christopher Picardo, Christopher Harris, Sherief Reda, R. Iris Bahar, School of Engineering,

More information

ESE532: System-on-a-Chip Architecture. Today. Message. Clock Cycle BRAM

ESE532: System-on-a-Chip Architecture. Today. Message. Clock Cycle BRAM ESE532: System-on-a-Chip Architecture Day 20: April 3, 2017 Pipelining, Frequency, Dataflow Today What drives cycle times Pipelining in Vivado HLS C Avoiding bottlenecks feeding data in Vivado HLS C Penn

More information

An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware

An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware Tao Chen, Shreesha Srinath Christopher Batten, G. Edward Suh Computer Systems Laboratory School of Electrical

More information

[Sub Track 1-3] FPGA/ASIC 을타겟으로한알고리즘의효율적인생성방법및신기능소개

[Sub Track 1-3] FPGA/ASIC 을타겟으로한알고리즘의효율적인생성방법및신기능소개 [Sub Track 1-3] FPGA/ASIC 을타겟으로한알고리즘의효율적인생성방법및신기능소개 정승혁과장 Senior Application Engineer MathWorks Korea 2015 The MathWorks, Inc. 1 Outline When FPGA, ASIC, or System-on-Chip (SoC) hardware is needed Hardware

More information

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany 2013 The MathWorks, Inc. 1 Agenda Model-Based Design of embedded Systems Software Implementation

More information

HEAD HardwarE Accelerated Deduplication

HEAD HardwarE Accelerated Deduplication HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee Executive Summary A-Z development of deduplication SW version

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

ESL design with the Agility Compiler for SystemC

ESL design with the Agility Compiler for SystemC ESL design with the Agility Compiler for SystemC SystemC behavioral design & synthesis Steve Chappell & Chris Sullivan Celoxica ESL design portfolio Complete ESL design environment Streaming Video Processing

More information

Digital Blocks Semiconductor IP

Digital Blocks Semiconductor IP Digital Blocks Semiconductor IP General Description The Digital Blocks LCD Controller IP Core interfaces a video image in frame buffer memory via the AMBA 3.0 / 4.0 AXI Protocol Interconnect to a 4K and

More information

Altera SDK for OpenCL

Altera SDK for OpenCL Altera SDK for OpenCL A novel SDK that opens up the world of FPGAs to today s developers Altera Technology Roadshow 2013 Today s News Altera today announces its SDK for OpenCL Altera Joins Khronos Group

More information

GigaX API for Zynq SoC

GigaX API for Zynq SoC BUM002 v1.0 USER MANUAL A software API for Zynq PS that Enables High-speed GigaE-PL Data Transfer & Frames Management BERTEN DSP S.L. www.bertendsp.com gigax@bertendsp.com +34 942 18 10 11 Table of Contents

More information

Multimedia SoC System Solutions

Multimedia SoC System Solutions Multimedia SoC System Solutions Presented By Yashu Gosain & Forrest Picket: System Software & SoC Solutions Marketing Girish Malipeddi: IP Subsystems Marketing Agenda Zynq Ultrascale+ MPSoC and Multimedia

More information

Exploring OpenCL Memory Throughput on the Zynq

Exploring OpenCL Memory Throughput on the Zynq Exploring OpenCL Memory Throughput on the Zynq Technical Report no. 2016:04, ISSN 1652-926X Chalmers University of Technology Bo Joel Svensson bo.joel.svensson@gmail.com Abstract The Zynq platform combines

More information

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference FINN: A Framework for Fast, Scalable Binarized Neural Network Inference Yaman Umuroglu (XIR & NTNU), Nick Fraser (XIR & USydney), Giulio Gambardella (XIR), Michaela Blott (XIR), Philip Leong (USydney),

More information

FPGA 加速机器学习应用. 罗霖 2017 年 6 月 20 日

FPGA 加速机器学习应用. 罗霖 2017 年 6 月 20 日 FPGA 加速机器学习应用 罗霖 Andy.luo@Xilinx.com 2017 年 6 月 20 日 Xilinx The All Programmable Company XILINX - Founded 1984 Headquarters Research and Development Sales and Support Manufacturing $2.21B FY16 revenue

More information

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei

More information

TP : System on Chip (SoC) 1

TP : System on Chip (SoC) 1 TP : System on Chip (SoC) 1 Goals : -Discover the VIVADO environment and SDK tool from Xilinx -Programming of the Software part of a SoC -Control of hardware peripheral using software running on the ARM

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

Software Driven Verification at SoC Level. Perspec System Verifier Overview

Software Driven Verification at SoC Level. Perspec System Verifier Overview Software Driven Verification at SoC Level Perspec System Verifier Overview June 2015 IP to SoC hardware/software integration and verification flows Cadence methodology and focus Applications (Basic to

More information

Digital Blocks Semiconductor IP

Digital Blocks Semiconductor IP Digital Blocks Semiconductor IP -UHD General Description The Digital Blocks -UHD LCD Controller IP Core interfaces a video image in frame buffer memory via the AMBA 3.0 / 4.0 AXI Protocol Interconnect

More information

FPGA Entering the Era of the All Programmable SoC

FPGA Entering the Era of the All Programmable SoC FPGA Entering the Era of the All Programmable SoC Ivo Bolsens, Senior Vice President & CTO Page 1 Moore s Law: The Technology Pipeline Page 2 Industry Debates on Cost Page 3 Design Cost Estimated Chip

More information

Implementation of Hardware Accelerators on Zynq

Implementation of Hardware Accelerators on Zynq Downloaded from orbit.dtu.dk on: Dec 29, 2018 Implementation of Hardware Accelerators on Zynq Toft, Jakob Kenn; Nannarelli, Alberto Publication date: 2016 Document Version Publisher's PDF, also known as

More information

A framework for optimizing OpenVX Applications on Embedded Many Core Accelerators

A framework for optimizing OpenVX Applications on Embedded Many Core Accelerators A framework for optimizing OpenVX Applications on Embedded Many Core Accelerators Giuseppe Tagliavini, DEI University of Bologna Germain Haugou, IIS ETHZ Andrea Marongiu, DEI University of Bologna & IIS

More information

MYC-C7Z010/20 CPU Module

MYC-C7Z010/20 CPU Module MYC-C7Z010/20 CPU Module - 667MHz Xilinx XC7Z010/20 Dual-core ARM Cortex-A9 Processor with Xilinx 7-series FPGA logic - 1GB DDR3 SDRAM (2 x 512MB, 32-bit), 4GB emmc, 32MB QSPI Flash - On-board Gigabit

More information

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,

More information

Transportation Informatics Group, ALPEN-ADRIA University of Klagenfurt. Transportation Informatics Group University of Klagenfurt 12/24/2009 1

Transportation Informatics Group, ALPEN-ADRIA University of Klagenfurt. Transportation Informatics Group University of Klagenfurt 12/24/2009 1 Machine Vision Transportation Informatics Group University of Klagenfurt Alireza Fasih, 2009 12/24/2009 1 Address: L4.2.02, Lakeside Park, Haus B04, Ebene 2, Klagenfurt-Austria 2D Shape Based Matching

More information

Early Models in Silicon with SystemC synthesis

Early Models in Silicon with SystemC synthesis Early Models in Silicon with SystemC synthesis Agility Compiler summary C-based design & synthesis for SystemC Pure, standard compliant SystemC/ C++ Most widely used C-synthesis technology Structural SystemC

More information

Tutorial on Software-Hardware Codesign with CORDIC

Tutorial on Software-Hardware Codesign with CORDIC ECE5775 High-Level Digital Design Automation, Fall 2017 School of Electrical Computer Engineering, Cornell University Tutorial on Software-Hardware Codesign with CORDIC 1 Introduction So far in ECE5775

More information

Use ZCU102 TRD to Accelerate Development of ZYNQ UltraScale+ MPSoC

Use ZCU102 TRD to Accelerate Development of ZYNQ UltraScale+ MPSoC Use ZCU102 TRD to Accelerate Development of ZYNQ UltraScale+ MPSoC Topics Hardware advantages of ZYNQ UltraScale+ MPSoC Software stacks of MPSoC Target reference design introduction Details about one Design

More information

SDSoC Technical Seminars Feb 2016

SDSoC Technical Seminars Feb 2016 SDSoC Technical Seminars 2016 Feb 2016 Agenda Zynq SoC and MPSoC Architecture SDSoC Overview Real-life Success C/C++ to Optimized System Targeting Your Own Platform Next Steps Page 2 Agenda Zynq SoC and

More information

Lab Exercise 4 System on chip Implementation of a system on chip system on the Zynq

Lab Exercise 4 System on chip Implementation of a system on chip system on the Zynq Lab Exercise 4 System on chip Implementation of a system on chip system on the Zynq INF3430/INF4431 Autumn 2016 Version 1.2/06.09.2016 This lab exercise consists of 4 parts, where part 4 is compulsory

More information

Simplify System Complexity

Simplify System Complexity Simplify System Complexity With the new high-performance CompactRIO controller Fanie Coetzer Field Sales Engineer Northern South Africa 2 3 New control system CompactPCI MMI/Sequencing/Logging FieldPoint

More information

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference FINN: A Framework for Fast, Scalable Binarized Neural Network Inference Yaman Umuroglu (NTNU & Xilinx Research Labs Ireland) in collaboration with N Fraser, G Gambardella, M Blott, P Leong, M Jahre and

More information

Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs

Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Niu Feng Technical Specialist, ARM Tech Symposia 2016 Agenda Introduction Challenges: Optimizing cache coherent subsystem

More information

Interaction Technology

Interaction Technology Faculty of Science Information and Computing Sciences 2017 Introduction Computer Vision Coert van Gemeren 8 maart 2017 Information and Computing Sciences TODAY 1.Computer Vision 2.Programming C/C++ OpenCV

More information

Mapping applications into MPSoC

Mapping applications into MPSoC Mapping applications into MPSoC concurrency & communication Jos van Eijndhoven jos@vectorfabrics.com March 12, 2011 MPSoC mapping: exploiting concurrency 2 March 12, 2012 Computation on general purpose

More information

LogiCORE IP AXI Video Direct Memory Access (axi_vdma) (v3.01.a)

LogiCORE IP AXI Video Direct Memory Access (axi_vdma) (v3.01.a) DS799 June 22, 2011 LogiCORE IP AXI Video Direct Memory Access (axi_vdma) (v3.01.a) Introduction The AXI Video Direct Memory Access (AXI VDMA) core is a soft Xilinx IP core for use with the Xilinx Embedded

More information

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution

More information

Motor Control: Model-Based Design from Concept to Implementation on heterogeneous SoC FPGAs Alexander Schreiber, MathWorks

Motor Control: Model-Based Design from Concept to Implementation on heterogeneous SoC FPGAs Alexander Schreiber, MathWorks Motor Control: Model-Based Design from Concept to Implementation on heterogeneous SoC FPGAs Alexander Schreiber, MathWorks 2014 The MathWorks, Inc. 1 Some components of a production application Production

More information

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)

More information

Adaptable Intelligence The Next Computing Era

Adaptable Intelligence The Next Computing Era Adaptable Intelligence The Next Computing Era Hot Chips, August 21, 2018 Victor Peng, CEO, Xilinx Pervasive Intelligence from Cloud to Edge to Endpoints >> 1 Exponential Growth and Opportunities Data Explosion

More information

Analyzing the Generation and Optimization of an FPGA Accelerator using High Level Synthesis

Analyzing the Generation and Optimization of an FPGA Accelerator using High Level Synthesis Paper Analyzing the Generation and Optimization of an FPGA Accelerator using High Level Synthesis Sebastian Kaltenstadler Ulm University Ulm, Germany sebastian.kaltenstadler@missinglinkelectronics.com

More information

Performance Verification for ESL Design Methodology from AADL Models

Performance Verification for ESL Design Methodology from AADL Models Performance Verification for ESL Design Methodology from AADL Models Hugues Jérome Institut Supérieur de l'aéronautique et de l'espace (ISAE-SUPAERO) Université de Toulouse 31055 TOULOUSE Cedex 4 Jerome.huges@isae.fr

More information

ECE 661 HW 1. Chad Aeschliman

ECE 661 HW 1. Chad Aeschliman ECE 661 HW 1 Chad Aeschliman 2008-09-09 1 Problem The problem is to determine the homography which maps a point or line from a plane in the world to the image plane h 11 h 12 h 13 x i = h 21 h 22 h 23

More information

SDAccel Development Environment User Guide

SDAccel Development Environment User Guide SDAccel Development Environment User Guide Features and Development Flows Revision History The following table shows the revision history for this document. Date Version Revision 05/13/2016 2016.1 Added

More information

Multimedia Retrieval Exercise Course 2 Basic Knowledge about Images in OpenCV

Multimedia Retrieval Exercise Course 2 Basic Knowledge about Images in OpenCV Multimedia Retrieval Exercise Course 2 Basic Knowledge about Images in OpenCV Kimiaki Shirahama, D.E. Research Group for Pattern Recognition Institute for Vision and Graphics University of Siegen, Germany

More information

LegUp HLS Tutorial for Microsemi PolarFire Sobel Filtering for Image Edge Detection

LegUp HLS Tutorial for Microsemi PolarFire Sobel Filtering for Image Edge Detection LegUp HLS Tutorial for Microsemi PolarFire Sobel Filtering for Image Edge Detection This tutorial will introduce you to high-level synthesis (HLS) concepts using LegUp. You will apply HLS to a real problem:

More information

Yafit Snir Arindam Guha Cadence Design Systems, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces

Yafit Snir Arindam Guha Cadence Design Systems, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces Yafit Snir Arindam Guha, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces Agenda Overview: MIPI Verification approaches and challenges Acceleration methodology overview and

More information

High-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication

High-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication High-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication Erik H. D Hollander Electronics and Information Systems Department Ghent University, Ghent, Belgium Erik.DHollander@ugent.be

More information

08 - Address Generator Unit (AGU)

08 - Address Generator Unit (AGU) October 2, 2014 Todays lecture Memory subsystem Address Generator Unit (AGU) Schedule change A new lecture has been entered into the schedule (to compensate for the lost lecture last week) Memory subsystem

More information

Zynq-7000 All Programmable SoC Product Overview

Zynq-7000 All Programmable SoC Product Overview Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform

More information

Employing Multi-FPGA Debug Techniques

Employing Multi-FPGA Debug Techniques Employing Multi-FPGA Debug Techniques White Paper Traditional FPGA Debugging Methods Debugging in FPGAs has been difficult since day one. Unlike simulation where designers can see any signal at any time,

More information

FPGA design with National Instuments

FPGA design with National Instuments FPGA design with National Instuments Rémi DA SILVA Systems Engineer - Embedded and Data Acquisition Systems - MED Region ni.com The NI Approach to Flexible Hardware Processor Real-time OS Application software

More information

SystemC Synthesis Standard: Which Topics for Next Round? Frederic Doucet Qualcomm Atheros, Inc

SystemC Synthesis Standard: Which Topics for Next Round? Frederic Doucet Qualcomm Atheros, Inc SystemC Synthesis Standard: Which Topics for Next Round? Frederic Doucet Qualcomm Atheros, Inc 2/29/2016 Frederic Doucet, Qualcomm Atheros, Inc 2 What to Standardize Next Benefit of current standard: Provides

More information

Support Triangle rendering with texturing: used for bitmap rotation, transformation or scaling

Support Triangle rendering with texturing: used for bitmap rotation, transformation or scaling logibmp Bitmap 2.5D Graphics Accelerator March 12 th, 2015 Data Sheet Version: v2.2 Xylon d.o.o. Fallerovo setaliste 22 10000 Zagreb, Croatia Phone: +385 1 368 00 26 Fax: +385 1 365 51 67 E-mail: support@logicbricks.com

More information

Simplify System Complexity

Simplify System Complexity 1 2 Simplify System Complexity With the new high-performance CompactRIO controller Arun Veeramani Senior Program Manager National Instruments NI CompactRIO The Worlds Only Software Designed Controller

More information

Atlys (Xilinx Spartan-6 LX45)

Atlys (Xilinx Spartan-6 LX45) Boards & FPGA Systems and and Robotics how to use them 1 Atlys (Xilinx Spartan-6 LX45) Medium capacity Video in/out (both DVI) Audio AC97 codec 220 US$ (academic) Gbit Ethernet 128Mbyte DDR2 memory USB

More information

Copyright 2016 Xilinx

Copyright 2016 Xilinx Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building

More information

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous

More information

Introduction to Embedded System Design using Zynq

Introduction to Embedded System Design using Zynq Introduction to Embedded System Design using Zynq Zynq Vivado 2015.2 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able

More information

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS Mike Ashworth, Graham Riley, Andrew Attwood and John Mawer Advanced Processor Technologies Group School

More information