Midterm Exam. Solutions


Problem 1 List at least 3 advantages of implementing selected portions of a design in hardware, and at least 3 advantages of implementing the remaining portions of the design in software

Software vs. Hardware Trade-offs
Implement more in Hardware:
  - Improve Performance
  - Improve Energy Efficiency
  - Reduce Power Density
Implement more in Software:
  - Manage Design Complexity
  - Reduce Design Cost
  - Stick to Design Schedule
  - Handle Deep Submicron
Source: A Practical Introduction to Hardware/Software Codesign

Distinct Features of Hardware and Software Design

                   Hardware                  Software
Design Paradigm    Decomposition in space    Decomposition in time
Resource           Area (#gates, #Slices)    Time (#Cycles)
Flexibility        Must be designed in       Implicit
Parallelism        Implicit                  Must be designed in
Modeling           Model ≠ Implementation    Model ~ Implementation
Reuse              Uncommon                  Common

Problem 2 What are the two primary advantages of Zynq over ASSP?

Comparison with Alternative Solutions
[Table: ASIC, ASSP, 2 Chip Solution, and Zynq, each rated positive, negative, or neutral on Performance, Power, Unit Cost, Total Cost of Ownership, Risk, Time to Market, Flexibility, and Scalability; rating symbols not reproduced]
Source: Xilinx Video Tutorials

Choice Among Various Implementation Platforms Source: Xcell Journal, no. 88, Q3 2014

Problem 3 List the products of Altera and Microsemi directly competing with Zynq

Alternative Solutions
  - Xilinx Zynq: Zynq-7000 All Programmable SoCs with Cortex-A9 MPCore
  - Altera Arria V & Cyclone V: Hard processor system (HPS) with Cortex-A9 MPCore
  - Microsemi SmartFusion2: Cortex-M3

Problem 4 List at least 3 industry standards adopted in Vivado

Vivado Design Suite
  - 4 years of development and 1 year of beta testing
  - first version released in Summer 2012
  - scalable data model, supporting designs with up to 100 million ASIC gate equivalents (GEs)
  - based on industry standards, such as
      - AMBA AXI4 interconnect
      - IP-XACT IP packaging metadata
      - Tool Command Language (Tcl)
      - Synopsys Design Constraints (SDC)

Problem 5 List 3 primary metrics optimized by Vivado's Analytical Placer

Multidimensional Analytical Placer
ISE:
  - One-dimensional, timing-driven place-and-route algorithms
  - Simulated annealing algorithms that determine randomly where the tool should place logic cells
  - Does an adequate job for FPGAs below 1 million GEs
Vivado:
  - Modern multidimensional analytic placement algorithm
  - Deterministically finds a solution that primarily minimizes timing, congestion, and wire length
  - Better results, fewer iterations
  - Efficient up to 100 million GEs

Vivado's Multidimensional Optimization Source: Xcell, no. 79, 2012

Problem 6 Explain the meaning of the dashed rectangles in the block diagram of the GPIO core shown below

AXI GPIO Resource Utilization and Maximum Clock Frequency Source: LogiCORE IP AXI GPIO: Product Specification

Problem 7 Explain the effect of unmarking the Enable Interrupt option in the Vivado GUI window shown below on the block diagram of AXI GPIO shown next

Block Diagram of AXI GPIO
  - Interrupt logic is enabled only when the C_INTERRUPT_PRESENT generic is set to 1
  - IPIC: IP Interconnect interface
Source: LogiCORE IP AXI GPIO: Product Specification

GPIO Core Parameters Source: LogiCORE IP AXI GPIO: Product Specification

Problem 8 How many different types of interrupts can be generated by the AXI GPIO configured as shown in Question 7?

Interrupt Enable Registers, IP IER Source: LogiCORE IP AXI GPIO: Product Specification

Problem 9 Which of the following PS-PL interfaces is used for communication between the ARM processors and AXI GPIOs in Zynq? a. S_AXI_GP b. M_AXI_GP c. S_AXI_ACP, or d. S_AXI_HP?

AXI Interconnects and Interfaces Source: The Zynq Book

Problem 10 List at least 3 possible uses of the Generate Mode of AXI Timer

Generate Mode
  - Counter, when enabled, begins to count up or down
  - On transition of carry out, the counter
      - stops, or
      - automatically reloads the initial value from the load register, and continues counting
  - If enabled, GenerateOut is driven to 1 for one clock cycle
  - If enabled, the interrupt signal for the timer is driven to 1
  - Can be used to generate
      - repetitive interrupts
      - one-time pulses
      - periodical signals
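The Generate Mode behavior described above can be sketched as a small software model. This is only an illustration: the type gen_timer_t and the functions below are hypothetical names, the model counts down only, and it is not cycle-accurate with respect to the actual AXI Timer core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a down-counting timer in generate mode. */
typedef struct {
    uint32_t load_value;   /* value held in the load register          */
    uint32_t counter;      /* current count                            */
    int      auto_reload;  /* 1: reload and continue, 0: stop (one-shot) */
    int      running;      /* 1 while the counter is enabled           */
} gen_timer_t;

/* Load the counter and enable it. */
static void gen_timer_start(gen_timer_t *t, uint32_t load_value, int auto_reload)
{
    assert(t != NULL);
    t->load_value  = load_value;
    t->counter     = load_value;
    t->auto_reload = auto_reload;
    t->running     = 1;
}

/* Advance the model by one clock. Returns 1 on the clock in which the
   counter rolls over, which stands in for the one-cycle GenerateOut pulse
   (and the interrupt, if it were enabled). */
static int gen_timer_tick(gen_timer_t *t)
{
    assert(t != NULL);
    if (!t->running) {
        return 0;
    }
    if (t->counter == 0) {
        if (t->auto_reload) {
            t->counter = t->load_value;  /* reload and keep counting */
        } else {
            t->running = 0;              /* one-shot: stop after the pulse */
        }
        return 1;
    }
    t->counter--;
    return 0;
}

/* Run the model for a number of clocks and count the pulses produced. */
static unsigned count_pulses(gen_timer_t *t, unsigned cycles)
{
    unsigned pulses = 0;
    for (unsigned i = 0; i < cycles; i++) {
        pulses += (unsigned)gen_timer_tick(t);
    }
    return pulses;
}
```

With auto-reload set, the model produces a periodic pulse train (repetitive interrupts or a periodical signal); with auto-reload cleared, it produces a single pulse and then stops.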

Block Diagram of AXI Timer Source: LogiCORE IP AXI Timer: Product Guide

Functions of a Typical Timer (2)
2. Output compare: generating signals with the given timing characteristics
  - single pulse
  - periodical signal
  Timing characteristics: pulse width, period

Problem 11 List two distinct parts of any Hardware Platform Specification

Hardware Platform Specification (1)

Hardware Platform Specification (2)

Hardware Platform Specification (3)

Problem 12 Which company developed AMBA and AXI?

Solution Adopted in ZYNQ
  - Advanced Microcontroller Bus Architecture (AMBA): an open-standard, on-chip interconnect specification for the connection and management of functional blocks in system-on-a-chip (SoC) designs. First version introduced by ARM in 1996.
  - AMBA Advanced eXtensible Interface 4 (AXI4): the fourth generation of the AMBA interface, defined in the AMBA 4 specification and targeted at high-performance, high-clock-frequency systems. Introduced by ARM in 2010.
Source: M.S. Sadri, Zynq Training

Problem 13 List at least 3 functions of AXI Interconnect

Addressing of Slaves Source: M.S. Sadri, Zynq Training

AXI Interconnect Address Decoding Source: M.S. Sadri, Zynq Training
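As a rough software analogy for address decoding, the sketch below maps an address to the slave whose range contains it. The map entries, the type slave_range_t, and the function decode_slave are hypothetical; in a real design the address ranges are assigned in the Vivado address editor and the decoding is done in interconnect hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical address map: a slave owns [base, base + size). */
typedef struct {
    uint32_t base;
    uint32_t size;
} slave_range_t;

/* Illustrative values only, not taken from any particular design. */
static const slave_range_t address_map[] = {
    { 0x41200000u, 0x10000u },  /* slave 0, e.g. a GPIO core  */
    { 0x42800000u, 0x10000u },  /* slave 1, e.g. a timer core */
    { 0x40400000u, 0x10000u },  /* slave 2, e.g. a DMA core   */
};
#define ADDRESS_MAP_LEN (sizeof address_map / sizeof address_map[0])

/* Return the index of the slave whose range contains addr,
   or -1 if the address falls outside every range. */
static int decode_slave(const slave_range_t *map, size_t n, uint32_t addr)
{
    assert(map != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Subtract before comparing so base + size cannot overflow. */
        if (addr >= map[i].base && addr - map[i].base < map[i].size) {
            return (int)i;
        }
    }
    return -1;
}
```

For example, decode_slave(address_map, ADDRESS_MAP_LEN, 0x41200008u) selects slave 0 in this made-up map, while an address outside all three ranges returns -1.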

Clock Domain and Width Conversion Source: M.S. Sadri, Zynq Training

Hierarchical AXI Interconnects Source: M.S. Sadri, Zynq Training

Problem 14 List at least 4 ports of an AXI-Stream Master (other than clk and reset), and divide them into inputs and outputs

Selected AXI Stream Ports Source: M.S. Sadri, Zynq Training

Problem 15 Name the system-on-chip bus standard recommended for use by opencores.org

Competing System-on-Chip Bus Standards

Bus          Developed by    High-Performance    Peripheral        Point-to-Point
                             Shared Bus          Shared Bus        Bus
AMBA v3      ARM             AHB                 APB
AMBA v4      ARM             AXI4                AXI4-Lite         AXI4-Stream
CoreConnect  IBM             PLB                 OPB
Wishbone     SiliCore Corp.  Crossbar Topology   Shared Topology   Point to Point Topology
Avalon       Altera          Avalon-MM           Avalon-MM         Avalon-ST

AMBA: Advanced Microcontroller Bus Architecture
AXI: Advanced eXtensible Interface
AHB: Advanced High-performance Bus
APB: Advanced Peripheral Bus
PLB: Processor Local Bus
OPB: On-chip Peripheral Bus
MM: Memory Mapped
ST: Streaming
Source: A Practical Introduction to Hardware/Software Codesign

Problem 16 List at least 6 ports of an AXI-Full Slave (other than clk and reset), and divide them into inputs and outputs

AXI4 Interface
  - Write Address Channel
  - Write Data Channel
  - Write Response Channel
  - Read Address Channel
  - Read Data Channel
Source: The Zynq Book

Write Burst Source: ARM AMBA AXI Protocol v1.0: Specification

Read Burst

Entity Declaration (2)
    port (
        -- Users to add ports here
        LEDs_out : out std_logic_vector(3 downto 0);
        -- User ports ends
        -- Do not modify the ports beyond this line
        -- Global Clock Signal
        S_AXI_ACLK : in std_logic;
        -- Global Reset Signal. This Signal is Active LOW
        S_AXI_ARESETN : in std_logic;
        -- Write address (issued by master, accepted by Slave)
        S_AXI_AWADDR : in std_logic_vector(c_s_axi_addr_width-1 downto 0);
        ........

Entity Declaration (3)
        ........
        -- Read address valid. This signal indicates that the channel
        -- is signaling valid read address and control information.
        S_AXI_ARVALID : in std_logic;
        -- Read address ready. This signal indicates that the slave is
        -- ready to accept an address and associated control signals.
        S_AXI_ARREADY : out std_logic;
        -- Read data (issued by slave)
        S_AXI_RDATA : out std_logic_vector(c_s_axi_data_width-1 downto 0);
        -- Read response. This signal indicates the status of the
        -- read transfer.
        S_AXI_RRESP : out std_logic_vector(1 downto 0);
        -- Read valid. This signal indicates that the channel is
        -- signaling the required read data.
        S_AXI_RVALID : out std_logic;
        -- Read ready. This signal indicates that the master can
        -- accept the read data and response information.
        S_AXI_RREADY : in std_logic
    );
    end led_controller_v1_0_s00_axi;

Problem 17 Explain the need for the volatile keyword in the following definition of Xil_In32():

    u32 Xil_In32(u32 Addr)
    {
        return *(volatile u32 *) Addr;
    }
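A minimal host-side sketch of why volatile matters when reading hardware registers. The names fake_status_reg, read_reg32, and poll_until_ready are hypothetical, and an ordinary variable stands in for a memory-mapped register so the code can run on a PC; on the target, the address would come from the hardware address map.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for a 32-bit device register (hypothetical). */
static volatile uint32_t fake_status_reg = 0;

/* Same idea as Xil_In32(): the volatile-qualified access tells the compiler
   that every call must perform a real load from memory. */
static uint32_t read_reg32(uintptr_t addr)
{
    assert(addr % sizeof(uint32_t) == 0);  /* registers are word-aligned */
    return *(volatile uint32_t *)addr;
}

/* Poll the register until bit 0 is set or max_polls reads have been made.
   Returns the number of reads performed. Without volatile, the compiler
   would be allowed to read the location once and reuse that value in every
   iteration, so a change made by the hardware could go unnoticed. */
static unsigned poll_until_ready(uintptr_t addr, unsigned max_polls)
{
    unsigned polls = 0;
    while (polls < max_polls) {
        polls++;
        if (read_reg32(addr) & 1u) {
            break;
        }
    }
    return polls;
}
```

The polling loop is the typical case: the value at the address can change outside the program's control, so each iteration has to re-read it.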

Problem 18 Which of the following operations (if any) can be omitted in case of the DMA-based communication between an ARM core and a hardware accelerator using ACP?
Write to Accelerator
  - processor allocates buffer
  - processor writes data into buffer
  - processor flushes cache for buffer
  - processor initiates DMA transfer
Read from Accelerator
  - processor allocates buffer
  - processor initiates DMA transfer
  - processor waits for DMA to complete
  - processor invalidates cache for buffer
  - processor reads data from buffer

Coherent AXI DMA-based Accelerator Communication
Write to Accelerator
  - processor allocates buffer
  - processor writes data into buffer
  - processor flushes cache for buffer
  - processor initiates DMA transfer
Read from Accelerator
  - processor allocates buffer
  - processor initiates DMA transfer
  - processor waits for DMA to complete
  - processor invalidates cache for buffer
  - processor reads data from buffer

Problem 19 What operation starts a Simple DMA Transfer when using AXI DMA?

Simple DMA Transfer Programming Sequence for MM2S channel (1)
1. Start the MM2S channel running by setting the run/stop bit to 1, MM2S_DMACR.RS = 1.
2. If desired, enable interrupts by writing a 1 to MM2S_DMACR.IOC_IrqEn and MM2S_DMACR.Err_IrqEn.
3. Write a valid source address to the MM2S_SA register.
4. Write the number of bytes to transfer in the MM2S_LENGTH register.
The MM2S_LENGTH register must be written last. All other MM2S registers can be written in any order.
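The four-step sequence above can be sketched in C as follows. The register offsets and bit positions are assumptions based on the AXI DMA product guide and should be checked against the core version in use; the helper names are hypothetical, and base may point at a plain array when trying the code on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MM2S register offsets and control bits (verify for your core). */
#define MM2S_DMACR       0x00u
#define MM2S_SA          0x18u
#define MM2S_LENGTH      0x28u
#define DMACR_RS         (1u << 0)
#define DMACR_IOC_IRQEN  (1u << 12)
#define DMACR_ERR_IRQEN  (1u << 14)

/* Write one 32-bit register at a byte offset from the block base. */
static void reg_write(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    assert(offset % 4u == 0);
    base[offset / 4u] = value;
}

/* Read one 32-bit register at a byte offset from the block base. */
static uint32_t reg_read(volatile uint32_t *base, uint32_t offset)
{
    assert(offset % 4u == 0);
    return base[offset / 4u];
}

/* Program one simple (non scatter-gather) MM2S transfer. */
static void mm2s_simple_transfer(volatile uint32_t *base, uint32_t src_addr,
                                 uint32_t num_bytes, int enable_irq)
{
    /* Step 1: set the run/stop bit. */
    uint32_t cr = reg_read(base, MM2S_DMACR) | DMACR_RS;
    reg_write(base, MM2S_DMACR, cr);

    /* Step 2 (optional): enable completion and error interrupts. */
    if (enable_irq) {
        cr |= DMACR_IOC_IRQEN | DMACR_ERR_IRQEN;
        reg_write(base, MM2S_DMACR, cr);
    }

    /* Step 3: source address of the data in memory. */
    reg_write(base, MM2S_SA, src_addr);

    /* Step 4: the length register is written last. */
    reg_write(base, MM2S_LENGTH, num_bytes);
}
```

Keeping the MM2S_LENGTH write as the final statement mirrors the ordering rule in the sequence; the other register writes could be reordered.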

Problem 20 Explain the primary difference between Simple DMA transfer and Scatter-Gather DMA Transfer

Scatter Gather DMA Mode Source: Symbian OS Internals/13. Peripheral Support

Chain of Buffer Descriptors (BDs)
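A chain of buffer descriptors can be pictured as a linked list in which each entry names one memory buffer and points to the next entry. The sketch below uses a hypothetical, simplified descriptor; it is not the register-accurate descriptor layout of the AXI DMA core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified buffer descriptor (BD). */
typedef struct buffer_descriptor {
    struct buffer_descriptor *next;  /* next BD in the chain, NULL at the end */
    uint32_t buffer_addr;            /* start address of this buffer          */
    uint32_t length;                 /* number of bytes in this buffer        */
} buffer_descriptor_t;

/* Link an array of n descriptors into a chain and return its head. */
static buffer_descriptor_t *link_chain(buffer_descriptor_t *bds, size_t n)
{
    assert(bds != NULL || n == 0);
    if (n == 0) {
        return NULL;
    }
    for (size_t i = 0; i + 1 < n; i++) {
        bds[i].next = &bds[i + 1];
    }
    bds[n - 1].next = NULL;
    return &bds[0];
}

/* Walk the chain the way a scatter-gather engine would, one descriptor at a
   time, and total the bytes described by all buffers. */
static uint32_t chain_total_bytes(const buffer_descriptor_t *head)
{
    uint32_t total = 0;
    for (const buffer_descriptor_t *bd = head; bd != NULL; bd = bd->next) {
        total += bd->length;
    }
    return total;
}
```

The buffers in the chain do not need to be contiguous or equal in size, which is what lets one programmed transfer cover data scattered across memory.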

Problem 21 Which core can be used to simplify the development of an AXI-Full Master?

Ways of Implementing AXI4 Master Units Source: M.S. Sadri, Zynq Training

Problem 22 Explain the primary difference between DMA and Central DMA

Central DMA
  - High-bandwidth Direct Memory Access (DMA) between a memory-mapped source address and a memory-mapped destination address
  - Optional Scatter Gather (SG)
  - Initialization, status, and control registers are accessed through an AXI4-Lite slave interface
Source: Xilinx Advanced Embedded System Design on Zynq

Problem 23 Explain the primary difference between an Integrated Logic Analyzer (ILA) and Virtual Input Output (VIO)

Integrated Logic Analyzer Source: Integrated Logic Analyzer v5.0, LogiCORE IP Product Guide

Virtual Input Output Source: LogiCORE IP Virtual Input/Output

Problem 24 Estimate the minimum amount of memory required by ILA configured as shown below
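A first-order way to approach this kind of estimate, assuming the ILA stores one sample of every probe bit for each entry of its sample depth, is to multiply the total probe width by the sample depth. The function name and the example numbers below are hypothetical and are not the configuration from the problem; the actual block RAM usage reported by the tools can be higher because of rounding to memory block sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lower-bound estimate of capture storage in bits:
   (sum of all probe widths) x (sample depth). */
static uint64_t ila_min_storage_bits(const uint32_t *probe_widths,
                                     size_t num_probes,
                                     uint32_t sample_depth)
{
    assert(probe_widths != NULL || num_probes == 0);
    uint64_t total_width = 0;
    for (size_t i = 0; i < num_probes; i++) {
        total_width += probe_widths[i];
    }
    return total_width * (uint64_t)sample_depth;
}
```

For instance, three probes of 32, 1, and 4 bits captured at a depth of 1024 samples would need at least 37 x 1024 = 37888 bits under this assumption.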

Problem 25 List at least 3 different ways of dealing with the most time-critical C functions identified by the profiler

Hardware and Software Partitioning
  - Determine the software "critical path" by profiling
      - Profiling measures where the CPU is spending its cycles on a function-by-function or task-by-task basis
      - Similar to timing analysis in hardware
      - Informs the system designer which software routine may be a candidate for hardware acceleration
  - Functions can be rewritten to improve efficiency in a number of ways
      - Implementation in assembly code rather than C
      - Writing faster C code, for example by limiting pointer use
Source: Profiling and Performance, Copyright 2014 Xilinx