Reconfigurable Memory Controller with Programmable Pattern Support


Tassadaq Hussain, Miquel Pericàs, and Eduard Ayguadé
Barcelona Supercomputing Center
{thussain, miquel.pericas,

Abstract. Heterogeneous architectures are increasingly popular due to their flexibility and high performance per watt. One kind of heterogeneous architecture, the reconfigurable system-on-chip, offers high performance per watt through reconfigurable logic and flexibility through multiprocessor cores. To achieve the performance goals, however, enough data must be supplied to the accelerators. In this paper we describe a programmable, pattern-based memory controller (PMC) that aims at improving the performance of heterogeneous or reconfigurable SoC devices. The supported patterns include scatter-gather and strided 1D, 2D and 3D patterns. PMC can prefetch complete patterns into scratchpads that can then be accessed either by a microprocessor or by an accelerator. As a result, the microprocessors and accelerators can focus on computation and are relieved of having to perform address calculations. PMC has been implemented and tested on an ML505 evaluation board using the MicroBlaze softcore as the platform's microprocessor. While PMC adds some latency, it improves performance by offloading the processor and by making better use of the available bandwidth. When executing a thresholding application in a PMC-based SoC environment, PMC provides a 1.5x speed-up with the processor and a 27x speed-up with a hardware accelerator.

1 Introduction

Multiprocessor systems-on-chip (MPSoC) with accelerators are increasingly popular due to their short design time, flexibility, computational performance and high performance per watt. While traditional microprocessor cores are considered easy to program, fixed-function units enable a performance boost beyond what is possible with an ISA-programmed microarchitecture.
Texas Instruments OMAP (Open Multimedia Application Platform) [1] is an application-processor SoC platform that can integrate different cores, such as the ARM Cortex-A8 superscalar microprocessor core and cores developed by Texas Instruments. AMD Geode [2] is a series of x86-compatible SoC microprocessors and I/O companions produced by AMD and targeted at the embedded computing market. Besides MPSoC architectures, an architecture that is growing in popularity is the so-called reconfigurable system-on-chip, which combines processor cores with configurable logic, allowing the user to synthesize reconfigurable accelerators on the chip die. The Xilinx Extensible Processing Platform [3] is an example of a reconfigurable system-on-chip. It is based on ARM's dual-core Cortex-A9 MPCore processors and Xilinx's 28nm programmable logic, and it takes a processor-centric approach by defining a complete processor. AMD Fusion [4] is a new approach to heterogeneous system design and software development. It delivers powerful CPU and GPU capabilities for high-performance applications in a single-die processor called an APU.

In order to meet performance and power goals, integration of accelerators and microprocessors is required. However, the availability of high-performance accelerators is of no use if the memory hierarchy is unable to provide the necessary bandwidth. Correctly managing memory access across the set of accelerators and microprocessors in such a scenario is thus performance-critical, but it is also very challenging. This paper introduces a memory controller based on high-level data patterns in order to simplify the programming of SoC applications while ensuring high performance and high efficiency.

Other high-level programmable memory controllers have been researched in the past. This field of research is strongly tied to that of memory prefetchers. Basic patterns that have been exploited by prefetchers include constant-stride vectors and linked-list traversal [5]. Dynamic prefetching [6][7] works well when the processor must run a wide variety of workloads that are unknown at design time. A tighter mapping between a memory controller and an application can be achieved by a software-managed memory controller attached to a scratchpad memory.

The main proposal of this paper is a hardware implementation of a programmable memory controller (PMC) that takes data access descriptions (Descriptor Blocks) and provides streaming data. To reduce data access times and improve performance, PMC provides the following features:
- The PMC system accesses Structure of Arrays (SoA) data in Array of Structures (AoS) format with the help of strides.
- Multiple descriptors are used to access complex streams (AoS or SoA) without generating address synchronization delay for non-contiguous memories.
- A high-speed source-synchronous interface is provided that can easily be connected to any generic hardware accelerator.
- A scratchpad-memory interface can be used by a microprocessor without modifying the existing system.
This paper discusses the architecture of the memory controller and its implementation on a Xilinx Virtex-5 (ML505 development board), along with the glue logic needed to attach it to MicroBlaze processors or ROCCC-generated accelerators [8]. The programmable memory controller is based on a scatter-gather memory controller and can be programmed directly from C programs through a special-purpose interface: the program first configures the controller and then issues send() and receive() calls. To evaluate the architecture we implemented a thresholding algorithm and executed it on both variants of the architecture, as well as on a MicroBlaze-based version that lacks the programmable memory controller.

2 Proposal

To illustrate the functionality of the PMC, this section explains its data access patterns and its architecture. The minimal descriptor of a conventional memory controller contains the source address, the destination address and the size of the transfer, with unit-stride access, which is not efficient for complex memory patterns. This is shown in Figure 1. The DMA manages one stream of data. If the access pattern is complex, the task of the DMA becomes significantly more complex, and accessing non-contiguous memory locations introduces delay while addresses are computed.

Fig. 1: Generic DMA Data Access

Fig. 2: PMC Data Access

A system with a PMC unit can access non-contiguous memory locations with the help of stride and jump functionality. To do so, PMC uses multiple descriptor blocks to access a complex data stream. Streams can go from non-contiguous memory to a contiguous address space, and vice versa. Figure 2 shows the data access pattern of PMC. Channel 0 access is contiguous (Data[n]), with unit stride and a stream size of m. The variable Jump is used between channels to access contiguous data from different memory locations. Channel 1 accesses diagonal elements (Data[n+stride]), with a stride of n between two consecutive data locations and a stream size of m.

Our initial PMC implementation, shown in Figure 3, accesses DDR2 memory [9]. This involves multiple descriptors. The minimum set of parameters for a single descriptor block is shown in Figure 4. Command specifies whether to read or write a single datum or a stream of data. The address parameters give the starting addresses of the source and destination memory locations. The PMC system contains four main units:
- The Front-End Interface
- The Pattern Controller
- The Stream Controller
- The Memory Controller

Fig. 3: Internal Architecture of Controller
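The two-channel access of Figure 2 can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions: the type pmc_descriptor_t, the function pmc_model_gather and the field types are hypothetical names that do not come from the paper, and memory is modelled as an array of 32-bit words indexed by element rather than by byte address. The diagonal stride of N+1 assumes a row-major N x N matrix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one PMC descriptor block.
 * The field list follows Figure 4; types and units are assumptions. */
typedef enum { PMC_CMD_READ_STREAM, PMC_CMD_WRITE_STREAM } pmc_cmd_t;

typedef struct {
    pmc_cmd_t cmd;     /* read or write (not interpreted by this model)  */
    size_t    src;     /* start index in the modelled main memory        */
    size_t    dst;     /* start index in the modelled scratchpad buffer  */
    size_t    size;    /* number of elements in the stream               */
    size_t    stride;  /* distance between consecutive source elements   */
} pmc_descriptor_t;

/* Model of one channel: gather `size` strided elements from `mem` into a
 * contiguous region of `buf`. Stops early instead of reading or writing
 * out of range, and returns the number of elements actually copied. */
size_t pmc_model_gather(const pmc_descriptor_t *d,
                        const uint32_t *mem, size_t mem_len,
                        uint32_t *buf, size_t buf_len)
{
    size_t i;
    for (i = 0; i < d->size; i++) {
        size_t s = d->src + i * d->stride;  /* non-contiguous source  */
        size_t t = d->dst + i;              /* contiguous destination */
        if (s >= mem_len || t >= buf_len)
            break;
        buf[t] = mem[s];
    }
    return i;
}
```

For a 4 x 4 matrix, one descriptor with stride 1 collects a row and a second descriptor with stride 5 collects the diagonal; both land back to back in the buffer, which is the non-contiguous to contiguous mapping described above.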

Fig. 4: Descriptor Structure of Memory Controller (Command, Source address, Destination address, Transfer Size, Strides)

2.1 The Front-End Interface

The Front-End Interface lets the PMC support different systems through two interfaces:
- High-Speed Source Synchronous Interface
- Processor Local Bus Interface

High-Speed Source Synchronous Interface. The Source Synchronous Interface shown in Figure 3 is used to supply high-speed data to hardware accelerators. A synchronous handshaking protocol is used to request and grant data. Data transfer is performed according to the physical memory clock.

Processor Local Bus Interface. The Scratch Pad controller shown in Figure 3 provides the interface between the LMB and the Scratch Pad Memory (BRAM block). The scratch pad memory subsystem consists of the Descriptor memory and the Buffer memory. The Descriptor memory feeds descriptors to the pattern controller unit in linked-list fashion. This reduces the descriptor request/grant time and eliminates the additional resynchronization time required to access non-contiguous memories. The Buffer memory temporarily holds data while it is being moved to or from physical memory.

2.2 Pattern Controller

The Pattern Controller is the top unit shown in Figure 3 and communicates with external processing units. It takes a descriptor block from an external source and feeds it to the channel controller. The channel controller unit manages multiple descriptors. A single channel takes one descriptor from the channel controller and generates a stream. Figure 2 shows how different channels are combined to access a non-contiguous stream. Multiple descriptors are used when the application needs to fetch data in the form of complex patterns.

2.3 Stream Controller

The Stream Controller follows the Pattern Controller, as shown in Figure 3. This unit is responsible for transferring data between physical memory and the hardware accelerator according to the programmed descriptor.
Salient features of the stream controller are shown in Table 1. The stream controller contains two main units:
- Data Management Unit (DMU)
- Address Management Unit (AMU)

The Data Management Unit (DMU). The DMU depends on the Data in and Data out units shown in Figure 3. It enables the data stream to be written to the appropriate physical memory by generating the write-enable signal along with the write-data and mask-data signals. It supports data streams of up to 1024 elements of 64-bit words.

Table 1: Stream Controller Features
Register | Width | Range Minimum | Range Maximum
Processing Units | 8 bit | |
Channels | 8 bit | |
Stride | 8 bit | 4 address | 256 address
Stream (32-bit word) | 8 bit | |

The Address Management Unit (AMU). The AMU handles the Start Address, Stream and Stride units shown in Figure 3. Programming the AMU takes two clocks. Strides between two consecutive accesses are handled by the AMU without generating delay or latency. In each stream, the first data transfer uses the address supplied by the descriptor unit; for every following transfer the address equals the address of the previous transfer plus the stride. The AMU supports a stream size of up to 1024 contiguous memory elements with one descriptor. Supported strides must be multiples of four.

2.4 Memory Controller

A modular DDR2 SDRAM [9] controller is used to access data in physical memory. The DDR2 SDRAM controller provides a high-speed source-synchronous interface and transfers data on both edges of the clock. Its modularity allows designs to be ported easily and also makes it possible to share parts of the design across different types of memory interfaces.

3 Evaluation of Stand-alone PMC

For the evaluation of PMC we used the architecture shown in Figure 3 with the source-synchronous interface. The controller connections and functionality were examined in detail with hand-written test cases. To verify maximum bandwidth and speed, different access patterns were executed on the stand-alone PMC controller. The results for these patterns are shown in Tables 2 and 3.

3.1 Data Access Pattern (AoS, SoA)

Most HPC applications have been found to favor operating on data in SoA format [10]. The PMC system is one way to access SoA in AoS format without generating any delay. AoS data access requires unit stride, whereas SoA requires strided access.

Fig. 5: Structure of Array Access Pattern
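The AMU address rule (first address from the descriptor, each later address equal to the previous one plus the stride) can be sketched in C as follows. This is a minimal software illustration, not the hardware implementation: the function name amu_generate and the type point_t are hypothetical, addresses are assumed to be byte addresses, and the limits encode the stream size of 1024, the multiple-of-four rule from the text and the stride range of 4 to 256 from Table 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AMU_MAX_STREAM 1024u  /* maximum stream size per descriptor (text)  */
#define AMU_MIN_STRIDE 4u     /* stride range minimum (Table 1)             */
#define AMU_MAX_STRIDE 256u   /* stride range maximum (Table 1)             */

/* Example record used to show a strided field access (hypothetical). */
typedef struct { int32_t x, y, z; } point_t;

/* Fill `out` with the addresses of one stream.
 * Returns 0 on success, -1 if the request violates the stated limits
 * (in which case `out` is left untouched). */
int amu_generate(uint32_t start, uint32_t stride, size_t count, uint32_t *out)
{
    if (count == 0 || count > AMU_MAX_STREAM)
        return -1;
    if (stride < AMU_MIN_STRIDE || stride > AMU_MAX_STRIDE || stride % 4u != 0)
        return -1;

    uint32_t addr = start;          /* first transfer: address from descriptor */
    for (size_t i = 0; i < count; i++) {
        out[i] = addr;
        addr += stride;             /* next = previous + stride */
    }
    return 0;
}
```

Reading only the y field of an array of point_t records, for example, corresponds to a start address of offsetof(point_t, y) past the array base and a stride of sizeof(point_t), so a single descriptor yields a contiguous stream of y values.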

The stride is determined by the size of the working data structure. Figure 5 shows three different access patterns (x[n], y[0][n], z[n][n]) of an n x m matrix. Here x[n] is a contiguous row with unit stride, y[0][n] is a column access with a stride equal to the stream width n, and z[n][n] is a diagonal SoA pattern whose stride is the x stream width plus the unit stride.

Table 2: Clocks taken by Different Write Streams
Transfer Type | Number of Words (32 bit) | Number of Clocks | Total Clocks (+ Latency)
Single | 1-byte - 8-byte | |
Minimum Stream | | |
Stream-8 (AoS or SoA) | | |
x8 (matrix) | | |

Table 3: Clocks taken by Different Read Streams
Transfer Type | Number of Words (32 bit) | Number of Clocks | Total Clocks (+ Latency)
Minimum Stream | 1-byte - 8-byte | 2 | 32
Single Stream | | |
D Stream (AoS or SoA) | | |

3.2 Testing and Verification

To test the functionality of PMC, hand-written HDL test patterns are used to program it. These test patterns read and write data from and to physical memory in the different formats shown in Tables 2 and 3. PMC is synthesized using Xilinx ISE 11 [11] for the Virtex-5 ML505 board with an XC5VLX110T FPGA. On the Virtex-5 family, PMC can run at a clock frequency of 260 MHz and consumes 2457 flip-flops and 1602 LUTs.

4 Evaluation of PMC based SoC using Test Application

To evaluate and analyze PMC functionality we use a simple thresholding algorithm. Thresholding is a straightforward image segmentation algorithm. It takes a grayscale image with 8-bit pixel depth and converts it into a binary image. Our thresholding application reads individual 8-bit pixels from the image shown in Figure 6(a). If the value of a pixel is greater than the threshold value, it saves binary 1 to the new pixel; otherwise it saves 0. The new image, shown in Figure 6(b), has the same dimensions with 1-bit pixel depth. The thresholding application is executed on the following architectures:
- MicroBlaze Stand-Alone
- MicroBlaze with PMC
- MicroBlaze with PMC and Reconfigurable Hardware Accelerator

Fig. 6: (a) 256x256 grayscale image taken from [12] (b) 256x256 binary image
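The thresholding kernel itself is small enough to state directly. The sketch below is an illustrative C version, not the code used in the evaluation: the function names are hypothetical, the comparison is strictly "greater than" as in the description, and the packed variant assumes most-significant-bit-first ordering within each output byte, which the paper does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One output byte per pixel: 1 if the pixel exceeds the threshold, else 0. */
void threshold_image(const uint8_t *gray, uint8_t *binary,
                     size_t n_pixels, uint8_t threshold)
{
    for (size_t i = 0; i < n_pixels; i++)
        binary[i] = (gray[i] > threshold) ? 1u : 0u;
}

/* One output bit per pixel (1-bit pixel depth), eight pixels per byte.
 * `packed` must hold (n_pixels + 7) / 8 bytes.
 * Assumption: the first pixel of each group maps to the MSB. */
void threshold_image_packed(const uint8_t *gray, uint8_t *packed,
                            size_t n_pixels, uint8_t threshold)
{
    for (size_t i = 0; i < (n_pixels + 7u) / 8u; i++)
        packed[i] = 0u;
    for (size_t i = 0; i < n_pixels; i++)
        if (gray[i] > threshold)
            packed[i / 8u] |= (uint8_t)(0x80u >> (i % 8u));
}
```

For the 256x256 image of Figure 6, n_pixels would be 65536 and the packed output would occupy 8192 bytes.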

4.1 MicroBlaze Stand-Alone

Fig. 7: MicroBlaze Based System

A generic MicroBlaze SoC system accesses physical memory using the Multi-Port Memory Controller (MPMC) and the Processor Local Bus (PLB). The PLB is a multi-master bus that is ideal for connecting external peripherals to the MicroBlaze processor core. PLB peripherals encounter issues such as arbiter priority levels, congestion of traffic over the buses and bus protocol translation. The resulting latency and delay in accessing physical memory can have a negative impact on performance, which is removed by adding PMC.

4.2 MicroBlaze with PMC

To benefit from the embedded processor inside the FPGA, we propose an architecture that adds PMC and a Scratch Pad Controller to the MicroBlaze system, shown in Figure 7, Approach A. In this system MicroBlaze first programs the descriptor block via the Scratch Pad Controller, which then reads and writes data between physical memory and BRAM. MicroBlaze reads data for computation from the BRAM over the Processor Local Bus (PLB), and after computation it writes the data back to the BRAM to be saved in physical memory. The salient parts of the proposed architectures are:
- Dual Port Memory Controller
- Scratch Pad Memory Controller
- Pattern-Based Memory Controller (PMC)
- Programmable Hardware Accelerator

Dual Port Memory Controller. The dual-port memory controller is employed to access the Scratch Pad memory. One port is dedicated to the MicroBlaze processor and the second port serves the Scratch Pad controller. The dual-port memory architecture permits data access on the system side to occur in parallel with access on the PMC side. Additionally, the preferred data memory may be used with a variety of cache coherency techniques or policies.

Table 4: Memory-Mapped PMC Descriptor Registers
Register Number | Register Name | Type (Read/Write) | Offset Address | Description
0 | dst_add | r/w | 0x0000 | Buffer Memory Start Address
1 | src_add | r/w | 0x0004 | Physical Memory Start Address
2 | cmd | r/w | 0x0008 | Single or stream read/write
3 | stream | r/w | 0x000c | Stream
4 | stride | r/w | 0x0010 | Stride between two consecutive addresses
5 | ready | r | 0x0014 | PMC ready
6 | intr1 | r | 0x0018 | Soft Interrupt1
7 | intr2 | r | 0x001c | Soft Interrupt2
8 to 31 | reserved | | | Reserved for future use

Scratch Pad Memory Controller. The purpose of the Scratch Pad Memory Controller is to program the descriptor blocks of PMC from MicroBlaze and to provide BRAM access to PMC. The Scratch Pad controller is connected to MicroBlaze through the PLB and shares the descriptor registers of PMC. When the Scratch Pad controller is ready, the programmer can read and write data from and to physical memory. After a transfer completes, an interrupt signal is generated, which indicates that the PMC is ready for the next send or receive. MicroBlaze uses the functions below to configure the PMC registers.

    pmc_descriptor(dst_add, src_add, cmd, stream, stride);
    pmc_send();
    pmc_receive();

The parameters of the device drivers are memory mapped and are shown in Table 4.

4.3 MicroBlaze with PMC and Reprogrammable Hardware Accelerator

A hardware-accelerator-based architecture is proposed, shown in Figure 7, Approach B. In this approach the Scratch Pad Controller is programmed by MicroBlaze, and the thresholding application runs in the hardware accelerator. The benefits of this approach are:
- The processor's computation load is reduced.
- The delay of accessing Scratchpad memory via the PLB is removed.

4.4 Results and Comparison

The PMC is programmed to access a 2-D matrix (256x256x8). MicroBlaze programs 256 PMC descriptors, each of which transfers 256 bytes of data. XPS (Xilinx Platform Studio) [13] is used to configure and build the hardware for the Virtex-5 ML505 board with an XC5VLX110T FPGA.
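Programming one descriptor per image row, as described above, could look roughly like the following C sketch. Only the register offsets are taken from Table 4; everything else is an assumption made for illustration: the struct pmc_regs_t, the helper names pmc_descriptor_model and pmc_program_row, the command encoding PMC_CMD_STREAM_READ, the choice to express the stream length in 32-bit words, and a byte stride of 4 for contiguous words. The peripheral base address is not given in the paper, so the helpers take a pointer to the register block instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the register block in Table 4 (32 registers). */
typedef struct {
    volatile uint32_t dst_add;      /* 0x0000 buffer memory start address    */
    volatile uint32_t src_add;      /* 0x0004 physical memory start address  */
    volatile uint32_t cmd;          /* 0x0008 single or stream read/write    */
    volatile uint32_t stream;       /* 0x000c stream size                    */
    volatile uint32_t stride;       /* 0x0010 stride between addresses       */
    volatile uint32_t ready;        /* 0x0014 PMC ready (read-only)          */
    volatile uint32_t intr1;        /* 0x0018 soft interrupt 1 (read-only)   */
    volatile uint32_t intr2;        /* 0x001c soft interrupt 2 (read-only)   */
    volatile uint32_t reserved[24]; /* registers 8 to 31                     */
} pmc_regs_t;

#define IMG_COLS            256u  /* bytes per image row (from the text)     */
#define PMC_CMD_STREAM_READ 1u    /* assumed encoding, not from the paper    */

/* Write the five programmable descriptor fields. */
void pmc_descriptor_model(pmc_regs_t *regs, uint32_t dst_add, uint32_t src_add,
                          uint32_t cmd, uint32_t stream, uint32_t stride)
{
    regs->dst_add = dst_add;
    regs->src_add = src_add;
    regs->cmd     = cmd;
    regs->stream  = stream;
    regs->stride  = stride;
}

/* Describe one 256-byte image row as a stream of 64 contiguous words. */
void pmc_program_row(pmc_regs_t *regs, uint32_t image_base,
                     uint32_t buffer_base, uint32_t row)
{
    pmc_descriptor_model(regs, buffer_base, image_base + row * IMG_COLS,
                         PMC_CMD_STREAM_READ, IMG_COLS / 4u, 4u);
}
```

Calling pmc_program_row for rows 0 to 255 would then correspond to the 256 descriptors mentioned above, with each call followed by whatever send or receive step the driver requires.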
Results of the three approaches (MicroBlaze Stand-Alone, MicroBlaze with PMC, and MicroBlaze with PMC and Reprogrammable Hardware Accelerator) are shown in Table 5. For the MicroBlaze Stand-Alone architecture, the column DDR2 Access contains the clocks taken to read and write the bytes of the image from and to physical memory. MicroBlaze takes two cycles to compute the threshold for each pixel. For the MicroBlaze with PMC architecture, DDR2 Access gives the clocks taken to read and write the image between physical memory and BRAM. BRAM to MicroBlaze gives the number of clocks taken by MicroBlaze to access data in BRAM over the PLB. The PLB also serves accesses to instruction memory and data memory, which increases the number of clocks needed to access BRAM.

In the pipelined architecture, MicroBlaze programs PMC to work in parallel. After PMC has written the first stream (an image row from DDR2 to BRAM), MicroBlaze starts processing it. During processing, PMC prefetches the next stream, which hides the DDR2 memory access time for subsequent accesses. In this case the processor-to-BRAM memory access is dominant.

In the MicroBlaze with PMC and Reprogrammable Hardware Accelerator approach, the hardware accelerator accesses data from physical memory. Computation is pipelined with the input data stream, and the computed data is saved in the BRAM buffer. The hardware accelerator has a direct connection to BRAM, so only the physical memory access time is dominant in this technique.

5 Related Work

A number of DMA memory controllers are available from research and industry. The XPS Channelized DMA Controller [14] provides simple Direct Memory Access (DMA) services to peripherals and memory devices on the PLB. Because the DMA resides on the PLB, peripherals working with it are forced to follow the PLB protocol. The Lattice Semiconductor Scatter-Gather Direct Memory Access Controller IP [15] and the Altera Scatter-Gather DMA Controller core [16] transfer data from one non-contiguous block of memory to another by means of a series of smaller contiguous transfers. Both cores read a series of descriptors that specify the data to be transferred. The transfers use unit strides, which are not suitable for accessing complex memory patterns. The Impulse memory controller [17] supports application-specific optimizations through configurable physical address remapping. By remapping physical addresses, applications can control the data to be accessed and cached. The Impulse controller works under the authority of the operating system, which manages physical addresses.

6 Conclusion

This work attacks the memory-processor data access bottleneck by proposing a programmable pattern-based memory controller (PMC). The PMC can work with any SoC architecture and stand-alone HPC kernel without modifications to the microprocessor system.
The controller can be programmed from C programs using a special-purpose interface based on send() and receive() calls. Currently, PMC uses scatter-gather commands to implement higher-level patterns, but as future work we are considering implementing more patterns directly in hardware. One kind of pattern that we will consider is automatic tiling. The PMC system provides support for strided access and scatter/gather, which eliminates the overhead of arranging and gathering data on the microprocessor unit. The PMC achieves a 27x speed-up with a programmable hardware accelerator in a PMC-based SoC environment while executing a thresholding application. This shows that the PMC-based architecture is useful for high-performance hardware accelerators. A wrapper module can be used to configure the descriptors of PMC so that it connects directly to hardware accelerators programmed in high-level languages, such as those generated by ROCCC.

Table 5: System on Chip Results with Different Approaches
Approach | Bytes of Image | DDR2 Access Read/Write clocks | BRAM to MicroBlaze Read/Write clocks | Computation (Threshold) clocks | Total clocks
Stand-alone Processor | | | | |
PMC with MicroBlaze | | | | |
Pipelined | | | | |
PMC with Hardware Accelerator | | | | |

References

1. Texas Instruments OMAP (Open Multimedia Application Platform). [Online]. Available: &navigationId=11988&contentId=4638#omap4
2. Advanced Micro Devices, Inc. All rights reserved, AMD Geode LX Processors Data Book, February
3. Keith DeHaven, Extensible Processing Platform Ideal Solution for a Wide Range of Embedded Systems, April 27, [Online]. Available: wp369 Extensible Processing Platform Overview.pdf
4. The AMD Fusion Family of APUs. [Online]. Available:
5. Amir Roth, Gurindar S. Sohi, Effective jump-pointer prefetching for linked data structures, ISCA 99 Proceedings of the 26th annual international symposium on Computer architecture, vol. Volume 27, 2, May [Online]. Available:
6. Keith I. Farkas, Norman P. Jouppi, and Paul Chow, How Useful Are Non-blocking Loads, Stream Buffers, and Speculative Execution in Multiple Issue Processors? High-Performance Computer Architecture, Proceedings., First IEEE, pp ,
7. Norm Jouppi, Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers, ISBN: , pp , May [Online]. Available: all.jsp?arnumber=
8. Riverside Optimizing Compiler for Configurable Computing (ROCCC). [Online]. Available:
9. Xilinx, Memory Interface Solutions, December 2,
10. C. Gou, G. Kuzmanov, and G. N. Gaydadjiev, Sams multi-layout memory: providing multiple views of data to boost simd performance, pp , [Online]. Available:
11. Xilinx Integrated Software Environment Design Suite. [Online]. Available: http: //
12. New York Criminal Defense Blawg. [Online]. Available: newyorkcriminaldefenseblawg.com/wp-content/uploads/2010/10/fingerprint.jpg
13. Xilinx Platform Studio. [Online]. Available: documentation/dt edk edk11-1.htm
14. Xilinx, Channelized Direct Memory Access and Scatter Gather, February 25, [Online]. Available: dma sg.pdf
15. Lattice Semiconductor Corporation, Scatter-Gather Direct Memory Access Controller IP Core Users Guide, October [Online]. Available: view document.cfm?document id=
16. Altera Corporation, Scatter-Gather DMA Controller Core, Quartus II 9.1, November [Online]. Available: qii55003.pdf
17. John Carter, Wilson Hsieh, Leigh Stoller, Mark Swanson, Lixin Zhang, Erik Brunvand, Al Davis, Chen-Chi Kuo, Ravindra Kuramkote, Michael Parker, Lambert Schaelicke, and Terry Tateyama, Impulse: Building a Smarter Memory Controller, Fifth International Symposium on High Performance Computer Architecture (HPCA-5), pp , January [Online]. Available:


More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

Microprocessor Soft-Cores: An Evaluation of Design Methods and Concepts on FPGAs

Microprocessor Soft-Cores: An Evaluation of Design Methods and Concepts on FPGAs Microprocessor Soft-Cores: An Evaluation of Design Methods and Concepts on FPGAs Pieter Anemaet (1159100), Thijs van As (1143840) {P.A.M.Anemaet, T.vanAs}@student.tudelft.nl Computer Architecture (Special

More information

The RM9150 and the Fast Device Bus High Speed Interconnect

The RM9150 and the Fast Device Bus High Speed Interconnect The RM9150 and the Fast Device High Speed Interconnect John R. Kinsel Principal Engineer www.pmc -sierra.com 1 August 2004 Agenda CPU-based SOC Design Challenges Fast Device (FDB) Overview Generic Device

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information

Embedded Real-Time Video Processing System on FPGA

Embedded Real-Time Video Processing System on FPGA Embedded Real-Time Video Processing System on FPGA Yahia Said 1, Taoufik Saidani 1, Fethi Smach 2, Mohamed Atri 1, and Hichem Snoussi 3 1 Laboratory of Electronics and Microelectronics (EμE), Faculty of

More information

Core Facts. Documentation. Design File Formats. Simulation Tool Used Designed for interfacing configurable (32 or 64

Core Facts. Documentation. Design File Formats. Simulation Tool Used Designed for interfacing configurable (32 or 64 logilens Camera Lens Distortion Corrector March 5, 2009 Product Specification Xylon d.o.o. Fallerovo setaliste 22 10000 Zagreb, Croatia Phone: +385 1 368 00 26 Fax: +385 1 365 51 67 E-mail: info@logicbricks.com

More information

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling

More information

Multi MicroBlaze System for Parallel Computing

Multi MicroBlaze System for Parallel Computing Multi MicroBlaze System for Parallel Computing P.HUERTA, J.CASTILLO, J.I.MÁRTINEZ, V.LÓPEZ HW/SW Codesign Group Universidad Rey Juan Carlos 28933 Móstoles, Madrid SPAIN Abstract: - Embedded systems need

More information

Designing Embedded AXI Based Direct Memory Access System

Designing Embedded AXI Based Direct Memory Access System Designing Embedded AXI Based Direct Memory Access System Mazin Rejab Khalil 1, Rafal Taha Mahmood 2 1 Assistant Professor, Computer Engineering, Technical College, Mosul, Iraq 2 MA Student Research Stage,

More information

Digital Integrated Circuits

Digital Integrated Circuits Digital Integrated Circuits Lecture 9 Jaeyong Chung Robust Systems Laboratory Incheon National University DIGITAL DESIGN FLOW Chung EPC6055 2 FPGA vs. ASIC FPGA (A programmable Logic Device) Faster time-to-market

More information

15-740/ Computer Architecture, Fall 2011 Midterm Exam II

15-740/ Computer Architecture, Fall 2011 Midterm Exam II 15-740/18-740 Computer Architecture, Fall 2011 Midterm Exam II Instructor: Onur Mutlu Teaching Assistants: Justin Meza, Yoongu Kim Date: December 2, 2011 Name: Instructions: Problem I (69 points) : Problem

More information

EE108B Lecture 17 I/O Buses and Interfacing to CPU. Christos Kozyrakis Stanford University

EE108B Lecture 17 I/O Buses and Interfacing to CPU. Christos Kozyrakis Stanford University EE108B Lecture 17 I/O Buses and Interfacing to CPU Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements Remaining deliverables PA2.2. today HW4 on 3/13 Lab4 on 3/19

More information

Design of a Network Camera with an FPGA

Design of a Network Camera with an FPGA Design of a Network Camera with an FPGA Tiago Filipe Abreu Moura Guedes INESC-ID, Instituto Superior Técnico guedes210@netcabo.pt Abstract This paper describes the development and the implementation of

More information

LogiCORE IP Mailbox (v1.00a)

LogiCORE IP Mailbox (v1.00a) DS776 September 21, 2010 Introduction In a multiprocessor environment, the processors need to communicate data with each other. The easiest method is to set up inter-processor communication through a mailbox.

More information

Trends in the Infrastructure of Computing

Trends in the Infrastructure of Computing Trends in the Infrastructure of Computing CSCE 9: Computing in the Modern World Dr. Jason D. Bakos My Questions How do computer processors work? Why do computer processors get faster over time? How much

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory Embedded Systems 8. Hardware Components Lothar Thiele Computer Engineering and Networks Laboratory Do you Remember? 8 2 8 3 High Level Physical View 8 4 High Level Physical View 8 5 Implementation Alternatives

More information

A Memory System Design Framework: Creating Smart Memories

A Memory System Design Framework: Creating Smart Memories A Memory System Design Framework: Creating Smart Memories Amin Firoozshahian, Alex Solomatnikov Hicamp Systems Inc. Ofer Shacham, Zain Asgar, http://www.c2s2.org Stephen Richardson, Christos Kozyrakis,

More information

DESIGN AND IMPLEMENTATION OF SDR SDRAM CONTROLLER IN VHDL. Shruti Hathwalia* 1, Meenakshi Yadav 2

DESIGN AND IMPLEMENTATION OF SDR SDRAM CONTROLLER IN VHDL. Shruti Hathwalia* 1, Meenakshi Yadav 2 ISSN 2277-2685 IJESR/November 2014/ Vol-4/Issue-11/799-807 Shruti Hathwalia et al./ International Journal of Engineering & Science Research DESIGN AND IMPLEMENTATION OF SDR SDRAM CONTROLLER IN VHDL ABSTRACT

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

5. ReAl Systems on Silicon

5. ReAl Systems on Silicon THE REAL COMPUTER ARCHITECTURE PRELIMINARY DESCRIPTION 69 5. ReAl Systems on Silicon Programmable and application-specific integrated circuits This chapter illustrates how resource arrays can be incorporated

More information

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning By: Roman Lysecky and Frank Vahid Presented By: Anton Kiriwas Disclaimer This specific

More information

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models Jason Andrews Agenda System Performance Analysis IP Configuration System Creation Methodology: Create,

More information

System Verification of Hardware Optimization Based on Edge Detection

System Verification of Hardware Optimization Based on Edge Detection Circuits and Systems, 2013, 4, 293-298 http://dx.doi.org/10.4236/cs.2013.43040 Published Online July 2013 (http://www.scirp.org/journal/cs) System Verification of Hardware Optimization Based on Edge Detection

More information

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC 1 Pawar Ruchira Pradeep M. E, E&TC Signal Processing, Dr. D Y Patil School of engineering, Ambi, Pune Email: 1 ruchira4391@gmail.com

More information

LogiCORE IP AXI Video Direct Memory Access v5.00.a

LogiCORE IP AXI Video Direct Memory Access v5.00.a LogiCORE IP AXI Video Direct Memory Access v5.00.a Product Guide Table of Contents Chapter 1: Overview Feature Summary............................................................ 9 Applications................................................................

More information

A memcpy Hardware Accelerator Solution for Non Cache-line Aligned Copies

A memcpy Hardware Accelerator Solution for Non Cache-line Aligned Copies A memcpy Hardware Accelerator Solution for Non Cache-line Aligned Copies Filipa Duarte and Stephan Wong Computer Engineering Laboratory Delft University of Technology Abstract In this paper, we present

More information

Core Facts. Documentation. Encrypted VHDL Fallerovo setaliste Zagreb, Croatia. Verification. Reference Designs &

Core Facts. Documentation. Encrypted VHDL Fallerovo setaliste Zagreb, Croatia. Verification. Reference Designs & logibmp Bitmap 2.5D Graphics Accelerator March 27, 2009 Product Specification Core Facts Xylon d.o.o. Documentation User Guide Design File Formats Encrypted VHDL Fallerovo setaliste 22 Constraints Files

More information

«Real Time Embedded systems» Multi Masters Systems

«Real Time Embedded systems» Multi Masters Systems «Real Time Embedded systems» Multi Masters Systems rene.beuchat@epfl.ch LAP/ISIM/IC/EPFL Chargé de cours rene.beuchat@hesge.ch LSN/hepia Prof. HES 1 Multi Master on Chip On a System On Chip, Master can

More information

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, Sarma Vrudhula School of Electrical, Computer and Energy Engineering School

More information

Efficiency and memory footprint of Xilkernel for the Microblaze soft processor

Efficiency and memory footprint of Xilkernel for the Microblaze soft processor Efficiency and memory footprint of Xilkernel for the Microblaze soft processor Dariusz Caban, Institute of Informatics, Gliwice, Poland - June 18, 2014 The use of a real-time multitasking kernel simplifies

More information

Computer Memory. Textbook: Chapter 1

Computer Memory. Textbook: Chapter 1 Computer Memory Textbook: Chapter 1 ARM Cortex-M4 User Guide (Section 2.2 Memory Model) STM32F4xx Technical Reference Manual: Chapter 2 Memory and Bus Architecture Chapter 3 Flash Memory Chapter 36 Flexible

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

The S6000 Family of Processors

The S6000 Family of Processors The S6000 Family of Processors Today s Design Challenges The advent of software configurable processors In recent years, the widespread adoption of digital technologies has revolutionized the way in which

More information

FPGA PAINT. Design of Embedded Systems, Advanced Course. Faculty of Engineering (LTH) Lund University, Sweden. November 30 th, 2010

FPGA PAINT. Design of Embedded Systems, Advanced Course. Faculty of Engineering (LTH) Lund University, Sweden. November 30 th, 2010 FPGA PAINT Design of Embedded Systems, Advanced Course Faculty of Engineering (LTH) Lund University, Sweden November 30 th, 2010 Course Responsible: Flavius Gruian Viktor Do (dt06vd2@student.lth.se) Zhang

More information

RUN-TIME PARTIAL RECONFIGURATION SPEED INVESTIGATION AND ARCHITECTURAL DESIGN SPACE EXPLORATION

RUN-TIME PARTIAL RECONFIGURATION SPEED INVESTIGATION AND ARCHITECTURAL DESIGN SPACE EXPLORATION RUN-TIME PARTIAL RECONFIGURATION SPEED INVESTIGATION AND ARCHITECTURAL DESIGN SPACE EXPLORATION Ming Liu, Wolfgang Kuehn, Zhonghai Lu, Axel Jantsch II. Physics Institute Dept. of Electronic, Computer and

More information

A software platform to support dynamically reconfigurable Systems-on-Chip under the GNU/Linux operating system

A software platform to support dynamically reconfigurable Systems-on-Chip under the GNU/Linux operating system A software platform to support dynamically reconfigurable Systems-on-Chip under the GNU/Linux operating system 26th July 2005 Alberto Donato donato@elet.polimi.it Relatore: Prof. Fabrizio Ferrandi Correlatore:

More information

TMS320C6678 Memory Access Performance

TMS320C6678 Memory Access Performance Application Report Lit. Number April 2011 TMS320C6678 Memory Access Performance Brighton Feng Communication Infrastructure ABSTRACT The TMS320C6678 has eight C66x cores, runs at 1GHz, each of them has

More information

Keystone Architecture Inter-core Data Exchange

Keystone Architecture Inter-core Data Exchange Application Report Lit. Number November 2011 Keystone Architecture Inter-core Data Exchange Brighton Feng Vincent Han Communication Infrastructure ABSTRACT This application note introduces various methods

More information

2. System Interconnect Fabric for Memory-Mapped Interfaces

2. System Interconnect Fabric for Memory-Mapped Interfaces 2. System Interconnect Fabric for Memory-Mapped Interfaces QII54003-8.1.0 Introduction The system interconnect fabric for memory-mapped interfaces is a high-bandwidth interconnect structure for connecting

More information

A Hardware Cache memcpy Accelerator

A Hardware Cache memcpy Accelerator A Hardware memcpy Accelerator Stephan Wong, Filipa Duarte, and Stamatis Vassiliadis Computer Engineering, Delft University of Technology Mekelweg 4, 2628 CD Delft, The Netherlands {J.S.S.M.Wong, F.Duarte,

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture

of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture Enhancement of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture Sushmita Bilani Department of Electronics and Communication (Embedded System & VLSI Design),

More information

The CoreConnect Bus Architecture

The CoreConnect Bus Architecture The CoreConnect Bus Architecture Recent advances in silicon densities now allow for the integration of numerous functions onto a single silicon chip. With this increased density, peripherals formerly attached

More information

RiceNIC. Prototyping Network Interfaces. Jeffrey Shafer Scott Rixner

RiceNIC. Prototyping Network Interfaces. Jeffrey Shafer Scott Rixner RiceNIC Prototyping Network Interfaces Jeffrey Shafer Scott Rixner RiceNIC Overview Gigabit Ethernet Network Interface Card RiceNIC - Prototyping Network Interfaces 2 RiceNIC Overview Reconfigurable and

More information

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA L2: FPGA HARDWARE 18-545: ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA 18-545: FALL 2014 2 Admin stuff Project Proposals happen on Monday Be prepared to give an in-class presentation Lab 1 is

More information

Parallel graph traversal for FPGA

Parallel graph traversal for FPGA LETTER IEICE Electronics Express, Vol.11, No.7, 1 6 Parallel graph traversal for FPGA Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,

More information

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Who am I? Education Master of Technology, NTNU, 2007 PhD, NTNU, 2010. Title: «Managing Shared Resources in Chip Multiprocessor Memory

More information

Fast dynamic and partial reconfiguration Data Path

Fast dynamic and partial reconfiguration Data Path Fast dynamic and partial reconfiguration Data Path with low Michael Hübner 1, Diana Göhringer 2, Juanjo Noguera 3, Jürgen Becker 1 1 Karlsruhe Institute t of Technology (KIT), Germany 2 Fraunhofer IOSB,

More information

PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing

PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing Py: Yet Another Implementation of Memory Architecture for Modern FPGA-based Computing Shinya Kenji Kise Takamaeda-Yamazaki Tokyo Institute of Technology Tokyo Institute of Technology Tokyo, Japan 152-8552

More information

Getting Started Guide with AXM-A30

Getting Started Guide with AXM-A30 Series PMC-VFX70 Virtex-5 Based FPGA PMC Module Getting Started Guide with AXM-A30 ACROMAG INCORPORATED Tel: (248) 295-0310 30765 South Wixom Road Fax: (248) 624-9234 P.O. BOX 437 Wixom, MI 48393-7037

More information

LogiCORE IP Video Direct Memory Access v1.1

LogiCORE IP Video Direct Memory Access v1.1 LogiCORE IP Video Direct Memory Access v1.1 DS730 September 21, 2010 Introduction The Xilinx Video Direct Memory Access (Video DMA) LogiCORE IP allows video cores to access external memory via the Video

More information

Simplify System Complexity

Simplify System Complexity 1 2 Simplify System Complexity With the new high-performance CompactRIO controller Arun Veeramani Senior Program Manager National Instruments NI CompactRIO The Worlds Only Software Designed Controller

More information

Compiler Manipulation of Stream Descriptors for Data Access Optimization

Compiler Manipulation of Stream Descriptors for Data Access Optimization Compiler Manipulation of Stream Descriptors for Data Access Optimization Abelardo López-Lagunas ITESM Campus Toluca abelardo.lopez@itesm.mx Abstract Efficient data movement is one of the key attributes

More information

The CompSOC Design Flow for Virtual Execution Platforms

The CompSOC Design Flow for Virtual Execution Platforms NEST COBRA CA104 The CompSOC Design Flow for Virtual Execution Platforms FPGAWorld 10-09-2013 Sven Goossens*, Benny Akesson*, Martijn Koedam*, Ashkan Beyranvand Nejad, Andrew Nelson, Kees Goossens* * Introduction

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Implementation of Ethernet, Aurora and their Integrated module for High Speed Serial Data Transmission using Xilinx EDK on Virtex-5 FPGA

Implementation of Ethernet, Aurora and their Integrated module for High Speed Serial Data Transmission using Xilinx EDK on Virtex-5 FPGA Implementation of Ethernet, Aurora and their Integrated module for High Speed Serial Data Transmission using Xilinx EDK on Virtex-5 FPGA Chaitanya Kumar N.V.N.S 1, Mir Mohammed Ali 2 1, 2 Mahaveer Institute

More information

April 6, 2010 Data Sheet Version: v2.05. Support 16 times Support provided by Xylon

April 6, 2010 Data Sheet Version: v2.05. Support 16 times Support provided by Xylon logiwin Versatile Video Controller April 6, 2010 Data Sheet Version: v2.05 Xylon d.o.o. Core Facts Fallerovo setaliste 22 10000 Zagreb, Croatia Provided with Core Phone: +385 1 368 00 26 Fax: +385 1 365

More information

GPU Fundamentals Jeff Larkin November 14, 2016

GPU Fundamentals Jeff Larkin November 14, 2016 GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate

More information

Sri Vidya College of Engineering and Technology. EC6703 Embedded and Real Time Systems Unit IV Page 1.

Sri Vidya College of Engineering and Technology. EC6703 Embedded and Real Time Systems Unit IV Page 1. Sri Vidya College of Engineering and Technology ERTS Course Material EC6703 Embedded and Real Time Systems Page 1 Sri Vidya College of Engineering and Technology ERTS Course Material EC6703 Embedded and

More information

Zynq-7000 All Programmable SoC Product Overview

Zynq-7000 All Programmable SoC Product Overview Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform

More information

Simplify System Complexity

Simplify System Complexity Simplify System Complexity With the new high-performance CompactRIO controller Fanie Coetzer Field Sales Engineer Northern South Africa 2 3 New control system CompactPCI MMI/Sequencing/Logging FieldPoint

More information

SHA3 Core Specification. Author: Homer Hsing

SHA3 Core Specification. Author: Homer Hsing SHA3 Core Specification Author: Homer Hsing homer.hsing@gmail.com Rev. 0.1 January 29, 2013 This page has been intentionally left blank. www.opencores.org Rev 0.1 ii Rev. Date Author Description 0.1 01/29/2013

More information

Hardware Implementation of TRaX Architecture

Hardware Implementation of TRaX Architecture Hardware Implementation of TRaX Architecture Thesis Project Proposal Tim George I. Project Summery The hardware ray tracing group at the University of Utah has designed an architecture for rendering graphics

More information

3.1 Description of Microprocessor. 3.2 History of Microprocessor

3.1 Description of Microprocessor. 3.2 History of Microprocessor 3.0 MAIN CONTENT 3.1 Description of Microprocessor The brain or engine of the PC is the processor (sometimes called microprocessor), or central processing unit (CPU). The CPU performs the system s calculating

More information

BlazePPS (Blaze Packet Processing System) CSEE W4840 Project Design

BlazePPS (Blaze Packet Processing System) CSEE W4840 Project Design BlazePPS (Blaze Packet Processing System) CSEE W4840 Project Design Valeh Valiollahpour Amiri (vv2252) Christopher Campbell (cc3769) Yuanpei Zhang (yz2727) Sheng Qian ( sq2168) March 26, 2015 I) Hardware

More information

ISSN Vol.05,Issue.09, September-2017, Pages:

ISSN Vol.05,Issue.09, September-2017, Pages: WWW.IJITECH.ORG ISSN 2321-8665 Vol.05,Issue.09, September-2017, Pages:1693-1697 AJJAM PUSHPA 1, C. H. RAMA MOHAN 2 1 PG Scholar, Dept of ECE(DECS), Shirdi Sai Institute of Science and Technology, Anantapuramu,

More information

Portland State University ECE 588/688. Cray-1 and Cray T3E

Portland State University ECE 588/688. Cray-1 and Cray T3E Portland State University ECE 588/688 Cray-1 and Cray T3E Copyright by Alaa Alameldeen 2014 Cray-1 A successful Vector processor from the 1970s Vector instructions are examples of SIMD Contains vector

More information

Memory Systems for Embedded Applications. Chapter 4 (Sections )

Memory Systems for Embedded Applications. Chapter 4 (Sections ) Memory Systems for Embedded Applications Chapter 4 (Sections 4.1-4.4) 1 Platform components CPUs. Interconnect buses. Memory. Input/output devices. Implementations: System-on-Chip (SoC) vs. Multi-Chip

More information