C-Based Hardware Design Platform for Dynamically Reconfigurable Processor
|
|
- Abigail Barber
- 6 years ago
- Views:
Transcription
1 C-Based Hardware Design Platform for Dynamically Reconfigurable Processor September 22 nd, 2005 IPFlex Inc.
2 Agenda Merits of C-Based hardware design Hardware enabling C-Based hardware design DAPDNA-FW II Design Tool Programming Example and Performance < 2 >
3 Short development iteration while achieving high performance Until Now IPFlex s Technology Enables Development Idea Development Idea Algorithm & System Design Algorithm & System Design Long re-design iterations takes 80% of system dev. time Logic Design Physical Design HW Test Chip SW Long Development Cycle and Short PLC leads to Risk SW Design = Product HW Design Direct and Timely Link between Idea and Product Reduced Development Cycle = Reduced Risk Product < 3 >
4 HW and SW enabling short iteration cycle Dynamically Reconfigurable Hardware Hardware functions reconfigured in a single clock Software to Silicon TM Hardware design with C-like language < 4 >
5 Agenda Merits of C-Based hardware design Hardware enabling C-Based hardware design DAPDNA-FW II Design Tool Programming Example and Performance < 5 >
6 RISC core and dynamically reconfigurable data flow machine 166MHz bit processing elements 512KB Internal RAM Direct I/O DDR SDRAM Processor chip, design tool, evaluation boards available < 6 >
7 RISC core and dynamically reconfigurable data flow machine PE Matrix < 7 >
8 DAP RISC and DNA Parallel Data Processing Memory.L8: cmpge t3,8 br/t.l15.l9: mov t8,cos_table1 mov t3,cos_table2 addv t2,t8,t2 mov cos_table3,t9 lsl t7,t3,3 ld t8,[fp+4] addv t2,t8,t2 IDU EU REG lsl t7,t7,1 : : : PE matrix Memory addv t8,t8,t11 addv t7,t7,t10 addv t4,t4,a1.l12: mov t9,1 addv t2,t2,t9 br.l10.l13:.l14: mov t2,1 addv t3,t3,t2 br.l8c Config Mem < 8 >
9 Heterogeneous PE Matrix PE Matrix PE EXE EXM DLE RAM # Functionality 32 bit 2 inputs / 1 output execution elements 16bitx16bit->32bit multipliers 32 bit 2 inputs / 1 output delay element DNA Internal memory element 16KB 32 =512KB C16E C32E LDx STx Total Address generators I/O buffers < 9 >
10 Scalability with DNA Direct I/O 16 DNAs at 16Gbps flow-thru (Total 32 Gbps I/O) DRAM DRAM DRAM #0 Bus Switch #1 Bus Switch #2 Bus Switch Ext. Device Direct IO DNA Direct IO Direct IO DNA Direct IO Direct IO DNA Direct IO Ext. Device Synchronize with the external clock 166MHz, 96 bit Interconnection between DNAs equivalent to DLE between segments < 10 >
11 DAPDNA architecture: solution at any system size System Example FPGA/ ASIC FPGA/ ASIC FPGA/ ASIC FPGA/ ASIC 4 Pipelined Processes Dynamic Reconfiguration Scalability Small scale Mid/Large size Ultra large Function 4 Function 3 Function 2 Function 1 One DNA Plane One DAPDNA Processor Up to 16 DAPDNA Processors < 11 >
12 Agenda Merits of C-Based hardware design Hardware enabling C-Based hardware design DAPDNA-FW II Design Tool Programming Example and Performance < 12 >
13 Data Flow C compiler: A part of Software to Silicon design tool Design Specify : Tools included in DAPDNA-FW II DNA P.A. C for DAP System Specifications Profiler Wrapper for DNA Configuration Algorithm Design C Source Code Partitioning DFC Compiler C for DNA DFC Compiler DNA Configuration Block Diagram DNA Designer MATLAB/Simulink DNA Blockset DAP Compiler DNA Compiler (Dynamic Reconfiguration, Place & Route) Compile DAP DAPDNA Co-Simulator / Emulator DNA DAPDNA Object < 13 >
14 DFC Compiler: HW configurations from C language C based Data Flow Description DNA Configuration DNA Control code for DAP DFC Compiler < 14 >
15 HW configurations written in DFC as functions Application (DAP C code) void top_level() { fir(d,h,r); } Call DNA Control Code void fir (int d[ ], int h[ ], int r[ ]) { dna_cfgram_load3(bank3, cfg_fir); dna_config(0,bank3); dna_run(); Load config data Bank Switch DNA Run DNA Hardware DNA Config mem dna_stop(); } DNA Stop DNA PE Matrix < 15 >
16 HW configurations switched dynamically in single clock (6 nano second) Application (DAP C code)... while (1) {... VLC_Decoder(Vector,Quant_dat,Input); Inv_Quantize(dct_dat, quant_dat); Inv_DCT(mc_dat, dct_dat); Motion_Comp(mcu_dat,mc_dat); Color_Conv(mcu_dat, mcu_dat);... } Color Convert Motion Comp Inv DCT Dynam ic Recon. (Partition in Tim e) DNA Hardware Inv Quantization VLC Decoder < 16 >
17 Extension on ANSI C for data flow systems Delayline: delayed data signals delayline d; a d[0] d = a; z = d[0] + d[1] + d[2] + d[3]; Z -1 d[1] Z -1 d[2] + z SEQ: Static code replication Z -1 d[3] int h[] = {1,2,3}; 1 z[0] int z[]; seq (i=0; i<3; i++) { a 2 z[1] z[i] = a * h[i]; } 3 z[2] < 17 >
18 Advantages of DAPDNA architecture Short development period - Rapid prototyping - Co-Simulation between Processor (DAP RISC) and Data Flow Machine (DNA PE Matrix) - Deterministic placement / guaranteed clock 166Mhz Flexibility to change HW according to the application in demand - Dynamic reconfiguration in single clock Performance close to custom device - Deep pipelining and massive parallelization processing elements < 18 >
19 Agenda Merits of C-Based hardware design Hardware enabling C-Based hardware design DAPDNA-FW II Design Tool Programming Example and Performance < 19 >
20 Using dynamic reconfigurability to change image filters DAPDNA-EB5 Evaluation Board DAPDNA-FW II Debug Box DAPDNA-2 Various image filters written in DFC and compiled Laplacian Binary Average Etc. Filters run on DAPDNA-2 using dynamic reconfigurability of hardware < 20 >
21 DAP calls DNA configurations with simple function calls List of filters written in Data Flow c An Average3x3 filter being called in main.c as a function Main.c for DAP RISC < 21 >
22 DFC-described filters compiled into HW with a single click of build button Average3x3 filter described in DFC uses delayline and seq commends to take advantage of parallelization < 22 >
23 resulting in a logical representation of the filter with 32-bit processing elements Pipeline Depth Parallelization Data In Data Out < 23 >
24 as well as physical mapping on DNA Place & Route result Resource manager displays usage and power consumption < 24 >
25 DFC compiler result for image filters DAPDNA-2 (166 Mhz) Image Filters EXE EXM RAM DLE Parallel Performance M Pixel/sec Fps(800x600) 3x3 Average Usage % Binary Usage % Binary Image Edge Det. Usage % < 25 >
26 Optimized result with DNA Designer (Block Diagram) tool Image Filters Images Performance (M Pixels / s) Pentium 4* (3.06Ghz) DAPDNA-2 (166Mhz) Performance Multiple 3x3 Average x3 Laplacian (Edge Enhancement) x3 Laplacian (Edge Detection) Binary Binary Image Edge Detection * Pentium4@3.06GHz, Red Hat Linux 8.0, gcc 3.2, SIMD instruction not used, 24bit color, 800x600 pixels, 1000 Repetitions, Average performance (pixel/sec) < 26 >
27 FFT: Protein structure analysis Extrapolating molecular structure from X-ray diffraction image Extrapolation Algorithm (HIO) Reciprocal Space Real Space 2D-IFFT Adjust x1000 Adjust 2D-FFT Requires 1024 X bit FFTs Source: RIKEN < 27 >
28 DAPDNA-2 is 10X faster than P4 with 1/200 energy Time elapsed (msec) D-FFT (1024x1024) 360 msec P4 8mWh 27 msec DD2 Energy used (mwh) mWh 1000 iterations (HIO) Time elapsed Energy used 16 (min) (Wh) Wh min 1.5 min P4 DD2.1Wh Time 13 : 1 10 : 1 Energy 267 : : 1 * Assuming 80W for P4 and 4W for DAPDNA-2 Source: IPFlex experimentation < 28 >
29 10Gbps active firewall by NTT 10Gb/s Firewall System using Reconfigurable Processors Reconfigurable system technical committee < 29 >
30 10Gbps active firewall by NTT 10Gb/s Firewall System using Reconfigurable Processors Reconfigurable system technical committee < 30 >
31 IPFlex s technology / product roadmap General MPU like Pentium and DNA+AXION complement well Realizes extreme performance Granularity of operands Fine Coarse General MPU Low DNA AXION High Degree of computational parallelism Performance Low High DAPDNA-2 Pentium series Single-core Super chip Pentium Extreme edition IPFlex-enhanced multi-core Processor series Intel multi-core Pentium series Multi-core 2005 Year < 31 >
32 IPFlex s technology / product roadmap Today HW Concept Proof DAPDNA -HP First Commercial Product DAPDNA -2 High-volume Product DAPDNA -2C High-end Product DAPDNA -3A First Software-defined (Axion-based) Product DAPDNA 100MHz 144PEs 16KB RAM 166MHz 376PEs 512KB RAM Direct I/O DDR SDRAM 200MHz ~1000 Simple PEs 333~800MHz 2~4 DAP core 2048~3072PEs 666~1200MHz 4~8 DAP core 4096~6144PEs SW DAP Compiler DNA Compiler Emulator Simulator DAP Compiler DNA Compiler Emulator Simulator DAP Compiler DNA Compiler Emulator Simulator DAP Compiler DNA Compiler Emulator Simulator DAP Compiler DNA Compiler Emulator Simulator Auto Fitter DataFlow-C Compiler Auto Fitter DataFlow-C Compiler Auto Fitter DataFlow-C Compiler Auto Fitter DataFlow-C Compiler Dry-C Compiler Dry-C Compiler C++ Compiler JAVA Compiler Dry-C Compiler C++ Compiler JAVA Compiler System-C Compiler System-Verilog Compiler < 32 >
33 Dynamically Reconfigurable Processor and C-Based Hardware Design Platform by IPFlex Inc. < 33 >
Versal: AI Engine & Programming Environment
Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY
More informationAscenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005
Ascenium: A Continuously Reconfigurable Architecture Robert Mykland Founder/CTO robert@ascenium.com August, 2005 Ascenium: A Continuously Reconfigurable Processor Continuously reconfigurable approach provides:
More informationOrganic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design
Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design 1 Reconfigurable Computing Platforms 2 The Von Neumann Computer Principle In 1945, the
More informationNVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)
NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive) NVDLA NVIDIA DEEP LEARNING ACCELERATOR IP Core for deep learning part of NVIDIA s Xavier
More informationSoftware Defined Modem A commercial platform for wireless handsets
Software Defined Modem A commercial platform for wireless handsets Charles F Sturman VP Marketing June 22 nd ~ 24 th Brussels charles.stuman@cognovo.com www.cognovo.com Agenda SDM Separating hardware from
More informationVersal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm
Engineering Director, Xilinx Silicon Architecture Group Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm Presented By Kees Vissers Fellow February 25, FPGA 2019 Technology scaling
More informationAdvance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts
Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism
More informationThe extreme Adaptive DSP Solution to Sensor Data Processing
The extreme Adaptive DSP Solution to Sensor Data Processing Abstract Martin Vorbach PACT XPP Technologies Leo Mirkin Sky Computers, Inc. The new ISR mobile autonomous sensor platforms present a difficult
More informationCo-synthesis and Accelerator based Embedded System Design
Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer
More informationFlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com
FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers
More informationasoc: : A Scalable On-Chip Communication Architecture
asoc: : A Scalable On-Chip Communication Architecture Russell Tessier, Jian Liang,, Andrew Laffely,, and Wayne Burleson University of Massachusetts, Amherst Reconfigurable Computing Group Supported by
More informationGeneral Purpose Signal Processors
General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationIntroduction to Microprocessor
Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device
More informationA 1-GHz Configurable Processor Core MeP-h1
A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface
More informationExploring the Effects of Hyperthreading on Scientific Applications
Exploring the Effects of Hyperthreading on Scientific Applications by Kent Milfeld milfeld@tacc.utexas.edu edu Kent Milfeld, Chona Guiang, Avijit Purkayastha, Jay Boisseau TEXAS ADVANCED COMPUTING CENTER
More informationResource Efficiency of Scalable Processor Architectures for SDR-based Applications
Resource Efficiency of Scalable Processor Architectures for SDR-based Applications Thorsten Jungeblut 1, Johannes Ax 2, Gregor Sievers 2, Boris Hübener 2, Mario Porrmann 2, Ulrich Rückert 1 1 Cognitive
More informationThe Pentium II/III Processor Compiler on a Chip
The Pentium II/III Processor Compiler on a Chip Ronny Ronen Senior Principal Engineer Director of Architecture Research Intel Labs - Haifa Intel Corporation Tel Aviv University January 20, 2004 1 Agenda
More informationA Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013
A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company
More informationAn Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki
An Ultra High Performance Scalable DSP Family for Multimedia Hot Chips 17 August 2005 Stanford, CA Erik Machnicki Media Processing Challenges Increasing performance requirements Need for flexibility &
More informationPreliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths
Preliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths Y. Kodama, T. Odajima, M. Matsuda, M. Tsuji, J. Lee and M. Sato RIKEN AICS (Advanced Institute for Computational
More informationJim Keller. Digital Equipment Corp. Hudson MA
Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership
More informationEnergy Consumption Evaluation of an Adaptive Extensible Processor
Energy Consumption Evaluation of an Adaptive Extensible Processor Hamid Noori, Farhad Mehdipour, Maziar Goudarzi, Seiichiro Yamaguchi, Koji Inoue, and Kazuaki Murakami December 2007 Outline Introduction
More informationThe Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006
The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1 MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content
More informationSimplifying FPGA Design for SDR with a Network on Chip Architecture
Simplifying FPGA Design for SDR with a Network on Chip Architecture Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 RF NoC 3 Status and Conclusions USRP FPGA Capability Gen
More informationHow to validate your FPGA design using realworld
How to validate your FPGA design using realworld stimuli Daniel Clapham National Instruments ni.com Agenda Typical FPGA Design NIs approach to FPGA Brief intro into platform based approach RIO architecture
More informationDigital Signal Processor Core Technology
The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x
More informationVenezia: a Scalable Multicore Subsystem for Multimedia Applications
Venezia: a Scalable Multicore Subsystem for Multimedia Applications Takashi Miyamori Toshiba Corporation Outline Background Venezia Hardware Architecture Venezia Software Architecture Evaluation Chip and
More informationice40 SPRAM Usage Guide Technical Note
TN1314 Version 1.0 June 2016 Contents 1. Introduction... 3 2. Single Port RAM s... 3 2.1. User SB_SPRAM256KA... 3 2.2. SPRAM Port Definitions and GUI Options... 4 3. Power Save States for SPRAM... 6 3.1.
More informationDesign Once with Design Compiler FPGA
Design Once with Design Compiler FPGA The Best Solution for ASIC Prototyping Synopsys Inc. Agenda Prototyping Challenges Design Compiler FPGA Overview Flexibility in Design Using DC FPGA and Altera Devices
More informationTowards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing
Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing Walter Stechele, Stephan Herrmann, Andreas Herkersdorf Technische Universität München 80290 München Germany Walter.Stechele@ei.tum.de
More informationECE 471 Embedded Systems Lecture 2
ECE 471 Embedded Systems Lecture 2 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 3 September 2015 Announcements HW#1 will be posted today, due next Thursday. I will send out
More informationGraduate Institute of Electronics Engineering, NTU Advanced VLSI SOPC design flow
Advanced VLSI SOPC design flow Advisor: Speaker: ACCESS IC LAB What s SOC? IP classification IP reusable & benefit Outline SOPC solution on FPGA SOPC design flow pp. 2 What s SOC? Definition of SOC Advantage
More informationAn MPSoC for Energy-Efficient Database Query Processing
Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G. Fettweis An MPSoC for Energy-Efficient Database Query Processing TensilicaDay 2016 Sebastian Haas Emil Matúš Gerhard Fettweis 09.02.2016
More informationHigh Speed Multi-User ASIC/SoC Prototyping system
High Speed Multi-User ASIC/SoC Prototyping system Technical Resource Document Date: August 23, 2010 About GiDEL GiDEL has become one of the market leaders as a company that continuously provides cuttingedge
More informationSimplify System Complexity
Simplify System Complexity With the new high-performance CompactRIO controller Fanie Coetzer Field Sales Engineer Northern South Africa 2 3 New control system CompactPCI MMI/Sequencing/Logging FieldPoint
More informationUMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming
Systems Design and Programming Instructor: Professor Jim Plusquellic Text: Barry B. Brey, The Intel Microprocessors, 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium and Pentium Pro Processor Architecture,
More informationBuilding and Using the ATLAS Transactional Memory System
Building and Using the ATLAS Transactional Memory System Njuguna Njoroge, Sewook Wee, Jared Casper, Justin Burdick, Yuriy Teslyar, Christos Kozyrakis, Kunle Olukotun Computer Systems Laboratory Stanford
More informationSeveral Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining
Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the
More informationTHE NVIDIA DEEP LEARNING ACCELERATOR
THE NVIDIA DEEP LEARNING ACCELERATOR INTRODUCTION NVDLA NVIDIA Deep Learning Accelerator Developed as part of Xavier NVIDIA s SOC for autonomous driving applications Optimized for Convolutional Neural
More informationModel-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany
Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany 2013 The MathWorks, Inc. 1 Agenda Model-Based Design of embedded Systems Software Implementation
More informationHow to Write Fast Numerical Code
How to Write Fast Numerical Code Lecture: Memory hierarchy, locality, caches Instructor: Markus Püschel TA: Alen Stojanov, Georg Ofenbeck, Gagandeep Singh Organization Temporal and spatial locality Memory
More informationThe S6000 Family of Processors
The S6000 Family of Processors Today s Design Challenges The advent of software configurable processors In recent years, the widespread adoption of digital technologies has revolutionized the way in which
More informationINTRODUCTION TO FPGA ARCHITECTURE
3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)
More informationDigital Blocks Semiconductor IP
Digital Blocks Semiconductor IP General Description The Digital Blocks LCD Controller IP Core interfaces a video image in frame buffer memory via the AMBA 3.0 / 4.0 AXI Protocol Interconnect to a 4K and
More informationXPU A Programmable FPGA Accelerator for Diverse Workloads
XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for
More informationLecture 7: Introduction to Co-synthesis Algorithms
Design & Co-design of Embedded Systems Lecture 7: Introduction to Co-synthesis Algorithms Sharif University of Technology Computer Engineering Dept. Winter-Spring 2008 Mehdi Modarressi Topics for today
More informationsystems such as Linux (real time application interface Linux included). The unified 32-
1.0 INTRODUCTION The TC1130 is a highly integrated controller combining a Memory Management Unit (MMU) and a Floating Point Unit (FPU) on one chip. Thanks to the MMU, this member of the 32-bit TriCoreTM
More informationCONTACT: ,
S.N0 Project Title Year of publication of IEEE base paper 1 Design of a high security Sha-3 keccak algorithm 2012 2 Error correcting unordered codes for asynchronous communication 2012 3 Low power multipliers
More informationExploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture
Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture Ramadass Nagarajan Karthikeyan Sankaralingam Haiming Liu Changkyu Kim Jaehyuk Huh Doug Burger Stephen W. Keckler Charles R. Moore Computer
More informationCymric A Framework for Prototyping Near-Memory Architectures
A Framework for Prototyping Near-Memory Architectures Chad D. Kersey 1, Hyesoon Kim 2, Sudhakar Yalamanchili 1 The rest of the team: Nathan Braswell, Jemmy Gazhenko, Prasun Gera, Meghana Gupta, Hyojong
More informationSH-X3 Flexible SuperH Multi-core for High-performance and Low-power Embedded Systems
SH-X3 Flexible SuperH Multi-core for High-performance and Low-power Embedded Systems Shinichi Shibahara 1, Masashi Takada 2, Tatsuya Kamei 1, Kiyoshi Hayase 1, Yutaka Yoshida 1, Osamu Nishii 1, Toshihiro
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 22 Title: and Extended
More informationAn Infrastructural IP for Interactive MPEG-4 SoC Functional Verification
International Journal on Electrical Engineering and Informatics - Volume 1, Number 2, 2009 An Infrastructural IP for Interactive MPEG-4 SoC Functional Verification Trio Adiono 1, Hans G. Kerkhoff 2 & Hiroaki
More informationBasics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS
Basics DRAM ORGANIZATION DRAM Word Line Bit Line Storage element (capacitor) In/Out Buffers Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array Page 1 Basics BUS
More informationHigh Performance Embedded Applications. Raja Pillai Applications Engineering Specialist
High Performance Embedded Applications Raja Pillai Applications Engineering Specialist Agenda What is High Performance Embedded? NI s History in HPE FlexRIO Overview System architecture Adapter modules
More informationInterconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp
Interconnect Challenges in a Many Core Compute Environment Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Agenda Microprocessor general trends Implications Tradeoffs Summary
More informationAn Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware
An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware Tao Chen, Shreesha Srinath Christopher Batten, G. Edward Suh Computer Systems Laboratory School of Electrical
More informationSimXMD Simulation-based HW/SW Co-debugging for field-programmable Systems-on-Chip
SimXMD Simulation-based HW/SW Co-debugging for field-programmable Systems-on-Chip Ruediger Willenberg and Paul Chow High-Performance Reconfigurable Computing Group University of Toronto September 4, 2013
More informationAn Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection
An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,
More informationExperiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor
Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Juan C. Pichel Centro de Investigación en Tecnoloxías da Información (CITIUS) Universidade de Santiago de Compostela, Spain
More informationExtending the Power of FPGAs to Software Developers:
Extending the Power of FPGAs to Software Developers: The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Group Page 1 Agenda The Evolution of FPGAs and FPGA Programming
More informationECE 471 Embedded Systems Lecture 2
ECE 471 Embedded Systems Lecture 2 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 4 September 2014 Announcements HW#1 will be posted tomorrow (Friday), due next Thursday Working
More informationCS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory
CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5
More informationNew Software-Designed Instruments
1 New Software-Designed Instruments Nicholas Haripersad Field Applications Engineer National Instruments South Africa Agenda What Is a Software-Designed Instrument? Why Software-Designed Instrumentation?
More informationThe Bifrost GPU architecture and the ARM Mali-G71 GPU
The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our
More informationEvolution of Computers & Microprocessors. Dr. Cahit Karakuş
Evolution of Computers & Microprocessors Dr. Cahit Karakuş Evolution of Computers First generation (1939-1954) - vacuum tube IBM 650, 1954 Evolution of Computers Second generation (1954-1959) - transistor
More informationDataflow: The Road Less Complex
Dataflow: The Road Less Complex Steven Swanson Ken Michelson Andrew Schwerin Mark Oskin University of Washington Sponsored by NSF and Intel Things to keep you up at night (~2016) Opportunities 8 billion
More informationVivado HLx Design Entry. June 2016
Vivado HLx Design Entry June 2016 Agenda What is the HLx Design Methodology? New & Early Access features for Connectivity Platforms Creating Differentiated Logic 2 What is the HLx Design Methodology? Page
More informationAll About the Cell Processor
All About the Cell H. Peter Hofstee, Ph. D. IBM Systems and Technology Group SCEI/Sony Toshiba IBM Design Center Austin, Texas Acknowledgements Cell is the result of a deep partnership between SCEI/Sony,
More informationDigital Blocks Semiconductor IP
Digital Blocks Semiconductor IP -UHD General Description The Digital Blocks -UHD LCD Controller IP Core interfaces a video image in frame buffer memory via the AMBA 3.0 / 4.0 AXI Protocol Interconnect
More informationUFS Unified Memory Extension
UFS Unified Memory Extension Nobuhiro KONDO Toshiba Corporation JEDEC at CES 2014 Copyright 2013-2014 Toshiba Corporation What & Why. UNIFIED MEMORY (UM) What is Unified Memory? Share huge chunk of host
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationSimXMD: Simulation-based HW/SW Co-Debugging for FPGA Embedded Systems
FPGAworld 2014 SimXMD: Simulation-based HW/SW Co-Debugging for FPGA Embedded Systems Ruediger Willenberg and Paul Chow High-Performance Reconfigurable Computing Group University of Toronto September 9,
More informationMPSoC Design Space Exploration Framework
MPSoC Design Space Exploration Framework Gerd Ascheid RWTH Aachen University, Germany Outline Motivation: MPSoC requirements in wireless and multimedia MPSoC design space exploration framework Summary
More informationSimXMD Co-Debugging Software and Hardware in FPGA Embedded Systems
University of Toronto FPGA Seminar SimXMD Co-Debugging Software and Hardware in FPGA Embedded Systems Ruediger Willenberg and Paul Chow High-Performance Reconfigurable Computing Group University of Toronto
More informationRecap: Machine Organization
ECE232: Hardware Organization and Design Part 14: Hierarchy Chapter 5 (4 th edition), 7 (3 rd edition) http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy,
More informationAn Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs
An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs Architecture optimized for Fast Ultra Long FFTs Parallel FFT structure reduces external memory bandwidth requirements Lengths from 32K to
More informationEarly Performance-Cost Estimation of Application-Specific Data Path Pipelining
Early Performance-Cost Estimation of Application-Specific Data Path Pipelining Jelena Trajkovic Computer Science Department École Polytechnique de Montréal, Canada Email: jelena.trajkovic@polymtl.ca Daniel
More informationPark Sung Chul. AE MentorGraphics Korea
PGA Design rom Concept to Silicon Park Sung Chul AE MentorGraphics Korea The Challenge of Complex Chip Design ASIC Complex Chip Design ASIC or FPGA? N FPGA Design FPGA Embedded Core? Y FPSoC Design Considerations
More informationAdaptive Scientific Software Libraries
Adaptive Scientific Software Libraries Lennart Johnsson Advanced Computing Research Laboratory Department of Computer Science University of Houston Challenges Diversity of execution environments Growing
More informationRevolutionizing the Datacenter
Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5
More informationHigh Level, high speed FPGA programming
Opportunity: FPAs High Level, high speed FPA programming Wim Bohm, Bruce Draper, Ross Beveridge, Charlie Ross, Monica Chawathe Colorado State University Reconfigurable Computing technology High speed at
More informationIntel released new technology call P6P
P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new
More informationOptimize DSP Designs and Code using Fixed-Point Designer
Optimize DSP Designs and Code using Fixed-Point Designer MathWorks Korea 이웅재부장 Senior Application Engineer 2013 The MathWorks, Inc. 1 Agenda Fixed-point concepts Introducing Fixed-Point Designer Overview
More informationBasic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices
3 Digital Systems Implementation Programmable Logic Devices Basic FPGA Architectures Why Programmable Logic Devices (PLDs)? Low cost, low risk way of implementing digital circuits as application specific
More informationAn Infrastructural IP for Interactive MPEG-4 SoC Functional Verification
ITB J. ICT Vol. 3, No. 1, 2009, 51-66 51 An Infrastructural IP for Interactive MPEG-4 SoC Functional Verification 1 Trio Adiono, 2 Hans G. Kerkhoff & 3 Hiroaki Kunieda 1 Institut Teknologi Bandung, Bandung,
More informationOASIS Network-on-Chip Prototyping on FPGA
Master thesis of the University of Aizu, Feb. 20, 2012 OASIS Network-on-Chip Prototyping on FPGA m5141120, Kenichi Mori Supervised by Prof. Ben Abdallah Abderazek Adaptive Systems Laboratory, Master of
More informationReconfigurable Cell Array for DSP Applications
Outline econfigurable Cell Array for DSP Applications Chenxin Zhang Department of Electrical and Information Technology Lund University, Sweden econfigurable computing Coarse-grained reconfigurable cell
More informationUniversität Dortmund. ARM Architecture
ARM Architecture The RISC Philosophy Original RISC design (e.g. MIPS) aims for high performance through o reduced number of instruction classes o large general-purpose register set o load-store architecture
More informationIntelop. *As new IP blocks become available, please contact the factory for the latest updated info.
A FPGA based development platform as part of an EDK is available to target intelop provided IPs or other standard IPs. The platform with Virtex-4 FX12 Evaluation Kit provides a complete hardware environment
More informationSystem-on-Chip Architecture for Mobile Applications. Sabyasachi Dey
System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution
More informationNISC Application and Advantages
NISC Application and Advantages Daniel D. Gajski Mehrdad Reshadi Center for Embedded Computer Systems University of California, Irvine Irvine, CA 92697-3425, USA {gajski, reshadi}@cecs.uci.edu CECS Technical
More information6.375: Complex Digital Systems. February 3, L01-1. Something new and exciting as well as useful
6.375: Complex Digital Systems Lecturer: TA: Administration: Arvind Ming Liu Sally Lee February 3, 2016 http://csg.csail.mit.edu/6.375 L01-1 Why take 6.375 Something new and exciting as well as useful
More informationParallel Computing: Parallel Architectures Jin, Hai
Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer
More informationCOMP Parallel Computing. SMM (1) Memory Hierarchies and Shared Memory
COMP 633 - Parallel Computing Lecture 6 September 6, 2018 SMM (1) Memory Hierarchies and Shared Memory 1 Topics Memory systems organization caches and the memory hierarchy influence of the memory hierarchy
More informationExploration of Cache Coherent CPU- FPGA Heterogeneous System
Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based
More informationWaferBoard Rapid Prototyping
WaferBoard Rapid Prototyping WaferBoard (cover not shown) 1. Select components that are packaged in ball grid array, QFP, TSOP, etc. 2. Place the packaged components FPGAs, ASICs, processors, memories,
More informationThe MorphoSys Parallel Reconfigurable System
The MorphoSys Parallel Reconfigurable System Guangming Lu 1, Hartej Singh 1,Ming-hauLee 1, Nader Bagherzadeh 1, Fadi Kurdahi 1, and Eliseu M.C. Filho 2 1 Department of Electrical and Computer Engineering
More informationHeterogeneous Processing
Heterogeneous Processing Maya Gokhale maya@lanl.gov Outline This talk: Architectures chip node system Programming Models and Compiler Overview Applications Systems Software Clock rates and transistor counts
More information