High Performance DoD DSP Applications
|
|
- Nora Wiggins
- 5 years ago
- Views:
Transcription
1 High Performance DoD DSP Applications Robert Bond Embedded Digital Systems Group 23 August 2003 Slide-1
2 Outline DoD High-Performance DSP Applications Middleware (with some streaming constructs) Future Directions Summary Slide-2
3 Embedded Signal Processing Requirements TFLOPS REQUIREMENTS INCREASING BY AN ORDER OF MAGNITUDE EVERY 5 YEARS Airborne Radar Shipboard Surveillance UAV Missile Seeker 0.01 SBR YEAR Small Unit Operations SIGINT EMBEDDED PROCESSING REQUIREMENTS WILL EMBEDDED PROCESSING REQUIREMENTS WILL REQUIRE TFOPS IN THE TIME FRAME REQUIRE TFOPS IN THE TIME FRAME hpec_01-1 Slide-1 SC2002 KT Tutorial
4 Embedded Processing Spectrum Applications vs. Digital Technology Programmable Processors Reconfigurable Computing With FPGAs ASIC - Standard Cell (Conventional Packaging) Full Custom VLSI Multi-Chip Modules GOPS / cu. ft. 100,000 10,000 1, Slide-3 Custom Reconfigurable Programmable ,000 10, ,000 MOPS / Watt Space Seeker UAV Airborne Shipboard Small Unit Operations SIGINT High performance Programmable Signal Processors (PSP) address a wide range of of applications Efficient Stream Processing is is the challenge!
5 Radar Stream Signal Processing Algorithms GMTI (STAP) Algorithm GMTI Data Flow Corner Turn Corner Turn Corner Turn SAR Algorithm SAR Data Flow Corner Turn Corner Turn GMTI Functional Pipeline SAR Functional Pipeline Data Input Beamformer Filtering (PC & DF) Detection ( STAP & CFAR ) Data Output Data Input Motion Compensation, Pulse Compression Polar Resampling 2D FFT Data Output Raw Input Data Scheduler RADAR Control + Timing Platform State Information Target Detections Sorted by Range & Doppler Raw Input Data Scheduler RADAR Control + Timing Platform State Information SAR Image(s) Synthetic Aperture Radar (SAR): ~ GOPS ~ s latency Ground Moving Target Indicator (GMTI) ~ GOPS ~ 0.1-1s latency Space-Time Adaptive Processing (STAP) ~ GOPS ~ s latency Slide-4
6 Missile Seeker Stream Video Processing 256x256 Focal Plane Array Integrator 60 FPS Guidance Background Estimation Adaptive Nonuniformity Correction Aimpoint Selection (PH) Dither Processing lmu Multi-Frame Processing Bulk Filter (PH) From other sensors CFAR Detection Latency ~ 1mS Target Selection (Discrimination) Data Fusion Feature Extraction Tracking Subpixel Processing Slide-5... Target update from surface based radar Stream Video Processing Back-end Processing Processing Block Opcount (MFLOPS) NUC Dither Subtraction Dempster-Shafer Plausibility Computation Total
7 Outline DoD High-Performance DSP Applications Middleware (with some streaming constructs) Some Key Features Future Directions Summary Slide-6
8 Mappable Stream Processing Decomposable into s (computations) and Conduits (communications) Beamform X OUT = w *X IN Conduit Filter X OUT = FIR(X IN ) Conduit Mappable to different sets of hardware Detect X OUT = X IN >c Measurable resource usage of each mapping Slide-7 Each compute stage can be mapped to different sets of hardware and timed
9 Parallel Vector Library Layered Architecture Application Input Stream Processing Output Parallel Vector Library stream Grid F(s) Map Conduit Distribution Library API Library ISA Math Primitives Communication Primitives Hardware Intel Cluster Workstation PowerPC Cluster Embedded Board Embedded Multi-computer Slide-8
10 Parallel Vector Library Components Class Description Parallelism Stream Processing & Control Stream Vector/Matrix Function Conduit Streams organized as matrices or vectors that span multiple processors Performs signal/image processing functions on matrices/vectors (e.g. FFT, FIR, QR) Supports algorithm task decomposition (the nodes in a signal flow diagram) Hierarchical Supports stream movement between tasks (interconnection in stream flow graph) Data Data & & Pipeline & Pipeline Mapping Map Grid Slide-9 Specifies how s, Streams, and functions are distributed over processors Organizes processors into a 2D layout (virtual machine model) - Hierarchical Data, & Pipeline
11 Some Key Features The application stream flow is defined by tasks interconnected by conduits. Defines an application macro-architecture Presents opportunities for optimization at construction time s can contain other tasks and conduits Hierarchy helps to handle complex applications Conduits can reorganize data and handle buffering Corner turns, multi-rates, different granularities Streams flow through conduits and are transformed in tasks by functions. Streams are resizable with fundamental epochs defined by vector or matrix dimensions. They can be produced and consumed piecemeal. Function implementations specialize based on stream arguments (and maps.) All objects are mappable and hence implicitly parallel. Functionality is map invariant Scalable and portable (virtual machines) Re-mapping gives support for fault recovery and for automated performance tuning. Slide-10
12 Outline DoD High-Performance DSP Applications Middleware (stream language prototyping) Future Directions Summary Slide-11
13 Future Directions Standardization of parallel signal/image processing middleware: HPEC-Software Initiative (VSIPL++) Generative programming techniques Efficient implementation of multi-expression program blocks Breaking the abstraction barrier without using streams Extensions of middleware to heterogeneous platforms FPGAs, DSPs, Microprocessors Run-time optimal mapping Managing latency, throughput, power, Slide-12
14 Generative Programming Approach: Expression Templates + Comma Operator Problem: Abstraction Boundary Related expressions: Vector/Matrix expressions sharing common variables OUT = IN > THRESH THRESH = 0.9 * THRESH * IN Efficient implementation requires that data used in more than one expression remain in register/cache How to do this and maintain high-level API? Hand-coded for loop works, but is too low level Expression templates (PETE) can build a compilable expression for a single statement, but cannot span multiple statements Solution: PETE combined with Comma Operator Approach: Use comma operator to build expression list PETE Expression list destructor triggers evaluation when semicolon is reached out = in > thresh, thresh &= 0.9 * thresh * in; Slide-13
15 Optimal Mapping of Stream Algorithms Input Low Pass Filter Beamform Matched Filter X IN IN X IN IN FIR1 FIR2 X OUT X IN IN mult X OUT X IN IN FFT IFFT X OUT W 1 W 2 W 3 W 4 1 PE 2 PE 3 PE 4 PE frames/sec (1.6 MHz BW) 66 frames/sec (3.2 MHz BW) Slide-14 Example - Find: Min(latency #CPU) Max(throughput #CPU Example - Use: Genetic Algorithm Decision Directed Learning
16 Predicted and Achieved Latency and Throughput 1.5 Problem Size Small (48x4K) Large (48x128K) Latency (seconds) Throughput (frames/sec) #CPU predicted achieved predicted achieved predicted achieved predicted achieved #CPU Graph Solution selects correct optimal mapping Good agreement between predicted and achieved latencies and throughputs Slide-15
17 Summary High-performance stream processing for DoD applications requires 100 s GOPS in stringent form factors. Programmable solutions are parallel processors Computational efficiency is paramount Middleware approaches emerging today have attributes of stream languages except: Lack rigorous formal definitions (and are incomplete) Do not have good compiler support (efficiency issues) BUT: they provide high productivity compared to traditional approaches AND better lifecycle support is in the business of evaluating new technologies for DoD signal and image processing applications Middleware, languages, FPGAs, VLSI, DSP, PCA, Slide-16
18 PETE Review Step 1: Operators Form expression A= + A=B+C*D B * BinaryNode<OpAdd, float*, Binary Node <OpMul, float*, float*>> C D Vector Operator = it=begin (); for (int i=0; i<size(); i++) { Step 2: = operator evaluates expression Action performed at leaves Action performed at internal nodes OpAdd + OpCombine User specifies what to store at the leaves Dereference Leaf } *it=foreach(expr, DereferenceLeaf(), OpCombine() ); foreach (expr, IncrementLeaf(), NullCombine() ); it++; *it= *bit + *cit * *dit; bit++; cit++; dit++; Increment Leaf PETE ForEach: Recursive descent traversal of expression - User defines action performed at leaves - User defines action performed at internal nodes Translation at compile time - Template specialization - Inlining Slide-17
19 PETE and the Comma Operator Step 1: Operators form expression list, out = in > thresh, thresh = 0.9*thresh + 0.1*in; out = in > thresh thresh = * + * 0.9 thresh 0.1 in Expression list destructor if (isbasenode) { for (int i=0; i<size; i++) { } } Step 2: Expression list destructor triggers expression evaluation Comma orders evaluation from left to right = operator + OpCombine foreach(expr, DereferenceLeaf(), OpCombine() ); foreach(expr, IncrementLeaf(), NullCombine() ); *out = *in1 > *thresh1; *thresh2 = 0.9 * *thresh * *in2; out++; in1++; thresh1++; thresh2++; thresh3++; in2++; PETE ForEach changes: - = operator behavior with OpCombine: assign from RHS to LHS -, operator behavior: first evaluate left expression, then evaluate right expression Slide-18
20 Scalable Approach #include <Vector.h> #include <AddPvl.h> void addvectors(amap, bmap, cmap) { Vector< Complex<Float> > a( a, amap, LENGTH); Vector< Complex<Float> > b( b, bmap, LENGTH); Vector< Complex<Float> > c( c, cmap, LENGTH); b = 1; c = 2; a=b+c; Single Processor Mapping A = B + C Multi Processor Mapping A = B + C } Single processor and multi-processor code are the same Maps can be changed without changing software High level code is is compact Slide-19
21 Corner Turn Operation Filter Beamform Original Data Matrix Channels Processor Channels Cornerturned Data Matrix Time Samples Time Samples All-to-all communication Slide-20 Half the data moves across the bisection of the machine
22 Anatomy of a PVL Map Vector/Matrix Stream Function Conduit All PVL objects contain maps Map PVL Maps contain Grid List of nodes Distribution Overlap {0,2,4,6,8,10} List of Nodes Grid Distribution Overlap Slide-21
Kernel Benchmarks and Metrics for Polymorphous Computer Architectures
PCAKernels-1 Kernel Benchmarks and Metrics for Polymorphous Computer Architectures Hank Hoffmann James Lebak (Presenter) Janice McMahon Seventh Annual High-Performance Embedded Computing Workshop (HPEC)
More informationThe HPEC Challenge Benchmark Suite
The HPEC Challenge Benchmark Suite Ryan Haney, Theresa Meuse, Jeremy Kepner and James Lebak Massachusetts Institute of Technology Lincoln Laboratory HPEC 2005 This work is sponsored by the Defense Advanced
More informationEvaluating the Potential of Graphics Processors for High Performance Embedded Computing
Evaluating the Potential of Graphics Processors for High Performance Embedded Computing Shuai Mu, Chenxi Wang, Ming Liu, Yangdong Deng Department of Micro-/Nano-electronics Tsinghua University Outline
More informationDeployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pmatlab and a Bladed Linux Cluster
Deployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pmatlab and a Bladed Linux Cluster Jeremy Kepner, Tim Currie, Hahn Kim, Bipin Mathew, Andrew McCabe, Michael Moore, Dan Rabinkin,
More informationVirtual Prototyping and Performance Analysis of RapidIO-based System Architectures for Space-Based Radar
Virtual Prototyping and Performance Analysis of RapidIO-based System Architectures for Space-Based Radar David Bueno, Adam Leko, Chris Conger, Ian Troxel, and Alan D. George HCS Research Laboratory College
More informationA Stream Compiler for Communication-Exposed Architectures
A Stream Compiler for Communication-Exposed Architectures Michael Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali Meli, Andrew Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman
More informationLLGrid: On-Demand Grid Computing with gridmatlab and pmatlab
LLGrid: On-Demand Grid Computing with gridmatlab and pmatlab Albert Reuther 29 September 2004 This work is sponsored by the Department of the Air Force under Air Force contract F19628-00-C-0002. Opinions,
More informationHigh Performance Embedded Computing Software Initiative (HPEC-SI)
High Performance Embedded Computing Software Initiative (HPEC-SI) Dr. Jeremy Kepner MIT Laboratory This work is sponsored by the High Performance Computing Modernization Office under Air Force Contract
More informationParallel Matlab: The Next Generation
: The Next Generation Dr. Jeremy Kepner / Ms. Nadya Travinin / This work is sponsored by the Department of Defense under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and
More informationAbstract. TD Convolution. Introduction. Shape : a new class. Lenore R. Mullin Xingmin Luo Lawrence A. Bush. August 7, TD Convolution: MoA Design
Building the Support for Radar Processing across Memory Hierarchies: On the Development of an Array Class with Shapes using Expression Templates in C ++ Lenore R. Mullin Xingmin Luo Lawrence A. Bush Abstract
More informationRapid prototyping of radar algorithms [Applications Corner]
Rapid prototyping of radar algorithms [Applications Corner] The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published
More informationDeployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pmatlab and a Bladed Linux Cluster
Deployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pmatlab and a Bladed Linux Cluster Jeremy Kepner, Tim Currie, Hahn Kim, Andrew McCabe, Bipin Mathew, Michael Moore, Dan Rabinkin,
More informationMassively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain
Massively Parallel Computing on Silicon: SIMD Implementations V.M.. Brea Univ. of Santiago de Compostela Spain GOAL Give an overview on the state-of of-the- art of Digital on-chip CMOS SIMD Solutions,
More informationRFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015
RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 Outline Motivation Current situation Goal RFNoC Basic concepts Architecture overview Summary No Demo! See our booth,
More informationPerforming Multi-Phased Radar Processing with a Very Deep FPGA Pipeline
Performing Multi-Phased Radar Processing with a Very Deep FPGA Pipeline Jeffrey T. Muehring and John K. Antonio School of Computer Science University of Oklahoma antonio@ou.edu 2000 MAPLD Conference The
More informationAuto Source Code Generation and Run-Time Infrastructure and Environment for High Performance, Distributed Computing Systems
Auto Source Code Generation and Run-Time Infrastructure and Environment for High Performance, Distributed Computing Systems Minesh I. Patel Ph.D. 1, Karl Jordan 1, Mattew Clark Ph.D. 1, and Devesh Bhatt
More informationAdaptive Scientific Software Libraries
Adaptive Scientific Software Libraries Lennart Johnsson Advanced Computing Research Laboratory Department of Computer Science University of Houston Challenges Diversity of execution environments Growing
More informationAn Advanced Graph Processor Prototype
An Advanced Graph Processor Prototype Vitaliy Gleyzer GraphEx 2016 DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited. This material is based upon work supported by the Assistant
More informationGedae cwcembedded.com. The CHAMP-AV6 VPX-REDI. Digital Signal Processing Card. Maximizing Performance with Minimal Porting Effort
Technology White Paper The CHAMP-AV6 VPX-REDI Digital Signal Processing Card Maximizing Performance with Minimal Porting Effort Introduction The Curtiss-Wright Controls Embedded Computing CHAMP-AV6 is
More informationEvaluation of Stream Virtual Machine on Raw Processor
Evaluation of Stream Virtual Machine on Raw Processor Jinwoo Suh, Stephen P. Crago, Janice O. McMahon, Dong-In Kang University of Southern California Information Sciences Institute Richard Lethin Reservoir
More information300x Matlab. Dr. Jeremy Kepner. MIT Lincoln Laboratory. September 25, 2002 HPEC Workshop Lexington, MA
300x Matlab Dr. Jeremy Kepner September 25, 2002 HPEC Workshop Lexington, MA This work is sponsored by the High Performance Computing Modernization Office under Air Force Contract F19628-00-C-0002. Opinions,
More informationComputational Process Networks
Computational Process Networks for Real-Time High-Throughput Signal and Image Processing Systems on Workstations Gregory E. Allen EE 382C - Embedded Software Systems 17 February 2000 http://www.ece.utexas.edu/~allen/
More informationComputational Process Networks a model and framework for high-throughput signal processing
Computational Process Networks a model and framework for high-throughput signal processing Gregory E. Allen Ph.D. Defense 25 April 2011 Committee Members: James C. Browne Craig M. Chase Brian L. Evans
More informationScalable GMTI Tracker
Scalable GMTI Tracker Thomas Kurien Mercury Computer Systems 199 Riverneck Road Chelmsford, Massachusetts 01824 tkurien@mc.com Abstract - This paper describes the design and preliminary implementation
More informationAnalysis and Mapping of Sparse Matrix Computations
Analysis and Mapping of Sparse Matrix Computations Nadya Bliss & Sanjeev Mohindra Varun Aggarwal & Una-May O Reilly MIT Computer Science and AI Laboratory September 19th, 2007 HPEC2007-1 This work is sponsored
More informationThe Xilinx XC6200 chip, the software tools and the board development tools
The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions
More informationProcessor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP
Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor
More informationCOMPUTER ORGANIZATION & ARCHITECTURE
COMPUTER ORGANIZATION & ARCHITECTURE Instructions Sets Architecture Lesson 5b 1 STACKS A stack is an ordered set of elements, only one of which can be accessed at a time. The point of access is called
More informationA Framework for Real-Time High-Throughput Signal and Image Processing Systems on Workstations
A Framework for Real-Time High-Throughput Signal and Image Processing Systems on Workstations Prof. Brian L. Evans in collaboration with Gregory E. Allen and K. Clint Slatton Department of Electrical and
More informationAltera SDK for OpenCL
Altera SDK for OpenCL A novel SDK that opens up the world of FPGAs to today s developers Altera Technology Roadshow 2013 Today s News Altera today announces its SDK for OpenCL Altera Joins Khronos Group
More informationAccuracy and Performance Trade-offs of Logarithmic Number Units in Multi-Core Clusters
Accuracy and Performance Trade-offs of Logarithmic Number Units in Multi-Core Clusters ARITH 2016 Silicon Valley July 10-13, 2016 Michael Schaffner 1 Michael Gautschi 1 Frank K. Gürkaynak 1 Prof. Luca
More informationFPGA Provides Speedy Data Compression for Hyperspectral Imagery
FPGA Provides Speedy Data Compression for Hyperspectral Imagery Engineers implement the Fast Lossless compression algorithm on a Virtex-5 FPGA; this implementation provides the ability to keep up with
More informationA KASSPER Real-Time Signal Processor Testbed
A KASSPER Real-Time Signal Processor Testbed Glenn Schrader 244 Wood St. exington MA, 02420 Phone: (781)981-2579 Fax: (781)981-5255 gschrad@ll.mit.edu The Knowledge Aided Sensor Signal Processing and Expert
More informationIQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song. HUAWEI TECHNOLOGIES Co., Ltd.
IQ for DNA Interactive Query for Dynamic Network Analytics Haoyu Song www.huawei.com Motivation Service Provider s pain point Lack of real-time and full visibility of networks, so the network monitoring
More informationFlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com
FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers
More informationFlexible wireless communication architectures
Flexible wireless communication architectures Sridhar Rajagopal Department of Electrical and Computer Engineering Rice University, Houston TX Faculty Candidate Seminar Southern Methodist University April
More informationPortable Heterogeneous High-Performance Computing via Domain-Specific Virtualization. Dmitry I. Lyakh.
Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization Dmitry I. Lyakh liakhdi@ornl.gov This research used resources of the Oak Ridge Leadership Computing Facility at the
More informationReconfigurable Computing. Introduction
Reconfigurable Computing Tony Givargis and Nikil Dutt Introduction! Reconfigurable computing, a new paradigm for system design Post fabrication software personalization for hardware computation Traditionally
More informationHakam Zaidan Stephen Moore
Hakam Zaidan Stephen Moore Outline Vector Architectures Properties Applications History Westinghouse Solomon ILLIAC IV CDC STAR 100 Cray 1 Other Cray Vector Machines Vector Machines Today Introduction
More informationScaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research
Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)
More informationParallel FFT Program Optimizations on Heterogeneous Computers
Parallel FFT Program Optimizations on Heterogeneous Computers Shuo Chen, Xiaoming Li Department of Electrical and Computer Engineering University of Delaware, Newark, DE 19716 Outline Part I: A Hybrid
More informationBlueGene/L (No. 4 in the Latest Top500 List)
BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside
More informationLecture 5: Error Resilience & Scalability
Lecture 5: Error Resilience & Scalability Dr Reji Mathew A/Prof. Jian Zhang NICTA & CSE UNSW COMP9519 Multimedia Systems S 010 jzhang@cse.unsw.edu.au Outline Error Resilience Scalability Including slides
More informationOverview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips
Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,
More informationScalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA
Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089
More informationMassively Parallel Processor Breadboarding (MPPB)
Massively Parallel Processor Breadboarding (MPPB) 28 August 2012 Final Presentation TRP study 21986 Gerard Rauwerda CTO, Recore Systems Gerard.Rauwerda@RecoreSystems.com Recore Systems BV P.O. Box 77,
More informationUsing Intel Streaming SIMD Extensions for 3D Geometry Processing
Using Intel Streaming SIMD Extensions for 3D Geometry Processing Wan-Chun Ma, Chia-Lin Yang Dept. of Computer Science and Information Engineering National Taiwan University firebird@cmlab.csie.ntu.edu.tw,
More informationIMPLICIT+EXPLICIT Architecture
IMPLICIT+EXPLICIT Architecture Fortran Carte Programming Environment C Implicitly Controlled Device Dense logic device Typically fixed logic µp, DSP, ASIC, etc. Implicit Device Explicit Device Explicitly
More informationComparing Memory Systems for Chip Multiprocessors
Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University
More informationNetwork Bandwidth & Minimum Efficient Problem Size
Network Bandwidth & Minimum Efficient Problem Size Paul R. Woodward Laboratory for Computational Science & Engineering (LCSE), University of Minnesota April 21, 2004 Build 3 virtual computers with Intel
More informationJohn Bloomfield, Mercury Computer Systems, Inc. HPEC September Mercury Computer Systems, Inc.
3DUWLWLRQLQJ &RPSXWDWLRQD7DVNV LWKLQDQ)3*$5,6& +HWHURJHQHRXV 0XWLFRPSXWHU John Bloomfield, Mercury Computer Systems, Inc. HPEC September 2002 Agenda Why worry about partitioning? How we partitioned a real-world
More informationHalfway! Sequoia. A Point of View. Sequoia. First half of the course is over. Now start the second half. CS315B Lecture 9
Halfway! Sequoia CS315B Lecture 9 First half of the course is over Overview/Philosophy of Regent Now start the second half Lectures on other programming models Comparing/contrasting with Regent Start with
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
More informationNew Successes for Parameterized Run-time Reconfiguration
New Successes for Parameterized Run-time Reconfiguration (or: use the FPGA to its true capabilities) Prof. Dirk Stroobandt Ghent University, Belgium Hardware and Embedded Systems group Universiteit Gent
More informationMemory Systems IRAM. Principle of IRAM
Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #12 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Last class Outline
More informationC-Based Hardware Design Platform for Dynamically Reconfigurable Processor
C-Based Hardware Design Platform for Dynamically Reconfigurable Processor September 22 nd, 2005 IPFlex Inc. Agenda Merits of C-Based hardware design Hardware enabling C-Based hardware design DAPDNA-FW
More informationDesign, implementation, and evaluation of parallell pipelined STAP on parallel computers
Syracuse University SURFACE Electrical Engineering and Computer Science College of Engineering and Computer Science 1998 Design, implementation, and evaluation of parallell pipelined STAP on parallel computers
More informationAn Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware
An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware Tao Chen, Shreesha Srinath Christopher Batten, G. Edward Suh Computer Systems Laboratory School of Electrical
More informationDesign and Evaluation of I/O Strategies for Parallel Pipelined STAP Applications
Design and Evaluation of I/O Strategies for Parallel Pipelined STAP Applications Wei-keng Liao Alok Choudhary ECE Department Northwestern University Evanston, IL Donald Weiner Pramod Varshney EECS Department
More informationWelcome. Altera Technology Roadshow 2013
Welcome Altera Technology Roadshow 2013 Altera at a Glance Founded in Silicon Valley, California in 1983 Industry s first reprogrammable logic semiconductors $1.78 billion in 2012 sales Over 2,900 employees
More informationDIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING
1 DSP applications DSP platforms The synthesis problem Models of computation OUTLINE 2 DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: Time-discrete representation
More informationAccelerating the Pulsar Search Pipeline with FPGAs, Programmed in OpenCL
Accelerating the Pulsar Search Pipeline with FPGAs, Programmed in OpenCL Oliver Sinnen, Tyrone Sherwin, and Haomiao Wang & Prabu Thiagaraj (Manchester Uni/Raman Research Institute, Bangalore) Parallel
More informationReconfigurable Versus Fixed Versus Hybrid Architectures
Reconfigurable Versus Fixed Versus Hybrid Architectures John K. Antonio Oklahoma Supercomputing Symposium 2008 Norman, Oklahoma October 6, 2008 Computer Science, University of Oklahoma Overview The (past)
More informationDR. JIVRAJ MEHTA INSTITUTE OF TECHNOLOGY
DR. JIVRAJ MEHTA INSTITUTE OF TECHNOLOGY Subject Name: - DISTRIBUTED SYSTEMS Semester :- 8 th Subject Code: -180701 Branch :- Computer Science & Engineering Department :- Computer Science & Engineering
More informationRecurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks
Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and
More informationAdaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware. Objective/Approach/Process
Adaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware John Zaino, Eric Pauer, Ken Smith, Paul Fiore, Jairam Ramanathan, Cory Myers {john.c.aino, ken.smith,
More informationSupport for Programming Reconfigurable Supercomputers
Support for Programming Reconfigurable Supercomputers Miriam Leeser Nicholas Moore, Albert Conti Dept. of Electrical and Computer Engineering Northeastern University Boston, MA Laurie Smith King Dept.
More informationDeep Learning Accelerators
Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction
More informationEvolution of CAD Tools & Verilog HDL Definition
Evolution of CAD Tools & Verilog HDL Definition K.Sivasankaran Assistant Professor (Senior) VLSI Division School of Electronics Engineering VIT University Outline Evolution of CAD Different CAD Tools for
More informationREAL TIME DIGITAL SIGNAL PROCESSING
REAL TIME DIGITAL SIGNAL PROCESSING UTN - FRBA 2011 www.electron.frba.utn.edu.ar/dplab Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.
More informationFPGA-based Supercomputing: New Opportunities and Challenges
FPGA-based Supercomputing: New Opportunities and Challenges Naoya Maruyama (RIKEN AICS)* 5 th ADAC Workshop Feb 15, 2018 * Current Main affiliation is Lawrence Livermore National Laboratory SIAM PP18:
More informationWhite Paper Assessing FPGA DSP Benchmarks at 40 nm
White Paper Assessing FPGA DSP Benchmarks at 40 nm Introduction Benchmarking the performance of algorithms, devices, and programming methodologies is a well-worn topic among developers and research of
More informationThe Next-Gen ISR Conference brought to you by:
The Next-Gen ISR Conference brought to you by: www.ttcus.com @Techtrain Linkedin/Groups: Technology Training Corporation Solving Tomorrow s ISR Problems: Getting the Right Information to the Right People
More informationNear Memory Computing Spectral and Sparse Accelerators
Near Memory Computing Spectral and Sparse Accelerators Franz Franchetti ECE, Carnegie Mellon University www.ece.cmu.edu/~franzf Co-Founder, SpiralGen www.spiralgen.com The work was sponsored by Defense
More informationEECS 570 Final Exam - SOLUTIONS Winter 2015
EECS 570 Final Exam - SOLUTIONS Winter 2015 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points 1 / 21 2 / 32
More informationXPU A Programmable FPGA Accelerator for Diverse Workloads
XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for
More informationWhat is Computer Architecture?
What is Computer Architecture? Architecture abstraction of the hardware for the programmer instruction set architecture instructions: operations operands, addressing the operands how instructions are encoded
More informationIndian Silicon Technologies 2013
SI.No Topics IEEE YEAR 1. An RFID Based Solution for Real-Time Patient Surveillance and data Processing Bio- Metric System using FPGA 2. Real-time Binary Shape Matching System Based on FPGA 3. An Optimized
More informationSystolic Arrays for Reconfigurable DSP Systems
Systolic Arrays for Reconfigurable DSP Systems Rajashree Talatule Department of Electronics and Telecommunication G.H.Raisoni Institute of Engineering & Technology Nagpur, India Contact no.-7709731725
More informationHigher Level Programming Abstractions for FPGAs using OpenCL
Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*
More informationThe Mont-Blanc approach towards Exascale
http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are
More informationCoarse Grain Reconfigurable Arrays are Signal Processing Engines!
Coarse Grain Reconfigurable Arrays are Signal Processing Engines! Advanced Topics in Telecommunications, Algorithms and Implementation Platforms for Wireless Communications, TLT-9707 Waqar Hussain Researcher
More informationERS-l SAR processing with CESAR.
ERS-l SAR processing with CESAR. by Einar-Arne Herland Division for Electronics Norwegian Defence Research Establishment P.O.Box 25, N-2007 Kjeller, Norway Abstract A vector processor called CESAR (Computer
More informationB-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes
B-KD rees for Hardware Accelerated Ray racing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek Saarland University, Germany Outline Previous Work B-KD ree as new Spatial Index Structure DynR
More informationA Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms
A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms Jingzhao Ou and Viktor K. Prasanna Department of Electrical Engineering, University of Southern California Los Angeles, California,
More informationH.264 AVC 4k Decoder V.1.0, 2014
SOC H.264 AVC 4k Video Decoder Datasheet System-On-Chip (SOC) Technologies 1. Key Features 1. Profile: High profile 2. Resolution: 4k (3840x2160) 3. Frame Rate: up to 60fps 4. Chroma Format: 4:2:0 or 4:2:2
More informationCo-synthesis and Accelerator based Embedded System Design
Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer
More informationHeath Yardley University of Adelaide Radar Research Centre
Heath Yardley University of Adelaide Radar Research Centre Radar Parameters Imaging Geometry Imaging Algorithm Gamma Remote Sensing Modular SAR Processor (MSP) Motion Compensation (MoCom) Calibration Polarimetric
More informationMIPS ISA AND PIPELINING OVERVIEW Appendix A and C
1 MIPS ISA AND PIPELINING OVERVIEW Appendix A and C OUTLINE Review of MIPS ISA Review on Pipelining 2 READING ASSIGNMENT ReadAppendixA ReadAppendixC 3 THEMIPS ISA (A.9) First MIPS in 1985 General-purpose
More informationMesh. Mesh Channels. Straight-forward Switching Requirements. Switch Delay. Total Switches. Total Switches. Total Switches?
ESE534: Computer Organization Previously Saw need to exploit locality/structure in interconnect Day 20: April 7, 2010 Interconnect 5: Meshes (and MoT) a mesh might be useful Question: how does w grow?
More informationTop500 Supercomputer list
Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity
More informationThe extreme Adaptive DSP Solution to Sensor Data Processing
The extreme Adaptive DSP Solution to Sensor Data Processing Abstract Martin Vorbach PACT XPP Technologies Leo Mirkin Sky Computers, Inc. The new ISR mobile autonomous sensor platforms present a difficult
More informationEE382V: System-on-a-Chip (SoC) Design
EE382V: System-on-a-Chip (SoC) Design Lecture 10 Task Partitioning Sources: Prof. Margarida Jacome, UT Austin Prof. Lothar Thiele, ETH Zürich Andreas Gerstlauer Electrical and Computer Engineering University
More informationVector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks
Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor
More informationEmbedded Computing Platform. Architecture and Instruction Set
Embedded Computing Platform Microprocessor: Architecture and Instruction Set Ingo Sander ingo@kth.se Microprocessor A central part of the embedded platform A platform is the basic hardware and software
More informationAnalysis of Radix- SDF Pipeline FFT Architecture in VLSI Using Chip Scope
Analysis of Radix- SDF Pipeline FFT Architecture in VLSI Using Chip Scope G. Mohana Durga 1, D.V.R. Mohan 2 1 M.Tech Student, 2 Professor, Department of ECE, SRKR Engineering College, Bhimavaram, Andhra
More informationTed N. Booth. DesignLinx Hardware Solutions
Ted N. Booth DesignLinx Hardware Solutions September 2015 Using Vivado HLS for Video Algorithm Implementation for Demonstration and Validation Agenda Project Description HLS Lessons Learned Summary Project
More informationPerformance potential for simulating spin models on GPU
Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational
More informationQsys and IP Core Integration
Qsys and IP Core Integration Stephen A. Edwards (after David Lariviere) Columbia University Spring 2016 IP Cores Altera s IP Core Integration Tools Connecting IP Cores IP Cores Cyclone V SoC: A Mix of
More information