Computed Tomography (CT) Scan Image Reconstruction on the SRC-7
David Pointer, SRC Computers, Inc.
2 CT Image Reconstruction
- Figure panels: Herman Head Sinogram; Herman Head Reconstruction
3 CT Image Reconstruction
- initialize
- for all detectors and projections: filter data (FFT, data *= coefficient, IFFT)
- for all pixels: sum filtered data into pixel
- reconstructed image
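The filter-then-sum flow on this slide can be sketched in NumPy. This is a minimal illustration, not CTsim's code: the function name `filtered_backprojection`, the plain ramp filter, the parallel-beam geometry, and the nearest-neighbour detector lookup are all assumptions made for the example.

```python
import numpy as np

def filtered_backprojection(sinogram, angles_deg, size):
    """Reconstruct a size x size image from a (n_projections, n_detectors) sinogram.

    Assumptions (not from the slides): parallel-beam geometry, ramp filter,
    nearest-neighbour lookup of the detector sample for each pixel.
    """
    n_proj, n_det = sinogram.shape

    # Filter step: FFT, data *= coefficient, IFFT, one row per projection.
    coeff = np.abs(np.fft.fftfreq(n_det))
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * coeff, axis=1))

    # Pixel coordinates centred on the image and scaled so that the image
    # diagonal spans the detector array.
    centre = (n_det - 1) / 2.0
    scale = (n_det - 1) / (max(size - 1, 1) * np.sqrt(2.0))
    ys, xs = np.mgrid[0:size, 0:size]
    x = (xs - (size - 1) / 2.0) * scale
    y = (ys - (size - 1) / 2.0) * scale

    # Pixel summation: every pixel accumulates one filtered sample per projection.
    image = np.zeros((size, size), dtype=np.float32)
    for row, theta in zip(filtered, np.deg2rad(angles_deg)):
        t = x * np.cos(theta) + y * np.sin(theta) + centre
        idx = np.rint(t).astype(int)
        inside = (idx >= 0) & (idx < n_det)
        image[inside] += row[idx[inside]]
    return image
```

The pixel summation touches every pixel once per projection, so its cost grows with image area times projection count, while the filter step only grows with the number of projections.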
4 CT Scanner Equipment Today
- All scanner parameters fixed by manufacturer
- Typical reconstructed image parameters
  - 512x512 grayscale image resolution
  - 5 minute image reconstruction
- Equipment costly
- Closed, proprietary systems
  - Data type unknown
  - Processor type(s) unknown
- Equipment design life cycle 7-10 years typical
- No need for speed? Algorithm patents say yes.
5 CTsim Application
- CTsim
  - Open source CT simulator: Dr. Kevin Rosenberg, M.D.
  - All scanner parameters programmable
  - Written in C++
- Selected application parameters
  - 165 detectors
  - 180 projections (views)
  - 1024x1024 reconstructed grayscale image
  - Single precision floating point (SPFP) calculations
- CTsim FBP CPU Execution Time (fftw)
  - AMD Opteron, 2200 MHz, 1024 KB cache: seconds
  - Intel Xeon 3000 MHz, 2048 KB cache: seconds
6 CTsim Application
- Diagram labels: filtered backprojection (FFT, Multiply, Inverse FFT, Pixel Summation); Projection Datasets; Filter Coefficients; Image Display; Microprocessor
7 CTsim Application Partition 8P
- Diagram labels: SRC-7 Series H MAP Processor; filtered backprojection (FFT, Multiply, Inverse FFT, Pixel Summation); Projection Datasets; Filter Coefficients; Image Display; Microprocessor
- Parallel summation for 8 Pixels (8P)
8 CTsim MAP Implementation 8P (Filter)
- Diagram labels, in slide order: MAP; OBM; FPGA RAM; Projection Datasets; 4 OBM; FFT; FFT Twiddle Table; Filter Coefficients; 4 mults; Filter; Filtered Projection Datasets; 4 OBM; IFFT; Filtered Projection Datasets; 4 OBM; Filtered Datasets; 8 arrays
9 CTsim MAP Implementation cfft_fp32() FFT Macro
- SRC's Signal Processing library macro
- Programmable point size 256 to complex SPFP input/output per FPGA clock
- Programmable forward or reverse FFT
10 CTsim MAP Implementation 8P (Pixel Sum)
- Diagram labels, in slide order: 2+2 OBM; 8 pixels; Filtered Datasets; 8 arrays; 8 datasets; 8 sum; 8 pixels; 2+2 OBM
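As a software analogy for summing 8 pixels at a time, the sketch below accumulates pixels in blocks of `lanes` (8 by default). It only mirrors the data access pattern. The function name `pixel_sum_blocked` and the precomputed `det_index` table, which gives the detector sample each pixel reads from each projection, are hypothetical and not taken from the slides or the MAP source.

```python
import numpy as np

def pixel_sum_blocked(filtered, det_index, lanes=8):
    """Sum filtered samples into pixels, `lanes` pixels per step.

    filtered:  (n_projections, n_detectors) filtered projection data
    det_index: (n_projections, n_pixels) detector index read by each pixel
               for each projection (assumed to be precomputed)
    """
    n_proj, n_pixels = det_index.shape
    out = np.zeros(n_pixels, dtype=np.float32)
    proj = np.arange(n_proj)[:, None]  # column vector, broadcasts across the block
    for start in range(0, n_pixels, lanes):
        block = det_index[:, start:start + lanes]  # up to `lanes` pixels
        # Each pixel in the block gathers one sample per projection and sums them.
        out[start:start + lanes] = filtered[proj, block].sum(axis=0)
    return out
```

In hardware, each of the 8 lanes needs its own read port into the filtered data, which is one plausible reading of the "Filtered Datasets 8 arrays" label on the slide.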
11 CTsim MAP Implementation 8P (Source Code)
12 CTsim MAP Implementation 8P Results
FBP MAP Function | Time (ms) | % of total | Type
Initialization | 3.7 | 2% | Data movement
Filter | | % | Calculation
Filtered Dataset | | % | Calculation
Pixel Sum | | % | Calculation
Image Transfer | | % | Data movement
total | | % |

CPU | CPU (s) | MAP (s) | Speedup
AMD | | |
Intel | | |
13 CTsim MAP Implementation 8P Timing
- Timing chart segments: Initialization, Filter, Filtered Dataset, Pixel Sum, Image Transfer
14 CTsim MAP Implementation 8P Device Utilization
Single Altera Stratix II 2S180 FPGA
- ALUTs: 66,416 / 143,520 (46%)
- Registers: 92,818 / 143,520 (65%)
- M512 rams: 211 / 930 (23%)
- M4K rams: 704 / 768 (92%)
- M-RAMs: 0 / 9 (0%)
- DSP blocks: 408 / 768 (53%)
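The percentages follow directly from the used and available counts. The short check below recomputes them and picks out the most heavily used resource; the variable and function names are illustrative only.

```python
# Used / available counts as listed on the slide.
UTILIZATION = {
    "ALUTs": (66_416, 143_520),
    "Registers": (92_818, 143_520),
    "M512 rams": (211, 930),
    "M4K rams": (704, 768),
    "M-RAMs": (0, 9),
    "DSP blocks": (408, 768),
}

def percent_used(used, available):
    """Utilization rounded to the nearest whole percent."""
    return round(100.0 * used / available)

percentages = {name: percent_used(*counts) for name, counts in UTILIZATION.items()}

# The resource closest to exhaustion limits how much more logic can be added.
tightest = max(percentages, key=percentages.get)

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print("Tightest resource:", tightest)
```

The M4K block RAMs are the closest to exhaustion in this listing.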
15 CTsim MAP Implementation 8P Summary
- 29x performance
- 1024x1024 SPFP image reconstruction
- Interesting to medical equipment manufacturers
  - Not compelling yet, even with higher resolution
  - Some manufacturers express disbelief
- ~60% single FPGA resource utilization
- Summing all data projections over all pixels is computationally intensive
16 CTsim MAP Implementations Next Steps
- Precalculating constants, stream x-ray data
  - Requires 1.7 GB storage for current parameters
  - MAP OBM: 64 MB, 19.2 GB/s bandwidth (16 words/clock)
  - MAP CM: 2 GB, 7.2 GB/s bandwidth (8 words/clock)
  - Unspeakable performance
- 16P Implementation
  - Predict 54x performance
  - Implementation had ~120% device utilization
  - Back to Fourth Grade: Multiplication gets bigger faster than addition
  - Pixel summation operation has 2 independent steps
- 4Px4P Implementation
  - Predict 54x performance
  - Implementation had ~105% device utilization
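A back-of-envelope check of the precalculated-constants option: the sketch below asks which memory can hold the 1.7 GB table and how long one pass over it would take at the quoted bandwidth. It assumes decimal gigabytes and that bandwidth is the only limit; the names are illustrative.

```python
STORAGE_GB = 1.7  # precalculated constants for the current parameters

# name -> (capacity in GB, bandwidth in GB/s), as quoted on the slide
MEMORIES = {
    "MAP OBM": (0.064, 19.2),
    "MAP CM": (2.0, 7.2),
}

def stream_report(storage_gb, memories):
    """For each memory: does the table fit, and how long does one pass take?"""
    report = {}
    for name, (capacity_gb, bandwidth_gb_s) in memories.items():
        report[name] = {
            "fits": storage_gb <= capacity_gb,
            "stream_ms": 1000.0 * storage_gb / bandwidth_gb_s,
        }
    return report

report = stream_report(STORAGE_GB, MEMORIES)
for name, row in report.items():
    print(f"{name}: fits={row['fits']}, one pass = {row['stream_ms']:.1f} ms")
```

Under these assumptions only the MAP CM can hold the table, and a single pass over it at 7.2 GB/s takes roughly 236 ms.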
17 CTsim MAP Implementation Potential Steps
- Two FPGA Series H MAP
  - One FPGA precalculates constants, second calculates
  - Use all 16 OBMs for FPGA-FPGA bridge (19.2 GB/s)
  - Too cost sensitive? Are margins really that small?
- Find out real equipment parameters
  - 512x512 image in 5 minutes typical?
  - Maybe even work with real equipment?
- Examine 3D CT image reconstruction
  - Real-time 3D CT scanning?
18 Contact Information
David Pointer, SRC Computers, Inc.