Computed Tomography (CT) Scan Image Reconstruction on the SRC-7 David Pointer SRC Computers, Inc.

1 Computed Tomography (CT) Scan Image Reconstruction on the SRC-7 David Pointer SRC Computers, Inc.

2 CT Image Reconstruction
[Figures: Herman head sinogram; Herman head reconstruction]

3 CT Image Reconstruction
initialize filter
for all detectors and projections: FFT, data *= coefficient, IFFT
for all pixels: sum filtered data into pixel
reconstructed image
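The loop structure on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CTsim's or SRC's code: the function names (`fft`, `filter_projection`, `backproject`), the ramp-shaped coefficients, the nearest-detector lookup, and the geometry scaling are all hypothetical choices made to keep the sketch short.

```python
import cmath
import math

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def filter_projection(projection, n_fft):
    """FFT, multiply by filter coefficients, inverse FFT, for one projection."""
    padded = list(projection) + [0.0] * (n_fft - len(projection))
    spectrum = fft(padded)
    # Assumed filter: a plain ramp |f|, with bins above n_fft/2 mirrored.
    ramp = [min(k, n_fft - k) / n_fft for k in range(n_fft)]
    spectrum = [s * c for s, c in zip(spectrum, ramp)]  # data *= coefficient
    restored = fft(spectrum, inverse=True)
    return [v.real / n_fft for v in restored[:len(projection)]]

def backproject(filtered, size):
    """Sum every filtered projection into every pixel of a size x size image."""
    n_views, n_det = len(filtered), len(filtered[0])
    centre_pix, centre_det = (size - 1) / 2.0, (n_det - 1) / 2.0
    # Assumed geometry: the image diagonal spans the detector array.
    scale = (n_det - 1) / (size * math.sqrt(2.0))
    image = [[0.0] * size for _ in range(size)]
    for v, view in enumerate(filtered):
        theta = math.pi * v / n_views
        c, s = math.cos(theta) * scale, math.sin(theta) * scale
        for row in range(size):
            for col in range(size):
                t = (col - centre_pix) * c + (row - centre_pix) * s + centre_det
                idx = min(max(int(round(t)), 0), n_det - 1)  # nearest detector
                image[row][col] += view[idx]
    return image

if __name__ == "__main__":
    # Tiny synthetic sinogram: a point at the centre detector in every view.
    sinogram = [[0.0] * 15 for _ in range(12)]
    for row in sinogram:
        row[7] = 1.0
    filtered = [filter_projection(p, 16) for p in sinogram]
    image = backproject(filtered, 9)
    print("centre pixel value:", image[4][4])
```

The inner double loop over pixels inside the loop over views is the "sum filtered data into pixel" step; its cost grows with views times pixels, which is why the later slides concentrate on that stage. For 165 detectors, the smallest power-of-two FFT size that holds one projection is 256.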

4 CT Scanner Equipment Today
All scanner parameters fixed by manufacturer
Typical reconstructed image parameters
  512x512 grayscale image resolution
  5 minute image reconstruction
Equipment costly
Closed, proprietary systems
  Data type unknown
  Processor type(s) unknown
Equipment design life cycle 7-10 years typical
No need for speed? Algorithm patents say yes.

5 CTsim Application
CTsim
  Open source CT simulator: Dr. Kevin Rosenberg, M.D.
  All scanner parameters programmable
  Written in C++
Selected application parameters
  165 detectors
  180 projections (views)
  1024x1024 reconstructed grayscale image
  Single precision floating point (SPFP) calculations
CTsim FBP CPU Execution Time (fftw)
  AMD Opteron, 2200 MHz, 1024 KB cache: seconds
  Intel Xeon 3000 MHz, 2048 KB cache: seconds

6 CTsim Application
[Diagram. Microprocessor: filtered backprojection (FFT, Multiply, Inverse FFT, Pixel Summation). Other labels: Projection Datasets, Filter Coefficients, Image Display.]

7 CTsim Application Partition 8P
[Diagram. SRC-7 Series H MAP Processor: filtered backprojection (FFT, Multiply, Inverse FFT, Pixel Summation), parallel summation for 8 pixels (8P). Other labels: Projection Datasets, Filter Coefficients, Image Display, Microprocessor.]

8 CTsim MAP Implementation 8P (Filter)
[Diagram labels: MAP, OBM, FPGA RAM; Projection Datasets (4 OBM); FFT, FFT Twiddle Table; Filter Coefficients, 4 mults, Filter; Filtered Projection Datasets (4 OBM); IFFT; Filtered Projection Datasets (4 OBM); Filtered Datasets (8 arrays)]

9 CTsim MAP Implementation cfft_fp32() FFT Macro
SRC's Signal Processing library macro
Programmable point size 256 to complex SPFP input/output per FPGA clock
Programmable forward or reverse FFT

10 CTsim MAP Implementation 8P (Pixel Sum)
[Diagram labels: 2+2 OBM; 8 pixels; Filtered Datasets (8 arrays); 8 datasets; 8 sum; 8 pixels; 2+2 OBM]
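A rough software picture of the 8P pixel sum, assuming the eight arrays on this slide are eight copies of the filtered datasets so that eight pixels can each perform a lookup in the same step. The names `pixel_sum_8p`, `detector_index`, and `steps` are hypothetical, and the sequential `lane` loop only stands in for what the FPGA would do concurrently.

```python
LANES = 8  # "8P": eight pixels summed per step

def pixel_sum_8p(filtered, detector_index, n_pixels):
    """Accumulate all views into all pixels, eight pixels per step.

    filtered: list of views, each a list of filtered detector samples.
    detector_index(view, pixel): hypothetical helper that returns which
    detector sample a given pixel reads in a given view.
    """
    # Stand-in for eight replicated arrays, one per lane (assumption).
    copies = [filtered] * LANES
    image = [0.0] * n_pixels
    for base in range(0, n_pixels, LANES):
        for view in range(len(filtered)):
            for lane in range(LANES):  # concurrent in hardware
                pixel = base + lane
                if pixel < n_pixels:
                    sample = copies[lane][view][detector_index(view, pixel)]
                    image[pixel] += sample
    return image

def steps(n_pixels, n_views, lanes):
    """Number of lookup-and-add steps when `lanes` pixels share each step."""
    return (n_pixels + lanes - 1) // lanes * n_views

if __name__ == "__main__":
    print("1 pixel per step: ", steps(1024 * 1024, 180, 1))
    print("8 pixels per step:", steps(1024 * 1024, 180, LANES))
```

With the deck's parameters (1024x1024 pixels, 180 projections) the step count drops from 188,743,680 to 23,592,960 when eight pixels share each step.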

11 CTsim MAP Implementation 8P (Source Code)

12 CTsim MAP Implementation 8P Results

FBP MAP Function | Time (ms) | % of total | Type
Initialization   | 3.7       | 2%         | Data movement
Filter           |           | %          | Calculation
Filtered Dataset |           | %          | Calculation
Pixel Sum        |           | %          | Calculation
Image Transfer   |           | %          | Data movement
total            |           | %          |

CPU   | CPU (s) | MAP (s) | Speedup
AMD   |         |         |
Intel |         |         |

13 CTsim MAP Implementation 8P Timing
[Chart: Initialization, Filter, Filtered Dataset, Pixel Sum, Image Transfer]

14 CTsim MAP Implementation 8P Device Utilization
Single Altera Stratix II 2S180 FPGA
  ALUTs: 66,416 / 143,520 (46%)
  Registers: 92,818 / 143,520 (65%)
  M512 RAMs: 211 / 930 (23%)
  M4K RAMs: 704 / 768 (92%)
  M-RAMs: 0 / 9 (0%)
  DSP blocks: 408 / 768 (53%)
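The percentages on this slide follow directly from the used/available counts. The short check below recomputes them and reports which resource is closest to exhaustion; the dictionary layout and the function names `percent_used` and `tightest_resource` are illustrative assumptions, while the counts are copied from the slide.

```python
# Used / available counts as listed on the slide.
USAGE = {
    "ALUTs": (66416, 143520),
    "Registers": (92818, 143520),
    "M512 RAMs": (211, 930),
    "M4K RAMs": (704, 768),
    "M-RAMs": (0, 9),
    "DSP blocks": (408, 768),
}

def percent_used(used, available):
    """Utilization of one resource type, in percent."""
    return 100.0 * used / available

def tightest_resource(usage):
    """Name of the resource with the highest utilization."""
    return max(usage, key=lambda name: percent_used(*usage[name]))

if __name__ == "__main__":
    for name, (used, available) in USAGE.items():
        print(f"{name:11s} {percent_used(used, available):5.1f} %")
    print("Most heavily used:", tightest_resource(USAGE))
```

By this measure the M4K RAM blocks are the most heavily used resource in the 8P design.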

15 CTsim MAP Implementation 8P Summary
29x performance
  1024x1024 SPFP image reconstruction
Interesting to medical equipment manufacturers
  Not compelling yet, even with higher resolution
  Some manufacturers express disbelief
~60% single FPGA resource utilization
Summing all data projections over all pixels is computationally intensive

16 CTsim MAP Implementations Next Steps
Precalculating constants, stream x-ray data
  Requires 1.7 GB storage for current parameters
  MAP OBM: 64 MB, 19.2 GB/s bandwidth (16 words/clock)
  MAP CM: 2 GB, 7.2 GB/s bandwidth (8 words/clock)
  Unspeakable performance
16P Implementation
  Predict 54x performance
  Implementation had ~120% device utilization
  Back to Fourth Grade: Multiplication gets bigger faster than addition
Pixel summation operation has 2 independent steps
4Px4P Implementation
  Predict 54x performance
  Implementation had ~105% device utilization
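Some back-of-envelope arithmetic on the storage and bandwidth figures above. It assumes decimal units (1 GB = 10^9 bytes) and assumes the precalculated constants are stored once per pixel per projection; the slides state neither, so the bytes-per-pair figure is only what those assumptions imply. The names below are illustrative.

```python
GB = 1e9  # decimal gigabyte (assumption)
MB = 1e6  # decimal megabyte (assumption)

PIXELS = 1024 * 1024      # reconstructed image size from the deck
PROJECTIONS = 180         # projections (views) from the deck
TABLE_BYTES = 1.7 * GB    # storage quoted for precalculated constants

# One constant set per (pixel, projection) pair is an assumption.
pairs = PIXELS * PROJECTIONS
bytes_per_pair = TABLE_BYTES / pairs

def stream_seconds(n_bytes, gb_per_second):
    """Time to stream n_bytes once at the quoted peak bandwidth."""
    return n_bytes / (gb_per_second * GB)

fits_in_obm = TABLE_BYTES <= 64 * MB   # MAP OBM: 64 MB
fits_in_cm = TABLE_BYTES <= 2 * GB     # MAP CM: 2 GB

if __name__ == "__main__":
    print("pixel-projection pairs:", pairs)
    print(f"implied bytes per pair: {bytes_per_pair:.2f}")
    print("fits in OBM:", fits_in_obm, "| fits in CM:", fits_in_cm)
    print(f"one pass at 7.2 GB/s:  {stream_seconds(TABLE_BYTES, 7.2):.3f} s")
    print(f"one pass at 19.2 GB/s: {stream_seconds(TABLE_BYTES, 19.2):.3f} s")
```

Under these assumptions the table works out to roughly 9 bytes per pixel-projection pair. It exceeds the 64 MB OBM but fits within the 2 GB CM, so a single pass would be limited by the 7.2 GB/s figure rather than the 19.2 GB/s one.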

17 CTsim MAP Implementation Potential Steps
Two FPGA Series H MAP
  One FPGA precalculates constants, second calculates
  Use all 16 OBMs for FPGA-FPGA bridge (19.2 GB/s)
  Too cost sensitive? Are margins really that small?
Find out real equipment parameters
  512x512 image in 5 minutes typical?
  Maybe even work with real equipment?
Examine 3D CT image reconstruction
  Real-time 3D CT scanning?

18 Contact Information David Pointer SRC Computers, Inc.


More information

COMPILED HARDWARE ACCELERATION OF MOLECULAR DYNAMICS CODE. Jason Villarreal and Walid A. Najjar

COMPILED HARDWARE ACCELERATION OF MOLECULAR DYNAMICS CODE. Jason Villarreal and Walid A. Najjar COMPILED HARDWARE ACCELERATION OF MOLECULAR DYNAMICS CODE Jason Villarreal and Walid A. Najjar Department of Computer Science and Engineering University of California, Riverside villarre, najjar@cs.ucr.edu

More information

Efficient Data Structures for the Fast 3D Reconstruction of Voxel Volumes with Inhomogeneous Spatial Resolution

Efficient Data Structures for the Fast 3D Reconstruction of Voxel Volumes with Inhomogeneous Spatial Resolution Efficient Data Structures for the Fast 3D Reconstruction of Voxel Volumes with Inhomogeneous Spatial Resolution Benjamin Betz 1, Steffen Kieß 1, Michael Krumm 2, Gunnar Knupe 2, Tsegaye Eshete 2, Sven

More information

Co-synthesis and Accelerator based Embedded System Design

Co-synthesis and Accelerator based Embedded System Design Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer

More information

SYSTEM BUS AND MOCROPROCESSORS HISTORY

SYSTEM BUS AND MOCROPROCESSORS HISTORY SYSTEM BUS AND MOCROPROCESSORS HISTORY Dr. M. Hebaishy momara@su.edu.sa http://colleges.su.edu.sa/dawadmi/fos/pages/hebaishy.aspx Digital Logic Design Ch1-1 SYSTEM BUS The CPU sends various data values,

More information

X-Stream II. Processing Method. Operating System. Hardware Performance. Elements of Processing Speed TECHNICAL BRIEF

X-Stream II. Processing Method. Operating System. Hardware Performance. Elements of Processing Speed TECHNICAL BRIEF X-Stream II Peter J. Pupalaikis Principal Technologist September 2, 2010 Summary This paper explains how X- Stream II techonlogy improves the speed and responsiveness of LeCroy oscilloscopes. TECHNICAL

More information

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing On the Efficacy of a Fued CPU+GPU Proceor (or APU) for Parallel Computing Mayank Daga, Ahwin M. Aji, and Wu-chun Feng Dept. of Computer Science Sampling of field that ue GPU Mac OS X Comology Molecular

More information

Design Once with Design Compiler FPGA

Design Once with Design Compiler FPGA Design Once with Design Compiler FPGA The Best Solution for ASIC Prototyping Synopsys Inc. Agenda Prototyping Challenges Design Compiler FPGA Overview Flexibility in Design Using DC FPGA and Altera Devices

More information

Hardware/Software Co-Design

Hardware/Software Co-Design 1 / 27 Hardware/Software Co-Design Miaoqing Huang University of Arkansas Fall 2011 2 / 27 Outline 1 2 3 3 / 27 Outline 1 2 3 CSCE 5013-002 Speical Topic in Hardware/Software Co-Design Instructor Miaoqing

More information

Hardware Acceleration of Pulsar Search on FPGAs using OpenCL

Hardware Acceleration of Pulsar Search on FPGAs using OpenCL Hardware Acceleration of Pulsar Search on FPGAs using OpenCL Oliver Sinnen Haomiao Wang & Prabu Thiagaraj (Manchester Uni) Parallel and Reconfigurable Computing Department of Electrical and Computer Engineering

More information

Hardware and Software Architecture. Chapter 2

Hardware and Software Architecture. Chapter 2 Hardware and Software Architecture Chapter 2 1 Basic Components The x86 processor communicates with main memory and I/O devices via buses Data bus for transferring data Address bus for the address of a

More information

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,

More information

Continuous and Discrete Image Reconstruction

Continuous and Discrete Image Reconstruction 25 th SSIP Summer School on Image Processing 17 July 2017, Novi Sad, Serbia Continuous and Discrete Image Reconstruction Péter Balázs Department of Image Processing and Computer Graphics University of

More information

Algebraic Iterative Methods for Computed Tomography

Algebraic Iterative Methods for Computed Tomography Algebraic Iterative Methods for Computed Tomography Per Christian Hansen DTU Compute Department of Applied Mathematics and Computer Science Technical University of Denmark Per Christian Hansen Algebraic

More information