Transforming Irregular Algorithms for Heterogeneous Computing - Case Studies in Bioinformatics
- Anissa Chambers
- 5 years ago
Transcription
1 Transforming Irregular Algorithms for Heterogeneous Computing - Case Studies in Bioinformatics. Jing Zhang. Advisor: Dr. Wu Feng. Collaborator: Hao Wang. synergy.cs.vt.edu
2 Irregular Algorithms. Characterized by: operating on irregular data structures (trees, graphs, linked lists, priority queues); data are processed in variable-iteration loops; both memory access and control flow are irregular and unpredictable. Emerging data-intensive applications have increasing irregularity in memory access and control flow over massive data sets. Bioinformatics applications: human genome, . billion letters (. Gb). Social network analysis and graph algorithms: Twitter (June-Dec 9), million users, million tweets (Gb).
3 Heterogeneous Computing Systems. Heterogeneous computing systems can offer massively parallel computation with high power efficiency and throughput: GPU, AMD APU, Intel Xeon Phi, FPGA, DSP, ... But they are designed for regular programs, relying on data locality and regular computation to tolerate access latencies. Thousands of smaller (weak), more efficient (simple) cores. Limited cache size, i.e., short cache line lifetimes.
4 Heterogeneous Computing Systems. Heterogeneous computing systems can offer massively parallel computation with high power efficiency and throughput: GPU, AMD APU, Intel Xeon Phi, FPGA, DSP, ... But they are designed for regular programs, relying on data locality and regular computation to tolerate access latencies. Thousands of smaller (weak), more efficient (simple) cores, which lack branch prediction, out-of-order execution, etc. Limited cache size, i.e., short cache line lifetimes. It is very challenging to map irregular algorithms onto heterogeneous computing systems!
5 Impacts of Irregularity on GPU Performance. Irregular control flow: branch divergence. It occurs when threads inside a warp branch to different execution paths, and it wastes computational resources. [Figure: a branch splitting a warp into Path A and Path B.] Source: Wilson Fung, Ivan Sham, George Yuan, Tor Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," MICRO.
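A rough way to see the cost of branch divergence is to count, per warp, how many distinct execution paths its threads take, since a SIMT machine runs those paths one after another with the non-participating threads masked off. The following is a toy Python model rather than GPU code; the warp size of 32 and the name `divergent_passes` are assumptions made for illustration.

```python
WARP_SIZE = 32  # assumed warp width, chosen for illustration


def divergent_passes(path_per_thread, warp_size=WARP_SIZE):
    """Return, for each warp, how many serialized passes it needs.

    path_per_thread[i] is a label for the branch path taken by thread i.
    A warp whose threads all agree needs one pass; otherwise it needs
    one pass per distinct path.
    """
    passes = []
    for start in range(0, len(path_per_thread), warp_size):
        warp = path_per_thread[start:start + warp_size]
        passes.append(len(set(warp)))
    return passes


uniform = ["A"] * 64                                      # every thread takes path A
mixed = ["A" if i % 2 == 0 else "B" for i in range(64)]   # paths alternate per thread

print(divergent_passes(uniform))  # [1, 1]
print(divergent_passes(mixed))    # [2, 2]
```

In this model the alternating pattern doubles the number of passes per warp even though each thread does the same amount of work.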
6 Impacts of Irregularity on GPU Performance (cont'd). Irregular memory access: uncoalesced memory access lowers memory bus utilization. [Figure: coalesced memory access, where a warp's accesses fall in one segment and need a single 8-byte transaction, versus uncoalesced memory access, where they cross N segments and need N -byte transactions.]
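The difference between coalesced and uncoalesced access can be approximated by counting how many aligned memory segments one warp's addresses touch. This is a minimal sketch, not a hardware model; the 128-byte segment size, the 4-byte word size, and the name `segments_touched` are assumptions for illustration.

```python
def segments_touched(addresses, segment_bytes=128):
    """Count the distinct aligned segments hit by one warp's byte addresses.

    segment_bytes is an assumed segment size, not a value from the slides.
    """
    return len({addr // segment_bytes for addr in addresses})


# 32 threads reading consecutive 4-byte words: all fall in one segment.
coalesced = [4 * lane for lane in range(32)]
# 32 threads reading words 128 bytes apart: one segment per thread.
strided = [128 * lane for lane in range(32)]

print(segments_touched(coalesced))  # 1
print(segments_touched(strided))    # 32
```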
7 Irregular Algorithms: Optimizations on GPU. Irregular control flow: kernel fission, sorting work, warp-centric execution, ... Irregular memory access: memory access reordering, exploiting the memory hierarchy, data structure adaptation, ...
8 Case Study: Mapping BLASTP on GPU*. BLAST: Basic Local Alignment Search Tool. It finds the sequences in a database that are most similar to a query sequence. It uses heuristic algorithms, which have significant irregularities in both memory access and control flow. P is for protein sequence; N is for nucleotide sequence. *: J. Zhang, H. Wang, H. Lin, W. Feng, "cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU," IPDPS.
9 BLAST Algorithm. Four stages in BLAST: (1) hit detection, (2) ungapped extension, (3) gapped extension, (4) final extension. [Figure: query sequence plotted against the subject sequence (database).]
10 BLAST Algorithm. Four stages in BLAST: (1) hit detection, (2) ungapped extension, (3) gapped extension, (4) final extension. [Figure: query sequence and subject sequence.]
11 BLAST Algorithm. Four stages in BLAST: (1) hit detection, (2) ungapped extension, (3) gapped extension, (4) final extension. [Figure: lookup table, query sequence, and subject sequence.]
12 BLAST Algorithm. Four stages in BLAST: (1) hit detection, (2) ungapped extension, (3) gapped extension, (4) final extension. [Figure: lookup table, query sequence, and subject sequence; each hit is a (query position, subject position) pair.]
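Hit detection with a lookup table can be sketched as follows: index every word of the query by its position, then scan the subject and emit a (query position, subject position) pair for each word match. This is a simplified illustration, not the BLAST implementation; it matches exact words only, and the word length `W = 3` and the function names are assumptions.

```python
from collections import defaultdict

W = 3  # assumed word length for illustration


def build_lookup(query, w=W):
    """Map each length-w word of the query to the positions where it starts."""
    table = defaultdict(list)
    for qpos in range(len(query) - w + 1):
        table[query[qpos:qpos + w]].append(qpos)
    return table


def detect_hits(query, subject, w=W):
    """Scan the subject and return (query position, subject position) hits."""
    table = build_lookup(query, w)
    hits = []
    for spos in range(len(subject) - w + 1):
        # .get avoids inserting empty entries into the defaultdict
        for qpos in table.get(subject[spos:spos + w], ()):
            hits.append((qpos, spos))
    return hits


print(detect_hits("ABCDE", "XXABCDX"))  # [(0, 2), (1, 3)]
```

Both hits in the example lie on the same diagonal, because subject position minus query position is 2 for each.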
13 BLAST Algorithm. Four stages in BLAST: (1) hit detection, (2) ungapped extension, (3) gapped extension, (4) final extension. [Figure: query sequence and subject sequence; two hits with distance < threshold.]
14 BLAST Algorithm. Four stages in BLAST: (1) hit detection, (2) ungapped extension, (3) gapped extension, (4) final extension. [Figure: query sequence and subject sequence.]
15 BLAST Algorithm. Four stages in BLAST: (1) hit detection, (2) ungapped extension, (3) gapped extension, (4) final extension. [Figure: query sequence and subject sequence, with a position marked X.]
16 BLAST Algorithm. Four stages in BLAST: (1) hit detection, (2) ungapped extension, (3) gapped extension, (4) final extension. [Figure: query sequence and subject sequence, with a position marked X.]
17 BLAST Algorithm. Four stages in BLAST: (1) hit detection, (2) ungapped extension, (3) gapped extension, (4) final extension. [Figure: query sequence and subject sequence, with a position marked X.] Knowledge: seed-and-extend, the most widely used alignment paradigm.
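The "extend" half of seed-and-extend can be illustrated with a one-directional ungapped extension along a diagonal. This is a hedged sketch, not the algorithm from the slides: the match and mismatch scores, the drop-off rule controlled by `x_drop`, and the function name are all assumptions chosen to keep the example small.

```python
def ungapped_extend(query, subject, qpos, spos, match=1, mismatch=-1, x_drop=2):
    """Extend rightwards along one diagonal without gaps.

    Returns (best_score, best_length). Extension stops when the running
    score falls x_drop or more below the best score seen so far (an
    assumed stopping rule) or when either sequence ends.
    """
    score = 0
    best = 0
    best_len = 0
    i = 0
    while qpos + i < len(query) and spos + i < len(subject):
        score += match if query[qpos + i] == subject[spos + i] else mismatch
        i += 1
        if score > best:
            best, best_len = score, i
        elif best - score >= x_drop:
            break
    return best, best_len


print(ungapped_extend("AAAAXXXX", "AAAAYYYY", 0, 0))  # (4, 4)
```

How far the loop runs depends on the data, which is why different diagonals can be extended to different lengths.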
18 State-of-the-Art GPU BLAST Approaches*. They use intuitive, coarse-grained parallelization, where a thread compares the query sequence to a subject sequence. [Figure: each thread is assigned one subject sequence and compares it with the query sequence.] *: "Design and Implementation of a CUDA-compatible GPU-based Core for Gapped BLAST Algorithm" (ICCS); "GPU-BLAST: Using Graphics Processors to Accelerate Protein Sequence Alignment" (Bioinformatics); "CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled Graphics Hardware" (TCBB); "Accelerating Protein Sequence Search in a Heterogeneous Computing System" (IPDPS).
19 Challenge #: Control Flow Irregularity. With coarse-grained parallelism, different threads may be executing different stages. [Figure: four threads of one warp, each scanning its own subject sequence during hit detection, with outcomes "No hit", "Hit found", "Hit found & distance <", and "No hit"; only some threads go on to ungapped extension, which causes divergence.]
20 Solution: Kernel Fission. Split the hit detection and ungapped extension stages into separate kernels. Use intermediate kernels to bridge the two stages.
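The idea of kernel fission can be sketched in plain Python by contrasting a fused loop, where each worker interleaves detection and extension, with a split version, where one pass only detects and buffers hits and a second pass only extends them. This is an illustrative model, not cuBLASTP's kernels; `fused`, `fissioned`, and the toy `detect` and `extend` functions are hypothetical names.

```python
def fused(subjects, detect, extend):
    """One worker per subject: detection and extension are interleaved,
    so workers that run in lockstep can be in different stages."""
    return [[extend(s, h) for h in detect(s)] for s in subjects]


def fissioned(subjects, detect, extend):
    """Same result, computed as separate uniform passes."""
    # Pass 1 ("kernel 1"): every worker only detects hits and buffers them.
    hit_buffer = [(idx, h) for idx, s in enumerate(subjects) for h in detect(s)]
    # Pass 2 ("kernel 2"): one worker per buffered hit; every worker only extends.
    extended = [(idx, extend(subjects[idx], h)) for idx, h in hit_buffer]
    # Regroup per subject so the output matches the fused version.
    results = [[] for _ in subjects]
    for idx, value in extended:
        results[idx].append(value)
    return results


# Toy stand-ins: a "hit" is any position holding 'A'; "extension" reports
# the position and the number of characters remaining after it.
def detect(s):
    return [i for i, ch in enumerate(s) if ch == "A"]


def extend(s, h):
    return (h, len(s) - h)


subjects = ["XAXA", "XXXX", "AAX"]
print(fissioned(subjects, detect, extend) == fused(subjects, detect, extend))  # True
```

The hit buffer plays the role of the bridge between the two stages: it turns a variable number of hits per subject into a flat list that the second pass can process uniformly.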
21 Challenge #: Control Flow Irregularity (). Coarse-grained extension, where a thread is responsible for a diagonal, can result in divergence, since different diagonals could be extended to different lengths. [Figure: one thread per diagonal.]
22 Solution: Warp-centric Extension. Map a warp of threads to one diagonal. Threads in a warp check different positions concurrently. [Figure: warp-based extension, one warp per diagonal.]
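One way to picture warp-centric extension is to process a diagonal in windows of one warp's width: each lane scores a single position, a prefix sum across the lanes gives the running score, and the window is then checked for the stopping condition. The sketch below simulates this sequentially in Python and is not the cuBLASTP kernel; the scoring values, the `x_drop` stopping rule, the warp size of 32, and the name `warp_extend` are assumptions.

```python
from itertools import accumulate

WARP_SIZE = 32  # assumed warp width


def warp_extend(query, subject, qpos, spos, match=1, mismatch=-1,
                x_drop=2, warp_size=WARP_SIZE):
    """Ungapped rightward extension, one warp-sized window at a time.

    Returns (best_score, best_length).
    """
    base = 0       # running score carried over from earlier windows
    best = 0
    best_len = 0
    offset = 0
    while True:
        n = min(warp_size,
                len(query) - qpos - offset,
                len(subject) - spos - offset)
        if n <= 0:
            break
        # Each lane scores one position; on a GPU these run concurrently.
        lane_scores = [
            match if query[qpos + offset + lane] == subject[spos + offset + lane]
            else mismatch
            for lane in range(n)
        ]
        # Inclusive prefix sum across the lanes gives the running score.
        running = [base + s for s in accumulate(lane_scores)]
        # Sequential stand-in for a warp-wide search for the best prefix
        # and the first position where the score has dropped too far.
        stop = False
        for lane, score in enumerate(running):
            if score > best:
                best, best_len = score, offset + lane + 1
            elif best - score >= x_drop:
                stop = True
                break
        if stop:
            break
        base = running[-1]
        offset += n
    return best, best_len


print(warp_extend("AAAAXXXX", "AAAAYYYY", 0, 0))               # (4, 4)
print(warp_extend("AAAAXXXX", "AAAAYYYY", 0, 0, warp_size=4))  # (4, 4)
```

All lanes of a window do the same work regardless of where the extension ends, so the data-dependent length only affects how many windows are processed.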
23 Challenge #: Memory Access Irregularity. Use a global array, called the lastHit array, to record the previous hit in each diagonal. [Figure: query sequence, subject sequence, and the lastHit array.]
24 Challenge #: Memory Access Irregularity. Read the lastHit array to get the last hit position in the diagonal. Write back the current position to the lastHit array. [Figure: a thread scanning the subject sequence against the query sequence, with a table of operations (read, write) and their positions in the lastHit array.]
25 Challenge #: Memory Access Irregularity. Read the lastHit array to get the last hit position in the diagonal. Write back the current position to the lastHit array. [Figure: the same scan one hit later; the operation table now lists two read/write pairs.]
26 Challenge #: Memory Access Irregularity. Read the lastHit array to get the last hit position in the diagonal. Write back the current position to the lastHit array. [Figure: the same scan one hit later; the operation table now lists three read/write pairs.]
27 Challenge #: Memory Access Irregularity. Read the lastHit array to get the last hit position in the diagonal. Write back the current position to the lastHit array. [Figure: the same scan one hit later; the operation table now lists four read/write pairs.]
28 Challenge #: Memory Access Irregularity. Irregular read/write operations on the lastHit array. [Figure: a thread scanning the subject sequence against the query sequence; the table of read and write operations shows irregular memory access to the lastHit array.]
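The lastHit array method can be sketched as a loop over hits in detection order: compute the hit's diagonal, read the previous hit position on that diagonal, compare the distance against the threshold, and write the current position back. This is a simplified illustration under stated assumptions: hits arrive in increasing subject position within each diagonal, the diagonal index is shifted by `query_len - 1` to keep it non-negative, and the function name and `threshold` parameter are hypothetical.

```python
def two_hit_lasthit(hits, query_len, subject_len, threshold):
    """Return the hits that have a close predecessor on the same diagonal.

    hits is a list of (query position, subject position) pairs in
    detection order.
    """
    num_diagonals = query_len + subject_len - 1
    last_hit = [None] * num_diagonals
    triggers = []
    for qpos, spos in hits:
        diag = spos - qpos + (query_len - 1)  # shift so the index is >= 0
        prev = last_hit[diag]                 # read: index depends on the data
        if prev is not None and 0 < spos - prev < threshold:
            triggers.append((qpos, spos))
        last_hit[diag] = spos                 # write: same scattered index
    return triggers


hits = [(0, 2), (5, 1), (3, 5), (9, 30)]
print(two_hit_lasthit(hits, query_len=10, subject_len=40, threshold=10))
# [(3, 5)]
```

Consecutive hits land on unrelated diagonals, so consecutive reads and writes of `last_hit` jump around the array; that is the irregular access pattern the slides describe.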
29 Solution: Memory Access Reordering. Collect and group hits by diagonal number via binning. [Figure: threads place the out-of-order hits found while scanning the subject sequence into bins.]
30 Solution: Memory Access Reordering. Sort hits by position in each bin. [Figure: binning into bins, followed by a sort within each bin.]
31 Solution: Memory Access Reordering. Filter out non-extendable hits by the distance between adjacent hits. [Figure: binning, sort, and filter; hits marked X are removed by the filter.]
32 Solution: Memory Access Reordering. Transform an irregular algorithm (the lastHit array method) into three regular phases: binning, sort, and filter. [Figure: the binning, sort, and filter phases, annotated "regular read, irregular write" and "regular memory access"; hits marked X are removed by the filter.]
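The three phases can be sketched as follows: bin the hits by diagonal, sort each bin by subject position, then keep only the hits whose predecessor in the same bin is within the threshold. This is a minimal Python illustration, not the cuBLASTP implementation; it uses one bin per diagonal for simplicity, and the function name and `threshold` parameter are assumptions.

```python
from collections import defaultdict


def bin_sort_filter(hits, threshold):
    """Return the extendable hits as (query position, subject position) pairs."""
    # Phase 1: binning. Hits are read in order and scattered into bins
    # keyed by diagonal number (subject position minus query position).
    bins = defaultdict(list)
    for qpos, spos in hits:
        bins[spos - qpos].append((spos, qpos))

    extendable = []
    for diag in sorted(bins):
        # Phase 2: sort the hits of one bin by subject position.
        bin_hits = sorted(bins[diag])
        # Phase 3: filter. Compare each hit with its neighbour in the
        # sorted bin; both are adjacent in memory.
        for (prev_s, _), (s, q) in zip(bin_hits, bin_hits[1:]):
            if 0 < s - prev_s < threshold:
                extendable.append((q, s))
    return extendable


# Hits deliberately given out of order.
hits = [(3, 5), (5, 1), (0, 2), (9, 30)]
print(bin_sort_filter(hits, threshold=10))  # [(3, 5)]
```

Unlike a per-hit lookup in a shared array, the sort and filter phases only touch neighbouring elements within a bin, and the result does not depend on the order in which hits were detected.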
33 Performance Numbers. Compared to the original version, cuBLASTP has less branch divergence overhead. [Figure: three charts comparing Original with the cuBLASTP hit detection, hit sorting, hit filtering, and ungapped extension kernels: branch divergence overhead* (the lower the better), occupancy achieved (the higher the better), and global load efficiency (the higher the better).] *: Branch divergence overhead: the ratio of divergent execution to total execution.
34 Performance Numbers. Compared to the original version, cuBLASTP has less branch divergence overhead and better occupancy. [Figure: the same three charts.] *: Branch divergence overhead: the ratio of divergent execution to total execution.
35 Performance Numbers. Compared to the original version, cuBLASTP has less branch divergence overhead, better occupancy, and better load efficiency. [Figure: the same three charts.] *: Branch divergence overhead: the ratio of divergent execution to total execution.
36 Speedup. With the swissprot database, cuBLASTP achieves up to .-fold speedup of hit detection and ungapped extension over GPU-BLASTP, which is the fastest GPU-based BLAST, and up to .-fold overall speedup over GPU-BLASTP (cuBLASTP also parallelizes the rest of the stages). [Figure: two charts over three queries: speedup of hit detection & ungapped extension, and overall speedup.]
37 Conclusion and Future Work. Conclusion: mapping irregular algorithms to heterogeneous systems is non-trivial and needs several transformations and adaptations of algorithms and data structures. But the concepts behind these transformations are transferable: kernel fission, memory access reordering, warp-centric execution, ... Future work: generalize the optimizations from cuBLASTP to different algorithms and other heterogeneous systems; design a library for irregular algorithms with optimized irregular data structures and operations.
38 Other Research Topics. Optimizing BWA (Burrows-Wheeler Alignment) on multicore CPU and Intel Xeon Phi. dbblast: database-index-based BLAST on multicore CPU and Intel Xeon Phi. Dynamic parallelism on AMD APU (New). Optimizing irregular applications for GPU/Xeon Phi clusters (Future). Other research interests: cloud computing, MapReduce with accelerators.
Why Care About the Memory Hierarchy? Memory Virtual Memory -DRAM Memory Gap (latecy) Reasos: Multi process systems (abstractio & memory protectio) Solutio: Tables (holdig per process traslatios) Fast traslatio
More informationn Explore virtualization concepts n Become familiar with cloud concepts
Chapter Objectives Explore virtualizatio cocepts Become familiar with cloud cocepts Chapter #15: Architecture ad Desig 2 Hypervisor Virtualizatio ad cloud services are becomig commo eterprise tools to
More informationService Oriented Enterprise Architecture and Service Oriented Enterprise
Approved for Public Release Distributio Ulimited Case Number: 09-2786 The 23 rd Ope Group Eterprise Practitioers Coferece Service Orieted Eterprise ad Service Orieted Eterprise Ya Zhao, PhD Pricipal, MITRE
More informationProgramming with Shared Memory PART II. HPC Spring 2017 Prof. Robert van Engelen
Programmig with Shared Memory PART II HPC Sprig 2017 Prof. Robert va Egele Overview Sequetial cosistecy Parallel programmig costructs Depedece aalysis OpeMP Autoparallelizatio Further readig HPC Sprig
More informationCache-Optimal Methods for Bit-Reversals
Proceedigs of the ACM/IEEE Supercomputig Coferece, November 1999, Portlad, Orego, U.S.A. Cache-Optimal Methods for Bit-Reversals Zhao Zhag ad Xiaodog Zhag Departmet of Computer Sciece College of William
More informationAnalysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve
Advaces i Computer, Sigals ad Systems (2018) 2: 19-25 Clausius Scietific Press, Caada Aalysis of Server Resource Cosumptio of Meteorological Satellite Applicatio System Based o Cotour Curve Xiagag Zhao
More informationA Parallel Reconfigurable Architecture for Real-Time Stereo Vision
2009 Iteratioal Cofereces o Embedded Software ad Systems A Parallel Recofigurable Architecture for Real-Time Stereo Visio Lei Che Yude Jia Beijig Laboratory of Itelliget Iformatio Techology, School of
More informationMinimum Spanning Trees. Application: Connecting a Network
Miimum Spaig Tree // : Presetatio for use with the textbook, lgorithm esig ad pplicatios, by M. T. oodrich ad R. Tamassia, Wiley, Miimum Spaig Trees oodrich ad Tamassia Miimum Spaig Trees pplicatio: oectig
More informationPar4All. From Convex Array Regions to Heterogeneous Computing
Par4All From Covex Array Regios to Heterogeeous Computig Mehdi Amii, Béatrice Creusillet, Stéphaie Eve, Roa Keryell, Oig Goubier, Serge Guelto, Jaice Oaia McMaho, Fraçois Xavier Pasquier, Grégoire Péa,
More information1 Enterprise Modeler
1 Eterprise Modeler Itroductio I BaaERP, a Busiess Cotrol Model ad a Eterprise Structure Model for multi-site cofiguratios are itroduced. Eterprise Structure Model Busiess Cotrol Models Busiess Fuctio
More information. Written in factored form it is easy to see that the roots are 2, 2, i,
CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or
More informationOutline. Research Definition. Motivation. Foundation of Reverse Engineering. Dynamic Analysis and Design Pattern Detection in Java Programs
Dyamic Aalysis ad Desig Patter Detectio i Java Programs Outlie Lei Hu Kamra Sartipi {hul4, sartipi}@mcmasterca Departmet of Computig ad Software McMaster Uiversity Caada Motivatio Research Problem Defiitio
More informationCIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13
CIS Data Structures ad Algorithms with Java Sprig 08 Stacks ad Queues Moday, February / Tuesday, February Learig Goals Durig this lab, you will: Review stacks ad queues. Lear amortized ruig time aalysis
More informationUniprocessors. HPC Prof. Robert van Engelen
Uiprocessors HPC Prof. Robert va Egele Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures
More informationLower Bounds for Sorting
Liear Sortig Topics Covered: Lower Bouds for Sortig Coutig Sort Radix Sort Bucket Sort Lower Bouds for Sortig Compariso vs. o-compariso sortig Decisio tree model Worst case lower boud Compariso Sortig
More informationA Parallel DFA Minimization Algorithm
A Parallel DFA Miimizatio Algorithm Ambuj Tewari, Utkarsh Srivastava, ad P. Gupta Departmet of Computer Sciece & Egieerig Idia Istitute of Techology Kapur Kapur 208 016,INDIA pg@iitk.ac.i Abstract. I this
More informationCSC 220: Computer Organization Unit 11 Basic Computer Organization and Design
College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:
More informationALU Augmentation for MPEG-4 Repetitive Padding
ALU Augmetatio for MPEG-4 Repetitive Paddig Georgi Kuzmaov Stamatis Vassiliadis Computer Egieerig Lab, Electrical Egieerig Departmet, Faculty of formatio Techology ad Systems, Delft Uiversity of Techology,
More informationFast and Scalable List Ranking on the GPU
Fast ad Scalable List Rakig o the GPU M. Suhail Rehma, Kishore Kothapalli, P. J. Narayaa Iteratioal Istitute of Iformatio Techology Hyderabad, Idia {rehma@research., kkishore@, pj@}iiit.ac.i ABSTRACT Geeral
More informationDATA STRUCTURES. amortized analysis binomial heaps Fibonacci heaps union-find. Data structures. Appetizer. Appetizer
Data structures DATA STRUCTURES Static problems. Give a iput, produce a output. Ex. Sortig, FFT, edit distace, shortest paths, MST, max-flow,... amortized aalysis biomial heaps Fiboacci heaps uio-fid Dyamic
More informationIntroduction to Computing Systems: From Bits and Gates to C and Beyond 2 nd Edition
Lecture Goals Itroductio to Computig Systems: From Bits ad Gates to C ad Beyod 2 d Editio Yale N. Patt Sajay J. Patel Origial slides from Gregory Byrd, North Carolia State Uiversity Modified slides by
More information