In-Place Associative Computing

1 In-Place Associative Computing. Avidan Akerib, Ph.D., Vice President, Associative Computing BU. All images are public on the web.

2 Agenda: Introduction to associative computing; Use case examples; Similarity search; Large-scale attention computing; Few-shot learning; Software model; Future approaches

3 The Challenge in AI Computing (matrix multiplication is not enough!)
AI requirement | Use case example
High-precision floating point | Neural network learning
Multi-precision | Real-time inference, saving memory
Linearly scalable | Big data
Sort-search | Top-K, recommendation, speech, image/video classification
Heavy computation | Non-linearity, Softmax, exponent, normalization
Bandwidth/power tradeoff | High speed at low power

4 Von Neumann Architecture. Memory: high density (repeated cells), slower. CPU: lower density (lots of logic), faster. Leveraging Moore's Law.

5 Von Neumann Architecture. Memory: high density (repeated cells), slower. CPU: lower density (lots of logic), faster. CPU frequency is outpacing memory, so cache must be added. Continue to leverage Moore's Law.

6 Since 26, clock speeds started flattening sharply. Source: Intel

7 Thinking Parallel: 2 cores and more. However, memory utilization becomes an issue.

8 More and more memory to solve the utilization problem: local and global memory.

9 Memory is still growing rapidly. Memory becomes a larger part of each chip.

10 Same concept even with GPGPUs: very high power, large die, expensive. What's next?

11 Most of the power goes to bandwidth (BW). Source: Song Han, Stanford University

12 Changing the Rules of the Game! Standard memory cells are smarter than we thought!

13 APU: Associative Processing Unit. A simple CPU sends a question over a simple, narrow bus to the APU (millions of processors) and receives an answer. Associative processing computes in place, directly in the memory array; removes the I/O bottleneck; significantly increases performance; reduces power.

14 How Computers Work Today. (Diagram: RE/WE, address decoder, ALU, sense amps / IO drivers.)

15 Accessing Multiple Rows Simultaneously. (Diagram: several rows read-enabled (RE) at once, write enable (WE), NOR.) Bus contention is not an error! It is a simple NOR/NAND satisfying De Morgan's law.

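16 Truth Table Example. (Truth table and Karnaugh map for inputs A, B, C and output D.)
$D = \bar{A}\bar{C} + BC = \overline{\overline{\bar{A}\bar{C} + BC}} = \overline{\overline{\bar{A}\bar{C}} \cdot \overline{BC}} = \mathrm{NAND}(\mathrm{NAND}(\bar{A}, \bar{C}), \mathrm{NAND}(B, C))$
Every minterm takes one clock. All bit lines execute Karnaugh tables in parallel.
Clock sequence: read (!A, !C), write T; read (B, C), write T2; read (T, T2), write D.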
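The De Morgan rewrite on this slide can be checked exhaustively. The following is a minimal Python sketch, not APU code: the function names are hypothetical, and each `nand` call stands in for one multi-row read followed by a write-back.

```python
from itertools import product


def nand(x: bool, y: bool) -> bool:
    """Two-input NAND, standing in for one multi-row read on a bit line."""
    return not (x and y)


def d_sum_of_products(a: bool, b: bool, c: bool) -> bool:
    """D = !A!C + BC, the sum-of-products form from the slide."""
    return ((not a) and (not c)) or (b and c)


def d_nand_only(a: bool, b: bool, c: bool) -> bool:
    """D = NAND(NAND(!A, !C), NAND(B, C)), one NAND per step."""
    t1 = nand(not a, not c)  # read (!A, !C), write first temporary
    t2 = nand(b, c)          # read (B, C), write second temporary
    return nand(t1, t2)      # read both temporaries, write D


def forms_agree() -> bool:
    """Compare both forms over all eight input combinations."""
    return all(
        d_sum_of_products(a, b, c) == d_nand_only(a, b, c)
        for a, b, c in product([False, True], repeat=3)
    )
```

Calling `forms_agree()` walks the full truth table, which is the software analogue of every bit line evaluating the same expression on its own column of data.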

17 Vector Add Example: A[] + B[] = C[]. Vector A(8, 32M), vector B(8, 32M), vector C(9, 32M); C = A + B. Number of clocks = 4 * 8 = 32. Clocks/byte = 32/32M = 1/1M. OPS = 1 GHz x 1M = 1 PetaOPS.
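To illustrate why the clock count depends on the bit width and not on the number of elements, here is a minimal Python sketch of a bit-serial add. It is an illustration under assumptions, not the APU microcode: each Python integer models one row of memory (a bit plane) whose bit j belongs to bit line j, the helper names are hypothetical, and the sketch does not model the slide's four clocks per bit.

```python
def to_bit_planes(values, width):
    """Transpose a list of integers into `width` bit planes (LSB plane first)."""
    planes = []
    for bit in range(width):
        plane = 0
        for lane, value in enumerate(values):
            plane |= ((value >> bit) & 1) << lane
        planes.append(plane)
    return planes


def from_bit_planes(planes, lanes):
    """Inverse of to_bit_planes: rebuild one integer per lane."""
    return [
        sum(((plane >> lane) & 1) << bit for bit, plane in enumerate(planes))
        for lane in range(lanes)
    ]


def bit_serial_add(a_planes, b_planes):
    """Ripple-carry add, one bit position per step, all lanes at once."""
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        partial = a ^ b
        out.append(partial ^ carry)          # sum bit for every lane
        carry = (a & b) | (carry & partial)  # carry bit for every lane
    out.append(carry)                        # result is one bit wider
    return out


def vector_add(a, b, width=8):
    """Add two equal-length vectors of `width`-bit unsigned integers."""
    planes = bit_serial_add(to_bit_planes(a, width), to_bit_planes(b, width))
    return from_bit_planes(planes, len(a))
```

The loop in `bit_serial_add` runs eight times for 8-bit inputs whether the vectors hold four elements or 32M, and the extra carry plane is why the slide's vector C is 9 bits wide.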

18 CAM / Associative Search. Records: duplicate the values with inverse data. Key: duplicate the key with its inverse, and move the original key next to the inverse data. The combined key goes to the read-enable (RE) lines, and the value sensed on each bit line indicates whether that record matches.

19 TCAM Search by Standard Memory Cells. (Diagram: stored records containing don't-care entries.)

20 TCAM Search by Standard Memory Cells. Insert zero instead of don't-care. Duplicate the data, inverting only the bits that are not don't-care. Key: duplicate the key with its inverse, and move the original key next to the inverse data. The combined key goes to the read-enable (RE) lines, and the value sensed on each bit line indicates whether that record matches.
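The duplicated-data encoding can be modeled in a few lines of Python. This is a sketch under assumptions rather than the device behavior: the function names are hypothetical, a record is given as a string of `1`, `0`, and `x` (don't-care), and a match is assumed to mean that every cell selected by the combined key holds zero (a NOR over the selected rows).

```python
def encode_record(pattern: str):
    """Encode a ternary pattern (MSB first) as (data, inverse) bit masks.

    '1' sets the data bit, '0' sets the inverse bit, and 'x' (don't-care)
    leaves both at zero, following the slide's "insert zero" rule.
    """
    data = 0
    inverse = 0
    for ch in pattern:
        data <<= 1
        inverse <<= 1
        if ch == "1":
            data |= 1
        elif ch == "0":
            inverse |= 1
        elif ch.lower() != "x":
            raise ValueError(f"unexpected symbol {ch!r}")
    return data, inverse


def tcam_search(records, key: int, width: int):
    """Return the indices of all records that match `key`."""
    mask = (1 << width) - 1
    inverse_key = ~key & mask
    hits = []
    for index, (data, inverse) in enumerate(records):
        # The key selects inverse-data cells and the inverted key selects
        # data cells; any selected cell holding 1 signals a mismatch.
        selected = (key & inverse) | (inverse_key & data)
        if selected == 0:
            hits.append(index)
    return hits
```

For example, with `records = [encode_record(p) for p in ["1010", "1x10", "0x1x", "xxxx"]]`, the call `tcam_search(records, 0b1010, 4)` returns `[0, 1, 3]`. The loop over records is sequential here only because this is software; in the scheme on the slide every bit line evaluates its record at the same time.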

21 Computing in the Bit Lines. Vector A: a0 a1 a2 a3 a4 a5 a6 a7. Vector B: b0 b1 b2 b3 b4 b5 b6 b7. C = f(A, B). Each bit line becomes a processor and storage. Millions of bit lines = millions of processors.

22 Neighborhood Computing. Shift vector: C = f(A, SL(B,)). Parallel shift of bit cycle sections. Enables neighborhood operations such as convolutions.

23 Search & Count. (Example: search, count = 3.) Search (binary or ternary) all bit lines in cycle. 28 M bit lines => 28 Peta search/sec. Key applications of search and count for predictive analytics: recommender systems; K-nearest neighbors (using cosine similarity search); random forest; image histogram; regular expression.
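As one concrete use of the search-and-count primitive, the image histogram named on this slide can be built with one search and one count per intensity level. The sketch below is plain Python with hypothetical function names; it only illustrates the access pattern.

```python
def search(values, key):
    """Mark every position whose stored value equals the key."""
    return [value == key for value in values]


def count(marks):
    """Count the marked positions."""
    return sum(marks)


def histogram(pixels, levels):
    """One search plus one count per intensity level."""
    return [count(search(pixels, level)) for level in range(levels)]
```

The number of search/count passes equals the number of intensity levels, not the number of pixels, which is the property the slide relies on.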

24 CPU vs GPU vs FPGA vs APU

25 CPU/GPGPU vs APU
CPU/GPGPU (current solution): Send an address to memory. Fetch the data from memory and send it to the processor. Compute serially per core (thousands of cores at most). Write the data back to memory, further wasting IO resources. Send data to each location that needs it.
In-place computing (APU): Search by content. Mark in place. Compute in place on millions of processors (the memory itself becomes millions of processors). No need to write data back: the result is already in the memory. If needed, distribute or broadcast at once.

26 ARCHITECTURE

27 Communication between Sections. Shifts between sections enable neighborhood operations (filters, CNN, etc.). Store, compute, search, and transport data anywhere.

28 Memory Section Computing to Improve Performance. (Diagram: MLB sections, 24 rows, control, connecting muxes, instruction buffer.)

29 APU Chip Layout. 2M bit processors or 28K vector processors run at G Hz with up to 2 Peta OPS peak performance.

30 APU Layout vs GPU Layout. Multi-functional, programmable blocks; acceleration of FP operation blocks.

31 EXAMPLE APPLICATIONS

32 K-Nearest Neighbors (k-NN). Simple example: N = 36, 3 groups, 2 dimensions (D = 2) for X and Y, K = 4; group Green is selected as the majority. For actual applications: N = billions, D = tens, K = tens of thousands.

33 k-NN Use Case in an APU. Item features and label storage: features of item 1, features of item 2, ..., features of item N. The query Q is distributed to all items (2 ns), and the cosine distance to every stored item p is computed in the computing area:
$C_p = \frac{D^p \cdot Q}{\|D^p\|\,\|Q\|} = \frac{\sum_{i=1}^{n} D_i^p Q_i}{\sqrt{\sum_{i=1}^{n} (D_i^p)^2}\,\sqrt{\sum_{i=1}^{n} Q_i^2}}$
Compute cosine distances for all N in parallel ( s, assuming D=5 features). K-Mins at O(1) complexity ( 3 s). In-place ranking and majority calculation. With the database in an APU, computation for all N items is done in .5 ms, independent of K (X improvement over current solutions).
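The pipeline on this slide (cosine score for every stored item, top-K selection, majority vote) can be sketched in plain Python. This is a sequential reference model with hypothetical function names, not the in-memory implementation; it takes the K highest cosine similarities, which corresponds to the K nearest items.

```python
import math
from collections import Counter


def cosine_similarity(d, q):
    """Dot product of d and q divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def knn_label(items, labels, query, k):
    """Label the query by majority vote among its k most similar items."""
    scores = [cosine_similarity(item, query) for item in items]
    top = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)[:k]
    return Counter(labels[i] for i in top).most_common(1)[0][0]
```

A small usage example: with items `[(1, 0), (0.9, 0.1), (0, 1), (0.1, 0.9), (1, 1)]`, labels `["a", "a", "b", "b", "c"]`, query `(1, 0.05)`, and `k=3`, the three most similar items carry labels a, a, and c, so the vote returns `"a"`.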

34 K-MINS: O(1) Algorithm
KMINS(int K, vector C){
    M := 1, V := 0;
    FOR b = msb to b = lsb:
        D := not(C[b]);
        N := M & D;
        cnt = COUNT(N | V)
        IF cnt > K:
            M := N;
        ELIF cnt < K:
            V := N | V;
        ELSE: // cnt == K
            V := N | V;
            EXIT;
        ENDIF
    ENDFOR
}
(Diagram: bit slices of C from MSB to LSB, with the resulting vector N.)

35 K-MINS: The Algorithm. (Walkthrough of the KMINS pseudocode from slide 34 on vectors V, N, M, D, and C[b].)

36 K-MINS: The Algorithm. (Walkthrough continues: cnt = 8.)

37 K-MINS: The Algorithm. Final output: V. O(1) complexity.
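A minimal Python sketch of the KMINS pseudocode follows. It rests on stated assumptions: M is initialized to all ones, V to all zeros, and the count is taken over the bitwise OR of N and V. Python integers stand in for the marker vectors (bit j is bit line j), and the function name and signature are hypothetical.

```python
def kmins(values, k, width):
    """Return the lanes holding the k smallest `width`-bit unsigned values."""
    lanes = len(values)
    all_ones = (1 << lanes) - 1
    # planes[b] holds bit b of every value, one bit per lane.
    planes = [
        sum(((value >> b) & 1) << lane for lane, value in enumerate(values))
        for b in range(width)
    ]
    m = all_ones  # M: lanes still being considered
    v = 0         # V: lanes already selected
    for b in range(width - 1, -1, -1):   # msb down to lsb
        d = ~planes[b] & all_ones        # D: lanes with a 0 in bit b
        n = m & d                        # N: candidates with a 0 in bit b
        cnt = bin(n | v).count("1")      # COUNT over N OR V
        if cnt > k:
            m = n                        # too many: keep narrowing
        elif cnt < k:
            v = n | v                    # too few: accept all of them
        else:
            v = n | v                    # exactly k: done
            break
    return [lane for lane in range(lanes) if (v >> lane) & 1]
```

For `values = [5, 3, 9, 1, 7, 12]`, `k = 3`, and `width = 4`, the loop stops at the third bit and returns lanes `[0, 1, 3]`, which hold 5, 3, and 1. The number of iterations is bounded by the bit width and does not grow with the number of values or with K. In this sketch, ties at the K-th value can leave fewer than K lanes marked when the loop runs out of bits.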

38 Similarity Search and Top-K for Recognition. (Diagram: image -> convolution layer; text -> word/sentence/doc embedding; feature extractor (neural network); database.) Every image/sentence/doc has a label.

39 Dense (1xN) Vector by Sparse NxM Matrix. APU representation and computing: the sparse matrix is held as entries with column and row indices. Search all columns for row = 2: distribute -2 (2 cycles). Search all columns for row = 3: distribute 3 (2 cycles). Search all columns for row = 4: distribute - (2 cycles). Multiply in parallel. Shift and add all products belonging to the same column. Complexity including IO: O(N + log β), where β is the number of nonzero elements in the sparse matrix. In general, N << M for recommender systems.
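The search, distribute, multiply, and add sequence can be followed in a small sequential Python model. This is a sketch with a hypothetical function name and a (row, column, value) entry layout chosen for illustration; on the device each inner loop would be a single parallel operation over all entries.

```python
def dense_by_sparse(vector, entries, num_cols):
    """Multiply a dense 1xN vector by a sparse NxM matrix.

    `entries` is a list of (row, col, value) triples for the nonzero
    elements; the result is a dense list of length `num_cols`.
    """
    # Search + distribute: for each row index, place the matching vector
    # element next to every entry that belongs to that row.
    distributed = [0] * len(entries)
    for row, x in enumerate(vector):
        for i, (entry_row, _, _) in enumerate(entries):
            if entry_row == row:
                distributed[i] = x
    # Multiply all entries by their distributed vector element.
    products = [x * value for x, (_, _, value) in zip(distributed, entries)]
    # Add all products that belong to the same column.
    out = [0] * num_cols
    for product_value, (_, col, _) in zip(products, entries):
        out[col] += product_value
    return out
```

With `vector = [1, 2, 3]` and entries `[(0, 0, 5), (1, 2, -1), (2, 0, 2), (2, 3, 4)]` for a 3x4 matrix, the result is `[11, 0, -2, 12]`. The outer loop over the N vector elements is the sequential part that appears as the N term in the slide's complexity.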

40 Two NxN Sparse Matrix Multiplication. (Diagram: sparse input matrix 1 x sparse input matrix 2 = output matrix, held as column/row tables In-DB1, In-DB2, and Out-DB.) 1. Choose the next free entry from In-DB1. 2. Read its row value. 3. Search and mark similar rows. 4. For all marked rows, search where Col(In-DB1) = Row(In-DB2). 5. Broadcast the selected value to the output table bit lines. 6. Multiply in parallel. 7. Shift and add all products belonging to the same column. 8. Update Out-DB. 9. Go back to step 1 if there are more free entries. 10. Exit. Complexity including IO: O(β + log β), compared to O(β^0.7 N^1.2 + N^2) on a CPU (> X improvement).

41 Softmax. Used in many neural network applications, especially attention networks. The Softmax function takes an N-dimensional vector of scores and generates probabilities between 0 and 1, as defined by
$S_i = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}}$
where Z is the dot product between a query vector and a feature vector (for example, a word embedding of the English vocabulary).
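For reference, the definition translates directly into Python. The sketch below uses the common max-subtraction trick to avoid overflow; this is a standard software technique and is not a description of the APU algorithm.

```python
import math


def softmax(scores):
    """Softmax of a non-empty list of scores, shifted by the max for stability."""
    shift = max(scores)
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Subtracting the maximum leaves the result unchanged, because the common factor cancels between numerator and denominator, yet it keeps every exponent at or below zero. Without the shift, `math.exp(1000.0)` would raise `OverflowError`.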

42 The Difficulties in Softmax Computing: 1. Dot product for millions of vectors. 2. Non-linear function (exp). 3. Dependency: every score depends on all others in the database. 4. Dynamic range: fast overflow, requires high-precision calculations. 5. Speed and latency.

43 Taylor Series: $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$. Very expensive: requires more than 2 coefficients and double precision for good accuracy.
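The cost of the series approach can be seen by counting how many terms are needed for a given accuracy. The following Python sketch uses hypothetical helper names and compares a truncated series against `math.exp`; the tolerance and the search limit of 200 terms are arbitrary choices for illustration.

```python
import math


def exp_taylor(x, terms):
    """Sum the first `terms` terms of the Taylor series of e**x."""
    total = 0.0
    term = 1.0  # x**0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # next term: x**(n+1) / (n+1)!
    return total


def terms_needed(x, rel_tol):
    """Smallest number of terms reaching the relative tolerance."""
    exact = math.exp(x)
    for terms in range(1, 200):
        if abs(exp_taylor(x, terms) - exact) <= rel_tol * exact:
            return terms
    raise ValueError("tolerance not reached within 200 terms")
```

Comparing `terms_needed(1.0, 1e-9)` with `terms_needed(10.0, 1e-9)` shows the term count growing with the magnitude of the argument, which is the expense the slide refers to.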

44 M SoftMax Performance. A proprietary algorithm leverages the APU's lookup capability to provide M high-accuracy, exact Softmax values in < 5 µsec vs - msec on a GPU: > 3 orders of magnitude improvement.

45 Associative Memory for Natural Language Processing (NLP). Q&A, dialog, language translation, speech recognition, etc. Requires learning past events. Needs a large array with attention capabilities.

46 Examples. Q&A: "Dan put the book in his car, ... (long story here). Mike took Dan's car ... (long story here). He drove to SF." Q: Where is the book now? A: Car, SF. Attention computing in language translation: "The cow ate the hay because it was delicious." "The cow ate the hay because it was hungry." Source: Łukasz Kaiser

47 Example of Associative Attention Computing. Input data (i.e., a sentence in English for translation or for Q&A) -> encoder (NN) -> feature vector. Sentence embeddings form the feature representation (key).

48 Example of Associative Attention Computing. (Diagram: query -> encoder (NN) -> feature vector -> dot product with the keys -> dot-product result -> Softmax result (attention) -> compute Top-K over values V1 ... V6 -> next stage, encoder or decoder.) Dot product: O(1); Softmax: O(1); Top-K: O(1).

49 Q&A: End-to-End Network (Weston). Source: Weston et al.

50 GSI Associative Solution for End-to-End. Constant time of 3 µsec per iteration for any memory size: > a few orders of magnitude improvement. Source: Weston et al.

51 Associative Computing for Low-Shot Learning. Gradient-based optimization has achieved impressive results on supervised tasks such as image classification, but these models need a lot of data, whereas people can learn efficiently from few examples. Associative computing, like people, can measure similarity to features stored in memory, and can also create a new label for similar features in the future.

52 Zero-Shot Learning with k-NN. Extract features using any pre-trained CNN, for example VGG/Inception on ImageNet. The new data set is embedded using a pre-trained model and stored in memory with its labels. Queries (test images) are input without labels, and their features are cosine-similarity searched to predict the label. (Diagram: input images with labels, and a similar image without a label, pass as pixels through the convolution-layer feature extractor to a features embedding; cosine similarity search + Top-K returns the similar image's label.)

53 Dimension Reduction. The output of the convolution layer is large ( 2, features in VGG, very sparse). Use a simple matrix or multi-layer non-linear transformation, learned simply. Loss function: the difference between the cosine distance found between any two records at the input and at the output, so the network learns to preserve the cosine distance through the transform.

54 Low-Shot: Train the Network on Distance. Start with an untrained network. The output of the network is already reduced-dimension keys for the k-NN database. Train the network only to keep similar-valued keys close.

55 Cut Short. Stop training when the system starts to converge (cut short). Use similarity search (associative k-NN database) instead of a fully connected layer. Requires less complete training.

56 PROGRAMMING MODEL

57 Programming Model. Write the application on a standard host using the TensorFlow / Tensor2Tensor framework. This generates a TensorFlow graph for execution in device memory. The APU chip/card executes the graph using fused capabilities.

58 PCIe Development Boards. 4 APU chips; 8 million bit-line rows (processors); 8 Peta Boolean OPS; 6.4 TFLOPS; 2 Petabit/sec internal IO; 6-64 GB device memory; TensorFlow framework (basic functions); GNL (GSI Numeric Lib).

59 FUTURE APPROACH: NON-VOLATILE CONCEPT

60 Computing in Non-Volatile Cells. Select multiple lines for read (as NOR/NAND input), with Ref = V-read; the sense unit senses the bit line for logic 0 or 1. Select one or multiple lines for write (NOR/NAND results), with Ref = V-write; write control generates logic 0 or 1 for the bit line. (Diagram: sense unit and write control, non-volatile bit cells, select and REF lines.)

61 Solutions for Future Data Centers. Memory hierarchy: CPU register file, L1/L2/L3, DRAM, Flash, HDD. Associative technologies: standard SRAM based (volatile); STT-RAM based, PC-RAM based, ReRAM based (non-volatile). High endurance: full computing (floating point, etc.), which requires read and write. Mid endurance: machine learning, malware detection, etc., with much more read and much less write. Low endurance: data search engines (read most of the time).

62 Summary. The APU enables state-of-the-art, next-generation machine learning: in-place computing, from basic Boolean algebra to complex algorithms; O(1) dot-product computation; O(1) Min/Max; O(1) Top-K; O(1) Softmax; ultra-high internal bandwidth (.5 Petabit/sec); up to 2 PetaOPS of Boolean algebra in a single chip; fully scalable; fully programmable; efficient TensorFlow-based capabilities.

63 Summary. Extending Moore's Law and leveraging advanced memory technology growth. M.Sc./Ph.D. students who would like to collaborate on research, please contact me: aakerib@gsitechnology.com

64 Thank You! Any Questions?

IN-MEMORY ASSOCIATIVE COMPUTING

IN-MEMORY ASSOCIATIVE COMPUTING IN-MEMORY ASSOCIATIVE COMPUTING AVIDAN AKERIB, GSI TECHNOLOGY AAKERIB@GSITECHNOLOGY.COM AGENDA The AI computational challenge Introduction to associative computing Examples An NLP use case What s next?

More information

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu Mohsen Imani University of California San Diego Winter 2016 Technology Trend for IoT http://www.flashmemorysummit.com/english/collaterals/proceedi ngs/2014/20140807_304c_hill.pdf 2 Motivation IoT significantly

More information

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru

More information

Revolutionizing the Datacenter

Revolutionizing the Datacenter Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5

More information

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 3, 2015

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 3, 2015 CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 3, 2015 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if

More information

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 2, 2016

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 2, 2016 CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 2, 2016 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if

More information

CENG 4480 L09 Memory 3

CENG 4480 L09 Memory 3 CENG 4480 L09 Memory 3 Bei Yu Chapter 11 Memories Reference: CMOS VLSI Design A Circuits and Systems Perspective by H.E.Weste and D.M.Harris 1 Memory Arrays Memory Arrays Random Access Memory Serial Access

More information

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter IT 3123 Hardware and Software Concepts Notice: This session is being recorded. CPU and Memory June 11 Copyright 2005 by Bob Brown Latches Can store one bit of data Can be ganged together to store more

More information

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

M1 Computers and Data

M1 Computers and Data M1 Computers and Data Module Outline Architecture vs. Organization. Computer system and its submodules. Concept of frequency. Processor performance equation. Representation of information characters, signed

More information

Memory. Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University

Memory. Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Memory Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register file alu PC +4 +4 new pc offset target imm control extend =? cmp

More information

MANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16

MANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16 MANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16 THE DATA CHALLENGE Performance Improvement (RelaLve) 4.4 ZB Total data created, replicated, and consumed in a single year

More information

Computers: Inside and Out

Computers: Inside and Out Computers: Inside and Out Computer Components To store binary information the most basic components of a computer must exist in two states State # 1 = 1 State # 2 = 0 1 Transistors Computers use transistors

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Semantic Image Search. Alex Egg

Semantic Image Search. Alex Egg Semantic Image Search Alex Egg Inspiration Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing

More information

10th August Part One: Introduction to Parallel Computing

10th August Part One: Introduction to Parallel Computing Part One: Introduction to Parallel Computing 10th August 2007 Part 1 - Contents Reasons for parallel computing Goals and limitations Criteria for High Performance Computing Overview of parallel computer

More information

Toward a Memory-centric Architecture

Toward a Memory-centric Architecture Toward a Memory-centric Architecture Martin Fink EVP & Chief Technology Officer Western Digital Corporation August 8, 2017 1 SAFE HARBOR DISCLAIMERS Forward-Looking Statements This presentation contains

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, Yong Wang, Bo Yu, Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A dominant

More information

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM Memories Overview Memory Classification Read-Only Memory (ROM) Types of ROM PROM, EPROM, E 2 PROM Flash ROMs (Compact Flash, Secure Digital, Memory Stick) Random Access Memory (RAM) Types of RAM Static

More information

Brainchip OCTOBER

Brainchip OCTOBER Brainchip OCTOBER 2017 1 Agenda Neuromorphic computing background Akida Neuromorphic System-on-Chip (NSoC) Brainchip OCTOBER 2017 2 Neuromorphic Computing Background Brainchip OCTOBER 2017 3 A Brief History

More information

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks Charles Eckert Xiaowei Wang Jingcheng Wang Arun Subramaniyan Ravi Iyer Dennis Sylvester David Blaauw Reetuparna Das M-Bits Research

More information

Implementation of Deep Convolutional Neural Net on a Digital Signal Processor

Implementation of Deep Convolutional Neural Net on a Digital Signal Processor Implementation of Deep Convolutional Neural Net on a Digital Signal Processor Elaina Chai December 12, 2014 1. Abstract In this paper I will discuss the feasibility of an implementation of an algorithm

More information

Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture

Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture The 51st Annual IEEE/ACM International Symposium on Microarchitecture Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture Byungchul Hong Yeonju Ro John Kim FuriosaAI Samsung

More information

Algorithms and Architecture. William D. Gropp Mathematics and Computer Science

Algorithms and Architecture. William D. Gropp Mathematics and Computer Science Algorithms and Architecture William D. Gropp Mathematics and Computer Science www.mcs.anl.gov/~gropp Algorithms What is an algorithm? A set of instructions to perform a task How do we evaluate an algorithm?

More information

Deep Learning with Tensorflow AlexNet

Deep Learning with Tensorflow   AlexNet Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification

More information

Exploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019

Exploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019 Exploring the Structure of Data at Scale Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019 Outline Why exploration of large datasets matters Challenges in working with large data

More information

Transistor: Digital Building Blocks

Transistor: Digital Building Blocks Final Exam Review Transistor: Digital Building Blocks Logically, each transistor acts as a switch Combined to implement logic functions (gates) AND, OR, NOT Combined to build higher-level structures Multiplexer,

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 6 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Contents FPGA Technology Programmable logic Cell (PLC) Mux-based cells Look up table PLA

More information

Unit 9 : Fundamentals of Parallel Processing

Unit 9 : Fundamentals of Parallel Processing Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing

More information

S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer

S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer 2 100 倍以上速く 本当に可能ですか? 2 DOUGLAS ADAMS BABEL FISH Neural Machine Translation Unit 3 4 OVER 100X FASTER, IS IT REALLY POSSIBLE?

More information

Memory and Programmable Logic

Memory and Programmable Logic Memory and Programmable Logic Memory units allow us to store and/or retrieve information Essentially look-up tables Good for storing data, not for function implementation Programmable logic device (PLD),

More information

ECSE-2610 Computer Components & Operations (COCO)

ECSE-2610 Computer Components & Operations (COCO) ECSE-2610 Computer Components & Operations (COCO) Part 18: Random Access Memory 1 Read-Only Memories 2 Why ROM? Program storage Boot ROM for personal computers Complete application storage for embedded

More information

NeuroMem. A Neuromorphic Memory patented architecture. NeuroMem 1

NeuroMem. A Neuromorphic Memory patented architecture. NeuroMem 1 NeuroMem A Neuromorphic Memory patented architecture NeuroMem 1 Unique simple architecture NM bus A chain of identical neurons, no supervisor 1 neuron = memory + logic gates Context Category ted during

More information

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA

Inference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Computer Organization and Levels of Abstraction

Computer Organization and Levels of Abstraction Computer Organization and Levels of Abstraction Announcements Today: PS 7 Lab 8: Sound Lab tonight bring machines and headphones! PA 7 Tomorrow: Lab 9 Friday: PS8 Today (Short) Floating point review Boolean

More information

Basic Organization Memory Cell Operation. CSCI 4717 Computer Architecture. ROM Uses. Random Access Memory. Semiconductor Memory Types

Basic Organization Memory Cell Operation. CSCI 4717 Computer Architecture. ROM Uses. Random Access Memory. Semiconductor Memory Types CSCI 4717/5717 Computer Architecture Topic: Internal Memory Details Reading: Stallings, Sections 5.1 & 5.3 Basic Organization Memory Cell Operation Represent two stable/semi-stable states representing

More information

Adrian Proctor Vice President, Marketing Viking Technology

Adrian Proctor Vice President, Marketing Viking Technology Storage PRESENTATION in the TITLE DIMM GOES HERE Socket Adrian Proctor Vice President, Marketing Viking Technology SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless

More information

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward

More information

Computer Organization and Assembly Language (CS-506)

Computer Organization and Assembly Language (CS-506) Computer Organization and Assembly Language (CS-506) Muhammad Zeeshan Haider Ali Lecturer ISP. Multan ali.zeeshan04@gmail.com https://zeeshanaliatisp.wordpress.com/ Lecture 2 Memory Organization and Structure

More information

Deep Learning Accelerators

Deep Learning Accelerators Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction

More information

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center 3/17/2015 2014 IBM Corporation Outline IBM OpenPower Platform Accelerating

More information

Human Body Recognition and Tracking: How the Kinect Works. Kinect RGB-D Camera. What the Kinect Does. How Kinect Works: Overview

Human Body Recognition and Tracking: How the Kinect Works. Kinect RGB-D Camera. What the Kinect Does. How Kinect Works: Overview Human Body Recognition and Tracking: How the Kinect Works Kinect RGB-D Camera Microsoft Kinect (Nov. 2010) Color video camera + laser-projected IR dot pattern + IR camera $120 (April 2012) Kinect 1.5 due

More information

Effect of memory latency

Effect of memory latency CACHE AWARENESS Effect of memory latency Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns. Assume that the processor has two ALU units and it is capable

More information

Persistent RNNs. (stashing recurrent weights on-chip) Gregory Diamos. April 7, Baidu SVAIL

Persistent RNNs. (stashing recurrent weights on-chip) Gregory Diamos. April 7, Baidu SVAIL (stashing recurrent weights on-chip) Baidu SVAIL April 7, 2016 SVAIL Think hard AI. Goal Develop hard AI technologies that impact 100 million users. Deep Learning at SVAIL 100 GFLOP/s 1 laptop 6 TFLOP/s

More information

Memory. Lecture 22 CS301

Memory. Lecture 22 CS301 Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch

More information

GPU Acceleration of Matrix Algebra. Dr. Ronald C. Young Multipath Corporation. fmslib.com

GPU Acceleration of Matrix Algebra. Dr. Ronald C. Young Multipath Corporation. fmslib.com GPU Acceleration of Matrix Algebra Dr. Ronald C. Young Multipath Corporation FMS Performance History Machine Year Flops DEC VAX 1978 97,000 FPS 164 1982 11,000,000 FPS 164-MAX 1985 341,000,000 DEC VAX

More information
