Architectural Support for Large-Scale Visual Search. Carlo C. del Mundo Vincent Lee Armin Alaghi Luis Ceze Mark Oskin

Size: px
Start display at page:

Download "Architectural Support for Large-Scale Visual Search. Carlo C. del Mundo Vincent Lee Armin Alaghi Luis Ceze Mark Oskin"

Transcription

1 Architectural Support for Large-Scale Visual Search Carlo C. del Mundo Vincent Lee Armin Alaghi Luis Ceze Mark Oskin

2 Motivation: Visual Data & Their Applications Rebooting the IT Revolution, SIA, September Visual Applications 3D Reconstruction Augmented Reality Activity Recognition Content-based Search Algorithm Across These Apps: k-nearest neighbors (knn) Overall Challenges: High movement High compute requirements Idea: Co-design the memory subsytem with search algorithms.

3 State of the art: Today s Software Pipeline Feature Extraction 1 Extract shape, appearance, motion Feature Feature Feature Feature >1000 Index Features Scale of Index Trillions x Images Billions x Videos Billions x Audio Multimedia Database Query Extract features Feature Query Generation K-nearest neighbors 2?? Distance Calculation? Recover Result Result Result Result VEctor VEctor? Calculate pairwise metric distance Response Reverse Lookup Query Candidate Nearest Neighbor s 1 Wang et al., Action Recognition by Dense Trajectories, in CVPR Muja and Lowe, Scalable Nearest Neighbor Algorithms for High Dimensional Data, in PAMI 14.

4 State of the Art: Nearest Neighbor Search on Commodity Systems DRAM DRAM DRAM DRAM Intra-node Challenge: k-nearest neighbor search is heavily memory bound. Inter-node Challenge Each query vector generates k local responses. Query Video To get the global k nearest neighbors we need an all-to-one reduction; this makes the network the bottleneck.

5 Our Vision: Systems and Architecture Support for Nearest Neighbor Search Reduce Reduce Reduce Reduce Search Our Vision: A nearest neighbor content addressable memory () to alleviate memory boundedness by moving compute closer to. Query Video Reduce Reduce Reduction Our Vision: Systems support for innetwork joins and reductions to reduce global network bandwidth and scalability.

6 Intra-Node: Rebalancing Bandwidth and Compute Rebalance Bandwidth Use binary features 2. > 1000 Servicing One Query Algorithmic Growing Pains GB of communication 150 GFLOP of computation FLOP/byte FP Operators FMUL, FSUB FADD Bit Operators XOR POPCOUNT Rebalance Compute Move compute closer to memory. Storage SSDs or DRAM Our Vision. Nearest Neighbor CAM DRAM 1 Assume 512 KB feature vectors. N = 1 billion. Assume a single query needs to evaluate 10% of N with kd-trees. Distance metric is Euclidean. Note: we assume these numbers are from FLANN. 2 Gong and Lazebnik, Iterative Quantization: A Procustean Approach to Learning Binary Codes, in CVPR 11.

7 High-Level Overview: Nearest Neighbor Content Addressable Memory Idea: Achieve high internal bandwidth by interposing logic in memory. Our Vision: A class of memories akin to CAMs for high-throughput distance calculations. Input+ System+ Binary++Data k+=+2 Parameters distance+=+h id Output {id,+distance}

8 Architectural Overview: Inside the ODT ZQ RESET# CKE A12 CK, CK# CS# RAS# CAS# WE# Row Address A[15:0] Bank Address BA[2:0] Command Decode Control Mode Registers Address Register ZQCL, ZQCS Refresh Counter (RC = 8K) BC4 OTF ZQ CAL 3 3 Row Address MUX Bank Control To ODT/output drivers Bank 8 rowaddress latch and decoder Bank 2 rowaddress latch and decoder Bank 1 rowaddress latch and decoder Bank 0 rowaddress latch and decoder write Bank 0 Memory Array (65536 x 128 x 128) Sense Amplifiers knn Engine Accelerator Bypass Bank 1 Memory Array (65536 x 128 x 128) Sense Amplifiers write knn Engine read Accelerator Bypass write Bank 2 Memory Array (65536 x 128 x 128) Sense Amplifiers knn Engine read Accelerator Bypass write Bank 8 Memory Array (65536 x 128 x 128) Sense Amplifiers knn Engine read Accelerator Bypass r [31:0] Output Buffer ODT Control rvalid rready Engine Address EA[ADDRESS_BITS-1 :0] Engine Data w [31:0] Engine Operation cmd [2:0] Engine Control read Engine Read MUX I/O Gating DM Mask I/O Gating DM Mask I/O Gating DM Mask I/O Gating DM Mask

9 Programming Model: Integrating the in DRAMs DRAM Command Interface Programming API/Library Driver Layer Command Interface 1. additions are orthogonal to existing DRAM infrastructure. 2. We introduce a memory mapped configuration and control. 3. We introduce a command set which allows for configuration. DRAM Control Control Command Explanation DRAM Array and Controllers READ WRITE Reads the result out from the priority queues Writes initial query vector to accelerator memory DRAM Read/Write Interface Acceleration ENABLE DISABLE Signals to process read from DRAM Signals to ignore any from the DRAM DRAM Output Interface Systems Stack Output Interface RESET Reinitializes priority queues in preparation for next execution Command Set

10 Related Works: What came before the Software-based solutions Libraries [Arya, Mount, JACM 1998] ANN: Approximate Nearest Neighbors Library [Muja, Lowe, PAMI 2014] FLANN: Fast Library for Approximate Nearest Neighbors Algorithms [Datar et al., SCG 04] Locality-sensitive Hashing [Y. Weiss et al., NIPS 08] Spectral Hashing [Torralba et al., CVPR 08] Small codes for Recognition [Gong, Lazebnik, CVPR 11] Iterative Quantization Hardware-based solutions Nearest-Neighbors [Roberts, MS Thesis 90] PCAM: Proximity Content-Addressable Memory [Garcia et al., CVPRW 08] Nearest Neighbors on GPU In-memory computing [Patterson et al., IEEE MICRO 97] A Case for Intelligent RAM [D.G. Elliott, D&T 99] Computational RAM [Guo et al., MICRO 11] TCAMs for Data-Intensive Computing

11 Current Work: What We ve Done & Plan To Do Application Design & Characterization. Design & Implementation of VIRAL: VIdeo and Image Retrieval At Large-Scale.

12 Current Work: What We ve Done & Plan To Do Hardware Design. ASIC, Simulation, Design & Test. knn Engine Priority Shift Register Write Interface Sample Data Query Buffer Euclidean Distance Unit Index Buffer Accumulator knn Query Lane Priority Queue Shift Register Read Interface raddr > Memory Controller Write Interface Address Demux Data Arbiter knn Query Lane knn Query Lane knn Query Lane... knn Query Lane Read Interface Input Data > > > r knn Engine

13 Summary: Enabling visual applications via co-design Our Vision: Architectural support to enable high dimensional & high cardinality nearest neighbor search. Input+ System+ Binary++Data k+=+2 Parameters distance+=+l id Output {id,+distance} Our work enables visual applications such as: 3D Reconstruction Content-based Search Activity Recognition

Large-scale visual recognition Efficient matching

Large-scale visual recognition Efficient matching Large-scale visual recognition Efficient matching Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline!! Preliminary!! Locality Sensitive Hashing: the two modes!! Hashing!! Embedding!!

More information

CLSH: Cluster-based Locality-Sensitive Hashing

CLSH: Cluster-based Locality-Sensitive Hashing CLSH: Cluster-based Locality-Sensitive Hashing Xiangyang Xu Tongwei Ren Gangshan Wu Multimedia Computing Group, State Key Laboratory for Novel Software Technology, Nanjing University xiangyang.xu@smail.nju.edu.cn

More information

Datasheet. Zetta 4Gbit DDR3L SDRAM. Features VDD=VDDQ=1.35V / V. Fully differential clock inputs (CK, CK ) operation

Datasheet. Zetta 4Gbit DDR3L SDRAM. Features VDD=VDDQ=1.35V / V. Fully differential clock inputs (CK, CK ) operation Zetta Datasheet Features VDD=VDDQ=1.35V + 0.100 / - 0.067V Fully differential clock inputs (CK, CK ) operation Differential Data Strobe (DQS, DQS ) On chip DLL align DQ, DQS and DQS transition with CK

More information

Searching in one billion vectors: re-rank with source coding

Searching in one billion vectors: re-rank with source coding Searching in one billion vectors: re-rank with source coding Hervé Jégou INRIA / IRISA Romain Tavenard Univ. Rennes / IRISA Laurent Amsaleg CNRS / IRISA Matthijs Douze INRIA / LJK ICASSP May 2011 LARGE

More information

SDRAM DDR3 512MX8 ½ Density Device Technical Note

SDRAM DDR3 512MX8 ½ Density Device Technical Note SDRAM DDR3 512MX8 ½ Density Device Technical Note Introduction This technical note provides an overview of how the PRN256M8V90BG8RGF-125 DDR3 SDRAM device operates and is configured as a 2Gb device. Addressing

More information

Approximate Nearest Neighbor Search. Deng Cai Zhejiang University

Approximate Nearest Neighbor Search. Deng Cai Zhejiang University Approximate Nearest Neighbor Search Deng Cai Zhejiang University The Era of Big Data How to Find Things Quickly? Web 1.0 Text Search Sparse feature Inverted Index How to Find Things Quickly? Web 2.0, 3.0

More information

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu Mohsen Imani University of California San Diego Winter 2016 Technology Trend for IoT http://www.flashmemorysummit.com/english/collaterals/proceedi ngs/2014/20140807_304c_hill.pdf 2 Motivation IoT significantly

More information

SDRAM DDR3 512MX8 ½ Density Device Technical Note

SDRAM DDR3 512MX8 ½ Density Device Technical Note SDRAM DDR3 512MX8 ½ Density Device Technical Note Introduction This technical note provides an overview of how the XAA512M8V90BG8RGF-SSWO and SSW1 DDR3 SDRAM device is configured and tested as a 2Gb device.

More information

AC-DIMM: Associative Computing with STT-MRAM

AC-DIMM: Associative Computing with STT-MRAM AC-DIMM: Associative Computing with STT-MRAM Qing Guo, Xiaochen Guo, Ravi Patel Engin Ipek, Eby G. Friedman University of Rochester Published In: ISCA-2013 Motivation Prevalent Trends in Modern Computing:

More information

DRAM Main Memory. Dual Inline Memory Module (DIMM)

DRAM Main Memory. Dual Inline Memory Module (DIMM) DRAM Main Memory Dual Inline Memory Module (DIMM) Memory Technology Main memory serves as input and output to I/O interfaces and the processor. DRAMs for main memory, SRAM for caches Metrics: Latency,

More information

A Systems View of Large- Scale 3D Reconstruction

A Systems View of Large- Scale 3D Reconstruction Lecture 23: A Systems View of Large- Scale 3D Reconstruction Visual Computing Systems Goals and motivation Construct a detailed 3D model of the world from unstructured photographs (e.g., Flickr, Facebook)

More information

Large Scale 3D Reconstruction by Structure from Motion

Large Scale 3D Reconstruction by Structure from Motion Large Scale 3D Reconstruction by Structure from Motion Devin Guillory Ziang Xie CS 331B 7 October 2013 Overview Rome wasn t built in a day Overview of SfM Building Rome in a Day Building Rome on a Cloudless

More information

The University of Adelaide, School of Computer Science 13 September 2018

The University of Adelaide, School of Computer Science 13 September 2018 Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

Features. DDR3 Registered DIMM Spec Sheet

Features. DDR3 Registered DIMM Spec Sheet Features DDR3 functionality and operations supported as defined in the component data sheet 240-pin, Registered Dual In-line Memory Module (RDIMM) Fast data transfer rates: PC3-8500, PC3-10600, PC3-12800

More information

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013 Lecture 24: Image Retrieval: Part II Visual Computing Systems Review: K-D tree Spatial partitioning hierarchy K = dimensionality of space (below: K = 2) 3 2 1 3 3 4 2 Counts of points in leaf nodes Nearest

More information

EECS150 - Digital Design Lecture 17 Memory 2

EECS150 - Digital Design Lecture 17 Memory 2 EECS150 - Digital Design Lecture 17 Memory 2 October 22, 2002 John Wawrzynek Fall 2002 EECS150 Lec17-mem2 Page 1 SDRAM Recap General Characteristics Optimized for high density and therefore low cost/bit

More information

Application Codesign of Near-Data Processing for Similarity Search

Application Codesign of Near-Data Processing for Similarity Search Application Codesign of Near-Data for Similarity Search Vincent T. Lee, Amrita Mazumdar, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin University of Washington {vlee2, amrita, cdel, armin, luisceze,

More information

SDRAM DDR3 256MX8 ½ Density Device Technical Note

SDRAM DDR3 256MX8 ½ Density Device Technical Note SDRAM DDR3 256MX8 ½ Density Device Technical Note Introduction This technical note provides an overview of how the SGG128M8V79DG8GQF-15E DDR3 SDRAM device is configured and tested as a 1Gb device. This

More information

arxiv: v2 [cs.dc] 10 Jul 2017

arxiv: v2 [cs.dc] 10 Jul 2017 Application-Driven Near-Data for Similarity Search Vincent T. Lee, Amrita Mazumdar, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin University of Washington {vlee2, amrita, cdel, armin, luisceze,

More information

Implementing Ultra Low Latency Data Center Services with Programmable Logic

Implementing Ultra Low Latency Data Center Services with Programmable Logic Implementing Ultra Low Latency Data Center Services with Programmable Logic John W. Lockwood, CEO: Algo-Logic Systems, Inc. http://algo-logic.com Solutions@Algo-Logic.com (408) 707-3740 2255-D Martin Ave.,

More information

IN-MEMORY ASSOCIATIVE COMPUTING

IN-MEMORY ASSOCIATIVE COMPUTING IN-MEMORY ASSOCIATIVE COMPUTING AVIDAN AKERIB, GSI TECHNOLOGY AAKERIB@GSITECHNOLOGY.COM AGENDA The AI computational challenge Introduction to associative computing Examples An NLP use case What s next?

More information

Large scale object/scene recognition

Large scale object/scene recognition Large scale object/scene recognition Image dataset: > 1 million images query Image search system ranked image list Each image described by approximately 2000 descriptors 2 10 9 descriptors to index! Database

More information

Design and Implementation of High Performance Application Specific Memory

Design and Implementation of High Performance Application Specific Memory Design and Implementation of High Performance Application Specific Memory - 고성능 Application Specific Memory 의설계와구현 - M.S. Thesis Sungdae Choi Dec. 20th, 2002 Outline Introduction Memory for Mobile 3D Graphics

More information

Chapter 6 (Lect 3) Counters Continued. Unused States Ring counter. Implementing with Registers Implementing with Counter and Decoder

Chapter 6 (Lect 3) Counters Continued. Unused States Ring counter. Implementing with Registers Implementing with Counter and Decoder Chapter 6 (Lect 3) Counters Continued Unused States Ring counter Implementing with Registers Implementing with Counter and Decoder Sequential Logic and Unused States Not all states need to be used Can

More information

ECE 485/585 Microprocessor System Design

ECE 485/585 Microprocessor System Design Microprocessor System Design Lecture 4: Memory Hierarchy Memory Taxonomy SRAM Basics Memory Organization DRAM Basics Zeshan Chishti Electrical and Computer Engineering Dept Maseeh College of Engineering

More information

A comparative study of k-nearest neighbour techniques in crowd simulation

A comparative study of k-nearest neighbour techniques in crowd simulation A comparative study of k-nearest neighbour techniques in crowd simulation Jordi Vermeulen Arne Hillebrand Roland Geraerts Department of Information and Computing Sciences Utrecht University, The Netherlands

More information

Isometric Mapping Hashing

Isometric Mapping Hashing Isometric Mapping Hashing Yanzhen Liu, Xiao Bai, Haichuan Yang, Zhou Jun, and Zhihong Zhang Springer-Verlag, Computer Science Editorial, Tiergartenstr. 7, 692 Heidelberg, Germany {alfred.hofmann,ursula.barth,ingrid.haas,frank.holzwarth,

More information

Lecture 13: k-means and mean-shift clustering

Lecture 13: k-means and mean-shift clustering Lecture 13: k-means and mean-shift clustering Juan Carlos Niebles Stanford AI Lab Professor Stanford Vision Lab Lecture 13-1 Recap: Image Segmentation Goal: identify groups of pixels that go together Lecture

More information

DeepIndex for Accurate and Efficient Image Retrieval

DeepIndex for Accurate and Efficient Image Retrieval DeepIndex for Accurate and Efficient Image Retrieval Yu Liu, Yanming Guo, Song Wu, Michael S. Lew Media Lab, Leiden Institute of Advance Computer Science Outline Motivation Proposed Approach Results Conclusions

More information

ECE 152 Introduction to Computer Architecture

ECE 152 Introduction to Computer Architecture Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2009 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2009 1 Where We Are in This Course

More information

Single Pass Connected Components Analysis

Single Pass Connected Components Analysis D. G. Bailey, C. T. Johnston, Single Pass Connected Components Analysis, Proceedings of Image and Vision Computing New Zealand 007, pp. 8 87, Hamilton, New Zealand, December 007. Single Pass Connected

More information

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES Greg Hankins APRICOT 2012 2012 Brocade Communications Systems, Inc. 2012/02/28 Lookup Capacity and Forwarding

More information

ECEN 449 Microprocessor System Design. Memories. Texas A&M University

ECEN 449 Microprocessor System Design. Memories. Texas A&M University ECEN 449 Microprocessor System Design Memories 1 Objectives of this Lecture Unit Learn about different types of memories SRAM/DRAM/CAM Flash 2 SRAM Static Random Access Memory 3 SRAM Static Random Access

More information

Lecture 11: Packet forwarding

Lecture 11: Packet forwarding Lecture 11: Packet forwarding Anirudh Sivaraman 2017/10/23 This week we ll talk about the data plane. Recall that the routing layer broadly consists of two parts: (1) the control plane that computes routes

More information

Embedded Systems: Hardware Components (part I) Todor Stefanov

Embedded Systems: Hardware Components (part I) Todor Stefanov Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System

More information

Mark Redekopp, All rights reserved. EE 352 Unit 10. Memory System Overview SRAM vs. DRAM DMA & Endian-ness

Mark Redekopp, All rights reserved. EE 352 Unit 10. Memory System Overview SRAM vs. DRAM DMA & Endian-ness EE 352 Unit 10 Memory System Overview SRAM vs. DRAM DMA & Endian-ness The Memory Wall Problem: The Memory Wall Processor speeds have been increasing much faster than memory access speeds (Memory technology

More information

Infineon HYB39S128160CT M SDRAM Circuit Analysis

Infineon HYB39S128160CT M SDRAM Circuit Analysis September 8, 2004 Infineon HYB39S128160CT-7.5 128M SDRAM Circuit Analysis Table of Contents Introduction... Page 1 List of Figures... Page 2 Device Summary Sheet... Page 13 Chip Description... Page 16

More information

Fast Nearest Neighbor Search in the Hamming Space

Fast Nearest Neighbor Search in the Hamming Space Fast Nearest Neighbor Search in the Hamming Space Zhansheng Jiang 1(B), Lingxi Xie 2, Xiaotie Deng 1,WeiweiXu 3, and Jingdong Wang 4 1 Shanghai Jiao Tong University, Shanghai, People s Republic of China

More information

Scalable Nearest Neighbor Algorithms for High Dimensional Data Marius Muja (UBC), David G. Lowe (Google) IEEE 2014

Scalable Nearest Neighbor Algorithms for High Dimensional Data Marius Muja (UBC), David G. Lowe (Google) IEEE 2014 Scalable Nearest Neighbor Algorithms for High Dimensional Data Marius Muja (UBC), David G. Lowe (Google) IEEE 2014 Presenter: Derrick Blakely Department of Computer Science, University of Virginia https://qdata.github.io/deep2read/

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

ENEE 759H, Spring 2005 Memory Systems: Architecture and

ENEE 759H, Spring 2005 Memory Systems: Architecture and SLIDE, Memory Systems: DRAM Device Circuits and Architecture Credit where credit is due: Slides contain original artwork ( Jacob, Wang 005) Overview Processor Processor System Controller Memory Controller

More information

Features. DDR3 UDIMM w/o ECC Product Specification. Rev. 14 Dec. 2011

Features. DDR3 UDIMM w/o ECC Product Specification. Rev. 14 Dec. 2011 Features DDR3 functionality and operations supported as defined in the component data sheet 240pin, unbuffered dual in-line memory module (UDIMM) Fast data transfer rates: PC3-8500, PC3-10600, PC3-12800

More information

Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search

Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search Dror Aiger, Efi Kokiopoulou, Ehud Rivlin Google Inc. aigerd@google.com, kokiopou@google.com, ehud@google.com Abstract

More information

The Memory Hierarchy Part I

The Memory Hierarchy Part I Chapter 6 The Memory Hierarchy Part I The slides of Part I are taken in large part from V. Heuring & H. Jordan, Computer Systems esign and Architecture 1997. 1 Outline: Memory components: RAM memory cells

More information

A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video

A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video Amrita Mazumdar Armin Alaghi Jonathan T. Barron David Gallup Luis Ceze Mark Oskin Steven M. Seitz University of Washington Google

More information

Features. DDR2 UDIMM with ECC Product Specification. Rev. 1.2 Aug. 2011

Features. DDR2 UDIMM with ECC Product Specification. Rev. 1.2 Aug. 2011 Features 240pin, unbuffered dual in-line memory module (UDIMM) Error Check Correction (ECC) Support Fast data transfer rates: PC2-4200, PC3-5300, PC3-6400 Single or Dual rank 512MB (64Meg x 72), 1GB(128

More information

Technical Note Designing for High-Density DDR2 Memory

Technical Note Designing for High-Density DDR2 Memory Technical Note Designing for High-Density DDR2 Memory TN-47-16: Designing for High-Density DDR2 Memory Introduction Introduction DDR2 memory supports an extensive assortment of options for the system-level

More information

Chapter 8 Memory Basics

Chapter 8 Memory Basics Logic and Computer Design Fundamentals Chapter 8 Memory Basics Charles Kime & Thomas Kaminski 2008 Pearson Education, Inc. (Hyperlinks are active in View Show mode) Overview Memory definitions Random Access

More information

Lecture-14 (Memory Hierarchy) CS422-Spring

Lecture-14 (Memory Hierarchy) CS422-Spring Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect

More information

Features. DDR2 UDIMM w/o ECC Product Specification. Rev. 1.1 Aug. 2011

Features. DDR2 UDIMM w/o ECC Product Specification. Rev. 1.1 Aug. 2011 Features 240pin, unbuffered dual in-line memory module (UDIMM) Fast data transfer rates: PC2-4200, PC3-5300, PC3-6400 Single or Dual rank 512MB (64Meg x 64), 1GB(128 Meg x 64), 2GB (256 Meg x 64) JEDEC

More information

ECE 250 / CS250 Introduction to Computer Architecture

ECE 250 / CS250 Introduction to Computer Architecture ECE 250 / CS250 Introduction to Computer Architecture Main Memory Benjamin C. Lee Duke University Slides from Daniel Sorin (Duke) and are derived from work by Amir Roth (Penn) and Alvy Lebeck (Duke) 1

More information

Mobile LPDDR (only) 152-Ball Package-on-Package (PoP) TI-OMAP MT46HxxxMxxLxCG MT46HxxxMxxLxKZ

Mobile LPDDR (only) 152-Ball Package-on-Package (PoP) TI-OMAP MT46HxxxMxxLxCG MT46HxxxMxxLxKZ Mobile LPDDR (only) 152-Ball Package-on-Package (PoP) TI-OMAP MT6HxxxMxxLxCG MT6HxxxMxxLxKZ 152-Ball x Mobile LPDDR (only) PoP (TI-OMAP) Features Features / = 1.70 1.95V Bidirectional data strobe per byte

More information

Application Note AN2247/D Rev. 0, 1/2002 Interfacing the MCF5307 SDRAMC to an External Master nc... Freescale Semiconductor, I Melissa Hunter TECD App

Application Note AN2247/D Rev. 0, 1/2002 Interfacing the MCF5307 SDRAMC to an External Master nc... Freescale Semiconductor, I Melissa Hunter TECD App Application Note AN2247/D Rev. 0, 1/2002 Interfacing the MCF5307 SDRAMC to an External Master Melissa Hunter TECD Applications This application note discusses the issues involved in designing external

More information

Novel Hardware Architecture for Fast Address Lookups

Novel Hardware Architecture for Fast Address Lookups Novel Hardware Architecture for Fast Address Lookups Pronita Mehrotra Paul D. Franzon Department of Electrical and Computer Engineering North Carolina State University {pmehrot,paulf}@eos.ncsu.edu This

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS Basics DRAM ORGANIZATION DRAM Word Line Bit Line Storage element (capacitor) In/Out Buffers Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array Page 1 Basics BUS

More information

Large-Scale Face Manifold Learning

Large-Scale Face Manifold Learning Large-Scale Face Manifold Learning Sanjiv Kumar Google Research New York, NY * Joint work with A. Talwalkar, H. Rowley and M. Mohri 1 Face Manifold Learning 50 x 50 pixel faces R 2500 50 x 50 pixel random

More information

1.35V DDR3L SDRAM SODIMM

1.35V DDR3L SDRAM SODIMM 1.35V DDR3L SDRAM SODIMM MT4KTF12864HZ 1GB MT4KTF25664HZ 2GB 1GB, 2GB (x64, SR) 204-Pin 1.35V DDR3L SODIMM Features Features DDR3L functionality and operations supported as defined in the component data

More information

Introduction to Video Compression

Introduction to Video Compression Insight, Analysis, and Advice on Signal Processing Technology Introduction to Video Compression Jeff Bier Berkeley Design Technology, Inc. info@bdti.com http://www.bdti.com Outline Motivation and scope

More information

PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 INTRODUCTION In centralized database: Data is located in one place (one server) All DBMS functionalities are done by that server

More information

Organization Row Address Column Address Bank Address Auto Precharge 256Mx4 (1GB) based module A0-A13 A0-A9 BA0-BA2 A10

Organization Row Address Column Address Bank Address Auto Precharge 256Mx4 (1GB) based module A0-A13 A0-A9 BA0-BA2 A10 GENERAL DESCRIPTION The Gigaram GR2DR4BD-E4GBXXXVLP is a 512M bit x 72 DDDR2 SDRAM high density ECC REGISTERED DIMM. The GR2DR4BD-E4GBXXXVLP consists of eighteen CMOS 512M x 4 STACKED DDR2 SDRAMs for 4GB

More information

LE4ASS21PEH 16GB Unbuffered 2048Mx64 DDR4 SO-DIMM 1.2V Up to PC CL

LE4ASS21PEH 16GB Unbuffered 2048Mx64 DDR4 SO-DIMM 1.2V Up to PC CL LE4ASS21PEH 16GB Unbuffered 2048Mx64 DDR4 SO-DIMM 1.2V Up to PC4-2133 CL 15-15-15 General Description This Legacy device is a JEDEC standard unbuffered SO-DIMM module, based on CMOS DDR4 SDRAM technology,

More information

PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory

PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory Scalable and Energy-Efficient Architecture Lab (SEAL) PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in -based Main Memory Ping Chi *, Shuangchen Li *, Tao Zhang, Cong

More information

Automotive DDR3L-RS SDRAM

Automotive DDR3L-RS SDRAM Automotive DDR3L-RS SDRAM MT4KG4 28 Meg x 4 x 8 banks MT4K52M8 64 Meg x 8 x 8 banks MT4K256M6 32 Meg x 6 x 8 banks 4Gb: x8, x6 Automotive DDR3L-RS SDRAM Description Description The.35R3L-RS SDRAM device

More information

UMBC D 7 -D. Even bytes 0. 8 bits FFFFFC FFFFFE. location in addition to any 8-bit location. 1 (Mar. 6, 2002) SX 16-bit Memory Interface

UMBC D 7 -D. Even bytes 0. 8 bits FFFFFC FFFFFE. location in addition to any 8-bit location. 1 (Mar. 6, 2002) SX 16-bit Memory Interface 8086-80386SX 16-bit Memory Interface These machines differ from the 8088/80188 in several ways: The data bus is 16-bits wide. The IO/M pin is replaced with M/IO (8086/80186) and MRDC and MWTC for 80286

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

Low-Cost Inter-Linked Subarrays (LISA) Enabling Fast Inter-Subarray Data Movement in DRAM

Low-Cost Inter-Linked Subarrays (LISA) Enabling Fast Inter-Subarray Data Movement in DRAM Low-Cost Inter-Linked ubarrays (LIA) Enabling Fast Inter-ubarray Data Movement in DRAM Kevin Chang rashant Nair, Donghyuk Lee, augata Ghose, Moinuddin Qureshi, and Onur Mutlu roblem: Inefficient Bulk Data

More information

1.35V DDR3L SDRAM UDIMM

1.35V DDR3L SDRAM UDIMM 1.35V DDR3L SDRAM UDIMM MT8KTF12864AZ 1GB MT8KTF25664AZ 2GB MT8KTF51264AZ 4GB 1GB, 2GB, 4GB (x64, SR) 240-Pin 1.35V DDR3L UDIMM Features Features DDR3L functionality and operations supported as defined

More information

Memory Challenges. Issues & challenges in memory design: Cost Performance Power Scalability

Memory Challenges. Issues & challenges in memory design: Cost Performance Power Scalability Memory Devices 1 Memory Challenges Issues & challenges in memory design: Cost Performance Power Scalability 2 Memory - Overview Definitions: RAM random access memory DRAM dynamic RAM SRAM static RAM Volatile

More information

EE414 Embedded Systems Ch 5. Memory Part 2/2

EE414 Embedded Systems Ch 5. Memory Part 2/2 EE414 Embedded Systems Ch 5. Memory Part 2/2 Byung Kook Kim School of Electrical Engineering Korea Advanced Institute of Science and Technology Overview 6.1 introduction 6.2 Memory Write Ability and Storage

More information

6.819 / 6.869: Advances in Computer Vision

6.819 / 6.869: Advances in Computer Vision 6.819 / 6.869: Advances in Computer Vision Image Retrieval: Retrieval: Information, images, objects, large-scale Website: http://6.869.csail.mit.edu/fa15/ Instructor: Yusuf Aytar Lecture TR 9:30AM 11:00AM

More information

Addressing the Memory Wall

Addressing the Memory Wall Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the

More information

Where We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Main Memory. Readings

Where We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Main Memory. Readings Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2012 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2012 Where We Are in This Course

More information

Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels

Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels Kai Guo, Prakash Ishwar, and Janusz Konrad Department of Electrical & Computer Engineering Motivation

More information

Lecture: k-means & mean-shift clustering

Lecture: k-means & mean-shift clustering Lecture: k-means & mean-shift clustering Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap: Image Segmentation Goal: identify groups of pixels that go together 2 Recap: Gestalt

More information

Lecture: k-means & mean-shift clustering

Lecture: k-means & mean-shift clustering Lecture: k-means & mean-shift clustering Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap: Image Segmentation Goal: identify groups of pixels that go together

More information

DDR2 SDRAM UDIMM MT8HTF12864AZ 1GB

DDR2 SDRAM UDIMM MT8HTF12864AZ 1GB Features DDR2 SDRAM UDIMM MT8HTF12864AZ 1GB For component data sheets, refer to Micron's Web site: www.micron.com Figure 1: 240-Pin UDIMM (MO-237 R/C D) Features 240-pin, unbuffered dual in-line memory

More information

DDR2 SDRAM UDIMM MT16HTF25664AZ 2GB MT16HTF51264AZ 4GB For component data sheets, refer to Micron s Web site:

DDR2 SDRAM UDIMM MT16HTF25664AZ 2GB MT16HTF51264AZ 4GB For component data sheets, refer to Micron s Web site: DDR2 SDRAM UDIMM MT16HTF25664AZ 2GB MT16HTF51264AZ 4GB For component data sheets, refer to Micron s Web site: www.micron.com 2GB, 4GB (x64, DR): 240-Pin DDR2 SDRAM UDIMM Features Features 240-pin, unbuffered

More information

TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory

TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, Christos Kozyrakis Stanford University Platform Lab Review Feb 2017 Deep Neural

More information

DDR3 SDRAM UDIMM MT8JTF12864AZ 1GB MT8JTF25664AZ 2GB MT8JTF51264AZ 4GB. Features. 1GB, 2GB, 4GB (x64, SR) 240-Pin DDR3 UDIMM.

DDR3 SDRAM UDIMM MT8JTF12864AZ 1GB MT8JTF25664AZ 2GB MT8JTF51264AZ 4GB. Features. 1GB, 2GB, 4GB (x64, SR) 240-Pin DDR3 UDIMM. DDR3 SDRAM UDIMM MT8JTF12864AZ 1GB MT8JTF25664AZ 2GB MT8JTF51264AZ 4GB 1GB, 2GB, 4GB (x64, SR) 240-Pin DDR3 UDIMM Features Features DDR3 functionality and operations supported as defined in the component

More information

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5

More information

1.35V DDR3L SDRAM SODIMM

1.35V DDR3L SDRAM SODIMM 1.35V DDR3L SDRAM SODIMM MT8KTF12864HZ 1GB MT8KTF25664HZ 2GB MT8KTF51264HZ 4GB 1GB, 2GB, 4GB (x64, SR) 204-Pin 1.35V DDR3L SODIMM Features Features DDR3L functionality and operations supported as defined

More information

Lecture 15: DRAM Main Memory Systems. Today: DRAM basics and innovations (Section 2.3)

Lecture 15: DRAM Main Memory Systems. Today: DRAM basics and innovations (Section 2.3) Lecture 15: DRAM Main Memory Systems Today: DRAM basics and innovations (Section 2.3) 1 Memory Architecture Processor Memory Controller Address/Cmd Bank Row Buffer DIMM Data DIMM: a PCB with DRAM chips

More information

Supervised Hashing for Image Retrieval via Image Representation Learning

Supervised Hashing for Image Retrieval via Image Representation Learning Supervised Hashing for Image Retrieval via Image Representation Learning Rongkai Xia, Yan Pan, Cong Liu (Sun Yat-Sen University) Hanjiang Lai, Shuicheng Yan (National University of Singapore) Finding Similar

More information

Lecture 13: Memory and Programmable Logic

Lecture 13: Memory and Programmable Logic Lecture 13: Memory and Programmable Logic Syed M. Mahmud, Ph.D ECE Department Wayne State University Aby K George, ECE Department, Wayne State University Contents Introduction Random Access Memory Memory

More information

CS 2750: Machine Learning. Clustering. Prof. Adriana Kovashka University of Pittsburgh January 17, 2017

CS 2750: Machine Learning. Clustering. Prof. Adriana Kovashka University of Pittsburgh January 17, 2017 CS 2750: Machine Learning Clustering Prof. Adriana Kovashka University of Pittsburgh January 17, 2017 What is clustering? Grouping items that belong together (i.e. have similar features) Unsupervised:

More information

(Advanced) Computer Organization & Architechture. Prof. Dr. Hasan Hüseyin BALIK (5 th Week)

(Advanced) Computer Organization & Architechture. Prof. Dr. Hasan Hüseyin BALIK (5 th Week) + (Advanced) Computer Organization & Architechture Prof. Dr. Hasan Hüseyin BALIK (5 th Week) + Outline 2. The computer system 2.1 A Top-Level View of Computer Function and Interconnection 2.2 Cache Memory

More information

Random Access Memory (RAM)

Random Access Memory (RAM) Random Access Memory (RAM) EED2003 Digital Design Dr. Ahmet ÖZKURT Dr. Hakkı YALAZAN 1 Overview Memory is a collection of storage cells with associated input and output circuitry Possible to read and write

More information

APPENDIX: DETAILS ABOUT THE DISTANCE TRANSFORM

APPENDIX: DETAILS ABOUT THE DISTANCE TRANSFORM APPENDIX: DETAILS ABOUT THE DISTANCE TRANSFORM To speed up the closest-point distance computation, 3D Euclidean Distance Transform (DT) can be used in the proposed method. A DT is a uniform discretization

More information

DDR3 SDRAM UDIMM MT8JTF12864AZ 1GB MT8JTF25664AZ 2GB. Features. 1GB, 2GB (x64, SR) 240-Pin DDR3 SDRAM UDIMM. Features

DDR3 SDRAM UDIMM MT8JTF12864AZ 1GB MT8JTF25664AZ 2GB. Features. 1GB, 2GB (x64, SR) 240-Pin DDR3 SDRAM UDIMM. Features DDR3 SDRAM UDIMM MT8JTF12864AZ 1GB MT8JTF25664AZ 2GB 1GB, 2GB (x64, SR) 240-Pin DDR3 SDRAM UDIMM Features Features DDR3 functionality and operations supported as defined in the component data sheet 240-pin,

More information

An introduction to SDRAM and memory controllers. 5kk73

An introduction to SDRAM and memory controllers. 5kk73 An introduction to SDRAM and memory controllers 5kk73 Presentation Outline (part 1) Introduction to SDRAM Basic SDRAM operation Memory efficiency SDRAM controller architecture Conclusions Followed by part

More information

Michael Adler 2017/05

Michael Adler 2017/05 Michael Adler 2017/05 FPGA Design Philosophy Standard platform code should: Provide base services and semantics needed by all applications Consume as little FPGA area as possible FPGAs overcome a huge

More information

Memory and Programmable Logic

Memory and Programmable Logic Memory and Programmable Logic Memory units allow us to store and/or retrieve information Essentially look-up tables Good for storing data, not for function implementation Programmable logic device (PLD),

More information

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling

More information

M2 Outline. Memory Hierarchy Cache Blocking Cache Aware Programming SRAM, DRAM Virtual Memory Virtual Machines Non-volatile Memory, Persistent NVM

M2 Outline. Memory Hierarchy Cache Blocking Cache Aware Programming SRAM, DRAM Virtual Memory Virtual Machines Non-volatile Memory, Persistent NVM M2 Memory Systems M2 Outline Memory Hierarchy Cache Blocking Cache Aware Programming SRAM, DRAM Virtual Memory Virtual Machines Non-volatile Memory, Persistent NVM Memory Technology Memory MemoryArrays

More information

SC64G1A08. DDR3-1600F(CL7) 240-Pin XMP(ver 2.0) U-DIMM 1GB (128M x 64-bits)

SC64G1A08. DDR3-1600F(CL7) 240-Pin XMP(ver 2.0) U-DIMM 1GB (128M x 64-bits) SC64G1A08 DDR3-1600F(CL7) 240-Pin XMP(ver 2.0) U-DIMM 1GB (128M x 64-bits) General Description The ADATA s SC64G1A08 is a 128Mx64 bits 1GB(1024MB) DDR3-1600(CL7) SDRAM XMP (ver 2.0) memory module, The

More information

Memory Supplement for Section 3.6 of the textbook

Memory Supplement for Section 3.6 of the textbook The most basic -bit memory is the SR-latch with consists of two cross-coupled NOR gates. R Recall the NOR gate truth table: A S B (A + B) The S stands for Set to remember, and the R for Reset to remember.

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Cache 11232011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review Memory Components/Boards Two-Level Memory Hierarchy

More information

Nearest neighbors. Focus on tree-based methods. Clément Jamin, GUDHI project, Inria March 2017

Nearest neighbors. Focus on tree-based methods. Clément Jamin, GUDHI project, Inria March 2017 Nearest neighbors Focus on tree-based methods Clément Jamin, GUDHI project, Inria March 2017 Introduction Exact and approximate nearest neighbor search Essential tool for many applications Huge bibliography

More information

Staged Memory Scheduling

Staged Memory Scheduling Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:

More information