EE 352 Unit 10: Memory System Overview, SRAM vs. DRAM, DMA & Endian-ness. Mark Redekopp, All rights reserved.



The Memory Wall

Problem: processor speeds have been increasing much faster than memory access speeds.
- Memory technology targets density rather than speed.
- Large memories yield large address decode and access times.
- Main memory is physically located on separate chips, and inter-chip signals have much higher propagation delay than intra-chip signals.
(Figure: processor performance grows ~55%/year vs. ~7%/year for memory, the "Processor-Memory Performance Gap". Hennessy and Patterson, Computer Architecture: A Quantitative Approach (2003), Elsevier Science.)

Improving Memory Performance

Possibilities for improvement:
- Technology: Can we improve our transistor-level design to create faster RAMs? Can we integrate memories on the same chip as our processing logic?
- Architectural: Can we organize memory in a more efficient manner? (This is our focus.)

DRAM & SRAM

Memory Technology

- Static RAM (SRAM): will retain values indefinitely (as long as power is on). The stored bit is actively remembered (driven and regenerated by circuitry).
- Dynamic RAM (DRAM): will lose values if not refreshed periodically. The stored bit is passively remembered and needs to be regenerated by external circuitry.

Memory Array

- Logical view = 1D array of rows (words). Strictly this is already 2D, because each word is 32 bits (i.e., 32 columns).
- Physical view = 2D array of rows and columns. Each row may contain 1000s of columns (bits), though we have to access at least 8 (and often 16, 32, or 64) bits at a time.
(Figure: 1D logical view with one 32-bit word per row at addresses 0x0000, 0x0004, 0x0008, 0x000C; 2D physical view with one row per address 0x000000, 0x000400, 0x000800, where each row is 1KB = 8Kb.)

1Mx8 Memory Array Layout

- Start with an array of cells that can each store 1 bit: 1 MB = 8 Mbit = 2^23 total cells.
- This can be broken into a 2D array of cells (2^10 x 2^13).
- Each row connects to a WL = word line, which selects that row (1K word lines, WL[0]..WL[1023]).
- Each column connects to a BL = bit line for read/write data (8K bit lines, BL[0]..BL[8191]).

Row and Column

- For 1MB we need 20 address bits.
- The 10 upper address bits (A[19:10]) feed the address decoder, which selects one of the 1024 rows (WL[0]..WL[1023]).
- Suppose we always want to read/write 8 bits (a byte); then we group each set of 8 bits into 1024 byte columns (1024 * 8 = 8K bit lines).
- The 10 lower address bits (A[9:0]) select the column via the amplifiers and column mux, producing in/out[7:0].

Periphery Logic

- The address decoder selects one row based on the input address number.
- Bit lines are bidirectional lines for reading (output) or writing (input).
- Column multiplexers use the lower address bits to select the right set of 8 bits from the 8K bit lines.

Memory Technologies

Memory technologies share the same layout but differ in their cell implementation.
- Static RAM (SRAM): will retain values indefinitely (as long as power is on). When read, the stored bit is actively driven onto the bit lines.
- Dynamic RAM (DRAM): will lose values if not refreshed periodically. When read, the stored bit passively pulls the bit line voltage up or down slightly and needs to be amplified.

SRAM Cell

- Each memory cell requires 6 transistors.
- Each cell consists of a D-latch made from cross-coupled inverters, which have active connections to PWR and GND. Thus the signal is remembered and regenerated as long as power is supplied.
(Figure: SRAM core implementation, a cell connected between BL and /BL and enabled by its WL.)

DRAM Cell

The bit is stored on a capacitor and requires only 1 transistor plus the capacitor; the transistor acts as a switch connecting the capacitor to the BL when the WL is asserted.

DRAM Cell: Write

- WL=1 connects the capacitor to the BL, which is driven to 1 or 0, charging or discharging the capacitor based on the BL value (e.g., with WL=1 and BL driven to 5V, the capacitor charges to 5V).

DRAM Cell: Hold

- With WL=0 the transistor is off, trapping the charge on the capacitor; the value is stored there, and the capacitor maintains its charge (e.g., 5V) regardless of the BL.

DRAM Cell: Read

- The BL is precharged to 2.5V.
- WL=1 connects the capacitor to the BL, allowing the charge on the capacitor (or its absence) to change the BL voltage slightly (e.g., to 2.7V or 2.3V).
- A sense amp then drives the BL voltage the rest of the way to 5V or 0V.

DRAM Issues

- Destructive read: charge is lost when the capacitor value is read, so whatever is read must be written back to the capacitor.
- Leakage current: charge slowly leaks off the capacitor, so the value must be refreshed periodically.

SRAM vs. DRAM Summary

SRAM:
- Faster, because the bit lines are actively driven by the D-latch.
- Faster, simpler interface due to the lack of refresh.
- Larger area for each cell, which means less memory per chip.
- Used for cache memory, where speed/latency is key.

DRAM:
- Slower, because a passive value (charge on the capacitor) drives the BL.
- Slower due to refresh cycles.
- Small cell area means much greater density of cells and thus large memories.
- Used for main memory, where density is key.

Architectural Block Diagram

- The memory controller interprets system bus transactions into appropriate chip address and control signals.
- System address and data bus sizes do not have to match chip address and data sizes.
(Figure: processor with 32-bit system address and 64-bit system data bus connects via the system bus to the memory controller, which drives a 12-bit chip address, 64-bit chip data path, and chip control signals to the memory chips.)

Memory Block Diagrams

SRAM:
- Bidirectional, shared data bus (only 1 chip can drive the bus at a time).
- CS = chip select (enables the chip, needed in the likely case that multiple chips are connected to the bus).
- WE / OE = write enable / output enable (write data to, or read data from, the given address).

DRAM:
- Bidirectional, shared data bus (only 1 chip can drive the bus at a time).
- RAS / CAS = row / column address strobe: the address is broken into two groups of bits for the row and column in the 2D memory array. This allows the address bus to be approximately half as wide as an SRAM's.
- CS / WE / OE = chip select and write / output enable.
(Figure: SRAM chip with A[n-1:0], D[m-1:0], CS, WE, OE; DRAM chip with the same signals plus RAS and CAS.)

Implications of Memory Technology

- The memory latency of a single access using current DRAM technology will be slow, so we must improve bandwidth.
- Idea 1: access more than just a single word at a time. (Technology: EDO, SDRAM, etc.)
- Idea 2: increase the number of accesses serviced in parallel (in-flight accesses). (Technology: banking.)

Legacy DRAM Timing

- Can have only a single access in flight at once.
- The memory controller must send the row and column address portions for each access.
- /RAS active holds the row open for reading/writing; /CAS active selects the column to read/write.
- t_RC = cycle time (110ns) = time before the next access can start.
- t_RAC = access time (60ns) = time until data is valid.
(Timing diagram: for every access the MC bus presents a new Row then Column; data appears on the bus after t_RAC, and the next access cannot start until t_RC has elapsed.)

Fast Page Mode (FPM) DRAM Timing

- Can provide multiple column addresses with only one row address.
- Future accesses that fall in the same row can pull data from the latched row.
(Timing diagram: the MC bus carries Row, Column, Column; data appears for each column without re-sending the row.)

EDO (Extended Data Out) DRAM Timing

- Similar to FPM, but overlaps data i with column address i+1: column address i+1 is sent while data i is still being transferred on the bus.

Synchronous DRAM (SDRAM) Timing

- Registers the column address and automatically increments it, accessing n sequential data words in n successive clocks, called bursts (n = 4 or 8 usually).
- Addition of a clock signal: up to n consecutive words (i, i+1, i+2, i+3) are delivered in the n clocks after the column address is sent.

DDR SDRAM (Double Data Rate SDRAM) Timing

- Double data rate: data is accessed every half clock cycle.
- Up to 2n consecutive words are delivered in the n clocks after the column address is sent.

Banking

- Divide memory into banks, duplicating the row/column decoders and other peripheral logic to create independent memory arrays that can access data in parallel.
- The memory controller uses a portion of the address (the bank bits) to determine which bank to access.
(Figure: row/column address fanning out to banks 0-3.)

Bank Access Timing

- Consecutive accesses to different banks can be overlapped, hiding the time to access the row and select the column.
- Consecutive accesses within a bank (to different rows) expose the access latency.
(Timing diagram: access 1 maps to bank 1 while access 2a maps to bank 2, allowing parallel access; access 2b immediately follows and also maps to bank 2, causing a delay due to the bank conflict.)

Memory Modules

- Integrate several chips on a module.
- SIMM (Single In-line Memory Module) = single sided; DIMM (Dual In-line Memory Module) = double sided.
- One side is termed a rank: SIMM = 1 rank, DIMM = 2 ranks.
- Example: (8) 1Mx8 chips each output 8 bits to form a 64-bit data bus.

Breakdown

Assume a 1GB memory system made of 2 ranks of 64Mx8 chips (4 banks x 16K rows x 1K columns x 8 bits) connected to a 64-bit (8-byte) wide data bus. Breaking down address 0x004f1ad8:

Field   Bits      Value
Unused  A31,A30   00
Rank    A29       0
Row     A28-A15   00 0000 1001 1110
Bank    A14,A13   00
Col     A12-A3    11 0101 1011
Unused  A2-A0     000

(Figure: ranks 0 and 1, each with chips 0..7; each chip contributes 8 bits of the 64-bit data bus.)

Breakdown

Same system (1GB, 2 ranks of 64Mx8 chips, 64-bit data bus), now breaking down address 0xff05aac0:

Field   Bits      Value
Unused  A31,A30   11
Rank    A29       1
Row     A28-A15   11 1110 0000 1011
Bank    A14,A13   01
Col     A12-A3    01 0101 1000
Unused  A2-A0     000

This address selects rank 1.

Breakdown (exercise)

Assume 2GB of memory on a single SIMM module with (8) 256MB chips, each broken into 8 banks and 8K rows:

Field   Bits
Unused  A31
Rank    (none; a single SIMM is one rank)
Row     A30-A18
Bank    A17-A15
Col     A14-A3
Unused  A2-A0

Memory Summary

Opportunities for speedup / parallelism:
- Sequential accesses lower latency (due to bursts of data via FPM, EDO, and SDRAM techniques).
- Ensure accesses map to different banks and ranks to hide latency (parallel decoding and operation).

Programming Considerations

- For the memory configuration given earlier, accesses to the same bank but a different row occur on a 32KB boundary.
- Now consider a matrix multiply of 8K x 8K integer matrices (i.e., each row occupies 8192 * 4 bytes = 32KB).
- In the code below, m2[0][0] is at 0x10010000 while m2[1][0] is at 0x10018000; breaking both down shows they map to the same bank but different rows:

Address      Unused  Rank  Row (A28-A15)       Bank  Col (A12-A3)  Unused
0x10010000   00      0     10 0000 0000 0010   00    00 0000 0000  000
0x10018000   00      0     10 0000 0000 0011   00    00 0000 0000  000

m1 x m2:

    int m1[8192][8192], m2[8192][8192], result[8192][8192];
    int i, j, k;
    ...
    for (i = 0; i < 8192; i++) {
        for (j = 0; j < 8192; j++) {
            result[i][j] = 0;
            for (k = 0; k < 8192; k++) {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }

DMA & ENDIAN-NESS

Direct Memory Access (DMA)

- Large buffers of data often need to be copied between:
  - memory and I/O (video data, network traffic, etc.)
  - memory and memory (OS space to user application space)
- DMA devices are small hardware devices that copy data from a source to a destination, freeing the processor to do real work.
(Figure: CPU, DMA engine, and memory on the system bus; an I/O bridge connects to the I/O bus with devices such as USB and network.)

Transfer w/o DMA

Without DMA, the processor would have to move data using a loop. For example, to move 16K words pointed to by ($s1) to ($s2):

            li   $t0, 16384
    AGAIN:  lw   $t1, 0($s1)
            sw   $t1, 0($s2)
            addi $s1, $s1, 4
            addi $s2, $s2, 4
            subi $t0, $t0, 1
            bne  $t0, $zero, AGAIN

The processor wastes valuable execution time moving data.

Transfer w/ DMA

- The processor sets values in the DMA control registers:
  - source start address
  - destination start address
  - byte count
  - control & status (start, stop, interrupt on completion, etc.)
- The DMA engine becomes bus master (controls the system bus to generate reads and writes) while the processor is free to execute other code.
- Small problem: the bus will be busy. Hopefully, the data and code needed by the CPU will reside in the processor's cache.

DMA Engines

- Systems usually have multiple DMA engines/channels (e.g., DMA channels 0-3).
- Each can be configured to be started/controlled by the processor or by certain I/O peripherals; a network card or other peripheral can initiate DMAs on its own behalf.
- A bus arbiter assigns control of the bus via request/grant signals to the bus masters (processor core and DMA channels). Usually the winning requestor has control of the bus until it relinquishes it (turns off its request signal).
- Slave devices on the internal system bus include memory and peripherals.

Endian-ness

- Endian-ness refers to the two alternate methods of ordering the bytes within a larger unit (word, long, etc.).
- Big-endian (PowerPC, SPARC): the MS byte is put at the starting address.
- Little-endian (used by Intel processors / the PCI bus): the LS byte is put at the starting address.
- The longword value 0x12345678 can be stored differently:

Address  Big-Endian  Little-Endian
0x00     12          78
0x01     34          56
0x02     56          34
0x03     78          12

Big-Endian vs. Little-Endian

- Big-endian makes sense if you view memory as starting at the top-left, with addresses increasing as you go down: for 0x12345678, byte 0 holds 12, byte 1 holds 34, byte 2 holds 56, byte 3 holds 78.
- Little-endian makes sense if you view memory as starting at the bottom-right, with addresses increasing as you go up: byte 3 holds 12, byte 2 holds 34, byte 1 holds 56, byte 0 holds 78.

Big-Endian vs. Little-Endian Issues

- Issues arise when transferring data between different systems via a byte-wise copy from a big-endian system to a little-endian system.
- This is a major issue in networks (little-endian computer => big-endian computer) and even within a single computer (system memory => peripheral (PCI) device).
- Copying byte 0 to byte 0, byte 1 to byte 1, etc. means the long at address 0 in the big-endian system (0x12345678) is now different from the long at address 0 in the little-endian system (0x78563412).
- Note: MARS is little-endian.