Microprocessor System Design Lecture 17: Serial Buses USB Disks and other I/O Zeshan Chishti Electrical and Computer Engineering Dept. Maseeh College of Engineering and Computer Science Source: Lecture based on materials provided by Mark F.
Bus Trends Parallel to Serial Lower Voltage Can t accomplish large voltage swings at high speed Power consumption Differential Signaling Clock Forwarding and Clock Embedding Encoding Line codes: 8b/10b
Why Serial? Parallel Serial + - Device A Device B Device A + - Device B 10 bidirectional wires at 250Mbps pair unidirectional wires at 2500Mbps (2.5Gbps)
Traditional Parallel Bus Device A Device B Used in low to medium 100 MHz range Issues Board trace length mismatch effects skew at a device Clock skew across devices Faster data rate squeezes eye t SU CLK t H EYE
Source Synchronous Bus Device A Device B Used in 200MHz to 1.6GHz range Clock signal is forwarded with data Design impact: Board layout track length mismatch still adds to skew Eliminates skew error term caused by clock domain skew Allows faster cycle times than parallel
Source Synchronous - Clock Forward Clock Data 1 Data 2 Data 3 Data 4 Clock is transmitted continuously from Tx to Rx Removes clock distribution skew Examples HyperTransport Parallel RapidIO
Embedded Clock Data Clock Clock signal is embedded with data Edge density guaranteed by encoding scheme Examples Edge density?...well, you have to have enough transitions to recover the clock PCI Express USB Serial RapidIO Infiniband Clock signal embedded with data
SerDes (Serial/Deserializer) Serialize Tx 10 x 250Mbps Low Speed Parallel Data 1 x 2.5Gbps High Speed Serial Data Deserialize Rx
SerDes - Parallel and Serial Conversion 10-bit Parallel Interface Serializer TxP TxN SysClk XmitClk Rx Parallel Interface 10-bit Clock Recovery and Data Deserializer RxP RxN Recovered Clock
Differential Signaling Differential, point to point Complementary signals transmitted Receiver detects voltage difference between lines Low amplitudes (200mV - 400mV typical), high speeds Good noise immunity Pair routed together noise cancels out EX: LVDS Low Voltage Differential Signaling 3.125 Gbps, +/- 350mV Gbps at mws -- High speed & low power consumption FibreChannel, Gigabit Ethernet, HDMI, DVI
Clock Recovery for Embedded Clock Data Clock Problem: When clock signal is embedded with data, need enough transitions to recover the clock from the data (high edge density) Solution: Line codes Use codes which ensure that there are always enough transitions (0->1 and 1->0) in short amount of time, so that the clock signal can be easily recovered from the embedded clock/data signal Other advantage: avoid DC bias Clock signal embedded with data
Ensuring Edge Density: m-of-n codes Some 8-bit code words have too few 1s (or 0s) to ensure edge density sufficient to recover clock 8b10 encoding (developed by IBM in 1983) Use a subset of 10-bit code words having a balanced number of 0s and 1s Benefits Ensure edge density Avoid DC bias at receiver from imbalance 8 bit byte 10 bit code Table lookup Device A Device B Parallel Data TX FIFO 8B/10B Encoder Serializer + _ Deserializer 8B/10B Decoder RX FIFO Parallel Data Parallel Data RX FIFO 8B/10B Decoder Deserializer + _ Serializer 8B/10B Encoder TX FIFO Parallel Data
8b/10b Transmission Code http://en.wikipedia.org/wiki/8b/10b_encoding Name 8-bit binary 10-bit Encode D0.0 00000 000 0100 100111 D0.1 00000 001 0100 011101 D0.2 00000 010 0100 101101 D0.3 00000 011 1011 110001 Reasons for using 8B/10B encoding/decoding Guarantees transition density to ensure correct PLL operation Error correction to detect signaling errors Ensures signal is DC balanced no DC offset develops over time Support of special characters that can be used as delimiters for control, such as sync or framing, or other generalized commands
USB (Universal Serial Bus)
USB Universal Serial Bus Need for new peripheral bus Inexpensive Capable of supporting wide range of devices Simple (no IRQ contention, etc) Hot-pluggable (plug and play) USB introduced 1995 Compaq, Digital, IBM, Intel, Microsoft, NEC, Northern Telecom, later 25 companies USB 2.0 Finalized April 2000 Fully backward compatible implications! USB 1.X USB 2.0 LS Low Speed 1.5 Mbps FS Full Speed 12 Mbps HS High Speed 480 Mbps Low speed devices: keyboard, mice Higher speed devices: scanners disks, CD/RW, flash storage
USB Features No arbitration! Everything controlled by host (master) Point-to-point with host as one point No device-to-device communication USB devices don t use I/O or memory address space Memory used for buffers, descriptors Devices have addresses but not from I/O or memory space Error detection and correction CRC (16-bit for data packets, 5-bit for others) ACK Resend bad packets
USB Flexible Peripheral Interface Tree Topology Point-to-point links Hubs (7 peripherals/hub) 127 devices maximum 5m maximum cable length Serial Interface Low power devices can be powered through cable 100mA 500mA if permitted by host Isochronous Device can request guaranteed bandwidth Mission Critical apps. Audio, streaming video
Tree Topology Host Controller Generates transactions Hubs Controls power to its ports Enables/disables ports Recognizes devices attached ports Reports status events associated with ports Devices when requested by host
Layers and Logical Communication
Endpoints Drivers communicate with device endpoints All devices must support endpoint 0 (control) Except for endpoint 0 all endpoints unidirectional Devices can implement multiple endpoints Ex: CD/ROM Endpoint for control Endpoint for writing data Endpoint for reading data Endpoint for reading audio Isochronous
USB Device Descriptors Each device attached to the USB describes itself using a hierarchy of descriptors device descriptor (1) specifies number of supported configurations, default communication pipe, general info configuration descriptor (1/configuration) specifies info on configuration and number of interfaces interface descriptor (1/interface) device class, number of endpoints, etc endpoint descriptor (1/endpoint) each descriptor defines a point of communication incl. max transfer rate, transfer type, etc. string descriptor human readable information about device, configuration, interface (Unicode characters) class-specific product specific info or requirements not covered by USB standard
USB Enumeration USB devices are enumerated at power up and when first attached Enumeration is carried out by the host software and the root hub and consists of (roughly) the following steps: Host configures root hub and requests that power be applied to all ports Root hub enables power to each port and detects presence of device and downstream hubs Root resets all of the devices it has detected forces every USB device to respond as address 0 so root hub can read each device s descriptor Root hub assigns unique device address to each USB device Host software probes each device to determine if the endpoints can be accommodated w/ the remaining bandwidth and power. If insufficient power or bandwidth is available that device configuration is not allowed on the USB there may be other configurations offered by the USB device if no configurations work the device is not allowed on the USB If acceptable configuration is found the device is fully enabled in that configuration and the client software is notified that device is installed
USB Transactions Typical USB transaction comprises 3 phases: Token packet phase identifies type of transaction, device address and whether there are additional phases Data packet phase carries payload (<= 1023 bytes) Handshake packet phase status of transaction (ex: errors?) All transfers except isochronous guarantee data delivery Packet format: Synchronization data (seven 0 s followed by single 1) Packet ID purpose and content of packet (Token, Data, Handshake or Special + inverted ID for check) CRC and end of packet indicator
Packets Token packets Sent at beginning of transaction to specify target endpoint address Define type of transaction Data packets SOF (Start of Frame) IN (transfer from target USB device to system) OUT (transfer from system to target USB device) SETUP (start of control transfer) For data payload (follow token packets) Handshake packets Sent by device to acknowledge receipt of data ACK/NAK/NYET/STALL Special packets Pre-amble packets to enable low-speed ports prior to sending lowspeed packets
USB Devices Share Frames
Transactions Transaction consists of multiple packets Token/Data/ACK Must complete in (micro) frame USB Client Driver Initiates transfers Using IRP (I/O Request Packet) USB Driver Breaks IRPs into transactions Achievable in single frame Transaction descriptors created Host Controller Driver Schedules transaction descriptors Using knowledge of devices Allocates to frames USB Host Controller Traverses transaction descriptors Serializes data Creates USB packets
Frames and mframes 1ms 125ms Low Speed Mode 1ms frames 1,500 bits /frame 187.5 KB/s Full Speed Mode 1ms frames 12,000 bits/frame 1.5 MB/s High Speed Mode (USB 2.0) 125ms frames 60,000 bits/mframe 60 MB/s
Disk Drives and other I/O
Computer Memory Hierarchy: Disks Processor Datapath Control Registers On-Chip Cache Second Level Cache (SRAM) Third Level Cache (SRAM) Main Memory (DRAM) Secondary Storage (Disk) Tertiary Storage (Tape) Intermediate results Cached DRAM Instructions File System Data Paging [Cached Files] Archive Backup
Typical Disk Drive A device called a head reads and writes data in a hard drive by manipulating the magnetic medium that composes the surface of an associated disk platter. Naturally, a platter has 2 surfaces; therefore there are 2 heads per platter Cylinder refers to stack of tracks on all surfaces at same distance from spindle Sectors are fixed-length (typically standardized on 512 bytes)* Need addressing scheme to uniquely identify sectors * This is moving (has moved) to 4KB sectors
Disk Geometry Cylinder/Head/Sector Addressing (CHS) Limitations due to PC BIOS bit field sizes Operating System 1024 cylinders, 256 heads, 64 sectors Cylinders, sectors numbered from 1 Heads numbered from 0 2 10 x 2 8 x 2 6 x 2 9 = 2 33 = 8 GB Logical Block Addressing (LBA) Sectors just numbered consecutively from 0 Drive maps from LBA to physical location on disk Optimization Bad sector mgmt. Caching Disk Drive
Issues Fragmentation Internal Fragmentation Files unlikely to be exact multiples of sector size, so there will be unused space in last sector of every file (natural consequence of fixed size sectors) External Fragmentation Over time, with additions and deletions, there will be gaps in allocated sectors. These must be managed by the file system and unused sectors allocated to new files (often allocated multiple sectors at a time (blocks)). Bad block handling Increasing density, lower signal-to-noise ratio Increase in SER (soft error rate) SER ~ 1 in 105 bits or about 1 in 24 sectors - ECC handles most of these For those which can t be corrected, drive maintains internal tables which mark them as bad and never uses them Drive still presents a contiguous LBA address space to the user Two techniques Sector Slipping - If sector is bad, find next non-defective sequential sector to be that LBA Sector Sparing - Reserve some spare sectors at one or more locations on the disk
Key Disk Performance Parameters Seek time Time required to move head to the desired track Depends on where the head is currently (less for adjacent or nearby tracks) 5ms to 15ms typical Rotational Latency Time required for sector to rotate under head 1/rotational speed of disk Average is ½ rotational latency 5,400 to 12,000 RPM typical Transfer Time Time to transfer a sector 20 to 160 MB/s typical Controller Time (Controller Overhead)
Disk Performance Average access time Seek time + rotational latency + transfer time + controller time Fair to assume that rotation latency = 1/(½ * rotational speed) Controller time is often negligible (compared to seek time and rotation speed) Buffers DRAM Write buffer allows actual disk write to occur later Read buffer necessary because transfer rate varies Pre-fetching Anticipate reads beyond current sector Read and cache sectors during rotational latency Exploit spatial locality No additional latency for sectors in same track Caching (on disk and in OS) Command Queuing and Scheduling
RAID Redundant Array of Inexpensive (Independent) Drives/Disks Idea: Use many smaller, cheaper disks to improve I/O operations/sec and data transfer rates Weakness: Lower reliability (more disks more failures) Solution: Add redundancy Important: RAID is concerned about entire disk failure(s) not with errors within a sector (ECC on disk) Ken Ouichi, IBM (1978) applied parity scheme to disks UC Berkeley RAID paper (1988) proposed taxonomy and described RAID 1 through RAID 5 Widely adopted (and extended) by industry
RAID striping
RAID Striping (block level) No redundancy Mirroring Requires 2x disks Striping (bit or byte level) Parity Inexpensive (1 redundant disk) Read all disks to recover Issue: All reads/writes access all disks
RAID Small reads (block or smaller) require only one disk read Small reads can occur in parallel Small writes more complicated But optimization possible instead of reading all disks and two writes read two disks and two writes Problem: contention for disk with parity
RAID Stripe parity blocks across disks Writes proceed in parallel if parity on different disks If two drives fail previous schemes fail Two (striped) parity checks Row and diagonal