EECS150 - Digital Design Lecture 12 - Video Interfacing. MIPS150 Video Subsystem

EECS150 - Digital Design
Lecture 12 - Video Interfacing
Feb 28, 2013
John Wawrzynek
Spring 2013 EECS150 - Lec12-video Page 1

MIPS150 Video Subsystem
Gives software the ability to display information on screen.
Equivalent to standard graphics cards:
Processor can directly write the display bit map.
2D graphics acceleration.

Frame Buffer HW/SW Interface
A range of memory addresses corresponds to the display.
CPU writes pixel values (using the sw instruction) to change the display.
No synchronization required. An independent process reads pixels from memory and sends them to the display interface at the required rate.
CPU address map (top to bottom): 0xFFFFFFFF, 0x803FFFFC, 0x80000000.
Ex: 1024 pixels/line x 768 lines, from (0, 0) to (1023, 767).
Display origin: increasing X values to the right, increasing Y values down.

Frame Buffer Implementation
The frame buffer behaves like a simple dual-ported memory. Two independent processes access it:
CPU writes pixel locations. Could be in random order, e.g. drawing an object, or sequentially, e.g. clearing the screen.
Video Interface continuously reads pixel locations in scan-line order and sends them to the physical display.
How big is this memory and how do we implement it? For us: 1024 x 768 pixels/frame x 24 bits/pixel.

Memory Mapped Frame Buffer
MIPS address map (top to bottom): 0xFFFFFFFF, 0x8023FFFD, 0x80000000.
1024 pixels/line x 768 lines, from (0, 0) to (1023, 767).
Display origin: increasing X values to the right, increasing Y values down.
1024 * 768 = 786,432 pixels.
We choose 24 bits/pixel: { Red[7:0] ; Green[7:0] ; Blue[7:0] }.
786,432 * 3 = 2,359,296 Bytes.
Total memory bandwidth needed to support the frame buffer?

Frame Buffer Implementation
Which XUP memory resource to use?
Memory capacity summary: LUT RAM, Block RAM, External SRAM, External DRAM.
DRAM bandwidth:
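The size arithmetic above, and the bandwidth question it raises, can be sketched in C. This is only an illustration: the function names are made up here, the refresh rate is a parameter because the slides do not state one (60 Hz in the usage note is an assumption), and blanking intervals and CPU write traffic are ignored.

```c
#include <stdint.h>

/* Bytes needed to hold one frame at the given resolution and pixel size. */
uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Read bandwidth (bytes/second) the video interface needs to scan out
   every frame. refresh_hz is an assumed parameter, not a slide value. */
uint64_t scanout_bytes_per_sec(uint64_t bytes_per_frame, uint32_t refresh_hz)
{
    return bytes_per_frame * refresh_hz;
}
```

For example, frame_bytes(1024, 768, 3) gives the 2,359,296 bytes computed on the slide, and at an assumed 60 Hz refresh the scan-out alone would read 141,557,760 bytes per second.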

Frame Buffer Details
768 lines, 1024 pixels/line.
MIPS address map (top to bottom): 0xFFFFFFFF, 0x80240000, 0x80000000.
1024 x 768 locations = 786,432 pixel locations.
XUP DRAM memory capacity: 256 MBytes (in external DRAM).
With byte-addressed memory, best to use 4 Bytes/pixel.
Starting each line on a multiple of 4K leads to independent X and Y address fields: {Y[21:12] ; X[11:2]}.
Y == row number, X == pixel in row.

Frame Buffer Physical Interface
[Figure: block diagram with CPU, DRAM Controller / Hub, and Video Interface inside the FPGA.]
Processor side: provides a memory mapped programming interface to the video display.
DRAM Hub: arbitrates among multiple DRAM users.
Video Interface block: accepts pixel values from the frame buffer, streams pixel values and control signals to the physical device.
More generally, how does software interface to I/O devices?
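As a minimal sketch of the {Y[21:12] ; X[11:2]} layout, the byte address of a pixel can be formed with shifts and ORs only. The base address 0x80000000 is taken from the slides; the function and macro names are illustrative, not part of the course code.

```c
#include <stdint.h>

#define FB_BASE 0x80000000u  /* frame buffer base address from the slides */

/* Byte address of pixel (x, y). Y occupies address bits 21:12 and X
   occupies bits 11:2, so each line starts on a 4 KB boundary and no
   multiply is needed. The low two bits are zero (4 bytes per pixel). */
uint32_t fb_pixel_addr(uint32_t x, uint32_t y)
{
    return FB_BASE | ((y & 0x3FFu) << 12) | ((x & 0x3FFu) << 2);
}
```

With this layout, moving one pixel right adds 4 to the address and moving one line down adds 4096, so fb_pixel_addr(1023, 767) is 0x802FFFFC.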

Physical Video Interface
DVI connector: accommodates analog and digital formats.
DVI transmitter chip, Chrontel 7301C: implements standard signaling voltage levels for video monitors, and digital to analog conversion for analog display formats.

Frame Buffer Details 2009
One pixel value per memory location.
MIPS address map (top to bottom): 0xFFFFFFFF, 0x803FFFFC, 0x80000000.
768 lines, 1024 pixels/line = 786,432 memory locations.
Virtex-5 LX110T memory capacity: 5,328 Kbits (in block RAMs).
(5,328 x 1024 bits) / 786,432 = 6.9 bits/pixel max!
We choose 4 bits/pixel.
Note that with only 4 bits/pixel we could assign more than one pixel per memory location. We ruled this out because it complicates software.
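The bits-per-pixel budget above is a single division; a small C sketch (the function name is an illustrative assumption) makes the calculation reusable for other memories and resolutions.

```c
#include <stdint.h>

/* Upper bound on bits per pixel when every pixel location must fit in a
   memory of capacity_bits bits. */
double max_bits_per_pixel(uint64_t capacity_bits, uint64_t pixel_locations)
{
    return (double)capacity_bits / (double)pixel_locations;
}
```

For the block RAM case, max_bits_per_pixel(5328 * 1024, 786432) evaluates to 6.9375, which the slide rounds to 6.9.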

Color Map
4 bits per pixel allows software to assign each screen location one of 16 different colors.
However, the physical display interface uses 8 bits per pixel color component. Therefore the entire palette is 2^24 colors.
The color map converts 4-bit pixel values to 24-bit colors.
[Figure: the pixel value from the frame buffer selects one of 16 entries, each holding 24 bits of R, G, B; the selected pixel color goes to the video interface.]
The color map is memory mapped into the CPU address space, so software can set the color table. Addresses: 0x8040_0000 to 0x8040_003C, one 24-bit entry per memory address.

Memory Mapped Frame Buffer 2010
A range of memory addresses corresponds to the display.
CPU writes pixel values (using the sw instruction) to change the display.
No handshaking required. An independent process reads pixels from memory and sends them to the display interface at the required rate.
MIPS address map (top to bottom): 0xFFFFFFFF, 0x801D4BFC, 0x80000000.
800 pixels/line x 600 lines, from (0, 0) to (799, 599).
Display origin: increasing X values to the right, increasing Y values down.
8 Mbits / 480,000 = 17.5 bits/pixel max!
We choose 16 bits/pixel: { Red[4:0] ; Green[5:0] ; Blue[4:0] }.
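The two pixel encodings described above can be sketched in C. This is a software model only: the 16-entry table is an ordinary array standing in for the memory-mapped color map, the function names are invented for illustration, and truncating the low bits of each channel when packing to 16 bits is an assumed choice that the slides do not specify.

```c
#include <stdint.h>

/* Model of the 16-entry color map: each entry is a 24-bit
   {R[7:0], G[7:0], B[7:0]} color. */
static uint32_t color_map[16];

/* Software sets one palette entry. */
void color_map_set(unsigned index, uint8_t r, uint8_t g, uint8_t b)
{
    color_map[index & 0xFu] = ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/* The video path looks up the 24-bit color for a 4-bit pixel value. */
uint32_t color_map_lookup(unsigned pixel4)
{
    return color_map[pixel4 & 0xFu];
}

/* Pack 8-bit channels into a 16-bit pixel
   { Red[4:0] ; Green[5:0] ; Blue[4:0] } by keeping the top bits. */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

The color map trades color depth per pixel for a small frame buffer while still reaching the full 24-bit palette; the 16-bit format stores color directly and needs no table.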

Frame Buffer Details 2010
MIPS address map (top to bottom): 0xFFFFFFFF, 0x803FFFFC, 0x80000000.
1024 x 768 locations mapped; 600 lines, 800 pixels/line = 480,000 memory locations used.
Note that we assign only one 16-bit pixel per memory location.
XUP SRAM memory capacity: ~8 Mbits (in external SRAMs).
Two pixel addresses map to one address in the SRAM (it is 32 bits wide).
Only part of the mapped memory range is occupied by physical memory.
Starting each line on a multiple of 1024 locations leads to independent X and Y address fields: {Y[9:0] ; X[9:0]}.
Y == row number, X == pixel in row.

XUP Board External SRAM
ZBT synchronous SRAM, 9 Mb on a 32-bit data bus, with four parity bits: 256K x 36 bits (located under the removable LCD).
ZBT stands for zero bus turnaround. The turnaround is the number of clock cycles it takes to change access to the SRAM from write to read and vice versa. For ZBT SRAMs the turnaround, i.e. the latency between read and write cycles, is zero.
More generally, how does software interface to I/O devices?

Graphics Software
Clearing the screen: fill the entire screen with the same color.
Remember the frame buffer base address: 0x8000_0000. Size: 1024 x 768.

    clear:
        # a0 holds the pixel color
        # t0 holds the pixel pointer
        ori   $t0, $0, 0x8000    # top half of frame buffer address
        sll   $t0, $t0, 16       # form frame buffer beginning address
        # t2 holds the frame buffer max address
        ori   $t2, $0, 768       # 768 rows
        sll   $t2, $t2, 12       # * 1024 pixels/row * 4 Bytes/address
        addu  $t2, $t2, $t0      # form ending address
        addiu $t2, $t2, -4       # minus one word address!
        #
        # the write loop
    L0: sw    $a0, 0($t0)        # write the pixel
        bne   $t0, $t2, L0       # loop until done
        addiu $t0, $t0, 4        # bump pointer (branch delay slot)!
        jr    $ra

How long does this take? What do we need to know to answer? How does this compare to the frame rate?
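To reason about the timing questions above, it can help to restate the routine in C and count loop instructions. The sketch below is an assumption-laden model: `fb` is an ordinary pointer standing in for the memory-mapped frame buffer, the names are illustrative, and the instruction count covers only the write loop (no setup, no memory stalls).

```c
#include <stddef.h>
#include <stdint.h>

#define FB_WIDTH  1024u
#define FB_HEIGHT 768u

/* Fill every pixel word with the same color. `fb` stands in for the
   memory-mapped frame buffer (0x8000_0000 on the real system). */
void clear_screen(volatile uint32_t *fb, uint32_t color)
{
    for (size_t i = 0; i < (size_t)FB_WIDTH * FB_HEIGHT; i++)
        fb[i] = color;
}

/* Instructions executed by a write loop that stores pixels_per_iter
   pixels using instrs_per_iter instructions on each trip. */
uint64_t clear_loop_instructions(uint32_t instrs_per_iter,
                                 uint32_t pixels_per_iter)
{
    uint64_t iterations = ((uint64_t)FB_WIDTH * FB_HEIGHT) / pixels_per_iter;
    return iterations * instrs_per_iter;
}
```

The loop in the slide executes three instructions per pixel (sw, bne, addiu), so clear_loop_instructions(3, 1) gives 2,359,296. Turning that into time requires the clock rate and cycles per instruction, which the slide leaves open; at a hypothetical 50 MHz and one instruction per cycle it would be roughly 47 ms, several frame times at a 60 Hz refresh.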

Optimized Clear Routine
Amortizing the loop overhead.

    clear:
        ...
        # the write loop
    L0: sw    $a0, 0($t0)        # write some pixels
        sw    $a0, 4($t0)
        sw    $a0, 8($t0)
        sw    $a0, 12($t0)
        sw    $a0, 16($t0)
        sw    $a0, 20($t0)
        sw    $a0, 24($t0)
        sw    $a0, 28($t0)
        sw    $a0, 32($t0)
        sw    $a0, 36($t0)
        sw    $a0, 40($t0)
        sw    $a0, 44($t0)
        sw    $a0, 48($t0)
        sw    $a0, 52($t0)
        sw    $a0, 56($t0)
        sw    $a0, 60($t0)
        bne   $t0, $t2, L0       # loop until done (t2 must hold the start of the last 16-word block)
        addiu $t0, $t0, 64       # bump pointer (branch delay slot)
        jr    $ra

What's the performance of this one?

Line Drawing
[Figure: pixel grid, x from 0 to 12, y from 0 to 7, showing a line from (x0,y0) to (x1,y1).]
From (x0,y0) to (x1,y1). The line equation defines all the points:
$y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}(x - x_0)$
For each x value, we could compute y with this equation, then round to the nearest integer y value.
The slope can be precomputed, but this still requires floating point * and + in the loop: slow or expensive!
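A minimal C sketch of the floating-point approach just described, to make its per-pixel cost visible. The function name and the output array are illustrative assumptions; the code assumes x0 < x1 and non-negative screen coordinates, so adding 0.5 and truncating rounds to the nearest integer.

```c
/* Naive line drawing: evaluate the line equation at every x and round.
   Assumes x0 < x1 and non-negative coordinates. Stores the y chosen for
   each x in ys[0 .. x1-x0] and returns the number of points. */
int line_naive(int x0, int y0, int x1, int y1, int *ys)
{
    double slope = (double)(y1 - y0) / (double)(x1 - x0); /* precomputed once */
    int n = 0;
    for (int x = x0; x <= x1; x++) {
        double y = y0 + slope * (x - x0); /* floating-point * and + per pixel */
        ys[n++] = (int)(y + 0.5);         /* round to nearest (y >= 0) */
    }
    return n;
}
```

Every pixel costs a floating-point multiply, an add, and a conversion, which is the overhead that an integer-only method avoids.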

Bresenham Line Drawing Algorithm
Developed by Jack E. Bresenham in 1962 at IBM.
"I was working in the computation lab at IBM's San Jose development lab. A Calcomp plotter had been attached to an IBM 1401 via the 1407 typewriter console...."
Computers of the day were slow at complex arithmetic operations such as multiply, especially on floating point numbers.
Bresenham's algorithm works with integers and without multiply or divide.
Its simplicity makes it appropriate for inexpensive hardware implementation.
With extension, it can be used for drawing circles.

Line Drawing Algorithm
This version assumes: x0 < x1, y0 < y1, slope <= 45 degrees.

    function line(x0, x1, y0, y1)
        int deltax := x1 - x0
        int deltay := y1 - y0
        int error := deltax / 2
        int y := y0
        for x from x0 to x1
            plot(x,y)
            error := error - deltay
            if error < 0 then
                y := y + 1
                error := error + deltax

Note: error starts at deltax/2 and gets decremented by deltay for each x. y gets incremented when error goes negative, therefore y gets incremented at a rate proportional to deltay/deltax.
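The pseudocode above translates almost line for line into C. In this sketch the plot(x,y) call is replaced by storing y into a caller-supplied array so the result can be inspected; that array, the return value, and the function name are assumptions made for illustration. The parameter order (x0, x1, y0, y1) follows the pseudocode.

```c
/* Integer-only line drawing for x0 < x1, y0 < y1, slope <= 45 degrees.
   Instead of calling plot(x, y), stores the y for each x in
   ys[0 .. x1-x0] and returns the number of points. */
int line_bresenham(int x0, int x1, int y0, int y1, int *ys)
{
    int deltax = x1 - x0;
    int deltay = y1 - y0;
    int error  = deltax / 2;
    int y = y0;
    int n = 0;
    for (int x = x0; x <= x1; x++) {
        ys[n++] = y;          /* plot(x, y) */
        error -= deltay;
        if (error < 0) {      /* accumulated error crossed half a pixel */
            y += 1;
            error += deltax;
        }
    }
    return n;
}
```

For the endpoints (1,1) and (11,5), line_bresenham(1, 11, 1, 5, ys) produces 11 points with y values 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, using only integer adds, subtracts, and compares inside the loop.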

Line Drawing, Examples
[Figure: two pixel grids, x from 0 to 12, y from 0 to 7.]
deltay = 1 (very low slope): y only gets incremented once (halfway between x0 and x1).
deltay = deltax (45 degrees, max slope): y gets incremented for every x.

Line Drawing Example
[Figure: pixel grid, x from 0 to 12, y from 0 to 7.]

    function line(x0, x1, y0, y1)
        int deltax := x1 - x0
        int deltay := y1 - y0
        int error := deltax / 2
        int y := y0
        for x from x0 to x1
            plot(x,y)
            error := error - deltay
            if error < 0 then
                y := y + 1
                error := error + deltax

(1,1) -> (11,5): deltax = 10, deltay = 4, error = 10/2 = 5, y = 1
x = 1: plot(1,1), error = 5 - 4 = 1
x = 2: plot(2,1), error = 1 - 4 = -3, y = 1 + 1 = 2, error = -3 + 10 = 7
x = 3: plot(3,2), error = 7 - 4 = 3
x = 4: plot(4,2), error = 3 - 4 = -1, y = 2 + 1 = 3, error = -1 + 10 = 9
x = 5: plot(5,3), error = 9 - 4 = 5
x = 6: plot(6,3), error = 5 - 4 = 1
x = 7: plot(7,3), error = 1 - 4 = -3, y = 3 + 1 = 4, error = -3 + 10 = 7

C Version
Modified to work in any quadrant and for any slope.

    #define SWAP(x, y) do { int t = (x); (x) = (y); (y) = t; } while (0)
    #define ABS(x) (((x) < 0) ? -(x) : (x))

    void line(int x0, int y0, int x1, int y1) {
        char steep = (ABS(y1 - y0) > ABS(x1 - x0)) ? 1 : 0;
        if (steep) {                 /* iterate along y instead of x */
            SWAP(x0, y0);
            SWAP(x1, y1);
        }
        if (x0 > x1) {               /* always draw left to right */
            SWAP(x0, x1);
            SWAP(y0, y1);
        }
        int deltax = x1 - x0;
        int deltay = ABS(y1 - y0);
        int error = deltax / 2;
        int ystep;
        int y = y0;
        int x;
        ystep = (y0 < y1) ? 1 : -1;
        for (x = x0; x <= x1; x++) {
            if (steep) plot(y,x); else plot(x,y);
            error = error - deltay;
            if (error < 0) {
                y += ystep;
                error += deltax;
            }
        }
    }

Estimate software performance (MIPS version).
What's needed to do it in hardware? Goal is one pixel per cycle. Pipelining might be necessary.

Hardware Implementation Notes
Register types: read-only control register, write-only trigger registers, write-only non-trigger registers.

    0x8040_0064: ready
    0x8040_0060: color
    0x8040_005c: y1
    0x8040_0058: x1
    0x8040_0054: y0
    0x8040_0050: x0
    0x8040_004c: y1
    0x8040_0048: x1
    0x8040_0044: y0
    0x8040_0040: x0

CPU initializes the line engine by sending a pair of points and the color value to use. Writes to trigger registers initiate the line engine.
The frame buffer has one write port, shared by the CPU and the line engine. Priority goes to the CPU: the line engine stalls when the CPU writes.