EECS150 - Digital Design Lecture 15 - Video

EECS150 - Digital Design Lecture 15 - Video March 6, 2011 John Wawrzynek Spring 2012 EECS150 - Lec15-video Page 1

MIPS150 Video Subsystem
Gives software the ability to display information on screen. Equivalent to standard graphics cards:
  Processor can directly write the display bit map
  2D graphics acceleration

Framebuffer HW/SW Interface
A range of memory addresses corresponds to the display. The CPU writes pixel values (using the sw instruction) to change the display. No synchronization is required: an independent process reads pixels from memory and sends them to the display interface at the required rate.
CPU address map: frame buffer from 0x80000000 up to 0x803FFFFC, within the 0 to 0xFFFFFFFF address space.
Ex: 1024 pixels/line x 768 lines, from (0,0) to (1023, 767).
Display origin: increasing X values to the right, increasing Y values down.

Framebuffer Implementation
The framebuffer behaves like a simple dual-ported memory. Two independent processes access it:
  The CPU writes pixel locations. Could be in random order, e.g. drawing an object, or sequentially, e.g. clearing the screen.
  The video interface continuously reads pixel locations in scan-line order and sends them to the physical display.
How big is this memory and how do we implement it? For us: 1024 x 768 pixels/frame x 24 bits/pixel.

Memory Mapped Framebuffer
MIPS address map: frame buffer from 0x80000000 up to 0x8023FFFD, within the 0 to 0xFFFFFFFF address space.
1024 pixels/line x 768 lines, from (0,0) to (1023, 767).
Display origin: increasing X values to the right, increasing Y values down.
1024 * 768 = 786,432 pixels.
We choose 24 bits/pixel: { Red[7:0] ; Green[7:0] ; Blue[7:0] }
786,432 * 3 = 2,359,296 Bytes.
Total memory bandwidth needed to support the frame buffer?
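The size arithmetic above, and the bandwidth question it ends on, can be sketched in C. This is only an illustration: the function names are hypothetical, and the refresh rate is an assumed parameter because the slide does not state one. The result counts display reads only, so it is a lower bound that ignores blanking intervals and CPU writes.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to hold one frame. */
uint64_t fb_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Bytes per second the video interface must read to refresh the display.
 * refresh_hz is an assumption supplied by the caller; the slides do not
 * give a refresh rate. */
uint64_t fb_read_bandwidth(uint32_t width, uint32_t height,
                           uint32_t bytes_per_pixel, uint32_t refresh_hz)
{
    return fb_bytes(width, height, bytes_per_pixel) * refresh_hz;
}
```

For example, fb_bytes(1024, 768, 3) reproduces the 2,359,296 bytes computed on the slide, and an assumed 60 Hz refresh would give roughly 141.6 MB/s of read traffic.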

Frame Buffer Implementation
Which XUP memory resource to use?
Memory capacity summary:
  LUT RAM
  Block RAM
  External SRAM
  External DRAM
DRAM bandwidth:

Framebuffer Details
MIPS address map: frame buffer (1024 x 768 locations) from 0x80000000 up to 0x80240000, within the 0 to 0xFFFFFFFF address space.
768 lines, 1024 pixels/line, 4K bytes per line = 786,432 pixel locations.
XUP DRAM memory capacity: 256 MBytes (in external DRAM).
With byte-addressed memory, best to use 4 Bytes/pixel.
Starting each line on a multiple of 4K leads to independent X and Y address fields: {Y[9:0] ; X[11:2]}, where Y == row number, X == pixel in row.
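A minimal sketch of the independent X/Y addressing, assuming that {Y[9:0] ; X[11:2]} means address bits 21:12 hold the row number and address bits 11:2 hold the pixel column (4 bytes per pixel, 4K bytes per line). The names FB_BASE and fb_pixel_addr are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FB_BASE 0x80000000u  /* framebuffer base address from the slides */

/* Byte address of pixel (x, y): no multiply is needed because each line
 * starts on a 4K boundary, so Y and X occupy separate bit fields. */
uint32_t fb_pixel_addr(uint32_t x, uint32_t y)
{
    assert(x < 1024 && y < 768);
    return FB_BASE | (y << 12) | (x << 2);
}
```

Under this reading, moving one pixel right adds 4 to the address and moving one line down adds 0x1000.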

Frame Buffer Physical Interface
[Block diagram labels: CPU, DRAM Controller / Hub, Video Interface, FPGA]
Processor side: provides a memory mapped programming interface to the video display. You do!
DRAM Hub: arbitrates among multiple DRAM users. You do!
Video Interface block: accepts pixel values from the FB, streams pixel values and control signals to the physical device. We do!
More generally, how does software interface to I/O devices?

Physical Video Interface
DVI connector: accommodates analog and digital formats.
DVI transmitter chip, Chrontel 7301C: implements standard signaling voltage levels for video monitors, and digital to analog conversion for analog display formats.

Framebuffer Details 2009
One pixel value per memory location.
MIPS address map: frame buffer from 0x80000000 up to 0x803FFFFC, within the 0 to 0xFFFFFFFF address space.
768 lines, 1K pixels/line = 786,432 memory locations.
Virtex-5 LX110T memory capacity: 5,328 Kbits (in block RAMs).
(5,328 x 1024 bits) / 786,432 = 6.9 bits/pixel max! We choose 4 bits/pixel.
Note that with only 4 bits/pixel, we could assign more than one pixel per memory location. We ruled that out, as it complicates software.

Color Map
4 bits per pixel allows software to assign each screen location one of 16 different colors. However, the physical display interface uses 8 bits per color component of each pixel, so the entire palette is 2^24 colors. The color map converts 4-bit pixel values to 24-bit colors.
[Diagram: the pixel value from the framebuffer indexes a table of 16 entries, each 24 bits (R, G, B); the selected entry is the pixel color sent to the video interface]
The color map is memory mapped into the CPU address space, so software can set the color table. Addresses: 0x8040_0000 to 0x8040_003C, one 24-bit entry per memory address.
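The lookup described above can be modeled in a few lines of C. This is a software sketch, not the hardware design: the type and function names are hypothetical, and placing red in the top byte of each 24-bit entry is an assumption borrowed from the { Red ; Green ; Blue } ordering used elsewhere in the deck.

```c
#include <assert.h>
#include <stdint.h>

#define CMAP_ENTRIES 16           /* 4-bit pixel value -> 16 table entries */
#define CMAP_BASE    0x80400000u  /* first color map address from the slide */

typedef struct {
    uint32_t entry[CMAP_ENTRIES]; /* low 24 bits used: assumed R:G:B order */
} color_map_t;

/* Software side: set one palette entry. */
void cmap_set(color_map_t *m, unsigned index, uint8_t r, uint8_t g, uint8_t b)
{
    assert(index < CMAP_ENTRIES);
    m->entry[index] = ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Video side: convert a 4-bit framebuffer pixel to a 24-bit color. */
uint32_t cmap_lookup(const color_map_t *m, uint8_t pixel)
{
    return m->entry[pixel & 0xF];
}

/* CPU address of a palette entry, one entry per 32-bit word. */
uint32_t cmap_entry_addr(unsigned index)
{
    assert(index < CMAP_ENTRIES);
    return CMAP_BASE + 4u * index;
}
```

The last entry lands at 0x8040_003C, which matches the address range given on the slide.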

Memory Mapped Framebuffer 2010
A range of memory addresses corresponds to the display. The CPU writes pixel values (using the sw instruction) to change the display. No handshaking is required: an independent process reads pixels from memory and sends them to the display interface at the required rate.
MIPS address map: frame buffer from 0x80000000 up to 0x801D4BFC, within the 0 to 0xFFFFFFFF address space.
800 pixels/line x 600 lines, from (0,0) to (799, 599).
Display origin: increasing X values to the right, increasing Y values down.
8 Mbits / 480,000 = 17.5 bits/pixel max! We choose 16 bits/pixel: { Red[4:0] ; Green[5:0] ; Blue[4:0] }
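As an illustration of the 16 bits/pixel format, the sketch below packs the three fields in the order { Red[4:0] ; Green[5:0] ; Blue[4:0] }, with red in the most significant bits. The function names are hypothetical, and the 8-bit to 5/6-bit reduction by dropping low-order bits is one common choice rather than something the slides specify.

```c
#include <assert.h>
#include <stdint.h>

/* Pack already-reduced fields: 5-bit red, 6-bit green, 5-bit blue. */
uint16_t pack_rgb565(uint8_t r5, uint8_t g6, uint8_t b5)
{
    assert(r5 < 32 && g6 < 64 && b5 < 32);
    return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}

/* Reduce 8-bit components by truncation (an assumed policy), then pack. */
uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return pack_rgb565((uint8_t)(r >> 3), (uint8_t)(g >> 2), (uint8_t)(b >> 3));
}
```

Green gets the extra bit in this format, so pure green packs to 0x07E0 while pure red and pure blue pack to 0xF800 and 0x001F.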

Framebuffer Details 2010
MIPS address map: frame buffer region (labeled 1024 x 768 locations) from 0x80000000 up to 0x803FFFFC, within the 0 to 0xFFFFFFFF address space.
600 lines, 800 pixels/line, each line starting on a 1K boundary = 480,000 memory locations.
Note that we assign only one 16-bit pixel per memory location.
XUP SRAM memory capacity: ~8 Mbits (in external SRAMs).
Starting each line on a multiple of 1K leads to independent X and Y address fields: {Y[9:0] ; X[9:0]}, where Y == row number, X == pixel in row.
Two pixel addresses map to one address in the SRAM (it is 32 bits wide).
Only part of the mapped memory range is occupied by physical memory.

XUP Board External SRAM
ZBT* synchronous SRAM, 9 Mb on a 32-bit data bus, with four parity bits: 256K x 36 bits (located under the removable LCD).
*ZBT stands for zero bus turnaround. The turnaround is the number of clock cycles it takes to change access to the SRAM from write to read and vice versa. The turnaround for ZBT SRAMs, or the latency between read and write cycles, is zero.
More generally, how does software interface to I/O devices?

Graphics Software
Clearing the screen: fill the entire screen with the same color.
Remember the framebuffer base address: 0x8000_0000. Size: 1024 x 768.

clear:
        # a0 holds 4-bit pixel color
        # t0 holds the pixel pointer
        ori   $t0, $0, 0x8000     # top half of frame address
        sll   $t0, $t0, 16        # form framebuffer beginning address
        # t2 holds the framebuffer max address
        ori   $t2, $0, 768        # 768 rows
        sll   $t2, $t2, 12        # * 1K pixels/row * 4 Bytes/address
        addu  $t2, $t2, $t0       # form ending address
        addiu $t2, $t2, -4        # minus one word address!
        # the write loop
L0:     sw    $a0, 0($t0)         # write the pixel
        bne   $t0, $t2, L0        # loop until done
        addiu $t0, $t0, 4         # bump pointer! (executes in the branch delay slot)
        jr    $ra

How long does this take? What do we need to know to answer? How does this compare to the frame rate?
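For comparison, the same clear loop can be written in C. This is a sketch with hypothetical names: the buffer pointer and dimensions are passed in so the routine can be exercised on an ordinary array, and one 32-bit word per pixel is assumed, matching the 4-byte stride of the assembly version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill rows * words_per_row consecutive 32-bit locations with one color.
 * volatile keeps the compiler from dropping or merging the stores when
 * the pointer refers to memory-mapped hardware. */
void fb_clear(volatile uint32_t *fb, size_t rows, size_t words_per_row,
              uint32_t color)
{
    assert(fb != NULL);
    volatile uint32_t *p = fb;
    volatile uint32_t *end = fb + rows * words_per_row;
    while (p != end) {
        *p++ = color;   /* one store per pixel, like the sw in the loop */
    }
}
```

On the target this might be called as fb_clear((volatile uint32_t *)0x80000000u, 768, 1024, color), assuming the framebuffer is mapped there as the slide states.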

Optimized Clear Routine
Amortizing the loop overhead.

clear:
        ...
        # note: the branch tests $t0 before the bump, so $t2 must hold
        # the address of the last 64-byte block
        # the write loop
L0:     sw    $a0, 0($t0)         # write some pixels
        sw    $a0, 4($t0)
        sw    $a0, 8($t0)
        sw    $a0, 12($t0)
        sw    $a0, 16($t0)
        sw    $a0, 20($t0)
        sw    $a0, 24($t0)
        sw    $a0, 28($t0)
        sw    $a0, 32($t0)
        sw    $a0, 36($t0)
        sw    $a0, 40($t0)
        sw    $a0, 44($t0)
        sw    $a0, 48($t0)
        sw    $a0, 52($t0)
        sw    $a0, 56($t0)
        sw    $a0, 60($t0)
        bne   $t0, $t2, L0        # loop until done
        addiu $t0, $t0, 64        # bump pointer (branch delay slot)
        jr    $ra

What's the performance of this one?
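A rough way to approach the performance questions is to count instructions in the write loop: each iteration executes its stores plus one branch and one pointer bump. The sketch below does that arithmetic. The function names are hypothetical, and the CPI and clock frequency are caller-supplied assumptions, since the slides give neither. Setup instructions, the final jr, and any memory stalls are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Instructions executed by the write loop for a given unroll factor. */
uint64_t clear_instr_count(uint64_t pixels, unsigned stores_per_iter)
{
    assert(stores_per_iter > 0 && pixels % stores_per_iter == 0);
    uint64_t iterations = pixels / stores_per_iter;
    return iterations * (stores_per_iter + 2u);  /* stores + bne + addiu */
}

/* Estimated time in seconds; cpi and clock_hz are assumptions. */
double clear_seconds(uint64_t pixels, unsigned stores_per_iter,
                     double cpi, double clock_hz)
{
    return (double)clear_instr_count(pixels, stores_per_iter) * cpi / clock_hz;
}
```

For 786,432 pixels the simple loop executes 2,359,296 instructions and the 16-way unrolled loop 884,736. With a purely hypothetical 50 MHz clock and a CPI of 1, that is about 47 ms versus about 18 ms.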

Line Drawing
[Figure: pixel grid, x from 0 to 12 and y from 0 to 7, with a line from (x0,y0) to (x1,y1)]
From (x0,y0) to (x1,y1), the line equation defines all the points:
    (y - y0) / (y1 - y0) = (x - x0) / (x1 - x0)
For each x value, we could compute y with:
    y = y0 + (x - x0) * (y1 - y0) / (x1 - x0)
then round to the nearest integer y value. The slope can be precomputed, but this still requires floating point * and + in the loop: slow or expensive!
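The floating-point approach described above might look like the following in C. This is a sketch with a hypothetical name and interface: instead of calling a plot routine it stores the rounded y value for each x into an array, and it assumes x0 < x1 with non-negative coordinates.

```c
#include <assert.h>

/* For each x in [x0, x1], compute y from the line equation and round it.
 * ys[i] receives the y value for x = x0 + i. Returns the point count. */
int line_naive(int x0, int y0, int x1, int y1, int ys[], int max_points)
{
    assert(x0 < x1 && y0 >= 0 && y1 >= 0);
    double slope = (double)(y1 - y0) / (double)(x1 - x0);  /* precomputed */
    int n = 0;
    for (int x = x0; x <= x1 && n < max_points; x++) {
        double y = y0 + slope * (x - x0);  /* floating point * and + per pixel */
        ys[n++] = (int)(y + 0.5);          /* round to nearest for y >= 0 */
    }
    return n;
}
```

The per-pixel multiply and add in the loop body are exactly the cost the slide calls slow or expensive.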

Bresenham Line Drawing Algorithm
Developed by Jack E. Bresenham in 1962 at IBM. "I was working in the computation lab at IBM's San Jose development lab. A Calcomp plotter had been attached to an IBM 1401 via the 1407 typewriter console. ..."
Computers of the day were slow at complex arithmetic operations, such as multiply, especially on floating point numbers.
Bresenham's algorithm works with integers and without multiply or divide.
Its simplicity makes it appropriate for inexpensive hardware implementation.
With extension, it can be used for drawing circles.

Line Drawing Algorithm
This version assumes: x0 < x1, y0 < y1, slope <= 45 degrees.

function line(x0, x1, y0, y1)
    int deltax := x1 - x0
    int deltay := y1 - y0
    int error := deltax / 2
    int y := y0
    for x from x0 to x1
        plot(x,y)
        error := error - deltay
        if error < 0 then
            y := y + 1
            error := error + deltax

Note: error starts at deltax/2 and gets decremented by deltay for each x. y gets incremented when error goes negative, therefore y gets incremented at a rate proportional to deltay/deltax.

Line Drawing, Examples
[Figure: two pixel grids, x from 0 to 12 and y from 0 to 7]
deltay = 1 (very low slope): y only gets incremented once (halfway between x0 and x1).
deltay = deltax (45 degrees, max slope): y gets incremented for every x.

Line Drawing Example
[Figure: pixel grid, x from 0 to 12 and y from 0 to 7]

function line(x0, x1, y0, y1)
    int deltax := x1 - x0
    int deltay := y1 - y0
    int error := deltax / 2
    int y := y0
    for x from x0 to x1
        plot(x,y)
        error := error - deltay
        if error < 0 then
            y := y + 1
            error := error + deltax

(1,1) -> (11,5): deltax = 10, deltay = 4, error = 10/2 = 5, y = 1
x = 1: plot(1,1), error = 5 - 4 = 1
x = 2: plot(2,1), error = 1 - 4 = -3, so y = 1 + 1 = 2 and error = -3 + 10 = 7
x = 3: plot(3,2), error = 7 - 4 = 3
x = 4: plot(4,2), error = 3 - 4 = -1, so y = 2 + 1 = 3 and error = -1 + 10 = 9
x = 5: plot(5,3), error = 9 - 4 = 5
x = 6: plot(6,3), error = 5 - 4 = 1
x = 7: plot(7,3), error = 1 - 4 = -3, so y = 3 + 1 = 4 and error = -3 + 10 = 7
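The trace can be checked mechanically with a small C version of the restricted algorithm. This is a sketch: the point_t type and line_points name are hypothetical, and the points are collected into an array in place of calling plot. It keeps the slide's restrictions on the endpoints and slope, except that a horizontal line (y0 == y1) is also accepted.

```c
#include <assert.h>

typedef struct { int x, y; } point_t;

/* Integer-only line drawing for x0 < x1, y0 <= y1, slope <= 45 degrees.
 * Writes the plotted points to out[] and returns how many were written. */
int line_points(int x0, int y0, int x1, int y1, point_t out[], int max_points)
{
    assert(x0 < x1 && y0 <= y1 && (y1 - y0) <= (x1 - x0));
    int deltax = x1 - x0;
    int deltay = y1 - y0;
    int error = deltax / 2;
    int y = y0;
    int n = 0;
    for (int x = x0; x <= x1 && n < max_points; x++) {
        out[n].x = x;          /* stands in for plot(x, y) */
        out[n].y = y;
        n++;
        error -= deltay;
        if (error < 0) {       /* accumulated error crossed a pixel boundary */
            y += 1;
            error += deltax;
        }
    }
    return n;
}
```

Running it for (1,1) -> (11,5) yields 11 points. After the steps listed above, the remaining ones are (8,4), (9,4), (10,5), and (11,5), so the line ends exactly on the requested endpoint.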

C Version
Modified to work in any quadrant and for any slope.

/* Swap through a temporary: the chained XOR form modifies x more than
   once in a single expression, which is undefined behavior in C. */
#define SWAP(x, y) do { int tmp_ = (x); (x) = (y); (y) = tmp_; } while (0)
#define ABS(x) (((x) < 0) ? -(x) : (x))

/* plot(x, y) is assumed to be provided elsewhere. */
void line(int x0, int y0, int x1, int y1) {
    char steep = (ABS(y1 - y0) > ABS(x1 - x0)) ? 1 : 0;
    if (steep) {               /* iterate along y for steep lines */
        SWAP(x0, y0);
        SWAP(x1, y1);
    }
    if (x0 > x1) {             /* always draw left to right */
        SWAP(x0, x1);
        SWAP(y0, y1);
    }
    int deltax = x1 - x0;
    int deltay = ABS(y1 - y0);
    int error = deltax / 2;
    int ystep;
    int y = y0;
    int x;
    ystep = (y0 < y1) ? 1 : -1;
    for (x = x0; x <= x1; x++) {
        if (steep)
            plot(y, x);
        else
            plot(x, y);
        error = error - deltay;
        if (error < 0) {
            y += ystep;
            error += deltax;
        }
    }
}

Estimate software performance (MIPS version).
What's needed to do it in hardware? Goal is one pixel per cycle. Pipelining might be necessary.

Hardware Implementation Notes
Line engine registers (32 bits each; the slide marks bit positions 32, 10, and 0):
  0x8040_0064: ready   (read-only control register)
  0x8040_0060: color
  0x8040_005c: y1      (write-only trigger register)
  0x8040_0058: x1      (write-only trigger register)
  0x8040_0054: y0      (write-only trigger register)
  0x8040_0050: x0      (write-only trigger register)
  0x8040_004c: y1      (write-only non-trigger register)
  0x8040_0048: x1      (write-only non-trigger register)
  0x8040_0044: y0      (write-only non-trigger register)
  0x8040_0040: x0      (write-only non-trigger register)
The CPU initializes the line engine by sending a pair of points and the color value to use. Writes to 0x8040_005* trigger the engine.
The framebuffer has one write port, shared by the CPU and the line engine. Priority goes to the CPU: the line engine stalls when the CPU writes.
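From the software side, driving the line engine might look like the sketch below. Several things here are assumptions not confirmed by the slide: that a nonzero ready value means the engine is idle, that non-trigger writes only latch a value, and that a single write to a trigger register starts drawing with the values latched so far. The enum names and line_engine_draw are hypothetical. The register block is passed as a pointer so the code can be exercised against an ordinary array.

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets from 0x8040_0040, following the register map above. */
enum {
    LE_X0 = 0, LE_Y0 = 1, LE_X1 = 2, LE_Y1 = 3,  /* non-trigger registers */
    LE_X0_TRIG = 4, LE_Y0_TRIG = 5,
    LE_X1_TRIG = 6, LE_Y1_TRIG = 7,              /* 0x8040_005* trigger   */
    LE_COLOR = 8,                                /* 0x8040_0060           */
    LE_READY = 9                                 /* 0x8040_0064           */
};

void line_engine_draw(volatile uint32_t *regs, uint32_t x0, uint32_t y0,
                      uint32_t x1, uint32_t y1, uint32_t color)
{
    assert(regs != 0);
    while (regs[LE_READY] == 0) {
        /* assumed: zero means the engine is still busy */
    }
    regs[LE_COLOR] = color;
    regs[LE_X0] = x0;          /* latch without starting the engine */
    regs[LE_Y0] = y0;
    regs[LE_X1] = x1;
    regs[LE_Y1_TRIG] = y1;     /* assumed: this write starts the line */
}
```

On the target, regs would point at 0x80400040. Having both trigger and non-trigger copies of each coordinate presumably lets software choose which write launches the engine, for example updating only one endpoint when drawing connected segments.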