TKT-2431 SoC Design
Introduction to exercises
Exercises and the project work
- Assistants:
  - Juha Arvio, juha.arvio@tut.fi
  - Otto Esko, otto.esko@tut.fi
- In the project work, a simplified H.263 video encoder is implemented on the Altera DE2 FPGA Development and Education board
- The project work consists of a set of exercises
  - After successfully finishing all the exercises, you should have a working H.263 video encoder
- Exercises: Mon 14-16, Tue 14-16, Thu 8-10 (TC417)
  - Assistance is not available at any other time
- All needed software is installed on the classroom PCs, which can be used whenever the classroom is not reserved for other courses
Exercises and the project work
- Attending the exercise sessions is voluntary. In the sessions:
  - The next assignment is introduced
  - Tools and algorithms are introduced
  - Hints are given
  - Questions are answered
- Completing each of the exercises is mandatory
  - Returns have to be made on time
  - Returns have to be accepted
- The project work is carried out in groups of 1-2 students
  - Groups of 2 are preferred
Exercises and the project work
- The project work consists of several phases and sub-tasks:
  - Receiving and understanding the system requirements
  - Writing a system specification
  - Software implementation of the encoder
  - Functional verification on a PC workstation
  - Migrating the SW implementation onto the FPGA
  - Verification and performance profiling of the pure SW implementation
  - HW/SW partitioning and hardware acceleration
  - Verification and performance profiling of the accelerated implementation
  - Documentation
Exercises and the project work
- A completed project work is valid for three successive exams
- Bonus points
  - The maximum number of bonus points is 6
  - Given according to the quality of the returned exercises
  - The bonus point criteria will be explained during the first exercises
- A more detailed description of the project work will be given during the first exercises
- http://www.tkt.cs.tut.fi/kurssit/2431
Exercise 1 / Part 1
Introduction to the topic
Topic of the work
- A simplified H.263 video encoder on the DE2 FPGA Development and Education board
- The system design flow:
  - The requirements for the video encoder are introduced
  - A functional specification is written
  - A software implementation of the video encoder algorithm is written in ANSI C and verified on a PC workstation
  - An initial hardware architecture containing a single Nios II softcore CPU and the necessary peripherals is synthesized for the FPGA
  - The software version is migrated to the Nios II processor on the FPGA
  - The design is partitioned into software and hardware according to the profiling results of the software implementation
    - The DCT algorithm is accelerated with dedicated logic
  - The accelerated system is implemented and verified on the FPGA
  - Performance analysis is carried out for the accelerated system as well, and the results are compared with the pure software implementation
H.263
- The basics of H.263 video encoding are explained during the following exercises
- Students are encouraged to get familiar with video encoding algorithms in general before starting the project
  - H.263 has a lot in common with algorithms such as JPEG and MPEG-2
- A very simplified version of the H.263 video encoder (resembling Motion JPEG) is used
  - Only INTRA coding (i.e. subsequent frames are not predicted from previous ones)
  - The algorithms used are DCT (Discrete Cosine Transform), quantization, RLE (Run-Length Encoding), and VLC (Variable-Length Coding)
Software
- Altera Quartus II v7.2
  - System development front-end
  - Schematic editing
  - FPGA synthesis
  - SOPC Builder for building Avalon/Nios based systems
  - Integrated logic analyzer
- Nios II IDE
  - Software development environment for the Nios II processor
  - Part of the Nios II development kit
- Mentor Graphics ModelSim
  - Simulating your own VHDL blocks/designs
- ffplay video player
- tmndec H.263 decoder
- nios2-terminal
  - Terminal software for reading from the JTAG UART
Hardware
- Altera DE2 Development and Education Board
- Cyclone II 2C35 FPGA
  - 33,216 logic elements
  - 483,840 bits of embedded RAM
  - 35 embedded multipliers
  - 4 PLLs
  - 475 user I/O pins (at maximum)
- External memory devices
  - 4 MB Flash
  - 512 KB SRAM
  - 8 MB SDRAM
- RS-232 serial port
  - Used for communication between the PC and the Nios II processor
- USB Blaster port
  - Used for programming the FPGA (memory contents and HW configuration)
- In addition, the board contains the following peripherals (less relevant for the project):
  - Ethernet MAC/PHY device
  - 4x user push-buttons, 18x toggle switches
  - 18x red user LEDs, 9x green user LEDs
  - 8x 7-segment displays
  - 2x expansion headers (40 user I/O pins per header)
  - SD card connector
  - 50 MHz and 27 MHz oscillators
Exercise returns
- Exercises are returned as follows:
  - The return for an exercise has to be sent by e-mail before Sunday 23:59 of the following week
  - The return has to be sent to the corresponding assistant (Juha Arvio or Otto Esko)
  - All required documents have to be in either PDF or plain-text format
  - The subject of the e-mail has the following form: SOCD_Ex<exercise_number>_G<group_number>
    - where <exercise_number> is the number of the exercise in question and <group_number> is the number of your group
Bonus points
- Three main exercise returns are rated:
  - Excellent: 1 bonus point for the exam
    - The returned document is very good and/or the returned source code works correctly and is well done
  - Accepted: no bonus
    - The returned document or code is acceptable
  - Rejected: no bonus, the return has to be corrected
    - Use common sense: do not return rubbish!
- All the exercises have to be accepted
- At most six bonus points for the exam can be obtained:
  - 1 point from each of exercises 2, 5, and 12
  - 1 point for the first functional (HW-accelerated) encoder implementation
  - 2 points for the fastest encoder implementation
  - 1 point for the second fastest encoder implementation
Exercise 1 / Part 2
Introduction to algorithms
Requirements for Video Transmission
- Communication delay
  - More important in video conferencing applications than in file-based streaming applications
  - Should be as low as possible (< 250 ms, preferably even < 150 ms)
  - Should be kept as constant as possible
    - Avoiding bursts of frames followed by a still image
    - Buffering
- Frame rate
  - Affects the perceived smoothness of motion
  - A video stream below 10 fps is perceived as a fast slide show
- Image resolution
  - Directly proportional to the data size of a raw image
  - Depends on the application
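To make "directly proportional to the data size of a raw image" concrete, the C sketch below computes the size of one raw frame and the resulting uncompressed bit rate. It assumes 8-bit samples and 4:2:0 chroma subsampling (the picture format H.263 uses); the function names are hypothetical and not part of the pre-given sources.

```c
#include <assert.h>

/* Bytes in one raw 8-bit YUV 4:2:0 frame: a full-resolution luma (Y)
 * plane plus two chroma planes (Cb, Cr) subsampled by 2 in both
 * directions. Assumes even width and height. */
static unsigned long raw_frame_bytes(unsigned width, unsigned height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    unsigned long luma = (unsigned long)width * height;
    unsigned long chroma = 2UL * (width / 2) * (height / 2);
    return luma + chroma;
}

/* Uncompressed data rate in bits per second at the given frame rate. */
static unsigned long raw_bitrate(unsigned width, unsigned height, unsigned fps)
{
    return raw_frame_bytes(width, height) * 8UL * fps;
}
```

For QCIF (176 x 144), raw_frame_bytes returns 38016 bytes, so 15 fps of raw video is about 4.56 Mbit/s; doubling both width and height (CIF, 352 x 288) quadruples the size.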
Introduction to H.263 Standard
- ITU-T recommendation, version 1 in May 1996
- Block-based (macroblock size is 16 pixels by 16 lines)
- Motion estimation for temporal redundancy reduction
  - The same objects are likely to be present in adjacent frames
  - Half-pixel accurate motion vectors
- DCT for spatial redundancy reduction
  - 8 x 8 blocks
  - Adjacent pixel values differ only a little
- Quantization (lossy)
  - Controls the compression ratio
- RLE and Huffman coding as entropy coding algorithms
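Quantization is where the compression ratio is controlled. The sketch below is a plain uniform quantizer whose step size is 2*QP, with QP limited to 1..31 as in H.263. Reconstructing at the middle of the interval is a simplification; the pre-given encoder's exact rounding and its DC coefficient handling may differ. The names quantize and dequantize are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Quantize one DCT coefficient with step 2*qp, truncating toward zero.
 * Every |coef| < 2*qp maps to level 0 (a "dead zone" around zero). */
static int quantize(int coef, int qp)
{
    assert(qp >= 1 && qp <= 31);
    int level = abs(coef) / (2 * qp);
    return coef < 0 ? -level : level;
}

/* Reconstruct a coefficient at the middle of its quantization interval.
 * The difference from the original coefficient is the (lossy) error. */
static int dequantize(int level, int qp)
{
    assert(qp >= 1 && qp <= 31);
    if (level == 0)
        return 0;
    int rec = qp * (2 * abs(level) + 1);
    return level < 0 ? -rec : rec;
}
```

With QP = 10, a coefficient of 100 becomes level 5 and is reconstructed as 110; with QP = 4 the same coefficient becomes level 12 and is reconstructed as 100. A larger QP maps more coefficients to zero and gives a higher compression ratio at the cost of quality.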
Block Diagram of H.263 Encoder
[Figure: H.263 encoder block diagram]
- Forward path: input -> pre-processing -> prediction error computation (+/-) -> DCT -> Q -> entropy coding (Huffman, VLC) -> bits out
- In INTRA mode, macroblocks (MBs) are coded directly, without prediction
- Reconstruction loop: Q^-1 -> IDCT -> (+ prediction) -> previous reconstructed pictures (the same images the decoder observes)
- Motion estimation (1/2-pixel accurate, using interpolation) produces the motion vector v(u,v), which drives motion compensation and is also passed to entropy coding
- Inset: an example quantized 8x8 block in which most coefficients are zero; there is no need to send the zeros in the 8x8 block to the decoder
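Motion Estimation
[Figure: claire.qcif original (current frame), claire.qcif previous reconstructed frame, and the detected motion]
- For a 16x16 macroblock of the current frame, a search window of +/-p pixels around the same position in the previous reconstructed frame is searched for the best match
- The displacement of the best match is the motion vector (u,v)
- The macroblock's prediction error (current macroblock minus its prediction) is what gets encoded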
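Although the project itself uses only INTRA coding, the matching step in the figure can be sketched as a full search that minimizes the sum of absolute differences (SAD) over all displacements -p..p. This is an integer-pixel sketch only: the half-pixel refinement by interpolation that H.263 uses is omitted, and SAD as the matching criterion, skipping candidates that leave the frame, and all names here are assumptions. Frames are assumed to be 8-bit luma planes stored row by row.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MB 16 /* macroblock size in pixels */

typedef struct { int u, v; unsigned sad; } MotionVector;

/* SAD between the 16x16 macroblock at (x, y) in cur and the candidate
 * block displaced by (u, v) in the previous reconstructed frame ref. */
static unsigned sad16(const unsigned char *cur, const unsigned char *ref,
                      int width, int x, int y, int u, int v)
{
    unsigned sad = 0;
    for (int j = 0; j < MB; j++)
        for (int i = 0; i < MB; i++)
            sad += (unsigned)abs(cur[(y + j) * width + x + i] -
                                 ref[(y + v + j) * width + x + u + i]);
    return sad;
}

/* Exhaustive search over displacements -p..p in both directions.
 * Candidates falling outside the reference frame are skipped; on ties
 * the first candidate in scan order wins. */
static MotionVector full_search(const unsigned char *cur, const unsigned char *ref,
                                int width, int height, int x, int y, int p)
{
    assert(x >= 0 && y >= 0 && x + MB <= width && y + MB <= height && p >= 0);
    MotionVector best = { 0, 0, UINT_MAX };
    for (int v = -p; v <= p; v++) {
        for (int u = -p; u <= p; u++) {
            if (x + u < 0 || y + v < 0 ||
                x + u + MB > width || y + v + MB > height)
                continue;
            unsigned sad = sad16(cur, ref, width, x, y, u, v);
            if (sad < best.sad) {
                best.u = u;
                best.v = v;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

The cost grows with the (2p+1)^2 candidate positions, each requiring 256 absolute differences, which is why exhaustive search is expensive.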
Discrete Cosine Transform (DCT)
- Assumption: adjacent pixels differ only a little from each other
  - Thus, the data is easier to compress in the frequency domain
- Spatial-domain compression
  - Pixels are grouped into blocks, and the blocks are then transformed into the frequency domain
  - The essential information is then in a more compact form
    - The important DCT coefficients are in the upper-left corner, that is, at low frequencies
- Compression is achieved by discarding the less important information of the transformed block
  - Quantization of the coefficients
- The DCT itself is a lossless (invertible) transform
  - The limited accuracy of the coefficients, however, leads to some loss of information
Entropy Encoding
- After quantization, the quantized coefficients are compressed losslessly using entropy encoding
- Run-length coding
  - Many coefficients, especially those with low amplitude, are likely to be zero after quantization
  - Each non-zero quantized coefficient, together with the zeros preceding it, is represented as a (LAST, RUN, LEVEL) combination
    - LAST = whether this is the final non-zero coefficient in the block
    - RUN = number of preceding zeros
    - LEVEL = sign and magnitude of the non-zero coefficient
  - Coefficients are processed in zig-zag order
    - Because runs of zeros are most likely located at high frequencies
- Huffman coding (variable-length coding)
  - After RLE, the coefficients are encoded based on their statistical characteristics
  - Shorter codewords for symbols that occur with high probability
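The run-length step can be sketched as follows: the quantized block is read in zig-zag order, and each non-zero coefficient produces one (LAST, RUN, LEVEL) triple, while trailing zeros produce nothing. The table is the standard 8x8 zig-zag scan. The first parameter reflects an assumption that in INTRA blocks the DC coefficient is coded separately (as INTRADC in H.263), so first = 1 there; check this against the pre-given code. The Huffman/VLC table that maps triples to codewords is not reproduced, and the names are hypothetical.

```c
#include <assert.h>

typedef struct { int last, run, level; } RunLevel;

/* Standard zig-zag scan: zigzag[k] is the row-major index of the k-th
 * coefficient visited. */
static const int zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
};

/* Scan a quantized 8x8 block (row-major) in zig-zag order from position
 * first and emit one (LAST, RUN, LEVEL) triple per non-zero coefficient.
 * Returns the number of triples written to out. */
static int rle_block(const int block[64], int first, RunLevel out[64])
{
    assert(first >= 0 && first < 64);
    int count = 0, run = 0;
    for (int k = first; k < 64; k++) {
        int coef = block[zigzag[k]];
        if (coef == 0) {
            run++; /* zeros are only counted, never sent */
            continue;
        }
        out[count].last = 0;
        out[count].run = run;
        out[count].level = coef;
        count++;
        run = 0;
    }
    if (count > 0)
        out[count - 1].last = 1; /* mark the final non-zero coefficient */
    return count;
}
```

For a block whose only non-zero values are 7, 4, -2 and 1 at row-major indices 0, 1, 16 and 63, rle_block(block, 0, out) produces (0,0,7), (0,0,4), (0,1,-2) and (1,59,1).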
H.263 Project work
- A simplified version of the H.263 video encoder (resembling Motion JPEG) is used
  - Only INTRA coding (i.e. subsequent frames are not predicted from previous ones)
  - The algorithms used are DCT (Discrete Cosine Transform), quantization, RLE (Run-Length Encoding), and VLC coding
  - The image resolution used is QCIF (176 x 144)
- Encoder: pre-processing -> DCT -> Q -> entropy coding -> bitstream (011001011...)
- Decoder: bitstream (011001011...) -> entropy decoding -> Q^-1 -> IDCT -> reconstructed pictures
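With QCIF input, each frame is processed as 11 x 9 = 99 macroblocks, each consisting of four 8x8 luma blocks plus one 8x8 Cb block and one 8x8 Cr block (594 blocks per frame). The sketch below gathers the six blocks of one macroblock from planar 4:2:0 data; the planar frame layout and the function names are assumptions, so check how the pre-given code actually stores frames.

```c
#include <assert.h>
#include <string.h>

#define QCIF_W 176
#define QCIF_H 144

/* Copy the 8x8 block whose top-left corner is (x, y) from a plane that
 * is width samples wide into blk. */
static void get_block(const unsigned char *plane, int width, int x, int y,
                      unsigned char blk[8][8])
{
    for (int j = 0; j < 8; j++)
        memcpy(blk[j], plane + (y + j) * width + x, 8);
}

/* Gather the six 8x8 blocks of macroblock (mbx, mby) from planar 4:2:0
 * data: luma blocks top-left, top-right, bottom-left, bottom-right, then
 * the co-located Cb and Cr blocks (chroma planes are width/2 wide). */
static void get_macroblock(const unsigned char *y_plane, const unsigned char *cb,
                           const unsigned char *cr, int width, int height,
                           int mbx, int mby, unsigned char blocks[6][8][8])
{
    assert(width % 16 == 0 && height % 16 == 0);
    assert(mbx >= 0 && mbx < width / 16 && mby >= 0 && mby < height / 16);
    int x = mbx * 16, y = mby * 16;
    get_block(y_plane, width, x,     y,     blocks[0]);
    get_block(y_plane, width, x + 8, y,     blocks[1]);
    get_block(y_plane, width, x,     y + 8, blocks[2]);
    get_block(y_plane, width, x + 8, y + 8, blocks[3]);
    get_block(cb, width / 2, x / 2, y / 2, blocks[4]);
    get_block(cr, width / 2, x / 2, y / 2, blocks[5]);
}
```

A frame loop would call get_macroblock for mby = 0..8 and mbx = 0..10 and pass each of the six blocks through DCT, quantization, and RLE/VLC.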
Design flow
- Requirements -> Specification -> SW implementation -> Performance analysis -> HW/SW partitioning -> Performance analysis -> Final implementation -> Performance analysis
- Verification and documentation accompany every phase
Specification
- This week, the specification of the encoder is started
- The required C source code for the encoder is provided in advance
  - It can be downloaded from the course web pages
- You have to write a simple specification for the video encoder system you are going to implement
  - The specification does not have to be long; the quality of the contents is what matters
  - 4-7 pages in total (including the chapters introduced next week)
  - The specification should be written before the implementation
    - An implementation document will be written later
- A diagram of the video encoding flow is required
  - A control and data flow diagram describing how the pre-given H.263 functions are used
Specification (2)
1. Introduction
   - What is being specified
2. Flow of encoding
   - Present the different phases of the encoding
   - Explain the encoding flow briefly
   - A flow diagram of the encoding is required!
3. Encoder interface
   - Inputs and outputs of the encoder
   - What kind of data is read in?
   - What is the output data like?
4. Description of algorithms
   - Function prototypes
   - Description of function parameters and return values
   - Description of function behavior and purpose in this design
   - At least DCT, quantization, RLE, and VLC have to be covered here
- The subsequent sections will be written in exercise 2
Links on H.263 related material
- http://www.itu.int/rec/t-rec-h.263/
  - ITU-T specification of H.263
- http://www.jaxstream.com/products/jaxspeed/wp_m4venc.pdf
  - Basics of MPEG-4 video encoding
- http://www.ece.purdue.edu/~ace/jpeg-tut/jpegtut1.html
  - JPEG tutorial