IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 7, JULY 2004

A Low-Power 3-D Rendering Engine With Two Texture Units and 29-Mb Embedded DRAM for 3G Multimedia Terminals

Ramchan Woo, Student Member, IEEE, Sungdae Choi, Student Member, IEEE, Ju-Ho Sohn, Student Member, IEEE, Seong-Jun Song, Student Member, IEEE, Young-Don Bae, Student Member, IEEE, and Hoi-Jun Yoo, Member, IEEE

Abstract: A low-power three-dimensional (3-D) rendering engine with two texture units and 29-Mb embedded DRAM is designed and integrated into an LSI for mobile third-generation (3G) multimedia terminals. Bilinear MIPMAP texture-mapped 3-D graphics are realized with the help of a low-power pipeline structure, datapath optimization, extensive clock gating, texture address alignment, and distributed activation of the embedded DRAM. The scalable performance reaches up to 100 Mpixels/s and 400 Mtexels/s at 50 MHz. The chip is implemented in a [...]-µm pure DRAM process to reduce the fabrication cost of the embedded-DRAM chip. The logic with DRAM occupies 46 mm² and consumes 140 mW at 33-MHz operation. 3-D graphics images are successfully demonstrated using the fabricated chip on a prototype PDA board.

Index Terms: Embedded DRAM, low power, mobile application, PDA, portable, texture mapping, 3-D graphics rendering.

TABLE I. PIPELINE DESCRIPTION

I. INTRODUCTION

AS THE MOBILE electronics market matures, third-generation (3G) multimedia terminals such as PDAs and smart cellphones are gaining popularity. Their applications are already migrating to real-time multimedia, and even to three-dimensional (3-D) gaming [1]. Therefore, much research on hardware accelerators [2]-[4] and software-only solutions [1], [5] has tried to bring 3-D graphics rendering to handheld devices. However, these solutions still fall short of market requirements: they offer only limited shading operations and lack texture mapping, which is mandatory for 3-D gaming applications.
To draw texture-mapped 3-D graphics on mobile terminals, huge memory bandwidth and capacity must be provided to store the frame, depth, and texture images. The embedded memory logic (EML) process is therefore one of the most promising solutions, since it integrates both DRAM and logic on a single die. However, EML technology costs too much because the logic must be designed with different transistors from the DRAM [11]. As a result, it has seldom been used on low-cost mobile platforms.

In this work, we designed and implemented a 3-D rendering engine using pure DRAM technology to reduce the fabrication cost while maintaining the huge memory bandwidth. Using the DRAM process also enables us to further reduce the power consumption because off-chip loading to the rendering memory is completely eliminated. We optimize the circuits and architecture so that a rendering engine with two texture units and 29-Mb embedded DRAM is realized while satisfying the requirements of long battery lifetime and the physical dimensions of mobile terminals. We also designed the rendering engine as a scalable IP core that satisfies the performance requirements of various mobile platforms within the allowed power budget, since the target applications range from simple avatars, user interfaces, and commercials on a QCIF (176 × 144) display to real-time 3-D games on a QVGA (320 × 240) display.

This paper is organized as follows. The system architecture is discussed in Section II, and the design of the low-power rendering pipeline is covered in Section III. The energy-efficient texture unit and the embedded DRAM architecture follow in Sections IV and V, respectively. After discussing the implementation results in Section VI, we summarize the work in Section VII.

Manuscript received October 28, 2003; revised January 15. The authors are with the Semiconductor System Laboratory, Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea (e-mail: ural@eeinfo.kaist.ac.kr; hjyoo@ee.kaist.ac.kr). Digital Object Identifier /JSSC

Fig. 1. Example of target 3G system.

Fig. 2. Rendering engine architecture.

II. SYSTEM ARCHITECTURE

Fig. 1 shows the target 3G system, which contains a baseband modem for communication, an application processor dedicated to multimedia processing, and memories. Once 3-D objects and texture data are downloaded over the air channel, they are stored in the system memory and the graphics DRAM, respectively. The rendering engine then starts drawing 3-D image pixels onto the LCD screen.

The system architecture of the proposed rendering engine [6] is shown in Fig. 2. It consists of a main pixel pipeline, a post-processing unit, and 12 rendering DRAMs. The main pixel pipeline performs shading and texturing with two pixel processors, each of which contains a high-performance texture unit. After a pixel is processed in the main pipeline, the post-processing unit recalculates the pixel data for real-time special rendering effects such as antialiasing, motion blur, and fog [7]. The 29-Mb rendering DRAMs contain frame buffers, depth buffers, and texture memories. Twelve independently controlled DRAMs reduce the power consumption, since only the necessary memories are activated.

III. LOW-POWER RENDERING PIPELINE

Fig. 3 shows the main rendering pipeline with its attached graphics memories, and Table I describes its operation. The pipeline is composed of 14 multipipelined stages so that power is saved by activating only the necessary stages. The graphics memories are accessed through distributed pipeline stages: the depth buffer at the PI stage, the texture memory at the TP2 stage, and the frame buffer at the PB stage.
Since each pipeline stage is designed as a module with its own controller, additional rendering features can easily be inserted in the next revision without modifying the entire pipeline. After fetching the instructions, the rendering engine shapes the triangle and varies the operation cycles in the following stages according to the size (HOLD#1) and the shape (HOLD#2) of the triangles by pausing the previous pipeline stages.

Fig. 3. Main rendering pipeline.

Fig. 4. Rasterization order and frame/depth buffer assignment.

Shaping the triangle is accelerated in the TS stage, which performs horizontal-order (scanline-based) rasterization as in Fig. 4. Although this rasterization simplifies memory addressing and pipeline control, the rendering performance can degrade when a triangle falls across DRAM pages in a conventional DRAM architecture [8], [13]. Therefore, instead of prefetching data from standard SDRAM, we redefined the timing of the graphics DRAM and assigned the frame and depth buffers in a vertical stripe pattern. Since the row of the proposed DRAM can be changed without any latency at a 50-MHz random row cycle (20 ns), and each memory (A or B in Fig. 4) has its own read/write ports, the graphics DRAM can continuously provide the bandwidth required to access two pixels together. This rasterization order also reduces the power consumption, since only the memories corresponding to the necessary pixels are activated.

To render triangles with a modified Bresenham's incremental line drawing algorithm [15], the position of the input vertices must be identified, and the increments of colors and coordinates must be calculated early in the rendering pipeline, in the TS stage. The total calculation time from the register to the final multiplexer (MUX) in the TS stage is less than 20 ns, and it determines the maximum operating frequency of the rendering engine: 50 MHz.

To develop mobile 3-D graphics applications quickly, model data may be scaled down from the PC platform, where triangles are optimized for large screen resolutions, to mobile platforms, which have much smaller screens. The average number of pixels inside a triangle can therefore be smaller in mobile 3-D, which means that setup time may become the bottleneck of pixel throughput. The setup engine is designed to ensure that the triangle-setup cycle is always shorter than the pixel-filling cycle, even for a single-pixel triangle: triangle setup takes one cycle, without latency. Here, optimizing the datapath width is important to implement the TS stage with a small number of gates while preserving the necessary precision. In this implementation, we use 11-bit floating-point ([...]-bit mantissa, [...]-bit exponent) SIMD dividers for the datapath.
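The horizontal-order rasterization and striped buffer assignment described in this section can be illustrated with a small Python sketch. It is only a model under stated assumptions, not the chip's logic: the stripe width of one pixel column, the alignment of pixel pairs to even columns, and the names `memory_for_pixel` and `rasterize_span` are all hypothetical choices made for illustration.

```python
def memory_for_pixel(x, stripe_width=1):
    """Return the buffer memory ("A" or "B") that holds pixel column x.

    Assumption: vertical stripes alternate A, B, A, B ... every
    `stripe_width` columns. The real stripe width is not given in the text.
    """
    return "A" if (x // stripe_width) % 2 == 0 else "B"


def rasterize_span(y, x_start, x_end):
    """Yield one list of (x, y, memory) entries per rendering cycle.

    Pixels x_start..x_end (inclusive) on scanline y are visited left to
    right, two horizontally adjacent pixels per cycle (one per pixel
    processor). Pairs are aligned to even columns (an assumption), so a
    span that starts or ends on an odd boundary leaves one processor idle.
    """
    x = x_start - (x_start % 2)
    while x <= x_end:
        yield [(px, y, memory_for_pixel(px))
               for px in (x, x + 1) if x_start <= px <= x_end]
        x += 2


cycles = list(rasterize_span(y=3, x_start=5, x_end=8))
for cycle in cycles:
    print(cycle)
```

With a one-column stripe, the two pixels of every pair fall in different memories, which is the property that lets both be read and written in the same cycle; a memory whose pixel is not needed in a given cycle is simply not touched.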
Although the shifters at the last stage of the floating-point look-up table (LUT) division increase the gate count by 14%, the total area of the SIMD dividers is 40% smaller than that of a 16-bit fixed-point LUT divider, since the area of the multiplier is much reduced.

To execute the rendering programs and to control the datapath, bit-encoded instructions are defined. Since transferring the vertices takes most of the rendering cycles, the instructions are optimized for this operation. As shown in Fig. 5, the instruction length is fixed at 128 bits so that the whole vertex information is transferred in every rendering cycle. The four color components, screen coordinates, screen depth, and homogeneous texture coordinates are therefore transferred together with the command information. These 128-bit instructions require additional glue logic to adapt to the 32-bit geometry engine in the graphics LSI [9]. However, this also means that the rendering engine can be attached to any other geometry engine by changing the glue logic, without touching the rendering core.

Fig. 5. Instruction set format.

Fig. 6. Extensive clock gating.

Fig. 7. Bilinear texture filtering.

The instruction set is chosen to support a subset of OpenGL rendering operations, discarding high-level functions and buffers that are rarely used in mobile gaming applications. Additional instructions are defined to support real-time special rendering effects, to control the embedded DRAMs, and to manage the standby power.

Since the rendering engine contains two pixel processors (PPs) and each PP has its own texture unit fetching 4 texels/cycle, the pixel fill rate and the texel rate are up to 100 Mpixels/s and 400 Mtexels/s at 50 MHz, respectively. The two pixel processors are simply assigned to render horizontally adjacent pixels. This makes it easy to gather texture addresses, which is exploited by the energy-efficient texture unit covered in the next section.

To eliminate the power consumption of unused blocks as much as possible, we applied extensive clock gating to the pipeline latches, as shown in Fig. 6. The rendering engine suspends the following pipeline stages by gating off the clocks in each pixel processor according to the result of the depth comparison in the PI stage. We therefore place the depth-compare unit in an earlier pixel stage, unlike high-performance PC graphics chipsets. This violates the OpenGL semantics, which do not allow the depth buffer to be updated until after texture mapping because textured pixels may be completely transparent. The violation can be resolved by removing those triangles in software prior to the rendering operation.
In addition, the pipeline latches of the shading and texturing units can be independently enabled or disabled to avoid unnecessary datapath transitions.
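The effect of placing the depth comparison before texturing can be sketched as follows. This is a behavioral illustration only: the "less than" depth test, the dictionary used as a depth buffer, and the function name `render_pixels` are assumptions introduced here, and the counters merely stand in for stages whose clocks would be gated.

```python
def render_pixels(fragments, depth_buffer):
    """Count how many fragments reach texturing when depth is tested first.

    fragments: iterable of (x, y, z) tuples in drawing order.
    depth_buffer: dict mapping (x, y) to the nearest depth seen so far.
    Assumption: a fragment passes when its z is strictly smaller than
    the stored depth (the compare function is not stated in the text).
    """
    stats = {"depth_reads": 0, "textured": 0, "gated": 0}
    for x, y, z in fragments:
        stats["depth_reads"] += 1            # depth buffer access (PI stage)
        if z < depth_buffer.get((x, y), float("inf")):
            depth_buffer[(x, y)] = z         # early depth update
            stats["textured"] += 1           # later stages stay clocked
        else:
            stats["gated"] += 1              # later stages are gated off
    return stats


stats = render_pixels(
    [(0, 0, 0.5), (0, 0, 0.7), (1, 0, 0.2), (0, 0, 0.1)],
    depth_buffer={},
)
print(stats)
```

In this sketch every gated fragment skips texture address generation, texture fetch, and frame buffer write, which is where the power saving comes from. The cost is the one noted in the text: a fragment that would later turn out to be fully transparent has already updated the depth buffer.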

Fig. 8. Address alignment logic.

IV. ENERGY-EFFICIENT TEXTURE UNIT

The texture images are mapped from the texel space to the screen space as shown in Fig. 7. During this operation, bilinear MIPMAP filtering is performed to further improve the pixel quality [12]. This filtering generates as many as eight texture memory requests to process two pixels together, since four texels are necessary for the calculation of one pixel. Fetching 8 texels directly from eight texture memories would result in huge power consumption. Therefore, we propose address alignment logic (AAL) to combine the texel requests and reduce them in real time.

Fig. 8 shows the block diagram of the AAL. After the texture addresses (u and v) are calculated at the TA1 stage, four bilinear addresses are generated by each pixel processor. The spatial aligner (TA2_SPATIAL_ALIGN) then compares the texture addresses of PP0 (PP0UV0 to PP0UV3) with those of PP1 (PP1UV0 to PP1UV3), setting the overlapped position flag (OPF) on SA0 to SA3. Next, the temporal aligner (TA2_TEMPORAL_ALIGN) compares the current texture requests (PP0UV0 to PP0UV3, PP1UV0 to PP1UV3) with the previous ones, which are stored in registers, setting the OPF on TA0 to TA7. Finally, the mask generation block (TA2_MASK_GEN) merges the OPFs from the spatial and temporal aligners and generates the bit masks (SPmask, TMmask), which indicate the texel positions to be newly fetched from the texture memories. The simulation results show that the average number of mask bits is 5 for SPmask and 2.3 for TMmask. At the same time, the TA2_ADDR_TRANSLATION block translates the texture addresses into physical addresses, which cover a 24-Mb memory space.

Fig. 9. AAL analysis results. (a) Test vectors. (b) Remaining requests after AAL. (c) Number of texture requests. (d) Number of cycles.

Although the spatial and temporal aligners reduce the average number of texture memories activated per cycle to 2.3, the maximum number is still eight. Since the rendering engine is attached to four texture memories in this implementation, the bank accesses are scheduled by TP1_BANK_AGGREGATION in a round-robin manner. We chose four texture memories because the cumulative probability of the remaining requests after AAL being within this number is about 90%. With AAL, the number of texture memories can be made even smaller for cheaper platforms if some performance can be sacrificed. When the same texture bank is accessed more than once, this block sets TP1_MULTI to 1, extending the operation cycles. The TP2 and TP3 stages then redistribute the texel data from the four texture memories to the eight corresponding positions, feeding 4 texels per PP for bilinear texture filtering. The number of texture prefetch stages (TP1, TP2, and TP3) is optimized to 3 for this implementation, in which the latency of the texture DRAM is 1. It can easily be scaled up for multilatency DRAM, such as off-chip texture memory, by simply inserting more pipeline latches at TP2.

Fig. 9 shows the AAL analysis results. We simulated the performance of the AAL while running the test vectors in Fig. 9(a). Fig. 9(b) shows the probability of the number of texture requests remaining after AAL.
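The address alignment idea described in this section can be modeled in a few lines of Python. This is a minimal sketch under assumptions, not the AAL circuit: the function names, the use of Python sets in place of overlapped position flags, and in particular the 2 × 2 interleaved mapping of texels to four banks in `bank_of` are hypothetical, since the text does not state how texels are distributed over the texture memories.

```python
from collections import Counter


def bilinear_footprint(u, v):
    """Return the four integer texel addresses around (u, v), u, v >= 0."""
    u0, v0 = int(u), int(v)
    return [(u0, v0), (u0 + 1, v0), (u0, v0 + 1), (u0 + 1, v0 + 1)]


def bank_of(texel):
    """Map a texel to one of four banks (assumed 2x2 interleaving)."""
    u, v = texel
    return (u & 1) | ((v & 1) << 1)


def align_requests(pp0, pp1, previous):
    """Return the texels that still have to be fetched.

    Spatial alignment drops texels requested by both pixel processors;
    temporal alignment drops texels already requested in the previous cycle.
    """
    spatial = list(dict.fromkeys(pp0 + pp1))   # order-preserving de-duplication
    return [t for t in spatial if t not in previous]


def cycles_needed(fetch):
    """Cycles for one pixel pair: the busiest bank sets the count."""
    per_bank = Counter(bank_of(t) for t in fetch)
    return max(per_bank.values(), default=1)


previous = set()
for centers in [((10.5, 20.5), (11.5, 20.5)), ((12.5, 20.5), (13.5, 20.5))]:
    pp0 = bilinear_footprint(*centers[0])
    pp1 = bilinear_footprint(*centers[1])
    fetch = align_requests(pp0, pp1, previous)
    print(len(fetch), "fetches,", cycles_needed(fetch), "cycle(s)")
    previous = set(pp0 + pp1)
```

In this example the first pixel pair needs 6 fetches instead of 8 and takes 2 cycles because two banks are hit twice, while the second pair needs only 4 fetches and completes in 1 cycle because part of its footprint was requested in the previous cycle. The extra cycle in the first case corresponds to the situation in which the text says TP1_MULTI is set.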
Fig. 9(c) shows the number of texture requests needed to draw two pixels together. The spatial and temporal aligners reduce the requests to 2.3 on average. The number of cycles needed to draw two pixels is illustrated in Fig. 9(d). Although two pixels are processed together, the number of cycles increases by only 10%. This rendering engine can therefore draw two pixels with fewer texture-memory activations and little cycle overhead compared with a single-PP architecture, which means that it needs less energy to finish drawing a scene.

Fig. 10. Power/energy reduction in embedded DRAM. (a) Power consumption. (b) Energy consumption.

V. EMBEDDED DRAM ARCHITECTURE

To save the power consumption of the embedded DRAMs as well as to utilize their bandwidth optimally, we designed three different DRAM types: frame buffer, depth buffer, and texture memory. To satisfy the cycle and latency requirements of the rendering logic, we completely redesigned the DRAMs without using any SRAM caches, which would consume extra power. To cover a screen resolution that matches that of most current cell phones, four frame buffers and four depth buffers with zero latency are used in the chip. In addition, four texture memories amount to 24 Mb and store MIPMAP texture images for 3-D gaming applications. These embedded DRAMs can operate at a scalable clock frequency ranging from 5 to 50 MHz to match the speed of the rendering logic, providing up to 2.4 GB/s of bandwidth over a 416-bit-wide bus. The twelve distributed DRAMs also save run-time power, since only the necessary memories out of the twelve are activated.

In this architecture, the overall power of the rendering memories per two pixels can be written in terms of three factors: the PP1 utilization, the depth-gated ratio, and the texture-access ratio. The PP1 utilization depends on the size and shape of the triangle, and it tends to decrease as the triangle gets smaller. The depth-gated ratio depends on the depth complexity, and it can be reduced by the extensive clock gating driven by the depth-comparison results. The texture-access ratio is reduced by the AAL. Based on the measured power consumption of each DRAM type at 33 MHz, the resulting memory power is illustrated in Fig. 10(a).
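The peak figures quoted in the text (100 Mpixels/s, 400 Mtexels/s, and 2.4 GB/s at 50 MHz) follow from simple per-cycle arithmetic, sketched below. The function name `peak_rates` is hypothetical, and reading "GB" as 2^30 bytes is an assumption made here because it is the reading under which a 416-bit bus at 50 MHz gives about 2.4.

```python
def peak_rates(clock_hz, pixel_processors=2, texels_per_pp=4, bus_bits=416):
    """Peak throughput of the engine at a given clock frequency.

    Each pixel processor produces one pixel and fetches `texels_per_pp`
    texels per cycle; the memories together move `bus_bits` bits per cycle.
    """
    return {
        "mpixels_per_s": pixel_processors * clock_hz / 1e6,
        "mtexels_per_s": pixel_processors * texels_per_pp * clock_hz / 1e6,
        "bytes_per_s": bus_bits / 8 * clock_hz,
    }


for mhz in (5, 33, 50):
    r = peak_rates(mhz * 1e6)
    gib = r["bytes_per_s"] / 2**30   # assumption: "GB" read as 2^30 bytes
    print(f"{mhz} MHz: {r['mpixels_per_s']:.0f} Mpixels/s, "
          f"{r['mtexels_per_s']:.0f} Mtexels/s, {gib:.2f} GB/s")
```

Because all three figures scale linearly with the clock, lowering the frequency toward 5 MHz trades peak throughput for power on platforms that only need simple content.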
More power can be saved as the triangles get smaller and the scenes get more complex, which can happen in gaming applications on the small LCD screens of mobile devices. At the operating point considered, the power is reduced by 65% compared with a unified memory architecture in which all memories are activated together.

Fig. 10(b) shows the normalized energy consumption until the drawing job is finished. The energy required to finish a drawing job is the product of the memory power and the time needed to draw all of its pixels, with the frame buffer, depth buffer, and texture memory each contributing their own power consumption. The distributed memory system saves more energy as 3-D applications get more complex, reaching a 63% reduction at the operating point considered.

The memories can also be selectively refreshed for data retention in standby modes by the power-control instructions shown in Fig. 11: PHLD (Hold), PIDL (Idle), PSLP (Sleep), and POFF (Off). PHLD is used to hold the datapath and memory temporarily during normal rendering operations while waiting for a geometry operation. All memories are refreshed in this mode. PIDL turns off the rendering clock but refreshes all graphics memories. In PSLP mode, only the texture memory is refreshed to hold the texture images, since they may have been downloaded from the wireless network. Finally, POFF turns off all operations.

Fig. 11. Standby power models.

Fig. 12. Die photograph.

Fig. 13. Prototype PDA board.

TABLE II. RENDERING ENGINE FEATURES

VI. IMPLEMENTATION

The 3-D rendering engine with embedded DRAM is integrated into the Graphics LSI, which also contains a 32-bit RISC processor and a power management unit [9], [10]. The chip is fabricated using a [...]-µm 256-Mb-compatible DRAM process to implement both the logic and the memory on a single chip at low fabrication cost. Fig. 12 shows the die photograph and Table II summarizes its features. It can draw 24-bit texture-mapped pixels with a maximum drawing speed of 100 Mpixels/s and 400 Mtexels/s at 50 MHz. The use of AAL with four TMs reduces the sustained texturing performance by only about 10%. The peak fill rate is about 50 times the minimum performance requirement (2 Mpixels/s for avatar animation at 15 frames/s) of PDAs and cellphones with QVGA-resolution LCD screens. The clock speed of the rendering engine can therefore be scaled down to trade performance for power consumption, depending on the target applications and platforms. The first silicon works successfully, and real-time 3-D graphics images are demonstrated on the prototype PDA board shown in Fig. 13.

VII. CONCLUSION

A low-power 3-D rendering engine for 3G multimedia terminals is designed and implemented. Integrating the embedded DRAM and applying various low-power techniques such as extensive clock gating, address alignment, and distributed memories reduce its power consumption to less than 140 mW during continuous drawing of texture-mapped 3-D scenes. This scalable core with 29-Mb DRAM can operate at various frequencies up to 50 MHz to satisfy the performance and power requirements of different application processors.
The rendering engine is integrated into the Graphics LSI, fabricated in a [...]-µm DRAM process, and 3-D animations are successfully demonstrated on the prototype system.

REFERENCES

[1] Khronos Group, "Bringing 3-D gaming to cell phones," presented at the Game Developers Conf.
[2] R. Woo et al., "A 120-mW 3-D rendering engine with 6-Mb embedded DRAM and 3.2-Gbyte/s runtime reconfigurable bus for PDA chip," IEEE J. Solid-State Circuits, vol. 37, Oct.
[3] C.-W. Yoon et al., "A 80/20-MHz 160-mW multimedia processor integrated with embedded DRAM, MPEG-4 and 3-D rendering engine for mobile applications," IEEE J. Solid-State Circuits, vol. 36, Nov.
[4] Y.-H. Park et al., "A 7.1-GB/s low-power rendering engine in 2-D array-embedded memory logic CMOS for portable multimedia system," IEEE J. Solid-State Circuits, vol. 36, June.
[5] G. K. Kolli, "3-D graphics optimizations for ARM architecture," presented at the Game Developers Conf.
[6] R. Woo et al., "A low power 3-D rendering engine with two texture units and 29 Mb embedded DRAM for 3G multimedia terminals," in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC), 2003.
[7] T. Akenine-Möller et al., Real-Time Rendering, 2nd ed. Wellesley, MA: A. K. Peters.
[8] M. F. Deering et al., "FBRAM: a new form of memory optimized for 3-D graphics," in Proc. ACM SIGGRAPH, 1994.
[9] R. Woo et al., "A 210 mW graphics LSI implementing full 3-D pipeline with 264 Mtexels/s texturing for mobile multimedia applications," in IEEE ISSCC Dig. Tech. Papers, Feb. 2003.
[10] R. Woo et al., "A low-power and high-performance 2D/3D graphics accelerator for mobile multimedia applications," presented at the Hot Chips Conf.
[11] D. D. Buss, "Technology in the Internet age," in IEEE ISSCC Dig. Tech. Papers, Feb. 2002.
[12] L. Williams, "Pyramidal parametrics," in Proc. ACM SIGGRAPH, 1983.
[13] J. Montrym and H. Moreton, "NVIDIA GeForce4," presented at the Hot Chips Conf.
[14] OpenGL (2003). [Online].
[15] O. Lathrop, D. Kirk, et al., "Accurate rendering by subpixel addressing," IEEE Comput. Graphics Applicat., Sept.

Ramchan Woo (S'00) received the B.S. (summa cum laude) and M.S. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST) in 1999 and 2001, respectively. He is currently working toward the Ph.D.
degree in electrical engineering at KAIST and expects to graduate in August. As a Chief Researcher at the Semiconductor System Laboratory at KAIST, he developed the full 3-D graphics LSI for handheld devices. His research interests include low-power design of mobile multimedia systems, with a specific interest in mobile 3-D computer graphics architecture and its implementation with merged-DRAM technology. He is also working on mobile graphics libraries.

Sungdae Choi (S'01) was born on March 17, 1978, in Korea. He received the B.S. and M.S. degrees in electrical engineering and computer science from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, in 2001 and 2003, respectively, where he is currently working toward the Ph.D. degree. In 2001, he joined the Semiconductor System Laboratory (SSL) at KAIST as a Research Assistant. His research activities are related to application-specific embedded memory architecture and content-addressable memories.

Ju-Ho Sohn (S'01) was born on July 7, 1979, in Korea. He received the B.S. (summa cum laude) and M.S. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, in 2001 and 2003, respectively. He is currently working toward the Ph.D. degree in electrical engineering in the same department. His research activities are related to real-time 3-D graphics for portable systems and its implementation, especially high-performance portable multimedia processor design for 3-D vertex operations.

Seong-Jun Song (S'01) was born in Seoul, Korea. He received the B.S. degree in electrical engineering and computer science in 2001 from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, where he is currently working toward the M.S. degree. Since 2001, he has been a Research Assistant at KAIST.
His research interests include high-speed optical interface integrated circuits using submicron CMOS technology, phase-locked loops, clock and data recovery circuits for high-speed data communications, and radio-frequency CMOS integrated circuits for wireless communication applications.

Young-Don Bae (S'01) received the B.S. and M.S. degrees in electronics engineering from Chungnam National University, Daejeon, Korea, in 1997 and 1999, respectively. He is currently working toward the Ph.D. degree in the Department of Electrical Engineering and Computer Science at the Korea Advanced Institute of Science and Technology (KAIST), Daejeon. His research interests include system-on-a-chip design methodology and high-performance, low-power microprocessor design.

Hoi-Jun Yoo (M'95) graduated from the Electronic Department of Seoul National University, Seoul, Korea, in 1983 and received the M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, in 1985 and 1988, respectively. His Ph.D. work concerned the fabrication process for GaAs vertical optoelectronic integrated circuits. From 1988 to 1990, he was with Bell Communications Research, Red Bank, NJ, where he invented the two-dimensional phase-locked VCSEL array, the front-surface-emitting laser, and the high-speed lateral HBT. In 1991, he became Manager of a DRAM design group at Hyundai Electronics and designed a family of fast 1M DRAMs and synchronous DRAMs, including a 256M SDRAM. From 1995 to 1997, he was a faculty member with Kangwon National University. In 1998, he joined the faculty of the Department of Electrical Engineering at KAIST, and he currently leads a project team on RAM Processors (RAMP). In 2001, he founded a national research center, the System Integration and IP Authoring Research Center (SIPAC), funded by the Korean government to promote worldwide IP authoring and its SOC application.
He is currently the Project Manager for SoC in the Korea Ministry of Information and Communication. His current interests are SOC design, IP authoring, high-speed and low-power memory circuits and architectures, design of embedded memory logic, optoelectronic integrated circuits, and novel devices and circuits. He is the author of the books DRAM Design (Seoul, Korea: Hongleung, 1996; in Korean) and High Performance DRAM (Seoul, Korea: Sigma, 1999; in Korean). Dr. Yoo received the Electronic Industrial Association of Korea Award for his contribution to DRAM technology in 1994 and the Korea Semiconductor Industry Association Award in 2002.


More information

A fixed-point 3D graphics library with energy-efficient efficient cache architecture for mobile multimedia system

A fixed-point 3D graphics library with energy-efficient efficient cache architecture for mobile multimedia system MS Thesis A fixed-point 3D graphics library with energy-efficient efficient cache architecture for mobile multimedia system Min-wuk Lee 2004.12.14 Semiconductor System Laboratory Department Electrical

More information

Scanline-based rendering of 2D vector graphics

Scanline-based rendering of 2D vector graphics Scanline-based rendering of 2D vector graphics Sang-Woo Seo 1, Yong-Luo Shen 1,2, Kwan-Young Kim 3, and Hyeong-Cheol Oh 4a) 1 Dept. of Elec. & Info. Eng., Graduate School, Korea Univ., Seoul 136 701, Korea

More information

A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision

A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision Seok-Hoon Kim KAIST, Daejeon, Republic of Korea I. INTRODUCTION Recently, there has been tremendous progress in 3D graphics

More information

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Architectures Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Overview of today s lecture The idea is to cover some of the existing graphics

More information

A Low Cost Tile-based 3D Graphics Full Pipeline with Real-time Performance Monitoring Support for OpenGL ES in Consumer Electronics

A Low Cost Tile-based 3D Graphics Full Pipeline with Real-time Performance Monitoring Support for OpenGL ES in Consumer Electronics A Low Cost Tile-based 3 Graphics Full Pipeline with Real-time Performance Monitoring Support for OpenGL ES in Consumer Electronics Ruei-Ting Gu, Tse-Chen Yeh, Wei-Sheng Hunag, Ting-Yun Huang, Chung-Hua

More information

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.15, NO.1, FEBRUARY, 2015 http://dx.doi.org/10.5573/jsts.2015.15.1.077 Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network

More information

ISSN: [Bilani* et al.,7(2): February, 2018] Impact Factor: 5.164

ISSN: [Bilani* et al.,7(2): February, 2018] Impact Factor: 5.164 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A REVIEWARTICLE OF SDRAM DESIGN WITH NECESSARY CRITERIA OF DDR CONTROLLER Sushmita Bilani *1 & Mr. Sujeet Mishra 2 *1 M.Tech Student

More information

Performance Evolution of DDR3 SDRAM Controller for Communication Networks

Performance Evolution of DDR3 SDRAM Controller for Communication Networks Performance Evolution of DDR3 SDRAM Controller for Communication Networks U.Venkata Rao 1, G.Siva Suresh Kumar 2, G.Phani Kumar 3 1,2,3 Department of ECE, Sai Ganapathi Engineering College, Visakhaapatnam,

More information

Architecture of An AHB Compliant SDRAM Memory Controller

Architecture of An AHB Compliant SDRAM Memory Controller Architecture of An AHB Compliant SDRAM Memory Controller S. Lakshma Reddy Metch student, Department of Electronics and Communication Engineering CVSR College of Engineering, Hyderabad, Andhra Pradesh,

More information

Structure. Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung,

Structure. Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, A High Performance 3D Graphics Rasterizer with Effective Memory Structure Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, Il-San Kim,

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo

Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo A Low-Power Handheld GPU using Logarithmic Arithmetic and Triple DVFS Power Domains Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo Outline Backgrounds Proposed Handheld GPU Low-Power

More information

DIRECT Rambus DRAM has a high-speed interface of

DIRECT Rambus DRAM has a high-speed interface of 1600 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 11, NOVEMBER 1999 A 1.6-GByte/s DRAM with Flexible Mapping Redundancy Technique and Additional Refresh Scheme Satoru Takase and Natsuki Kushiyama

More information

A Network Storage LSI Suitable for Home Network

A Network Storage LSI Suitable for Home Network 258 HAN-KYU LIM et al : A NETWORK STORAGE LSI SUITABLE FOR HOME NETWORK A Network Storage LSI Suitable for Home Network Han-Kyu Lim*, Ji-Ho Han**, and Deog-Kyoon Jeong*** Abstract Storage over (SoE) is

More information

3-D Accelerator on Chip

3-D Accelerator on Chip 3-D Accelerator on Chip Third Prize 3-D Accelerator on Chip Institution: Participants: Instructor: Donga & Pusan University Young-Hee Won, Jin-Sung Park, Woo-Sung Moon Sam-Hak Jin Design Introduction Recently,

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

High Throughput Radix-4 SISO Decoding Architecture with Reduced Memory Requirement

High Throughput Radix-4 SISO Decoding Architecture with Reduced Memory Requirement JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.14, NO.4, AUGUST, 2014 http://dx.doi.org/10.5573/jsts.2014.14.4.407 High Throughput Radix-4 SISO Decoding Architecture with Reduced Memory Requirement

More information

AXI Compliant DDR3 Controller

AXI Compliant DDR3 Controller 2010 Second International Conference on Computer Modeling and Simulation AXI Compliant ler Vikky Lakhmani, M.Tech(Sequential) Student Department of Electrical & Electronics Engineering, Uttar Pradesh Technical

More information

A Fast Synchronous Pipelined DRAM Architecture with SRAM Buffers

A Fast Synchronous Pipelined DRAM Architecture with SRAM Buffers A Fast Synchronous Pipelined DRAM Architecture with SRAM Buffers Chi-Weon Yoon, Yon-Kyun Im, Seon-Ho Han, Hoi-Jun Yoo and Tae-Sung Jung* Dept. of Electrical Engineering, KAIST *Samsung Electronics Co.,

More information

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research Optimizing Games for ATI s IMAGEON 2300 Aaftab Munshi 3D Architect ATI Research A A 3D hardware solution enables publishers to extend brands to mobile devices while remaining close to original vision of

More information

Chapter 2 Embedded Memory Architecture for Low-Power Application Processor

Chapter 2 Embedded Memory Architecture for Low-Power Application Processor Chapter 2 Embedded Memory Architecture for Low-Power Application Processor Hoi Jun Yoo and Donghyun Kim 2.1 Memory Hierarchy 2.1.1 Introduction Currently, the state-of-the-art high-end processors operate

More information

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools Mobile Performance Tools and GPU Performance Tuning Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools NVIDIA GoForce5500 Overview World-class 3D HW Geometry pipeline 16/32bpp

More information

RECENTLY, researches on gigabit wireless personal area

RECENTLY, researches on gigabit wireless personal area 146 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 55, NO. 2, FEBRUARY 2008 An Indexed-Scaling Pipelined FFT Processor for OFDM-Based WPAN Applications Yuan Chen, Student Member, IEEE,

More information

A Low-Power ECC Check Bit Generator Implementation in DRAMs

A Low-Power ECC Check Bit Generator Implementation in DRAMs 252 SANG-UHN CHA et al : A LOW-POWER ECC CHECK BIT GENERATOR IMPLEMENTATION IN DRAMS A Low-Power ECC Check Bit Generator Implementation in DRAMs Sang-Uhn Cha *, Yun-Sang Lee **, and Hongil Yoon * Abstract

More information

Database Management Systems, 2nd edition, Raghu Ramakrishnan, Johannes Gehrke, McGraw-Hill

Database Management Systems, 2nd edition, Raghu Ramakrishnan, Johannes Gehrke, McGraw-Hill Lecture Handout Database Management System Lecture No. 34 Reading Material Database Management Systems, 2nd edition, Raghu Ramakrishnan, Johannes Gehrke, McGraw-Hill Modern Database Management, Fred McFadden,

More information

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Pavel Petroshenko, Sun Microsystems, Inc. Ashmi Bhanushali, NVIDIA Corporation Jerry Evans, Sun Microsystems, Inc. Nandini

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

GeForce4. John Montrym Henry Moreton

GeForce4. John Montrym Henry Moreton GeForce4 John Montrym Henry Moreton 1 Architectural Drivers Programmability Parallelism Memory bandwidth 2 Recent History: GeForce 1&2 First integrated geometry engine & 4 pixels/clk Fixed-function transform,

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,

More information

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola 1. Microprocessor Architectures 1.1 Intel 1.2 Motorola 1.1 Intel The Early Intel Microprocessors The first microprocessor to appear in the market was the Intel 4004, a 4-bit data bus device. This device

More information

THE synchronous DRAM (SDRAM) has been widely

THE synchronous DRAM (SDRAM) has been widely IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 10, OCTOBER 1997 1597 A Study of Pipeline Architectures for High-Speed Synchronous DRAM s Hoi-Jun Yoo Abstract The performances of SDRAM s with different

More information

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver E.Kanniga 1, N. Imocha Singh 2,K.Selva Rama Rathnam 3 Professor Department of Electronics and Telecommunication, Bharath

More information

Hardware-driven visibility culling

Hardware-driven visibility culling Hardware-driven visibility culling I. Introduction 20073114 김정현 The goal of the 3D graphics is to generate a realistic and accurate 3D image. To achieve this, it needs to process not only large amount

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

EE414 Embedded Systems Ch 5. Memory Part 2/2

EE414 Embedded Systems Ch 5. Memory Part 2/2 EE414 Embedded Systems Ch 5. Memory Part 2/2 Byung Kook Kim School of Electrical Engineering Korea Advanced Institute of Science and Technology Overview 6.1 introduction 6.2 Memory Write Ability and Storage

More information

ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7

ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7 ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7 8.7 A Programmable Turbo Decoder for Multiple 3G Wireless Standards Myoung-Cheol Shin, In-Cheol Park KAIST, Daejeon, Republic of Korea

More information

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 5 Memory. Introduction. Memory: basic concepts

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 5 Memory. Introduction. Memory: basic concepts Hardware/Software Introduction Chapter 5 Memory Outline Memory Write Ability and Storage Permanence Common Memory Types Composing Memory Memory Hierarchy and Cache Advanced RAM 1 2 Introduction Memory:

More information

Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 5 Memory. Outline. Introduction

Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 5 Memory. Outline. Introduction Hardware/Software Introduction Chapter 5 Memory 1 Outline Memory Write Ability and Storage Permanence Common Memory Types Composing Memory Memory Hierarchy and Cache Advanced RAM 2 Introduction Embedded

More information

TECHNOLOGY BRIEF. Double Data Rate SDRAM: Fast Performance at an Economical Price EXECUTIVE SUMMARY C ONTENTS

TECHNOLOGY BRIEF. Double Data Rate SDRAM: Fast Performance at an Economical Price EXECUTIVE SUMMARY C ONTENTS TECHNOLOGY BRIEF June 2002 Compaq Computer Corporation Prepared by ISS Technology Communications C ONTENTS Executive Summary 1 Notice 2 Introduction 3 SDRAM Operation 3 How CAS Latency Affects System Performance

More information

Texture Caching. Héctor Antonio Villa Martínez Universidad de Sonora

Texture Caching. Héctor Antonio Villa Martínez Universidad de Sonora April, 2006 Caching Héctor Antonio Villa Martínez Universidad de Sonora (hvilla@mat.uson.mx) 1. Introduction This report presents a review of caching architectures used for texture mapping in Computer

More information

CENG4480 Lecture 09: Memory 1

CENG4480 Lecture 09: Memory 1 CENG4480 Lecture 09: Memory 1 Bei Yu byu@cse.cuhk.edu.hk (Latest update: November 8, 2017) Fall 2017 1 / 37 Overview Introduction Memory Principle Random Access Memory (RAM) Non-Volatile Memory Conclusion

More information

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary Cornell University CS 569: Interactive Computer Graphics Introduction Lecture 1 [John C. Stone, UIUC] 2008 Steve Marschner 1 2008 Steve Marschner 2 NASA University of Calgary 2008 Steve Marschner 3 2008

More information

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

GoForce 3D: Coming to a Pixel Near You

GoForce 3D: Coming to a Pixel Near You GoForce 3D: Coming to a Pixel Near You CEDEC 2004 NVIDIA Actively Developing Handheld Solutions Exciting and Growing Market Fully Committed to developing World Class graphics products for the mobile Already

More information

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

Building scalable 3D applications. Ville Miettinen Hybrid Graphics Building scalable 3D applications Ville Miettinen Hybrid Graphics What s going to happen... (1/2) Mass market: 3D apps will become a huge success on low-end and mid-tier cell phones Retro-gaming New game

More information

FAST FOURIER TRANSFORM (FFT) and inverse fast

FAST FOURIER TRANSFORM (FFT) and inverse fast IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 11, NOVEMBER 2004 2005 A Dynamic Scaling FFT Processor for DVB-T Applications Yu-Wei Lin, Hsuan-Yu Liu, and Chen-Yi Lee Abstract This paper presents an

More information

Programming Graphics Hardware

Programming Graphics Hardware Tutorial 5 Programming Graphics Hardware Randy Fernando, Mark Harris, Matthias Wloka, Cyril Zeller Overview of the Tutorial: Morning 8:30 9:30 10:15 10:45 Introduction to the Hardware Graphics Pipeline

More information

MIPS R4300I Microprocessor. Technical Backgrounder-Preliminary

MIPS R4300I Microprocessor. Technical Backgrounder-Preliminary MIPS R4300I Microprocessor Technical Backgrounder-Preliminary Table of Contents Chapter 1. R4300I Technical Summary... 3 Chapter 2. Overview... 4 Introduction... 4 The R4300I Microprocessor... 5 The R4300I

More information

EMBEDDED VERTEX SHADER IN FPGA

EMBEDDED VERTEX SHADER IN FPGA EMBEDDED VERTEX SHADER IN FPGA Lars Middendorf, Felix Mühlbauer 1, Georg Umlauf 2, Christophe Bobda 1 1 Self-Organizing Embedded Systems Group, Department of Computer Science University of Kaiserslautern

More information

Design and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology

Design and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology Vol. 3, Issue. 3, May.-June. 2013 pp-1475-1481 ISSN: 2249-6645 Design and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology Bikash Khandal,

More information

Power Reduction Techniques in the Memory System. Typical Memory Hierarchy

Power Reduction Techniques in the Memory System. Typical Memory Hierarchy Power Reduction Techniques in the Memory System Low Power Design for SoCs ASIC Tutorial Memories.1 Typical Memory Hierarchy On-Chip Components Control edram Datapath RegFile ITLB DTLB Instr Data Cache

More information

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely

More information

Reminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline

Reminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Reminder Course project team forming deadline Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Course project ideas If you have difficulty in finding team mates, send your

More information

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015 Enhancing Traditional Rasterization Graphics with Ray Tracing March 2015 Introductions James Rumble Developer Technology Engineer Ray Tracing Support Justin DeCell Software Design Engineer Ray Tracing

More information

The Memory Component

The Memory Component The Computer Memory Chapter 6 forms the first of a two chapter sequence on computer memory. Topics for this chapter include. 1. A functional description of primary computer memory, sometimes called by

More information

Emerging DRAM Technologies

Emerging DRAM Technologies 1 Emerging DRAM Technologies Michael Thiems amt051@email.mot.com DigitalDNA Systems Architecture Laboratory Motorola Labs 2 Motivation DRAM and the memory subsystem significantly impacts the performance

More information

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 ISSCC 26 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 22.1 A 125µW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications Tsu-Ming Liu 1, Ting-An Lin 2, Sheng-Zen Wang 2, Wen-Ping Lee

More information

Design and Implementation of High Performance DDR3 SDRAM controller

Design and Implementation of High Performance DDR3 SDRAM controller Design and Implementation of High Performance DDR3 SDRAM controller Mrs. Komala M 1 Suvarna D 2 Dr K. R. Nataraj 3 Research Scholar PG Student(M.Tech) HOD, Dept. of ECE Jain University, Bangalore SJBIT,Bangalore

More information

Low-power Architecture. By: Jonathan Herbst Scott Duntley

Low-power Architecture. By: Jonathan Herbst Scott Duntley Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction In a packet-switched network, packets are buffered when they cannot be processed or transmitted at the rate they arrive. There are three main reasons that a router, with generic

More information

Coming to a Pixel Near You: Mobile 3D Graphics on the GoForce WMP. Chris Wynn NVIDIA Corporation

Coming to a Pixel Near You: Mobile 3D Graphics on the GoForce WMP. Chris Wynn NVIDIA Corporation Coming to a Pixel Near You: Mobile 3D Graphics on the GoForce WMP Chris Wynn NVIDIA Corporation What is GoForce 3D? Licensable 3D Core for Mobile Devices Discrete Solutions: GoForce 3D 4500/4800 OpenGL

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual

More information

ARM Multimedia IP: working together to drive down system power and bandwidth

ARM Multimedia IP: working together to drive down system power and bandwidth ARM Multimedia IP: working together to drive down system power and bandwidth Speaker: Robert Kong ARM China FAE Author: Sean Ellis ARM Architect 1 Agenda System power overview Bandwidth, bandwidth, bandwidth!

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

10/24/2016. Let s Name Some Groups of Bits. ECE 120: Introduction to Computing. We Just Need a Few More. You Want to Use What as Names?!

10/24/2016. Let s Name Some Groups of Bits. ECE 120: Introduction to Computing. We Just Need a Few More. You Want to Use What as Names?! University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering ECE 120: Introduction to Computing Memory Let s Name Some Groups of Bits I need your help. The computer we re going

More information

COMPUTER ARCHITECTURES

COMPUTER ARCHITECTURES COMPUTER ARCHITECTURES Random Access Memory Technologies Gábor Horváth BUTE Department of Networked Systems and Services ghorvath@hit.bme.hu Budapest, 2019. 02. 24. Department of Networked Systems and

More information

LETTER Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs

LETTER Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs IEICE TRANS. INF. & SYST., VOL.E92 D, NO.4 APRIL 2009 727 LETTER Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs Dong KIM, Kwanhu BANG, Seung-Hwan HA, Chanik PARK, Sung Woo

More information

CENG3420 Lecture 08: Memory Organization

CENG3420 Lecture 08: Memory Organization CENG3420 Lecture 08: Memory Organization Bei Yu byu@cse.cuhk.edu.hk (Latest update: February 22, 2018) Spring 2018 1 / 48 Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory

More information

UMBC D 7 -D. Even bytes 0. 8 bits FFFFFC FFFFFE. location in addition to any 8-bit location. 1 (Mar. 6, 2002) SX 16-bit Memory Interface

UMBC D 7 -D. Even bytes 0. 8 bits FFFFFC FFFFFE. location in addition to any 8-bit location. 1 (Mar. 6, 2002) SX 16-bit Memory Interface 8086-80386SX 16-bit Memory Interface These machines differ from the 8088/80188 in several ways: The data bus is 16-bits wide. The IO/M pin is replaced with M/IO (8086/80186) and MRDC and MWTC for 80286

More information

SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform

SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform H. Mizuno, N. Irie, K. Uchiyama, Y. Yanagisawa 1, S. Yoshioka 1, I. Kawasaki 1, and T. Hattori 2 Hitachi Ltd.,

More information

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM Memories Overview Memory Classification Read-Only Memory (ROM) Types of ROM PROM, EPROM, E 2 PROM Flash ROMs (Compact Flash, Secure Digital, Memory Stick) Random Access Memory (RAM) Types of RAM Static

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most

More information

High Performance Visibility Testing with Screen Segmentation

High Performance Visibility Testing with Screen Segmentation High Performance Visibility Testing with Screen Segmentation Péter Sántó, Béla Fehér Budapest University of Technology and Economics Department of Measurement and Information Systems santo@mit.bme.hu,

More information

Hot Chips Bringing Workstation Graphics Performance to a Desktop Near You. S3 Incorporated August 18-20, 1996

Hot Chips Bringing Workstation Graphics Performance to a Desktop Near You. S3 Incorporated August 18-20, 1996 Hot Chips 1996 Bringing Workstation Graphics Performance to a Desktop Near You S3 Incorporated August 18-20, 1996 Agenda ViRGE/VX Marketing Slide! Overview of ViRGE/VX accelerator features 3D rendering

More information

CS 130 Final. Fall 2015

CS 130 Final. Fall 2015 CS 130 Final Fall 2015 Name Student ID Signature You may not ask any questions during the test. If you believe that there is something wrong with a question, write down what you think the question is trying

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology

More information

Programming Characteristics on Three-Dimensional NAND Flash Structure Using Edge Fringing Field Effect

Programming Characteristics on Three-Dimensional NAND Flash Structure Using Edge Fringing Field Effect JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.14, NO.5, OCTOBER, 2014 http://dx.doi.org/10.5573/jsts.2014.14.5.537 Programming Characteristics on Three-Dimensional NAND Flash Structure Using Edge

More information