A Low Power 720p Motion Estimation Processor with 3D Stacked Memory


Shuping Zhang, Jinjia Zhou, Dajiang Zhou and Satoshi Goto
Graduate School of Information, Production and Systems, Waseda University
2-7 Hibikino, Kitakyushu, Japan

Abstract: In this paper, a motion estimation processor (MEP) with a 3D stacked memory architecture is proposed to 1) reduce the memory and core power consumption and 2) provide higher bandwidth. Firstly, a memory die is designed and stacked with the MEP die. By adding face-to-face (F2F) pad and through silicon via (TSV) definitions, 2D electronic design automation (EDA) tools are extended to support the proposed 3D stacking architecture. Moreover, a novel memory controller is applied to control the data transmission and the timing between the memory die and the MEP die. Finally, 3D physical design is completed for the whole system, including TSV/F2F placement, floor plan optimization, power network generation, etc. Compared with 2D technology, the number of IO pins is reduced by 77%. After optimizing the floor plans of the MEP die and the memory die, the routing wire length is reduced by 13.4% and 50% respectively. The simulation results show that the max bandwidth is more than 14GB/s and that the whole design can support real-time 720p@60fps encoding at 8MHz with less than 65mW, which is only one sixth of the state-of-the-art MEP.

Keywords: 3DIC design; motion estimation processor; low power design; memory stacking

This research was supported by the regional innovation strategy support program of MEXT and Waseda University Graduate Program for Embodiment Informatics (FY2013-FY2019).

I. INTRODUCTION

With the development of semiconductor technology, portable devices have become more and more powerful. Meanwhile, the cameras integrated in portable devices offer ever higher performance. The most obvious sign is that the capture resolution has improved from 0.3 Megapixel to 10 Megapixel. With such powerful cameras, 720p and 1080p video recording and playback have become common functions in new portable devices.

Thanks to the portability of smart phones and the expressiveness of video, more and more users tend to record their lives on video. At the same time, power consumption has become the bottleneck of portable devices: many devices need to be charged once or even twice per day, which is a bad experience for users, who prefer a long battery life. In short, video capture and playback on portable devices are increasingly popular, so a low power video codec is required.

Much research has focused on reducing the power consumption of the video codec itself and has obtained quite good results. V. Sze et al. [1] implemented a full real-time 720p H.264 decoder in 65nm. By using a variety of techniques, such as multiple voltage and frequency domains and frame-level dynamic voltage and frequency scaling, the video decoder core power is reduced to 1.8mW. Y. Lin et al. [2] implemented a 1080p@30fps H.264/AVC (advanced video coding) encoder in 130nm. By applying several techniques, including complexity reduction and cross-stage hardware sharing, the encoder core power is optimized to 242mW. However, although the video encoder/decoder core power is reduced significantly, the total power consumption is still high when working with an external dynamic random access memory (DRAM). Many works have focused on reducing the DRAM power by decreasing the DRAM bandwidth requirement [3][4], but conventional 2D integrated circuit process technology has reached its bottleneck. Now many researchers are trying to solve the DRAM problem by 3D large scale integration (LSI). A good example, made possible by the regular architecture of DRAM, is the industrial high performance 8Gb 3D DDR3 memory developed in [5].
What's more, Samsung has applied 3D-TSV technology to its 30nm-class DRAM products to keep pace with Moore's Law and industry projections. Much research also focuses on wide IO memory [6] and the hybrid memory cube (HMC) [7] to improve performance. 3D LSI design is not limited to the memory area. A 64 core processor with stacked memory was designed in [8], whose max throughput is up to 63.8GB/s. T. Zhang et al. [9] introduced a 5-tier stacked H.264 application with on-chip DRAM stacking. Even though the memory power is not given in [9], it can be estimated with [10] from the characteristics given in [9]. The resulting memory power is 492.5mW, which is still too high.

In this paper, a motion estimation processor (MEP) with a 3D stacked memory architecture is proposed to reduce the memory power and provide higher memory bandwidth. The MEP is a key encoding component of almost all modern video coding standards. As profiled in [11], motion estimation takes more than 50% of the total computation time in an H.264/AVC encoder when configured to use single-direction full search and a search range (SR) of 32. The MEP used in this design has an SR of 211 and 2 reference frames, which consumes even more. This work focuses on reducing the MEP power by 3D integration technology. Firstly, a memory die is designed and stacked with our previous MEP die. By adding face-to-face (F2F) pad and through silicon via (TSV) definitions, 2D electronic design automation (EDA) tools are extended to support the proposed 3D stacking architecture. Moreover, a novel memory controller is designed to control the data transmission and the timing between the memory die and the MEP die. Finally, 3D physical design is completed for the whole system, including TSV/F2F placement, floor plan optimization of the two dies, power network generation, and so on. As a result, compared with the 2D technology based MEP, the number of input/output (IO) pins is reduced by 77%. After optimizing the floor plans of the processor die and the memory die, the routing wire length is reduced by 13.4% and 50% respectively. The simulation results show that the power consumption of the whole design is 64.85mW and the max bandwidth is 14.06GB/s, which is much better than the state-of-the-art works.

II. ARCHITECTURE DESIGN

A. 2-Die Stacked 3D Architecture

Fig. 1 shows the side view of the stacked dies. Two dies (MEP and memory) of the same size are stacked face to face. The MEP die is put on the top for the following reasons. Firstly, all the IO cells are in the MEP die and are connected by TSVs to the landing pads on the backside of the MEP die (the upper surface of the 3D chip). The landing pads are connected to the leads by wire bonding. Secondly, the MEP consumes more power than the memory, so the MEP die generates more heat than the memory die. Placing the MEP on the top die therefore provides it with better cooling. With this arrangement, IO pins are not needed for the memory die, since all data transmission and power delivery for the memory die go through the F2F bonding pads. Therefore, without the limitation of IO pins, memories with 128/256 bit wide IO can be applied in this design.

Fig. 1. Side view of the stacked dies (wire, TSV, motion estimation processor, dummy silicon substrate, memory, package substrate, F2F bonding pads)

B. MEP Architecture

The top-level block diagram of the MEP is shown in Fig. 2. Based on our previous work [3], the MEP contains a hierarchical integer motion estimation (IME) component and a fractional motion estimation (FME) component. The IME component is separated into an IME coarse search engine (IMEC) and an IME refinement search engine (IMER). IMEC, IMER and FME work in parallel in a macroblock (MB)-level pipeline, and a memory scheduler component issues data requests to the static random access memory (SRAM) interfaces and schedules MB-level tasks. Two independent memory interfaces are employed to provide connectivity to the memory die. SRAM (A) is a 256 bit interface for buffering the reference frames, while SRAM (B) is a 128 bit interface for buffering the source frame and motion vectors (MVs). Data is stored to SRAM (A) after frame compression [4]. Two caches implemented with on-chip SRAMs serve reference frame data to IMEC and IMER [3]. The 24KB IMEC cache consists of 16 data memory banks with independent read addresses. Each bank is implemented with a 1R1W SRAM of 256x48 bits. The 512KB IMER cache is also composed of 16 data memory banks. Each bank is implemented with a 1RW SRAM of 2048x128 bits. The MEP can provide a max throughput of 1.59Gpixels/s. Therefore, real-time 720p@60fps video encoding can be supported at 8MHz.

Fig. 2. Top-level block diagram of MEP

C. Memory Architecture

The MEP requires two memories for data access, so 2 memories are designed in this work. Each memory is composed of many normal SRAM blocks. Because the chip size is limited and the SRAM blocks are large, the capacity is limited. Memory (A) is designed to be 14.25Mb because one 720p frame requires 7.032Mb of memory and two reference frames are stored in memory (A). Memory (B) is designed to be 8Mb because one source frame and some MVs are stored in memory (B).

Fig. 3(a) shows the block diagram of memory (A), which buffers the reference frames. It consists of three 4Mb banks and one 2.25Mb bank. The eight 32 bit 512Kb sub-banks in a 4Mb bank are combined to form a 256 bit wide memory bank. The small 2.25Mb bank, which includes four 64 bit 512Kb sub-banks and two 128 bit 128Kb sub-banks, is also a 256 bit wide memory bank. The 4 banks share the same data bus, while all sub-banks share the same address bus.

Fig. 3(b) shows the block diagram of memory (B), which buffers the source frame and MVs. It consists of four 2Mb banks. Each 2Mb bank contains four 32 bit 512Kb sub-banks, making up a 128 bit wide memory bank. These 4 banks share another data bus, and all sub-banks in memory (B) share another address bus.

Fig. 3. Architecture of memory: (a) block diagram of SRAM memory (A), (b) block diagram of SRAM memory (B)

To reduce the power consumption of the memory die, every memory bank automatically stays in standby mode until there is a data access to that bank. This approach saves 13.89% of the power. Table I summarizes the specification of these 2 memories. The address width of both memories is 16 bit. The synthesis results show that the memory die can run at 300MHz, so the max bandwidth can be up to 14GB/s.

TABLE I. Specification of memories
                        Memory (A)                    Memory (B)
Memory type             SRAM                          SRAM
Capacity                14.25Mb                       8Mb
Bit-width               256 bit                       128 bit
Address                 16 bit                        16 bit
Capacity of sub-bank    512Kb(I), 512Kb(II), 128Kb    512Kb

Fig. 4. Architecture of memory controller (controller A between the memory scheduler of the ME processor and memory A, the reference frame buffer; controller B between the memory scheduler and memory B, the source frame buffer)

Fig. 5. Backend design flow (inputs: processor and memory netlists, SDC files, DEF files, technology file, timing library, physical library; steps: manual floor plan, power/ground network generation, manual TSV and F2F pad placement, auto placement and routing with CTS and global/detail routing, fixing violations, inserting decap/fill cells, verification)

D. Architecture of Memory Controller

A double-data-rate (DDR) memory controller is integrated in the original MEP [3]. In this work, a novel SRAM memory controller is designed to take the place of the DDR memory controller. The memory controller has two main functions. First, it controls the data transmission between the MEP and the two memories.
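As a rough check on the two interfaces this controller serves, the capacity and bandwidth figures quoted in Section II.C can be reproduced with a few lines of arithmetic. The sketch below is not taken from the paper. It assumes 8 bits per stored pixel, that 1Mb means 2^20 bits, and that both interfaces move one full word per cycle at 300MHz; the function names are illustrative only.

```python
# Back-of-the-envelope check of the Section II.C memory figures.
# Assumptions (not stated in the paper): 8 bits per stored pixel,
# 1 Mb = 2**20 bits, 1 Kb = 2**10 bits, one word per interface per cycle.

KBITS_PER_MBIT = 1024


def frame_size_mbit(width, height, bits_per_pixel=8):
    """Size of one frame in Mb (2**20 bits)."""
    return width * height * bits_per_pixel / (1024 * 1024)


def capacity_mbit(sub_bank_groups):
    """Total capacity in Mb for a list of (count, size_in_Kb) sub-bank groups."""
    return sum(count * size_kb for count, size_kb in sub_bank_groups) / KBITS_PER_MBIT


def peak_bytes_per_second(clock_hz, bus_widths_bits):
    """Peak transfer rate when every interface moves one word per cycle."""
    return clock_hz * sum(bus_widths_bits) // 8


frame_720p = frame_size_mbit(1280, 720)
# Memory (A): three banks of eight 512Kb sub-banks, plus four 512Kb and two 128Kb.
memory_a = capacity_mbit([(3 * 8, 512), (4, 512), (2, 128)])
# Memory (B): four banks of four 512Kb sub-banks.
memory_b = capacity_mbit([(4 * 4, 512)])
peak = peak_bytes_per_second(300_000_000, [256, 128])

print(f"720p frame: {frame_720p:.3f} Mb, two frames: {2 * frame_720p:.3f} Mb")
print(f"memory (A): {memory_a} Mb, memory (B): {memory_b} Mb")
print(f"peak: {peak / 1e9:.2f} GB/s with 1 GB = 10^9 bytes")
print(f"peak: {peak / 1e6 / 1024:.2f} GB/s with 1 GB = 1024 x 10^6 bytes")
```

Under these assumptions one 720p frame takes about 7.03Mb, so two reference frames (about 14.06Mb) fit in the 14.25Mb of memory (A). The combined 384 bit interface gives 14.4GB/s in decimal units; the 14.06GB/s quoted in the introduction is matched if 1GB is counted as 1024 x 10^6 bytes, which suggests, but does not confirm, the convention behind that number.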
Second, it controls the timing of the MEP and the two memories. The memory controller consists of 2 independent controllers (controller A and controller B), as shown in Fig. 4. It responds to the data access requests from the memory scheduler in the MEP. Controller A handles the data transmission between memory A and the MEP, while controller B connects memory B and the MEP. Both memories support burst mode with a burst length of 8.

This design has some further benefits. Firstly, all the interfaces between the MEP die and the memory die are contained in this module, so changing the interfaces only requires modifying this module. Secondly, the MEP can be integrated into this design easily.

III. 3D PHYSICAL DESIGN

This section introduces the proposed 3D physical design. A design flow is also introduced in [9], but cell definition is not included there. We first define the F2F pad and the TSV by modifying the library exchange format (LEF) file and the timing library. Then the floor plans of the memory die and the MEP die, the placement of F2F pads and TSVs, and the power network are presented.

A. Backend Design Flow

Fig. 5 shows the backend design flow. The netlist file and the timing constraint file generated in front end design are imported into the EDA tool. Cadence Encounter is used for backend design. The MEP die and the memory die go through backend design separately, so the generated netlist has to be split into 2 netlist files before backend design begins. Several other files also have to be prepared beforehand, including the 65nm technology file, the timing library, the physical library, etc.

In the backend design flow, we first optimize the floor plan so that the EDA tool can achieve good results in the subsequent steps such as placement, clock tree synthesis (CTS) and routing. After floor planning, the power network needs to be designed.
Even though this is a low frequency, low power 3D LSI design, a strong power network is built to ensure power delivery. Thirdly, the TSVs and F2F pads are placed manually. TSVs are used in the IO area to connect the IO cells to the landing pads. Consequently, TSV placement is only performed on the MEP die. The remaining steps can be done automatically by the EDA tool. In the auto routing step, special settings are applied to reserve the top metal layer for F2F bonding.

B. F2F Pad and TSV Definitions

The side view of the defined TSV is shown in Fig. 6(a). The diameter of the TSV is 2um, while the diameter of the landing pad on the first metal layer is 5um. A large landing pad tolerates a large misalignment of the TSV and thus improves the yield. Fig. 6(b) illustrates the shape of the customized F2F pad. The diameter of the F2F pad is 3.4um and the F2F pad has 2 pins. The pin on the top metal layer is used for connecting to the other die, while the pin on the lower metal layer is used for connecting to the signal, power or ground (P/G) net.

Fig. 6. (a) Side view of TSV, (b) Shape of F2F pad

C. Floor Plan of Memory Die

Both memory A and memory B are integrated in the memory die. Memory B includes 16 memory blocks while memory A includes 30 memory blocks, i.e. there are 46 usable memory blocks in total. In addition, a backup block is placed in the memory die. These 47 blocks are placed in a 4384um by 4640um core area, as shown in Fig. 7. To minimize the chip size, the blocks are placed as close together as possible.

Floor planning is a key step in the backend design flow. It greatly and directly affects the auto placement and routing results. Several ideas are proposed to optimize the floor plan of the memory die. Firstly, to minimize the wire length, the blocks that share the same address bus and data bus are grouped together. As described in section II.C, all the banks of the same SRAM (SRAM A or B) share the address bus, so placing these banks together can greatly reduce the wire length. As shown in Fig. 7, the blocks covered by the white box are placed together since they all belong to SRAM A and share its address bus. Secondly, the triangle in the corner of each block indicates the orientation of the block and the locations of its pins. The location and the order of the pins in a block are fixed, but the orientation of the block can be set by the user. By placing the pins of the blocks as close to each other as possible and in the same order, the routing congestion can be reduced. Finally, within each SRAM (SRAM A or B), the blocks which share the same part of the data bus are put together to reduce the length of the routing wires. For example, as described in section II.C, the 4 blocks covered by the red box in Fig. 7 are sub-bank 0 of each bank in SRAM B. They share the same low 32 bit data bus, so they are placed next to each other. Consequently, after optimizing the floor plan, the total routing wire length is reduced by more than 50%.

Fig. 7. Floor plan of the memory die (the blocks belonging to SRAM (A) are placed together; sub-bank 0 of each bank in SRAM (B) shares the low 32 bit data bus; a triangle indicates the orientation of each block and the locations of its pins)

D. F2F Pads and TSVs Placement

In this design, TSVs are used in the IO area of the MEP die. Fig. 8 shows the TSVs inserted at the location of an IO pad. To improve the yield, redundant TSVs (32 per IO pad) are placed to connect each IO pad. The pitch of the redundant TSVs is 10um. There are 10 input cells, 14 output cells and 32 P/G cells in the processor die, i.e., 56 IO pads. So the total number of TSVs is 1792. Table II lists the TSV parameters.

Fig. 8. TSVs inserted in an IO pad

TABLE II. TSV parameters
Diameter        2um
Pitch           10um
Depth           6um
# per IO pad    32
Total number    1792

The locations of the F2F pads are decided by the floor plan of the memory die, as shown in Fig. 9(a). The F2F signal pads are placed in the white boxes, close to where the pins of the memory blocks are gathered. The order of the F2F signal pads is the same as that of the pins in the blocks, so that routing congestion can be reduced. Fig. 9(b) shows the F2F signal pads, whose pitch is 5um. There are 803 F2F signal pads in total. The F2F P/G pads are located on the core power ring and are enclosed by the red boxes. The enlarged view of the F2F P/G pads is shown in Fig. 9(c). The pitch of the F2F P/G pads is 10um, and there are 500 power pads and 500 ground pads. Table III lists the F2F pad parameters in one die.

Fig. 9. (a) Locations of the F2F pads, (b) F2F signal pads, (c) F2F P/G pads

TABLE III. F2F pad parameters
Diameter        3.4um
Pitch           Signal: 5um    P/G: 10um
Total number    Signal: 803    Power: 500    Ground: 500

E. Floor Plan of Processor

Fig. 10 shows the floor plan of the MEP die. Since the locations of the F2F pads are decided by the memory die, the floor plan of the MEP is optimized around the fixed F2F pads. As described in section II.B, the IMEC cache is composed of 16 memory banks and the IMER cache also consists of 16 memory banks. Each bank is composed of 2 RAM blocks. In order to reduce the routing wire length, these 2 blocks are put together. 2.2M logic gates and 72 cache blocks are placed in a 3340um by 3400um core area. The blocks of the IMER cache are placed at the bottom, while the blocks of the IMEC cache are placed at the top. The blocks are placed from the periphery towards the interior with proper orientation. Compared to the floor plan automatically generated by Encounter, the routing wire length is reduced by 13.4%.

The IO cells are mainly placed on the left and right sides of the IO area, since the top and bottom sides are occupied by F2F P/G pads. There are only 24 control signal cells and 32 P/G cells in the processor die, because most of the pins connecting to the memory die are replaced by the F2F signal pads. Compared to [3], the number of IO pins is reduced by 77%.

Fig. 10. Floor plan of MEP die

F. Power Network

A strong power delivery network ensures reliable operation of the circuits on a chip, especially in a 3D IC. Fig. 11 shows the power network of the processor die. The core area is surrounded by a wide P/G core ring which connects the inside and the outside. Inside the core area, horizontal power rails connected to the power ring supply power to the standard cells. Furthermore, vertical power stripes connected to the power ring and the power rails are added to reduce the IR-drop.
To enhance the power supply to the RAM blocks, block rings are also added, which are not shown in the figure. The F2F P/G pads are used for delivering power from the MEP die to the memory die, and they are connected to the power ring directly. The power network of the memory die is similar to that of the processor die, except that there are no P/G cells and the F2F P/G pads on the power ring act as the power source. The power network of the memory die is therefore not described again.

IV. SIMULATION RESULTS AND COMPARISON

The proposed architecture is synthesized with Synopsys Design Compiler using a 65 G standard cell library. Cadence Encounter is then used for backend design, including floor planning, power network generation, TSV and F2F placement, CTS, auto placement and routing, etc. Fig. 12 shows the layouts of the MEP die and the memory die, and the physical characteristics of the whole design are summarized in Table IV. The simulations are based on these layouts. Since this is a low frequency and low power design, verification steps such as signal integrity (SI) analysis and thermal simulation are not necessary. The results of the 3D power analysis and the 3D IR-drop simulation are given in the following parts.

TABLE IV. Physical characteristics of this design
Process technology             65nm
Chip size                      5000um x 5000um
Core size (processor die)      3340um x 3400um
Core size (memory die)         4384um x 4640um
Frequency@voltage              8MHz@1.2V
# of TSV                       1792
# of F2F signal/P/G pad        803/500/500
# of signal/PG IO cell         24/32

A. 3D power analysis

3D power analysis is performed with Cadence Encounter Power System (EPS). The physical library, timing library and technology file, as well as the netlist file, design exchange format (DEF) file, standard delay format file, etc., are imported into EPS. Then the analysis method is set to static and the corner is set to normal (1.2V at 25 °C). The frequency is read from the timing constraint file and the toggle rate is set. Finally, EPS reports that the total power is 64.85mW.
The power of the processor die and the memory die is 37.67mW and 27.18mW respectively.

Fig. 11. Power network of the processor die (IO cell, F2F pad, power ring, ground ring)

Fig. 12. Layout of processor die (left) and memory die (right)

B. 3D IR-Drop simulation

IR-drop simulation is also done with EPS. Firstly, a cell list is prepared to create the power grid library. Then the analysis method is set to static mode, the temperature is set to 25 °C and the locations of the power sources are set. Here the power source of the processor die is the power cell and the power source of the memory die is the F2F power pad. Thirdly, the result files of the power analysis are imported. After these steps, EPS analyzes the IR-drop automatically. The worst IR-drop of the processor die is 5.478mV, while the worst IR-drop of the memory die is only 1.555mV. Note that the voltage of the power net is 1.05V, so there is plenty of margin for the 3D stacked case.

C. Comparison

We do not compare this design with [1] and [2] because their functions are different. Ref. [1] is a video decoder, which has no MEP. Ref. [2] is a whole encoder that includes not only an MEP but also other parts. An MEP with a system-in-silicon architecture is introduced in [12], whose core power and memory power are 2383mW and 190mW respectively, for a total of 2573mW. After normalization, its power and energy efficiency are 432.5mW and 6.952nJ/pixel respectively. Table V shows the specification of this design and the power comparison to [12].

TABLE V. Design specification and power comparison
                        [12]                              This design
Memory type             System-in-Silicon DRAM / on chip  3D stacked memory / on chip
Technology              180nm/110nm                       65nm
Frequency               200MHz/25MHz@1.8V                 8MHz@1.2V
Throughput              1080p@30fps                       720p@60fps
Core power              2383mW                            37.67mW
Norm. core power (a)    382.5mW                           37.67mW
Memory power            190mW                             27.18mW
Norm. memory power      50mW                              27.18mW
Norm. total power       432.5mW                           64.85mW
Energy efficiency (b)   6.952nJ/pixel                     1.173nJ/pixel
a. Power normalized to 65nm (P65 = P180 / 6.23 = P110 / 3.8)
b. Energy efficiency = Norm. total power / throughput

TABLE VI. Bandwidth comparison
                     [9]               [6]         This design
Footprint            12.3mm x 21.8mm   -           5mm x 5mm
# of tier/die
Max frequency        133MHz            -           300MHz
Working frequency    60MHz             200MHz      8MHz
Data width
Max bandwidth        8.5GB/s           12.8GB/s    14GB/s
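The normalized power and energy efficiency entries of Table V follow directly from its two footnotes. The short sketch below reproduces them; it is only an illustration of the arithmetic, the function names are made up here, and throughput is taken as luma pixels per second (width x height x frame rate), which is an assumption that happens to match the published values.

```python
# Reproduces the normalized power and energy-efficiency rows of Table V.
# Assumption: throughput = width * height * fps in pixels per second.

SCALE_180NM_TO_65NM = 6.23  # footnote a: P65 = P180 / 6.23
SCALE_110NM_TO_65NM = 3.8   # footnote a: P65 = P110 / 3.8


def pixels_per_second(width, height, fps):
    return width * height * fps


def energy_nj_per_pixel(power_mw, throughput_pixels_per_s):
    """Footnote b: energy efficiency = normalized total power / throughput."""
    watts = power_mw * 1e-3
    return watts / throughput_pixels_per_s * 1e9


# Reference [12]: 180nm core, 110nm memory, 1080p@30fps.
ref_core = 2383 / SCALE_180NM_TO_65NM
ref_memory = 190 / SCALE_110NM_TO_65NM
ref_total = ref_core + ref_memory
ref_energy = energy_nj_per_pixel(ref_total, pixels_per_second(1920, 1080, 30))

# This design: already 65nm, 720p@60fps.
this_total = 37.67 + 27.18
this_energy = energy_nj_per_pixel(this_total, pixels_per_second(1280, 720, 60))

print(f"[12]: core {ref_core:.1f} mW, memory {ref_memory:.1f} mW, total {ref_total:.1f} mW")
print(f"[12]: {ref_energy:.3f} nJ/pixel, this design: {this_energy:.3f} nJ/pixel")
print(f"ratio: {ref_energy / this_energy:.2f}")
```

With these inputs the ratio comes out a little below 6, which is consistent with the "one sixth" claim made for this design.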
Thanks to algorithm optimization, the MEP can process 9 pixels per cycle on average, so it can encode 720p@60fps video sequences with 2 reference frames at 8MHz. The power consumption of the MEP die and the memory die at this frequency is as shown in Table V. Thanks to the frequency reduction and the 3D integration, the energy efficiency of this design is 1.173nJ/pixel, which is only one sixth of that of [12].

The bandwidth comparison is given in Table VI. A 3D implementation of an H.264 encoder is introduced in [9], whose max bandwidth is 8.5GB/s. The max bandwidth of this design is 14GB/s, which is about 1.64 times that of [9]. The max bandwidth of this design is also a little higher than that of the wide IO single data rate memory [6], whose bandwidth is 12.8GB/s.

V. CONCLUSION

In this paper, an MEP with a 3D stacked memory architecture is proposed to reduce the memory power and provide higher bandwidth. By adding F2F pad and TSV definitions, 2D EDA tools are extended to support the proposed 3D stacking architecture. Furthermore, a novel memory controller is designed to control the data transmission and the timing between the memory die and the MEP die. Finally, 3D physical design is completed for the whole system, including floor plan optimization of the two dies, TSV/F2F placement, power network generation, etc. As a result, compared with the 2D technology based MEP, the number of IO pins is reduced by 77%. After optimizing the floor plans of the processor die and the memory die, the routing wire length is reduced by 13.4% and 50% respectively. The simulation results show that the power consumption and the max bandwidth of the whole design are 64.85mW and 14GB/s respectively, which is much better than the state-of-the-art.

REFERENCES

[1] Sze, Vivienne, et al. "A 0.7-V 1.8-mW H.264/AVC 720p Video Decoder". Solid-State Circuits, IEEE Journal of (2009).
[2] Yu-Kun Lin, et al. "A 242mW 10mm2 1080p H.264/AVC High-Profile Encoder Chip". Solid-State Circuits Conference, ISSCC Digest of Technical Papers. IEEE International.
[3] Zhou, Jinjia, et al. "A 1.59 Gpixel/s Motion Estimation Processor with -211-to-211 Search Range for UHDTV Video Encoder". VLSI Circuits (VLSIC), 2013 Symposium on. IEEE, C286-C287.
[4] Zhou, Dajiang, et al. "A 530 Mpixels/s 4096x2160@60fps H.264/AVC High Profile Video Decoder Chip". Solid-State Circuits, IEEE Journal of 46.4 (2011).
[5] Uksong Kang, et al. "8 Gb 3-D DDR3 DRAM using Through-Silicon-Via Technology". Solid-State Circuits, IEEE Journal of 45.1 (2010).
[6] Kim, Jung-Sik, et al. "A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM with 4x128 I/Os using TSV Based Stacking". Solid-State Circuits, IEEE Journal of 47.1 (2012).
[7] Jeddeloh, Joe, and Brent Keeth. "Hybrid Memory Cube New DRAM Architecture Increases Density and Performance". VLSI Technology (VLSIT), 2012 Symposium on. IEEE.
[8] Healy, M. B., et al. "Design and Analysis of 3D-MAPS: A Many-Core 3D Processor with Stacked Memory". Custom Integrated Circuits Conference (CICC), 2010 IEEE.
[9] Tao Zhang, et al. "A 3D SoC Design for H.264 Application with on-Chip DRAM Stacking". 3D Systems Integration Conference (3DIC), 2010 IEEE International.
[10] ator/ddr3_power_calc.xlsm
[11] Woong IL Choi, Byeungwoo Jeon, and Jechang Jeong. "Fast Motion Estimation with Modified Diamond Search for Variable Motion Block Sizes". Image Processing, ICIP Proceedings International Conference on, vol. 3.
[12] Kumagai, K., et al. "System-in-Silicon Architecture and its Application to H.264/AVC Motion Estimation for 1080HDTV". Solid-State Circuits Conference, ISSCC Digest of Technical Papers. IEEE International.

A Study of IR-drop Noise Issues in 3D ICs with Through-Silicon-Vias

A Study of IR-drop Noise Issues in 3D ICs with Through-Silicon-Vias A Study of IR-drop Noise Issues in 3D ICs with Through-Silicon-Vias Moongon Jung and Sung Kyu Lim School of Electrical and Computer Engineering Georgia Institute of Technology, Atlanta, Georgia, USA Email:

More information

On GPU Bus Power Reduction with 3D IC Technologies

On GPU Bus Power Reduction with 3D IC Technologies On GPU Bus Power Reduction with 3D Technologies Young-Joon Lee and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta, Georgia, USA yjlee@gatech.edu, limsk@ece.gatech.edu Abstract The

More information

A Design Tradeoff Study with Monolithic 3D Integration

A Design Tradeoff Study with Monolithic 3D Integration A Design Tradeoff Study with Monolithic 3D Integration Chang Liu and Sung Kyu Lim Georgia Institute of Techonology Atlanta, Georgia, 3332 Phone: (44) 894-315, Fax: (44) 385-1746 Abstract This paper studies

More information

Physical Design Implementation for 3D IC Methodology and Tools. Dave Noice Vassilios Gerousis

Physical Design Implementation for 3D IC Methodology and Tools. Dave Noice Vassilios Gerousis I NVENTIVE Physical Design Implementation for 3D IC Methodology and Tools Dave Noice Vassilios Gerousis Outline 3D IC Physical components Modeling 3D IC Stack Configuration Physical Design With TSV Summary

More information

Design and Analysis of Ultra Low Power Processors Using Sub/Near-Threshold 3D Stacked ICs

Design and Analysis of Ultra Low Power Processors Using Sub/Near-Threshold 3D Stacked ICs Design and Analysis of Ultra Low Power Processors Using Sub/Near-Threshold 3D Stacked ICs Sandeep Kumar Samal, Yarui Peng, Yang Zhang, and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta,

More information

3D systems-on-chip. A clever partitioning of circuits to improve area, cost, power and performance. The 3D technology landscape
