Parallel Discrete Event Simulation of Transaction Level Models


Rainer Dömer, Weiwei Chen, Xu Han
Center for Embedded Computer Systems
University of California, Irvine, USA

Abstract

Describing Multi-Processor Systems-on-Chip (MPSoC) at the abstract Electronic System Level (ESL) is one task; validating them efficiently is another. Here, fast and accurate system-level simulation is critical. Recently, Parallel Discrete Event Simulation (PDES) has regained significant traction because it promises to exploit the parallelism available in today's multi-core CPU hosts. This paper discusses the parallel simulation of Transaction-Level Models (TLMs) described in System-Level Description Languages (SLDLs), such as SystemC and SpecC. We review how PDES exploits the explicit parallelism in ESL design models and uses the parallel processing units available on multi-core host PCs to significantly reduce simulation time. We show experimental results for two highly parallel benchmarks as well as for two actual embedded applications.

I. INTRODUCTION

Describing complex Multi-Processor Systems-on-Chip (MPSoC) at the abstract Electronic System Level (ESL) is one difficult task. Getting the models to work correctly is another. The large size and great complexity of modern embedded systems, with their heterogeneous components, complex interconnects, and sophisticated functionality, pose enormous challenges to system validation and debugging. Accurate yet fast ESL simulation is key to effective and efficient design validation and implementation.

In this paper, we review recent approaches for simulating MPSoC designs described at the system or transaction level. We focus in particular on parallel simulation techniques. The well-known approach of Parallel Discrete Event Simulation (PDES) has recently regained a lot of traction due to the inexpensive availability of parallel processing capabilities in today's multi-core CPU hosts.
PDES holds the promise of mapping the explicit parallelism described in Transaction-Level Models (TLMs) efficiently onto the parallel cores available on the simulation host. As such, it can exploit the available parallelism and significantly reduce the simulation time.

A. Parallel execution in SystemC and SpecC

In this paper, we discuss the parallel simulation of TLMs described in the System-Level Description Languages (SLDLs) SystemC [7] and SpecC [6]. We note that, in contrast to flat and sequential C/C++ code, both languages explicitly specify the key ESL concepts in the design model, including behavioral and structural hierarchy, potential for parallelism and pipelining, communication channels, and constraints on timing. Having these intrinsic features of the application explicit in the model enables efficient design space exploration and automatic refinement by computer-aided design (CAD) tools.

While both SystemC and SpecC model parallel execution explicitly, their parallel execution semantics differ significantly. Both SLDLs define their execution semantics through discrete event (DE) based scheduling of multiple concurrent threads, which are managed and coordinated by a central simulation kernel. However, SystemC requires cooperative multi-threading, whereas SpecC allows preemptive multi-threading [4].

SystemC semantics guarantee the uninterrupted execution of threads (including their accesses to any shared variables), which makes it hard to implement a truly parallel simulator. In short, complex inter-dependency analysis over all variables in the system model is a prerequisite to parallel multi-core simulation in SystemC.

In contrast to the cooperative scheduling mandated for SystemC models, multi-threading in SpecC is explicitly defined as preemptive. This makes parallel simulation significantly easier because only shared variables in channels (which act as monitors) need to be protected for mutually exclusive access.
Such protection can easily be inserted automatically into the model [2].

B. Related Work

Most ESL design frameworks today still rely on regular Discrete Event (DE) simulators which issue only a single thread at any time to avoid complex synchronization of the concurrent threads. As such, the simulator kernel becomes an obstacle to improving simulation performance on multi-core host machines [8].

A well-studied solution to this problem is Parallel Discrete Event Simulation (PDES) [1, 5, 11]. PDES for the hardware description languages VHDL and Verilog is discussed in [9, 10]. To apply PDES solutions to today's SLDLs and actually allow parallel execution on multi-core processors, the simulator kernel needs to be modified to issue and properly synchronize multiple OS kernel threads in each scheduling step.

The SystemC simulator kernel has been extended accordingly in [3] and [12]. Clusters with single-core nodes are targeted in [3], which uses multiple schedulers on different processing nodes and defines a master node for time synchronization. A parallelized SystemC kernel for fast simulation on SMP machines is presented in [12], which issues multiple runnable OS kernel threads in each simulation cycle.

The SpecC-based scheduling approach described in [2] is very similar. However, it features a detailed synchronization protection mechanism which automatically instruments any user-defined and hierarchical channels. As such, this approach does not need to work around the cooperative SystemC execution semantics, nor does it require a specially prepared channel library.

II. SYSTEM-LEVEL DISCRETE EVENT SIMULATION

System design models with explicitly specified parallelism carry the promise of increased simulation performance through parallel execution of the threads on the available hardware resources of a multi-core host. In this section, we first review the DE scheduling scheme used in traditional SLDL simulators, which issues only a single thread at any time. We then discuss the extended scheduling algorithm with true multi-threading capability on symmetric multiprocessing (multi-core) machines.

With the exception of the different requirements for protection of communication and synchronization between the concurrent threads, as outlined in Section I.A above, the following sections apply equally to both the SystemC and SpecC SLDLs.

A. Traditional Discrete Event Simulation

In traditional system-level simulation, a regular DE simulator is used. Threads are created for the explicit parallelism described in the models (e.g. par{} and pipe{} statements in SpecC, and SC_THREADs in SystemC). The threads are generally managed in the scheduler by use of queues, such as READY, which contains all threads that are ready to execute, and WAIT, which contains all waiting threads. When running, the threads communicate via events and shared variables or channels, and advance simulation time using wait-for-time constructs.

Traditional DE simulation is driven by events and simulation time advances. Whenever events are delivered or time increases, the scheduler is called to move the simulation forward. As shown in Fig. 1, at any time the traditional scheduler runs a single thread which is picked from the READY queue. Within a delta-cycle, the choice of the next thread to run is nondeterministic (by definition).

If the READY queue is empty, the scheduler fills the queue again by waking threads that have received events they were waiting for. These are taken out of the WAIT queue and a new delta-cycle begins. If the READY queue is still empty after event delivery, the scheduler advances the simulation time, moves all threads with the earliest timestamp from the WAITFOR queue into the READY queue, and resumes execution (timed-cycle). Note that at any time, there is only one thread actively executing in the traditional simulation.

[Fig. 1: Traditional DE scheduler.]

B. Parallel Discrete Event Simulation

A parallel DE scheduler works basically the same way as the traditional scheduler, with one exception: in each cycle, it picks multiple threads from the READY queue and runs them in parallel on the available processor cores. For SLDL simulators in particular, the parallel scheduler picks and runs as many threads as there are CPU cores available [footnote 1]. In other words, it keeps all parallel processing resources as busy as possible.

[Fig. 2: Parallel DE scheduler.]

Fig. 2 shows the extended control flow of the parallel scheduler. Note the added loop at the left, which issues running threads as long as CPU cores are available and the READY queue still has candidates.

Note that for the traditional sequential scheduler any threading model (including user-level or kernel-level threads) is acceptable since only one thread is running at any time. For the parallel scheduler to be effective on a multi-core platform, in contrast, OS kernel threads are necessary. Here, the underlying operating system needs to be aware of the multiple simulator threads so that it can utilize the available parallel CPU cores.

Footnote 1: Scheduling more threads to run than there are cores available would be possible, but this would hurt the simulation performance due to unnecessary preemptive scheduling by the underlying operating system.
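The control flow of Section II can be sketched in a few lines of Python. This is an illustrative model only, not the authors' simulator: the function name `parallel_de_schedule`, the request tuples, and the example threads are assumptions made for this sketch, and the threads of a batch are executed one after another where a real parallel simulator would run them on separate OS kernel threads. Setting `num_cores` to 1 gives the traditional scheduler of Fig. 1.

```python
import heapq
from collections import deque


def parallel_de_schedule(threads, num_cores):
    """Replay the control flow of a (simplified) parallel DE scheduler.

    threads:   dict mapping a thread name to a generator that yields
               ("wait_for", delay), ("wait", event) or ("notify", event).
               (Hypothetical request format, chosen for this sketch.)
    num_cores: maximum number of threads issued together.
    Returns a list of (sim_time, delta_count, [names issued together]).
    """
    ready = deque(threads)   # READY: runnable threads
    wait = {}                # WAIT: thread name -> event it waits for
    waitfor = []             # WAITFOR: heap of (wake_time, seq, name)
    notified = set()         # events notified in the current delta-cycle
    now, delta, seq, log = 0, 0, 0, []

    while True:
        while ready:
            # Parallel extension: issue up to num_cores threads at once.
            batch = [ready.popleft()
                     for _ in range(min(num_cores, len(ready)))]
            log.append((now, delta, batch))
            for name in batch:   # stand-in for running on separate cores
                # Resume the thread until it blocks or terminates.
                for kind, arg in threads[name]:
                    if kind == "notify":
                        notified.add(arg)
                    elif kind == "wait":
                        wait[name] = arg
                        break
                    else:  # "wait_for"
                        seq += 1
                        heapq.heappush(waitfor, (now + arg, seq, name))
                        break
        # READY is empty: deliver events and start a new delta-cycle.
        woken = [n for n, ev in wait.items() if ev in notified]
        notified.clear()
        if woken:
            for n in woken:
                del wait[n]
                ready.append(n)
            delta += 1
            continue
        # No event was delivered: advance time (timed-cycle) or stop.
        if not waitfor:
            return log
        now, delta = waitfor[0][0], 0
        while waitfor and waitfor[0][0] == now:
            ready.append(heapq.heappop(waitfor)[2])


def make_threads():
    """Build four small example threads (hypothetical workload)."""
    def producer():
        yield ("wait_for", 10)
        yield ("notify", "data")

    def consumer():
        yield ("wait", "data")
        yield ("wait_for", 5)

    def worker():
        yield ("wait_for", 10)

    return {"prod": producer(), "cons": consumer(),
            "w1": worker(), "w2": worker()}


for cores in (1, 2):
    steps = parallel_de_schedule(make_threads(), cores)
    print(cores, "core(s):", len(steps), "scheduling steps")
```

With these example threads, one core needs nine scheduling steps and two cores need six, because threads that are ready in the same delta-cycle are issued together. The sketch deliberately omits the channel protection discussed in Section I.A, which a real parallel simulator needs once batches truly run concurrently.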

[Fig. 3: Simulation results for highly parallel benchmark designs.]

III. PARALLEL SYSTEM-LEVEL BENCHMARKS

To demonstrate the potential of parallel simulation, we have designed two highly parallel benchmark design models: a parallel floating-point multiplication example and a parallel recursive Fibonacci calculator. Both benchmark designs are system-level models specified in the SpecC SLDL.

For all our experiments, we use a symmetric multiprocessing (SMP) capable server running 64-bit Fedora 12 Linux. The SMP hardware consists of 2 Intel(R) Xeon(R) X5650 processors running at 2.67 GHz [footnote 2]. Each CPU contains 6 parallel cores, each of which supports 2 hyperthreads. Thus, in total the server hardware supports up to 24 threads running in parallel.

Footnote 2: To ensure consistent timing measurements, we have disabled the dynamic frequency scaling and turbo mode of the processors.

A. Parallel floating-point multiplications

Our first parallel benchmark, fmul, is a simple stress test for parallel floating-point calculations. Specifically, fmul creates 256 parallel instances which each perform 10 million double-precision floating-point multiplications. As an extreme example, the parallel threads are completely independent, i.e. they do not communicate or share any variables.

The chart in Fig. 3 shows the experimental results for our parallel simulator when executing this benchmark. To demonstrate the scalability of parallel execution on our server, we vary the number of parallel threads admitted by the parallel scheduler (the value #CPUs in Fig. 2) between 1 and 30. The elapsed simulator run times are shown in the red line with circle points. Clearly, as more and more CPU cores are used, the elapsed simulation time decreases in near-logarithmic fashion and flattens when no more CPU cores become available (> 24).
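Why the curve flattens beyond 24 admitted threads can be illustrated with a deliberately idealized model. The sketch below assumes that all 256 fmul instances take the same time, that there is no scheduling overhead, and that each of the 24 hardware threads behaves like a full core; none of these assumptions comes from the measurements, and the helper names are made up for this example. Under these assumptions the elapsed time is proportional to the number of "rounds" needed to work through all instances.

```python
import math

HW_THREADS = 2 * 6 * 2   # 2 CPUs x 6 cores x 2 hyperthreads (from the text)
INSTANCES = 256          # parallel fmul instances (from the text)


def ideal_rounds(admitted, instances=INSTANCES, hw_threads=HW_THREADS):
    """Rounds of equal-length tasks when `admitted` threads may run at once.

    Admitting more threads than the hardware supports gives no benefit
    in this model, so the effective width is capped at hw_threads.
    """
    effective = min(admitted, hw_threads)
    return math.ceil(instances / effective)


def ideal_speedup(admitted):
    """Speedup over the single-thread case in the idealized model."""
    return ideal_rounds(1) / ideal_rounds(admitted)


for n in (1, 2, 4, 8, 16, 24, 30):
    print(n, ideal_rounds(n), round(ideal_speedup(n), 2))
```

The model is an upper bound rather than a prediction: it yields the same number of rounds for 24 and 30 admitted threads, which matches the flattening of the measured curve, but it ignores every source of overhead.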

When plotting the relative speedup, as shown in the green line with triangle points, one can see that, as expected, the simulation speed increases in nearly linear manner the more parallel cores are used. The maximal speedup is about 17x for this example on our 24-core server.

B. Parallel Fibonacci calculation

Our second parallel benchmark, fibo, calculates the Fibonacci series in parallel and recursive fashion. Recall that a Fibonacci number is defined as the sum of the previous two Fibonacci numbers, fib(n) = fib(n-1) + fib(n-2), where the first two numbers are fib(0) = 0 and fib(1) = 1. Our fibo design parallelizes the Fibonacci calculation by letting two parallel units compute the two previous numbers in the series. This parallel decomposition continues up to a user-specified depth limit (in our case 8), from where on the classic recursive calculation method is used.

In contrast to the fmul example above, the fibo benchmark uses shared variables to communicate the input and calculated output values between the units, as well as a few counters to keep track of the actual number of parallel threads (for statistical purposes). Thus, the threads are not fully independent of each other. Also, the computational load is not evenly distributed among the instances because the number of calculations increases by a factor of approximately 1.618 (the golden ratio) for every next number.

The fibo simulation results are also plotted in Fig. 3. The elapsed simulator run times, shown in the dark red line with square points, follow the same decreasing curve as in the fmul example. Accordingly, the relative simulation speedup, shown in the blue line with diamond points, also shows the same expected behavior. Speed increases in nearly linear fashion until it reaches saturation at about a factor of 14x.

When comparing the fmul and fibo benchmark results, we notice a more regular behavior of the fmul example due to its even load and zero inter-thread communication.

IV. EXPERIMENTAL RESULTS FOR EMBEDDED APPLICATIONS

To demonstrate the improved simulation speed of the parallel multi-core simulator for actual embedded applications, we use an H.264 video decoder and a JPEG encoder application [2].

A. H.264 AVC Decoder with Parallel Slice Decoding

The H.264 Advanced Video Coding (AVC) standard [13] is widely used in video applications, such as internet streaming, disc storage, and television services. It provides high-quality video at less than half the bit rate of its predecessors H.263 and H.262. At the same time, it requires more computing resources for both video encoding and decoding.

The H.264 decoder takes as input a stream of encoded video frames. A frame can be further split into one or more slices, as illustrated in the upper right part of Fig. 4. Notably, slices are independent of each other in the sense that decoding one slice does not require any data from the other slices. For this reason, parallelism exists at the slice level, and parallel slice decoders can be used to decode multiple slices in a frame simultaneously.

[Fig. 4: Parallelized H.264 decoder model.]

Fig. 4 shows the block diagram of our H.264 decoder model. The decoding of a frame begins with reading new slices from the input stream. These are then dispatched into four parallel slice decoders. Finally, a synchronizer block completes the decoding by applying a deblocking filter to the decoded frame. All the blocks communicate via FIFO channels. Internally, each slice decoder consists of the regular H.264 decoder functions, such as entropy decoding, inverse quantization and transformation, motion compensation, and intra-prediction.

For our experiment, we use the stream Harbour with 299 video frames, each with 4 slices of equal size. As discussed in [2], 68.4% of the total computation time is spent in the parallel slice decoding. Thus, while the maximum parallelism is 4 in this model, the effective maximum speedup we can gain is only 2.05 (due to the sequential parts).

The table in Fig. 5 lists the simulator run times for TLMs at different abstraction levels. We compare the elapsed simulation time against the sequential reference simulator (the table also includes the CPU load reported by the Linux OS). We can clearly see that the 24 available cores on the server are under-utilized. The reason is the limited parallelism available in the model (an effective maximum speedup of 2.05). The measured speedups are also somewhat lower than this maximum due to the overhead introduced in parallelizing and synchronizing the slice decoders.

Fig. 5: Simulation results for H.264 Decoder example.

Simulator:              Reference               Multi-Core
Par. issued threads:    n/a                     24
Model                   sim. time (CPU load)    sim. time (CPU load)    speedup
spec                    37.71s (90%)            21.01s (173%)           1.79
arch                    38.26s (90%)            21.26s (171%)           1.80
sched                   38.10s (91%)            22.31s (166%)           1.71
net                     38.18s (91%)            22.45s (166%)           1.70
tlm                     39.17s (90%)            22.32s (167%)           1.75
comm                    49.92s (90%)            33.55s (156%)           1.49

B. JPEG Image Encoder

As a second embedded application, the table in Fig. 6 shows the simulation speedup for a JPEG Encoder example which performs its DCT, Quantization and Zigzag modules for the 3 color components in parallel, followed by a sequential Huffman encoder at the end.

Fig. 6: Simulation results for JPEG Encoder example.

Simulator:              Reference               Multi-Core
Par. issued threads:    n/a                     24
Model                   sim. time (CPU load)    sim. time (CPU load)    speedup
spec                    3.46s (93%)             2.82s (146%)            1.23
arch                    3.92s (94%)             2.83s (147%)            1.39
sched                   4.64s (93%)             3.61s (137%)            1.29
net                     13.87s (97%)            45.96s (130%)           0.30
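The figures reported for the two applications can be cross-checked with a short calculation. The sketch below assumes that the effective maximum speedup of 2.05 quoted for the H.264 decoder follows Amdahl's law with a parallel fraction of 68.4% spread over 4 slice decoders (the source does not name the formula), and it recomputes the speedup columns of Fig. 5 and Fig. 6 as the ratio of reference time to multi-core time. The function and variable names are chosen for this example only.

```python
def amdahl_bound(parallel_fraction, units):
    """Upper bound on speedup when only part of the work is parallel."""
    sequential = 1.0 - parallel_fraction
    return 1.0 / (sequential + parallel_fraction / units)


def speedup(reference_s, multicore_s):
    """Speedup as the ratio of elapsed simulation times."""
    return reference_s / multicore_s


# (reference time, multi-core time) in seconds, copied from Fig. 5 and Fig. 6.
H264 = {
    "spec": (37.71, 21.01), "arch": (38.26, 21.26), "sched": (38.10, 22.31),
    "net": (38.18, 22.45), "tlm": (39.17, 22.32), "comm": (49.92, 33.55),
}
JPEG = {
    "spec": (3.46, 2.82), "arch": (3.92, 2.83),
    "sched": (4.64, 3.61), "net": (13.87, 45.96),
}

h264_bound = amdahl_bound(0.684, 4)
print("H.264 bound:", round(h264_bound, 2))
for name, table in (("H.264", H264), ("JPEG", JPEG)):
    for model, (ref, par) in table.items():
        print(name, model, round(speedup(ref, par), 2))
```

Under this assumption the bound evaluates to about 2.05, and every measured H.264 speedup stays below it. For the JPEG net model the ratio drops below 1, which corresponds to the slowdown reported in Fig. 6.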

Again, the available parallelism in the model is low (at most 3 parallel threads, followed by a significant sequential part). Some speedup is gained by the multi-core parallel simulator for the higher-level models (spec, arch, sched). However, the simulator performance decreases for the model at the lowest abstraction level (net) due to the high number of bus transactions and arbitrations, which are not parallelized.

V. CONCLUDING REMARKS

Parallel Discrete Event Simulation (PDES) has recently come into fashion again. PDES is highly desirable for ESL design due to the constantly rising complexity of embedded systems, which makes accurate and fast simulation challenging.

After a brief review of the essential differences between traditional discrete event simulation and PDES, we have examined two highly parallel benchmark applications, fmul and fibo, and two actual embedded design examples, an H.264 decoder and a JPEG encoder. While the measured simulation times of the parallel fmul and fibo benchmarks promise a linear increase in simulation speed due to their excellent scalability, the speedup achieved for the two embedded applications is quite limited. Nevertheless, the measured speedup is significant (nearly 2x) and can make a real difference in shortening the validation time of such systems.

Given the need for higher simulation speeds and the demonstrated potential of parallel simulation, it becomes clear that PDES is and will be an area of active research. Notably, the basic PDES algorithm discussed in Section II.B is very straightforward, and advanced techniques, including conservative and optimistic approaches, can further improve the simulation speed by optimizing the scheduling around synchronization barriers. However, all efforts are limited by the amount of exposed parallelism in the application.

We can conclude that the parallelism available in the application, which in TLMs is explicitly specified by SLDL constructs, can be exploited by parallel simulators and then results in an effective reduction of simulator run time. However, the Grand Challenge remains: the problem of how to efficiently parallelize applications.

ACKNOWLEDGMENTS

This work has been supported in part by funding from the National Science Foundation (NSF) under an NSF research grant. The authors thank the NSF for the valuable support. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

[1] K. Chandy and J. Misra. Distributed Simulation: A Case Study in Design and Verification of Distributed Programs. IEEE Transactions on Software Engineering, SE-5(5), Sept.
[2] W. Chen, X. Han, and R. Dömer. Multiprocessor SoC Platforms: A Component-Based Design Approach. IEEE Design and Test of Computers, 28(3):20-31, May/June.
[3] B. Chopard, P. Combes, and J. Zory. A Conservative Approach to SystemC Parallelization. In V. N. Alexandrov, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, editors, International Conference on Computational Science (4), volume 3994 of Lecture Notes in Computer Science. Springer.
[4] R. Dömer, W. Chen, X. Han, and A. Gerstlauer. Multi-core parallel simulation of system-level description languages. In Proceedings of the Asia and South Pacific Design Automation Conference (ASPDAC), Yokohama, Japan, January.
[5] R. Fujimoto. Parallel Discrete Event Simulation. Communications of the ACM, 33(10):30-53, Oct.
[6] D. D. Gajski, J. Zhu, R. Dömer, A. Gerstlauer, and S. Zhao. SpecC: Specification Language and Design Methodology. Kluwer.
[7] T. Grötker, S. Liao, G. Martin, and S. Swan. System Design with SystemC. Kluwer.
[8] K. Huang, I. Bacivarov, F. Hugelshofer, and L. Thiele. Scalably Distributed SystemC Simulation for Embedded Applications. In International Symposium on Industrial Embedded Systems (SIES 2008), June.
[9] T. Li, Y. Guo, and S.-K. Li. Design and Implementation of a Parallel Verilog Simulator: PVSim. In International Conference on VLSI Design.
[10] E. Naroska. Parallel VHDL simulation. In DATE '98: Proceedings of the Conference on Design, Automation and Test in Europe.
[11] D. Nicol and P. Heidelberger. Parallel Execution for Serial Simulators. ACM Transactions on Modeling and Computer Simulation, 6(3), July.
[12] E. P, P. Chandran, J. Chandra, B. P. Simon, and D. Ravi. Parallelizing SystemC Kernel for Fast Hardware Simulation on SMP Machines. In PADS '09: Proceedings of the 2009 ACM/IEEE/SCS 23rd Workshop on Principles of Advanced and Distributed Simulation, pages 80-87, Washington, DC, USA. IEEE Computer Society.
[13] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7), July.


More information

Parallel Computing Concepts. CSInParallel Project

Parallel Computing Concepts. CSInParallel Project Parallel Computing Concepts CSInParallel Project July 26, 2012 CONTENTS 1 Introduction 1 1.1 Motivation................................................ 1 1.2 Some pairs of terms...........................................

More information

Multi-level Design Methodology using SystemC and VHDL for JPEG Encoder

Multi-level Design Methodology using SystemC and VHDL for JPEG Encoder THE INSTITUTE OF ELECTRONICS, IEICE ICDV 2011 INFORMATION AND COMMUNICATION ENGINEERS Multi-level Design Methodology using SystemC and VHDL for JPEG Encoder Duy-Hieu Bui, Xuan-Tu Tran SIS Laboratory, University

More information

RISC Compiler and Simulator, Release V0.5.0: Out-of-Order Parallel Simulatable SystemC Subset

RISC Compiler and Simulator, Release V0.5.0: Out-of-Order Parallel Simulatable SystemC Subset Center for Embedded and Cyber-physical Systems University of California, Irvine RISC Compiler and Simulator, Release V0.5.0: Out-of-Order Parallel Simulatable SystemC Subset Guantao Liu, Tim Schmidt, Zhongqi

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

EFFICIENT AND EXTENSIBLE TRANSACTION LEVEL MODELING BASED ON AN OBJECT ORIENTED MODEL OF BUS TRANSACTIONS

EFFICIENT AND EXTENSIBLE TRANSACTION LEVEL MODELING BASED ON AN OBJECT ORIENTED MODEL OF BUS TRANSACTIONS EFFICIENT AND EXTENSIBLE TRANSACTION LEVEL MODELING BASED ON AN OBJECT ORIENTED MODEL OF BUS TRANSACTIONS Rauf Salimi Khaligh, Martin Radetzki Institut für Technische Informatik,Universität Stuttgart Pfaffenwaldring

More information

May-Happen-in-Parallel Analysis based on Segment Graphs for Safe ESL Models

May-Happen-in-Parallel Analysis based on Segment Graphs for Safe ESL Models May-Happen-in-Parallel Analysis based on Segment Graphs for Safe ESL Models Weiwei Chen, Xu Han, Rainer Dömer Center for Embedded Computer Systems University of California, Irvine, USA weiweic@uci.edu,

More information

Embedded System Design and Modeling EE382V, Fall 2008

Embedded System Design and Modeling EE382V, Fall 2008 Embedded System Design and Modeling EE382V, Fall 2008 Lecture Notes 3 The SpecC System-Level Design Language Dates: Sep 9&11, 2008 Scribe: Mahesh Prabhu Languages: Any language would have characters, words

More information

CPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University

CPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou (  Zhejiang University Operating Systems (Fall/Winter 2018) CPU Scheduling Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review Motivation to use threads

More information

RTOS Scheduling in Transaction Level Models

RTOS Scheduling in Transaction Level Models RTOS Scheduling in Transaction Level Models Haobo Yu, Andreas Gerstlauer, Daniel Gajski Center for Embedded Computer Systems University of California, Irvine Irvine, CA 92697, USA {haoboy,gerstl,gajksi}@cecs.uci.edu

More information

Compression of Stereo Images using a Huffman-Zip Scheme

Compression of Stereo Images using a Huffman-Zip Scheme Compression of Stereo Images using a Huffman-Zip Scheme John Hamann, Vickey Yeh Department of Electrical Engineering, Stanford University Stanford, CA 94304 jhamann@stanford.edu, vickey@stanford.edu Abstract

More information

SOFTWARE AND DRIVER SYNTHESIS FROM TRANSACTION LEVEL MODELS

SOFTWARE AND DRIVER SYNTHESIS FROM TRANSACTION LEVEL MODELS SOFTWARE AND DRIVER SYNTHESIS FROM TRANSACTION LEVEL MODELS Haobo Yu, Rainer Dömer, Daniel D. Gajski Center of Embedded Computer Systems University of California, Irvine Abstract This work presents a method

More information

Parallel Computers. c R. Leduc

Parallel Computers. c R. Leduc Parallel Computers Material based on B. Wilkinson et al., PARALLEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers c 2002-2004 R. Leduc Why Parallel Computing?

More information

RISC Compiler and Simulator, Release V0.4.0: Out-of-Order Parallel Simulatable SystemC Subset

RISC Compiler and Simulator, Release V0.4.0: Out-of-Order Parallel Simulatable SystemC Subset Center for Embedded and Cyber-physical Systems University of California, Irvine RISC Compiler and Simulator, Release V0.4.0: Out-of-Order Parallel Simulatable SystemC Subset Guantao Liu, Tim Schmidt, Zhongqi

More information

COMPILED CODE IN DISTRIBUTED LOGIC SIMULATION. Jun Wang Carl Tropper. School of Computer Science McGill University Montreal, Quebec, CANADA H3A2A6

COMPILED CODE IN DISTRIBUTED LOGIC SIMULATION. Jun Wang Carl Tropper. School of Computer Science McGill University Montreal, Quebec, CANADA H3A2A6 Proceedings of the 2006 Winter Simulation Conference L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, eds. COMPILED CODE IN DISTRIBUTED LOGIC SIMULATION Jun Wang Carl

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

EE382N.23: Embedded System Design and Modeling

EE382N.23: Embedded System Design and Modeling EE382N.23: Embedded System Design and Modeling Lecture 3 Language Semantics Andreas Gerstlauer Electrical and Computer Engineering University of Texas at Austin gerstl@ece.utexas.edu Lecture 3: Outline

More information

A Study of the Effect of Partitioning on Parallel Simulation of Multicore Systems

A Study of the Effect of Partitioning on Parallel Simulation of Multicore Systems A Study of the Effect of Partitioning on Parallel Simulation of Multicore Systems Zhenjiang Dong, Jun Wang, George Riley, Sudhakar Yalamanchili School of Electrical and Computer Engineering Georgia Institute

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing This document consists of two parts. The first part introduces basic concepts and issues that apply generally in discussions of parallel computing. The second part consists

More information

AN INTERACTIVE MODEL RE-CODER FOR EFFICIENT SOC SPECIFICATION

AN INTERACTIVE MODEL RE-CODER FOR EFFICIENT SOC SPECIFICATION AN INTERACTIVE MODEL RE-CODER FOR EFFICIENT SOC SPECIFICATION Center for Embedded Computer Systems University of California Irvine pramodc@uci.edu, doemer@uci.edu Abstract To overcome the complexity in

More information

Cosimulation of ITRON-Based Embedded Software with SystemC

Cosimulation of ITRON-Based Embedded Software with SystemC Cosimulation of ITRON-Based Embedded Software with SystemC Shin-ichiro Chikada, Shinya Honda, Hiroyuki Tomiyama, Hiroaki Takada Graduate School of Information Science, Nagoya University Information Technology

More information

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE Gaurav Hansda

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE Gaurav Hansda Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE 5359 Gaurav Hansda 1000721849 gaurav.hansda@mavs.uta.edu Outline Introduction to H.264 Current algorithms for

More information

Lecture 14. Synthesizing Parallel Programs IAP 2007 MIT

Lecture 14. Synthesizing Parallel Programs IAP 2007 MIT 6.189 IAP 2007 Lecture 14 Synthesizing Parallel Programs Prof. Arvind, MIT. 6.189 IAP 2007 MIT Synthesizing parallel programs (or borrowing some ideas from hardware design) Arvind Computer Science & Artificial

More information

Parallel Programming Multicore systems

Parallel Programming Multicore systems FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have

More information

Introducing Preemptive Scheduling in Abstract RTOS Models using Result Oriented Modeling

Introducing Preemptive Scheduling in Abstract RTOS Models using Result Oriented Modeling Introducing Preemptive Scheduling in Abstract RTOS Models using Result Oriented Modeling Gunar Schirner, Rainer Dömer Center of Embedded Computer Systems, University of California Irvine E-mail: {hschirne,

More information

The Design and Evaluation of Hierarchical Multilevel Parallelisms for H.264 Encoder on Multi-core. Architecture.

The Design and Evaluation of Hierarchical Multilevel Parallelisms for H.264 Encoder on Multi-core. Architecture. UDC 0043126, DOI: 102298/CSIS1001189W The Design and Evaluation of Hierarchical Multilevel Parallelisms for H264 Encoder on Multi-core Architecture Haitao Wei 1, Junqing Yu 1, and Jiang Li 1 1 School of

More information

Lars Schor, and Lothar Thiele ETH Zurich, Switzerland

Lars Schor, and Lothar Thiele ETH Zurich, Switzerland Iuliana Bacivarov, Wolfgang Haid, Kai Huang, Lars Schor, and Lothar Thiele ETH Zurich, Switzerland Efficient i Execution of KPN on MPSoC Efficiency regarding speed-up small memory footprint portability

More information

2 TEST: A Tracer for Extracting Speculative Threads

2 TEST: A Tracer for Extracting Speculative Threads EE392C: Advanced Topics in Computer Architecture Lecture #11 Polymorphic Processors Stanford University Handout Date??? On-line Profiling Techniques Lecture #11: Tuesday, 6 May 2003 Lecturer: Shivnath

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel

More information

Communication Abstractions for System-Level Design and Synthesis

Communication Abstractions for System-Level Design and Synthesis Communication Abstractions for System-Level Design and Synthesis Andreas Gerstlauer Technical Report CECS-03-30 October 16, 2003 Center for Embedded Computer Systems University of California, Irvine Irvine,

More information

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors. CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says

More information

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter Lecture Topics Today: Advanced Scheduling (Stallings, chapter 10.1-10.4) Next: Deadlock (Stallings, chapter 6.1-6.6) 1 Announcements Exam #2 returned today Self-Study Exercise #10 Project #8 (due 11/16)

More information

Parallel Processing & Multicore computers

Parallel Processing & Multicore computers Lecture 11 Parallel Processing & Multicore computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1)

More information

Multi-core Architectures. Dr. Yingwu Zhu

Multi-core Architectures. Dr. Yingwu Zhu Multi-core Architectures Dr. Yingwu Zhu What is parallel computing? Using multiple processors in parallel to solve problems more quickly than with a single processor Examples of parallel computing A cluster

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (10 th Week) (Advanced) Operating Systems 10. Multiprocessor, Multicore and Real-Time Scheduling 10. Outline Multiprocessor

More information

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor

More information

LECTURE 3:CPU SCHEDULING

LECTURE 3:CPU SCHEDULING LECTURE 3:CPU SCHEDULING 1 Outline Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time CPU Scheduling Operating Systems Examples Algorithm Evaluation 2 Objectives

More information

Parallel Algorithms on Clusters of Multicores: Comparing Message Passing vs Hybrid Programming

Parallel Algorithms on Clusters of Multicores: Comparing Message Passing vs Hybrid Programming Parallel Algorithms on Clusters of Multicores: Comparing Message Passing vs Hybrid Programming Fabiana Leibovich, Laura De Giusti, and Marcelo Naiouf Instituto de Investigación en Informática LIDI (III-LIDI),

More information

Efficient Usage of Concurrency Models in an Object-Oriented Co-design Framework

Efficient Usage of Concurrency Models in an Object-Oriented Co-design Framework Efficient Usage of Concurrency Models in an Object-Oriented Co-design Framework Piyush Garg Center for Embedded Computer Systems, University of California Irvine, CA 92697 pgarg@cecs.uci.edu Sandeep K.

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1) Computer

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts

More information

CS377P Programming for Performance Multicore Performance Multithreading

CS377P Programming for Performance Multicore Performance Multithreading CS377P Programming for Performance Multicore Performance Multithreading Sreepathi Pai UTCS October 14, 2015 Outline 1 Multiprocessor Systems 2 Programming Models for Multicore 3 Multithreading and POSIX

More information

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes MSEE Curriculum All MSEE students are required to take the following two core courses: 3531-571 Linear systems 3531-507 Probability and Random Processes The course requirements for students majoring in

More information

Modeling and Simulating Discrete Event Systems in Metropolis

Modeling and Simulating Discrete Event Systems in Metropolis Modeling and Simulating Discrete Event Systems in Metropolis Guang Yang EECS 290N Report December 15, 2004 University of California at Berkeley Berkeley, CA, 94720, USA guyang@eecs.berkeley.edu Abstract

More information

Co-synthesis and Accelerator based Embedded System Design

Co-synthesis and Accelerator based Embedded System Design Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer

More information

Multi-core Architectures. Dr. Yingwu Zhu

Multi-core Architectures. Dr. Yingwu Zhu Multi-core Architectures Dr. Yingwu Zhu Outline Parallel computing? Multi-core architectures Memory hierarchy Vs. SMT Cache coherence What is parallel computing? Using multiple processors in parallel to

More information

Parallel Simulation Accelerates Embedded Software Development, Debug and Test

Parallel Simulation Accelerates Embedded Software Development, Debug and Test Parallel Simulation Accelerates Embedded Software Development, Debug and Test Larry Lapides Imperas Software Ltd. larryl@imperas.com Page 1 Modern SoCs Have Many Concurrent Processing Elements SMP cores

More information

GOP Level Parallelism on H.264 Video Encoder for Multicore Architecture

GOP Level Parallelism on H.264 Video Encoder for Multicore Architecture 2011 International Conference on Circuits, System and Simulation IPCSIT vol.7 (2011) (2011) IACSIT Press, Singapore GOP Level on H.264 Video Encoder for Multicore Architecture S.Sankaraiah 1 2, H.S.Lam,

More information

Simulation Of Computer Systems. Prof. S. Shakya

Simulation Of Computer Systems. Prof. S. Shakya Simulation Of Computer Systems Prof. S. Shakya Purpose & Overview Computer systems are composed from timescales flip (10-11 sec) to time a human interacts (seconds) It is a multi level system Different

More information

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and

More information

Overview: Shared Memory Hardware

Overview: Shared Memory Hardware Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

Operating Systems Overview. Chapter 2

Operating Systems Overview. Chapter 2 Operating Systems Overview Chapter 2 Operating System A program that controls the execution of application programs An interface between the user and hardware Masks the details of the hardware Layers and

More information

Why Parallel Architecture

Why Parallel Architecture Why Parallel Architecture and Programming? Todd C. Mowry 15-418 January 11, 2011 What is Parallel Programming? Software with multiple threads? Multiple threads for: convenience: concurrent programming

More information

RTOS Modeling for System Level Design

RTOS Modeling for System Level Design RTOS Modeling for System Level Design Design, Automation and Test in Europe Conference and Exhibition (DATE 03) Andreas Gerslauer, Haobo Yu, Daniel D. Gajski 2010. 1. 20 Presented by Jinho Choi c KAIST

More information

System-level design refinement using SystemC. Robert Dale Walstrom. A thesis submitted to the graduate faculty

System-level design refinement using SystemC. Robert Dale Walstrom. A thesis submitted to the graduate faculty System-level design refinement using SystemC by Robert Dale Walstrom A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Major: Computer

More information

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased platforms Damian Karwowski, Marek Domański Poznan University of Technology, Chair of Multimedia Telecommunications and Microelectronics

More information

High Efficiency Video Decoding on Multicore Processor

High Efficiency Video Decoding on Multicore Processor High Efficiency Video Decoding on Multicore Processor Hyeonggeon Lee 1, Jong Kang Park 2, and Jong Tae Kim 1,2 Department of IT Convergence 1 Sungkyunkwan University Suwon, Korea Department of Electrical

More information

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University CS 571 Operating Systems Midterm Review Angelos Stavrou, George Mason University Class Midterm: Grading 2 Grading Midterm: 25% Theory Part 60% (1h 30m) Programming Part 40% (1h) Theory Part (Closed Books):

More information

Moore s Law. Computer architect goal Software developer assumption

Moore s Law. Computer architect goal Software developer assumption Moore s Law The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months. Self-fulfilling prophecy Computer architect goal Software developer

More information

A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264. Massachusetts Institute of Technology Texas Instruments

A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264. Massachusetts Institute of Technology Texas Instruments 2013 IEEE Workshop on Signal Processing Systems A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264 Vivienne Sze, Madhukar Budagavi Massachusetts Institute of Technology Texas Instruments ABSTRACT

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

EE382V: System-on-a-Chip (SoC) Design

EE382V: System-on-a-Chip (SoC) Design EE382V: System-on-a-Chip (SoC) Design Lecture 10 Task Partitioning Sources: Prof. Margarida Jacome, UT Austin Prof. Lothar Thiele, ETH Zürich Andreas Gerstlauer Electrical and Computer Engineering University

More information

Multimedia Decoder Using the Nios II Processor

Multimedia Decoder Using the Nios II Processor Multimedia Decoder Using the Nios II Processor Third Prize Multimedia Decoder Using the Nios II Processor Institution: Participants: Instructor: Indian Institute of Science Mythri Alle, Naresh K. V., Svatantra

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Design, Implementation and Evaluation of a Task-parallel JPEG Decoder for the Libjpeg-turbo Library

Design, Implementation and Evaluation of a Task-parallel JPEG Decoder for the Libjpeg-turbo Library Design, Implementation and Evaluation of a Task-parallel JPEG Decoder for the Libjpeg-turbo Library Jingun Hong 1, Wasuwee Sodsong 1, Seongwook Chung 1, Cheong Ghil Kim 2, Yeongkyu Lim 3, Shin-Dug Kim

More information

The modularity requirement

The modularity requirement The modularity requirement The obvious complexity of an OS and the inherent difficulty of its design lead to quite a few problems: an OS is often not completed on time; It often comes with quite a few

More information