Optimizing Performance of C++ Threading Libraries

András Fekete
Department of Computer Science
University of New Hampshire
Durham, NH 03824, USA

John M. Weiss
Department of Math and Computer Science
South Dakota School of Mines and Technology
Rapid City, SD, USA
John.Weiss@sdsmt.edu

Abstract

Multi-core architectures are now the standard on high-end computing devices, from desktop computers to tablets to cell phones. Multi-threaded applications can take full advantage of parallel hardware. This paper compares the performance of several C++ threading interfaces, and presents a theoretical framework for optimal utilization of parallel hardware.

1 Introduction

The Von Neumann computing model [17] dominated computer architecture for over half a century. In this serial processing model, program instructions are transferred over the computer bus, one at a time, to be decoded and executed in the CPU. This Von Neumann bottleneck limits the speed at which a program may be executed. The serial processing bottleneck was not an issue for many years, due to ongoing improvements in computer hardware. This is quantified by an observation known as Moore's Law: computing performance doubled roughly every 18 months for over 50 years [6].

Exponential increases in performance cannot be sustained indefinitely. Eventually, fundamental physical limits associated with speed, size, power consumption, and heat dissipation impact the performance of single processors. The most straightforward way to increase performance is to increase clock speed, and clock speeds increased a thousand-fold between 1980 and 2004. But clock speeds of 4 GHz were achieved on the Pentium 4 processor in 2004, and have not increased much since then. As a result, chip manufacturers have turned to hardware parallelism to increase computing performance.

There are several different types of parallel architectures, but in the past decade, multi-core processors [14] have dominated computing platforms. In Flynn's taxonomy [13], multi-core architectures are classified as MIMD (multiple instruction, multiple data stream) systems, with each core capable of independent processing. Multi-core architectures aim to increase computing performance by taking advantage of parallelism [15], rather than by increasing the performance of a single CPU. These systems feature more than one core (CPU) on a chip. Each core may have its own cache memory, but the cores typically share system memory (RAM). Dual-core and quad-core processor chips dominate the market today, with more cores gradually appearing on servers and high-performance computer systems.

This quiet revolution has dramatically altered the computing landscape, but not all software fully utilizes multi-core hardware. For example, C++ only recently added concurrency support, in the 2011 standard [12]. Prior to this time, programmers were forced to rely upon external concurrency libraries such as POSIX Threads (Pthreads) [9], Boost Threads, and Open Multi-Processing (OpenMP) [8] for parallel processing. External libraries offer the advantage of a coherent concurrency model across different platforms and languages (such as C and Fortran), but incorporating concurrency features directly into the language core adds stability, portability, and optimization opportunities. With a variety of different concurrency frameworks available, software developers may have difficulty choosing the best threading library for their application.
In this study, we compare the performance of two C++11 concurrency frameworks (async and threads) with that of three external threading libraries (Pthreads, Boost Threads, and OpenMP). We also provide a theoretical framework for examining optimal utilization of parallel hardware. Other researchers have attempted to understand and simplify the process of writing multi-threaded applications by creating models [7], by abstracting the architecture [18], or by examining the underlying virtual machine provided by the language [10]. In this research we take a different approach: the overall goal is a simple metric on which to base decisions about how to split a program into smaller tasks so as to gain the largest speedup.

2 Concurrency in C++

Concurrency was added to the C++ language standard in 2011 [12]. Both the core language and its standard library are guaranteed to support multiple threads of control, and several different approaches to fundamental concurrency issues (synchronization, mutual exclusion, etc.) are provided. Concurrency interfaces in C++11 are implemented by the async() function and methods of the thread class.

These approaches are largely interchangeable, and differ primarily in that the async() function provides a simple mechanism for returning function results.

Prior to the C++11 standard, software developers were forced to rely upon external libraries for multi-threaded applications. Among these external libraries, POSIX threads, Boost threads, OpenMP, and MPI are the most widely used. The message passing interface (MPI) is geared towards distributed (rather than multi-core) platforms, and was not considered in this study. POSIX threads (Pthreads) [9] were introduced in the 1980s, in an early effort to implement a portable interface for parallel programming in C. Boost is a set of libraries for C++ that provide support for a wide range of tasks, including concurrency, linear algebra, random numbers, regular expressions, and unit testing. Many of the newer C++ features have relied upon Boost as a test bed prior to formal adoption as part of the language standard. OpenMP is a multi-platform library that supports multiprocessing on many different processor architectures and operating systems, with language bindings to C, C++, and Fortran [8]. Multithreading support in OpenMP is implemented via #pragma compiler directives.

3 Concurrency performance

In this study [2], two benchmarks were used to compare the performance of five concurrency frameworks (C++ async, C++ threads, Pthreads, Boost Threads, and OpenMP): primality testing by trial division [16] (prime), and a long series of assembly instructions (asmproc).

The prime benchmark was used in a previous study by one of the authors [11]. When testing a set of numbers for primality, almost all the computation is parallelizable. In other words, N processors should provide close to the theoretical maximum N-fold speedup. Potential concurrency issues such as race conditions, deadlock, etc. [3][4][9] do not occur, since each primality test is independent of all others. Code for primality testing by trial division is listed in Figure 1. The input value N is tested for primality by dividing by all possible odd factors between 3 and N/2. The routine does not short-circuit the primality test; it continues to divide by all factors, regardless of whether the number has already been determined to be non-prime.

    bool is_prime( unsigned n )
    {
        if ( n == 2 ) return true;                  // 2 is only even prime
        if ( n < 2 || n % 2 == 0 ) return false;    // eliminate 0, 1, and other even numbers

        // test all odd factors up to n / 2
        bool prime = true;
        for ( unsigned i = 3; i < n / 2; i += 2 )
            if ( n % i == 0 ) prime = false;
        return prime;
    }

Figure 1: Primality testing by trial division

This benchmark, while simple and robust, does not necessarily reflect all types of real-world computations. Typical multi-threaded applications often involve longer serial instruction sequences. Such longer sequences of operations allow the processor to make better use of instruction prefetch, which in turn decreases the need for the kernel to swap the process off the CPU for a different task. The assembly instruction benchmark (Figure 2) had roughly an order of magnitude more operations than the primality test benchmark.

    bool asmproc( unsigned n )
    {
        bool retval = false;
        for ( unsigned i = 0; i < n; i++ )
        {
            // long series of add, multiply, shift instructions
        }
        return retval;
    }

Figure 2: Assembly instruction benchmark

Both benchmarks were executed with an input number N that was proportional to the number of iterations in the benchmark loop. (For the prime benchmark, N was the prime number to be checked for primality.)
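The paper's own test harness [2] is not reproduced here; the following minimal sketch merely illustrates the point from Section 2 that the two C++11 interfaces are largely interchangeable. The same is_prime() worker from Figure 1 is launched with std::async, which returns its result through a std::future, and with std::thread, which must report its result through shared storage. The candidate values and container names are illustrative only.

    // Minimal sketch (not the authors' benchmark harness): launching is_prime()
    // from Figure 1 with std::async and with std::thread.
    #include <future>
    #include <iostream>
    #include <thread>
    #include <vector>

    bool is_prime( unsigned n )                     // as listed in Figure 1
    {
        if ( n == 2 ) return true;
        if ( n < 2 || n % 2 == 0 ) return false;
        bool prime = true;
        for ( unsigned i = 3; i < n / 2; i += 2 )
            if ( n % i == 0 ) prime = false;
        return prime;
    }

    int main()
    {
        const std::vector<unsigned> candidates = { 1000003, 1000033, 1000037, 1000039 };

        // std::async: each call hands back a std::future carrying the return value.
        std::vector<std::future<bool>> futures;
        for ( unsigned n : candidates )
            futures.push_back( std::async( std::launch::async, is_prime, n ) );
        for ( size_t i = 0; i < futures.size(); i++ )
            std::cout << candidates[i] << ": " << futures[i].get() << '\n';

        // std::thread: no return channel, so results go into shared storage.
        std::vector<int> results( candidates.size(), 0 );
        std::vector<std::thread> threads;
        for ( size_t i = 0; i < candidates.size(); i++ )
            threads.emplace_back( [&results, &candidates, i] { results[i] = is_prime( candidates[i] ); } );
        for ( std::thread& t : threads )
            t.join();
        for ( size_t i = 0; i < results.size(); i++ )
            std::cout << candidates[i] << ": " << results[i] << '\n';
    }

With either interface the parallelism comes entirely from launching many independent calls; the framework only changes how results and thread lifetimes are managed.
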
These benchmarks are purely CPU bound, rather than memory or I/O bound. Benchmark code was executed on a variety of hardware platforms, including older Intel quad-core CPUs, newer Intel dual-core i5 and quad-core i7 CPUs, and 16-core and 256-core Xeon CPUs [5]. The newer CPUs are all hyper-threaded, doubling the reported number of hardware threads. Software platforms included Windows and Linux. The GNU g++ compiler was used to compile C++ code, with various levels of optimization. Other than for Pthreads, optimization had little impact on these benchmarks.

4 Theoretical speedup

According to Amdahl's Law [1], the speedup gained from concurrency is a function of both the serial processing time and the parallelizable processing time:

    t_p = t_ser + t_par / c    (1)

where t_ser is the serial processing time, t_par is the parallelizable processing time, and t_p is the processing time to run on c cores (processors).

In the real world, however, things do not work out quite so neatly. Let us consider a serial process that takes t_1 time to complete on one processor. Adding more processors will not speed up this process, since it is inherently serial in nature. However, multiple processes may be run concurrently on parallel hardware. For example, we may have a process that tests numbers for primality. On a 4-core machine, assuming complete utilization of parallel hardware, up to 4 processes (primality tests) may execute simultaneously, in the same time it takes to run one process.
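Equation (1) can be checked with a few lines of code. The sketch below uses illustrative values only (1 s of serial work and 9 s of parallelizable work, not measurements from this study) and tabulates t_p and the resulting speedup; with no serial component, the speedup approaches the ideal factor of c, as in the primality-testing example above.

    // Equation (1): t_p = t_ser + t_par / c.  Example values only.
    #include <iostream>

    double amdahl_time( double t_ser, double t_par, unsigned c )
    {
        return t_ser + t_par / c;
    }

    int main()
    {
        const double t_ser = 1.0, t_par = 9.0;      // illustrative, not measured
        for ( unsigned c = 1; c <= 8; c *= 2 )
        {
            double t_p = amdahl_time( t_ser, t_par, c );
            std::cout << c << " cores: t_p = " << t_p << " s, speedup = "
                      << ( t_ser + t_par ) / t_p << '\n';
        }
    }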

What happens when the number of threads exceeds the number of available processors? Depending on how concurrency is implemented, there are several possible outcomes, exemplified by two extremes. In the first scenario, concurrent execution is coarsely quantized. The scheduler allows the first 4 processes to run to completion in time t_1, at which point a processor becomes available to run process 5. This process also takes time t_1 to complete, so the total time is 2 t_1. Alternatively, the scheduler may switch between tasks with finer granularity, using time slices too short for a process to complete before a task switch. In this case we might expect a completion time of 5/4 t_1. These are extreme cases, and the actual completion time might lie somewhere between 5/4 t_1 and 2 t_1.

As illustrated in Figure 3, the red line shows serial processing time, which increases linearly with the number of processes (threads). The stair-step green line illustrates the coarse quantization scenario, and the blue line illustrates fine quantization.

Figure 3: Theoretical serial and parallel times for a 4-core processor running multiple threads

This analysis yields the following mathematical model:

    t_ser = t_1 * p                              (2)
    t_fine = max( t_1 * p / c, t_1 )             (3)
    t_coarse = t_1 * floor( ( p + c - 1 ) / c )  (4)

where t_1 is the serial time for one process to complete, p is the number of processes (threads), and c is the number of cores (processors). t_ser, t_fine, and t_coarse are the serial, fine-grained parallel, and coarse-grained parallel processing times for p processes, respectively. From Equations (3) and (4), we can predict the expected speedup of a 4-core system for fine- and coarse-grained thread scheduling, respectively. This is illustrated in Figure 4.

Figure 4: Expected speedup of a 4-core system

5 Results

Figures 5 and 6 in this section show the results of benchmarking the prime and asmproc routines, plotting speedup vs. number of processes for a 4-core system.

Figure 5: Measured speedup: asmproc benchmark

Figure 6: Measured speedup: prime benchmark
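Equations (2)-(4) are straightforward to evaluate numerically. The sketch below (an illustration, not part of the paper's tooling) tabulates the predicted fine- and coarse-grained speedups, t_ser/t_fine and t_ser/t_coarse, for a 4-core system, reproducing the shape of the curves in Figure 4.

    // Predicted speedup under fine- and coarse-grained scheduling, Equations (2)-(4).
    #include <algorithm>
    #include <cmath>
    #include <iostream>

    int main()
    {
        const double t1 = 1.0;    // serial time for one process (arbitrary units)
        const int    c  = 4;      // number of cores

        std::cout << "p\tfine speedup\tcoarse speedup\n";
        for ( int p = 1; p <= 16; p++ )
        {
            double t_ser    = t1 * p;                                   // Equation (2)
            double t_fine   = std::max( t1 * p / c, t1 );               // Equation (3)
            double t_coarse = t1 * std::floor( ( p + c - 1.0 ) / c );   // Equation (4)

            std::cout << p << '\t' << t_ser / t_fine << '\t' << t_ser / t_coarse << '\n';
        }
    }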

The measured timings match the predicted curves quite well. OpenMP seems to use coarse-grained scheduling on this system, whereas the other multithreading libraries allow for fine-grained scheduling. The speedup drop at c+1 processes is small but reproducible, and is likely due to scheduler overhead.

Figures 7-10 show the impact of problem size (i.e., the time required to complete processing of the concurrent routine). Note that these curves are consistent across different benchmarks and thread counts. This indicates that, to achieve optimal speedup, a minimal amount of time must be spent in the concurrent routine (otherwise scheduler overhead reduces concurrency gains). The exception is OpenMP, which is sensitive to the number of threads launched, as described earlier.

Figure 7: Impact of problem size: prime benchmark, 12 cores, 4 threads

Figure 8: Impact of problem size: prime benchmark, 12 cores, 32 threads

Figure 9: Impact of problem size: asmproc benchmark, 12 cores, 4 threads

Figure 10: Impact of problem size: asmproc benchmark, 12 cores, 32 threads

To further investigate the impact of scheduler overhead, we ran tests using a no-op benchmark: a routine which was called and simply returned immediately. Timing this routine gives a measure of the multithreading overhead for each of the concurrency libraries. Table 1 lists the results for different numbers of threads, ranging from 10^2 to 10^9.

Table 1: Threadpool overhead (times in msec). Columns: log N, serial, async, omp, pthread, thread, boost.

Time to execute the thread significantly exceeds threadpool instantiation time after log(N) of approximately 7, which is consistent with the previous benchmark observations on optimal concurrency performance.
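A no-op overhead measurement of this kind can be approximated in a few lines. The sketch below (not the authors' instrumentation, and covering only the std::thread case) times how long it takes to launch and join N threads running an empty function, which gives a rough per-thread instantiation cost of the sort discussed next.

    // Rough sketch (not the paper's instrumentation): per-thread launch/join
    // overhead for a no-op task, averaged over N threads.
    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main()
    {
        const unsigned N = 1000;                    // illustrative thread count
        std::vector<std::thread> threads;
        threads.reserve( N );

        auto start = std::chrono::steady_clock::now();
        for ( unsigned i = 0; i < N; i++ )
            threads.emplace_back( []{} );           // no-op worker
        for ( std::thread& t : threads )
            t.join();
        auto stop = std::chrono::steady_clock::now();

        std::chrono::duration<double, std::milli> elapsed = stop - start;
        std::cout << "total " << elapsed.count() << " msec, "
                  << elapsed.count() / N << " msec per thread\n";
    }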

A simple rule of thumb is that if each thread takes at least 10 msec to execute, more time is spent processing data relative to the 1-2 msec instantiation time of the thread data structures. The expected speedup gain is a ratio of instantiation time to thread execution time.

To examine scalability, we ran benchmarks on a 256-core system. As shown in the following figures, the results resemble those expected of a 40-core system, since a machine of this size is shared among many users and was running at roughly 85% system load during our tests. As can be clearly seen from Figures 11 and 12, speedup increases linearly with the number of threads until the system is 100% utilized. At full utilization, it is up to the scheduler to allocate resources to each thread based on the other processes running on the machine.

Figure 11: Measured speedup: prime benchmark, 256 cores, 85% system load

Figure 12: Measured speedup: asmproc benchmark, 256 cores, 85% system load

The impact of hyper-threading depends on a number of factors, including compiler, degree of optimization, CPU model, and instruction sequence. For a 16-core Xeon system, the asmproc benchmark, with its longer instruction sequence, achieves higher speedups. This is shown in Figures 13 and 14.

Figure 13: Impact of hyper-threading: prime benchmark, 16 cores (32 hardware threads)

Figure 14: Impact of hyper-threading: asmproc benchmark, 16 cores (32 hardware threads)

6 Conclusions

In this study, we present a refined version of Amdahl's Law. Our model takes scheduler granularity into account. Although fine-grained scheduler quantization appears to be the rule, our benchmarks demonstrate that coarse-grained scheduler quantization can be observed in some concurrency frameworks (notably OpenMP).

As expected, more cores yield a greater concurrency speedup. But for schedulers with coarse granularity, optimal utilization of parallel hardware will take place when either the number of threads is an exact multiple of the number of cores, or when there are significantly more threads than processor cores. This conclusion is borne out in both theory and practice. The drop in performance when the number of threads first exceeds the number of cores is notable.

As a result, it is generally best to maximize the number of threads in an application, in order to maximize the concurrency speedup. This reduces the impact of both scheduler granularity and stalled (e.g., I/O bound) threads.

We also examined the impact of scheduler overhead on concurrency optimization. Longer processes will generally benefit more from parallelization, since the impact of concurrency overhead is reduced. In our tests, thread execution times of 100 ms or more were sufficient: this gave us speedups close to the theoretical maximum while still splitting the serial task into smaller sub-tasks.

Finally, we tested two different benchmarks with five different concurrency frameworks on a variety of systems. Scalability was excellent for both benchmarks, with more processors giving an (expected) greater speedup. System load impacts performance as expected: only available processors yield concurrency speedups. Hyper-threading is not equivalent to doubling the number of processors, but benchmarks with longer instruction sequences seem to perform as much as 30% better on hyper-threaded processors.

The concurrency framework (async, C++11 threads, Pthreads, Boost Threads, OpenMP) had relatively little impact on performance. This was somewhat surprising, since the low-level Pthread approach is closest to the hardware, and might be expected to execute most efficiently. In reality, Pthread code seldom ran faster than the other threading libraries, and in some cases actually performed worse. OpenMP has the highest level of abstraction, and its performance might be expected to suffer accordingly. The major performance impact observed in this study seems due to scheduler granularity. Hence the choice of concurrency framework should be based more on usability than performance.

From a usability standpoint, Pthreads require low-level C code (function callbacks with void pointers) and a high degree of manual resource management. The C++11 concurrency interface offers type safety, elegant syntax, and more features, and is clearly superior for most tasks. The same is true of Boost Threads. OpenMP is remarkably easy to use, and offers Fortran bindings in addition to C/C++. However, it relies upon preprocessor decorations (#pragma omp) which hide the parallel implementation from the user, making it more difficult to achieve the same degree of concurrency control provided by the other multithreading libraries.

The performance increase from concurrent processing on multi-core processors seems well worth the relatively small coding effort. Barring unforeseen technological breakthroughs, it seems evident that concurrency frameworks will become increasingly important in the near future.

References

[1] Gene M. Amdahl, "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities," AFIPS Conference Proceedings, vol. 30, 1967.
[2] Andras Fekete, CPP Thread Tester.
[3] M. Herlihy and N. Shavit, The Art of Multiprocessor Programming, Morgan Kaufmann.
[4] W. Hwu and D. Kirk, Programming Massively Parallel Processors: A Hands-on Approach, Elsevier Science.
[5] Intel processors, accessed Dec.
[6] G. E. Moore, "Cramming More Components onto Integrated Circuits," Electronics, 1965.
[7] Geoffrey Nelissen, Vandy Berten, Joel Goossens, Dragomir Milojevic, "Techniques Optimizing the Number of Processors to Schedule Multi-Threaded Tasks," Proc. Euromicro Conf. Real-Time Systems.
[8] OpenMP, accessed Dec.
[9] POSIX threads: en.wikipedia.org/wiki/POSIX_Threads, accessed Dec.
[10] Jennifer B. Sartor and Lieven Eeckhout, "Exploring Multi-Threaded Java Application Performance on Multicore Hardware," SIGPLAN, vol. 47, no. 10.
[11] John Weiss, "Comparison of POSIX Threads, OpenMP and C++11 Concurrency Frameworks," in 30th International Conference on Computers and Their Applications, 2015.
[12] Wikipedia: C++11. en.wikipedia.org/wiki/C%2B%2B11, accessed Dec.
[13] Wikipedia: Flynn's taxonomy. en.wikipedia.org/wiki/Flynn%27s_taxonomy, accessed Dec.
[14] Wikipedia: Multi-core processor. en.wikipedia.org/wiki/Multi-core_processor, accessed Dec.
[15] Wikipedia: Parallel computing. en.wikipedia.org/wiki/Parallel_computing, accessed Dec.
[16] Wikipedia: Primality testing. en.wikipedia.org/wiki/Primality_test, accessed Dec.
[17] Wikipedia: Von Neumann architecture. en.wikipedia.org/wiki/Von_Neumann_architecture, accessed Dec.
[18] Ruken Zilan, Javier Verdu, Jorge Garcia, Mario Nemirovsky, Rodolfo Milito, Mateo Valero, "An Abstraction Methodology for the Evaluation of Multi-core Multi-threaded Architectures," IEEE Int. Work. Model. Anal. Simul. Comput. Telecommun. Syst., 2011.
