DISTRIBUTED HIGH-SPEED COMPUTING OF MULTIMEDIA DATA
M. GAUS, G. R. JOUBERT, O. KAO, S. RIEDEL AND S. STAPEL
Technical University of Clausthal, Department of Computer Science
Julius-Albert-Str. 4, Clausthal-Zellerfeld, Germany

Distributed platforms are not necessarily well-suited for systems which handle large data sets, such as those processed in multimedia applications. In this paper a specialised computation model based on asynchronous transmission is presented. As the necessary functions are encapsulated, the system can be used without detailed knowledge of the system architecture. A dynamic task-execution strategy adjusts the number and size of the distributed data packages to the computational load of the processing elements at transmission time. Thus more powerful PEs, or those whose resources are not fully utilised, either receive packages more frequently or are given larger packages. In large networks some nodes can be replaced by others, or only a few data blocks may be sent to particular nodes. The efficiency of the method is evaluated with a variety of practical run-time measurements.

1 Introduction

Distributed systems consisting of a network of workstations are increasingly being used for solving compute-intensive problems. Distributed platforms are, however, not always well-suited for systems which handle large data sets, as found in, for example, multimedia applications. The limiting factor for processing large data sets is usually network bandwidth: the distribution of huge amounts of data limits the overall processing speed. This situation is made worse by the fact that data is transmitted only when requested or sent by the parallel processes. In order to reduce this effect, the transmission of data should be separated from process synchronisation.

Well-known software systems for parallel/distributed processing on existing computer networks are PVM, MPI, PVMPI, Condor, Mosix [1-5] and TreadMarks. An advantage of PVM is its availability on nearly all important architectures and operating systems. On the other hand, its synchronous data transfers and type conversions are time-consuming, which makes it less suitable for processing large multimedia data sets.

1.1 Multimedia data

Multimedia data has become an important component of modern software systems. Static media (images, graphics, text) are combined with dynamic media (audio, video, animations) to obtain realistic representations of natural processes, to visualise complex results or to depict dynamic processes. In spite of the increases in memory sizes, processing power and communication speeds, the processing and communication of multimedia data is still time- and compute-intensive.
Some of the initial problems, essentially data compression, were solved by the development of efficient compression algorithms, e.g. JPEG, MPEG and MP3. Many of these algorithms have been implemented in hardware, offering the possibility of real-time encoding. The next step was the parallelisation of numerous procedures for processing multimedia data. Static media, such as encountered in image processing applications, are usually subdivided into independent data fragments, which are distributed among a number of processing elements; the results are gathered and combined to form the final result. For dynamic media, interdependencies between the different data blocks must be considered and resolved. An example is MPEG compression, which is based on finding and eliminating redundant information in consecutive frames.

Parallelisation by means of data segmentation is well-suited for parallel computers with shared memory, since little or no time is spent on communicating the data. Software for distributed computing in heterogeneous networks gains less performance because of slow synchronous transfers and greater variance in client resources. If the operations executed are simple, these delay effects can be seen quite clearly. An example is the calculation of correlation coefficients for share time series [8]: considering all combinations between 100 shares with a time difference of 5 days resulted in a large number of correlation terms and 27 megabytes of data. The performance gain from parallelising the algorithm with PVM among 4 DEC Alphas was negated by the resulting administration overhead, so that the run time on a single workstation was up to 6 times shorter than that of the parallel PVM version.

These characteristics (large data sets, simple operations) are also found in the management, retrieval and processing of multimedia data. Current approaches to multimedia databases are based on the extraction and management of specific characteristics. Queries compare the extracted characteristics with those of all images stored in the database and return the most similar images. Each archival and retrieval process requires the computation of huge amounts of data, and performance gains through parallelisation are negated by transfer times and data administration, as in the correlation example. A specialised model for the parallel processing of huge amounts of data is therefore necessary.

2 Processing model for static multimedia data

The proposed processing model aims to make the development of parallel programs easy for non-experienced users and to minimise the communication and management effort by using TCP/IP sockets directly. Similar to the work pile model [6], it is based on the creation of pools of tasks, which are controlled by three special processes (distribution manager, collection manager and computation client). The information is divided into sections which are distributed to a number of processing elements (Figure 1).

Figure 1: Schematic representation of the processing model (the distribution manager pushes packets from the pool of tasks to the processing elements PE 1 ... PE n; the collection manager gathers the processed packets into the pool of results)
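As an illustration only (the paper does not include source code), the decomposition into a pool of tasks can be sketched in a few lines of Python; the names Packet and build_pool_of_tasks are assumptions made for this sketch:

from dataclasses import dataclass

@dataclass
class Packet:
    packet_id: int    # position of the fragment within the original data
    payload: bytes    # raw fragment to be processed by a computation client

def build_pool_of_tasks(data, packet_size):
    # Divide the input into independent fragments; each fragment becomes one task.
    return [Packet(i, data[off:off + packet_size])
            for i, off in enumerate(range(0, len(data), packet_size))]

# Example: a 10 Mbyte block split into 64 Kbyte packets
pool = build_pool_of_tasks(bytes(10 * 1024 * 1024), 64 * 1024)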
2.1 Distribution manager

The distribution manager is responsible for the division and management of the data packets to be processed. Push technology is used to minimise the transfer cost between server and clients. The responsibilities of the distribution manager include the definition of data packets, the management of data packets in the local pool of tasks, the processing of client requests and the distribution of the data packets among the processing elements. The distribution strategy is set within this process. Essential requirements are the efficient use of available resources and failure tolerance.

To cope with processing element failures the data packets are subdivided into three groups: the first group consists of packets which have not yet been distributed, the second group comprises transmitted but unprocessed data, and the third group consists of processed data packets. A simple distribution strategy over the available data packets increases computing efficiency: if the first group is empty but unprocessed data blocks remain in the second group, these are dispatched to idle clients which have already completed their computation tasks. This is achieved by maintaining a list of all active nodes and the status of their local pools of tasks. The number of distributed but not yet processed packets can be calculated from the number of packets sent but not yet received by the collection task; this requires a direct connection between the distributor and the collector. The difference is compared to a given threshold value, and if it is below the threshold the distribution manager sends new packets to the client. This strategy also requires a time- and/or workload-oriented distribution of the data packets, since processing can only occur if the processing element has a low CPU load. A blocked client that does not satisfy this requirement is regarded as a failed node, and the server redistributes the data packets sent to it.
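A minimal sketch of this bookkeeping, reusing the Packet type from the sketch above (Python; the class and its methods are illustrative assumptions, not the original implementation), keeps the three packet groups and pushes packets to a client only while its backlog stays below the threshold:

from collections import deque

class DistributionManager:
    # Sketch of the threshold-based push distribution described in Section 2.1.
    def __init__(self, pool, threshold=4):
        self.unsent = deque(pool)   # group 1: packets not yet distributed
        self.in_flight = {}         # group 2: packet id -> (packet, client), sent but unprocessed
        self.threshold = threshold  # allowed number of outstanding packets per client

    def mark_processed(self, packet_id):
        # Group 3: the collector reports a processed packet, so it leaves group 2.
        self.in_flight.pop(packet_id, None)

    def backlog(self, client):
        return sum(1 for _, c in self.in_flight.values() if c is client)

    def feed(self, client, send):
        # Push packets to an idle client while its backlog stays below the threshold.
        while self.backlog(client) < self.threshold:
            if self.unsent:
                packet = self.unsent.popleft()
            else:
                # Group 1 is empty: re-dispatch a transmitted but unprocessed packet.
                pending = [p for p, c in self.in_flight.values() if c is not client]
                if not pending:
                    return
                packet = pending[0]
            self.in_flight[packet.packet_id] = (packet, client)
            send(client, packet)

The send argument stands for whatever routine pushes a packet over the TCP/IP socket to the client; it is a placeholder in this sketch.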
2.2 Computation client

This component performs the computation on each processing element. A simple and compact structure reduces the management overhead and enables an important performance increase. The computation client consists of a local pool of tasks, a processing object and a local pool of results. In the local pool of results the processed data packets are temporarily stored until a connection for the transfer to the collection manager becomes available.

2.3 Collection manager

This process accepts processed data packets from the computation clients and stores them in the pool of results until all data packets have been received. Once this occurs, it composes the processed original from the received data packets; during JPEG encoding, for example, a picture or a series of pictures would be composed at this point. Furthermore, the collector sends a message giving the number of received data packets to the distributor. From this information the distribution manager determines the current workload of each client and redefines the distribution strategy. The distributor is also notified when all data packets have reached the collector and the processing is completed.
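The collector's role can be sketched in the same spirit (Python; all names are assumptions for illustration): incoming packets go into the pool of results, the count is reported to the distributor, and the original is recomposed once everything has arrived.

class CollectionManager:
    def __init__(self, expected_packets, notify_distributor):
        self.expected = expected_packets   # total number of packets to be received
        self.results = {}                  # pool of results: packet id -> processed payload
        self.notify = notify_distributor   # callback towards the distribution manager

    def receive(self, packet_id, payload):
        # Store the processed packet and report the count, so the distributor
        # can derive each client's workload and adjust its strategy.
        self.results[packet_id] = payload
        self.notify(received=len(self.results),
                    finished=len(self.results) == self.expected)

    def compose(self):
        # Reassemble the processed original once every packet has arrived.
        assert len(self.results) == self.expected
        return b"".join(self.results[i] for i in sorted(self.results))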
2.4 Arrangement in multiple hierarchical levels

The described model consists of two hierarchical levels, with the distribution and collection processes on one level and the computation clients on the other. With a large number of non-local processing elements this model quickly reaches its capacity. An alternative is to arrange servers hierarchically: the lower levels of this hierarchy contain not only clients, but also subordinate servers, which distribute the data packets to lower-level clients. An example of such a model are data distributions in corporate or university networks: a super server sends data packets to subordinate servers in each division, and each of these servers initiates the computation in its own domain. This significantly reduces the communication complexity, or at least keeps it local. The processed packets are still sent to a central collector, which makes dynamic regrouping possible: the clients of a new group then receive their packets from the server of the new group. Marking the processed data packets with the id of the group which processed them is mandatory; this allows the collector to determine which group processed each data packet, so that this group is resupplied with data to process once it drops below a given threshold.

3 An adaptive distribution strategy

Heterogeneous networks consist of processing elements with different performance characteristics (CPU, memory etc.). Information about the complexity of the tasks being processed is usually not available. Furthermore, the number of users working on a particular workstation changes continuously. It is therefore impossible to predict the performance of a particular workstation in the network at a given time, which makes a priori scheduling of task processing impossible. A dynamic distribution strategy for processing tasks is thus needed: the number and size of the distributed data packages must be adapted to the workload of the processing element at transmission time. Even this strategy may not be near-optimal, as additional tasks can be started on a PE between the determination of its current load and the arrival of the data packages. More powerful PEs, or those with low utilisation, receive packets more frequently or are allocated larger packages. In large networks some low-performance nodes can be skipped and the work distributed to more powerful PEs; if this is not possible, the data blocks sent to the low-performance nodes are automatically adapted.

For a concrete realisation of this method a performance ranking must be generated. This can be done by calculating the difference between sent and processed packages, as described above. In the first distribution run each processing element is supplied with n packages. After a certain time interval a performance rank list is created, and the number of packets for the respective processing elements is increased or decreased. This operation is repeated until the collector has received all data.

Alternatively the packet size can be adapted: larger packets are sent to the PEs at the top of the performance list, which can reduce the communication and network traffic. However, this is not always possible. For example, an image is usually subdivided into n sections; if all sections are distributed during the first run, a change of the package size is not possible without losing already processed data. The performance information can therefore only be used if the image has large dimensions or if a whole image sequence is to be processed. A disadvantage of this variant is that additional logic for the management of dynamic block sizes is necessary in the clients; furthermore, the complexity of the model and the requirements regarding user knowledge increase.
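The ranking step might be sketched as follows (Python; the quota heuristic is an assumption chosen for illustration, not the published algorithm): clients with the smallest difference between sent and processed packets are treated as fastest and receive more packets in the next round.

def rank_clients(sent, processed):
    # Difference between sent and processed packets per client;
    # a small backlog indicates a fast or lightly loaded client.
    backlog = {client: sent[client] - processed.get(client, 0) for client in sent}
    return sorted(backlog, key=backlog.get)   # fastest clients first

def adjust_quota(ranking, base_quota=2, step=1):
    # Give clients near the top of the ranking more packets per round,
    # never dropping below one packet.
    half = len(ranking) // 2
    return {client: max(1, base_quota + step * (half - i))
            for i, client in enumerate(ranking)}

# Example: after one interval, three clients have each been sent 4 packets
quota = adjust_quota(rank_clients({"pe1": 4, "pe2": 4, "pe3": 4},
                                  {"pe1": 4, "pe2": 2, "pe3": 1}))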
4 The usage of the system

The data flow of the proposed model for the parallel processing of multimedia data involves the following steps. The generated data packets are put into the pool of tasks when processing starts and the distribution manager is initialised. The data packets received by the clients are stored in the local pool of tasks, which is essentially a queue; afterwards the computation starts. Processed data is stored in the local pool of results and sent to the collection manager, which informs the distribution manager of the receipt of the processed packets. When all data packets have been received, the so-called NULL-packet is distributed, and every processing element which receives a NULL-packet immediately terminates processing.

An object-oriented system design makes the system components reusable and lessens the difficulty of using the distribution model. The most important class is the processing class: it does the actual processing and is the focal point of the model, while all other classes support it by managing the administration, reception and distribution of data. The parameter of its run()-method contains the data to be processed; the packet is processed in this method, stored in the local pool of results by means of a return call and then sent back. Using the system merely requires overloading the run()-method of the processing class, adapting the class to the problem at hand. The distribution and collection managers have to be initialised at the beginning of a session, and the required processes need to be launched on the processing nodes; these then contact the distributor and collector on their own. At this stage the system is idle. The pool of tasks is now filled with the required packets; once this has been done, the distributor is activated and the data is processed. All processed packets are stored in the pool of results. Manipulating the packet size requires overloading of the methods that split and merge the packets.
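Under the assumptions of the earlier sketches (the original classes are not given in the paper), using the framework could then look roughly as follows: the user overloads run() for the concrete problem, and the client loop terminates as soon as the NULL-packet arrives.

class ProcessingClass:
    # Base class; the framework calls run() for every packet received by a client.
    def run(self, packet):
        raise NotImplementedError

class InvertProcessing(ProcessingClass):
    # User-supplied processing: overload run() for the concrete problem,
    # here the simple byte-inverting operation used in Section 5.
    def run(self, packet):
        return bytes(~b & 0xFF for b in packet.payload)

def computation_client(receive, send_result, processing):
    # Sketch of the client main loop: terminate as soon as the NULL-packet arrives.
    while True:
        packet = receive()
        if packet is None:                 # NULL-packet: all data has been processed
            break
        send_result(packet.packet_id, processing.run(packet))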
5 Performance measurements

The measurements were performed on a cluster of Linux K6 300 MHz PCs connected over a 10 Mbit Ethernet. In a first series of experiments different block sizes, numbers of iterations and configurations of the processing model were examined in order to obtain data about the efficiency and the run-time behaviour of the proposed system.

Table 1: Measurement results (run times, speedup and efficiency) with the implemented prototype

Iterations | Time[s]: 1 PE | Time[s]/S_P/E_P: 2 PE | Time[s]/S_P/E_P: 3 PE | Time[s]/S_P/E_P: 4 PE
...        | ...           | .../1.407/...         | .../1.615/...         | .../1.590/...
...        | ...           | .../1.586/...         | .../1.751/...         | .../1.939/...
...        | ...           | .../1.637/...         | .../1.852/...         | .../2.136/...
...        | ...           | .../1.758/...         | .../2.036/...         | .../2.409/...
...        | ...           | .../1.768/...         | .../2.215/...         | .../2.501/...
...        | ...           | .../1.805/...         | .../2.218/...         | .../2.514/...
...        | ...           | .../1.829/...         | .../2.388/...         | .../2.797/...
...        | ...           | .../1.855/...         | .../2.400/...         | .../2.921/...
...        | ...           | .../1.846/...         | .../2.429/...         | .../2.991/...
...        | ...           | .../1.825/...         | .../2.511/...         | .../3.155/...
...        | ...           | .../1.938/...         | .../2.529/...         | .../3.301/...

Table 1 shows the run times needed for different numbers of iterations of a simple inverting operation performed on a 10 Mbyte block, together with the speedup factor S_P and the efficiency E_P (S_P = T_1/T_P, the single-PE run time divided by the run time on P PEs, and E_P = S_P/P). The data is subdivided into fixed-size subsections which are distributed to the individual PE clients according to the strategy described above. Speedup values between 1.4 and 3.3 are reached in this simple application. At the beginning the network communication is the dominating factor, resulting in speedups from about 1.4 (2 PEs) to 1.59 (4 PEs). With larger numbers of iterations a roughly linear increase of the speedup values can be observed, reaching top values of 3.3 for 4 PEs and 200 iterations. The efficiency decreases only slightly; for example, there is a difference of 0.24 between the mean values of the two- and four-PE systems. The scalability of the system model therefore appears to be good. A clearer view of the results is given in Figure 2: the right-hand diagram shows the run times of the different system configurations, the left-hand diagram the mean speedup and efficiency values for the parallel configurations.

Figure 2: Speedup and efficiency values achieved (left); run times for 1-4 PEs (right)

The achieved results are compared to the mean speedup and efficiency values of the PVM, shown in Figure 3. These measurements were performed on the same configuration (K6 PCs with Linux, distribution of fixed-size blocks) with type conversion disabled.

Figure 3: PVM average speedup and efficiency values
An analysis of the PVM results shows slightly better speedup and efficiency values for two processing elements. These values decrease when larger numbers of PEs are used, as the management and transfer overhead clearly reduces the performance. For configurations with four PEs the proposed system model thus reaches a speedup and efficiency about five times better than PVM.

6 Conclusions

In this paper a specialised computation model based on asynchronous transmission was presented. It automatically adapts to the workload of the elements in the parallel environment at transmission time, enables easy development of parallel programs and minimises the communication and management effort by using TCP/IP sockets directly. The model is based on the creation of pools of tasks, which are controlled by three special modules. A simple distribution strategy over the available packages increases the computing efficiency: more powerful processing elements, or those with a small workload, receive packages more frequently, and additionally the package size can be adapted. The efficiency of the proposed method was evaluated through a variety of performance measurements and the results were compared with those of the PVM.

Future work includes extensions which primarily concern improving the system's performance. Storing the packets in the local file system, similar to a spool directory, makes it possible to save all packets of the same type that are to be processed in a dedicated directory. Furthermore, comparative benchmarks with other systems are to be performed.

References

1. PVM home page: documentation, comparison between various packages.
2. Condor project description and documentation.
3. MPI project home page: documentation, tutorials, etc.
4. Mosix home page.
5. Information about PVMPI.
6. S. Kleiman, D. Shah: Programming with Threads, Prentice Hall.
7. B. Wilkinson, M. Allen: Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall.
8. O. Sachs: Analyse von Aktienreihen mittels paralleler Korrelationsberechnungen (analysis of share price series using parallel correlation computations), Master's thesis, TU Clausthal, 1998.