Enhancing Integrated Layer Processing using Common Case. Anticipation and Data Dependence Analysis. Extended Abstract

Similar documents
Optimization Techniques for Parallel Protocol Implementation

On Parallelising and Optimising the. Stefan Leue, Member, IEEE, and Philippe Oechslin, Member, IEEE

Technische Universitat Munchen. Institut fur Informatik. D Munchen.

Assignment 4. Overview. Prof. Stewart Weiss. CSci 335 Software Design and Analysis III Assignment 4

Methods and Semantics for. der Philosophisch-naturwissenschaftlichen Fakultat. der Universitat Bern. vorgelegt von. Stefan Leue.

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA

Solve the Data Flow Problem

SAMOS: an Active Object{Oriented Database System. Stella Gatziu, Klaus R. Dittrich. Database Technology Research Group

Natural Semantics [14] within the Centaur system [6], and the Typol formalism [8] which provides us with executable specications. The outcome of such

Proc. XVIII Conf. Latinoamericana de Informatica, PANEL'92, pages , August Timed automata have been proposed in [1, 8] to model nite-s

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

to automatically generate parallel code for many applications that periodically update shared data structures using commuting operations and/or manipu

A Bintree Representation of Generalized Binary. Digital Images

A Suite of Formal Denitions for Consistency Criteria. in Distributed Shared Memories Rennes Cedex (France) 1015 Lausanne (Switzerland)

Improving the Quality of Test Suites for Conformance. Tests by Using Message Sequence Charts. Abstract

Dewayne E. Perry. Abstract. An important ingredient in meeting today's market demands

Managing ANSA Objects with OSI Network Management Tools. on the joint ISO-ITU `Reference Model for Open Distributed. to OSI managers.

Autolink. A Tool for the Automatic and Semi-Automatic Test Generation

TCP over Wireless Networks Using Multiple. Saad Biaz Miten Mehta Steve West Nitin H. Vaidya. Texas A&M University. College Station, TX , USA

A Note on Fairness in I/O Automata. Judi Romijn and Frits Vaandrager CWI. Abstract

Interrupt Driven Programming in MSP430 Assembly (ESCAPE) *

2 J. Karvo et al. / Blocking of dynamic multicast connections Figure 1. Point to point (top) vs. point to multipoint, or multicast connections (bottom

\Classical" RSVP and IP over ATM. Steven Berson. April 10, Abstract

Storage System. Distributor. Network. Drive. Drive. Storage System. Controller. Controller. Disk. Disk

On the Performance Impact of. Supporting QoS Communication in TCP/IP Protocol Stacks

Design of it : an Aldor library to express parallel programs Extended Abstract Niklaus Mannhart Institute for Scientic Computing ETH-Zentrum CH-8092 Z

signature i-1 signature i instruction j j+1 branch adjustment value "if - path" initial value signature i signature j instruction exit signature j+1

Server 1 Server 2 CPU. mem I/O. allocate rec n read elem. n*47.0. n*20.0. select. n*1.0. write elem. n*26.5 send. n*

Server A Server B. Sequence Number (MB) Time (seconds)

We now begin to build models in DEVS-Scheme and organize them using the system entity

A multichannel Ethernet protocol for a WDM local area star network

CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS. Xiaodong Zhang and Yongsheng Song

V 1. volume. time. t 1

Concurrent Objects and Linearizability

Static WCET Analysis: Methods and Tools

A Boolean Expression. Reachability Analysis or Bisimulation. Equation Solver. Boolean. equations.

Parallel Graph Algorithms

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University

Learning Outcomes. Processes and Threads. Major Requirements of an Operating System. Processes and Threads

Abstract formula. Net formula

when a process of the form if be then p else q is executed and also when an output action is performed. 1. Unnecessary substitution: Let p = c!25 c?x:

Mobile Computing An Browser. Grace Hai Yan Lo and Thomas Kunz fhylo, October, Abstract

Constraint Optimisation Problems. Constraint Optimisation. Cost Networks. Branch and Bound. Dynamic Programming

Global Scheduler. Global Issue. Global Retire

This example uses the or-operator ' '. Strings which are enclosed in angle brackets (like <N>, <sg>, and <pl> in the example) are multi-character symb

Document Image Restoration Using Binary Morphological Filters. Jisheng Liang, Robert M. Haralick. Seattle, Washington Ihsin T.

21. Distributed Algorithms

A Parallel Intermediate Representation based on. Lambda Expressions. Timothy A. Budd. Oregon State University. Corvallis, Oregon.

Maple on the Intel Paragon. Laurent Bernardin. Institut fur Wissenschaftliches Rechnen. ETH Zurich, Switzerland.

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme

A Freely Congurable Audio-Mixing Engine. M. Rosenthal, M. Klebl, A. Gunzinger, G. Troster

Distributed multimedia applications have dierent quality of service (QoS) requirements

Inter-Project Dependencies in Java Software Ecosystems

Main Points of the Computer Organization and System Software Module

Rance Cleaveland The Concurrency Factory is an integrated toolset for specication, simulation,

PROJECTION MODELING SIMPLIFICATION MARKER EXTRACTION DECISION. Image #k Partition #k

Henning Koch. Dept. of Computer Science. University of Darmstadt. Alexanderstr. 10. D Darmstadt. Germany. Keywords:

perform well on paths including satellite links. It is important to verify how the two ATM data services perform on satellite links. TCP is the most p

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski

Processes and Threads

Valmir C. Barbosa. Programa de Engenharia de Sistemas e Computac~ao, COPPE. Caixa Postal Abstract

Scalability Enhancements for. Connection-Oriented Networks. E. Gauthier J.-Y. Le Boudec. Laboratoire de Reseaux de Communication (LRC),

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano

RECONFIGURATION OF HIERARCHICAL TUPLE-SPACES: EXPERIMENTS WITH LINDA-POLYLITH. Computer Science Department and Institute. University of Maryland

C Language Programming through the ADC and the MSP430 (ESCAPE)

CPSC 320 Sample Solution, Playing with Graphs!

Ray shooting from convex ranges

req unit unit unit ack unit unit ack

TEMPORAL AND SPATIAL SEMANTIC MODELS FOR MULTIMEDIA PRESENTATIONS ABSTRACT

requests or displaying activities, hence they usually have soft deadlines, or no deadlines at all. Aperiodic tasks with hard deadlines are called spor

Monitoring Script. Event Recognizer

Computer Technology Institute. Patras, Greece. In this paper we present a user{friendly framework and a

under Timing Constraints David Filo David Ku Claudionor N. Coelho, Jr. Giovanni De Micheli

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

Hiroshi Nakashima Yasutaka Takeda Katsuto Nakajima. Hideki Andou Kiyohiro Furutani. typing and dereference are the most unique features of

Synchronization Expressions: Characterization Results and. Implementation. Kai Salomaa y Sheng Yu y. Abstract

How useful is the UML profile SPT without Semantics? 1

Network. Department of Statistics. University of California, Berkeley. January, Abstract

Application Programm 1

Low pass filter/over drop avoidance (LPF/ODA): an algorithm to improve the response time of RED gateways

On the Diculty of Software Key Escrow. Abstract. At Eurocrypt'95, Desmedt suggested a scheme which allows individuals to encrypt

Worker-Checker A Framework for Run-time. National Tsing Hua University. Hsinchu, Taiwan 300, R.O.C.

Centre for Parallel Computing, University of Westminster, London, W1M 8JS

Simultaneous Multithreading: a Platform for Next Generation Processors

The Priority Queue as an Example of Hardware/Software Codesign. Flemming Heg, Niels Mellergaard, and Jrgen Staunstrup. Department of Computer Science

Stability in ATM Networks. network.

Timer DRAM DDD FM9001 DRAM

instruction fetch memory interface signal unit priority manager instruction decode stack register sets address PC2 PC3 PC4 instructions extern signals

CSC148, Lab #4. General rules. Overview. Tracing recursion. Greatest Common Denominator GCD

Toward a Time-Scale Based Framework for ABR Trac Management. Tamer Dag Ioannis Stavrakakis. 409 Dana Research Building, 360 Huntington Avenue

EC EMBEDDED AND REAL TIME SYSTEMS

Reuse Contracts As Component Interface. Descriptions. Koen De Hondt, Carine Lucas, and Patrick Steyaert. Programming Technology Lab

Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log

ARQo: The Architecture for an ARQ Static Query Optimizer

Toward the joint design of electronic and optical layer protection

Global Transactions Global Transaction Global Manager Transaction Interface Global Global Scheduler Recovery Manager Global Log Server Log

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

UNIT I (Two Marks Questions & Answers)

Two Image-Template Operations for Binary Image Processing. Hongchi Shi. Department of Computer Engineering and Computer Science

Determining the Number of CPUs for Query Processing

Transcription:

Enhancing Integrated Layer Processing using Common Case Anticipation and Data Dependence Analysis Extended Abstract Philippe Oechslin Computer Networking Lab Swiss Federal Institute of Technology DI-LTI EPFL CH-1015 Lausanne, Switzerland Telephone: +41 21 693 47 08 Fax: +41 21 693 66 00 Email: oechslin@ltisun.ep.ch Stefan Leue Institute for Informatics University of Berne Langgassstrasse 51 CH-3012 Berne, Switzerland Telephone: +41 31 631 44 30 Fax: +41 31 631 39 65 Email: leue@iam.unibe.ch 1 Introduction Integrated Layer Processing (ILP) has been recognized as one ecient way to speed up the execution of protocols (for an introduction to ILP see [CT90]). Protocols, however, are still being specied in distinct layers and it is the job of the implementor to use the concepts of ILP with his intuition when designing an implementation of a protocol stack. In this paper we want to discuss under which conditions ILP can be applied to a given protocol stack and how an implementation can be structured to allow more ILP. Based on the methods discussed in this paper we intend to develop automated tools which lead from a formal specication of a protocol stack to a structured implementation allowing a maximum of eciency gained by ILP. The methods have mainly been published in [LO93] and [LO94b]. 1.1 Classical Protocol Stack Model A protocol stack is usually specied as a set of concurrent processes each of which represents the functionality of one layer of the protocol stack. A packet is usually received at one end of the protocol stack and processed by each layer process in turn until it reaches the other end of the protocol stack. Since the dierent layer processes are concurrent several packets could be processed at the same time in a protocol stack, each one being in a dierent layer. This kind of processing is referred to as pipelining. 1

2 ILP and High Speed Protocols Integrated layer processing consists in avoiding pipelining and processing each packet by its own. The processing steps in each layer are combined to form a unique thread of processing from where the packet enters the protocol stack to where it exits. Especially for data manipulation operations a combined execution of operations along the packet processing thread can be much more ecient than the separate executions of dierent processing steps. Moreover, depending on the type of implementation, the actual passing of packets from one layer to an adjacent layer can induce an overhead which is avoided when doing ILP. We conjecture that high speed protocols can be dened as being those protocols in which the machine doing the processing is the bottleneck. With this denition TCP over Ethernet has been a high speed protocol at the time where it rst appeared as are ATM based protocols nowadays. processing machine. In both cases ILP is necessary to reduce the excessive load on the protocol 2.1 Conditions for ecient ILP A rst condition for ecient ILP is that the processing of a packet through a protocol stack may not be blocking. If for example a packet can not be processed in the last layer of a protocol stack because an acknowledgement is pending, an integrated processing of all layers is not possible until the missing acknowledgment is received. In such a case it is more ecient to process all non-blocking layers while waiting for the acknowledgment and then to have only the last layer to process once the acknowledgment has been received. The second condition for the possibility of ecient ILP is that the operations in the dierent layers can actually be combined into more ecient ones. For this to be true there may be no (control or data) dependences that enforce a sequential execution of operations. If for example a rst operation in one layer depends upon a result which in turn depends on the execution of a second operation then the two operations must be executed sequentially and cannot be combined. 2.2 Enhancing ILP From the rst condition we see that protocols should be designed in such a way that there should be no blocking within the protocol stack. A blocking wait on an acknowledgement is typically avoided by using a sliding window mechanism. If a protocol is designed in such a way that the machine has to spend time waiting for some event to occur then it doesn't match our denition of high speed protocol and ILP is not the correct way of gaining eciency. From the second condition we see that dependences are a crucial point in integrated processing. Thus we propose a method which starts with a dependence analysis of the protocol stack and allows to detect and reduce the obstructive dependences. Our method is 2

based on SDL specications of protocol stacks for which it generates data and control ow dependence graphs. For details see [LO94a] and [LO94b]. 3 Dependence graphs for protocol stacks Each layer of a protocol stack in an SDL specication typically consists of a process dened as an extended nite state machine. For every possible transition of this machine a sequence of operations is specied. This sequence may include decisions and can thus be branching. There are two kinds of dependences between operations of a transition: data dependences and control dependences. Control dependences represent the sequence in which the operations are specied and the decisions that are made during execution. Data dependences represent the fact that the result of one operation has to be known before another operation can be executed correctly. For a denition of data and control dependences see [PKL80]. The dependences between the operations in a transition are best described using a dependence graph. The dependence graph for one transition (TDG) is found by a syntactical analysis of the statements executed in the course of one transition. One dependence graph is created for each entry point (point at which a packet is received from the environment) of the protocol stack. The graph is composed by concatenating TDGs of the transitions that can be triggered from the point where a packet is received at one end of the stack to the point where it is either destroyed or delivered at the other end. As decisions are taken during processing of the packet, this graph has the shape of a tree. The concatenation of the transitions of the dierent processes does not in general correspond to the SDL model of processes communicating through asynchronous queues. However, we argue that the concatenated execution of the transitions corresponds to one possible interleaving of the execution of the SDL model and that an implementation allowing just this one sequence of operations along the concatenated structure is thus valid. 4 Common/uncommon partitioning The major part of a protocol usually serves to detect and recover from many kinds of exceptions and errors. However, in typical high speed environments (optical transmission) errors tend to be very rare. Thus a large part of a protocol is executed only in very few cases whereas the part executed most often is only a small fraction of the protocol's code. Which part of a protocol is common can be found either by simulation or by statistical analysis of a working model. Many assumptions can also be based on common sense reasoning (e.g., in a le transfer protocol the establishment of a connection is uncommon whereas the transfer of data is common). We partition our dependence graphs into common and uncommon parts by examining each decision and dening which evaluation of the decisions are common or 3

uncommon. All nodes which depend transitively of some uncommon decision are marked uncommon and removed from the graph. The resulting graph is called the Common Path Graph CPG. 5 Avoiding decisions through anticipation Decisions introduce 'hard' dependences similar to data dependences in the way that no next operation can be executed before the decision has been evaluated itself. Avoiding decisions thus reduces these dependences and enhances ILP. By anticipating the common case all decisions that have only one common outcome can be ignored. The consequence of this is that protocol processing can then be much more ecient for common cases. Another consequence is that inconsistencies appear in the uncommon cases. To treat these inconsistencies some precautions have to be taken. 6 Relaxing the dependences The basic idea is that we want to eliminate control ow dependences which impose sequential processing of operations as far as possible. Thus we generate a relaxed dependence graph which consists only of the data ow dependences of the complete dependence graph. In order to ensure that the resulting graph structure is connected and that all decisions are respected we have to introduce some auxiliary dependences. 7 Grouping Data Manipulation Operations (DMOs) The joint execution of two or more operations on the same data has been proven to greatly enhance the eciency of a protocol as the total number of data relocation operations can be reduced. We propose an algorithm which is a generalization of the methods proposed in [AP93] and [OP91]. It has the advantage of combining DMOs at compile time, taking into account all combinations that may be needed at run time. 8 A small example In the full paper we give a small example with an SDL specication and all the dependence graphs. 9 Conclusion After having discussed some limiting factors for ILP we have explained a formal method which allows to derive an optimized ILP protocol implementation from the formal SDL specication 4

of a protocol. Starting with a dependence analysis we create a common path graph for each entry point of the protocol stack. This graph represents the processing that has to be carried out in the common case in all protocol layers until the packet can be delivered to the environment. Considering only the common case not only reduces the complexity of this graph but also allows to anticipate the outcome of some decisions and consequently to reduce control dependences. The CPG is transformed to reect only the data dependences and some control dependences that have to be respected in the common case which increases the potential parallelism. The transformed dependence graph is then the starting point for an algorithm that allows to generate, at compile time, all the combinations of DMOs necessary for the execution of the common cases. We believe that our method is a contribution towards the integration of ILP concepts into automatic implementation methods based on formal specications. References [AP93] M. Abbot and L. Peterson. Increasing network throughput by integerating protocol layers. IEEE/ACM Transactions on Networking, 1(5):600{610, october 1993. [CT90] D. D. Clark and D. L. Tennenhouse. Architectural considerations for a new generation of protocols. In Proceedings of the ACM SIGCOMM '90 conference, Computer Communication Review, pages 200{208, 1990. [LO93] Stefan Leue and Philippe Oechslin. Optimization techniques for parallel protocol implementation. In Future Trends of Distributed Computing Systems, pages 387{ 393. IEEE, 1993. [LO94a] S. Leue and Ph. Oechslin. A formal approach to optimized parallel protocol implementation. Technical report, University of Berne, Institute for Informatics, 1994. [LO94b] S. Leue and Ph. Oechslin. Formalizations and algorithms for optimized parallel protocol implementation. In International Conference on Network Protocols, 1994. to appear. [OP91] S. W. O'Malley and L. L. Peterson. A highly layered architecture for high-speed networks. In M. J. Johnson, editor, Protocols for High Speed Networks II, pages 141{156. Elsevier Science Publishers (North-Holland), 1991. [PKL80] D. Padua, D. Kuck, and D. Lawrie. High-speed multiprocessors and compilation techniques. IEEE Transactions on Computers, 29(9):763{776, September 1980. 5