Parallel Solutions of the Longest Increasing Subsequence Problem Using Pipelined Optical Bus Systems

David SEMÉ and Sidney YOULOU
LaRIA, Université de Picardie Jules Verne, CURI, 5, rue du Moulin Neuf, 80000 Amiens, France
e-mail: {seme, youlou}@laria.u-picardie.fr

Abstract

In this paper we give parallel solutions to the problem of finding the Longest Increasing Subsequence of a given sequence of n integers. First, we show the existence of a simple dynamic programming solution. Its running time is Θ(n²) and its space requirement is Θ(n). We then show that it is possible to develop two parallel solutions based on an optical bus system of n processors, one using Θ(n) communication cycles and the other using a Θ(1) communication cycle.

1 Introduction

Over the last decade, many articles have been written about optical interconnections, dealing either with architectures or with algorithms. This growing success is due to the characteristics of optical fibre: the use of an optical bus, instead of an electrical one, allows processors concurrent access to the bus, so that their data can be transferred in a pipelined way. Several models using an optical bus have been proposed in the literature. The best known are the Linear Array with Reconfigurable Pipelined Bus System (LARPBS) [1], the Array with Reconfigurable Optical Bus (AROB) [2], the Pipelined Optical Bus (POB) [3], the Linear Pipelined Bus (LPB) [4] and, most recently, the Restricted Linear Array with Reconfigurable Pipelined Bus (R-LARPBS) [5]. Many other optical bus models exist that we do not list here; in [6] the authors proved that some of these models are equivalent. In any case, the main interest of all these models remains the use of the pipelined optical bus properties. In this paper, we use these properties to solve a well-known combinatorial problem [7, 8], the longest increasing subsequence problem (the LIS problem for short). This problem is interesting in itself and is a basis for solving other problems such as the Longest Common Subsequence (LCS) and the Longest Increasing Chain (LIC) problems. Parallel solutions to the LIS problem have been proposed in [9] on the CGM (Coarse Grained Multicomputers) model and in [10] on a linear systolic array.

Section 2 describes the model and its properties. Section 3 defines the Longest Increasing Subsequence problem. Sections 4 and 5 present, respectively, a Θ(n) communication cycles solution and a Θ(1) communication cycle solution on a linear array with an optical bus. Section 6 is dedicated to a discussion of the two solutions, and we conclude with our perspectives in Section 7.
2 The computation model

In this section, we describe the model used to run our algorithms. As we focus on the properties of the optical bus itself and for the sake of simplicity, we present a very basic model compared to those listed above. Our approach is as much architectural as algorithmic, so we give the details needed to understand the model.

We consider a linear array of n processors and a unidirectional optical bus. Each processor is connected to the bus by two directional couplers, one for transmitting data on the upper segment and the other for receiving data from the bus on the lower segment, as shown in figure 1. The optical bus is a waveguide on which messages circulate between processors. Several messages can circulate on the bus at the same time in a pipelined way.

[Figure 1: A linear array of processors with an optical bus. A transmitting segment runs above processors 0, 1, ..., i, ..., n−1 and folds back into a receiving segment below them; d is the fibre length between consecutive processors.]

In order to have the same propagation delay between consecutive processors, the length of the optical fibre between any two consecutive processors is the same. This condition and the directionality of the signal propagation allow the bus to carry several messages from different processors at the same time. Let a bus cycle be the end-to-end propagation delay on the bus (we omit the time to process messages). During a cycle, it is essential to avoid overlapping between messages. To ensure this, all transmissions must be synchronised and the length of a message on the bus must be lower than or equal to the length of the optical fibre between two consecutive processors, called the distance d in figure 1. In other words, let b be the number of bits forming a message, and consider that each bit is represented by an optical signal of width w seconds for a binary value of 1 and by the absence of this signal for 0. The condition can then be stated as follows: the length of the optical path between any two consecutive processors must be larger than or equal to bwc_g, where c_g is the velocity of light in the waveguide [2].

In our algorithms we route messages with the technique called time-division source-oriented multiplexing (TDSM), developed in [11]. It can be used when the receiver knows the address of the sender. This method introduces a function wait(i,j) which gives the time processor i must wait before receiving a message from processor j. Let τ be the time for a message to traverse the optical distance d (see figure 1); a message from j to i must go from processor j to processor 0 and then from processor 0 to processor i, so wait(i,j) = (i + j)τ. This time is relative to the beginning of a bus cycle. With this method, all basic communications (one-to-one, broadcasting, multicasting, ...) are possible. Note again that the message length bwc_g must be less than or equal to the distance d so that two messages sent by two consecutive processors cannot overlap.

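To make the TDSM timing concrete, here is a small Python sketch (ours, not from the paper; the values of TAU and N are illustrative) that computes the waiting times wait(i,j) = (i + j)τ and checks that messages from distinct senders never overlap at a receiver:

    TAU = 1.0   # propagation delay over one inter-processor segment (illustrative)
    N = 4       # number of processors (illustrative)

    def wait(i, j):
        # A message from j travels j segments up to processor 0 on the
        # transmitting segment, then i segments down to processor i on
        # the receiving segment, hence (i + j) * TAU.
        return (i + j) * TAU

    bus_cycle = 2 * (N - 1) * TAU   # end-to-end propagation delay
    print("bus cycle:", bus_cycle)
    for i in range(N):
        print("receiver", i, [wait(i, j) for j in range(N)])

    # Messages from consecutive senders arrive exactly TAU apart, so a
    # message no longer than the inter-processor distance d cannot
    # overlap the next one.
    for i in range(N):
        for j in range(N - 1):
            assert wait(i, j + 1) - wait(i, j) == TAU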
3 The longest increasing subsequence problem

Let us consider a sequence L = x_1, x_2, ..., x_n of n integers. A subsequence L' of L is obtained by deleting one or more of these integers. For example, if L = {4, 7, 8, 5, 6, 1, 9, 11}, then L' = {4, 7, 1, 11} is a subsequence of L. The longest increasing subsequence (LIS) problem is to find a subsequence of maximal length in which the integers are in increasing order. In the example, L' = {4, 5, 6, 9, 11} is a LIS of L, and we notice that L'' = {4, 7, 8, 9, 11} is also a LIS of L: a LIS of a given sequence is not necessarily unique. Formally, the problem is defined as follows. Let L = x_1, x_2, ..., x_n be a sequence of n distinct integers. An increasing subsequence of length l is a sequence x_f(1), x_f(2), ..., x_f(l) such that for all i, j with 1 ≤ i < j ≤ l we have f(i) < f(j) and x_f(i) < x_f(j). A longest increasing subsequence is one of maximal length.

From its definition, the problem does not seem very difficult. That is probably what Dijkstra's students thought when, at the Marktoberdorf summer school in July 1978, he asked them simply to find the length of the longest increasing subsequence of a sequence of integers. Only a few of them were able to solve the problem. Nowadays, this exercise remains a useful didactic example for teaching dynamic programming. In particular, it shows how to strengthen an induction hypothesis in a very explicit way. The main difficulty of the problem lies in the formulation of this induction hypothesis.
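This strengthened induction hypothesis translates directly into the simple Θ(n²)-time, Θ(n)-space dynamic program mentioned in the abstract. Below is a minimal Python sketch of it; the code and its variable names are ours, not the paper's.

    def lis(seq):
        # length[i]: length of the longest increasing subsequence ending
        # at seq[i]; pred[i]: index of its predecessor (the "best
        # predecessor" used by the parallel algorithms of sections 4 and 5).
        n = len(seq)
        length = [1] * n
        pred = [None] * n
        for i in range(n):
            for j in range(i):
                if seq[j] < seq[i] and length[j] >= length[i]:
                    length[i] = length[j] + 1
                    pred[i] = j
        # Walk back from an index of maximal length to recover one LIS.
        i = max(range(n), key=lambda k: length[k])
        out = []
        while i is not None:
            out.append(seq[i])
            i = pred[i]
        return out[::-1]

    print(lis([4, 7, 8, 5, 6, 1, 9, 11]))   # [4, 7, 8, 9, 11]

On the example above it returns {4, 7, 8, 9, 11}, one of the two longest increasing subsequences of L.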
Finding the longest increasing subsequence can be done sequentially in time Θ(n log n). In [10], Cérin et al. gave a solution on a systolic array of n processors that runs in time Θ(n). The main idea is to use a list to store the intermediate solutions and to proceed by successive insertions into it. To achieve this, the first processor is treated as a master processor running a different program from the others, and the systolic architecture requires four communication links between consecutive cells. In the next section, we describe a solution on a linear array with an optical bus that avoids these constraints by using another approach.
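For reference, the Θ(n log n) sequential bound is achieved by patience sorting with binary search. A minimal sketch using only the Python standard library (this is not the systolic algorithm of [10]):

    import bisect

    def lis_length(seq):
        # tails[k] is the smallest possible last element of an
        # increasing subsequence of length k + 1 seen so far.
        tails = []
        for x in seq:
            k = bisect.bisect_left(tails, x)
            if k == len(tails):
                tails.append(x)
            else:
                tails[k] = x
        return len(tails)

    print(lis_length([4, 7, 8, 5, 6, 1, 9, 11]))   # 5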

4 A Θ(n) communication cycles solution

4.1 Approach description

The longest increasing subsequence is a sequence of integers; algorithmically, it is a list. A way to build such a list is to assume that each element knows its predecessor in the list. The main idea is that each integer should find its best predecessor in the list. For any integer of the initial sequence, the best predecessor is necessarily an integer with a smaller value and, being a predecessor, it obviously has a smaller rank in the initial sequence. Let the initial sequence be L = x_0, x_1, ..., x_{n−1}, and let each of the n processors, numbered from 0 to n−1, carry one integer of L. From now on, "best predecessor" denotes the processor rather than the integer it carries. Any processor i carrying integer x_i performs an algorithm that searches, among processors 0 to i−1, for its best predecessor. Let us now formulate the criterion for a processor j (with j < i) to be the best predecessor of processor i. Our induction hypothesis is: the best predecessor of processor i, carrying x_i, is the processor j whose integer x_j is the last element of a longest increasing subsequence of x_0, ..., x_{i−1} such that x_j is smaller than x_i.

According to this formulation, a processor needs to know the lengths of the increasing subsequences ending at its predecessors. Initially, every processor is in an increasing subsequence of length 1 (composed of itself). Recursively, processor i increases its associated length value l_i if and only if there is a processor j < i carrying a smaller integer such that l_j ≥ l_i (where l_j is the associated length value of processor j). Processor j then temporarily becomes the best predecessor of processor i, and the length of the increasing subsequence ending at x_i (on processor i) becomes l_j + 1. Processor i then repeats this operation with the values of processor j + 1, and so on. On our model, one best predecessor is found per cycle: at the k-th cycle (k < n−1), processor k−1 has found its best predecessor and can send its values to all its successors to allow them to finish their computation. The values sent by processor i are its integer x_i and its associated length l_i. As can be seen in Algorithm 1, the primitive send allows a processor to write a message on the bus and the primitive get reads a message from it.

integer: id              /* id of the processor */
integer: pred = nil      /* value of the best predecessor */
integer: proc_pred       /* id of the best predecessor */
integer: value           /* associated value of the processor */
integer: value_received  /* value received from the bus */
integer: length = 1      /* length of the subsequence ending at the associated value of the processor */
integer: length_received /* length received from the bus */
integer: n               /* number of processors */
integer: i

BEGIN
  for i ← 0 to id do
    emission phase:
      if (id = i) and (id ≠ n−1) then
        send(value, length)
    receiving phase:
      wait(id, i)
      get(value_received, length_received)
      if (value_received < value) and (length_received ≥ length) then
        pred ← value_received
        proc_pred ← i
        length ← length_received + 1
  endfor
END

Algorithm 1: A Θ(n) communication cycles solution for the LIS problem

At the end of the computation, each processor knows the length of the increasing subsequence ending at it and the address of its predecessor in this subsequence. This length is also its rank in the list. We now give a few hints for retrieving the longest increasing subsequence. Its last element is on the processor having the largest length value. Note that, according to the previous section, this processor is not necessarily unique; in this case, we arbitrarily choose the one with the smallest associated value. Once it is chosen, this last processor sends a message to its predecessor to alert it: "hey, you're in the longest increasing subsequence, tell it to your predecessor".
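The behaviour of Algorithm 1 can be checked with a sequential simulation, one outer iteration per bus cycle. The Python sketch below is ours (the bus is modelled by plain variables, and names loosely follow Algorithm 1); it ends with the walk back along proc_pred that retrieves one LIS:

    def simulate_algorithm_1(seq):
        # In cycle i, processor i broadcasts (value, length); every
        # processor pid > i may then update its best predecessor.
        n = len(seq)
        value = list(seq)
        length = [1] * n
        proc_pred = [None] * n
        for i in range(n - 1):                    # bus cycle i
            v_recv, l_recv = value[i], length[i]  # processor i's broadcast
            for pid in range(i + 1, n):           # receiving phase
                if v_recv < value[pid] and l_recv >= length[pid]:
                    proc_pred[pid] = i
                    length[pid] = l_recv + 1
        return length, proc_pred

    seq = [4, 7, 8, 5, 6, 1, 9, 11]
    length, proc_pred = simulate_algorithm_1(seq)
    # Retrieval: start from a processor of maximal length and follow
    # the best-predecessor links back to the front of the subsequence.
    p = max(range(len(seq)), key=lambda k: length[k])
    lis = []
    while p is not None:
        lis.append(seq[p])
        p = proc_pred[p]
    print(lis[::-1])   # [4, 7, 8, 9, 11]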
4.2 Time complexity

On this model, the time complexity is the number of required cycles multiplied by the local time complexity of the processors. The number of cycles is clearly Θ(n). The local time complexity is constant, because both the emission phase and the receiving phase are performed in constant time. So the global algorithm runs in time Θ(n).

5 A Θ(1) communication cycle solution

5.1 Approach description

The previous solution is simple compared to solutions with the same time complexity found in the literature. However, one of the main advantages of the model is that several data can circulate on the bus at the same time in a pipelined way, whereas the previous solution puts only one datum on the bus per cycle. We now give another solution that takes this remark into consideration. The approach is the same, but instead of waiting several cycles to find its best predecessor, a processor does so in a single cycle. All processors send their data on the bus at the beginning of the cycle, and the data circulate on the bus in a pipelined way. Each processor can read a value from the bus, perform a computation, read the next value, and so on. Some precautions are necessary to make this possible: a processor must have enough time to perform its computation between the readings of two consecutive values on the bus. We therefore introduce a delay h between two consecutive messages. With the notation of section 2, we had d ≥ bwc_g, i.e. the length of a message on the bus is lower than or equal to the length of the optical path between two consecutive processors; now we require d ≥ (bw + h)c_g. The code of this solution is given by Algorithm 2.

integer: id              /* id of the processor */
integer: pred = nil      /* value of the best predecessor */
integer: proc_pred       /* id of the best predecessor */
integer: value           /* associated value of the processor */
integer: value_received  /* value received from the bus */
integer: length = 1      /* length of the subsequence ending at the associated value of the processor */
integer: n               /* number of processors */
struct couple
  value : integer
  length = 1 : integer
end
couple: tab[0..n−1]
integer: i, j

BEGIN
  emission phase:
    send(value)
  receiving phase:
    for i ← 0 to id do
      wait(id, i)
      get(value_received)
      tab[i].value ← value_received
      for j ← 0 to i do
        if (tab[i].value > tab[j].value) and (tab[i].length ≤ tab[j].length) then
          tab[i].length ← tab[j].length + 1
          if (i = id) then
            pred ← tab[j].value
            proc_pred ← j
            length ← tab[i].length
      endfor
    endfor
END

Algorithm 2: A Θ(1) communication cycle algorithm for the LIS problem

5.2 Time complexity

The retrieval of the solution remains the same. The number of cycles is Θ(1) and the receiving phase is performed in time Θ(n²), so the whole algorithm runs in time Θ(n²). The space complexity is Θ(n).
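The single-cycle behaviour can again be checked sequentially. The Python sketch below (ours) simulates what one processor does during the unique bus cycle of Algorithm 2: it reads the values of processors 0..id in pipeline order and rebuilds the length table locally:

    def simulate_algorithm_2(seq, pid=None):
        # What processor pid computes during the single bus cycle: the
        # two lists together play the role of the tab[] array.
        if pid is None:
            pid = len(seq) - 1
        values, lengths = [], []
        pred = proc_pred = None
        length = 1
        for i in range(pid + 1):         # i-th message read from the bus
            values.append(seq[i])        # get(value_received)
            lengths.append(1)
            for j in range(i):
                if (values[i] > values[j]
                        and lengths[i] <= lengths[j]):
                    lengths[i] = lengths[j] + 1
                    if i == pid:         # record our own best predecessor
                        pred, proc_pred = values[j], j
            if i == pid:
                length = lengths[i]
        return length, pred, proc_pred

    print(simulate_algorithm_2([4, 7, 8, 5, 6, 1, 9, 11]))   # (5, 9, 6)

Every processor runs this with its own pid; the Θ(n²) local work is the price paid for the single communication cycle.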

6 Discussion

In this section, we compare the two algorithms proposed above. First, note that the Θ(1) communication cycle algorithm can only be used under a certain condition, while the Θ(n) communication cycle algorithm can always be used. Let t_c be the computation time of one elementary operation µ of the emission phase of Algorithm 1, and recall from section 2 that d is the distance between two consecutive processors (measured below as a propagation delay). The number of elementary operations µ performed in the Θ(1) communication cycle algorithm is n(n−1). This algorithm can only be used if n(n−1)·t_c ≤ d, because n(n−1)·t_c is the time needed to process a message and d is the time between two consecutive messages. As a message is processed while the next message is being sent, the time complexity of the Θ(1) communication cycle algorithm is T_1 = 2(n−1)·d; this is the communication time of a message sent by processor n−1, which must traverse the whole bus. As defined in section 2, the minimum distance d between two consecutive processors is bw; with this minimum distance, T_1 = 2(n−1)·bw.

Now consider the Θ(n) communication cycle algorithm under the same condition, i.e. n(n−1)·t_c ≤ d. Its time complexity is T_n = (n−1)·(2n·d + t_c); this corresponds to the communication time (n−1)·2n·d added to the computation time (n−1)·t_c. As presented in section 4, in the Θ(n) communication cycle algorithm each processor communicates a couple of integers, so the minimum distance d between two consecutive processors is 2bw; with this minimum distance, T_n = (n−1)·(4n·bw + t_c). Since T_1 < T_n, we conclude that the Θ(1) communication cycle algorithm is better than the Θ(n) communication cycle algorithm whenever n(n−1)·t_c ≤ d.
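To make the discussion concrete, here is a small numeric check of the two expressions above (all parameter values are invented for illustration; they are not measurements from the paper):

    # Compare T_1 = 2(n-1)bw with T_n = (n-1)(4n*bw + t_c) for a few
    # sizes, together with the applicability condition n(n-1)t_c <= d
    # of the Theta(1)-cycle algorithm (d = bw there). All values are
    # assumptions.
    b, w, t_c = 32, 1e-9, 1e-12   # bits/message, bit width (s), op time (s)
    for n in (8, 64, 512):
        T1 = 2 * (n - 1) * b * w
        Tn = (n - 1) * (4 * n * b * w + t_c)
        holds = n * (n - 1) * t_c <= b * w
        print(f"n={n:4d}  T1={T1:.2e}s  Tn={Tn:.2e}s  condition: {holds}")

With these invented numbers the condition holds for small n but fails for n = 512, which illustrates why the Θ(n) communication cycle algorithm remains useful.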
7 Conclusion

In this paper we gave parallel solutions to the problem of finding the Longest Increasing Subsequence of a given sequence of n integers. We first showed the existence of a simple dynamic programming solution, with running time Θ(n²) and space requirement Θ(n). We then showed that it is possible to develop two parallel solutions based on an optical bus system of n processors, one using Θ(n) communication cycles and the other using a Θ(1) communication cycle. Finally, we compared the two proposed algorithms and showed that the Θ(1) communication cycle algorithm can only be used if n(n−1)·t_c ≤ d, while the Θ(n) communication cycle algorithm can always be used. Notice that the Θ(1) communication cycle algorithm is the better pipelined solution. It would be interesting to carry out experiments in order to determine which solution is the best in practice.

References

[1] S. Sahni, Models and algorithms for optical and optoelectronic parallel computers, International Journal of Foundations of Computer Science, vol. 12, no. 3, pp. 249–264, 2001.

[2] S. Pavel and S. G. Akl, On the power of arrays with optical pipelined buses, in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Sunnyvale, California, pp. 1443–1454, 1996.

[3] S. Q. Zheng and Y. Li, Pipelined asynchronous time-division multiplexing optical bus, Optical Engineering, vol. 36, pp. 3392–3400, 1997.

[4] Y. Pan, Order statistics on a linear array with a reconfigurable bus, Future Generation Computer Systems, vol. 11, pp. 321–328, 1995.

[5] Y. Pan, Computing on the restricted LARPBS model, Lecture Notes in Computer Science, vol. 2745, pp. 9–13, 2003.

[6] J. L. Trahan, A. G. Bourgeois, Y. Pan, and R. Vaidyanathan, Optimally scaling permutation routing on reconfigurable arrays with optically pipelined buses, Journal of Parallel and Distributed Computing, vol. 60, no. 9, pp. 1125–1136, 2000.

[7] D. Aldous and P. Diaconis, Longest increasing subsequences: from patience sorting to the Baik–Deift–Johansson theorem, Bulletin of the American Mathematical Society, vol. 36, pp. 413–432, 1999.

[8] S. Bespamyatnikh and M. Segal, Enumerating longest increasing subsequences and patience sorting, Information Processing Letters, vol. 76, pp. 7–11, 2000.

[9] T. Garcia, D. Semé, and J.-F. Myoupo, A work-optimal CGM algorithm for the longest increasing subsequence problem, in International Conference on Parallel and Distributed Processing Techniques and Applications, pp. 563–569, 2001.

[10] C. Cérin, C. Dufourd, and J.-F. Myoupo, An efficient parallel solution for the longest increasing subsequence problem, in Fifth International Conference on Computing and Information, pp. 220–224, 1993.

[11] C. Qiao and R. Melhem, Time-division optical communications in multiprocessor arrays, IEEE Transactions on Computers, vol. 42, no. 5, pp. 577–590, 1993.