OASys: An AND/OR Parallel Logic Programming System


I. Vlahavas (1), P. Kefalas (2) and C. Halatsis (3)
(1) Dept. of Informatics, Aristotle Univ. of Thessaloniki, Thessaloniki, Greece
(2) Dept. of Computer Science, City College, Affiliated Institution of the University of Sheffield, Thessaloniki, Greece
(3) Dept. of Informatics, University of Athens, Athens, Greece

Abstract - OASys is a software system designed for AND/OR-parallel execution of logic programs. In order to combine these two types of parallelism, OASys considers each alternative path as a totally independent computation (leading to OR-parallelism) which consists of a conjunction of determinate subgoals (leading to AND-parallelism). This computation model is motivated by the need to eliminate communication between processing elements. OASys aims towards a distributed memory architecture in which the processing elements performing the OR-parallel computation possess their own address space, while other simple processing units are assigned the AND-parallel computation and share the same address space. OASys execution is based on distributed scheduling which allows either recomputation of paths or environment copying. We discuss the OASys execution scheme in detail and demonstrate its effectiveness by presenting results obtained by a prototype implementation running on a network of workstations. The results show that the speedup obtained by AND/OR-parallelism is greater than the speedups obtained by exploiting AND- or OR-parallelism alone. In addition, comparative performance measurements show that copying has a minor advantage over recomputation. Keywords: Prolog Implementation, AND/OR parallelism, Copying, Recomputation

1. INTRODUCTION
Exploitation of parallelism in logic programs has drawn much attention from the logic programming community. The aim is to exploit the inherent parallelism embodied in the declarative semantics of logic programs. Various methods have been implemented to enable parallel execution of Prolog programs without affecting the language semantics. The procedural semantics of sequential Prolog is described by a fixed execution sequence of the program clauses, i.e. a top-down, left-to-right traversal of the AND/OR tree built while trying to reduce a given query. The Prolog execution strategy uses chronological backtracking in order to explore all possible ways of reducing a goal. The principal idea behind parallel implementations of Prolog is to traverse the execution AND/OR tree in a parallel mode. Alternative paths are independent since they represent different ways of solving a goal and therefore can be searched simultaneously (OR-parallelism). In order to solve a conjunction of goals, the different paths starting from each goal can be independently searched (AND-parallelism) as long as there exists a reconciliation method to resolve conflicting bindings. During the last decade, research has mainly concentrated on exploiting either OR-parallelism [1, 16, 14] or various forms of AND-parallelism [11, 13, 17], reporting promising results. In practice, the implementation of a full AND/OR Prolog system is too complex to be efficient. There have been efforts to combine both AND- and OR-parallelism in such a way that each type of parallelism is exploited only when it is efficient to do so [25]. This paper describes a different approach to implementing full AND/OR-parallelism in an experimental system called OASys (Or/And SYStem). OASys efficiently implements OR-parallelism by viewing the search space as many independent and deterministic computations which rarely need to communicate.
In addition, this feature provides the ability to exploit the deterministic paths of the proof in an AND-parallel way. OASys aims towards an architecture in which the processing elements performing the OR-parallel computation possess their own address space, while other simple processing units are assigned the AND-parallel computation and share the same address space. Communication is limited only to

scheduling operations, i.e. the allocation of work to available processing elements. The latter is accomplished in two different ways: recomputing the alternative path or copying the computation state. In the next section, we briefly describe the problems of parallelising logic programs as well as the main approaches to implementing AND/OR-parallelism. Section 3 presents the OASys computation model and section 4 its proposed architecture. Section 5 is devoted to performance results. Finally, we conclude with section 6, in which we outline the features of the proposed model and the objectives of our further work.
2. IMPLEMENTING PARALLELISM IN LOGIC PROGRAMMING
The main burden in implementing OR-parallelism is the handling of multiple bindings of the same variable produced by different paths. A variety of methods have been implemented:
Processors share the same address space while a binding method records the different bindings in an appropriately defined data structure [5, 24]. In shared environment schemes, the reported overhead is small with respect to sequential Prolog execution. Despite their complexity, such schemes are adopted in many successful implementations [4, 14].
Processors are independent and have their own copy of the working environment. The main advantage of this approach is the locality of reference. However, there is an extra overhead due to environment copying between parent and child processes, although in some cases this has proved to be faster than a shared binding scheme [2].
Processors view the search space as totally independent computations. Therefore each processor computes each path from the root in a sequential Prolog-like manner without having to communicate with the others at all. This implies recomputation of certain parts of the search space but avoids the need either to share variable bindings or to copy environments. It is argued that this method can perform well in highly non-deterministic programs where the previous

two schemes require excessive amounts of communication [3, 7, 8, 12, 15]. However, using the recomputation method it is difficult to cope with the cut and side-effect predicates of Prolog.
In AND-parallelism the problem has been the handling of shared variables between subgoals. The main approaches to implementing AND-parallel systems have been the following:
Processors work in a producer-consumer fashion, communicating through the shared variables. This approach, called Stream AND-parallelism, is adopted by the Committed Choice Non-Deterministic languages [6, 18]. However, the language issues introduced for the sake of efficiency sacrifice the semantics of Prolog to a great extent.
Processors work in parallel only when the subgoals do not share variables at all (independent AND-parallelism) [13]. It is, however, difficult to establish the independence of variables in goals, and extra overhead is introduced by the static or dynamic analysis needed to identify such goals.
Processors work in parallel even if they share variables (dependent AND-parallelism) [17]. The main overhead introduced in this case is the reconciliation of bindings produced by different processes.
Full AND/OR parallelism is difficult to implement due to the overheads introduced by both types of parallelism. In practice, it is more efficient to combine OR- with independent AND-parallelism [3, 4, 9]. In Andorra-I [25], a substantial effort has been made to fully combine the two types. The main idea is to delay the execution of non-determinate goals while performing the available determinate computation in an AND-parallel way. When there is no more determinate computation, i.e. all computations are suspended, the leftmost goal is reduced in an OR-parallel way. As in Andorra-I, OASys exploits both OR-parallelism and dependent AND-parallelism. However, OASys does not require any determinacy checking, since the search space is considered as independent and determinate OR-parallel branches. This feature allows computation to proceed freely without suspending any non-determinate goals. The OASys approach permits a different and much simpler implementation aiming towards higher performance.
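The multiple-bindings problem that drives these design choices can be made concrete with a toy sketch (plain Python of our own, not OASys code, with invented names): two alternative clauses bind the same variable differently, so each OR-branch explores its own copied environment instead of fighting over a shared slot.

```python
# A toy sketch (ours, not OASys code) of the multiple-bindings problem:
# clauses d(1) and d(2) bind the same variable X differently, so each
# OR-branch gets a private copy of the environment.

def solve_or(alternatives, env):
    """Explore each alternative clause in a private environment copy."""
    branches = []
    for value in alternatives:
        branch_env = dict(env)    # environment copying
        branch_env["X"] = value   # branch-local binding of X
        branches.append(branch_env)
    return branches

# d(1). d(2).  -- two clauses for d(X); Y was bound before the call
envs = solve_or([1, 2], {"Y": 3})
assert [e["X"] for e in envs] == [1, 2]   # no binding conflict
assert all(e["Y"] == 3 for e in envs)     # earlier bindings preserved
```

The recomputation alternative avoids even this copy by re-deriving `env` from the root on each branch, at the cost of repeating work.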

Our previous work [19] proposed a formal model for the resolution of AND/OR-parallelism and suggested its implementation and abstract architecture [20]. In OASys, we improve the execution scheme and eliminate the problems due to centralised control by providing efficient distributed scheduling. In addition, OASys supports two alternative ways of distributing work, namely copying and recomputation, which are determined at compile time.
3. OASYS COMPUTATIONAL MODEL
In Prolog, the search space is normally represented by an AND/OR tree, where AND nodes represent the body goals of a clause and OR nodes represent the alternative clauses matching a body goal. In OASys, the search space can be represented by a tree whose nodes, called U-nodes, denote the unifications between the calling predicates and the heads of the corresponding procedures. A link between two U-nodes indicates a sequence of procedure calls as specified by the program. A link is called successful if the unifications performed in the two U-nodes produce consistent bindings, and unsuccessful otherwise. Figure 1a shows a Prolog program and its corresponding AND/OR tree, while figure 1b shows the OASys search tree of the same Prolog program. In Prolog, a node in the execution tree has already produced the variable bindings through unification. In OASys, unifications in the U-nodes are in progress while new U-nodes are continuously produced, i.e. the U-nodes of a path are produced and executed in an asynchronous and concurrent way. This has the effect that a path containing U-nodes may be speculative, i.e. some of the U-nodes generated may not have existed in the corresponding sequential Prolog execution.
[place of Figure 1]
OASys supports OR-parallelism by searching simultaneously different paths of the search tree. It also supports AND-parallelism by executing in parallel the U-nodes belonging to the same path. Execution of a path terminates when: all links are successful (a solution is found), or one link is unsuccessful (failure in conjunction), or

unification in a U-node fails (mismatch).
Figure 2 shows the links generated for the paths of the search tree of figure 1 and the most general unifiers found for every unification in the U-nodes. The same figure presents the potential for exploiting AND/OR parallelism in OASys by viewing the same tree as three independent computations. The gray shaded areas include U-nodes which also exist in other paths; their meaning is discussed later.
[place of Figure 2]
4. THE OASYS ARCHITECTURE
Following the above computation model, OASys can be implemented by employing a group of Processing Elements (PEs). Each PE is a shared-memory multiprocessor and consists of three main parts: a preprocessor, a scheduler and an engine (figure 3). In the following sections we present the compilation process and the execution scheme, and we discuss some implementation issues.
4.1 Compilation
The source Prolog program is compiled into two codes, namely the Compiled Prolog (CP) code and the Compiled Clauses Dependency (CCD) code [21]. The CP code represents the definitions of clauses and, during the startup phase, is distributed to all PE engines. The CP code is broadly based on the APIM [20]; it includes only one unify instruction and provides a mechanism to implement arithmetic and other built-in operations. The CCD code is produced by abstract interpretation of the Prolog program and, during the startup phase, is distributed to all PE preprocessors. It maps each Prolog clause to its actual address in the CP code together with the index table entry for every body subgoal. For each procedure in the program there exists an index table which contains the addresses of the clauses of that procedure as well as the data types of the first argument of their head predicates. The preprocessor executes the

CCD code and, in combination with the index tables, generates choices when there is more than one clause in the definition of a procedure. Each Prolog clause:
Hi :- B1, B2, ..., Bn
is represented in the CCD code notation as:
Ci: T1(ArgType1), T2(ArgType2), ..., Tn(ArgTypen).
where Ci is the address of the clause in the CCD code, used to derive the actual address of the clause in the CP code, Tj is the address of the index table of Bj, and ArgTypej is the data type of the first argument of Bj.
[place of figure 3]
The CCD code for the N queens program, together with the graph representation of the CCD code, is given in the appendix.
4.2 OASys Execution Scheme
Each PE passes through three phases of operation: preprocessing, scheduling and execution. A PE starts generating choices by executing the CCD code (preprocessing phase) and passes them to the scheduler, which decides whether available work could be shared with other PEs (scheduling phase). The PE's engine works concurrently by consuming the choices passed from the scheduler. The sequence of choices is used as directives to construct the sequence of U-nodes, which in turn are assigned to special processing units (AND Processing Units or APUs) working in parallel (execution phase). The two types of parallelism supported are handled as follows:
OR-parallelism. The different paths of the search tree may be distributed to idle PEs. The scheduler decides whether it will assign the alternative paths to other PEs (OR-parallel execution) or keep them for its own engine (sequential execution).
AND-parallelism. The engine executes the U-nodes of a single path in parallel, i.e. it performs the unifications within a U-node by assigning each node to an APU. This is in effect AND-parallelism because the U-nodes in the same path form the conjunction of predicate calls which may solve the query.
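As a concrete illustration of the CCD notation and choice generation described in section 4.1, the following toy sketch (our Python, with hypothetical names; not OASys code) models a clause entry and a first-argument index table, and shows how a single match makes a call deterministic while multiple matches create a choice point:

```python
# Hypothetical sketch of the CCD clause representation and
# first-argument indexing; all names are ours, not OASys code.

# Clause C2 (a recursive clause) in the form  Ci: T1(ArgType1), ...
ccd = {
    "C1": [],                                           # fact: true.
    "C2": [("T2", "var"), ("T3", "var"), ("T1", "var")],
}

# Index table T1: (first-argument type, clause address) pairs
index_tables = {
    "T1": [("nil", "C1"), ("list", "C2")],
}

def lookup(table, arg_type):
    """Return matching clause addresses; 'var' matches any type."""
    return [c for t, c in index_tables[table]
            if t == arg_type or "var" in (t, arg_type)]

# One match -> the call is deterministic; several -> a choice point.
assert lookup("T1", "list") == ["C2"]
assert lookup("T1", "var") == ["C1", "C2"]
```

In the real system the table entries point into the CP code; here they are plain strings to keep the dispatch logic visible.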

These two types of parallelism are depicted in figure 2. The three different paths are allocated to three distinct PEs, while each U-node of the same path is allocated to a different APU.
The Preprocessor
The preprocessor executes the CCD code, thus generating a sequence of choice points along a path, which are then passed to the engine. As shown previously, a call to a predicate is represented by an index table address together with the type of its first argument. Execution of the call is actually a search in the corresponding index table to find a clause with a matching data type. If only one matches, the call is deterministic. Otherwise, there is a choice point and one of the alternative addresses is sent to the engine, whereas the fate of the others is decided by the scheduler. Note that the preprocessor in OASys finds the sequencing of calls, thus facilitating the indexing operation at run time. In this context, preprocessing differs from the compile-time preprocessing of Andorra-I, which performs determinacy checking.
The Scheduler
At any time, each PE scheduler is aware of the availability of work. This is done by maintaining a table of PE-ids which is updated by broadcasting a single byte indicating whether a PE has unexplored paths to share or no more available work. More analytically, when a PE completes the execution of its current path, successfully or not, it backtracks to its local choice points, if any. When there is no more available work in a PE, it is declared idle and tries to find work by consulting its table. It chooses an entry with available work and makes a request. The PE which is asked to give work locks itself against other requests, simulates failure (dummy backtracking) up to the alternative nearest to the root, transmits the appropriate information, then unlocks and continues executing its current path. A PE's request for work might not be answered by another PE if the latter is locked. Instead of busy waiting, the scheduler tries another entry from its table.
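The availability-table protocol just described can be sketched as follows (a much-simplified single-process model of our own; class and field names are invented, and the "single byte" broadcast becomes a boolean flag):

```python
# Simplified sketch (ours, not OASys code) of distributed scheduling:
# each PE broadcasts its availability, and an idle PE polls its table
# instead of busy-waiting on a locked peer.

class PEScheduler:
    def __init__(self, pe_id):
        self.pe_id = pe_id
        self.table = {}      # pe_id -> True if that PE has work to share
        self.locked = False  # locked while answering another request
        self.work = []       # unexplored alternatives, root-nearest first

    def broadcast(self, peers):
        for p in peers:      # the "single byte" availability update
            p.table[self.pe_id] = bool(self.work)

    def request_work(self, peers):
        for p in peers:
            if self.table.get(p.pe_id) and not p.locked:
                p.locked = True          # lock against other requests
                item = p.work.pop(0)     # alternative nearest the root
                p.locked = False         # unlock, resume own path
                return item
        return None                      # all locked or empty: retry later

pe1, pe2 = PEScheduler(1), PEScheduler(2)
pe2.work = ["alt-near-root", "alt-deeper"]
pe2.broadcast([pe1])
assert pe1.request_work([pe2]) == "alt-near-root"
```

A real PE would also simulate failure up to that alternative before transmitting; the sketch keeps only the table, lock, and root-first policy.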

The current implementation supports two possible ways of distributing work, which are determined at compile time. The PE with available work may either: send the links followed from the root to the available node and force the PE which issued the request to recompute that specific path and continue execution with one alternative, or copy the computation state of the available node and let the PE which issued the request continue execution with one alternative. The effect of both techniques is illustrated in figure 2, where recomputation includes the portions of paths in the gray shaded areas while copying is concerned only with the portions of paths outside those areas. Scheduling is a crucial issue and could be optimised in various ways. For the moment, work that is nearest to the root has priority for distribution. This minimizes the amount of information transmitted. Distribution could also be avoided by pre-compilation detection of small chunks of work or by employing user annotations to indicate parts of the program that should be kept private to a PE. In addition, we are currently experimenting with a more sophisticated scheduler which can decide at run time whether copying or recomputation is more profitable on a specific occasion, taking into consideration the various factors that affect one or the other, such as the amount of information to be transmitted, the estimated communication cost, the recomputation overhead, etc.
The Engine
The engine consists of a work pool, a control unit and a number of special processing units (APUs) which operate in parallel sharing a common memory (figure 4). The work pool accumulates choices supplied by the preprocessor. The control unit executes the machine instructions, constructs U-nodes taking directives from the work pool, and distributes them to the APUs. The APUs receive the addresses of a calling predicate and the head of the corresponding procedure, and perform the unification operation efficiently.
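The engine loop just described, in which the control unit hands the unifications of a path to APUs running in parallel, can be sketched with a thread pool standing in for the APUs (our Python, hypothetical names; the unification itself is stubbed out):

```python
# Sketch (ours, not OASys code) of the engine: unifications of one
# path are dispatched to parallel "APUs" (here, a thread pool), and
# the path is solved only if every link succeeds.

from concurrent.futures import ThreadPoolExecutor

def unify_stub(call, head):
    # placeholder for APU unification; real OASys unifies terms
    return call == head

def run_path(unifications, n_apus=4):
    with ThreadPoolExecutor(max_workers=n_apus) as apus:
        results = list(apus.map(lambda u: unify_stub(*u), unifications))
    return all(results)

assert run_path([("d(1)", "d(1)"), ("e(3)", "e(3)")]) is True
assert run_path([("c(2)", "c(3)")]) is False   # mismatch: path fails
```

The `all(...)` reduction mirrors the termination rule of section 3: one unsuccessful link fails the whole conjunction.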
Parallel execution of APUs requires synchronised access to unbound variables which may also be accessed by other APUs. This does not add considerable complexity since, according to the execution algorithm, unbound variables may be

bound in any order, either left to right as in Prolog, or right to left. The mechanism used for variable bindings is a lock-test-and-set operation. According to this, whenever an APU tries to bind a variable, it initially tests whether the variable is still unbound. If so, the variable is locked and the APU proceeds by setting a value. Otherwise, the existing binding is retrieved and a consistency check takes place.
[place of figure 4]
Heap writing operations are also synchronised, by setting a lock on the starting address of the space available for writing. An APU cannot allocate space on the heap until a heap-writing operation already started by another APU has completed. However, access operations are free up to the locked address. A similar problem has been addressed in [10], which proposes that simple processing units configured as a systolic array be used to speed up parallel logic inferences.
4.3 Implementation Issues
The previously described execution scheme adapts quite naturally to a hybrid multiprocessor in which parts of the address space are shared among subsets of processors, as for example in a distributed (loosely coupled) system containing multiple shared-memory multiprocessors. In such a system each shared-memory multiprocessor would be a natural host for a team of processors sharing a common address space. OASys can also be considered as an SPMD (Single Program Multiple Data) machine, its processors executing the same program on different parts of the data, each proceeding independently of the others inside its own program. Besides the above mentioned ideal architecture, OASys could also be implemented as:
A network of multiprocessor workstations, each of which corresponds to a PE while its processors correspond to the various components of the PE.
A message-passing scalable multiprocessor architecture that provides a shared virtual address space on essentially physically distributed hardware [23]. The address space is truly shared but it is also virtual in the sense that memory is physically distributed across different processors.
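The lock-test-and-set binding step used by the APUs (section 4.2) can be rendered as a short sketch (our Python, not OASys code): test whether the variable is still unbound, set it under a lock, and otherwise fall back to a consistency check against the existing binding.

```python
# Sketch (ours) of lock-test-and-set variable binding: test, lock
# and set if still unbound; otherwise retrieve the binding and
# check it for consistency.

import threading

UNBOUND = object()

class Variable:
    def __init__(self):
        self.value = UNBOUND
        self.lock = threading.Lock()

    def bind(self, value):
        if self.value is UNBOUND:          # initial test
            with self.lock:                # lock before setting
                if self.value is UNBOUND:  # re-test under the lock
                    self.value = value
                    return True
        return self.value == value         # consistency check

v = Variable()
assert v.bind(3)       # first APU binds the variable
assert v.bind(3)       # second APU: retrieval is consistent
assert not v.bind(4)   # conflicting binding: unification fails
```

The double test around the lock is our way of keeping the common already-bound case lock-free; the order of binding attempts does not matter, matching the observation that variables may be bound left to right or right to left.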

OASys could also be emulated on a shared-memory multiprocessor machine by grouping processors into teams such that each team corresponds to a PE and each processor of the team corresponds to one of the components of the PE. The memory should be partitioned into as many independent areas as the number of teams. All processors in a team will share the same logical address space in each area.
5. PERFORMANCE RESULTS
A prototype system for OASys has been developed in C and runs on a network of Sun workstations. We executed a number of benchmark programs, and the results are found to be in accordance with those obtained by a simulation of the system [22]. We have chosen to run the following series of Prolog programs:
nrev500: naive reverse of a list of 500 elements,
merge500: mergesort of a list of 500 elements,
zebra: the who-owns-the-zebra problem,
queens8: the 8-queens problem,
ham: the search for Hamiltonian paths in a graph.
The first two of the above programs exploit only AND-parallelism while the rest exploit both types of parallelism.
5.1 Relative Speed Performance
There are two versions of the prototype system, the sequential and the parallel OASys. In sequential OASys, several aspects concerning synchronization, communication and scheduling have been omitted. Table 1 compares the performance of sequential OASys, and of parallel OASys employing one PE, with other well known implementations of Prolog, namely sequential Andorra-I, Eclipse and Sicstus. All times shown are actual cpu times in seconds, obtained on the same machine (Sun Enterprise 3000). The runtimes of Andorra-I were obtained with preprocessing for determinacy checking.

[place of table 1]
5.2 OR-Parallel Performance
Table 2 shows the speedups measured for 1 to 10 PEs, each with a single APU, using a network with the same number of workstations as the number of PEs. The speedups concern only the OR-parallelism exploited by the benchmark programs. Since the programs are non-deterministic, the speedup is close to linear, as expected. It is worth noticing that the performance of recomputation and copying is similar with a small number of PEs, with copying taking precedence as the number of PEs increases. Although the data transfer in recomputation is 4 to 7 times less than in copying, the above results are explained by the fact that the transfer time in copying is much less than the re-execution time in recomputation. The overall amount of data to be transferred in copying produces a small overhead, since current networks are capable of transferring large amounts of data within a very short time. In addition, copying is efficiently implemented, since each PE has an identical but independent logical address space and the stack portions are copied without relocating any pointers. Bearing in mind that copying is more profitable when work is allocated lower in the tree, while recomputation is more profitable closer to the root, a hybrid scheduling scheme deciding which method is more appropriate on any given occasion is expected to result in better actual performance.
[place of Table 2]
5.3 AND-Parallel Performance
The current prototype system implements AND-parallelism by creating the same number of processes as the number of required APUs in each workstation. Having measured the actual timings, AND-parallel performance is estimated by taking into account the following factors:
the parallel overhead due to variable locking over sequential execution with 1 APU,

the overhead due to book-keeping operations of non-determinate nodes, which demand more processing than deterministic nodes,
the overhead of allocating unifications to APUs, which increases in direct proportion to the number of APUs.
Table 3 shows the estimated speedups for the same benchmark programs, referring to a single PE with 1 to 10 APUs.
[place of Table 3]
Some of the programs, like nrev500 and merge500, show good speedup since they are purely deterministic. Others, like ham and zebra, have long deterministic paths between choice points, so AND-parallelism can be exploited without much overhead. The speedup of queens8 is low and flattens as the number of APUs increases. This is because the search space has short deterministic paths between choice points and therefore more APUs do not contribute to the exploitation of AND-parallelism. On the other hand, for all programs, we do not expect better performance with more than 10 APUs, since the preprocessor's total execution time is 8-15% of the total execution time of an engine with 1 APU. As the number of APUs increases, this percentage increases, the execution times become roughly equal, and the total execution time of a PE tends to the execution time of the preprocessor, i.e. a constant.
5.4 AND/OR-Parallel Performance
Results of combining both types of parallelism are shown in Table 4 for two representative programs. The results were obtained by running queens8 and ham with different numbers of PEs, each one configured with a fixed number of APUs (1 to 6) at every run. The highest performance is reached with 10 PEs of 6 APUs each (i.e. 60 APUs in total).
[place of Table 4]
6. CONCLUSIONS AND FUTURE WORK

In this paper we have described the OASys model for implementing full AND/OR-parallelism. OASys achieves AND/OR-parallelism by assigning each path of the search tree to a different processing element (PE) while at the same time assigning each node of an individual path to a different processing unit (APU). The above scheme adopts a modular design which is easily expandable and highly distributed, thus limiting communication to scheduling operations only. Allocation of work to available processing elements can be achieved either by copying or by recomputing. The OASys software architecture aims towards a distributed system in which the processing elements rarely need to communicate, and it could be implemented in various ways. However, an ideal architecture would be a hybrid multiprocessor containing multiple shared-memory multiprocessors connected by a message-passing network. We have implemented a software prototype in order to investigate the model's performance. By running a series of example programs we showed that the two scheduling methods perform similarly, with a minor precedence for copying. In both cases, the speedup obtained is linear in the number of processing elements. However, there is an upper bound on speeding up the execution, determined by the ratio of the time required to generate the alternative paths to the time of performing the deterministic computations on these paths. We are currently working on an OASys implementation which integrates both scheduling methods during the execution of a program.
This approach aims towards a more sophisticated scheduler which will decide, on any occasion:
which of the two methods (copying or recomputation) to use, depending on the estimated overhead,
whether it is more profitable to keep an alternative path for sequential execution rather than sharing it with other processing elements, depending on the estimated amount of work in this path,
which of the available alternative paths to send, depending on the estimated relative profit of sharing one path compared to the profit of sharing any other.
We believe that performance will increase, since the communication overhead will be further reduced.

ACKNOWLEDGMENTS
We would like to thank D. Xochellis, D. Benis, C. Berberidis and E. Sakellariou for their help with the prototype and the collection of performance data. Many thanks to Rong Yang for the help with Andorra-I, and to the anonymous referees for their constructive comments.
REFERENCES
[1] K. Ali and R. Karlsson. The Muse Or-Parallel Prolog model and its performance. Proc. North American Conf. on Logic Programming, Austin, USA, Oct. 1990.
[2] K. Ali and R. Karlsson. Scheduling Speculative Work in MUSE and Performance Results. Int. J. of Parallel Programming, 21 (6), 1992.
[3] L. Araujo and J.J. Ruz. PDP: Prolog Distributed Processor for Independent AND/OR Parallel Execution of Prolog. Proc. of the 11th ICLP (P. Van Hentenryck ed.), MIT Press, 1994.
[4] U. Baron, J.C. de Kergommeaux, M. Hailperin, M. Ratcliffe, P. Robert, J. Syre and H. Westphal. The parallel ECRC Prolog system PEPSys: An overview and evaluation results. Future Generation Computer Systems '88 Conf., Tokyo, 1988.
[5] A. Ciepielewski and S. Haridi. A formal model for OR-Parallel Execution of Logic Programs. Information Processing, IFIP, 1983.
[6] K.L. Clark and S. Gregory. PARLOG: parallel programming in logic. ACM Transactions on Programming Languages and Systems, 8 (1), 1986.
[7] W.F. Clocksin. Principles of the DelPhi parallel inference machine. The Computer Journal, 30 (5), 1987.
[8] G. Gupta and M. Hermenegildo. And-Or Parallel Prolog: A Recomputation Based Approach. New Generation Computing, 11, 1993.

[9] G. Gupta, M. Hermenegildo, E. Pontelli and V.S. Costa. ACE: And/Or-Parallel Copying-Based Execution of Logic Programs. Proc. of the 11th ICLP (P. Van Hentenryck ed.), MIT Press, 1994.
[10] Y.F. Han and D.J. Evans. Parallel Inference Algorithms for the Connection Method on Systolic Arrays. Int. J. Computer Math., 53, 1994.
[11] M. Hermenegildo and K. Green. &-Prolog and Its Performance: Exploiting Independent And-Parallelism. Intern. Conference on Logic Programming ICLP 90, MIT Press, 1990.
[12] Z. Lin. Self-organizing Task Scheduling for Parallel Execution of Logic Programs. Proc. of the Intern. Conf. on Fifth Generation Computer Systems, ACM, 1992.
[13] Y. Lin and V. Kumar. AND-parallel execution of Logic Programs on a Shared-Memory Multiprocessor. The J. of Logic Programming, 10, 1991.
[14] E. Lusk, R. Butler, T. Disz, R. Olson, R. Overbeek, R. Stevens, D.H.D. Warren, A. Calderwood, P. Szeredi, S. Haridi, P. Brand, M. Carlsson, A. Ciepielewski and B. Hausman. The Aurora OR-parallel Prolog system. New Generation Computing, 7 (2,3), 1990.
[15] S. Mudambi and J. Schimpf. Parallel CLP on Heterogeneous Networks. Proc. of the 11th ICLP (P. Van Hentenryck ed.), MIT Press, 1994.
[16] T.J. Reynolds and P. Kefalas. OR-parallel Prolog and search problems in AI applications. Proc. of the 7th Intern. Conference on Logic Programming (D.H.D. Warren and P. Szeredi eds.), Jerusalem, 1990.
[17] K. Shen. Exploiting And-parallelism in Prolog: the Dynamic Dependent And-parallel Scheme (DDAS). Proc. of the Joint Intern. Conf. and Symp. on Logic Programming, 1992.
[18] K. Ueda. Introduction to Guarded Horn Clauses. ICOT Research Center, TR-209, Tokyo.
[19] I. Vlahavas and P. Kefalas. A Parallel Prolog Resolution Based on Multiple Unifications. Parallel Computing, North-Holland, 18, 1992.
[20] I. Vlahavas and P. Kefalas. The AND/OR Parallel Prolog Machine APIM: Execution Model and Abstract Design. The Journal of Programming Languages, 1, 1993.

[21] I. Vlahavas. Exploiting AND-OR Parallelism in Prolog: The OASys Computational Model and Abstract Architecture. Journal of Systems and Software (to appear).
[22] I. Vlahavas, P. Kefalas, I. Sakellariou and C. Halatsis. The Basic OASys Model: Preliminary Results. Proceedings of the 6th National Conf. on Informatics, Greek Computer Society, Athens, 1997.
[23] D.H.D. Warren and S. Haridi. Data Diffusion Machine - A Scalable Shared Virtual Memory Multiprocessor. Proceedings of the Int. Conf. on Fifth Generation Computer Systems, 1988.
[24] D.S. Warren. Efficient Prolog Memory Management for Flexible Control Strategy. Int. Symp. on Logic Programming, 1984.
[25] R. Yang, T. Beaumont, V. Santos Costa and D.H.D. Warren. Performance of the Compiler-based Andorra-I System. Proc. of the 10th Intern. Conference on Logic Programming (D.S. Warren ed.), MIT Press, 1993.

APPENDIX

Consider the following Prolog program for N queens:

queens([],R,R).                                  % C1
queens([H|T],R,P):-                              % C2
        delete(A,[H|T],L),
        safe(R,A,1),
        queens(L,[A|R],P).

delete(X,[X|Y],Y).                               % C3
delete(X,[Y|Z],[Y|W]):- delete(X,Z,W).           % C4

safe([],_,_).                                    % C5
safe([H|T],U,D):-                                % C6
        H-U =\= D,
        U-H =\= D,
        D1 is D+1,
        safe(T,U,D1).

Given the query for 4 queens:

?- queens([1,2,3,4],K,M).                        % Query

the CCD code, the index tables and the graph representation for the CCD code are given below:

CCD code:
C1: true.
C2: T2(var), T3(var), T1(var).
C3: true.
C4: T2(var).
C5: true.
C6: T3(var).
Query: T1(list).

Index tables:
T1 {queens}: nil -> C1, list -> C2
T2 {delete}: var -> C3, var -> C4
T3 {safe}:   nil -> C5, list -> C6

[Figure: graph representation of the CCD code, linking the queens, delete, safe and solution nodes through their nil, var and list arguments.]
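As a quick cross-check of the appendix program's logic, the following Python sketch (not part of OASys, and only an illustration) enumerates candidate placements the way delete/3 does on backtracking and applies the same diagonal test as safe/3:

```python
from itertools import permutations

def safe(board):
    # board[i] is the column of the queen on row i; this mirrors the
    # Prolog test H-U =\= D, U-H =\= D applied to every pair of rows.
    return all(abs(board[i] - board[j]) != j - i
               for i in range(len(board))
               for j in range(i + 1, len(board)))

def queens(n):
    # Enumerating permutations corresponds to the choice points that
    # delete/3 creates on backtracking in clause C2.
    return [list(p) for p in permutations(range(1, n + 1)) if safe(p)]

print(queens(4))  # -> [[2, 4, 1, 3], [3, 1, 4, 2]], the two 4-queens solutions
```

Each permutation here corresponds to one of the independent, determinate paths that OASys assigns to a processing element; the paths never need to exchange bindings, which is exactly the property the computation model exploits.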

Figure and Table captions

Figure 1: Representation of search trees in Prolog and OASys
Figure 2: Nodes and links in OASys search tree
Figure 3: Overview of the OASys architecture
Figure 4: The OASys Engine configuration

Table 1: Runtimes (in seconds) for OASys and other Prolog systems
Table 2: OR-parallel Speedups
Table 3: AND-parallel Speedups
Table 4: AND/OR-parallel Speedups of the Queens8 and Hamilton path problem

Figure 1: Representation of search trees in Prolog and OASys

The example program and query used in the figure:

q(X,Y):- a(X,Y).
a(X,Y):- b(X,Y), c(Y).
a(X,Y):- d(X), e(Y).
b(1,2).  c(3).  d(1).  d(2).  e(3).
?- q(A,B).

[Figure: (a) the AND/OR tree generated by Prolog, in which the branch through b(1,2) fails at c(2); (b) the OASys search tree, in which each path is a sequence of unifications such as q(A,B)=q(X,Y), a(X,Y)=a(X',Y').]

Figure 2

[Figure: nodes and links in the OASys search tree for the query of Figure 1. Each of the three paths carries the substitutions θ1={A=X, B=Y} and θ2={X=X', Y=Y'}; the first path then computes θ3={X'=1, Y'=2} and fails (PATH FAILED), while the other two compute θ3={X'=1} and θ3={X'=2} respectively, each followed by θ4={Y'=3}, and succeed (PATH SOLVED).]

Figure 3

[Figure: overview of the OASys architecture. Processing elements (PEs), each consisting of a preprocessor, a scheduler and an engine, are connected through the system communication network.]

Figure 4

[Figure: the OASys Engine configuration. A control unit holding the scheduler work pool drives several abstract processing units (APUs), which access memory through a memory interconnection network (MIN).]

Table 1

[Table: runtimes (in seconds) for the benchmarks queens, hamilton, zebra, merge and nrev under sequential and parallel OASys, compared against sequential Eclipse, SICStus and Andorra-I; the numeric entries did not survive transcription.]

Table 2

[Table: OR-parallel speedups for ham, queens and zebra against the number of PEs, under both copying (Copy) and recomputation (Recom.); the numeric entries did not survive transcription.]

Table 3

[Table: AND-parallel speedups for ham, queens8, zebra, nrev500 and merge against the number of APUs; the numeric entries did not survive transcription.]

Table 4

[Table: AND/OR-parallel speedups of the Queens8 and Hamilton path problems against the number of PEs, for configurations of 1, 2, 4 and 6 APUs per PE; the numeric entries did not survive transcription.]


More information

A Parallel Logic Programming Language for PEPSys

A Parallel Logic Programming Language for PEPSys A Parallel Logic Programming Language for PEPSys Michael Rate iffe and Jean-Claude Syre Abstract ECRC, Arabellastr. 1 This paper describes a new parallel Logic Programming language designed to exploit

More information

Parallel Processors. Session 1 Introduction

Parallel Processors. Session 1 Introduction Parallel Processors Session 1 Introduction Applications of Parallel Processors Structural Analysis Weather Forecasting Petroleum Exploration Fusion Energy Research Medical Diagnosis Aerodynamics Simulations

More information

A framework for network modeling in Prolog

A framework for network modeling in Prolog A framework for network modeling in Prolog Zdravko I. Markov Institute of Engineering Cybernetics and Robotics Bulgarian Academy of Sciences Acad.G.Bonchev str. bl.29a f 1113 Sofia, Bulgaria Abstract A

More information

Implementation of Process Networks in Java

Implementation of Process Networks in Java Implementation of Process Networks in Java Richard S, Stevens 1, Marlene Wan, Peggy Laramie, Thomas M. Parks, Edward A. Lee DRAFT: 10 July 1997 Abstract A process network, as described by G. Kahn, is a

More information

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for Comparison of Two Image-Space Subdivision Algorithms for Direct Volume Rendering on Distributed-Memory Multicomputers Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc Dept. of Computer Eng. and

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

The Prolog to Mercury transition guide

The Prolog to Mercury transition guide The Prolog to Mercury transition guide Version 14.01.1 Thomas Conway Zoltan Somogyi Fergus Henderson Copyright c 1995 2014 The University of Melbourne. Permission is granted to make and distribute verbatim

More information

Object Query Standards by Andrew E. Wade, Ph.D.

Object Query Standards by Andrew E. Wade, Ph.D. Object Query Standards by Andrew E. Wade, Ph.D. ABSTRACT As object technology is adopted by software systems for analysis and design, language, GUI, and frameworks, the database community also is working

More information

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS Computer Science 14 (4) 2013 http://dx.doi.org/10.7494/csci.2013.14.4.679 Dominik Żurek Marcin Pietroń Maciej Wielgosz Kazimierz Wiatr THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT

More information

Computing Approximate Happens-Before Order with Static and Dynamic Analysis

Computing Approximate Happens-Before Order with Static and Dynamic Analysis Department of Distributed and Dependable Systems Technical report no. D3S-TR-2013-06 May 7, 2018 Computing Approximate Happens-Before Order with Static and Dynamic Analysis Pavel Parízek, Pavel Jančík

More information

Global Dataow Analysis of Prolog. Andreas Krall and Thomas Berger. Institut fur Computersprachen. Technische Universitat Wien. Argentinierstrae 8

Global Dataow Analysis of Prolog. Andreas Krall and Thomas Berger. Institut fur Computersprachen. Technische Universitat Wien. Argentinierstrae 8 The VAM AI { an Abstract Machine for Incremental Global Dataow Analysis of Prolog Andreas Krall and Thomas Berger Institut fur Computersprachen Technische Universitat Wien Argentinierstrae 8 A-1040 Wien

More information