DAOS Scalable And-Or Parallelism


Luís Fernando Castro 1, Vítor Santos Costa 2, Cláudio F. R. Geyer 1, Fernando Silva 2, Patrícia Kayser Vargas 1, and Manuel E. Correia 2

1 Universidade Federal do Rio Grande do Sul, Porto Alegre - RS, Brasil {lfcastro,geyer,kayser}@inf.ufrgs.br
2 Universidade do Porto, Porto, Portugal {vsc,fds,mcc}@ncc.up.pt

Abstract. This paper presents DAOS, a model for the exploitation of And- and Or-parallelism in logic programs. DAOS assumes a physically distributed memory environment and a logically shared address space. Exploiting both major forms of implicit parallelism serves the broadest range of applications. Moreover, a model that uses a distributed memory environment provides scalability and can be implemented over a computer network. However, distributed implementations of logic programs have to deal with communication overhead and with the inherent complexity of distributed memory management. DAOS overcomes these problems by combining a distributed shared memory layer, which provides single-writer, multiple-reader sharing for the main execution stacks, with explicit message passing for work distribution and management.

Keywords: Parallel Logic Programming, And/Or Model, Scheduling, Distributed Shared Memory

1 Introduction

Logic programs are amenable to the exploitation of two major forms of implicit parallelism: or- and and-parallelism. Or-parallelism (ORP) aims at exploring different alternatives to a goal in parallel and arises naturally in search problems. And-parallelism (ANDP) consists in the parallel execution of two or more goals that cooperate in determining the solutions to a query. One important form of and-parallelism is independent and-parallelism (IAP), where parallel goals do not share variables; this form of parallelism arises in divide-and-conquer algorithms. Another form is dependent and-parallelism (DAP), which allows goals to share variables; it is common in producer-consumer applications.
Parallel logic programming systems (PLPs) should support both ANDP and ORP, in order to serve the broadest range of applications and to become popular. In practice, exploiting just one form of implicit parallelism already requires sophisticated system design, and exploiting two distinct forms of parallelism is even harder. Shared-memory PLPs have been the most successful and widespread so far. There are several examples of systems supporting full Prolog. Aurora [8] and Muse [1] are well-known ORP systems, while &-Prolog, DASWAM [13], and &-ACE [9] are IAP systems that have been used to parallelise sizeable applications. Andorra-I [12] is a further example that supports both determinate and-parallelism and or-parallelism.

P. Amestoy et al. (Eds.): Euro-Par '99, LNCS 1685, © Springer-Verlag Berlin Heidelberg 1999

Several distributed memory PLPs have been proposed. Some support pure IAP [15], others pure ORP [2]. The differences between shared and distributed memory machines have become less significant in the last few years due to distributed shared memory systems (DSMs), which give a shared memory abstraction. Work on PLPs for hardware DSMs has given interesting results. The Dorpp ORP system [14] achieved good performance on a DSM machine. More recently, Santos Costa et al. [10] analysed the Andorra-I system on a DASH-like simulated machine. Their numbers confirm that most read cache misses result from scheduling and from accessing the code area. Misses to the execution stacks (mostly eviction or cold misses) varied from 8% on an ORP application to 30% on an ANDP application. Only the ORP application has significant sharing misses for an execution stack, the choice-point stack (60%), because this stack is also used for scheduling.

We argue that the previous analysis suggests a new approach to distributed PLPs. First, most large data structures in these systems are built in the execution stacks. We should take the best advantage of caches to reduce network traffic. In contrast, the previous studies show high rates of sharing misses in scheduler data-structures. This suggests using explicit messages for scheduling. DAOS (Distributing And/Or in Scalable machines) is the first PLP model for distributed systems that supports these innovations. Its main contributions are:

- a binding data representation that is both simple to implement in PLPs and naturally adapts to DSM techniques, allowing the use of previously designed synchronisation and scheduling algorithms;
- combined DSM and message-passing techniques in the same framework: DAOS innovates over shared-memory PLPs by explicitly addressing the distribution problems inherent to scalable machines.
Next, some considerations about And/Or exploitation are presented. Then, the DAOS model is presented in Section 3. Section 4 analyses how workers can be implemented in DAOS. Finally, some conclusions are drawn.

2 Exploiting And/Or Parallelism

ORP and IAP are arguably two of the most interesting forms of parallelism available in Prolog programs. Several methods have been proposed to exploit And/Or parallelism in PLPs. This section presents the three main techniques to deal with IAP. Consider the following and-parallel query:

?- a(X) & b(Y).

where & represents a parallel conjunction. Figure 1 shows that both goals have several solutions. Besides running the goals a(X) and b(Y) in parallel, one needs to combine their solutions. One alternative is to use a special node that maintains pointers to each solution for each goal. Solutions to the conjunction are obtained by calculating the cross-product between the values for X and for Y. This approach is known as reuse, as presented in Figure 1(a).
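The cross-product combination used by reuse can be sketched in a few lines of Python (a toy model; the function name and the solution lists are purely illustrative, not part of any of the systems discussed here):

```python
from itertools import product

def cross_product_node(solutions_a, solutions_b):
    # Combine every solution for a(X) with every solution for b(Y):
    # this is what the special cross-product node does under reuse.
    return [{"X": x, "Y": y} for x, y in product(solutions_a, solutions_b)]

# Two solutions per goal give four solutions for the conjunction.
sols = cross_product_node(["a", "b"], ["c", "f"])
```

Each goal runs only once; the node then enumerates all pairs of their solutions.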

Recomputation-based models are based on having an Or search tree, such that and-parallel goals correspond to building bits of the search tree in advance. Thus, when one starts a(X) and b(Y) in parallel, the solution to b(Y) is a continuation branch for a(X), and is thus associated with a specific solution for a(X), as shown in Figure 1(b). These models are named recomputation-based because, to exploit several alternatives for a(X), the solution for b(Y) has to be recomputed for each alternative. Recomputation avoids the implementation overheads of the cross-product node and simplifies full Prolog semantics support. Reuse saves some of the effort of recomputing goals, but Prolog programmers usually try to reduce the search space anyway [6].

Fig. 1. (a) Reuse (b) Recomputation (c) C-Tree

The C-tree is shown in Figure 1(c). In the C-tree [6], whenever a worker picks an alternative from another process, it creates a copy of all parallel conjunctions found along the way and restarts work to the right. Traditionally, the C-tree has been associated with a notion of team, i.e., a group of workers. Normally, IAP work is exploited inside a team and ORP work is exploited between teams. DAOS leaves it to the system designer to decide whether or not to use teams. Using teams has the advantage of simplifying scheduling, by allowing the (re)use of IAP schedulers within teams and of an ORP scheduler between teams. This organisation also simplifies solution propagation in IAP, as multicast can be used inside a group.

3 DAOS: Distributed And-Or in Scalable Systems

DAOS aims at two goals: improving efficiency over traditional distributed systems, and preserving Prolog semantics. A fundamental point in DAOS is to establish which data areas should be private to a worker, and which ones should be virtually shared.
There are two opposite approaches: (a) all stacks are private, as in distributed PLPs, or (b) all stacks are shared through a DSM layer. The latter option seems interesting because we could reuse a previous implementation for shared-memory systems. However, it may be inefficient due to the scheduling data structures. DAOS presents an intermediate solution: the major data-structure areas are logically shared, while the work-management areas are private.

3.1 A Shared Address Space

How to implement the virtually shared address space is one of the key aspects of DAOS. This shared space must contain the major data structures used in Prolog, such as all Prolog variables and compound terms. Or-parallelism exploitation in a shared memory space is normally done using a binding array (BA) based approach, as in Aurora [8] and Andorra-I [12]. The original BA was designed for ORP and keeps a private slot for every variable in the current search-tree. This slot stores the conditional bindings made by a worker, instead of writing them on the shared space. Accesses to other memory areas are read-only. This gives an important single-writer, multiple-reader pattern for the shared memory. Unfortunately, the original BA design is not suitable for IAP, because the number of cells for each and-goal is not known beforehand, and the management of slots between workers running in and-parallel becomes highly complex [6]. In DAOS, we propose to use the Sparse Binding Array (SBA) [11] to manage bindings to shared variables. The SBA addresses the memory management problems of traditional BAs by shadowing the whole shared stacks, that is, every cell in a shared stack has a private shadow in a worker's or team's SBA. The SBA was designed to organise workers in teams: each team would share the same choice-points and thus the same SBA. This approach is not a good one for DAOS, since the SBA is write-intensive. We therefore propose a different SBA solution: each worker has a private SBA, and SBAs are synchronised, through the trail, both when sharing ORP and IAP work.
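As a rough illustration of the single-writer, multiple-reader discipline, the following toy Python model (all names hypothetical, not DAOS code) shadows a shared stack with a per-worker SBA: conditional bindings go to the private shadow and are trailed, so other workers sharing the same stack never see them:

```python
UNBOUND = object()

class SharedStack:
    """Toy stand-in for the virtually shared Heap/Environment stacks."""
    def __init__(self, size):
        self.cells = [UNBOUND] * size

class Worker:
    """Each worker shadows the whole shared stack with a private SBA."""
    def __init__(self, shared):
        self.shared = shared
        self.sba = [UNBOUND] * len(shared.cells)  # same-offset shadow cells
        self.trail = []                           # records conditional bindings

    def bind_conditional(self, addr, value):
        # A conditional binding never touches the shared cell: it goes to
        # the private shadow and is trailed for later synchronisation.
        self.sba[addr] = value
        self.trail.append((addr, value))

    def deref(self, addr):
        # A worker sees its own private binding first, then the shared cell.
        v = self.sba[addr]
        return v if v is not UNBOUND else self.shared.cells[addr]

shared = SharedStack(8)
w1, w2 = Worker(shared), Worker(shared)
w1.bind_conditional(3, "a")
```

Here w1 sees its own binding for cell 3, while w2, reading the same shared stack, still sees the cell unbound; the shared memory itself is never written conditionally.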
3.2 Sharing Work in DAOS

In this section we present the shared and private areas of the Prolog execution environment. We follow the Warren Abstract Machine (WAM) [16] organisation as found in most current PLPs, where each worker has a set of stacks:

- the Heap and the Environment stack support forward execution;
- the Control stack is used both for backtracking and for parallelism;
- the Trail supports backtracking;
- the Sparse Binding Array (SBA) provides access to private bindings;
- the Goal stack holds goals to be exploited in IAP.

The Forward Execution Stacks The two largest stacks store terms and logical variables. Both hold data structures that are virtually shared. The Environment stack stores environments, corresponding to the activation records of traditional imperative languages. The Global stack, or Heap, accommodates compound terms and the remaining logical variables. False sharing may happen in two situations [14]. First, workers can make previous work public, and then continue execution on the same stack. In this

case, their next updates to the stack might be sent to the sharing workers. Such sharing updates can be treated by relaxed consistency techniques [3], which ensure that new updates from the worker will only be sent at synchronisation points. A second source of false sharing is backtracking. In general, when all alternatives have been exploited, both Global and Environment stack space may be reclaimed, and then reused to store new work. Updates to this space may then be sent to the workers which had originally shared the work, unless one uses an invalidate-based protocol.

The Control Stack This stack stores choice-points. A choice-point is a structure that includes pointers into the stacks and to the goal's arguments before the choice-point was created, plus pointers to the available alternatives. Some ORP systems also store scheduling information in choice-points, while IAP systems extend the Control stack to include parcall frames that describe the conjunctions to be executed in parallel. ORP systems have used the Control stack to manage the available work. For instance, in the chat-80 ORP-only benchmark running under Aurora, about a third of the sharing misses originated from the Control stack (the rest originated from the scheduler data structures). Although a similar study is not available for IAP systems, parcall frames are expected to also be a source of sharing misses.

The Trail The Trail stack stores all conditional bindings. In a sequential execution this is required to undo bindings to variables when backtracking. In a BA-based system, the Trail is a fundamental data-structure, as it is used to synchronise bindings between different branches. Since conditional bindings may not be placed directly on the stacks, the alternative is to store these bindings in the Trail. When workers fetch work, they read the bindings from the Trail and store them in the SBA.
Thus, the first operation to be performed in DAOS when sharing work is installing all conditional bindings in the SBA. This requires access to the corresponding Trail section. Deciding whether to keep the Trail under DSM or to use explicit distribution is a fundamental open question:

- All chunks of the Trail must be present in a worker before it can start work. This argues for sending the required trail segments immediately when we share choice-points or goals, and thus for a fully-distributed solution.
- The Trail tends to be a relatively small stack. After an initial delay, one may take advantage of the sharing facilities in a DSM system to actually have the Trail segments in place before they are asked for.
- IAP programs tend to require much larger stacks than ORP programs, as they perform much less backtracking, but they also tend to perform fewer conditional bindings. Trail segments are thus expected to grow larger.

A final decision will depend on several factors, such as the efficiency of the DSM system and of the message-passing implementation. In the IDAOS implementation, as discussed next, we have decided to initially follow a fully distributed implementation, because trail copying can be naturally integrated with Control-stack copying.
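Installing bindings when importing work then amounts to replaying trail entries into the importer's SBA. A minimal Python sketch (hypothetical names; the SBA is modelled as a plain dictionary from cell addresses to values):

```python
def install_bindings(trail_segment, sba):
    """Replay an exporter's trail into the importer's private SBA.
    Each trail entry carries both the bound address and its value,
    so installation needs no access to the exporter's stacks."""
    for addr, value in trail_segment:
        sba[addr] = value

exporter_trail = [(0, "f(a)"), (4, "b")]  # (address, value) pairs
importer_sba = {}                          # importer's private SBA
install_bindings(exporter_trail, importer_sba)
```

After installation the importer sees exactly the conditional bindings of the imported branch, without ever writing to the shared stacks.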

4 IDAOS: A DAOS Implementation

We have so far discussed the DAOS model. We now concentrate on the design of our DAOS prototype, IDAOS (an Implementation of DAOS). In our design, each IDAOS processor, or worker, consists of three modules that are implemented as threads. The Engine is responsible for the execution of Prolog code. Most of the execution time will be spent in this code, which should have performance close to good sequential implementations. We base our work on Diaz and Codognet's wamcc [4], which has performance close to the best Prolog implementations currently available. The Work-Dispatcher module controls the exportation of both and- and or-work. This module and the Engine have exclusive access to the Control stack, Trail, and Goal stack. The Memory-Manager module controls the major execution stacks, namely the Environment stack and the Heap, through a page-based software DSM.

Fig. 2. Worker Organisation

IDAOS uses MPI to implement message passing and the commercial software TreadMarks [3] for DSM. Having both message-passing and DSM mechanisms creates an interesting problem: both TreadMarks and MPI want to initialise the distributed processes. One solution is to give this management to TreadMarks and use the dynamic process functionality of current MPI implementations (to be standardised in MPI-2). Next, the main issues in the implementation of the IDAOS Engine are presented; Section 4.2 then presents the implementation of the Trail and Control stacks.

4.1 And/Or Support in the Engine Module

As we have explained, the Engine thread implements an SBA-based abstract machine. It combines the use of the SBA [5] to deal with or-parallelism with mechanisms based on Hermenegildo's RAP-WAM [7] to deal with and-parallelism. We had to perform major changes to wamcc to support both IAP and ORP.
Regarding ORP, the major changes in wamcc are as follows:

- Conditional bindings must be stored in and read from the SBA. We have found that this has an impact of 19% up to 38% on the execution time, depending on the benchmark, on a Pentium-II 333MHz PC with wamcc-2.2.
- In ORP, the trail is used to both install and deinstall bindings. Thus, each trail entry receives an extra field containing the value of the binding. We found the overhead to vary from 2% up to 10% on the same machine.
- Last, choice-points must include new fields to support the sharing of or-work.

Supporting IAP requires an extensive set of new data-structures [7]: the parcall frame (representing a parallel conjunction), a goal slot (representing a sub-goal in the conjunction), a goal stack, and markers (to perform stack management). The data structures must handle three forms of execution: forward execution; inside backtracking, that is, backtracking within the parallel conjunction; and outside backtracking, that is, backtracking caused by the failure of a goal in the continuation. Last, the compiler must also support the sequential conjunction. Our design tries to follow closely Hermenegildo's, and more recent implementations such as Shen's DASWAM [13], Pontelli and Gupta's &-ACE [9], and Correia's SBA [5]. We discuss in detail some issues specifically important for IDAOS. First, in IDAOS there is no shared and-tree of goals rooted at the parcall frame. Goals are instead sent to an external worker, which executes them in its own stacks. The parcall frame therefore must contain information on which processors are executing a goal, not just direct links. Second, on receiving a goal, the receiver will not have direct access to the parent's parcall frame, as it is stored in the sender. This is a problem because traditional implementations of IAP use markers to delimit the new space being allocated.
These markers are linked to the goal slot, and from there to the parcall frame, resulting in an involved chain of pointers in shared memory¹. In IDAOS, a new goal is started with a starting choice-point (SCP). The SCP marks the stacks allocated for this goal, and thus fulfils the task of the markers. The only alternative in the SCP is the code to be executed when backtracking out of the task. When completing the task, we install a final choice-point (FCP). The FCP stores the final values of the stacks, points to the SCP, and its alternative is the code to be executed when backtracking into the task. A further advantage of using choice-points is that all memory management is now performed through choice-points, simplifying integration with ORP. Memory allocation in this model will be based on segments, implementing the so-called cactus stack. The idea is that all segments, for ORP or for IAP, start with a choice-point and can be allocated and recovered using the same techniques.

¹ This could be a good argument for storing the Control stack in the DSM, but this stack is closely related to scheduling and, thus, its management should be made explicit.
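The SCP/FCP scheme can be sketched as follows in Python (a toy model with hypothetical names; real WAM choice-points carry many more fields):

```python
class ChoicePoint:
    """Toy model: an imported and-goal's stack section is delimited
    by a starting choice-point (SCP) and a final choice-point (FCP)."""
    def __init__(self, kind, stack_top, alternative):
        self.kind = kind                # "SCP" or "FCP"
        self.stack_top = stack_top      # saved top-of-stack boundary
        self.alternative = alternative  # code run when backtracking here
        self.scp = None                 # FCP only: back-pointer to its SCP

def start_goal(control_stack, stack_top):
    # The SCP marks the stack space allocated for the goal; its only
    # alternative is the code run when backtracking out of the task.
    scp = ChoicePoint("SCP", stack_top, "backtrack_out_of_task")
    control_stack.append(scp)
    return scp

def finish_goal(control_stack, scp, stack_top):
    # The FCP records the final stack values, points back to the SCP,
    # and its alternative backtracks into the task.
    fcp = ChoicePoint("FCP", stack_top, "backtrack_into_task")
    fcp.scp = scp
    control_stack.append(fcp)
    return fcp

cs = []
scp = start_goal(cs, 100)
fcp = finish_goal(cs, scp, 180)
```

Because both markers are ordinary choice-points, the stack space between scp.stack_top and fcp.stack_top can be allocated and recovered by the same machinery used for or-parallel segments.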

4.2 Trail and Control Management

A key data-structure in IDAOS is the Trail. It is used both to propagate conditional bindings during the exportation of and- and or-work, and to return conditional bindings performed during the remote execution of and-goals:

- Whenever a team imports or-work, workers have to move up or down in the search tree. This is performed by copying the trail from the exporter and installing the bindings in their SBAs.
- When a worker receives an and-goal to execute, it starts its execution in its own memory address space. Any conditional binding, and any binding to a variable created prior to the and-parallel conjunction, is stored in the SBA and trailed. At the end of the execution of the goal, the worker returns its trail to the exporter.

Fig. 3. Trail Segments

To support And/Or parallelism the trail needs to be segmented. The trail is physically divided into trail segments (TSs), each of which corresponds to a contiguous computation. In PLPs the Trail forms a tree: each TS is a child of the one from which it got work. Each TS starts with a descriptor, is followed by a sequence of bindings, and terminates with a special parent pointer that points to where the previous bindings are stored. Figure 3 shows a situation where trail segment TS0 corresponds to several choice-points generated in a row. TS1 and TS2 correspond to new segments that are rooted in TS0, but the computation for TS2 starts from an older choice-point than TS1 (we assume the Trail grows downwards). Each trail segment descriptor contains a pointer to the start of the segment, a direct pointer to the ancestor node, and a bitmap indicating which nodes already have this segment. Copying the trail means starting from the segment of the goal or choice-point being exported and then following parent pointers towards the root until a segment that has already been sent is found.
MPI's buffering can be used to send all the TSs in a single message.
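The segment-copying walk described above can be sketched in Python (hypothetical names; the sharing bitmap is modelled as a set of receiver identifiers):

```python
class TrailSegment:
    """Toy model of a trail segment: descriptor fields reduced to a
    parent pointer plus a sharing set standing in for the bitmap."""
    def __init__(self, bindings, parent=None):
        self.bindings = bindings
        self.parent = parent    # ancestor segment (None at the root)
        self.holders = set()    # nodes known to already hold this segment

def segments_to_send(tip, receiver):
    # Walk from the exported segment towards the root, stopping at the
    # first segment the receiver already holds; everything collected
    # can then go out in one buffered message.
    out = []
    seg = tip
    while seg is not None and receiver not in seg.holders:
        out.append(seg)
        seg.holders.add(receiver)   # mark as sent to this receiver
        seg = seg.parent
    return out

ts0 = TrailSegment([(0, "a")])
ts1 = TrailSegment([(3, "b")], parent=ts0)
```

A first export of ts1 to a new node sends both segments; a later export along the same branch sends nothing, since the bitmap records that the node already holds them.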

Note that in this case we are implementing our own DSM mechanism, so the coherence problem must be handled. There are several solutions to this problem: keep the bitmaps or the whole Trail in a DSM area; use broadcast for trail segments; or simply ignore the problem and accept duplicated broadcasts of TSs. Lack of space prevents us from discussing these issues here in detail, but in IDAOS we are using the ostrich algorithm. Performance evaluation will then tell us whether more sophisticated solutions are required.

The other important stack in IDAOS is the Control stack. Our principle in designing DAOS was to use scheduling to avoid unnecessary pressure on the DSM subsystem. To reduce sharing in this area, one solution would be to split the Control stack into a choice-point stack, which would be under the DSM, and a Control stack proper, which would be managed by the message-passing system. For simplicity, we favour maintaining the stack as fully distributed: the owner of a parcall frame or of a choice-point is the only one that can directly access the data-structure, and accesses from other workers are performed through the communication protocol.

5 Conclusions

We have proposed a scheme for Distributed And/Or execution in Scalable systems, DAOS. The model innovates by taking advantage of recent work in DSM technology to obtain efficient sharing, and by efficiently supporting both IAP and ORP in a distributed environment. We have found that the DSM mechanism is quite effective in simplifying our design, allowing us to focus on the issues that we believe have a major impact on performance. Work on the IDAOS prototype is progressing at the Universidade Federal do Rio Grande do Sul (UFRGS), the Universidade Federal do Rio de Janeiro (UFRJ), and the Universidade do Porto (UP). Our target is a network of workstations connected by a fast network, such as Myrinet or Fast Ethernet. Both the UFRGS and the UP groups have access to such networks.
The changes required to support IAP and ORP have already been included in the base system, wamcc, and work will next move on to experimenting with the distributed platform. We expect that, after implementing the message mechanism on top of MPI, most of the work will move on to scheduler design, as is traditional in parallel logic programming systems.

Acknowledgments

We would like to acknowledge Inês de Castro Dutra, Ricardo Bianchini, Gopal Gupta, Enrico Pontelli, Cristiano Costa, and Kish Shen for their contribution and influence. This work has been partially supported by the CNPq/ProTem-CC project Appelo and by funds granted to LIACC through the Programa de Financiamento Plurianual, Fundação para a Ciência e Tecnologia, and Programa PRAXIS.

References

[1] K. A. M. Ali and R. Karlsson. The Muse Or-parallel Prolog Model and its Performance. In Proceedings of the North American Conference on Logic Programming. MIT Press, October.
[2] J. Briat, M. Favre, C. Geyer, and J. Chassin. Scheduling of or-parallel Prolog on a scaleable, reconfigurable, distributed-memory multiprocessor. In Proceedings of Parallel Architecture and Languages Europe. Springer-Verlag.
[3] C. Amza et al. TreadMarks: Shared memory computing on networks of workstations. IEEE Computer, 29(2):18-28, February 1996.
[4] P. Codognet and D. Diaz. wamcc: Compiling Prolog to C. In 12th International Conference on Logic Programming. The MIT Press.
[5] M. E. Correia, F. M. A. Silva, and V. Santos Costa. The SBA: Exploiting orthogonality in OR-AND Parallel Systems. In Proceedings of the 1997 International Logic Programming Symposium, October.
[6] G. Gupta, M. Hermenegildo, and V. Santos Costa. And-Or Parallel Prolog: A Recomputation based Approach. New Generation Computing, 11(3,4).
[7] M. V. Hermenegildo. An Abstract Machine for Restricted And-Parallel Execution of Logic Programs. In E. Shapiro, editor, Third International Conference on Logic Programming, London. Springer-Verlag, July.
[8] E. Lusk, R. Butler, T. Disz, R. Olson, R. Overbeek, R. Stevens, D. H. D. Warren, A. Calderwood, P. Szeredi, S. Haridi, P. Brand, M. Carlsson, A. Ciepelewski, and B. Hausman. The Aurora or-parallel Prolog system. In International Conference on Fifth Generation Computer Systems 1988. ICOT, Tokyo, Japan, November.
[9] E. Pontelli, G. Gupta, M. Hermenegildo, M. Carro, and D. Tang. Efficient Implementation of And-Parallel Logic Programming Systems. Computer Languages, 22(2/3).
[10] V. Santos Costa, R. Bianchini, and I. C. Dutra. Parallel Logic Programming Systems on Scalable Multiprocessors. In Proceedings of the 2nd International Symposium on Parallel Symbolic Computation, PASCO '97, pages 58-67, July.
[11] V. Santos Costa, M. E. Correia, and F. Silva. Performance of Sparse Binding Arrays for Or-Parallelism. In Proceedings of the VIII Brazilian Symposium on Computer Architecture and High Performance Processing, SBAC-PAD, August.
[12] V. Santos Costa, D. H. D. Warren, and R. Yang. Andorra-I: A Parallel Prolog System that Transparently Exploits both And- and Or-Parallelism. In Third ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, PPOPP. ACM Press, April. SIGPLAN Notices 26(7), July.
[13] K. Shen. Initial Results from the Parallel Implementation of DASWAM. In M. Maher, editor, Proceedings of the 1996 Joint International Conference and Symposium on Logic Programming. The MIT Press.
[14] F. M. A. Silva. An Implementation of Or-Parallel Prolog on a Distributed Shared Memory Architecture. PhD thesis, Dept. of Computer Science, Univ. of Manchester, September.
[15] A. R. Verden and H. Glaser. Independent And-Parallel Prolog for Distributed Memory Architectures. Technical report, Department of Electronics and Computer Science, University of Southampton, April.
[16] D. H. D. Warren. An Abstract Prolog Instruction Set. Technical Note 309, SRI International, 1983.

On the BEAM Implementation

On the BEAM Implementation On the BEAM Implementation Ricardo Lopes 1,Vítor Santos Costa 2, and Fernando Silva 1 1 DCC-FC and LIACC, Universidade do Porto, Portugal {rslopes,fds}@ncc.up.pt 2 COPPE-Sistemas, Universidade Federal

More information

An Or-Parallel Prolog Execution Model for Clusters of Multicores

An Or-Parallel Prolog Execution Model for Clusters of Multicores An Or-Parallel Prolog Execution Model for Clusters of Multicores João Santos and Ricardo Rocha CRACS & INESC TEC and Faculty of Sciences, University of Porto Rua do Campo Alegre, 1021, 4169-007 Porto,

More information

Parallel Execution of Logic Programs: Back to the Future (??)

Parallel Execution of Logic Programs: Back to the Future (??) Parallel Execution of Logic Programs: Back to the Future (??) Enrico Pontelli Dept. Computer Science Overview 1. Some motivations [Logic Programming and Parallelism] 2. The Past [Types of Parallelism,

More information

Concurrent Table Accesses in Parallel Tabled Logic Programs

Concurrent Table Accesses in Parallel Tabled Logic Programs Concurrent Table Accesses in Parallel Tabled Logic Programs Ricardo Rocha 1, Fernando Silva 1,andVítor Santos Costa 2 1 DCC-FC & LIACC University of Porto, Portugal {ricroc,fds@ncc.up.pt 2 COPPE Systems

More information

Or-Parallel Scheduling Strategies Revisited

Or-Parallel Scheduling Strategies Revisited Or-Parallel Scheduling Strategies Revisited Inês de Castro Dutra ½ and Adriana Marino Carrusca ¾ ½ COPPE/Departamento de Engenharia de Sistemas e Computação Universidade Federal do Rio de Janeiro, Brasil

More information

facultad de informatica universidad politecnica de madrid

facultad de informatica universidad politecnica de madrid facultad de informatica universidad politecnica de madrid A Simulation Study on Parallel Backtracking with Solution Memoing for Independent And-Parallelism Pablo Chico de Guzman Amadeo Casas Manuel Carro

More information

Thread-Based Competitive Or-Parallelism

Thread-Based Competitive Or-Parallelism Thread-Based Competitive Or-Parallelism Paulo Moura 1,3, Ricardo Rocha 2,3, and Sara C. Madeira 1,4 1 Dep. of Computer Science, University of Beira Interior, Portugal {pmoura, smadeira}@di.ubi.pt 2 Dep.

More information

Relating Data Parallelism and (And ) Parallelism in Logic Programs

Relating Data Parallelism and (And ) Parallelism in Logic Programs Published in Proceedings of EURO PAR 95, Sweden Relating Data Parallelism and (And ) Parallelism in Logic Programs Manuel V. Hermenegildo and Manuel Carro Universidad Politécnica de Madrid Facultad de

More information

A High-Level Implementation of Non-Deterministic, Unrestricted, Independent And-Parallelism

A High-Level Implementation of Non-Deterministic, Unrestricted, Independent And-Parallelism A High-Level Implementation of Non-Deterministic, Unrestricted, Independent And-Parallelism Amadeo Casas 1 Manuel Carro 2 Manuel V. Hermenegildo 1,2 amadeo@cs.unm.edu mcarro@fi.upm.es herme@{fi.upm.es,cs.unm.edu}

More information

PALS: Efficient Or-Parallel Execution of Prolog on Beowulf Clusters

PALS: Efficient Or-Parallel Execution of Prolog on Beowulf Clusters. K. Villaverde, E. Pontelli, H. Guo, and G. Gupta. Dept. of Computer Science, New Mexico State University.

And-Or Parallel Prolog: A Recomputation Based Approach. Gopal Gupta (Dept. of Computer Science, Box 30001, New Mexico State University, Las Cruces, NM 88003, USA) and Manuel V. Hermenegildo.

Concurrent Programming Constructs and First-Class Logic Engines. Paul Tarau, University of North Texas.

IDRA (IDeal Resource Allocation): Computing Ideal Speedups in Parallel Logic Programming. M.J. Fernández, M. Carro, and M. Hermenegildo. School of Computer Science, Technical University of Madrid.

Pruning in the Extended Andorra Model. Ricardo Lopes, Vítor Santos Costa, and Fernando Silva. DCC-FC & LIACC, University of Porto, Rua do Campo Alegre, 823, 4150-180 Porto, Portugal.

Efficient Instance Retrieval of Subgoals for Subsumptive Tabled Evaluation of Logic Programs. arXiv:1107.5556v1 [cs.PL], 27 Jul 2011; under consideration for publication in Theory and Practice of Logic Programming.

Global Storing Mechanisms for Tabled Evaluation. Jorge Costa and Ricardo Rocha. DCC-FC & CRACS, University of Porto, Portugal.

Towards a High-Level Implementation of Flexible Parallelism Primitives for Symbolic Languages. Amadeo Casas, Manuel Carro, and Manuel Hermenegildo. University of New Mexico (USA) and Technical University of Madrid (Spain).

Implementação de Linguagens (Language Implementation), 2016/2017. Ricardo Rocha, DCC-FCUP, Universidade do Porto.

An Object Model for Multiparadigm Programming (HTML rendering of a working-paper draft that led to a publication, OOPSLA 1994). http://www.dmst.aueb.gr/dds/pubs/conf/1994-oopsla-multipar/html/mlom.html

Towards a High-Level Implementation of Execution Primitives for Unrestricted, Independent And-parallelism. Amadeo Casas, Manuel Carro, and Manuel V. Hermenegildo.

OASys: An AND/OR Parallel Logic Programming System. I. Vlahavas (Dept. of Informatics, Aristotle University of Thessaloniki, Greece), P. Kefalas, and C. Halatsis.

Selection-based Weak Sequential Consistency Models for Distributed Shared Memory. Z. Huang, C. Sun, and M. Purvis. Departments of Computer & Information Science, University of Otago, Dunedin, New Zealand.

Last Parallel Call Optimization and Fast Backtracking in And-parallel Logic Programming Systems. Tang DongXing, Enrico Pontelli, and Gopal Gupta. Laboratory for Logic and Databases, Dept. of Computer Science, New Mexico State University.

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering.

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors. G. Chen, M. Kandemir, I. Kolcu, and A. Choudhary. Pennsylvania State University (USA) and UMIST.

A Comparison of Relativistic and Reader-Writer Locking Approaches to Shared Data Access. Philip W. Howard, Josh Triplett, and Jonathan Walpole. Portland State University.

Complete and Efficient Methods for Supporting Side-effects in Independent/Restricted And-parallelism. K. Muthukumar and M. Hermenegildo. Proc. 1989 Int'l Conf. on Logic Programming, MIT Press.

Implementação de Linguagens de Programação Lógica (Implementation of Logic Programming Languages): Extended Andorra Model. Ricardo Lopes, DCC-FCUP, Mestrado em Informática, 2005/06.

CSE 513: Distributed Systems (Distributed Shared Memory). Guohong Cao, Department of Computer Science & Engineering, 310 Pond Lab, Penn State.

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function. Chen-Ting Chang, Yu-Sheng Chen, I-Wei Wu, and Jyh-Jiun Shann. Dept. of Computer Science, National Chiao Tung University.

C++ Idioms for Concurrent Operations. Fernando Náufel do Amaral, Patricia Garcês Rabelo, and Sergio E. R. de Carvalho. Laboratório de Métodos Formais, Departamento de Informática, Pontifícia Universidade Católica.

Valid-Time Indexing. Mirella M. Moro (Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil) and Vassilis J. Tsotras (University of California, Riverside).

Coping with Conflicts in an Optimistically Replicated File System. Puneet Kumar, School of Computer Science, Carnegie Mellon University.

Incremental Copying Garbage Collection for WAM-based Prolog Systems. Ruben Vandeginste and Bart Demoen. Department of Computer Science, Katholieke Universiteit Leuven, Belgium.

Introduction to Multiprocessors (Part I). Prof. Cristina Silvano, Politecnico di Milano.

Producing EAM code from the WAM. Paulo André and Salvador Abreu. Departamento de Informática, Universidade de Évora, and CENTRIA FCT/UNL, Portugal.

Computer Systems Architecture, Lecture 24. Mahadevan Gomathisankaran, CSCE 4610/5610, April 29, 2010.

Multiprocessors and Thread Level Parallelism (Unit III): symmetric shared-memory architectures, in which several processors share a single physical memory.

Adaptive Prefetching Technique for Shared Virtual Memory. Sang-Kwon Lee, Hee-Chul Yun, Joonwon Lee, and Seungryoul Maeng. Computer Architecture Laboratory, Korea Advanced Institute of Science and Technology.

An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language. Martin C. Rinard, Department of Computer Science, University of California, Santa Barbara.

SMP Review (lecture notes): multiprocessors, general cache-coherence issues, and snooping protocols.

Towards CIAO-Prolog — A Parallel Concurrent Constraint System. M. Hermenegildo. Facultad de Informática, Universidad Politécnica de Madrid (UPM), 28660 Boadilla del Monte, Madrid, Spain.

The BEAM: Towards a First EAM Implementation. Ricardo Lopes and Vítor Santos Costa. LIACC, Universidade do Porto, Rua do Campo Alegre, 823, 4150 Porto, Portugal, September 10, 1997.

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm, University of Washington.

A Feasibility Study for Methods of Effective Memoization Optimization. Daniel Mock, October 2018.

The Java Virtual Machine in a Thread Migration. BioTechnology: An Indian Journal (Trade Science Inc.), BTAIJ 10(15), 2014, pp. 8768-8774, ISSN 0974-7435.

Specialization-based Parallel Processing without Memo-trees. Hidemi Ogasawara, Kiyoshi Akama, and Hiroshi Mabuchi.

On the Design and Implementation of a Portable DSM System for Low-Cost Multicomputers. Federico Meza, Alvaro E. Campos, and Cristian Ruz. Departamento de Ciencia de la Computación, Pontificia Universidad Católica.

Optimizing Closures in O(0) Time. Andrew W. Keep (Cisco Systems, Inc. / Indiana University), Alex Hearn (Indiana University), and R. Kent Dybvig (Cisco Systems, Inc. / Indiana University).

A Design and Implementation of the Extended Andorra Model. Ricardo Lopes, Vítor Santos Costa, and Fernando Silva. CRACS-INESC Porto. Theory and Practice of Logic Programming, Cambridge University Press, 2011, doi:10.1017/S1471068411000068.

Multiprocessors and Thread-Level Parallelism (Chap. 4). From Hennessy and Patterson, Computer Architecture: A Quantitative Approach.

Parallel Processing (Computer Architecture). Prof. Dr. Nizamettin Aydın. http://www.yildiz.edu.tr/~naydin

Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers. Henrik Löf, Markus Nordén, and Sverker Holmgren. Department of Information Technology, Uppsala University.

Windows 7 Overview. Al Lake, CS140M: history, design principles, system components, environmental subsystems, file system, networking, programmer interface.

Lightweight Remote Procedure Call. Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy. Presented by Alana Sweat.

CICLOPS 2006: Colloquium on Implementation of Constraint LOgic Programming Systems. ICLP'06 workshop, The 2006 Federated Logic Conference, Seattle Sheraton Hotel and Towers, Seattle, Washington, August 10-22, 2006.

The Potential of Special-Purpose Hardware (Chapter 7): uses the TIGRE implementation methods and performance data of the preceding chapters to propose architecture alternatives.

Annotation Algorithms for Unrestricted Independent And-Parallelism in Logic Programs. Amadeo Casas, Manuel Carro, and Manuel V. Hermenegildo.

Incremental Flow Analysis. Andreas Krall and Thomas Berger. Institut für Computersprachen, Technische Universität Wien, Argentinierstraße 8, A-1040 Wien.

Symbol tables (lecture notes): a symbol table is a data structure used by a language translator such as a compiler or interpreter, in which each identifier in a program's source code is associated with information relating to it; the basic operations include free, which removes all entries and releases the table's storage.
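The symbol-table operations summarized in the entry above (associating identifiers with their attributes, looking them up, and a free operation that removes all entries) can be sketched with a minimal dictionary-backed table; the class and method names here are illustrative, not taken from any particular compiler.

```python
class SymbolTable:
    """Minimal symbol table: maps identifiers to attribute records."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, **attributes):
        # Associate an identifier with information relating to it
        # (e.g. its type and scope).
        self._entries[name] = dict(attributes)

    def lookup(self, name):
        # Return the identifier's attribute record, or None if unknown.
        return self._entries.get(name)

    def free(self):
        # Remove all entries and release the table's storage.
        self._entries.clear()


table = SymbolTable()
table.insert("x", type="int", scope="local")
print(table.lookup("x"))   # {'type': 'int', 'scope': 'local'}
table.free()
print(table.lookup("x"))   # None
```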

RTI Performance on Shared Memory and Message Passing Architectures. Steve L. Ferenci and Richard Fujimoto. College of Computing, Georgia Institute of Technology, Atlanta, GA.

DPC++: Object-Oriented Programming Applied to Cluster Computing. André Silveira, Rafael Ávila, Marcos Barreto, and Philippe Navaux. Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre.

Properties Preservation in Distributed Execution of Petri Nets Models. Anikó Costa, Paulo Barbosa, Luís Gomes, Franklin Ramalho, Jorge Figueiredo, and Antônio Junior. Universidade Nova de Lisboa.

Accelerated Library Framework for Hybrid-x86: Programmer's Guide and API Reference, Version 1.0 (draft, SC33-8406-00). Software Development Kit for Multicore Acceleration, Version 3.0.

Offloading Java to Graphics Processors. Peter Calvert (prc33@cam.ac.uk), University of Cambridge, Computer Laboratory.

Performance Issues in Shared Memory (Module 5, Lecture 9): data access and communication, artifactual communication, the capacity problem, temporal and spatial locality, 2D-to-4D conversion, transfer granularity, false sharing, and contention.

Local Area Network Overview (Ch. 15): a LAN consists of a shared transmission medium together with hardware and software for interfacing devices to the medium and regulating access to it.

Multiprocessors & Thread Level Parallelism. COE 403 Computer Architecture, Prof. Muhamed Mudawar, Computer Engineering Department, King Fahd University of Petroleum and Minerals.

OpenMPI: an OpenMP-like tool for easy programming in MPI. Taisuke Boku, Mitsuhisa Sato, Masazumi Matsubara, and Daisuke Takahashi. Graduate School of Systems and Information Engineering.

GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs. José L. Abellán, Juan Fernández, and Manuel E. Acacio. Presented by Guoliang Liu.

Multiprocessor Cache Coherence (Chapter 5): thread-level parallelism, when a memory system is coherent, and enforcing cache coherence.

Security for Multithreaded Programs under Cooperative Scheduling. Alejandro Russo and Andrei Sabelfeld. Dept. of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden.

Operating Systems, Chapter 4: Threads. Mohamed B. Abubaker, Palestine Technical College, Deir El-Balah. Adapted from Operating System Concepts, 9th edition.

Concurrency Control (Chapter 17). Comp 521 Files and Databases, Spring 2010: conflict serializable schedules.

Memory Management (Chapter 9) — course outline: computer-system structures, operating-system structures, processes, threads, CPU scheduling, process synchronization, deadlocks, memory management, and virtual memory.

Processor-Directed Cache Coherence Mechanism — A Performance Study. H. Sarojadevi (Dept. of CSE, Nitte Meenakshi Institute of Technology, Bangalore, India) and S. K. Nandy (CAD Lab, SERC).

Distributed Shared Memory: Concepts and Systems. Jelica Protić, Milo Tomašević, and Veljko Milutinović. IEEE Parallel & Distributed Technology, Summer 1996.

Shared Memory Programming Model: OpenMP. TMA4280 Introduction to Supercomputing, NTNU, IMF, February 16, 2018.

Implementing Sequential Consistency in Cache-Based Systems. Sarita V. Adve and Mark D. Hill. Proceedings of the 1990 International Conference on Parallel Processing.

Shared Virtual Memory (programming models). Arvind Krishnamurthy, Fall 2004: a shared-memory model is a collection of threads sharing one address space, with reads and writes visible to all.

CSCI 4717/5717 Computer Architecture: Symmetric Multiprocessors & Clusters. Reading: Stallings, Sections 18.1 through 18.4.

Programming Languages, Third Edition. Chapter 7: Basic Semantics — attributes, binding, semantic functions, declarations, blocks, scope, and symbol tables.

Multiprocessor Cache Coherency (CS448): cache coherence defines what values can be returned by a read when two processors may hold different values for the same memory location.

Enhancing Locality in Java-based Irregular Applications. N. Faria, R. Silva, and J. L. Sobral. CCTC/Universidade do Minho.

Operating System, Chapter 4: Threads — multicore programming, multithreading models, thread libraries, implicit threading, threading issues, and operating-system examples.

Threads: Questions (Chapter 2, CSCI 4730/6730 Operating Systems): how a thread differs from a process, why threads are useful, how POSIX threads can be used, and user-level versus kernel-level threads.

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP): a 4-core CMP microarchitecture/compiler effort at Stanford providing hardware/software support for speculation.

Summary and open questions (review notes): the paper proposes a parallelization technique providing dynamic runtime parallelization of loops from single-threaded binaries with minimal architectural change.

Concurrency Control (Chapter 17). Comp 521 Files and Databases, Fall 2012: conflict serializable schedules.

Using OpenMP: Portable Shared Memory Parallel Programming. Barbara Chapman, Gabriele Jost, and Ruud van der Pas. The MIT Press, Cambridge, Massachusetts, 2008.

The latency of user-to-user, kernel-to-kernel and interrupt-to-interrupt level communication. John Markus Bjørndalen, Otto J. Anshus, Brian Vinter, and Tore Larsen. Department of Computer Science.


Parallel Execution of Prolog Programs: a Survey. Gopal Gupta (University of Texas at Dallas), Enrico Pontelli (New Mexico State University), Khayri A.M. Ali (Swedish Institute of Computer Science), and Mats Carlsson.

Cache-Conscious Allocation of Pointer-Based Data Structures. Angad Kataria and Simran Khurana. Department of Information Technology, Dronacharya College of Engineering, Gurgaon.

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data. Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. fluid Operations AG, Walldorf, Germany.

Paging and segmentation (lecture notes): process size is independent of the main memory present in the system; all memory references are logical addresses within a process that are dynamically converted into physical addresses at run time.
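The paging behaviour described in the entry above — every memory reference is a logical address that is converted into a physical address at run time — can be sketched with a toy page table; the page size and frame numbers here are made up for illustration.

```python
PAGE_SIZE = 4096  # bytes per page (illustrative value)

# Toy page table: page number -> frame number for pages resident in memory.
page_table = {0: 5, 1: 9, 2: 3}


def translate(logical_address):
    """Split a logical address into (page, offset) and map the page to a frame."""
    page, offset = divmod(logical_address, PAGE_SIZE)
    if page not in page_table:
        # A reference to a non-resident page would trigger a page fault.
        raise ValueError(f"page fault: page {page} not resident")
    return page_table[page] * PAGE_SIZE + offset


print(translate(4100))  # page 1, offset 4 -> frame 9 -> 36868
```

Because translation happens per reference, the process's logical address space can be larger or laid out differently from the physical memory backing it, which is why process size is independent of the main memory present.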

Implementation: Garbage Collection (CITS 3242 Programming Paradigms, Part IV: Advanced Topics, Topic 19): most languages in the functional, logic, and object-oriented paradigms include some form of automatic memory management.

Imperative Functional Programming. Uday S. Reddy. Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, Illinois 61801.

Dynamic Storage Allocation (CS 414: Operating Systems, Spring 2008): static allocation, which is fixed in size, versus dynamic allocation for data structures that grow and shrink.

Introduction to MIMD Architectures (Part IV, Chapter 15). D. Sima, T. J. Fountain, and P. Kacsuk, Advanced Computer Architectures: thread- and process-level parallel architectures are typically realised by MIMD machines.