Eden Trace Viewer: A Tool to Visualize Parallel Functional Program Executions

Size: px
Start display at page:

Download "Eden Trace Viewer: A Tool to Visualize Parallel Functional Program Executions"

Transcription

1 Eden Trace Viewer: A Tool to Visualize Parallel Functional Program Executions Pablo Roldán Gómez roldan@informatik.uni-marburg.de Universidad Complutense de Madrid Facultad de Informática E Madrid, Spain Jost Berthold berthold@informatik.uni-marburg.de Philipps-Universität Marburg, Fachbereich Mathematik und Informatik D Marburg, Germany PRELIMINARY VERSION submitted to PACT 2005 conference Abstract This paper presents the Eden Trace Viewer (EdenTV), a tool which generates graphic representations of parallel program executions. The tracing tool was conceived to analyze the behavior of programs written in the parallel functional language Eden. With the high abstraction offered by parallel functional languages, tools for the analysis of the runtime behaviour are all the more important. We describe the concepts of Trace Generation, Trace Interpretation and Trace Representation in the EdenTV, giving brief technical details about its implementation. The functionality of the EdenTV is illustrated by two examples, emphasizing the utility of the EdenTV to analyze and improve the performance of parallel programs. An area of future work is to improve and optimize the tool in future versions, as well as to adapt it for other parallel functional languages. I. INTRODUCTION Due to the high complexity and nondeterministic interaction in parallel systems, optimizing parallel programs is a very complex process. Parallel functional languages eliminate many of the most unpleasant burdens of parallel programming[... ] (foreword of [5]) because they remain abstract about the operational properties of a program. Algorithmic skeletons([2], [16]) capture typical patterns of parallelism, which can often be expressed in the language itself (offering concepts like higher-order functions or demand-driven evaluation, which makes infinite data structures possible), but parallel functional languages [leave] a large gap for the implementation to bridge [5] In the area of parallel programming, and especially optimization, the high abstraction offered by functional languages makes parallel programming much less error-prone, but on the other hand, it is very difficult to know or guess what really happens inside the parallel machine during a program execution. The programmer has few possibilities to analyze and optimize the behavior of a program by only knowing the runtime or speedup achieved. In order to improve the performance of parallel functional programs productively, we need more information about the execution. The common way to analyze the performance of a program is to use profiling tools. Specific information about the behavior of a program is collected during the execution and often written in files for a subsequent study. After the execution, we have a file with data about which events have occurred, when and where. That is what we call trace generation. Afterwards, the collected information is statistically processed by an analysis software (trace interpretation) and the result can be (graphically or textually) presented to the programmer (trace representation). Toolkits exist to support all steps of this procedure, but for high-level languages, the profiling tools often need to implement specific language concepts not present in standard profiling and monitoring tools. We opted to use parts of the well established Pablo Toolkit [17] and its Self-Describing Data Format (SDDF) [1] for trace generation and only customize the subsequent steps in a special software. Trace files in a specialized SDDF are generated by an instrumented runtime system which uses the Pablo Trace Library. The generated files are then loaded into the Eden Trace Viewer (EdenTV), our interpretation and visualization tool programmed in Java which produces interactive bar diagrams for all computational units of the parallel functional language Eden. SDDF contains portable, semantically enriched data (following the same concept as XML), EdenTV can be used on any OS and hardware which supports Java. To understand the target language for this tool and its concepts, Section II gives a short description of Eden, with the basics about language concepts and runtime system (RTS). The next section of the paper (III) explains the components of our profiling system in the three steps mentioned above: First, we look at the trace generation and describe the data which is stored during the execution of a program. Then we describe the trace interpretation step, and how trace data is stored by the EdenTV in order to build the graphics. The last part of the section refers to the representation of the stored data and talks about the style of the EdenTV graphics. Section IV describes features of the EdenTV and shows

2 two example analyses. Section V points to related work, and the last section (VI) concludes and discusses extensions and improvements of EdenTV as future work. II. EDEN LANGUAGE AND IMPLEMENTATION CONCEPTS The target language of the tool is Eden [11]. This language extends the purely functional language Haskell[13] with syntactic constructs for explicitly defining and creating parallel processes in machines of a closed parallel system. The programmer has direct control over process granularity, data distribution and communication topology [8], but does not have to manage synchronization and data exchange between processes, which is done by the runtime system through implicit communication channels, transparent to the programmer. A. Coordination Constructs The essential two coordination constructs of Eden are process abstraction and instantiation. Functions can be embedded into process abstractions and a process instantiation operator starts a process by supplying input to the abstraction. process :: (Trans a, Trans b) => (a -> b) -> Process a b embeds functions of type a->b into process abstractions of type Process a b where the context (Trans a, Trans b) states that both a and b are overloaded values belonging to the Trans class of transmissible values. A process abstraction process (\x -> e) defines the behavior of a process with parameter x as input and expression e as output. A process instantiation uses the predefined infix operator ( # ) :: (Trans a, Trans b) => Process a b -> a -> b to provide a process abstraction with actual input parameters. Processes are distinguished from functions by their operational property to be executed remotely, while their denotational meaning remains unchanged as compared to the underlying function. Haskell, the computation language of Eden, uses lazy evaluation i.e. data is only evaluated on demand, if it is needed for the overall result. Eden deviates from this principle: Eden processes are eagerly instantiated and evaluate and send their output eagerly. This aims at increasing the parallelism degree and at speeding up the distribution of the computation. Local garbage collection detects unnecessary results and stops the evaluating remote threads. B. Process Communication In general, Eden processes do not share data among each other and are encapsulated units of computation. All data is communicated eagerly via (internal) channels, avoiding the need for global memory management and data request messages, but possibly duplicating data. Evaluation of the expression (process funct) # arg leads to the creation of a new process for evaluating the application of the function funct to the argument arg. The argument is evaluated by new concurrent threads in the parent process and sent to the new child process, which, in turn, fully evaluates and sends back the result of the function application. Both are using implicit 1:1 communication channels established between child and parent process on process instantiation. Communication is encoded in the type class Trans (Transmissible) shown in the type signatures above. As a matter of principle, every transmitted value is evaluated to normal form (NF) prior to transmission. Furthermore, the class uses overloaded communication functions for lists, which are transmitted as streams, element by element, and for tuples, which are evaluated component-wise by concurrent threads in the same process. An Eden process can thus contain a variable number of threads during its lifetime, supplying input to child processes or concurrently evaluating results. As communication channels are normally connections between parent and child process, the communication topologies are hierarchical. In order to support other topologies, Eden provides additional language constructs to create channels dynamically, which is another source of concurrent threads in the sender process. C. Runtime System Eden is implemented on the basis of the Glasgow Haskell Compiler GHC [12], a mature and efficient Haskell implementation. While the compiler frontend is almost unchanged, Eden uses a modified runtime system (RTS), where kernel parts are shared with GUM, the implementation of Glasgow Parallel Haskell (GpH) [20]. The parallel RTS uses suitable middleware (currently PVM[15]) to manage parallel execution and communication. Besides, the implementation of Eden relies essentially on the GHC implementation of concurrency. Concurrent threads, modelled in GHC by Thread State Objects (TSOs), are the basic unit for the implementation, so the central task for profiling is to keep track of their execution. TSOs in GHC are scheduled Round-Robin, depending on the memory consumption, which leads to the possible state transitions shown in Figure 1. new thread kill thread Fig. 1. Runnable Finished run thread suspend thread Running kill thread kill thread deblock thread block thread Thread state transition diagram in Eden Blocked A thread can be in one of the following states: runnable, running, blocked and finished. The first state of a thread after its creation is runnable, i.e. it can be executed, but the RTS has not chosen it yet. When the RTS schedules it for execution, the thread goes to state running, and possibly back to state runnable when descheduled. If a running thread needs a value which is not available, it blocks on a placeholder node in the graph heap. When the missing value arrives, the thread goes back to state runnable again. Finally, a thread finishes after

3 evaluating and sending its output, or when its output is not required any more (detected by a local or remote garbage collection). The channels between processes are modelled by inports and outports in the RTS, both administered by respective tables to ensure the 1:1 condition. A channel is thus a connection from an outport to exactly one inport. An Eden process, as a purely conceptual unit, consists of a number of concurrent threads which share a common graph heap (as opposed to processes, which only communicate via channels). A process table keeps track of the thread count in a process and its number of open ports. The Eden RTS does not support the migration of threads or processes to other machines during execution, so every thread is located on exactly one machine during its lifetime. The first task for profiling is thus to keep track of the thread state transitions within a process, and of communication between processes. III. IMPLEMENTATION OF THE EDENTV As explained in Section I, the typical steps of profiling usually are trace generation, trace interpretation and trace representation. The EdenTV separates these steps as much as possible, so that single parts can easily be maintained and modified for other purposes. The overall aim of EdenTV is a post-mortem analysis, thus the steps are also temporarily separated. A. First Step: Trace Generation To profile the execution of an Eden program, we must collect information about the behavior of machines, processes, threads and messages by writing selected events into a trace file. Usually, a trace event is considered as an instance of the execution of a specific statement or instruction in an application, on a specific machine. In our case, an event indicates the creation or a state transition of a unit of computation, machine, process or thread. Table 2 contains the events selected for the instrumentation, which have self-explanatory names. Fig. 2. Start Machine New Process New Thread Run Thread Block Thread Send Message End Machine Kill Process Kill Thread Suspend Thread Deblock Thread Receive Message Events to capture during the execution of an Eden program As already mentioned in Section I, we use the Pablo Toolkit [17] and its Self-Defining Data Format (SDDF) [1] developed at the University of Illinois. The Pablo Trace Capture Library provides an API of tracing methods for Fortran and C that can be called from the target code to generate trace files. The Eden RTS is instrumented with these calls to give us the necessary information about the execution. This is important, because it means that we do not have to change the Eden program itself # 201: "Block Thread" { int "timestamp"[] double "Seconds" int "Event ID" int "Machine Number" int "Process ID" int "Thread ID" int "Inport ID" int "Block Reason" };; Fig. 3. #217: "Send Message" { int "timestamp"[] double "Seconds" int "Event ID" int "Machine Number" int "Sending Process ID" int "Outport ID" int "Receiving Machine"; int "Receiving Process" int "Inport ID" int "Tag of the message" };; Examples of SDDF data record structures to obtain the traces. The instrumentation is just a matter of placing library functions properly in the RTS source code. The SDDF is a flexible file meta-format that specifies both data record structures and data record instances [1]. This meta-format is not based on predefined layouts and allows to add new event types easily. Thanks to this we can define our own event family to meet the requirements of our instrumentation. SDDF events are specified by data record structures, which are written at the beginning of each SDDF file. Data record instances in the file contain the trace event information itself, and identifiers linking them to their corresponding data record structures. This idea is the same that underlies the XML format, but in the case of the SDDF both definition structures and data instances can be found in the same file. By capturing an event we obtain specific data depending on its type. This data is saved in the trace file by data record instances according the Pablo SDDF, in ASCII representation (two examples of SDDF data record descriptors are in Figure 3). Each event is characterized by an integer EventID appearing at the beginning of each SDDF data record instance. In order to localize the events, SDDF record instances contain a unique tuple with the machine, process and thread numbers in which the event has occurred. A field called Seconds determines the time of the execution when the corresponding event occurred. Not all events require the same values to be saved in the SDDF records. For example, for the event Send Message we should save integers representing the sending and the receiving processes respectively. For the event Block Thread (Figure 3), we need different information. Thus, according to the nature of each captured event, a different type of SDDF record instance is written into the trace file. The instrumentation of the RTS has been carried out mostly in a separate module, which contains the overloaded trace functions and initializations. To cover all the events, only four existing source code files of the RTS had to be modified. The instrumented Eden RTS is implemented as an additional compiler way 1. The compiler itself (code generation) is not modified at all. In order to use the profiling version, the program only has to be linked to a different RTS. The Pablo 1 Ways are a concept of GHC which allows to select between different compiler and RTS versions when compiling a program.

4 Trace Library is efficiently implemented. Measurements have shown that the performance of Eden programs is hardly affected at all by writing the event files. B. Second Step: Trace Interpretation To read and interpret the trace files produced by the instrumentation of the Eden RTS, we model the logical units of the execution by the object hierarchy shown in Figure 4. EdenObject <<contains>> TraceData EdenMachine EdenProcess EdenThread EdenMessage <<contains>> Fig. 4. <<contains>> <<contains>> Object hierarchy of the EdenTV For the EdenTV, logical units are machines, processes, threads and messages, abstracted by the class EdenObject. The four concrete subclasses derived from the class EdenObject are EdenMachine, EdenProcess, EdenThread and EdenMessage. Each class defines fields to identify the objects and saves information that rebuilds the activity and behavior of each logical unit. The most important class for the reconstruction of the execution is the class TraceData. An instance of this class represents the whole execution of an Eden program, containing arrays of machines, processes, threads and messages. The complete information collected from one trace file is accessible in an object of this class. The TraceData uses a parser class to obtain the data from the trace file and fills the mentioned arrays from it. We model a class for each event of the instrumentation as well. Since each event needs different fields to be saved, new subclasses of the event class were defined for each different event of Figure 2. To explain it shortly, the constructor of the class Trace- Data creates, updates and arranges the objects of the class EdenObject (machines, processes, threads and messages) from the event objects that the parser class produces while reading the trace file. The creation of an object of the class TraceData from a trace file is the trace interpretation part of the EdenTV. The following and last step is to represent the information collected in the TraceData instance graphically. C. Third Step: Trace Representation In the space-time diagrams generated by the EdenTV, machines, processes and threads are represented by horizontal bars, where the horizontal axis of the diagram is the execution time. Messages between processes or machines are optionally represented by grey arrows which start from the sending unit bar and point at the receiving unit bar. The representation of messages is very important for the programmer, since he/she can observe hot spots and inefficiencies in the communication during the execution. TABLE I COLOR CODES FOR BAR GRAPHICS IN EDENTV Machine Process Thread Blue Idle Idle n/a (no processes) (no threads) Yellow System Time Runnable Runnable (threads runnable) (one thread) (this thread) Green Running Running Running (one process) (one thread) (this thread) Red Blocked Blocked Blocked (all processes) (all threads) (this thread) EdenTV produces six diagrams: All Machines, All Processes, All Threads, Processes In Machines, Threads In Processes and Threads In Machines. The diagram Process In Machines is similar to the diagram All Processes, but the processes are arranged by machines and not by process ID. The same holds for the diagrams Threads In Processes and Threads In Machines, which are like the diagram All Threads, but arranged by processes and machines respectively. For this reason, this section only gives a description for the first three diagrams. The diagram bars have segments in different colors, which indicate the activities of the respective logical unit in a period during the execution. These color codes are shown in Table I. The diagram All Machines (Figure 5) shows the activity of all machines that have taken part in the execution. A machine is displayed as idle if it has no process to execute. It is in state system if at least one thread in it is runnable, but no thread is running (garbage collection, receiving messages, other user s processes). The machine is in state running if one thread is running in it. When all threads in the machine are blocked, it is in state blocked. A similar diagram All Processes (Figure 6(a)) shows the activity of all processes. The same four colors are used to differentiate the states of the processes. The conditions for the states system, running and blocked are exactly the same for a process as for a machine. A process is in state idle if the process does not contain any threads 2. In general, the following equation holds for the thread count inside one process or machine: 0 Runnable Threads + Blocked Threads Total Threads Color codes for processes and machines are assigned following the comparison of these numbers. In general, the first condition checked by the EdenTV is if the machine or process is in state idle (Total Processes resp. Total Threads= 0). An inequality on the right implies that a thread is running. Otherwise the state is either runnable or blocked, depending on the number of runnable threads (the state is runnable if Runnable Threads > 0). This context information for all units is the basis of the graphical representations. For the diagram All Threads (Figure 6(b)) the colorcodification is similar, but the colors of the bar fragments 2 An idle process is normally inexistent, but exists in the implementation, so it was modelled in the representation as well.

5 Fig. 5. All Machines Graphic (a) All Processes Graphic (b) All Threads Graphic Fig. 6. Diagrams of EdenTV correspond to the state of the thread itself. Note that a thread can not be in state idle. To implement the diagram construction, a class GraphicEdenPanel was defined. The instances of this class contain a TraceData instance which is processed to generate a diagram. Since each diagram is generated independently, six subclasses of GraphicEdenPanel were created for the six different diagrams. IV. USING EDENTV A. Functionality and Mode of Use When a trace file generated by the instrumented RTS is opened, the gathered information is presented in the bar diagrams shown in the previous section as well as in textual form. The text field on the right gives detailed statistics about the overall computation and details for each unit (machine, process or thread) shown in the graphic representation. The bar diagrams presented by EdenTV can be zoomed in

6 Fig. 7. Zoomed View with Messages and Interactive mode. order to get a closer view on the activities at critical points during the execution. Furthermore, the user can select to see the messages sent and received by each machine or process during the execution of an Eden program. Messages are not shown by default because showing the messages often hides other details in the diagrams, and due to resource consumption in the viewer when handling bigger trace files. Additionally, an interactive mode provides extra information when a bar in the diagram is clicked. This viewer mode is resource-consuming as well and therefore turned off by default. It provides the information for the event immediately preceding the point in time which has been clicked, i.e. the cause of the current state of the clicked unit. The described options are situated on the lower right of the panel below the text field. Some other options only control the appearance, such as the separation between the displayed bars and the bar style. A more important feature is the ability to leave out units from the display if their lifetime was less than an adjustable minimum (Minimum Time). B. Examples: Tuning Eden Programs with EdenTV 1) Warshall: Lazy Evaluation vs. Parallelism: Using a lazy functional computation language, a crucial issue for Eden as well as other Haskell-based parallelism is to start the evaluation of needed subexpressions early enough and to fully evaluate them for later use. While this has lead to Eden s eager communication in the language design described earlier, evaluation strategies [19] are another well-established means of starting parallel evaluation. However, the basic choice to either evaluate a final result to WHNF or completely (NF) sometimes does not offer enough control to optimize an algorithm strategies must then be applied to certain subresults. On the sparse basis of runtime measurements, such an optimization would be rather cumbersome. The EdenTV, accompanied with code inspection, makes such inefficiencies obvious, as in the following example. Example: The program measured here is a parallel implementation of Warshall s algorithm to compute shortest paths for all nodes of a graph from the adjacency matrix (adapted from [14]). The minimum distances from one node to every other are continuously updated and successively passed through a ring structure linking all processes, starting with distances from the first node. Implementing this algorithm is very simple, provided the existence of a suitable ring skeleton. The task reduces to defining the function each ring process should apply to its input from the parent and from the ring neighbour. The trace visualizations shown in Fig. 8 show the All machines view of the EdenTV for two different warshall programs on 16 processors of a Beowulf cluster, the input graph consisting of 500 nodes. The first straight-forward version of the program shows bad performance, due to the inherent data dependence in the algorithm. Using the trace viewer makes obvious that the first phase of the algorithm runs on several machines, but completely sequentially. Only the second phase of the algorithm really runs in parallel on all machines. The length of this second phase depends on the position in the ring: the first ring process (which started working first) must do the most work in the end, thereby dominating runtime. Viewing the communication, one can see that the nodes communicate as intended, but do not process the received data. The reason for this bad performance is demand-driven evaluation: Data received from a ring neighbour is passed

7 Runtimes:154.2 sec. (above) and 40.7 sec.(below) Fig. 8. Warshall-Algorithm, first (above) and optimized version (Beowulf Cluster, Heriot Watt University, Edinburgh, 16 machines) to the next node unchanged, until the node sends its own rows, which are updated with all ring data received before. Only when the node s updated own rows are passed into the ring, their evaluation begins. Before this inherent demand, the processes only accumulate data from the ring communication, but do not proceed updating their rows with the distances received from the ring neighbour. Inserting a normal form evaluation strategy (rnf) for an intermediate result into the respective node function dramatically improves runtime. We still see the impact of the data dependency, leading to a short wait phase passing through the ring, but the optimized version shows good speedup and load balance. A crucial issue for such optimizations is however to identify the displayed units from their behaviour, without a link to the program s source code. 2) Mandelbrot: Irregularity and Cost of Load Balancing: Parallel skeletons (higher-order functions implementing common patterns of parallel computation [2], [16], [8]) provide an easy parallelization of existing programs by replacing common higher-order functions by parallel equivalents. A prime example is the well-known higher-order function map, which applies a function to all elements of a list, and can be parallelized in different ways. In the case of map, problems may arise when the tasks to solve expose irregular complexity. Irregularity can lead to uneven load distribution when tasks are statically distributed, and the machine with the most complex tasks will dominate the runtime of a parallel computation. By using a feedback from process results to process inputs, work can also be distributed dynamically in a parallel map implementation, thereby adapting to the current load and speed of the different machines. But the dynamic distribution necessarily has more overhead than a static one. Example: We compare these different work distribution schemes using a program which generates Mandelbrot Set visualizations by applying a uniform, but potentially irregular computation to a set of coordinates. The underlying computation scheme, map, is parallelized by two skeletons presented in [8]. 1) Farm: the tasks are statically distributed to a set of workers, one for each available machine. Each worker solves a fixed set of tasks. 2) Workpool: the main machine distributes only few initial tasks among the others (worker machines). Each time a worker has finished one task, the main machine sends a new one, i.e. subsequent tasks are distributed dynamically on request. From previous results in older measurements ([8]), we could expect that the dynamic distribution in the workpool skeleton

8 a.- Farm Skeleton (Runtime: 8.6 sec.) b.- Workpool Skeleton (Runtime: 11.5 sec.) Fig. 9. All Processes diagrams for the Mandelbrot program (Beowulf Cluster, Heriot Watt University, Edinburgh, 16 machines) performs better than a static distribution, since the tasks have highly irregular complexity. On the other hand, the dynamic load balancing requires additional, time-critical communication and prevents some optimization in the communication system. The program was executed on 16 machines of a Beowulf- Cluster, with a problem size of pixels. As we can see in the diagrams shown in Fig. 9, the workpool-skeleton

9 does not perform as good as expected. Runtime is longer for the workpool skeleton, where EdenTV reveals that the master node is a serious bottleneck, which receives requests from all other 15 machines and thus spends much time receiving messages (system time). One of the worker machines seems to work slower than the others, most likely due to other tasks on the respective machine. The other machines have spent a lot of time in blocked state, waiting for new work assigned by the heavily loaded master node. We see that all machines stop at the same time, i.e. the system is well synchronized, but has a severe bottleneck. The trace of the farm skeleton shows an obvious load imbalance among the workers. The main machine (bottom bar) merges all results at the end of the computation, leading to a longer sequential end phase. Both versions are far below the theoretical optimum of using all processors in parallel all of the time. The analysis using the EdenTV has revealed why the Farm skeleton has better performance than the Workpool skeleton for this program and the problem size investigated here. By increasing the amount of tasks a worker receives initially, we gain only slightly better performance. Increasing granularity by combining several tasks would lead to less tasks in total and contradicts the intended load balancing properties. A better way is to parallelize the work distribution as well, and to use more than one main machine in a hierarchy of workpools. V. RELATED WORK The EdenTV conceptually follows a rather general concept of tracing tools: trace information is gathered during execution, saved in files, and afterwards processed by a tool which is aware of specific language concepts and the relationship between units of computation. A rather simple (yet insufficient) way to obtain information about the program s parallelism would be to trace the behaviour of the communication subsystem. The Eden RTS currently uses PVM, which comes with a built-in trace capability (using Pablo SDDF as well). Tracing PVM-specific and user-defined actions would be possible and visualisation could be carried out by XPVM [9]. But the weak points of such a solution off-the-shelf are obvious: PVM-tracing yields only information about spawned subtasks and concrete messages between the machines. Internal buffering and threads in the RTS would remain invisible, unless user-defined events are used. So we opted not to bind us to PVM tracing. Switching to a different message passing system would otherwise be very hard, although one can expect every suitable system to come with comparable features. Comparable to the tracing included in XPVM, many efforts have been made in the past to create standard tools for trace analysis and representation (e.g. Pablo Analysis GUI [17] or the ParaGraph Suite [6], to mention only two essential results). These tools have interesting features EdenTV does not yet include, like stream-based online trace analysis, execution replay, and a wide range of standard diagrams. But the aim of EdenTV was a specific visualisation of logical Eden units and needed a more customized solution. The more closely related tracing concepts for parallel functional languages have collectively influenced the EdenTV in tracing and representation. Using the trace format and tools of the Granularity Simulator GranSim [10], one can not only simulate programs in the parallel functional language GpH, but also obtain information about real parallel program executions. Due to the different language concept, GranSim does not trace any kind of communication and offers rather bird s eye views on the execution than to concentrate on the single computation units. The EdenTV diagrams are inspired by the per-processor view of GranSim, commonly known as Gantt diagrams, or spacetime-diagrams when combined with message display [3]. The Eden derivative of GranSim, Paradise [7], was a pure simulator, but it offers an interesting feature not yet present in EdenTV: respecting the explicit coordination concept of Eden, Paradise offers to label instantiated processes in the visualisation, thereby linking the trace information to the program source code. Last but not least, the direct predecessor of EdenTV was the work by Ralf Freitag [4], which was entirely based on the Pablo Analysis GUI (not available any more). VI. CONCLUSION AND FUTURE WORK The Eden Trace Viewer is a combination of an instrumented runtime system to generate, and an interactive GUI to interpret trace information and to represent it in interactive diagrams. These three steps are common to many profiling solutions, separated as much as possible in the EdenTV implementation. Its usefulness for the analysis and optimization of parallel program performance is due to a view of concrete parallel executions. All important aspects of parallel execution are shown in the diagrams, which can be explored in great detail. Specific language concepts of the parallel functional language Eden are present in all parts of the tool, but the described concepts of design and implementation can be applied to target other parallel languages as well. The code of the EdenTV is written in Java and can be easily extended to support a larger set of events and to generate new diagram types. All languages using logical units like processes, threads and messages in a way similar to Eden can be analyzed by such a tool. With minor extensions to the instrumentation carried out for Eden, trace files for GpH programs could be generated in SDDF by the shared RTS. As the instrumentation is cleanly separated from other parts and uses a well-established library, only few work should be necessary for this task. Once the trace files contain new event types, appropriate new objects can be introduced into the code of the EdenTV to parse, interpret and represent these new files. Other logical units than Eden units can be represented in new diagrams. [18] explains in detail where such modifications would be located, as well as reasonable optimizations, which could not be integrated in the code due to time limitations. As another

10 extension, the EdenTV could show an analysis while the program still runs, using stream-based trace file processing. Acknoledgements The authors thank Rita Loogen for her helpful comments improving this paper, and the colleagues from Heriot-Watt- University, Edinburgh for the opportunity to work with their beowulf cluster. REFERENCES [1] R. A. Aydt. The Pablo Self-Defining Data Format. Technical report, University of Illinois. Department of Computer Science, http: //www-pablo.cs.uiuc.edu. [2] M. I. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation. Research Monographs in Parallel and Distributed Computing. The MIT Press, Cambridge, MA, [3] I. Foster. Designing and Building Parallel Programs. Addison-Wesley, [4] R. Freitag. Entwicklung eines Werkzeugs zur Analyse des Laufzeitverhaltens paralleler funktionaler Programme. Master s thesis, Philipps- Universität Marburg, Germany, In German. [5] K. Hammond and G. Michaelson, editors. Research Directions in Parallel Functional Programming. Springer-Verlag, [6] M. T. Heath and J. A.. Etheridge. Visualizing the performance of parallel programs. IEEE Software, 8(5), [7] F. Hernández, R. Peña, and F. Rubio. From GranSim to Paradise. In Trends in Functional Programming 1 (SFP 99). Intellect, [8] U. Klusik, R. Loogen, S. Priebe, and F. Rubio. Implementation Skeletons in Eden Low-Effort Parallel Programming. In Implementation of Functional Languages: IFL 00, volume 2011 of LNCS, Aachen, Germany, Springer. [9] J. A. Kohl and G. A. Geist. The PVM 3.4 tracing facility and XPVM 1.1. In Proceedings of HICSS-29, pages IEEE Computer Society Press, [10] H. W. Loidl. GranSim User s Guide. Technical report, University of Glasgow. Department of Computer Science, [11] R. Loogen, Y. Ortega, and R. Peña. Parallel Functional Programming in Eden. Journal of Functional Programming, To appear. [12] S. Peyton Jones, C. Hall, K. Hammond, W. Partain, and P. Wadler. The Glasgow Haskell Compiler: a Technical Overview. In JFIT 93, March [13] S. Peyton Jones and J. Hughes. Haskell 98: A Non-strict, Purely Functional Language, Available at org/. [14] M. Plasmeijer and M. van Eekelen. Functional Programming and Parallel Graph Rewriting. Addison-Wesley, Reading, Massachusetts, USA, [15] Parallel Virtual Machine Reference Manual, Version 3.2. University of Tennessee, August [16] F. A. Rabhi and S. Gorlatch, editors. Patterns and Skeletons for Parallel and Distributed Computing. Springer, [17] D. A. Reed, R. D. Olson, R. A. Aydt, T. Madhyastha, T. Birkett, J. Jensen, B. A. A. Nazief, and B. K. Totty. Scalable performance environments for parallel systems. Technical Report 1673, Dep. of Computer Sc., University of Illinois, Illinois, [18] P. Roldán. Eden Trace Viewer: Ein Werkzeug zur Visualisierung paralleler funktionaler Programme. Master s thesis, Philipps-Universität Marburg, Germany, In German. [19] P. Trinder, K. Hammond, H.-W. Loidl, and S. Peyton Jones. Algorithm + Strategy = Parallelism. J. of Functional Programming, 8(1), January [20] P. W. Trinder, K. Hammond, J. S. M. Jr., A. S. Partridge, and S. L. P. Jones. Gum: A portable parallel implementation of haskell. In PLDI, 1996.

Visualizing Parallel Functional Program Runs: Case Studies with the Eden Trace Viewer

Visualizing Parallel Functional Program Runs: Case Studies with the Eden Trace Viewer John von Neumann Institute for Computing Visualizing Parallel Functional Program Runs: Case Studies with the Eden Trace Viewer Jost Berthold and Rita Loogen published in Parallel Computing: Architectures,

More information

Porting the Eden System to GHC 5.00

Porting the Eden System to GHC 5.00 Porting the Eden System to GHC 5.00 Jost Berthold, Rita Loogen, Steffen Priebe, and Nils Weskamp Philipps-Universität Marburg, Fachbereich Mathematik und Informatik Hans Meerwein Straße, D-35032 Marburg,

More information

THE IMPACT OF DYNAMIC CHANNELS ON FUNCTIONAL TOPOLOGY SKELETONS

THE IMPACT OF DYNAMIC CHANNELS ON FUNCTIONAL TOPOLOGY SKELETONS THE IMPACT OF DYNAMIC CHANNELS ON FUNCTIONAL TOPOLOGY SKELETONS J. BERTHOLD AND R. LOOGEN Fachbereich Mathematik und Informatik, Philipps-Universität Marburg Hans-Meerwein-Straße, D-35032 Marburg, Germany

More information

Controlling Parallelism and Data Distribution in Eden

Controlling Parallelism and Data Distribution in Eden Chapter 5 Controlling Parallelism and Data Distribution in Eden Ulrike Klusik, Rita Loogen, and Steffen Priebe 1 Abstract: The parallel functional language Eden uses explicit processes to export computations

More information

Parallel FFT With Eden Skeletons

Parallel FFT With Eden Skeletons Parallel FFT With Eden Skeletons Jost Berthold, Mischa Dieterle, Oleg Lobachev, and Rita Loogen Philipps-Universität Marburg, Fachbereich Mathematik und Informatik Hans Meerwein Straße, D-35032 Marburg,

More information

A Layered Implementation of the Parallel Functional Language Eden

A Layered Implementation of the Parallel Functional Language Eden A Layered Implementation of the Parallel Functional Language Eden Or: Lifting Process Control to the Functional Level Jost Berthold, Ulrike Klusik, Rita Loogen, Steffen Priebe, and Nils Weskamp Philipps-Universität

More information

A Parallel Framework for Computational Science

A Parallel Framework for Computational Science A Parallel Framework for Computational Science Fernando Rubio and Ismael Rodriguez* Departamento de Sistemas Informiticos y Programacibn Universidad Complutense, 28040 - Madrid, Spain {fernando,isrodrig)@sip.ucm.es

More information

Parallel Functional Programming in Eden

Parallel Functional Programming in Eden Under consideration for publication in J. Functional Programming 1 Parallel Functional Programming in Eden RITA LOOGEN Philipps-Universität Marburg, Germany (e-mail: loogen@mathematik.uni-marburg.de) YOLANDA

More information

1 Overview. 2 A Classification of Parallel Hardware. 3 Parallel Programming Languages 4 C+MPI. 5 Parallel Haskell

1 Overview. 2 A Classification of Parallel Hardware. 3 Parallel Programming Languages 4 C+MPI. 5 Parallel Haskell Table of Contents Distributed and Parallel Technology Revision Hans-Wolfgang Loidl School of Mathematical and Computer Sciences Heriot-Watt University, Edinburgh 1 Overview 2 A Classification of Parallel

More information

Towards An Adaptive Framework for Performance Portability

Towards An Adaptive Framework for Performance Portability Towards An Adaptive Framework for Performance Portability Work in Progress (submission #23) Patrick Maier Magnus Morton Phil Trinder School of Computing Science University of Glasgow IFL 25 Maier, Morton,

More information

Managing Heterogeneity in a Grid Parallel Haskell

Managing Heterogeneity in a Grid Parallel Haskell Managing Heterogeneity in a Grid Parallel Haskell A. Al Zain 1, P. W. Trinder 1, H-W. Loidl 2, and G. J. Michaelson 1 1 School of Mathematical and Computer Sciences, Heriot-Watt University, Riccarton,

More information

Effective Performance Measurement and Analysis of Multithreaded Applications

Effective Performance Measurement and Analysis of Multithreaded Applications Effective Performance Measurement and Analysis of Multithreaded Applications Nathan Tallent John Mellor-Crummey Rice University CSCaDS hpctoolkit.org Wanted: Multicore Programming Models Simple well-defined

More information

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine.

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine. 1. true / false By a compiler we mean a program that translates to code that will run natively on some machine. 2. true / false ML can be compiled. 3. true / false FORTRAN can reasonably be considered

More information

gsysc Visualization of SystemC-Projects (Extended abstract)

gsysc Visualization of SystemC-Projects (Extended abstract) gsysc Visualization of SystemC-Projects (Extended abstract) Christian J. Eibl Institute for Computer Engineering University of Lübeck February 16, 2005 Abstract SystemC is a C++ library for modeling of

More information

Dynamic Task Generation and Transformation within a Nestable Workpool Skeleton

Dynamic Task Generation and Transformation within a Nestable Workpool Skeleton Dynamic Task Generation and Transformation within a Nestable Workpool Skeleton Steffen Priebe Philipps-Universität Marburg, Fachbereich Mathematik und Informatik Hans-Meerwein-Straße, D-35032 Marburg,

More information

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.

More information

Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206

Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206 Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206 1 Today Characteristics of Tasks and Interactions (3.3). Mapping Techniques for Load Balancing (3.4). Methods for Containing Interaction

More information

High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore

High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No # 09 Lecture No # 40 This is lecture forty of the course on

More information

Once Upon a Polymorphic Type

Once Upon a Polymorphic Type Once Upon a Polymorphic Type Keith Wansbrough Computer Laboratory University of Cambridge kw217@cl.cam.ac.uk http://www.cl.cam.ac.uk/users/kw217/ Simon Peyton Jones Microsoft Research Cambridge 20 January,

More information

Design of Parallel Algorithms. Models of Parallel Computation

Design of Parallel Algorithms. Models of Parallel Computation + Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms and Concurrency n Introduction to Parallel Algorithms n Tasks and Decomposition n Processes and Mapping n Processes

More information

Introduction to Parallel Performance Engineering

Introduction to Parallel Performance Engineering Introduction to Parallel Performance Engineering Markus Geimer, Brian Wylie Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance:

More information

Parallel Functional Programming Lecture 1. John Hughes

Parallel Functional Programming Lecture 1. John Hughes Parallel Functional Programming Lecture 1 John Hughes Moore s Law (1965) The number of transistors per chip increases by a factor of two every year two years (1975) Number of transistors What shall we

More information

RAISE in Perspective

RAISE in Perspective RAISE in Perspective Klaus Havelund NASA s Jet Propulsion Laboratory, Pasadena, USA Klaus.Havelund@jpl.nasa.gov 1 The Contribution of RAISE The RAISE [6] Specification Language, RSL, originated as a development

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models CIEL: A Universal Execution Engine for

More information

Kent Academic Repository

Kent Academic Repository Kent Academic Repository Full text document (pdf) Citation for published version Chitil, Olaf (2006) Promoting Non-Strict Programming. In: Draft Proceedings of the 18th International Symposium on Implementation

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads... OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.

More information

Efficient Interpretation by Transforming Data Types and Patterns to Functions

Efficient Interpretation by Transforming Data Types and Patterns to Functions Efficient Interpretation by Transforming Data Types and Patterns to Functions Jan Martin Jansen 1, Pieter Koopman 2, Rinus Plasmeijer 2 1 The Netherlands Ministry of Defence, 2 Institute for Computing

More information

Performance Cockpit: An Extensible GUI Platform for Performance Tools

Performance Cockpit: An Extensible GUI Platform for Performance Tools Performance Cockpit: An Extensible GUI Platform for Performance Tools Tianchao Li and Michael Gerndt Institut für Informatik, Technische Universität München, Boltzmannstr. 3, D-85748 Garching bei Mu nchen,

More information

Towards Generating Domain-Specific Model Editors with Complex Editing Commands

Towards Generating Domain-Specific Model Editors with Complex Editing Commands Towards Generating Domain-Specific Model Editors with Complex Editing Commands Gabriele Taentzer Technical University of Berlin Germany gabi@cs.tu-berlin.de May 10, 2006 Abstract Domain specific modeling

More information

Accelerated Library Framework for Hybrid-x86

Accelerated Library Framework for Hybrid-x86 Software Development Kit for Multicore Acceleration Version 3.0 Accelerated Library Framework for Hybrid-x86 Programmer s Guide and API Reference Version 1.0 DRAFT SC33-8406-00 Software Development Kit

More information

Parallelism with Haskell

Parallelism with Haskell Parallelism with Haskell T-106.6200 Special course High-Performance Embedded Computing (HPEC) Autumn 2013 (I-II), Lecture 5 Vesa Hirvisalo & Kevin Hammond (Univ. of St. Andrews) 2013-10-08 V. Hirvisalo

More information

Skeleton Composition vs Stable Process Systems in Eden

Skeleton Composition vs Stable Process Systems in Eden Skeleton Composition vs Stable Process Systems in Eden M. DIETERLE, T. HORSTMEYER, R. LOOGEN Philipps-Universität Marburg, Germany Email: [dieterle, horstmey, loogen]@informatik.uni-marburg.de, J. BERTHOLD

More information

Control-Flow Refinment via Partial Evaluation

Control-Flow Refinment via Partial Evaluation Control-Flow Refinment via Partial Evaluation Jesús Doménech 1, Samir Genaim 2, and John P. Gallagher 3 1 Universidad Complutense de Madrid, Spain jdomenec@ucm.es 2 Universidad Complutense de Madrid, Spain

More information

Principles of Parallel Algorithm Design: Concurrency and Mapping

Principles of Parallel Algorithm Design: Concurrency and Mapping Principles of Parallel Algorithm Design: Concurrency and Mapping John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 3 17 January 2017 Last Thursday

More information

Detecting Structural Refactoring Conflicts Using Critical Pair Analysis

Detecting Structural Refactoring Conflicts Using Critical Pair Analysis SETra 2004 Preliminary Version Detecting Structural Refactoring Conflicts Using Critical Pair Analysis Tom Mens 1 Software Engineering Lab Université de Mons-Hainaut B-7000 Mons, Belgium Gabriele Taentzer

More information

A Functional Graph Library

A Functional Graph Library A Functional Graph Library Christian Doczkal Universität des Saarlandes Abstract. Algorithms on graphs are of great importance, both in teaching and in the implementation of specific problems. Martin Erwig

More information

Evaluation of Parallel Programs by Measurement of Its Granularity

Evaluation of Parallel Programs by Measurement of Its Granularity Evaluation of Parallel Programs by Measurement of Its Granularity Jan Kwiatkowski Computer Science Department, Wroclaw University of Technology 50-370 Wroclaw, Wybrzeze Wyspianskiego 27, Poland kwiatkowski@ci-1.ci.pwr.wroc.pl

More information

Hierarchical Master-Worker Skeletons

Hierarchical Master-Worker Skeletons Hierarchical Master-Worker Skeletons Jost Berthold, Mischa Dieterle, Rita Loogen, and Steffen Priebe Philipps-Universität Marburg, Fachbereich Mathematik und Informatik Hans Meerwein Straße, D-35032 Marburg,

More information

which a value is evaluated. When parallelising a program, instances of this class need to be produced for all the program's types. The paper commented

which a value is evaluated. When parallelising a program, instances of this class need to be produced for all the program's types. The paper commented A Type-Sensitive Preprocessor For Haskell Noel Winstanley Department of Computer Science University of Glasgow September 4, 1997 Abstract This paper presents a preprocessor which generates code from type

More information

code pattern analysis of object-oriented programming languages

code pattern analysis of object-oriented programming languages code pattern analysis of object-oriented programming languages by Xubo Miao A thesis submitted to the School of Computing in conformity with the requirements for the degree of Master of Science Queen s

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Implementing Mobile Haskell

Implementing Mobile Haskell Chapter 6 Implementing Mobile Haskell André Rauber Du Bois 1, Phil Trinder 1, Hans-Wolfgang Loidl 2 Abstract Mobile computation enables computations to move between a dynamic set of locations and is becoming

More information

Principles of Parallel Algorithm Design: Concurrency and Mapping

Principles of Parallel Algorithm Design: Concurrency and Mapping Principles of Parallel Algorithm Design: Concurrency and Mapping John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 3 28 August 2018 Last Thursday Introduction

More information

Detection and Analysis of Iterative Behavior in Parallel Applications

Detection and Analysis of Iterative Behavior in Parallel Applications Detection and Analysis of Iterative Behavior in Parallel Applications Karl Fürlinger and Shirley Moore Innovative Computing Laboratory, Department of Electrical Engineering and Computer Science, University

More information

Urmas Tamm Tartu 2013

Urmas Tamm Tartu 2013 The implementation and optimization of functional languages Urmas Tamm Tartu 2013 Intro The implementation of functional programming languages, Simon L. Peyton Jones, 1987 Miranda fully lazy strict a predecessor

More information

What are the characteristics of Object Oriented programming language?

What are the characteristics of Object Oriented programming language? What are the various elements of OOP? Following are the various elements of OOP:- Class:- A class is a collection of data and the various operations that can be performed on that data. Object- This is

More information

The Need for Speed: Understanding design factors that make multicore parallel simulations efficient

The Need for Speed: Understanding design factors that make multicore parallel simulations efficient The Need for Speed: Understanding design factors that make multicore parallel simulations efficient Shobana Sudhakar Design & Verification Technology Mentor Graphics Wilsonville, OR shobana_sudhakar@mentor.com

More information

Job Re-Packing for Enhancing the Performance of Gang Scheduling

Job Re-Packing for Enhancing the Performance of Gang Scheduling Job Re-Packing for Enhancing the Performance of Gang Scheduling B. B. Zhou 1, R. P. Brent 2, C. W. Johnson 3, and D. Walsh 3 1 Computer Sciences Laboratory, Australian National University, Canberra, ACT

More information

ON IMPLEMENTING THE FARM SKELETON

ON IMPLEMENTING THE FARM SKELETON ON IMPLEMENTING THE FARM SKELETON MICHAEL POLDNER and HERBERT KUCHEN Department of Information Systems, University of Münster, D-48149 Münster, Germany ABSTRACT Algorithmic skeletons intend to simplify

More information

SHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008

SHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008 SHARCNET Workshop on Parallel Computing Hugh Merz Laurentian University May 2008 What is Parallel Computing? A computational method that utilizes multiple processing elements to solve a problem in tandem

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads... OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

7. Introduction to Denotational Semantics. Oscar Nierstrasz

7. Introduction to Denotational Semantics. Oscar Nierstrasz 7. Introduction to Denotational Semantics Oscar Nierstrasz Roadmap > Syntax and Semantics > Semantics of Expressions > Semantics of Assignment > Other Issues References > D. A. Schmidt, Denotational Semantics,

More information

Al-Saeed, Majed Mohammed Abdullah (2015) Profiling a parallel domain specific language using off-the-shelf tools. PhD thesis.

Al-Saeed, Majed Mohammed Abdullah (2015) Profiling a parallel domain specific language using off-the-shelf tools. PhD thesis. Al-Saeed, Majed Mohammed Abdullah (2015) Profiling a parallel domain specific language using off-the-shelf tools. PhD thesis. http://theses.gla.ac.uk/6561/ Copyright and moral rights for this thesis are

More information

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems On Object Orientation as a Paradigm for General Purpose Distributed Operating Systems Vinny Cahill, Sean Baker, Brendan Tangney, Chris Horn and Neville Harris Distributed Systems Group, Dept. of Computer

More information

Object Oriented Programming in Java. Jaanus Pöial, PhD Tallinn, Estonia

Object Oriented Programming in Java. Jaanus Pöial, PhD Tallinn, Estonia Object Oriented Programming in Java Jaanus Pöial, PhD Tallinn, Estonia Motivation for Object Oriented Programming Decrease complexity (use layers of abstraction, interfaces, modularity,...) Reuse existing

More information

Implementing Concurrent Futures in Concurrent Haskell

Implementing Concurrent Futures in Concurrent Haskell Implementing Concurrent Futures in Concurrent Haskell Working Draft David Sabel Institut für Informatik Johann Wolfgang Goethe-Universität Postfach 11 19 32 D-60054 Frankfurt, Germany sabel@ki.informatik.uni-frankfurt.de

More information

Refactoring via Database Representation

Refactoring via Database Representation 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Refactoring via Database Representation Péter Diviánszky 1, Rozália Szabó-Nacsa 2, Zoltán Horváth 1 1 Department

More information

Building custom components IAT351

Building custom components IAT351 Building custom components IAT351 Week 1 Lecture 1 9.05.2012 Lyn Bartram lyn@sfu.ca Today Review assignment issues New submission method Object oriented design How to extend Java and how to scope Final

More information

Parallel FFT With Eden Skeletons

Parallel FFT With Eden Skeletons Parallel FFT With Eden Skeletons Jost Berthold, Mischa Dieterle, Oleg Lobachev, and Rita Loogen Philipps-Universität Marburg, Fachbereich Mathematik und Informatik Hans Meerwein Straße, D-35032 Marburg,

More information

The goal of the Pangaea project, as we stated it in the introduction, was to show that

The goal of the Pangaea project, as we stated it in the introduction, was to show that Chapter 5 Conclusions This chapter serves two purposes. We will summarize and critically evaluate the achievements of the Pangaea project in section 5.1. Based on this, we will then open up our perspective

More information

EMF Metrics: Specification and Calculation of Model Metrics within the Eclipse Modeling Framework

EMF Metrics: Specification and Calculation of Model Metrics within the Eclipse Modeling Framework EMF Metrics: Specification and Calculation of Model Metrics within the Eclipse Modeling Framework Thorsten Arendt a, Pawel Stepien a, Gabriele Taentzer a a Philipps-Universität Marburg, FB12 - Mathematics

More information

B. Evaluation and Exploration of Next Generation Systems for Applicability and Performance (Volodymyr Kindratenko, Guochun Shi)

B. Evaluation and Exploration of Next Generation Systems for Applicability and Performance (Volodymyr Kindratenko, Guochun Shi) A. Summary - In the area of Evaluation and Exploration of Next Generation Systems for Applicability and Performance, over the period of 10/1/10 through 12/30/10 the NCSA Innovative Systems Lab team continued

More information

Source-Based Trace Exploration Work in Progress

Source-Based Trace Exploration Work in Progress Source-Based Trace Exploration Work in Progress Olaf Chitil University of Kent, UK Abstract. Hat is a programmer s tool for generating a trace of a computation of a Haskell 98 program and viewing such

More information

DISTRIBUTED HIGH-SPEED COMPUTING OF MULTIMEDIA DATA

DISTRIBUTED HIGH-SPEED COMPUTING OF MULTIMEDIA DATA DISTRIBUTED HIGH-SPEED COMPUTING OF MULTIMEDIA DATA M. GAUS, G. R. JOUBERT, O. KAO, S. RIEDEL AND S. STAPEL Technical University of Clausthal, Department of Computer Science Julius-Albert-Str. 4, 38678

More information

Overview of research activities Toward portability of performance

Overview of research activities Toward portability of performance Overview of research activities Toward portability of performance Do dynamically what can t be done statically Understand evolution of architectures Enable new programming models Put intelligence into

More information

Lecturer: William W.Y. Hsu. Programming Languages

Lecturer: William W.Y. Hsu. Programming Languages Lecturer: William W.Y. Hsu Programming Languages Chapter 9 Data Abstraction and Object Orientation 3 Object-Oriented Programming Control or PROCESS abstraction is a very old idea (subroutines!), though

More information

JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING

JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING Larson Hogstrom, Mukarram Tahir, Andres Hasfura Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 18.337/6.338

More information

Object-Interaction Diagrams: Sequence Diagrams UML

Object-Interaction Diagrams: Sequence Diagrams UML Object-Interaction Diagrams: Sequence Diagrams UML Communication and Time In communication diagrams, ordering of messages is achieved by labelling them with sequence numbers This does not make temporal

More information

A Bandwidth Latency Tradeoff for Broadcast and Reduction

A Bandwidth Latency Tradeoff for Broadcast and Reduction A Bandwidth Latency Tradeoff for Broadcast and Reduction Peter Sanders and Jop F. Sibeyn Max-Planck-Institut für Informatik Im Stadtwald, 66 Saarbrücken, Germany. sanders, jopsi@mpi-sb.mpg.de. http://www.mpi-sb.mpg.de/sanders,

More information

CPS221 Lecture: Threads

CPS221 Lecture: Threads Objectives CPS221 Lecture: Threads 1. To introduce threads in the context of processes 2. To introduce UML Activity Diagrams last revised 9/5/12 Materials: 1. Diagram showing state of memory for a process

More information

Scalable Algorithmic Techniques Decompositions & Mapping. Alexandre David

Scalable Algorithmic Techniques Decompositions & Mapping. Alexandre David Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05 adavid@cs.aau.dk Introduction Focus on data parallelism, scale with size. Task parallelism limited. Notion of scalability

More information

Claude TADONKI. MINES ParisTech PSL Research University Centre de Recherche Informatique

Claude TADONKI. MINES ParisTech PSL Research University Centre de Recherche Informatique Got 2 seconds Sequential 84 seconds Expected 84/84 = 1 second!?! Got 25 seconds MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Séminaire MATHEMATIQUES

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Multi-paradigm Declarative Languages

Multi-paradigm Declarative Languages Michael Hanus (CAU Kiel) Multi-paradigm Declarative Languages ICLP 2007 1 Multi-paradigm Declarative Languages Michael Hanus Christian-Albrechts-University of Kiel Programming Languages and Compiler Construction

More information

challenges in domain-specific modeling raphaël mannadiar august 27, 2009

challenges in domain-specific modeling raphaël mannadiar august 27, 2009 challenges in domain-specific modeling raphaël mannadiar august 27, 2009 raphaël mannadiar challenges in domain-specific modeling 1/59 outline 1 introduction 2 approaches 3 debugging and simulation 4 differencing

More information

Parallel programming models. Main weapons

Parallel programming models. Main weapons Parallel programming models Von Neumann machine model: A processor and it s memory program = list of stored instructions Processor loads program (reads from memory), decodes, executes instructions (basic

More information

Clean-CORBA Interface Supporting Pipeline Skeleton

Clean-CORBA Interface Supporting Pipeline Skeleton 6 th International onference on Applied Informatics Eger, Hungary, January 27 31, 2004. lean-orba Interface Supporting Pipeline Skeleton Zoltán Hernyák 1, Zoltán Horváth 2, Viktória Zsók 2 1 Department

More information

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Objective PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Explain what is meant by compiler. Explain how the compiler works. Describe various analysis of the source program. Describe the

More information

Parameterized Complexity of Independence and Domination on Geometric Graphs

Parameterized Complexity of Independence and Domination on Geometric Graphs Parameterized Complexity of Independence and Domination on Geometric Graphs Dániel Marx Institut für Informatik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany. dmarx@informatik.hu-berlin.de

More information

Execution Architecture

Execution Architecture Execution Architecture Software Architecture VO (706.706) Roman Kern Institute for Interactive Systems and Data Science, TU Graz 2018-11-07 Roman Kern (ISDS, TU Graz) Execution Architecture 2018-11-07

More information

EPOS: an Object-Oriented Operating System

EPOS: an Object-Oriented Operating System EPOS: an Object-Oriented Operating System Antônio Augusto Fröhlich 1 Wolfgang Schröder-Preikschat 2 1 GMD FIRST guto@first.gmd.de 2 University of Magdeburg wosch@ivs.cs.uni-magdeburg.de Abstract This position

More information

Department of Computing Science Granularity in Large-Scale Parallel Functional Programming Hans Wolfgang Loidl A thesis submitted for a Doctor of Philosophy Degree in Computing Science at the University

More information

A Skeleton for Distributed Work Pools in Eden

A Skeleton for Distributed Work Pools in Eden A Skeleton for Distributed Work Pools in Eden Mischa Dieterle 1, Jost Berthold 2, and Rita Loogen 1 1 Philipps-Universität Marburg, Fachbereich Mathematik und Informatik Hans Meerwein Straße, D-35032 Marburg,

More information

Concept as a Generalization of Class and Principles of the Concept-Oriented Programming

Concept as a Generalization of Class and Principles of the Concept-Oriented Programming Computer Science Journal of Moldova, vol.13, no.3(39), 2005 Concept as a Generalization of Class and Principles of the Concept-Oriented Programming Alexandr Savinov Abstract In the paper we describe a

More information

On the scalability of tracing mechanisms 1

On the scalability of tracing mechanisms 1 On the scalability of tracing mechanisms 1 Felix Freitag, Jordi Caubet, Jesus Labarta Departament d Arquitectura de Computadors (DAC) European Center for Parallelism of Barcelona (CEPBA) Universitat Politècnica

More information

Diffusing Your Mobile Apps: Extending In-Network Function Virtualisation to Mobile Function Offloading

Diffusing Your Mobile Apps: Extending In-Network Function Virtualisation to Mobile Function Offloading Diffusing Your Mobile Apps: Extending In-Network Function Virtualisation to Mobile Function Offloading Mario Almeida, Liang Wang*, Jeremy Blackburn, Konstantina Papagiannaki, Jon Crowcroft* Telefonica

More information

Offloading Java to Graphics Processors

Offloading Java to Graphics Processors Offloading Java to Graphics Processors Peter Calvert (prc33@cam.ac.uk) University of Cambridge, Computer Laboratory Abstract Massively-parallel graphics processors have the potential to offer high performance

More information

Parallel & Concurrent Haskell 1: Basic Pure Parallelism. Simon Marlow (Microsoft Research, Cambridge, UK)

Parallel & Concurrent Haskell 1: Basic Pure Parallelism. Simon Marlow (Microsoft Research, Cambridge, UK) Parallel & Concurrent Haskell 1: Basic Pure Parallelism Simon Marlow (Microsoft Research, Cambridge, UK) Background The parallelism landscape is getting crowded Every language has their own solution(s)

More information

Reference Requirements for Records and Documents Management

Reference Requirements for Records and Documents Management Reference Requirements for Records and Documents Management Ricardo Jorge Seno Martins ricardosenomartins@gmail.com Instituto Superior Técnico, Lisboa, Portugal May 2015 Abstract When information systems

More information

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions Data In single-program multiple-data (SPMD) parallel programs, global data is partitioned, with a portion of the data assigned to each processing node. Issues relevant to choosing a partitioning strategy

More information

COP 3330 Final Exam Review

COP 3330 Final Exam Review COP 3330 Final Exam Review I. The Basics (Chapters 2, 5, 6) a. comments b. identifiers, reserved words c. white space d. compilers vs. interpreters e. syntax, semantics f. errors i. syntax ii. run-time

More information

HPX. High Performance ParalleX CCT Tech Talk Series. Hartmut Kaiser

HPX. High Performance ParalleX CCT Tech Talk Series. Hartmut Kaiser HPX High Performance CCT Tech Talk Hartmut Kaiser (hkaiser@cct.lsu.edu) 2 What s HPX? Exemplar runtime system implementation Targeting conventional architectures (Linux based SMPs and clusters) Currently,

More information

Black-Box Program Specialization

Black-Box Program Specialization Published in Technical Report 17/99, Department of Software Engineering and Computer Science, University of Karlskrona/Ronneby: Proceedings of WCOP 99 Black-Box Program Specialization Ulrik Pagh Schultz

More information

Parallel Programming Concepts. Parallel Algorithms. Peter Tröger

Parallel Programming Concepts. Parallel Algorithms. Peter Tröger Parallel Programming Concepts Parallel Algorithms Peter Tröger Sources: Ian Foster. Designing and Building Parallel Programs. Addison-Wesley. 1995. Mattson, Timothy G.; S, Beverly A.; ers,; Massingill,

More information

Intel VTune Amplifier XE

Intel VTune Amplifier XE Intel VTune Amplifier XE Vladimir Tsymbal Performance, Analysis and Threading Lab 1 Agenda Intel VTune Amplifier XE Overview Features Data collectors Analysis types Key Concepts Collecting performance

More information

Programming Languages Third Edition. Chapter 9 Control I Expressions and Statements

Programming Languages Third Edition. Chapter 9 Control I Expressions and Statements Programming Languages Third Edition Chapter 9 Control I Expressions and Statements Objectives Understand expressions Understand conditional statements and guards Understand loops and variation on WHILE

More information

Lecture Notes on Programming Languages

Lecture Notes on Programming Languages Lecture Notes on Programming Languages 85 Lecture 09: Support for Object-Oriented Programming This lecture discusses how programming languages support object-oriented programming. Topics to be covered

More information

Operating Systems. 18. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Spring /20/ Paul Krzyzanowski

Operating Systems. 18. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Spring /20/ Paul Krzyzanowski Operating Systems 18. Remote Procedure Calls Paul Krzyzanowski Rutgers University Spring 2015 4/20/2015 2014-2015 Paul Krzyzanowski 1 Remote Procedure Calls 2 Problems with the sockets API The sockets

More information