Milind Kulkarni Research Statement

With the increasing ubiquity of multicore processors, interest in parallel programming is again on the upswing. Over the past three decades, languages and compilers researchers have struggled to ease the burden of writing parallel programs. While these efforts have met with success in some domains (dense linear algebra and SQL programming being two well-known examples), writing efficient parallel code is still largely the purview of expert programmers. One of the great challenges facing the programming languages community is to make parallel programming accessible and effective for average programmers.

I believe the key to making parallel programming accessible is to hide as much complexity as possible behind intuitive abstractions that capture important information about parallelism and locality. These abstractions can then be exploited by reusable libraries and run-time systems written by expert programmers, allowing most programmers to write algorithms in an intuitive, nearly sequential style. My research has focused on discovering useful and natural abstractions for writing irregular programs (programs that manipulate pointer-based data structures such as trees and graphs) and on developing the compiler techniques and run-time systems needed to exploit those abstractions. This research has opened up a number of additional directions that I would like to explore: (1) building new systems that exploit program semantics to enhance parallelism and locality; (2) developing modeling tools that allow programmers to better understand the parallelism in algorithms and systems; (3) using autotuning techniques to allow parallel programs to adapt to novel architectures; and (4) exploring new application domains that offer new challenges for parallel programming. Tackling these problems is an important step toward solving the problem of parallel programming.

Research approach

My research tends to adopt the following pattern: (1) study interesting problems arising in important real-world applications; (2) find general patterns in those problems; (3) develop abstractions that capture those general patterns; and (4) produce efficient implementations of those abstractions. This approach has served me well throughout my research, as demonstrated by my development of the Galois system for optimistic parallelization [1].

(1) Study: I begin projects by carefully studying important applications from a variety of domains. By understanding specific applications, I can find out where current approaches to optimization or parallelization fall short, and why. To drive this search for interesting problems, I have collaborated with researchers from a variety of application domains, in fields ranging from computational geometry to graphics to data mining. The genesis of the Galois system came from studying two real-world algorithms from these domains: Delaunay mesh refinement and agglomerative clustering.

(2) Generalize: Armed with a deep understanding of applications, and having identified particular problems to solve, the next step is to generalize. For example, computation in Delaunay mesh refinement is structured as processing elements from a worklist in an arbitrary order, a pattern that appears in a variety of irregular applications. While worklist items often exhibit a complex pattern of dependences, there is nevertheless parallelism to be exploited by processing independent worklist elements concurrently. I call this pattern of parallelism amorphous data parallelism. This type of parallelism is an ideal target for speculative parallelization.

(3) Abstract: These generalizations allow me to develop abstractions that capture important program behavior. Good abstractions possess two key properties: they should be intuitive, and they should expose useful program semantics. For example, amorphous data parallelism can be expressed through the use of optimistic iterators, which highlight the opportunity for parallelism in a program. Despite having simple sequential semantics, these iterators expose ordering properties that are otherwise hidden and provide a hint that speculative parallelization can be profitable.
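
To illustrate the pattern these iterators capture, here is a minimal Java sketch; the class and method names are hypothetical rather than the actual Galois API, and the sequential loop stands in for the speculatively parallelized loop the run-time would execute.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Illustrative sketch only (hypothetical names, not the Galois API):
    // a worklist holds elements that may be processed in any order and may
    // grow while it is being drained.
    final class Worklist<E> {
        private final Deque<E> items = new ArrayDeque<>();
        public void add(E e) { items.addLast(e); }
        public boolean isEmpty() { return items.isEmpty(); }
        public E poll() { return items.pollFirst(); }
    }

    public class MeshRefinementSketch {
        public static void main(String[] args) {
            Worklist<Integer> badTriangles = new Worklist<>();
            for (int i = 0; i < 16; i++) badTriangles.add(i);

            // The loop has simple sequential semantics, but because it promises
            // nothing about the order in which elements are processed, a run-time
            // system is free to speculatively execute independent iterations in
            // parallel: amorphous data parallelism.
            while (!badTriangles.isEmpty()) {
                int t = badTriangles.poll();
                // Fixing an element may create new work, which is simply added
                // back to the worklist, just as re-triangulating a cavity can
                // create new bad triangles in Delaunay mesh refinement.
                if (t > 1) badTriangles.add(t / 2);
            }
        }
    }

The programmer writes an ordinary-looking loop; the license to reorder iterations is what a speculative run-time exploits.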

The abstractions I develop are heavily informed by my experience with real-world applications. For example, I realized that existing speculative parallelization techniques, such as thread-level speculation, would detect a number of benign dependences in irregular applications. Because violating a benign dependence does not affect correctness, parallelism can be improved by ignoring such dependences, provided they can be identified. I used the notion of semantic commutativity, an abstraction that precisely exposes the object semantics required for exact dependence checking, to produce object libraries that allow significant amounts of concurrency during speculative execution.
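
The sketch below makes this concrete for a set data type; the names are illustrative only (this is not the interface of the Galois object libraries), and operations are treated as returning void for simplicity.

    // Hypothetical sketch of semantic commutativity: two invocations conflict
    // only if they do not commute with respect to the abstract state of a set
    // whose operations are add(key), remove(key), and contains(key).
    public class SetCommutativity {
        enum Kind { ADD, REMOVE, CONTAINS }

        static final class Op {
            final Kind kind;
            final int key;
            Op(Kind kind, int key) { this.kind = kind; this.key = key; }
        }

        // Operations on different keys always commute, and two invocations of
        // the same operation on the same key leave the abstract set in the same
        // state in either order. Mixed operations on the same key can observe
        // or undo each other's effect, so they conflict. Low-level memory
        // dependences (e.g., rebalancing inside a tree-based set) never enter
        // into the decision; those are exactly the benign dependences that
        // thread-level speculation would needlessly flag.
        static boolean commute(Op a, Op b) {
            if (a.key != b.key) return true;
            return a.kind == b.kind;
        }

        public static void main(String[] args) {
            System.out.println(commute(new Op(Kind.ADD, 3), new Op(Kind.CONTAINS, 7))); // true
            System.out.println(commute(new Op(Kind.ADD, 3), new Op(Kind.REMOVE, 3)));   // false
        }
    }
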
(4) Implement: The final step of a research project is to produce efficient implementations of the abstractions. This can encompass any number of techniques, from compiler transformations to run-time systems. The Galois system comprises a software run-time that can parallelize programs written using optimistic iterators, together with object libraries that leverage semantic commutativity to perform precise dependence checking. This implementation achieved low overhead and good scalability, demonstrating that the abstractions I developed could be efficiently supported and exploited.

Prior Research

My past research reflects the approach outlined above. The abstractions for expressing and exploiting amorphous data parallelism described above formed the basis of the Galois system. This initial work showed that it was possible to write irregular programs in a straightforward, nearly sequential manner and still achieve useful parallelism. Two further contributions of my research have been developing locality abstractions for irregular data structures [2] and scheduling abstractions for amorphous data parallelism [3], and integrating these abstractions into the Galois system. This research has met with significant industry interest (Intel and IBM have contributed funding), and continuing to collaborate with industry is a key point in my research agenda.

Locality Abstractions

Truly unlocking the potential parallelism in amorphous data-parallel programs requires attending to locality, which is the key to high-performance parallel programs. Naive parallel implementations of irregular algorithms can suffer from poor cache locality (e.g., because computations scheduled on a single processor access data from all regions of a data structure); similarly, naively running locality-preserving sequential implementations in parallel may result in high contention (e.g., because computations scheduled simultaneously on multiple processors access the same region of a data structure).

This interplay between locality and parallelism is especially problematic in irregular programs, as there is no well-defined notion of locality in irregular data structures such as graphs or trees. The problem is obvious: how can a programmer exploit locality in a data structure that doesn't seem to have any?

I answered this question in [2] by proposing an abstraction that captures semantic locality in irregular data structures. Semantic locality refers to locality that arises in an irregular data structure from the semantics of its access patterns. For example, in a graph, a node is semantically local to its neighbors, since from a given node you can access its neighbors; note that this locality is preserved regardless of how the graph is implemented. To capture semantic locality, I introduced the abstraction of partitioning: irregular data structures are logically partitioned, with the property that regions of the data structure in the same partition are semantically local (and vice versa). I showed how to exploit this partitioning to improve parallelism, by scheduling computations affecting different partitions on different processors; to improve locality, by scheduling computations within a single partition to exploit temporal locality; and to reduce the overhead of conflict detection, by replacing precise conflict detection with locks on partitions (sketched below). Thus, I showed that a simple, intuitive abstraction allows locality to be successfully exploited for irregular data structures.
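
The partition-lock sketch below uses hypothetical names; it is not the Galois run-time's actual interface.

    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative sketch of conflict detection via partition locks. Each
    // node maps to a partition; a speculative computation acquires the lock
    // of every partition it touches, and a failed acquisition signals a
    // possible conflict, triggering rollback and retry. One lock per
    // partition is far cheaper than precise per-element tracking.
    public class PartitionLocks {
        private final ReentrantLock[] locks;

        PartitionLocks(int numPartitions) {
            locks = new ReentrantLock[numPartitions];
            for (int i = 0; i < numPartitions; i++) locks[i] = new ReentrantLock();
        }

        // A real system would use the logical partitioning of the data
        // structure; a hash stands in for it here.
        int partitionOf(int nodeId) { return Math.floorMod(nodeId, locks.length); }

        boolean tryAcquire(int nodeId) { return locks[partitionOf(nodeId)].tryLock(); }

        void release(int nodeId) { locks[partitionOf(nodeId)].unlock(); }

        public static void main(String[] args) {
            PartitionLocks pl = new PartitionLocks(8);
            if (pl.tryAcquire(42)) {
                try {
                    // ... speculative work on node 42's neighborhood ...
                } finally {
                    pl.release(42);
                }
            }
        }
    }

The trade-off is precision for overhead: two computations in the same partition are treated as conflicting even if they touch disjoint elements.
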
Scheduling Abstractions

My experience with the Galois system and partition-based scheduling made it clear that scheduling, the assignment of work to processors, has an enormous effect on performance. Unfortunately, the behavior of a given schedule is highly application dependent, and the space of possible schedules is vast. In [3] I developed a scheduling framework for describing computation schedules for amorphous data-parallel programs. This framework is built around three abstractions, which together fully describe a given schedule: (i) clustering, which specifies chunks of work that should be executed on a single processor; (ii) labeling, which assigns clusters of work to particular processors; and (iii) ordering, which determines the order in which a processor executes its assigned work. I showed that this framework is general: it can be instantiated to produce all the schedules used in data-parallel frameworks such as OpenMP, as well as the partition-based computation scheduling used in [2]. I also showed that the framework is useful: I gave instantiations of the framework that produced novel schedules with greater performance than existing schedules. This work demonstrates that a small number of simple abstractions suffice to describe the vast space of schedules that can be applied to parallel, irregular applications.

Future Research

Short term

My short-term research goals fall into two categories: (1) finding new ways to exploit program semantics to produce efficient parallel programs, and (2) giving programmers new tools to model the parallelism in algorithms and to profile parallel implementations of those algorithms.

Along the first line of inquiry, partitioning information may be valuable when assigning threads to cores in a hierarchical architecture. Intuitively, threads that are likely to communicate with one another should be assigned to cores that enjoy low-latency communication through mechanisms such as shared caches. Partitioning information, as well as partition-aware scheduling, can allow a system to make intelligent assumptions about communication patterns, even for irregular programs. In managed languages such as Java, it may be possible to use partitioning information during garbage collection to re-lay out data structures, turning the exposed semantic locality into spatial locality.

The scheduling framework I developed in [3] is descriptive; it does not mandate a particular implementation of schedules. I plan to develop a language for schedules that will allow programmers to specify, in a declarative manner, the scheduling properties they want, as in systems like OpenMP. Given this specification, a compiler can generate a scheduler to be used within the Galois run-time. This compiler machinery will enable autotuning: a meta-compiler can automatically generate a number of potential schedulers for an application and evaluate them on a test input, choosing the schedule that performs best for a given architecture.
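
As a thought experiment, the sketch below instantiates the three abstractions of [3] to express an OpenMP-style static, chunked schedule; the code is illustrative only, since the schedule specification language described above does not yet exist.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Hypothetical sketch: a schedule expressed through the three abstractions
    // of the scheduling framework. Clustering chunks the work, labeling places
    // chunks on processors, and ordering sorts each processor's assigned work.
    public class ScheduleSketch {
        // Clustering: fixed-size chunks, as in an OpenMP static, chunked schedule.
        static <W> List<List<W>> cluster(List<W> work, int chunkSize) {
            List<List<W>> clusters = new ArrayList<>();
            for (int i = 0; i < work.size(); i += chunkSize)
                clusters.add(work.subList(i, Math.min(i + chunkSize, work.size())));
            return clusters;
        }

        // Labeling: round-robin assignment of clusters to processors.
        static int label(int clusterIndex, int numProcessors) {
            return clusterIndex % numProcessors;
        }

        public static void main(String[] args) {
            List<Integer> work = new ArrayList<>();
            for (int i = 0; i < 10; i++) work.add(i);

            List<List<Integer>> clusters = cluster(work, 3);
            for (int c = 0; c < clusters.size(); c++) {
                List<Integer> chunk = new ArrayList<>(clusters.get(c));
                // Ordering: each processor executes its chunk in descending
                // order (an arbitrary choice; LIFO or random are equally valid).
                chunk.sort(Comparator.reverseOrder());
                System.out.println("processor " + label(c, 4) + " runs " + chunk);
            }
        }
    }

Swapping in a different clustering (e.g., one cluster per data-structure partition) recovers the partition-based schedules of [2], which is precisely the sense in which the framework is general.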

I have also recently become interested in modeling the behavior of amorphous data-parallel programs. I wrote a tool called ParaMeter to begin investigating the parallelism available in such programs [4]. ParaMeter estimates parallelism by finding a maximal independent set of work at each step of a computation, providing an upper bound on the amount of parallelism in a program. The current version of ParaMeter makes simplifying assumptions about how long work takes (each piece of work takes unit time) and about communication costs (there are none). I plan to extend ParaMeter to provide more accurate models of parallelism by accounting for work irregularity and communication behavior. I believe this will be a useful tool not only for the programming languages and systems community, but also for the algorithms community, as it will provide insight into the expected parallel performance of irregular algorithms.
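
The sketch below illustrates the style of estimate ParaMeter computes, using a simple greedy heuristic for the maximal independent set; the names are hypothetical, and this is not ParaMeter's actual implementation.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Illustrative sketch of a parallelism profile. Work items conflict when
    // they touch a common element; each step executes a greedily chosen
    // maximal independent set of the remaining items, under ParaMeter's
    // simplifying assumptions: unit-time work and free communication.
    public class ParallelismProfile {
        static final class Item {
            final Set<Integer> touched; // elements this item reads or writes
            Item(Integer... elems) { touched = new HashSet<>(List.of(elems)); }
        }

        public static void main(String[] args) {
            List<Item> worklist = new ArrayList<>(List.of(
                new Item(1, 2), new Item(3), new Item(2, 3), new Item(4), new Item(1, 4)));

            int step = 0;
            while (!worklist.isEmpty()) {
                Set<Integer> locked = new HashSet<>();
                List<Item> executed = new ArrayList<>();
                for (Item it : worklist) { // greedy maximal independent set
                    boolean independent = true;
                    for (int e : it.touched)
                        if (locked.contains(e)) { independent = false; break; }
                    if (independent) { locked.addAll(it.touched); executed.add(it); }
                }
                worklist.removeAll(executed);
                // The size of each step's set is the (optimistic) number of
                // items that could have run in parallel at that point.
                System.out.println("step " + ++step + ": " + executed.size() + " items");
            }
        }
    }
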
Long term

An interesting pattern I have noticed in my work is that many of the abstractions I have developed allow irregular programs to be transformed in much the same way that regular, dense-matrix programs are. The optimistic iterators I proposed are the basic parallel loop construct, analogous to DO-ALL loops in languages like Fortran, and techniques like partition-aware scheduling are analogous to loop tiling in matrix codes. It may be possible to lift other high-level program transformations from the world of regular programs to the world of irregular programs. For example, consider loop interchange, which can improve locality in matrix codes by changing traversal order. What does loop interchange mean when applied to an irregular program consisting of repeated traversals of an irregular data structure (a pattern that appears in, e.g., n-body codes)? A transformation analogous to loop interchange, when applied to such a code, might produce a reordered sequence of partial traversals that can be grouped together to promote locality. Are these types of transformations always legal? Is there a general way to express such transformations?

Autotuning techniques may be more broadly applicable in programs written at a suitable level of abstraction. As long as the abstractions are well defined, it may be possible to automatically search a space of possible instantiations of those abstractions to choose the best concrete implementation of a program. I am especially interested in dynamic autotuning, where the parameters of a program are changed at run time in response to input characteristics or run-time behavior.

To me, the key question when thinking about future research is identifying new and exciting application domains. There are several emerging areas that are the target of a great deal of research and will require substantial amounts of parallelism. Computational biology brings software analysis to bear on massive data sets. A number of algorithms common in computational biology are irregular in nature, such as Survey Propagation for solving SAT problems. What new techniques will be needed to parallelize and optimize irregular algorithms that work with vast amounts of data? How can speculation techniques like those in Galois be brought to bear on applications that require distributed-memory architectures?

On a lighter note, games are always on the cutting edge of the performance curve, and the algorithms underlying high-performance games will hence require parallelism. While some tasks, such as shading, are inherently parallelizable, many are more difficult. Maintaining game state requires tracking the position and behavior of thousands of objects, each of which can interact with the others; simultaneously updating the states of these game objects fits naturally into the framework of amorphous data parallelism. What sorts of systems are needed to parallelize game algorithms while adhering to real-time constraints? Can hardware usually devoted to graphics be leveraged to improve the performance of other gaming algorithms?

References

[1] Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew. Optimistic Parallelism Requires Abstractions. In Programming Language Design and Implementation (PLDI), June 2007.

[2] Milind Kulkarni, Keshav Pingali, Ganesh Ramanarayanan, Bruce Walter, Kavita Bala, and L. Paul Chew. Optimistic Parallelism Benefits From Data Partitioning. In Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2008.

[3] Milind Kulkarni, Patrick Carribault, Keshav Pingali, Ganesh Ramanarayanan, Bruce Walter, Kavita Bala, and L. Paul Chew. Scheduling Strategies for Optimistic Parallelization of Irregular Programs. In Symposium on Parallelism in Algorithms and Architectures (SPAA), June 2008.

[4] Milind Kulkarni, Martin Burtscher, R. Inkulu, Keshav Pingali, and Calin Cascaval. How Much Parallelism is There in Irregular Applications? In Principles and Practice of Parallel Programming (PPoPP), February 2009 (to appear).
