VASim: An Open Virtual Automata Simulator for Automata Processing Research
University of Virginia Technical Report #CS

J. Wadden¹, K. Skadron¹

We present VASim, an open, extensible virtual automata simulator for automata processing research. VASim is open source and easily extensible. VASim allows users to implement automata transformation and optimization passes for increased performance. VASim is parametrically multithreaded, and can automatically divide connected automata components and sections of the input stream among parallel threads. We compare the performance of VASim against two closed-source automata processing tools and show speedups ranging from roughly 10x to more than 1000x.

Keywords: automata processing, non-deterministic finite automata, parallel processing, simulation, benchmarking.

Introduction

Micron's Automata Processor (AP) is a memory-based architecture purpose-built to accelerate the computation of non-deterministic finite automata (NFAs) [4]. Evaluation, prototyping, and debugging of potential applications by automata developers are limited to two software tools provided by Micron: batchsim and apemulate [1]. Each tool is designed to functionally simulate the behavior of the hardware, assisting automata developers in prototyping and evaluating automata-based applications.

The first software tool, batchsim, takes an automaton (defined in Micron's Automata Network Markup Language, or ANML [1]) and an input file, and simulates the functionality of the AP by running the automaton on the input file.
batchsim has three main drawbacks: 1) it is closed source, so existing critical bugs are fixed slowly, and feature requests, simulation enhancements, and experimental extensions to the computational model are limited by Micron's engineering resources and interests; 2) it is slow, often taking hours to simulate complex automata on reasonably sized inputs, and it consumes massive amounts of memory, leading to killed processes if run for longer than a few hours; and 3) it is tied to Micron's SDK, which makes the tool more cumbersome to acquire, install, and use.

The second tool, apemulate, is functionally very similar to batchsim, but it executes automata on a hardware-accurate emulation of the AP rather than on a functional simulation. An abstract automaton must first be compiled from an ANML description by Micron's apcompile tool, and placed and routed into a hardware-specific finite state machine. Otherwise, apemulate is functionally identical to batchsim: it simulates compiled finite state machines on input byte streams, emulating the AP hardware. apemulate is usually much faster than batchsim because of the automata optimizations made during the compilation step, and is thus much more usable as a prototyping tool. However, because automata must first be successfully compiled, placed, and routed by Micron's apcompile tool, evaluation of potential applications on the AP using apemulate is limited to automata that Micron's compiler can place and route. Hypothetical automata with interesting additional features, or automata that barely violate these architectural restrictions, cannot be evaluated at all. Like batchsim, apemulate is closed source and tied to Micron's SDK, and suffers from the same associated drawbacks.

To address the above issues with existing tools, we present the Virtual Automata Simulator, or VASim.
VASim is an open-source automata simulation, optimization, and transformation platform designed for easy prototyping, debugging, research extensions, and evaluation of automata-based applications. VASim was constructed using object-oriented design in modern C++ and is designed from the ground up to be easy to understand, maintain, and extend. While performance was not the main driving consideration during development, VASim is fast, and is currently times faster than batchsim and 2-9 times faster than apemulate in our evaluation. When VASim applies performance optimizations to the automata topology, VASim performs times faster than batchsim and times faster than apemulate in single-threaded computation. Furthermore, VASim is parametrically multi-threaded in two dimensions of parallelism common to automata-based applications, further increasing its performance and usefulness over batchsim and apemulate.

This paper first presents the architecture of VASim and its design considerations. We then evaluate six applications previously published in the literature using VASim, and compare its single-threaded and multi-threaded performance against batchsim and apemulate.

¹ University of Virginia, Dept. of Computer Science: {wadden,skadron}@virginia.edu

1. VASim Architecture and Features

1.1. Object-oriented Design

VASim uses object-oriented design principles throughout its architecture to ensure easy understanding of code, code reuse, and easy feature creation or extension. Object-oriented design also allows for easier bug isolation and performance profiling, which was critical in the development of the tool, and will allow easy modifications and enhancements.

VASim manages one or more Automata objects, defined initially by one or more ANML files, although automata can also be constructed within VASim programmatically if desired. Each Automata object contains a directed graph of automata Elements. Automata Elements are divided into two main classes: the STE class represents the functionality of an AP state transition element (STE) [4], and the SpecialElement class represents any other type of automata logic or functionality, e.g. the counters and boolean elements available in the first-generation AP micro-architecture [4].
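To make the class split concrete, here is a minimal C++ sketch of such a graph, assuming a 256-symbol (byte) alphabet for STE matching. Only the names Automata, Element, STE, and SpecialElement come from the text; the member functions (`add`, `addEdge`, `match`, `calculate`, `isSpecial`) and the `OrGate` example are hypothetical illustrations, not VASim's actual interface.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: class names follow the text, all members are illustrative.
class Element {
public:
    explicit Element(std::string id) : id_(std::move(id)) {}
    virtual ~Element() = default;

    // Distinguishes STEs from every other element type.
    virtual bool isSpecial() const = 0;

    // Outgoing edges of the directed automata graph (non-owning pointers).
    void addChild(Element* child) { children_.push_back(child); }
    const std::vector<Element*>& children() const { return children_; }
    const std::string& id() const { return id_; }

private:
    std::string id_;
    std::vector<Element*> children_;
};

// State transition element: matches one input byte against a 256-entry
// character class.
class STE : public Element {
public:
    STE(std::string id, std::bitset<256> symbols, bool start = false)
        : Element(std::move(id)), symbols_(symbols), start_(start) {}

    bool isSpecial() const override { return false; }
    bool match(std::uint8_t symbol) const { return symbols_.test(symbol); }
    bool isStart() const { return start_; }

private:
    std::bitset<256> symbols_;
    bool start_;
};

// Base class for any non-STE logic (counters, boolean gates, ...).
class SpecialElement : public Element {
public:
    explicit SpecialElement(std::string id) : Element(std::move(id)) {}
    bool isSpecial() const override { return true; }

    // Boolean function of how many enabling inputs fired this cycle.
    virtual bool calculate(int activeInputs) const = 0;
};

// Hypothetical SpecialElement: an OR gate.
class OrGate : public SpecialElement {
public:
    explicit OrGate(std::string id) : SpecialElement(std::move(id)) {}
    bool calculate(int activeInputs) const override { return activeInputs > 0; }
};

// Owns every Element and wires them into a directed graph.
class Automata {
public:
    template <typename T, typename... Args>
    T* add(Args&&... args) {
        auto owned = std::make_unique<T>(std::forward<Args>(args)...);
        T* raw = owned.get();
        elements_.push_back(std::move(owned));
        return raw;
    }
    void addEdge(Element* from, Element* to) { from->addChild(to); }
    std::size_t size() const { return elements_.size(); }

private:
    std::vector<std::unique_ptr<Element>> elements_;
};
```

Keeping ownership in one container while edges are plain pointers makes graph traversal cheap and lets new element types be added by subclassing; this is one reasonable design choice, not necessarily the one VASim makes.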
Each Element can be enabled by incoming signals from other Elements in the graph. If enabled, an Element computes a boolean function (e.g. matching for STEs, or boolean logic for logic gates). If the result of that computation is true, the Element activates and enables its immediate successors in the directed automata graph. While VASim is designed to simulate Micron's AP, hypothetical enhancements to Micron's architecture can easily be inserted into VASim, debugged, and evaluated. Furthermore, new automata models can be added, increasing the computational models available to any application.

1.2. ANML Parser and Optimizations

VASim uses Micron's Automata Network Markup Language (ANML) [1] as its main automata input language. ANML was chosen because it is the automata definition language used for prototyping and evaluation in Micron's software and hardware environment, but VASim could potentially process any automata format, or construct automata programmatically.

Once an automaton has been parsed into an Automata object, VASim allows the programmer to implement optimization and transformation passes over the graph. One example optimization, already implemented in both VASim and Micron's apcompile ANML compiler, is left minimization. Left minimization performs a breadth-first search from an NFA's start states, combining functionally identical elements and thereby reducing the size of the automaton. By eliminating redundant states, left minimization acts as a form of automata compression that increases the space efficiency of automata on spatial automata processors such as the AP. By eliminating redundant state transitions, left minimization also greatly increases the performance of automata processing on von Neumann architectures for highly compressible automata. This feature is available to apemulate via Micron's compiler optimizations, but is not available to batchsim. The benefits of left minimization are evaluated in Section 2.

1.3. Multi-dimensional Parameterized Multi-threading

Automata processing can be parallelized in two dimensions: 1) the automata can be partitioned into parallel pieces (e.g. independent connected components of the automata graph) and divided among threads, or 2) the input byte stream can be partitioned (if the application allows it) and offered to threads processing identical automata. VASim offers zero-effort multi-threading, allowing the programmer to define how many threads should be devoted to each dimension of parallel automata computation, furthering its usefulness over batchsim and apemulate.

1.4. VASim Core Simulation Architecture

The core architecture of VASim is a loop that processes input symbols and keeps track of activity in the automata. Each loop iteration (symbol cycle) is divided into five main stages:

1) The first stage enables all start STEs in the graph, initializing computation. Each enabled STE is added to a special EnabledSTEs queue.
2) The second stage reads all enabled STEs off the EnabledSTEs queue and computes the match function of each. If an STE matches, it is pushed to an ActivatedSTEs queue.
3) The third stage reads all activated STEs from the ActivatedSTEs queue and propagates their output signals to all child Elements. Each child is added to one of two queues: EnabledSTEs if the child is an STE, or EnabledSpecialElements if the child is a SpecialElement.
4) Because the AP performs logic computation immediately after STE computation within a single symbol cycle, stage four reads SpecialElements from the EnabledSpecialElements queue and computes each SpecialElement's boolean function.
   SpecialElements that compute true are pushed to a special ActivatedSpecialElements queue.
5) Similar to stage 3, stage 5 pushes the children of all SpecialElements in the ActivatedSpecialElements queue to their respective enable queues.

These queues allow VASim to operate only on the Elements in the graph that require computation, increasing efficiency over naive implementations that might consider every state on every cycle. Furthermore, the execution core can easily be modified or extended by modifying or adding one or more stages in the main simulation pipeline.

2. Evaluation

While speed of automata processing was not a main design constraint in the original VASim prototype, VASim is fast, and uses efficient data structures and algorithms to reduce the amount and cost of automata computation. Furthermore, VASim is multi-threaded, making it trivial to scale automata computation across multiple cores. In this section, we evaluate VASim against the existing single-threaded tools.

2.1. Benchmarks

We consider six previously published automata applications as benchmarks for evaluating the performance of VASim against batchsim and apemulate:

- Brill Tagging [7], a widely used rule-based part-of-speech tagger.
- Hamming Distance [5], for calculating the number of differences between two strings. We evaluate 1000 length-8 automata with Hamming distance 3.
- Levenshtein Automata [6], for calculating the edit distance between two strings. We evaluate 80 length-31 Levenshtein automata with edit distance 3 on random nucleotide sequences.
- Protomata [5], a pattern-based approach for identifying protein motifs, built from the PROSITE database.
- Entity Resolution [3], for reconciling duplicate entries in databases.
- PowerEN Complex1 [2], a synthetic set of simple regular expressions.

2.2. Results

Table 1 shows the runtimes of VASim, batchsim, and apemulate over all benchmarks. All experiments were performed on a 6-core (12-thread) Intel i7-5820K running at 3.3 GHz with 32 GB of RAM, using Micron's AP SDK version. Each automaton was run in its original form, and then run again after applying optimizations (either VASim's left-minimization pass for VASim and batchsim, or apcompile's -O1 flag for apemulate). All tools were run with output reporting enabled, and runtime was measured using the Unix time command. Single-threaded VASim speedups are shown in brackets, and do not consider VASim's optimizations when applied to batchsim.

When left minimization is enabled in VASim, it is never less than 20x faster than batchsim and 11x faster than apemulate, and is often much faster. batchsim benefits greatly from VASim's optimizations, which improve its performance by 1.24x to 15.4x depending on the benchmark, and in the case of the Entity Resolution benchmark even prevent the tool from being killed. apemulate is generally faster than batchsim when optimizations are considered, but fails to place and route some automata (e.g. a Levenshtein automaton) that can otherwise be functionally simulated by VASim or batchsim.

Benchmark | Input (KB) | Opt? | VASim (s) | batchsim (s) | apemulate (s) | VASimMT (s)
Brill Tagging | 9,885 | no | | ,776.3 [26.9x] | 3,555 [29.2x] | 18.0
Brill Tagging | 9,885 | yes | | [1,388x] | [86.9x] | 0.8
Protomata | 13,210 | no | 1,481 | killed | 15,424 [21.9x] |
Protomata | 13,210 | yes | | killed | 4,366 [28.7x] | 24.3
Hamming | 100 | no | | [20.2x] | [11.5x] | 2.1
Hamming | 100 | yes | | [36.1x] | [14.2x] | 1.8
Levenshtein | 10,000 | no | | ,175 [36.1x] | 9,761 [34x] | 23.8
Levenshtein | 10,000 | yes | | ,402 [55.8x] | failed | 26.1
Entity Res. | 2,174 | no | | killed | 15,424 [134.8x] | 8.43
Entity Res. | 2,174 | yes | | [NA] | [207.6x] | 0.6
PowerEN | 20,000 | no | 1,017 | killed | 24,688 [24.3x] | 91.1
PowerEN | 20,000 | yes | | killed | 8,706 [15.3x] | 62.6

Table 1. Runtimes of each benchmark application when evaluated using VASim, batchsim, apemulate, and the best-performing multi-threaded VASim configuration (VASimMT).

3. Conclusions and Future Work

This paper presented VASim, an open, extensible, and fast virtual automata simulator for automata processing research. VASim is written in C++ using object-oriented design to make it easy to understand and extend. VASim allows users to implement automata transformation and optimization passes for increased performance. VASim is parametrically multi-threaded, and can automatically divide connected automata components and sections of the input stream among threads. VASim offers greatly improved performance and functionality over existing automata evaluation tools, and will be publicly released. Future work will include further performance enhancements and the implementation of other automata transformations.

We would like to thank the Center for Automata Processing and Micron Technology for their help in providing benchmarks for evaluation.
References

1. AP SDK Documentation.
2. K. Atasu, F. Doerfler, J. van Lunteren, and C. Hagleitner. Hardware-accelerated Regular Expression Matching with Overlap Handling on IBM PowerEN Processor. In Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on. IEEE, 2013.
3. C. Bo, K. Wang, J.J. Fox, and K. Skadron. Entity Resolution Acceleration using Micron's Automata Processor. Workshop on Architectures and Systems for Big Data (ASBD), in conjunction with ISCA.
4. P. Dlugosch, D. Brown, P. Glendenning, M. Leventhal, and H. Noyes. An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing. IEEE Transactions on Parallel and Distributed Systems, 99.
5. I. Roy. Algorithmic Techniques for the Micron Automata Processor. PhD thesis, Georgia Institute of Technology.
6. T. Tracy II, M. Stan, N. Brunelle, J. Wadden, K. Wang, K. Skadron, and G. Robins. Nondeterministic Finite Automata in Hardware - the Case of the Levenshtein Automaton. Workshop on Architectures and Systems for Big Data (ASBD), in conjunction with ISCA.
7. K. Zhou, J.J. Fox, K. Wang, D.E. Brown, and K. Skadron. Brill Tagging on the Micron Automata Processor. In Semantic Computing (ICSC), 2015 IEEE International Conference on, Feb 2015.
Equivalence of NTMs and TMs What is a Turing Machine? Similar to a finite automaton, but with unlimited and unrestricted memory. It uses an infinitely long tape as its memory which can be read from and
More informationCS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer
CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer Assigned: Thursday, September 16, 2004 Due: Tuesday, September 28, 2004, at 11:59pm September 16, 2004 1 Introduction Overview In this
More informationOptimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems
Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems Alexey Paznikov Saint Petersburg Electrotechnical University
More informationDetecting Manipulated Remote Call Streams
Detecting Manipulated Remote Call Streams Jonathon Giffin, Somesh Jha, Barton Miller Computer Sciences Department University of Wisconsin giffin@cs.wisc.edu Intrusion Detection and Specification-Based
More informationLexical Scanning COMP360
Lexical Scanning COMP360 Captain, we re being scanned. Spock Reading Read sections 2.1 3.2 in the textbook Regular Expression and FSA Assignment A new assignment has been posted on Blackboard It is due
More informationescience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows
escience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows Jie Li1, Deb Agarwal2, Azure Marty Platform Humphrey1, Keith Jackson2, Catharine van Ingen3, Youngryel Ryu4
More informationNeha 1, Abhishek Sharma 2 1 M.Tech, 2 Assistant Professor. Department of Cse, Shri Balwant College of Engineering &Technology, Dcrust University
Methods of Regular Expression Neha 1, Abhishek Sharma 2 1 M.Tech, 2 Assistant Professor Department of Cse, Shri Balwant College of Engineering &Technology, Dcrust University Abstract - Regular expressions
More informationCS Lecture 2. The Front End. Lecture 2 Lexical Analysis
CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture
More informationAN 831: Intel FPGA SDK for OpenCL
AN 831: Intel FPGA SDK for OpenCL Host Pipelined Multithread Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1 Intel FPGA SDK for OpenCL Host Pipelined Multithread...3 1.1
More informationCS 426 Parallel Computing. Parallel Computing Platforms
CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:
More informationLECTURE NOTES ON COMPILER DESIGN P a g e 2
LECTURE NOTES ON COMPILER DESIGN P a g e 1 (PCCS4305) COMPILER DESIGN KISHORE KUMAR SAHU SR. LECTURER, DEPARTMENT OF INFORMATION TECHNOLOGY ROLAND INSTITUTE OF TECHNOLOGY, BERHAMPUR LECTURE NOTES ON COMPILER
More informationCompiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore
Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No. # 01 Lecture No. # 01 An Overview of a Compiler This is a lecture about
More informationCS415 Compilers. Lexical Analysis
CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework
More informationNordiaSoft SCA Architect 2016
SCA Architect NordiaSoft SCA Architect is the modeling tool used by developers to compose and assemble software components into applications. Based on a Model-Driven Development (MDD) concept, SCA Architect
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY Computer Systems Engineering: Spring Quiz I Solutions
Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.033 Computer Systems Engineering: Spring 2011 Quiz I Solutions There are 10 questions and 12 pages in this
More informationTeaching and Training Formal Methods for Safety Critical Systems
Teaching and Training Formal Methods for Safety Critical Systems Michael Lipaczewski and Frank Ortmeier Computer Systems in Engineering Otto-von-Guericke University Magdeburg {michael.lipaczewski,frank.ortmeier}@ovgu.de
More informationChapter 12. CPU Structure and Function. Yonsei University
Chapter 12 CPU Structure and Function Contents Processor organization Register organization Instruction cycle Instruction pipelining The Pentium processor The PowerPC processor 12-2 CPU Structures Processor
More informationCompiler phases. Non-tokens
Compiler phases Compiler Construction Scanning Lexical Analysis source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2011 01 21 parse tree
More informationLDetector: A low overhead data race detector for GPU programs
LDetector: A low overhead data race detector for GPU programs 1 PENGCHENG LI CHEN DING XIAOYU HU TOLGA SOYATA UNIVERSITY OF ROCHESTER 1 Data races in GPU Introduction & Contribution Impact correctness
More informationCellSs Making it easier to program the Cell Broadband Engine processor
Perez, Bellens, Badia, and Labarta CellSs Making it easier to program the Cell Broadband Engine processor Presented by: Mujahed Eleyat Outline Motivation Architecture of the cell processor Challenges of
More information9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation
Language Implementation Methods The Design and Implementation of Programming Languages Compilation Interpretation Hybrid In Text: Chapter 1 2 Compilation Interpretation Translate high-level programs to
More informationBERKELEY PAR LAB. RAMP Gold Wrap. Krste Asanovic. RAMP Wrap Stanford, CA August 25, 2010
RAMP Gold Wrap Krste Asanovic RAMP Wrap Stanford, CA August 25, 2010 RAMP Gold Team Graduate Students Zhangxi Tan Andrew Waterman Rimas Avizienis Yunsup Lee Henry Cook Sarah Bird Faculty Krste Asanovic
More informationAction Language Verifier, Extended
Action Language Verifier, Extended Tuba Yavuz-Kahveci 1, Constantinos Bartzis 2, and Tevfik Bultan 3 1 University of Florida 2 Carnegie Mellon University 3 UC, Santa Barbara 1 Introduction Action Language
More informationWhite Paper. The Benefits of Object-Based Architectures for SCADA and Supervisory Systems. What s Inside:
White Paper The Benefits of Object-Based Architectures for SCADA and Supervisory Systems Author: Steven D. Garbrecht, Vice President of Software and Advanced Applications Marketing, Invensys Operations
More informationFahad Zafar, Dibyajyoti Ghosh, Lawrence Sebald, Shujia Zhou. University of Maryland Baltimore County
Accelerating a climate physics model with OpenCL Fahad Zafar, Dibyajyoti Ghosh, Lawrence Sebald, Shujia Zhou University of Maryland Baltimore County Introduction The demand to increase forecast predictability
More informationSimulation of Timed Input/Output Automata
Simulation of Timed Input/Output Automata M.Eng Thesis Proposal Panayiotis P. Mavrommatis December 13, 2005 Abstract This proposal describes the design of the TIOA Simulator, a vital component of the TIOA
More informationFractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures
Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures University of Virginia Dept. of Computer Science Technical Report #CS-2011-09 Jeremy W. Sheaffer and Kevin
More informationHigh Performance Computing on GPUs using NVIDIA CUDA
High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and
More informationApproximate Search and Data Reduction Algorithms
Approximate Search and Data Reduction Algorithms Research Questions Kyle Porter NTNU Gjøvik Outline of Presentation Introduction: Problems General Goals Research Questions Brief theoretical/practical background
More informationHigh performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli
High performance 2D Discrete Fourier Transform on Heterogeneous Platforms Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli Motivation Fourier Transform widely used in Physics, Astronomy, Engineering
More informationSystem Call. Preview. System Call. System Call. System Call 9/7/2018
Preview Operating System Structure Monolithic Layered System Microkernel Virtual Machine Process Management Process Models Process Creation Process Termination Process State Process Implementation Operating
More informationOutline. Threads. Single and Multithreaded Processes. Benefits of Threads. Eike Ritter 1. Modified: October 16, 2012
Eike Ritter 1 Modified: October 16, 2012 Lecture 8: Operating Systems with C/C++ School of Computer Science, University of Birmingham, UK 1 Based on material by Matt Smart and Nick Blundell Outline 1 Concurrent
More informationCMSC 350: COMPILER DESIGN
Lecture 11 CMSC 350: COMPILER DESIGN see HW3 LLVMLITE SPECIFICATION Eisenberg CMSC 350: Compilers 2 Discussion: Defining a Language Premise: programming languages are purely formal objects We (as language
More informationMulti-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation
Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M
More informationPerformance analysis basics
Performance analysis basics Christian Iwainsky Iwainsky@rz.rwth-aachen.de 25.3.2010 1 Overview 1. Motivation 2. Performance analysis basics 3. Measurement Techniques 2 Why bother with performance analysis
More informationAgenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2
Lecture 3: Processes Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Process in General 3.3 Process Concept Process is an active program in execution; process
More informationIntegrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim
Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim Farzad Farshchi, Qijing Huang, Heechul Yun University of Kansas, University of California, Berkeley SiFive Internship Rocket
More information