VASim: An Open Virtual Automata Simulator for Automata Processing Research


University of Virginia Technical Report #CS

J. Wadden, K. Skadron
University of Virginia, Dept. of Computer Science: {wadden,skadron}@virginia.edu

We present VASim, an open, extensible virtual automata simulator for automata processing research. VASim is open source and easily extensible. VASim allows users to implement automata transformation and optimization passes for increased performance. VASim is parametrically multithreaded and can automatically divide connected automata components and sections of the input stream among parallel threads. We compare the performance of VASim against two closed-source automata processing tools and show speedups of 10s to 1000s of times.

Keywords: automata processing, non-deterministic finite automata, parallel processing, simulation, benchmarking.

Introduction

Micron's Automata Processor (AP) is a memory-based architecture purpose-built to accelerate computation of non-deterministic finite automata (NFAs) [4]. Automata developers who want to evaluate, prototype, and debug potential applications are limited to two software tools provided by Micron: batchsim and apemulate [1]. Each tool is designed to functionally simulate the behavior of the hardware, assisting automata developers in prototyping and evaluating automata-based applications.

The first software tool, batchsim, takes an automaton (defined in Micron's Automata Network Markup Language, or ANML [1]) and an input file, and simulates the functionality of the AP by running the automaton on the input file. batchsim has three main drawbacks: 1) it is closed source, meaning that existing critical bugs are fixed slowly, and feature requests, simulation enhancements, or experimental extensions to the computational model are limited by Micron's engineering resources and interests; 2) it is slow, often taking hours to simulate complex automata with reasonably sized input, and consumes massive amounts of memory, leading to killed processes if run for longer than a few hours; and 3) it is tied to Micron's SDK, which makes the tool more cumbersome to acquire, install, and use.

The second tool, apemulate, is functionally very similar to batchsim, but uses a hardware-accurate emulation of an AP, rather than a functional simulation, to execute automata. An abstract automaton must first be compiled from an ANML description by Micron's apcompile tool and placed and routed into a hardware-specific finite state machine. Otherwise, apemulate is functionally identical to batchsim, and can simulate compiled finite state machines on input byte streams, emulating the AP hardware. apemulate is usually much faster than batchsim, due to automata optimizations made during the compilation step, and is thus much more usable as a prototyping tool. However, because automata must first be successfully compiled, placed, and routed using Micron's apcompile tool, evaluation of potential applications on the AP using apemulate is limited to those that can be placed and routed by Micron's compiler. Hypothetical automata with interesting additional features, or automata that barely violate the AP's architectural restrictions, cannot be evaluated at all. Like batchsim, apemulate is closed source and tied to Micron's SDK, and suffers from the same associated drawbacks.

To address the above issues with existing tools, we present the Virtual Automata Simulator, or VASim. VASim is an open source automata simulation, optimization, and transformation platform designed for easy prototyping, debugging, research extensions, and evaluation of automata-based applications. VASim was constructed using object-oriented design in modern C++ and is designed from the ground up to be easy to understand, maintain, and extend. While performance was not the main driving consideration during development, VASim is fast: in our evaluation it is currently faster than batchsim and 2-9 times faster than apemulate. When VASim applies performance optimizations to the automata topology, it outperforms both batchsim and apemulate in single-threaded computation. Furthermore, VASim is parametrically multi-threaded in two dimensions of parallelism common to automata-based applications, further increasing its performance and usefulness over batchsim and apemulate.

This paper first presents the architecture of VASim and its design considerations. We then evaluate six applications previously published in the literature using VASim and compare its single-threaded and multi-threaded performance against batchsim and apemulate.

1. VASim Architecture and Features

1.1. Object-oriented Design

VASim uses object-oriented design principles throughout its architecture to ensure that code is easy to understand and reuse, and that features are easy to create or extend. Object-oriented design also allows for easier bug isolation and performance profiling, which was critical in the development of the tool and will allow easy modifications and enhancements.

VASim is responsible for one or more Automata objects, defined initially by one or more ANML files, although automata can be constructed within VASim programmatically if desired. Each Automata object contains a directed graph of automata Elements. Automata Elements are divided into two main classes. The STE class represents the functionality of an AP state transition element (STE) [4]. The SpecialElement class represents any other type of automata logic or functionality, e.g. counters and boolean elements as available in the first-generation AP micro-architecture [4].
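The class split described above can be illustrated with a small C++ sketch. This is not VASim's actual source: the member names (`addOutput`, `matches`, `calculate`), the `AndGate` example, and the `Automata::add` helper are all hypothetical, chosen only to show how an Automata object might own a directed graph of Elements divided into STE and SpecialElement classes.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: every node of the automata graph is an Element.
class Element {
public:
    explicit Element(std::string id) : id_(std::move(id)) {}
    virtual ~Element() = default;
    virtual bool isSpecial() const = 0;
    const std::string& id() const { return id_; }
    // Adds a directed edge from this Element to `child`.
    void addOutput(Element* child) {
        assert(child != nullptr);
        outputs_.push_back(child);
    }
    const std::vector<Element*>& outputs() const { return outputs_; }
private:
    std::string id_;
    std::vector<Element*> outputs_;  // non-owning; the Automata owns all nodes
};

// STE: matches the current input byte against a 256-entry symbol set.
class STE : public Element {
public:
    STE(std::string id, const std::string& symbols) : Element(std::move(id)) {
        for (unsigned char c : symbols) symbolSet_.set(c);
    }
    bool isSpecial() const override { return false; }
    bool matches(unsigned char symbol) const { return symbolSet_.test(symbol); }
private:
    std::bitset<256> symbolSet_;
};

// SpecialElement: any non-STE logic (counters, boolean gates, ...).
class SpecialElement : public Element {
public:
    explicit SpecialElement(std::string id) : Element(std::move(id)) {}
    bool isSpecial() const override { return true; }
    virtual bool calculate(const std::vector<bool>& inputs) const = 0;
};

// One illustrative SpecialElement: an AND gate over its input signals.
class AndGate : public SpecialElement {
public:
    explicit AndGate(std::string id) : SpecialElement(std::move(id)) {}
    bool calculate(const std::vector<bool>& inputs) const override {
        if (inputs.empty()) return false;
        for (bool v : inputs) {
            if (!v) return false;
        }
        return true;
    }
};

// Automata: owns the Elements that make up one automata graph.
class Automata {
public:
    template <class T, class... Args>
    T* add(Args&&... args) {
        auto element = std::make_unique<T>(std::forward<Args>(args)...);
        T* raw = element.get();
        elements_.push_back(std::move(element));
        return raw;
    }
    std::size_t size() const { return elements_.size(); }
private:
    std::vector<std::unique_ptr<Element>> elements_;
};
```

Under these assumptions, a new element type would be added by deriving from `SpecialElement` and overriding `calculate`, without touching the STE code path.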
Each Element can be enabled by incoming signals from other Elements in the graph. If enabled, an Element computes a boolean function (e.g. matching for STEs, or boolean logic for logic gates). If the result of that computation is true, the Element activates and enables its immediate successors in the directed automata graph. While VASim is designed to simulate Micron's AP, hypothetical new enhancements to Micron's architecture can be easily inserted into VASim, debugged, and evaluated. Furthermore, new automata models can be added, increasing the available computational models for any application.

1.2. ANML Parser and Optimizations

VASim uses Micron's Automata Network Markup Language (ANML) [1] as the main automata input language. ANML was chosen because it is the automata definition language used for prototyping and evaluation in Micron's software and hardware environment, but VASim could potentially process any arbitrary automata format, or programmatically construct automata internally. Once an automaton has been parsed into an Automata object data structure, VASim allows the programmer to implement optimization and transformation passes over the graph.

One example optimization already implemented in both VASim and Micron's apcompile ANML compiler tool is left minimization. Left minimization does a breadth-first search from an NFA's start states, combining functionally identical elements and thereby reducing the size of the automaton. By eliminating redundant states, left minimization is essentially an automata compression that increases the space efficiency of automata on spatial automata processors (like the AP). By eliminating redundant state transitions, left minimization also greatly increases the performance of automata processing on von Neumann-based architectures for highly compressible automata. This feature is available to apemulate via Micron's compiler optimizations, but is not available to batchsim. The benefits of left minimization are evaluated in Section 2.

1.3. Multi-dimensional Parameterized Multi-threading

Automata processing can be parallelized in two dimensions: 1) automata can be partitioned (e.g. into independent connected components of the automata graph) into parallel pieces and divided among threads, or 2) the input stream of bytes can be partitioned (if allowable by the application) and offered to threads processing identical automata. VASim offers zero-effort multi-threading, allowing the programmer to define how many threads should be devoted to each dimension of parallel automata computation, furthering its usefulness over batchsim and apemulate.

1.4. VASim Core Simulation Architecture

The core architecture of VASim is a loop that processes input symbols and keeps track of activity in the automata. Each loop iteration (symbol cycle) is divided into five main stages:

1) The first stage enables all start STE Elements in the graph, initializing computation. If an Element is enabled, it is added to a special EnabledSTEs queue.
2) The second stage reads all enabled STEs off of the EnabledSTEs queue and computes the match function of each. If an STE matches, it is pushed to an ActivatedSTEs queue.
3) The third stage reads all activated STEs from the ActivatedSTEs queue and propagates their output signals to all child elements. Each child element is added to one of two queues: EnabledSTEs if the child is an STE, or EnabledSpecialElements if the child is a SpecialElement.
4) Because the AP accomplishes logic computation immediately after STE computation within a single symbol cycle, stage four reads SpecialElements from the EnabledSpecialElements queue and computes the particular SpecialElement boolean functions. SpecialElements that compute true are pushed to a special ActivatedSpecialElements queue.
5) Similar to stage three, stage five pushes the children of all SpecialElements in the ActivatedSpecialElements queue to their respective enable queues.

These queues allow VASim to only ever operate on Elements in the graph that require computation, increasing efficiency over naive implementations that might consider all states on every cycle. Furthermore, the execution core can be easily modified or extended by modifying or adding one or more stages in the main simulation pipeline.

2. Evaluation

While speed of automata processing was not considered a main design constraint in the original VASim prototype, VASim is fast, and uses efficient data structures and algorithms to reduce the amount and expense of all automata computation. Furthermore, VASim is multithreaded, making it trivial to scale automata computation across multiple cores. In this section, we evaluate VASim against existing, single-threaded tools.

2.1. Benchmarks

We consider six previously published automata applications as benchmarks for evaluating the performance of VASim against batchsim and apemulate:

Brill Tagging [7], a widely used rule-based part-of-speech tagger.
Hamming Distance [5], for calculating the number of differences between two strings. We evaluate 1000 Hamming distance 3 automata of length 8.
Levenshtein Automata [6], for calculating the edit distance between two strings. We evaluate 80 edit distance 3 Levenshtein automata of length 31 for random nucleotide sequences.
Protomata [5], a pattern-based approach for identifying protein motifs built from the PROSITE database.
Entity Resolution [3], for reconciling duplicate entries in databases.
PowerEN Complex1 [2], a synthetic set of simple regular expressions.

2.2. Results

Table 1 shows runtimes of VASim, batchsim, and apemulate over all benchmarks. All experiments were performed on a 6-core (12-thread) Intel i7-5820K running at 3.3GHz with 32GB of RAM, using Micron's AP SDK. Each automaton was run in its original form, and then run after applying optimizations (either VASim's left-minimization pass for VASim and batchsim, or apcompile's -O1 flag for apemulate). All tools were run with output reporting enabled, and runtime was measured using the Unix time command. Single-threaded VASim speedups are highlighted in brackets, and do not consider VASim's optimizations when applied to batchsim.

When left minimization is enabled in VASim, it is never less than 20x faster than batchsim and 11x faster than apemulate, and is often much faster. batchsim benefits greatly from VASim's optimizations, which improve its performance by 1.24x to 15.4x depending on the benchmark; in the case of the Entity Resolution benchmark, they even prevent the tool from being killed. apemulate is generally faster than batchsim when optimizations are considered, but fails to place and route some automata (e.g. a Levenshtein automaton) that can otherwise be functionally simulated by VASim or batchsim.

Benchmark Input(KB) Opt? VASim(s) batchsim(s) apemulate(s) VASimMT(s)
Brill Tagging 9, ,776.3 [26.9x] 3,555 [29.2x] 18.0
Brill Tagging 9,885 D [1,388x] [86.9x] 0.8
Protomata 13,210 1,481 killed 15,424 [21.9x]
Protomata 13,210 D killed 4,366 [28.7x] 24.3
Hamming [20.2x] [11.5x] 2.1
Hamming 100 D [36.1x] [14.2x] 1.8
Levenshtein 10, ,175 [36.1x] 9,761 [34x] 23.8
Levenshtein 10,000 D ,402 [55.8x] failed 26.1
Entity Res. 2, killed 15,424 [134.8x] 8.43
Entity Res. 2,174 D [NA] [207.6x] 0.6
PowerEN 20,000 1,017 killed 24,688 [24.3x] 91.1
PowerEN 20,000 D killed 8,706 [15.3x] 62.6

Table 1. Runtimes of each benchmark application when evaluated using VASim, batchsim, apemulate, and the best-performing multi-threaded VASim.

3. Conclusions and Future Work

This paper presented VASim, an open, extensible, and fast virtual automata simulator for automata processing research. VASim is written in C++ using object-oriented design to make it easy to understand and extend. VASim allows users to implement automata transformation and optimization passes for increased performance. VASim is parametrically multi-threaded and can automatically divide connected automata components and sections of the input stream among threads. VASim offers greatly improved performance and functionality over existing automata evaluation tools, and will be publicly released. Future work will include further performance enhancements and implementation of other automata transformations.

We would like to thank the Center for Automata Processing and Micron Technology for their help in providing benchmarks for evaluation.
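As an illustration of the five-stage symbol cycle described under the VASim Core Simulation Architecture heading, the following C++ sketch runs a tiny automaton over an input string. It is a simplified model, not VASim's code: the `Element` struct, the helper names, the choice of a counter as the only SpecialElement, and the assumption that start STEs are re-enabled on every symbol are all hypothetical. A SpecialElement enabled by another SpecialElement is deferred to the next cycle here for brevity.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat element record: an STE, or a counter standing in for SpecialElements.
struct Element {
    bool special = false;       // false: STE, true: counter
    bool start = false;         // assumed to be enabled on every symbol cycle
    bool report = false;
    std::bitset<256> symbols;   // STE match set
    int target = 0;             // counter threshold
    int count = 0;              // counter state
    std::vector<int> children;  // indices of successor elements
    bool enabled = false;       // prevents queueing an element twice
};

using Report = std::pair<std::size_t, int>;  // (symbol cycle, element index)

Element makeSTE(const std::string& symbols, bool start, bool report) {
    Element e;
    e.start = start;
    e.report = report;
    for (unsigned char c : symbols) e.symbols.set(c);
    return e;
}

Element makeCounter(int target, bool report) {
    Element e;
    e.special = true;
    e.target = target;
    e.report = report;
    return e;
}

std::vector<Report> simulate(std::vector<Element>& g, const std::string& input) {
    std::vector<int> enabledSTEs, activatedSTEs, enabledSpecials, activatedSpecials;
    std::vector<int> starts;
    std::vector<Report> reports;
    for (int id = 0; id < static_cast<int>(g.size()); ++id) {
        if (g[id].start && !g[id].special) starts.push_back(id);
    }
    auto enable = [&](int id) {
        assert(id >= 0 && id < static_cast<int>(g.size()));
        Element& e = g[id];
        if (e.enabled) return;
        e.enabled = true;
        (e.special ? enabledSpecials : enabledSTEs).push_back(id);
    };
    for (std::size_t cycle = 0; cycle < input.size(); ++cycle) {
        const unsigned char symbol = static_cast<unsigned char>(input[cycle]);
        // Stage 1: enable all start STEs.
        for (int id : starts) enable(id);
        // Stage 2: compute the match function of every enabled STE.
        for (int id : enabledSTEs) {
            g[id].enabled = false;
            if (g[id].symbols.test(symbol)) activatedSTEs.push_back(id);
        }
        enabledSTEs.clear();
        // Stage 3: propagate activated STEs to their children.
        for (int id : activatedSTEs) {
            if (g[id].report) reports.emplace_back(cycle, id);
            for (int child : g[id].children) enable(child);
        }
        activatedSTEs.clear();
        // Stage 4: compute enabled SpecialElements within the same cycle.
        for (int id : enabledSpecials) {
            g[id].enabled = false;
            if (++g[id].count >= g[id].target) activatedSpecials.push_back(id);
        }
        enabledSpecials.clear();
        // Stage 5: propagate activated SpecialElements to their children.
        for (int id : activatedSpecials) {
            if (g[id].report) reports.emplace_back(cycle, id);
            for (int child : g[id].children) enable(child);
        }
        activatedSpecials.clear();
    }
    return reports;
}

// Pattern "ab" with a counter that fires on the second match.
std::vector<Report> demo() {
    std::vector<Element> g;
    g.push_back(makeSTE("a", true, false));   // element 0
    g.push_back(makeSTE("b", false, true));   // element 1
    g.push_back(makeCounter(2, true));        // element 2
    g[0].children = {1};
    g[1].children = {2};
    return simulate(g, "xabab");
}
```

In this sketch only queued elements are touched in each stage, which is the property the queue-based design is meant to provide: work per symbol scales with the number of enabled elements rather than with the size of the graph.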

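The left-minimization pass described in the ANML Parser and Optimizations section can be sketched in a similar way. The version below is a deliberately simplified variant and an assumption on our part, not VASim's or apcompile's implementation: instead of a breadth-first search it repeats a whole-graph pass until nothing changes, it treats two states as identical when they have the same symbol string, start flag, self-loop status, and predecessor set, and it never merges reporting states. The `State` struct and all function names are hypothetical, and the structured binding requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical NFA state for this sketch; `symbols` is assumed to be normalized.
struct State {
    std::string symbols;
    bool start = false;
    bool report = false;
    std::set<int> in, out;  // predecessor and successor indices
    bool alive = true;
};

void addEdge(std::vector<State>& g, int from, int to) {
    g[from].out.insert(to);
    g[to].in.insert(from);
}

// Folds `dup` into `keep`. Both share the same predecessors, so only the
// outgoing edges of `dup` have to be moved.
void mergeInto(std::vector<State>& g, int keep, int dup) {
    assert(keep != dup);
    const std::set<int> children = g[dup].out;  // copies: the sets change below
    const std::set<int> parents = g[dup].in;
    for (int c : children) {
        g[c].in.erase(dup);
        if (c == dup) continue;  // self-loop: `keep` already has its own
        addEdge(g, keep, c);
    }
    for (int p : parents) g[p].out.erase(dup);
    g[dup].in.clear();
    g[dup].out.clear();
    g[dup].alive = false;
}

// Returns the number of states removed.
std::size_t leftMinimize(std::vector<State>& g) {
    using Key = std::tuple<std::string, bool, bool, std::set<int>>;
    std::size_t merged = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        std::map<Key, int> seen;
        for (int id = 0; id < static_cast<int>(g.size()); ++id) {
            if (!g[id].alive || g[id].report) continue;
            std::set<int> preds = g[id].in;
            const bool selfLoop = preds.erase(id) > 0;
            Key key{g[id].symbols, g[id].start, selfLoop, preds};
            auto [it, inserted] = seen.emplace(std::move(key), id);
            if (!inserted) {
                mergeInto(g, it->second, id);
                ++merged;
                changed = true;
            }
        }
    }
    return merged;
}

// Two chains, "abc" and "abd", share the prefix "ab".
std::size_t demoAliveStates() {
    std::vector<State> g(6);
    const char* syms[] = {"a", "b", "c", "a", "b", "d"};
    for (int i = 0; i < 6; ++i) g[i].symbols = syms[i];
    g[0].start = g[3].start = true;
    g[2].report = g[5].report = true;
    addEdge(g, 0, 1);
    addEdge(g, 1, 2);
    addEdge(g, 3, 4);
    addEdge(g, 4, 5);
    leftMinimize(g);
    std::size_t alive = 0;
    for (const State& s : g) alive += s.alive ? 1 : 0;
    return alive;
}
```

For the two chains in `demoAliveStates`, the duplicated "a" and "b" states are folded together, so six states shrink to four while both reporting states are preserved.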
References

1. AP SDK Documentation.
2. K. Atasu, F. Doerfler, J. van Lunteren, and C. Hagleitner. Hardware-accelerated Regular Expression Matching with Overlap Handling on IBM PowerEN Processor. In Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on. IEEE.
3. C. Bo, K. Wang, J.J. Fox, and K. Skadron. Entity Resolution Acceleration using Micron's Automata Processor. Workshop on Architectures and Systems for Big Data (ASBD), in conjunction with ISCA.
4. P. Dlugosch, D. Brown, P. Glendenning, M. Leventhal, and H. Noyes. An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing. IEEE Transactions on Parallel and Distributed Systems, 99.
5. I. Roy. Algorithmic Techniques for the Micron Automata Processor. PhD thesis, Georgia Institute of Technology.
6. T. Tracy II, M. Stan, N. Brunelle, J. Wadden, K. Wang, K. Skadron, and G. Robins. Nondeterministic Finite Automata in Hardware - the Case of the Levenshtein Automaton. Workshop on Architectures and Systems for Big Data (ASBD), in conjunction with ISCA.
7. K. Zhou, J.J. Fox, K. Wang, D.E. Brown, and K. Skadron. Brill Tagging on the Micron Automata Processor. In Semantic Computing (ICSC), 2015 IEEE International Conference on, Feb.

AN OVERVIEW OF MICRON S

AN OVERVIEW OF MICRON S AN OVERVIEW OF MICRON S 1 Ke Wang, 1 Kevin Angstadt, 1 Chunkun Bo, 1 Nathan Brunelle, 1 Elaheh Sadredini, 2 Tommy Tracy II, 1 Jack Wadden, 2 Mircea Stan, 1 Kevin Skadron Center for Automata Computing 1

More information

Automata-to-Routing: An Open-Source Toolchain for Design-Space Exploration of Spatial Automata Processing Architectures

Automata-to-Routing: An Open-Source Toolchain for Design-Space Exploration of Spatial Automata Processing Architectures Automata-to-Routing: An Open-Source Toolchain for Design-Space Exploration of Spatial Automata Processing Architectures Jack Wadden, Samira Khan, and Kevin Skadron University of Virginia Charlottesville,

More information

ANMLZoo: A Benchmark Suite for Exploring Bottlenecks in Automata Processing Engines and Architectures

ANMLZoo: A Benchmark Suite for Exploring Bottlenecks in Automata Processing Engines and Architectures ANMLZoo: A Benchmark Suite for Exploring Bottlenecks in Automata Processing Engines and Architectures Jack Wadden, Vinh Dang, Nathan Brunelle, Tommy Tracy II, Deyuan Guo, Elaheh Sadredini, Ke Wang, Chunkun

More information

Automata Computing. Mircea R. Stan UVA-ECE Center for Automata Processing (CAP) Co-Director Kevin Skadron, UVA-CS, CAP Director

Automata Computing. Mircea R. Stan UVA-ECE Center for Automata Processing (CAP) Co-Director Kevin Skadron, UVA-CS, CAP Director Automata Computing Mircea R. Stan (mircea@virginia.edu), UVA-ECE Center for Automata Processing (CAP) Co-Director Kevin Skadron, UVA-CS, CAP Director 2013 Micron Technology, Inc. All rights reserved. Products

More information

Entity Resolution Acceleration using the Automata Processor

Entity Resolution Acceleration using the Automata Processor Entity Resolution Acceleration using the Automata Processor Chunkun Bo 1, Ke Wang 1, Jeffrey J. Fox 2, Kevin Skadron 1 1 Department of Computer Science, 2 Department of Material Science University of Virginia

More information

STRING KERNEL TESTING ACCELERATION USING MICRON S AUTOMATA PROCESSOR

STRING KERNEL TESTING ACCELERATION USING MICRON S AUTOMATA PROCESSOR STRING KERNEL TESTING ACCELERATION USING MICRON S AUTOMATA PROCESSOR Chunkun Bo 1,2, Ke Wang 1,2, Yanjun (Jane) Qi 1, Kevin Skadron 1,2 1 Department of Computer Science 2 Center for Automata Processing

More information

RAPID Programming of Pattern-Recognition Processors

RAPID Programming of Pattern-Recognition Processors RAPID Programming of Pattern-Recognition Processors Kevin Angstadt Westley Weimer Kevin Skadron Department of Computer Science University of Virginia {angstadt, weimer, skadron}@cs.virginia.edu 6. April

More information

Entity Resolution Acceleration using Micron s Automata Processor

Entity Resolution Acceleration using Micron s Automata Processor Entity Resolution Acceleration using Micron s Automata Processor Chunkun Bo 1, Ke Wang 1, Jeffrey J. Fox 2, and Kevin Skadron 1 1 Department of Computer Science 2 Department of Materials Science and Engineering

More information

REAPR: Reconfigurable Engine for Automata Processing

REAPR: Reconfigurable Engine for Automata Processing REAPR: Reconfigurable Engine for Automata Processing Ted Xie, Vinh Dang 2, Jack Wadden 2, Kevin Skadron 2, Mircea Stan Department of Electrical and Computer Engineering, University of Virginia 2 Department

More information

Cellular Automata on the Micron Automata Processor

Cellular Automata on the Micron Automata Processor Cellular Automata on the Micron Automata Processor Ke Wang Department of Computer Science University of Virginia Charlottesville, VA kewang@virginia.edu Kevin Skadron Department of Computer Science University

More information

Accelerating Pattern Searches with Hardware. Kevin Angstadt CS 6354: Graduate Architecture 7. April 2016

Accelerating Pattern Searches with Hardware. Kevin Angstadt CS 6354: Graduate Architecture 7. April 2016 Accelerating Pattern Searches with Hardware Kevin Angstadt angstadt@virginia.edu CS 6354: raduate Architecture 7. April 2016 Who am I? Work with Wes Weimer and Kevin Skadron PL + Architecture Programming

More information

Hardware-accelerated regular expression matching with overlap handling on IBM PowerEN processor

Hardware-accelerated regular expression matching with overlap handling on IBM PowerEN processor Kubilay Atasu IBM Research Zurich 23 May 2013 Hardware-accelerated regular expression matching with overlap handling on IBM PowerEN processor Kubilay Atasu, Florian Doerfler, Jan van Lunteren, and Christoph

More information

Yu, Jintao; Du Nguyen, Hoang Anh; Abu Lebdeh, Muath; Taouil, Mottaqiallah; Hamdioui, Said

Yu, Jintao; Du Nguyen, Hoang Anh; Abu Lebdeh, Muath; Taouil, Mottaqiallah; Hamdioui, Said Delft University of Technology Time-division Multiplexing Automata Processor Yu, Jintao; Du Nguyen, Hoang Anh; Abu Lebdeh, Muath; Taouil, Mottaqiallah; Hamdioui, Said Publication date 2019 Document Version

More information

PAPER Accelerating Weeder: A DNA Motif Search Tool using the Micron Automata Processor and FPGA

PAPER Accelerating Weeder: A DNA Motif Search Tool using the Micron Automata Processor and FPGA IEICE TRANS. INF. & SYST., VOL.Exx??, NO.xx XXXX 200x 1 PAPER Accelerating Weeder: A DNA Motif Search Tool using the Micron Automata Processor and FPGA Qiong WANG a), Member, Mohamed EL-HADEDY b), Kevin

More information

RAPID Programming of Pattern-Recognition Processors

RAPID Programming of Pattern-Recognition Processors RAPID Programming of Pattern-Recognition Processors Kevin Angstadt Department of Computer Science University of Virginia Charlottesville, VA 22904-4740 angstadt@cs.virginia.edu Abstract We present RAPID,

More information

PL in the Broader Research Community

PL in the Broader Research Community PL in the Broader Research Community EECS 590: Advanced Programming Languages 27. November 2017 Kevin Angstadt angstadt@umich.edu 1 Who am I? Fourth-year PhD student (I did my first three years at UVA)

More information

Advantages and challenges of programming the Micron Automata Processor

Advantages and challenges of programming the Micron Automata Processor Graduate Theses and Dissertations Graduate College 2013 Advantages and challenges of programming the Micron Automata Processor Christopher Sabotta Iowa State University Follow this and additional works

More information

ALGORITHMIC TECHNIQUES FOR THE MICRON AUTOMATA PROCESSOR

ALGORITHMIC TECHNIQUES FOR THE MICRON AUTOMATA PROCESSOR ALGORITHMIC TECHNIQUES FOR THE MICRON AUTOMATA PROCESSOR A Thesis Presented to The Academic Faculty by Indranil Roy In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the

More information

Appears in IJPP, first published online Jan The authoritative version appears at: DOI: /s y

Appears in IJPP, first published online Jan The authoritative version appears at: DOI: /s y Noname manuscript No. (will be inserted by the editor) Appears in IJPP, first published online Jan. 2017 The authoritative version appears at: DOI:10.1007/s10766-017-0489-y Hierarchical Pattern Mining

More information

Developing a Data Driven System for Computational Neuroscience

Developing a Data Driven System for Computational Neuroscience Developing a Data Driven System for Computational Neuroscience Ross Snider and Yongming Zhu Montana State University, Bozeman MT 59717, USA Abstract. A data driven system implies the need to integrate

More information

Programmable Near-Memory Acceleration on ConTutto

Programmable Near-Memory Acceleration on ConTutto Programmable Near- Acceleration on ConTutto Jan van Lunteren, IBM Research Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit IBM Zurich (CH) Team Jan van Lunteren, Christoph Hagleitner

More information

Efficient Data Structures for Tamper-Evident Logging

Efficient Data Structures for Tamper-Evident Logging Efficient Data Structures for Tamper-Evident Logging Scott A. Crosby Dan S. Wallach Rice University Everyone has logs Tamper evident solutions Current commercial solutions Write only hardware appliances

More information

Automatic Scaling Iterative Computations. Aug. 7 th, 2012

Automatic Scaling Iterative Computations. Aug. 7 th, 2012 Automatic Scaling Iterative Computations Guozhang Wang Cornell University Aug. 7 th, 2012 1 What are Non-Iterative Computations? Non-iterative computation flow Directed Acyclic Examples Batch style analytics

More information

Ruler: High-Speed Packet Matching and Rewriting on Network Processors

Ruler: High-Speed Packet Matching and Rewriting on Network Processors Ruler: High-Speed Packet Matching and Rewriting on Network Processors Tomáš Hrubý Kees van Reeuwijk Herbert Bos Vrije Universiteit, Amsterdam World45 Ltd. ANCS 2007 Tomáš Hrubý (VU Amsterdam, World45)

More information

Preview. The Thread Model Motivation of Threads Benefits of Threads Implementation of Thread

Preview. The Thread Model Motivation of Threads Benefits of Threads Implementation of Thread Preview The Thread Model Motivation of Threads Benefits of Threads Implementation of Thread Implement thread in User s Mode Implement thread in Kernel s Mode CS 431 Operating System 1 The Thread Model

More information

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design PSD3A Principles of Compiler Design Unit : I-V 1 UNIT I - SYLLABUS Compiler Assembler Language Processing System Phases of Compiler Lexical Analyser Finite Automata NFA DFA Compiler Tools 2 Compiler -

More information

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer: Theoretical Part Chapter one:- - What are the Phases of compiler? Six phases Scanner Parser Semantic Analyzer Source code optimizer Code generator Target Code Optimizer Three auxiliary components Literal

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package High Performance Machine Learning Workshop Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package Matheus Souza, Lucas Maciel, Pedro Penna, Henrique Freitas 24/09/2018 Agenda Introduction

More information

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation Why are threads useful? How does one use POSIX pthreads? Michael Swift 1 2 What s in a process? Organizing a Process A process

More information

Theory and Compiling COMP360

Theory and Compiling COMP360 Theory and Compiling COMP360 It has been said that man is a rational animal. All my life I have been searching for evidence which could support this. Bertrand Russell Reading Read sections 2.1 3.2 in the

More information

Parallel Exact Inference on the Cell Broadband Engine Processor

Parallel Exact Inference on the Cell Broadband Engine Processor Parallel Exact Inference on the Cell Broadband Engine Processor Yinglong Xia and Viktor K. Prasanna {yinglonx, prasanna}@usc.edu University of Southern California http://ceng.usc.edu/~prasanna/ SC 08 Overview

More information

Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions

Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions PhD Dissertation Proposal Kevin Angstadt angstadt@umich.edu 8. December 28 By 22, it s estimated that for

More information

Resource-efficient regular expression matching architecture for text analytics

Resource-efficient regular expression matching architecture for text analytics Resource-efficient regular expression matching architecture for text analytics Kubilay Atasu IBM Research - Zurich Presented at ASAP 2014 SystemT: an algebraic approach to declarative information extraction

More information

High-Performance Holistic XML Twig Filtering Using GPUs. Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras

High-Performance Holistic XML Twig Filtering Using GPUs. Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras High-Performance Holistic XML Twig Filtering Using GPUs Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras Outline! Motivation! XML filtering in the literature! Software approaches! Hardware

More information

Supplementary Material for An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing

Supplementary Material for An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing 1 Supplementary Material for An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing Paul Dlugosch, Dave Brown, Paul Glendenning, Michael Leventhal, Harold Noyes Member, IEEE

More information

CS 406/534 Compiler Construction Putting It All Together

CS 406/534 Compiler Construction Putting It All Together CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy

More information

Re-architecting Virtualization in Heterogeneous Multicore Systems

Re-architecting Virtualization in Heterogeneous Multicore Systems Re-architecting Virtualization in Heterogeneous Multicore Systems Himanshu Raj, Sanjay Kumar, Vishakha Gupta, Gregory Diamos, Nawaf Alamoosa, Ada Gavrilovska, Karsten Schwan, Sudhakar Yalamanchili College

More information

Basic Memory Management. Basic Memory Management. Address Binding. Running a user program. Operating Systems 10/14/2018 CSC 256/456 1
