VASim: An Open Virtual Automata Simulator for Automata Processing Research


University of Virginia Technical Report #CS

J. Wadden 1, K. Skadron 1

We present VASim, an open, extensible virtual automata simulator for automata processing research. VASim is open source and easily extensible. VASim allows users to implement automata transformation and optimization passes for increased performance. VASim is parametrically multithreaded, and can automatically divide connected automata components and sections of the input stream among parallel threads. We compare the performance of VASim against two closed-source automata processing tools, and show speedups of 10 to 1000 times.

Keywords: automata processing, non-deterministic finite automata, parallel processing, simulation, benchmarking.

Introduction

Micron's Automata Processor (AP) is a memory-based architecture purpose-built to accelerate computation of non-deterministic finite automata (NFAs) [4]. Evaluation, prototyping, and debugging of potential applications by automata developers is limited to two software tools provided by Micron: batchsim and apemulate [1]. Each tool is designed to functionally simulate the behavior of the hardware, assisting automata developers in prototyping and evaluating automata-based applications. The first software tool, batchsim, takes an automaton (defined in Micron's Automata Network Markup Language, or ANML [1]) and an input file, and simulates the functionality of the AP by running the automaton on the input file.
batchsim has three main drawbacks: 1) it is closed source, meaning that existing critical bugs are fixed slowly, and feature requests, simulation enhancements, or experimental extensions to the computational model are limited by Micron's engineering resources and interests; 2) it is slow, often taking hours to simulate complex automata with reasonably sized input, and consumes massive amounts of memory, leading to killed processes if run for longer than a few hours; and 3) it is tied to Micron's SDK, which makes the tool more cumbersome to acquire, install, and use.

The second tool, apemulate, is functionally very similar to batchsim, but executes automata using a hardware-accurate emulation of an AP rather than a functional simulation. An abstract automaton must first be compiled from an ANML description by Micron's apcompile tool, and placed and routed into a hardware-specific finite state machine. Otherwise, apemulate is functionally identical to batchsim: it runs compiled finite state machines on input byte streams, emulating the AP hardware. apemulate is usually much faster than batchsim, due to automata optimizations made during the compilation step, and is thus much more usable as a prototyping tool. However, because automata must first be successfully compiled, placed, and routed using Micron's apcompile tool, evaluation of potential applications using apemulate is limited to those automata that Micron's compiler can place and route. Hypothetical automata with interesting additional features, or automata that barely violate the AP's architectural restrictions, cannot be evaluated at all. Like batchsim, apemulate is closed source and tied to Micron's SDK, and suffers from the same associated drawbacks.

To address the above issues with existing tools, we present the Virtual Automata Simulator, or VASim.
VASim is an open source automata simulation, optimization, and transformation platform designed for easy prototyping, debugging, research extensions, and evaluation of automata-based applications. VASim was constructed using object-oriented design in modern C++ and is designed from the ground up to be easy to understand, maintain, and extend. While performance was not the main driving consideration during development, VASim is fast, and is currently faster than batchsim and 2-9 times faster than apemulate in our evaluation. When VASim applies performance optimizations to the automata topology, it also outperforms both batchsim and apemulate in single-threaded computation; Table 1 lists the measured speedups. Furthermore, VASim is parametrically multi-threaded in two dimensions of parallelism common to automata-based applications, further increasing its performance and usefulness over batchsim and apemulate.

This paper first presents the architecture of VASim and its design considerations. We then evaluate six applications previously published in the literature using VASim and compare its single-threaded and multi-threaded performance against batchsim and apemulate.

1 University of Virginia, Dept. of Computer Science: {wadden,skadron}@virginia.edu

1. VASim Architecture and Features

1.1. Object-oriented Design

VASim uses object-oriented design principles throughout its architecture to ensure that code is easy to understand and reuse, and that features are easy to create or extend. Object-oriented design also allows for easier bug isolation and performance profiling, which was critical in the development of the tool, and will allow easy modifications and enhancements.

VASim is responsible for one or more Automata objects, defined initially by one or more ANML files, although automata can be constructed within VASim programmatically if desired. Each Automata object contains a directed graph of automata Elements. Automata Elements are divided into two main classes: the STE class, which represents the functionality of an AP state transition element (STE) [4], and the SpecialElement class, which represents any other type of automata logic or functionality, e.g. counters and boolean elements as available in the first generation AP micro-architecture [4].
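The object model described above can be illustrated with a minimal C++ sketch. All class members, method names, and the Counter example below are hypothetical and chosen only for illustration; they are not taken from VASim's source code, and the sketch only assumes what the text states: an Automata object owns a directed graph of Elements, which are either STEs or SpecialElements.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Base class for every node in the automata graph (hypothetical layout).
class Element {
public:
    explicit Element(std::string id) : id_(std::move(id)) {}
    virtual ~Element() = default;
    const std::string& id() const { return id_; }
    void addOutput(Element* child) { outputs_.push_back(child); }
    const std::vector<Element*>& outputs() const { return outputs_; }
    virtual bool isSpecial() const = 0;
private:
    std::string id_;
    std::vector<Element*> outputs_;  // outgoing edges of the directed graph
};

// State transition element: matches one input byte against a symbol set.
class STE : public Element {
public:
    STE(std::string id, const std::string& symbols, bool start = false)
        : Element(std::move(id)), start_(start) {
        for (unsigned char c : symbols) symbolSet_.set(c);
    }
    bool matches(std::uint8_t symbol) const { return symbolSet_.test(symbol); }
    bool isStart() const { return start_; }
    bool isSpecial() const override { return false; }
private:
    std::bitset<256> symbolSet_;
    bool start_;
};

// Any non-STE logic, such as counters or boolean gates.
class SpecialElement : public Element {
public:
    explicit SpecialElement(std::string id) : Element(std::move(id)) {}
    bool isSpecial() const override { return true; }
};

// Illustrative SpecialElement: becomes true once enough enables are seen.
class Counter : public SpecialElement {
public:
    Counter(std::string id, unsigned target)
        : SpecialElement(std::move(id)), target_(target) {}
    // Counts one enable signal; returns true once the target is reached.
    bool count() { return ++value_ >= target_; }
private:
    unsigned target_;
    unsigned value_ = 0;
};

// Owns all Elements of one automaton; edges are non-owning pointers.
class Automata {
public:
    template <typename T, typename... Args>
    T* add(Args&&... args) {
        auto elem = std::make_unique<T>(std::forward<Args>(args)...);
        T* raw = elem.get();
        assert(elements_.count(raw->id()) == 0 && "duplicate element id");
        elements_.emplace(raw->id(), std::move(elem));
        return raw;
    }
    std::size_t size() const { return elements_.size(); }
private:
    std::unordered_map<std::string, std::unique_ptr<Element>> elements_;
};
```

In this sketch a new element type would be added by deriving from SpecialElement, which mirrors the extensibility argument made in the text; whether VASim structures its hierarchy the same way is not something this sketch claims.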
Each Element can be enabled by incoming signals from other Elements in the graph. If enabled, an Element computes a boolean function (e.g. matching for STEs, or boolean logic for logic gates). If the result of that computation is true, the Element activates and enables its immediate successors in the directed automata graph. While VASim is designed to simulate Micron's AP, hypothetical enhancements to Micron's architecture can be easily inserted into VASim, debugged, and evaluated. Furthermore, new automata models can be added, increasing the available computational models for any application.

1.2. ANML Parser and Optimizations

VASim uses Micron's Automata Network Markup Language (ANML) [1] as the main automata input language. While ANML was chosen because it is the automata definition language used for prototyping and evaluation in Micron's software and hardware environment, VASim could potentially process any arbitrary automata format, or programmatically construct automata internally.

Once an automaton has been parsed into an Automata object data structure, VASim allows the programmer to implement optimization and transformation passes over the graph. One example optimization already implemented in both VASim and Micron's apcompile ANML compiler tool is left minimization. Left minimization does a breadth-first search from an NFA's start states, combining functionally identical elements and thereby reducing the size of the automaton. By eliminating redundant states, left minimization is essentially an automata compression that increases the space efficiency of automata on spatial automata processors (like the AP). By eliminating redundant state transitions, left minimization also greatly increases the performance of automata processing on von Neumann-based architectures for highly compressible automata. This feature is available to apemulate via Micron's compiler optimizations, but is not available to batchsim. The benefits of left minimization are evaluated in Section 2.

1.3. Multi-dimensional Parameterized Multi-threading

Automata processing can be parallelized in two dimensions: 1) automata can be partitioned into parallel pieces (e.g. independent connected components of the automata graph) and divided among threads, or 2) the input stream of bytes can be partitioned (if allowable by the application) and offered to threads processing identical automata. VASim offers zero-effort multi-threading, allowing the programmer to define how many threads should be devoted to each dimension of parallel automata computation, furthering its usefulness over batchsim and apemulate.

1.4. VASim Core Simulation Architecture

The core architecture of VASim is a loop that processes input symbols and keeps track of activity in the automata. Each loop iteration (symbol cycle) is divided into five main stages:

1) The first stage enables all start STE Elements in the graph, initializing computation. If an Element is enabled, it is added to a special EnabledSTEs queue.
2) The second stage reads all enabled STEs off of the EnabledSTEs queue and computes the match function of each. If an STE matches, it is pushed to an ActivatedSTEs queue.
3) The third stage reads all activated STEs from the ActivatedSTEs queue and propagates their output signals to all child elements. Each child element is added to one of two queues: EnabledSTEs if the child is an STE, or EnabledSpecialElements if the child is a SpecialElement.
4) Because the AP accomplishes logic computation immediately after STE computation within a single symbol cycle, stage four reads SpecialElements from the EnabledSpecialElements queue and computes the boolean function of each SpecialElement.
SpecialElements that compute true are pushed to a special ActivatedSpecialElements queue.
5) Similar to stage 3, stage 5 pushes the children of all SpecialElements in the ActivatedSpecialElements queue to their respective enable queues.

These queues allow VASim to only ever operate on Elements in the graph that require computation, increasing efficiency over naive implementations that might consider all states on every cycle. Furthermore, the execution core can be easily modified or extended by modifying or adding one or more stages in the main simulation pipeline.

2. Evaluation

While speed of automata processing was not considered a main design constraint in the original VASim prototype, VASim is fast, and uses efficient data structures and algorithms to reduce the amount and expense of all automata computation. Furthermore, VASim is multithreaded, making it trivial to scale automata computation across multiple cores. In this section, we evaluate VASim against existing, single-threaded tools.

2.1. Benchmarks

We consider six previously published automata applications as benchmarks for evaluating the performance of VASim against batchsim and apemulate.

Brill Tagging [7], a widely used rule-based part-of-speech tagger.
Hamming Distance [5], for calculating the number of differences between two strings. We evaluate 1000 Hamming-distance-3 automata of length 8.
Levenshtein Automata [6], for calculating the edit distance between two strings. We evaluate 80 edit-distance-3 Levenshtein automata of length 31 for random nucleotide sequences.
Protomata [5], a pattern-based approach for identifying protein motifs, built from the PROSITE database.
Entity Resolution [3], for reconciling duplicate entries in databases.
PowerEN Complex1 [2], a synthetic set of simple regular expressions.

2.2. Results

Table 1 shows runtimes of VASim, batchsim, and apemulate over all benchmarks. All experiments were performed on a 6-core (12-thread) Intel i7-5820k running at 3.3GHz, with 32GB of RAM, using Micron's AP SDK. Each automaton was run in its original form, and then run again after applying optimizations (either VASim's left-minimization pass for VASim and batchsim, or apcompile's -O1 flag for apemulate). All tools were run with output reporting enabled, and runtime was measured using the Unix time command. Single-threaded VASim speedups are shown in brackets, and do not consider VASim's optimizations when applied to batchsim.

When left minimization is enabled in VASim, it is never less than 20x faster than batchsim and 11x faster than apemulate, and is often much faster. batchsim benefits greatly from VASim's optimizations, which increase its performance by 1.24x to 15.4x depending on the benchmark and, in the case of the Entity Resolution benchmark, even prevent the tool from being killed. apemulate is generally faster than batchsim when optimizations are considered, but fails to place and route some automata (e.g. a Levenshtein automaton) that can otherwise be functionally simulated by VASim or batchsim.

Benchmark  Input(KB)  Opt?  VASim(s)  batchsim(s)  apemulate(s)  VASimMT(s)
Brill Tagging 9, ,776.3 [26.9x] 3,555 [29.2x] 18.0
Brill Tagging 9,885 D [1,388x] [86.9x] 0.8
Protomata 13,210 1,481 killed 15,424 [21.9x]
Protomata 13,210 D killed 4,366 [28.7x] 24.3
Hamming [20.2x] [11.5x] 2.1
Hamming 100 D [36.1x] [14.2x] 1.8
Levenshtein 10, ,175 [36.1x] 9,761 [34x] 23.8
Levenshtein 10,000 D ,402 [55.8x] failed 26.1
Entity Res. 2, killed 15,424 [134.8x] 8.43
Entity Res. 2,174 D [NA] [207.6x] 0.6
PowerEN 20,000 1,017 killed 24,688 [24.3x] 91.1
PowerEN 20,000 D killed 8,706 [15.3x] 62.6

Table 1. Runtimes of each benchmark application when evaluated using VASim, batchsim, apemulate, and the best performing multi-threaded VASim configuration (VASimMT).

3. Conclusions and Future Work

This paper presented VASim, an open, extensible, and fast virtual automata simulator for automata processing research. VASim is written in C++ using object-oriented design to make it easy to understand and extend. VASim allows users to implement automata transformation and optimization passes for increased performance. VASim is parametrically multi-threaded, and can automatically divide connected automata components and sections of the input stream among threads. VASim offers greatly improved performance and functionality over existing automata evaluation tools, and will be publicly released. Future work will include further performance enhancements and implementation of other automata transformations.

We would like to thank the Center for Automata Processing and Micron Technologies for their help in providing benchmarks for evaluation.
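As an illustration of the five-stage symbol cycle described under VASim Core Simulation Architecture, the following is a minimal single-threaded sketch, not VASim's actual implementation. The Element layout, the helper functions, and the simulate signature are hypothetical. The sketch additionally assumes that start STEs are enabled on every symbol, that the only SpecialElement is a simple counter that fires when its count reaches a target, and that children of SpecialElements are STEs.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified element type (not VASim's real classes).
struct Element {
    enum Kind { STE, Counter };
    Kind kind = STE;
    bool start = false;                 // STE: enabled on every symbol cycle
    bool report = false;                // record a report when activated
    std::bitset<256> symbols;           // STE: bytes this state matches
    unsigned target = 0, count = 0;     // Counter: fires when count == target
    std::vector<std::size_t> children;  // successor indices in the graph
    bool enabled = false;               // true while waiting in an enable queue
};

Element makeSTE(const std::string& symbols, bool start, bool report) {
    Element e;
    e.start = start;
    e.report = report;
    for (unsigned char c : symbols) e.symbols.set(c);
    return e;
}

Element makeCounter(unsigned target, bool report) {
    Element e;
    e.kind = Element::Counter;
    e.target = target;
    e.report = report;
    return e;
}

struct Report { std::size_t offset; std::size_t element; };

std::vector<Report> simulate(std::vector<Element>& g, const std::string& input) {
    std::vector<std::size_t> starts, enabledSTEs, activatedSTEs;
    std::vector<std::size_t> enabledSpecials, activatedSpecials;
    std::vector<Report> reports;
    for (std::size_t id = 0; id < g.size(); ++id)
        if (g[id].kind == Element::STE && g[id].start) starts.push_back(id);

    // Put an element on the matching enable queue, at most once per cycle.
    auto enable = [&](std::size_t id) {
        assert(id < g.size());
        if (g[id].enabled) return;
        g[id].enabled = true;
        (g[id].kind == Element::STE ? enabledSTEs : enabledSpecials).push_back(id);
    };

    for (std::size_t offset = 0; offset < input.size(); ++offset) {
        const auto symbol = static_cast<unsigned char>(input[offset]);
        // Stage 1: enable all start STEs.
        for (std::size_t id : starts) enable(id);
        // Stage 2: compute the match function of every enabled STE.
        for (std::size_t id : enabledSTEs) {
            g[id].enabled = false;
            if (g[id].symbols.test(symbol)) activatedSTEs.push_back(id);
        }
        enabledSTEs.clear();
        // Stage 3: propagate activations; STE children wait for the next symbol.
        for (std::size_t id : activatedSTEs) {
            if (g[id].report) reports.push_back({offset, id});
            for (std::size_t child : g[id].children) enable(child);
        }
        activatedSTEs.clear();
        // Stage 4: evaluate SpecialElements within the same symbol cycle.
        for (std::size_t id : enabledSpecials) {
            g[id].enabled = false;
            if (++g[id].count == g[id].target) activatedSpecials.push_back(id);
        }
        enabledSpecials.clear();
        // Stage 5: propagate SpecialElement activations to their children.
        for (std::size_t id : activatedSpecials) {
            if (g[id].report) reports.push_back({offset, id});
            for (std::size_t child : g[id].children) enable(child);
        }
        activatedSpecials.clear();
    }
    return reports;
}
```

Because only queued elements are visited in each stage, the work per symbol is proportional to the number of enabled and activated elements rather than to the size of the graph, which is the efficiency argument made in the text. A caller would build a vector of Elements with makeSTE and makeCounter, set each element's children, and pass the graph and an input string to simulate.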

References

1. AP SDK Documentation.
2. K. Atasu, F. Doerfler, J. van Lunteren, and C. Hagleitner. Hardware-accelerated Regular Expression Matching with Overlap Handling on IBM PowerEN Processor. In Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on. IEEE.
3. C. Bo, K. Wang, J.J. Fox, and K. Skadron. Entity Resolution Acceleration using Micron's Automata Processor. Workshop on Architectures and Systems for Big Data (ASBD), in conjunction with ISCA.
4. P. Dlugosch, D. Brown, P. Glendenning, M. Leventhal, and H. Noyes. An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing. IEEE Transactions on Parallel and Distributed Systems, 99.
5. I. Roy. Algorithmic Techniques for the Micron Automata Processor. PhD thesis, Georgia Institute of Technology.
6. T. Tracy II, M. Stan, N. Brunelle, J. Wadden, K. Wang, K. Skadron, and G. Robins. Nondeterministic Finite Automata in Hardware - the Case of the Levenshtein Automaton. Workshop on Architectures and Systems for Big Data (ASBD), in conjunction with ISCA.
7. K. Zhou, J.J. Fox, Ke Wang, D.E. Brown, and K. Skadron. Brill Tagging on the Micron Automata Processor. In Semantic Computing (ICSC), 2015 IEEE International Conference on, Feb.
