PatternFinder is a tool that finds non-overlapping or overlapping patterns in any input sequence.

Similar documents
Moving data to the cloud using the MoveToCloud script

Building a 3rd Generation Weather-Model System Test Suite. Paul Madden Tom Henderson.

Chapter 1: Why Program? Computers and Programming. Why Program?

Computer Basics 1/24/13. Computer Organization. Computer systems consist of hardware and software.

Computer Basics 1/6/16. Computer Organization. Computer systems consist of hardware and software.

Modifying image file contents with Ghost Explorer. This section includes the following topics:

Installation Notes for Enhydra Director Netscape/IPlanet Web Servers

ECE 2400 Computer Systems Programming, Fall 2018 PA2: List and Vector Data Structures

Chapter 14: Files and Streams

Ftp Command Line Commands Linux Example Windows Put

Supported platforms & compilers Required software Where to download the packages Geant4 toolkit installation (release 10.1.p02)

COBOL-IT Developer Studio 2.0

Main Memory and the CPU Cache

Centreon SSH Connector Documentation

Python 2.6 Ubuntu 10.04LTS Minimum suggested hardware: Dual core x86 CPU, 1Gb RAM Python-fuse (for mount.py360 / py360.py)

Using Visual Studio and VS Code for Embedded C/C++ Development. Marc Goodner, Principal Program Manager, Microsoft

User Manual. Admin Report Kit for IIS 7 (ARKIIS)

Topics. Hardware and Software. Introduction. Main Memory. The CPU 9/21/2014. Introduction to Computers and Programming

EL2310 Scientific Programming

Using Symantec NetBackup 6.5 with Symantec Security Information Manager 4.7

Utilities. Introduction. Working with SCE Platform Files. Working with Directories CHAPTER

Choosing Resources Wisely Plamen Krastev Office: 38 Oxford, Room 117 FAS Research Computing

root.smart Power NET Bandwidth Usage CPU Usage

How to install and build an application. Giuliana Milluzzo INFN-LNS

Building 3D Slicer. MACbioIDi February March Carlos Luque

Download, Install and Setup the Linux Development Workload Create a New Linux Project Configure a Linux Project Configure a Linux CMake Project

CS3210: Operating Systems

Supercomputing environment TMA4280 Introduction to Supercomputing

manifold Documentation

Microsoft Dynamics CRM 2011 Data Load Performance and Scalability Case Study

Binary Markup Toolkit Quick Start Guide Release v November 2016

File System Internals. Jo, Heeseung

Clustering Application Benchmark

Monitoring and Trouble Shooting on BioHPC

DataMan. version 6.5.4

LGTM Enterprise System Requirements. Release , August 2018

Segmentation with Paging. Review. Segmentation with Page (MULTICS) Segmentation with Page (MULTICS) Segmentation with Page (MULTICS)

Zephyr Kernel Installation & Setup Manual

Shell Project Part 1 (140 points)

Preview. COSC350 System Software, Fall

Network+ Guide to Networks, Fourth Edition. Chapter 8 Network Operating Systems and Windows Server 2003-Based Networking

8/16/12. Computer Organization. Architecture. Computer Organization. Computer Basics

Xprint 8.0C MyStats manual

C++ for CMEA II. Kjetil Olsen Lye. September 24, 2015

USING NGC WITH GOOGLE CLOUD PLATFORM

CSE 351, Spring 2010 Lab 7: Writing a Dynamic Storage Allocator Due: Thursday May 27, 11:59PM

Lecture 3. Essential skills for bioinformatics: Unix/Linux

Appendix A GLOSSARY. SYS-ED/ Computer Education Techniques, Inc.

How to install and build an application

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Software Concepts. It is a translator that converts high level language to machine level language.

Introduction to LLVM. UG3 Compiling Techniques Autumn 2018

Using BigSim to Estimate Application Performance

Simulink S-Function for RT-LAB Document 1b Creation of a S-Function From C Code and Protection of the Source Code Version 1.2

User s Manual CONTENT. Nano NAS Server for USB storages. 1. Product Information Product Specifications System requirements..

CS197U: A Hands on Introduction to Unix

Web Server (IIS 6) ArcGIS Server 9.1. ArcGIS Server 9.1 Server Object Manager. Server Object Container

Saleae Device SDK Starting a Device SDK Project on Windows Starting a Device SDK Project on Linux... 7

Error The Process Exit Code Was 128. The Step

For 100% Result Oriented IGNOU Coaching and Project Training Call CPD: ,

Time Left. sec(s) Quiz Start Time: 12:13 AM. Question # 5 of 10 ( Start time: 12:18:29 AM ) Total Marks: 1

BusinessObjects Enterprise / Crystal Reports Server XI R1 and R2

Apollo Project. Beta Version. User Guide. Nanometrics Inc. Kanata, Ontario Canada

Lesson 09: SD Card Interface

How to install and build an application

OrgPublisher 10 Architecture Overview

CS Lab 1 xv6 Introduction Setup and exercise

File System. Preview. File Name. File Structure. File Types. File Structure. Three essential requirements for long term information storage

h5perf_serial, a Serial File System Benchmarking Tool

CS 261 Recitation 1 Compiling C on UNIX

Python for Astronomers. Week 1- Basic Python

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions.

EMS Installation. Workstation Requirements CHAPTER. EMS Lite (Windows 95/98) EMS NT (Windows NT 4.0)

Computer Systems Laboratory Sungkyunkwan University

Vaango Installation Guide

Tagprint 3.0. Single User and Network Installation Instructions

Full file at

MS6174 Data Adaptor (USB type) Software Manual

TangeloHub Documentation

MICROSOFT WINDOWS GENERAL INFORMATION ABOUT FILE EXTENSIONS

Performance Tools for Technical Computing

CSE 344: Section 1 Git Setup for HW Introduction to SQLite

Ingo Brenckmann Jochen Kirsten Storage Technology Strategists SAS EMEA Copyright 2003, SAS Institute Inc. All rights reserved.

Ftp Get Command Line Windows 7 Boot Into >>>CLICK HERE<<<

MRCP. Installation Manual. Developer Guide. Powered by Universal Speech Solutions LLC

How to install and build an application

The FreeBSD Package Cluster

Hardware and Software - Revision Summary

ECE 598 Advanced Operating Systems Lecture 14

Aventail Advanced Reporting Installation Instructions

Galigeo for Cognos HTML5 Installation Guide - G18.0

HW2d Project Addenda

Introduction to Process in Computing Systems SEEM

STORAGE MADE EASY: S3 DRIVE & S3 EXPLORER

Long-term Information Storage Must store large amounts of data Information stored must survive the termination of the process using it Multiple proces

Shell Programming Overview

Introduction to High Performance Computing and an Statistical Genetics Application on the Janus Supercomputer. Purpose

CROWDCOIN MASTERNODE SETUP COLD WALLET ON WINDOWS WITH LINUX VPS

Padaco Instruction Manual

Perform Backup and Restore

Transcription:

PatternFinder is a tool that finds non-overlapping or overlapping patterns in any input sequence. Pattern Finder Input Parameters: USAGE: PatternDetective.exe [ -help /? -f [filename] -min -max [minimum pattern length] [maximum pattern length] -c -threads [number of threads] -mem [memory limit in MB] -ram -hd -i [minimum times a pattern has to occur in order to keep track of it] -v [verbosity level] -n -o -his -lr -hr [hd files] [low range pattern search] [high range pattern search] Options: -help Displays this help page /? Displays this help page -f [string] Sets file name to be processed

-min [unsigned long] Sets the minimum pattern length to be searched -max [unsigned long] Sets the maximum pattern length to be searched -c Finds the best threading scheme for computer -threads [unsigned int] Sets thread count to be used -mem [unsigned long] Sets the maximum RAM memory that can be used in MB -ram -hd Forces program to use only RAM Forces program to use Hard Disk based on -mem -i [unsigned long] Minimum occurrences to consider a pattern (Default occurences will be 2) -v [unsigned long] Verbosity level, turn logging and pattern generation on or off with 1 or 0 -n Non overlapping pattern search -o Overlapping pattern search, this is set by default -his HD processing file history keeps or removes files level by level with a 1 or a 0 -lr -hr Search for patterns that begin with the value lr to 255 if hr isn't set, otherwise lr to hr range Search for patterns that begin with the value hr to 0 if lr isn't set, otherwise lr to hr range

How to build PatternFinder: PREREQS: cmake version 2.5 or higher c++11 compatible compiler python 2.7 to run parallel serial jobs Visual Studio 2012 or 2015 for building with Windows download repo using https address https://github.com/octopusprime314/patterndetective.git or use git and ssh using address git@github.com:octopusprime314/patterndetective.git BUILD INSTRUCTIONS:!!!!!!!!!!!ALWAYS BUILD IN RELEASE UNLESS DEBUGGING CODE!!!!!!!!!! Linux: create a build folder at root directory cd into build cmake -D CMAKE_BUILD_TYPE=Release -G "Unix Makefiles".. cmake --build. Windows: create a build folder at root directory cd into build cmake -G "Visual Studio 11 2012 Win64".. OR cmake -G "visual Studio 14 2015 Win64".. cmake --build. --config Release

How to run PatternFinder as a standalone executable: LOCATION OF FILES TO BE PROCESSED: Place your file to be processed in the Database/Data folder EXAMPLE USES OF PATTERNFINDER: 1)./PatternFinder f Database v 1 threads 4 ram Pattern searches all files recursively in directory using DRAM with 4 threads 2)./PatternFinder f TaleOfTwoCities.txt v 1 c -ram Finds the most optimal thread usage for processing a file 3)./PatternFinder f TaleOfTwoCities.txt v 1 Processes file using memory prediction per level for HD or DRAM processing 4)./PatternFinder f TaleOfTwoCities.txt v 1 mem 1000 Processes file using memory prediction per level for HD or DRAM processing with a memory constraint of 1 GB 5)./PatternFinder f TaleOfTwoCities.txt min 5 max 100 Finds patterns of length 5 to 100 and then terminates processing 6)./PatternFinder f Boosh.avi -n Processes file using non overlapping processing 7)./PatternFinder f TaleOfTwoCities.txt v 1 -hd Processes file using the hard disk only. 8)./PatternFinder f Boosh.avi -o 10 Processes patterns that occur at least 10 times or more. Default is 2.

How to run PatternFinder Python Scripts: PYTHON RUN EXAMPLES: 1) python splitfileforprocessing.py [file path] [number of chunks] Use splitfileforprocessing.py Python script to split files into chunks and run multiple instances of PatternFinder on those chunks Ex. python splitfileforprocessing.py ~/Github/PatternDetective/Database/Data/TaleOfTwoCities.txt 4 equally splits up TaleOfTwoCities.txt into 4 files and 4 instances of PatternFinder get dispatched each processing one of the split up files. 2) python segmentrootprocessing.py [file path] [number of jobs] [threads per job] Use segmentrootprocessing.py Python script splits up PatternFinder jobs to search for patterns starting with a certain value Ex. python segmentedrootprocessing.py../database/data/boosh.avi 4 4 Dispatches 4 processes equipped with 4 threads each. Each PatternFinder will only look for patterns starting with the byte representation of 0-63, 64-127, 128-191, 192-255.

PatternFinder Input Files: PatternFinder accepts any type of input file because it processes at the byte level. PatternFinder Output Files: Eight outputs are available. One is a general logger using ascii text format and the remaining seven are Comma Separated Variable files used for post processing in MATLAB. 1) Logger file: records general information including the most common patterns, number of time a pattern occurs and the pattern s coverage at every level until the last pattern is found. Simple text file. 2) Collective Pattern Data file: records each level s most common pattern and number of times the pattern occurs in CSV format. 3) File Processing Time: records each file s processing time in CSV format. Used for processing large data sets with many files. 4) File Coverage: records the most common pattern s coverage of the file in CSV format. 5) File Size Processing Time file: records each file s processing time and corresponding size in CSV format. Used primarily to isolate files in a large dataset that contain large patterns. 6) Thread Throughput: records the processing throughput improvement while incrementing the number of processing threads in CSV format. Typically used with -c option which tests threads in multiples of 2 starting at 1 until the number of cores on the machine has been met. 7) Thread Speed: records the processing time taken while incrementing the number of processing threads in CSV format. Typically used with -c option which tests threads in multiples of 2 starting at 1 until the number of cores on the machine has been met. Logger file OUTPUT: Level 1 count is 42 with most common pattern being: "0" occured 3735172 and coverage was 0.00509581% Level 2 count is 10752 with most common pattern being: "00" occured 257717 and coverage was 0.000703194% Level 3 count is 2752331 with most common pattern being: "00d" occured 241001 and coverage was 0.000986376% Level 4 count is 12705452 with most common pattern being: "00dc" occured 240914 and coverage was 0.00131469% Level 5 count is 677035 with most common pattern being: "00dc " occured 120019 and coverage was 0.000818695%

PatternFinder post processing scripts using the seven available CSV outputs with MATLAB: 1) DRAM versus HD Processing Speeds->DRAMtoHDProcessingLiminationSpeeds.m 2) DRAM versus HD Performance->DRAMVsHardDiskPerformance.m 3) Most Common Pattern versus Coverage->MostCommonPatternLengthVsCoveragePercentage.m 4) Overlapping versus Non Overlapping Comparison->Overlapping_NonOverlappingComparison.m 5) Overlapping versus Non Overlapping File Speeds->OverlappingVsNonOverlappingFileSpeeds.m 6) Process Time versus File Size->ProcessTimeVsFileSize.m