Introducing legacy program scripting to molecular biology toolkit (MBT)

Similar documents
Introducing Legacy Program Scripting to Molecular Biology Toolkit (MBT)

Molecular viewer using Spiegel

Build your own languages with

JavaCC: SimpleExamples

Grammars and Parsing, second week

Programming Languages. Dr. Philip Cannata 1

Introduction to Programming Using Java (98-388)

Programming Languages. Dr. Philip Cannata 1

Pace University. Fundamental Concepts of CS121 1

CA Compiler Construction

An Object Oriented Programming with C

Structure Visualization with PyMOL

Wisconsin Science Olympiad Protein Folding Challenge. A Guide to Using RasMol for Exploring Protein Structure

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Outline. Top Down Parsing. SLL(1) Parsing. Where We Are 1/24/2013

Integrating database and data stream systems

Functional Parsing A Multi-Lingual Killer- Application

In this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely: abc and abcc

DOWNLOAD PDF CORE JAVA APTITUDE QUESTIONS AND ANSWERS

So, You Want to Test Your Compiler?

Java for Programmers Course (equivalent to SL 275) 36 Contact Hours

Chapter 2 A Quick Tour

PROGRAMMING FUNDAMENTALS

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Agenda CS121/IS223. Reminder. Object Declaration, Creation, Assignment. What is Going On? Variables in Java

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Today. Review. Unix as an OS case study Intro to Shell Scripting. What is an Operating System? What are its goals? How do we evaluate it?

MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

PyMOL is a molecular visualization tool. There are many such tools available both commercially and publicly available:

More on Objects in JAVA TM

Chapter 6 Introduction to Defining Classes

Topics in Object-Oriented Design Patterns

The beginning of this guide offers a brief introduction to the Protein Data Bank, where users can download structure files.

Myth- an extension to C

Java: Classes. An instance of a class is an object based on the class. Creation of an instance from a class is called instantiation.

Chapter 4. Abstract Syntax

CS121/IS223. Object Reference Variables. Dr Olly Gotel

COLLEGE OF IMAGING ARTS AND SCIENCES. Medical Illustration

8. Control statements

Question Points Score

static MM_Index snap(mm_index corect, MM_Index ligct, int imatch0, int *moleatoms, i

CSE 70 Final Exam Fall 2009

Designing Robust Classes

Outline. Java Models for variables Types and type checking, type safety Interpretation vs. compilation. Reasoning about code. CSCI 2600 Spring

Language Reference Manual simplicity

Chapter 4 Defining Classes I

Compiler Theory. (Semantic Analysis and Run-Time Environments)

Exceptions. What exceptional things might our programs run in to?

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Internet Application Developer

CS107 Handout 37 Spring 2007 May 25, 2007 Introduction to Inheritance

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

Atelier Java - J1. Marwan Burelle. EPITA Première Année Cycle Ingénieur.

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

Murach s Beginning Java with Eclipse

CISC-124. Passing Parameters. A Java method cannot change the value of any of the arguments passed to its parameters.

C and C++ I. Spring 2014 Carola Wenk

Zhifu Pei CSCI5448 Spring 2011 Prof. Kenneth M. Anderson

Lecture 8: Deterministic Bottom-Up Parsing

Using R to provide statistical functionality for QSAR modeling in CDK

I BCS-031 BACHELOR OF COMPUTER APPLICATIONS (BCA) (Revised) Term-End Examination. June, 2015 BCS-031 : PROGRAMMING IN C ++

RE-USABLE COMPONENTS FOR STRUCTURAL

This course supports the assessment for Scripting and Programming Applications. The course covers 4 competencies and represents 4 competency units.

The NetRexx Interpreter

2 rd class Department of Programming. OOP with Java Programming

Lecture 7: Deterministic Bottom-Up Parsing

Java Programming Fundamentals

ICOM 4015 Advanced Programming Laboratory. Chapter 1 Introduction to Eclipse, Java and JUnit

Active Learning: Streams

Visualizing the inner structure of n-body data using skeletonization

Programming. Syntax and Semantics

Chapter 9. Exception Handling. Copyright 2016 Pearson Inc. All rights reserved.

S.E. Sem. III [CMPN] Object Oriented Programming Methodology

Programming Assignment I Due Thursday, October 7, 2010 at 11:59pm

The Prickly Pear Archive: A Portable Hypermedia for Scholarly Publication

COP 3330 Final Exam Review

Parser Combinators 11/3/2003 IPT, ICS 1

5/3/2006. Today! HelloWorld in BlueJ. HelloWorld in BlueJ, Cont. HelloWorld in BlueJ, Cont. HelloWorld in BlueJ, Cont. HelloWorld in BlueJ, Cont.

Core Java Syllabus. Pre-requisite / Target Audience: C language skills (Good to Have)

Java Threads. COMP 585 Noteset #2 1

Error Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id:

AutoDock Virtual Screening: Raccoon & Fox Tools

Interpreting Languages for the Java Platform

Design Patterns in Parsing

CS/IT 114 Introduction to Java, Part 1 FALL 2016 CLASS 3: SEP. 13TH INSTRUCTOR: JIAYIN WANG

Programming in C. main. Level 2. Level 2 Level 2. Level 3 Level 3

Lecture 2. Examples of Software. Programming and Data Structure. Programming Languages. Operating Systems. Sudeshna Sarkar

Compiler Construction D7011E

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

About this exam review

Mr. Monroe s Guide to Mastering Java Syntax

CS612 - Algorithms in Bioinformatics

BCS THE CHARTERED INSTITUTE FOR IT. BCS HIGHER EDUCATION QUALIFICATIONS BCS Level 5 Diploma in IT. Object Oriented Programming

Java How to Program, 10/e. Copyright by Pearson Education, Inc. All Rights Reserved.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Introduction to the Protein Data Bank Master Chimie Info Roland Stote Page #

PROGRAMMING LANGUAGE 2

CPS 506 Comparative Programming Languages. Syntax Specification

Towards OpenMP for Java

What are the characteristics of Object Oriented programming language?

Transcription:

Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 2008 Introducing legacy program scripting to molecular biology toolkit (MBT) Todd Newell Follow this and additional works at: http://scholarworks.rit.edu/theses Recommended Citation Newell, Todd, "Introducing legacy program scripting to molecular biology toolkit (MBT)" (2008). Thesis. Rochester Institute of Technology. Accessed from This Master's Project is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact ritscholarworks@rit.edu.

Introducing Legacy Program Scripting to Molecular Biology Toolkit (MBT) Graduate Project Defense for MS Computer Science Degree Todd A. Newell October 28, 2005

Table of Contents Introduction Description of Problem Chime, RasMol, Molecular Biology Toolkit (MBT) Goals Scripting Definition Background Present in Chime and RasMol, lacking in MBT Using MBT for Visualization How MBT is used (2D, 3D, text) Third-Party Applications Architecture JavaCC Tokens Grammars Token Manager and Parser LOOKAHEAD MBT Packages Classes StructureStyles and StructureMap Execution Future Work

Abstract Ever-changing landscape of molecular visualization requires a common thread Introduction of powerful visualization language, MBT, gives users a comprehensive collection of libraries with the freedom to develop unique applications Lacks scripting capabilities, however, something that older, popular languages such as Chime and RasMol have MBT lacks handholds for widespread acceptance without scripting This project will use JavaCC to parse legacy commands from Chime and RasMol into single, simple commands usable by MBT scripting Bridging to a user-centric and friendly method of use for endusers not entrenched in software development

Introduction Description of Problem Visualization software is always changing and it is tough for users to keep up These users are often chemists, biologist, and physicists instead of computer scientists Desire to change with the times comes at a price - users have to learn new application with potentially long learning curve Methods of use should be transferable Scripting.pdb Files

Chime, RasMol, and MBT Two popular, older languages have scripting abilities and are widely used Chime: free but incomplete version supplied, not open source. Chime: popular tool leads to many scripts being created RasMol: older and unsupported RasMol: free, open source, command list is subset of Chime s RasMol: also large number of scripts existing for application New toolkit, MBT, incorporates many of the benefits from Chime and RasMol, however not all of them Open source, free Toolkit library that needs applications to render images and interact with molecule Uses.pdb files to input compound data Chemical information listed for each atom.pdb files are the same wherever used

Sample.pdb File ATOM 1 OH OSP 1-0.332 0.440-0.016 1.00 0.00 ATOM 2 1HH OSP 1 0.596 0.456 0.228 1.00 0.00 ATOM 3 2HH OSP 1-0.596-0.456-0.228 1.00 0.00 TER 4 OSP 1 END Type of Component, Atom Serial Number, Atom Name, Coordinates of Atom Conforms to standards identical each time used Read by the viewer to manufacture initial view of the molecule.

MBT Requires additional application to run MBT s libraries arranged in hierarchy which allows only a few classes to affect molecular components StructureStyles.java have hooks into atoms, bonds, residues, fragments, and chains

Goals Last component needed is a parser to evaluate input languages JavaCC Allows us to bridge from older languages to the new, via scripting Goals This project was designed to create a parsing language that accepts scripting commands from two ancestor programs, checking them for correctness, and eventually using the commands to affect the chemical components in the modified molecular visualization toolkit

Scripting What is scripting? Classic method of execution there is no way to change variables or run methods after compiling without a portal into the program Graphical programs have pull-down menus Scripting allows users to make changes without the knowledge of target applications Easy to remember, accessible commands that run more complex pieces of code

Scripting Example Scripting command to select all components in a molecule: select all MBT method to select all components: public void selectall() { selection.clearall(); setatomselection( selection.getmin(), selection.getmax(), true ); bondselection.clearall(); setbondselection( bondselection.getmin(), bondselection.getmax(), true ); }

Importance of Scripting Chime and RasMol have scripting ability, while MBT does not It was the void we decided to fill commands that were used to run Chime and RasMol could be made to execute MBT methods In this way, users without a comprehensive background in programming can jump to a toolkit like MBT by reusing familiar commands Gives the user a simple method of interaction once application has been compiled and executed

Using MBT for Visualization MBT is a library and not a free-standing application Hierarchies are arranged to be intuitive Users can create applications to fit specific needs MBT can be used to render and display molecules either 2D or 3D using Java3D Also allows non-graphical manipulation of molecules loading the same.pdb files lets the user create a virtual representation and change parts of the components without ever seeing the molecule Scripting can act on either type of representation

Architecture JavaCC Designed to compile programming languages Can analyze any left-decent grammar Tokens are the lowest common denominator Consume tokens, process them, and analyze them for correctness Tokens act as reserve words that disallow their use elsewhere in the program

Architecture - JavaCC JavaCC User defines the grammar which JavaCC program uses to analyze Rules are added as needed making the grammar as flexible as needed Only requirement: grammar is non-ambiguous so no infinite loops

Architecture - JavaCC JavaCC Tokens can be anything regular expressions, keywords, or literals 2 distinct functions Token Manager separates commands into tokens (as defined by grammar whatever size) Parser examines the rules to try to match tokens, in order, to the functions defined in the program

Architecture - JavaCC JavaCC LOOKAHEAD used to notify the program of a way to avoid ambiguity By specifying number of tokens ahead of consumed token to check, parsing path directed around spots with similar command structures E.g.: LOOKAHEAD(2) color() two or more functions begin with the same token < COLOR >. Alert parser to check 2 tokens ahead of consumed token, instead of 1 Most of the program can operate in a processor efficient manner, only slowing down when meets with stumbling block

Architecture JavaCC / ChimeParser.jj PARSER_BEGIN( ChimeParser ) /* IMPORTS entire MBT library, each package, and any needed Java imports */ public class ChimeParser { /* global variable declaration must be static */ private static Explorer explorer = new Explorer(); public static void main( String args[] ) throws ParseException { explorer.initialize(); try { parser.cmdlist(); } catch ( ParseException e ) { System.out.println( e.getmessage() ); System.out.println ( "ChimeParser: Encountered errors during parse."); } } } PARSER_END( ChimeParser )

Architecture JavaCC / ChimeParser.jj TOKEN : /* command list */ { < ANIM : "anim" "animation" > < DIRECTION : "direction" > } void anim() : { String animstr; Token animtok, lasttok; } { animtok = < ANIM > animstr = animparser() { System.out.println( animtok.image + " " + animstr ); } [ lasttok = < SEMICOLON > { insertsemicolon(); } ] }

Architecture - MBT MBT - packages edu.sdsc.mbt: This package provides classes which define the core data storage containers for the MBT toolkit. StructureMap edu.sdsc.mbt.filters: This package provides classes which enable filtering or subsetting of a structure's constituent components. edu.sdsc.mbt.gui: This package provides classes which implement graphical user interface (GUI) component classes for MBT. edu.sdsc.mbt.io: This package provides classes which enable molecular biology data sets (such as protein structures, sequence data, etc) to be loaded into an MBT application as a Structure object. edu.sdsc.mbt.util: This package provides classes which implement extra functionality on top of the core MBT classes (that is, capabilities that are not absolutely required by, or would otherwise overcomplicate, the core classes). edu.sdsc.mbt.viewables: This package provides classes which enable molecular biology data to be represented as visible/renderable viewable objects, plus, this package defines a top-level StructureDocument object to encapsulate the complete state of these objects and properties. edu.sdsc.mbt.viewers:this package provides classes which enable data to be graphically visualized or to be processed by other means based upon coordinated events (that is, any MBT component that wishes to observe data and then respond and interact to changes in that data).

Architecture - MBT MBT Classes StructureStyles and StructureMap used the most when developing ChimeParser.jj Spans the hierarchy can make direct calls to the basic components as well as accessing collections StructureStyles contains most of the methods to make changes such as color, selection, and visibility Method calls directly from Explorer.java : structurestyles.setbondselection ( index, index, true ); StructureMaps edu.sdsc.mbt package and arranges the components after a Structure object is created Accessed whenever need to isolate a component, like the atoms, and check each one in turn: for( int index = 0; index < bondcount; index++ ) { Bond bond = structuremap.getbond( index ); }

Architecture - MBT Explorer.java The viewer chosen to render the molecule Not fully implemented, but was designed to make basic changes to the molecule Other viewers developed to access specific type of component or chemical interaction (i.e.: hydrogen bonds or visualizing single nucleotide polymorphisms (SNP) ) Additional methods added to support ChimeParser.jj hbonds( boolean selected) labels( boolean selected ) selectatoms( boolean selected) setbonds( boolean selected) / setallbonds( boolean selected) setatomcolorbyrgb( float[] colors ) / setbondcolorbyrgb( float[] colors ) updateatomcolors( StructureStyles ss, boolean selected ) / updatebondcolors( StructureStyles ss, boolean selected )

Execution ChimeParser.jj first point of execution, responsible for parsing input commands Running: % java ChimeParser Await input command to begin token manager i.e.: color list {listname} [color] {list of colors} broken down to defined tokens: <color>, <list>, <allowable string regular expression>, [<color>], (<valid color>)+

Execution ChimeParser.jj Parser was then applied to token stream to check for correctness Initializes an Explorer object when executes the main method Drops into the command functions, checking each for possible entrance Once finds matching first token, enters into that command s parsing tree (left-decent), comparing to operating grammar

Execution Program execution % javacc ChimeParser.jj Creates 7.java files: ChimeParser.java transformation of grammar definitions to Java code ChimeParserConstants.java ChimeParserTokenManager.java ParseException.java SimpleCharacterStream.java because using default options with ChimeParser.jj Token.java TokenMgrError.java % javac *.java compiles all.java in directory (including Explorer.java ) % java ChimeParser [ < ] [<command filename>] < : force commands from file to be taken individually from Standard.in <command filename>: file user wishes to parse with no load command, commands will echo back to user if correct % <commands user wishes to parse>

Future Work Implement full Chime and RasMol libraries Improve the depth of the chemistry Cross-department work to redesign effect of commands on MBT Expanding supported input languages PyMol amazing graphics, interface is lacking Use ChimeParser.jj as backend for any visualization language that conforms to grammar rules (non-ambiguous) Artificial Intelligence Protein active site anticipation and extrapolation Study protein interactions with hypothetical molecules Drug research: i.e.: G-protein coupled receptors Add scripting ability directly to MBT Eliminate ChimeParser.jj and JavaCC tokenless? Additional implementation techniques Web-based Databases

References 1. MDL Chime http://www.mdl.com 2. RasMol Home Page http://www.umass.edu/microbio/rasmol/ 3. Norvell, Theodore S. (2004). The JavaCC Faq. Retrieved from Memorial University of Newfoundland Web site: http://www.engr.mun.ca/~theo/javacc-faq/javacc-faq.pdf 4. JavaCC Tutorial http://www.cs.rit.edu/~jmg/javacc/ 5. The Molecular Biology Toolkit. J.L. Moreland, A.Gramada, O.V. Buzko and P.E. Bourne 2005 The Molecular Biology Toolkit (MBT): A Modular Platform for Developing Molecular Visualization Applications. BMC Bioinformatics, 6:21. Funded by Grant NIGMS 1P01GM63208. Web site: http://mbt.sdsc.edu/ 6. Kenekin, T. (2005). New Bull s-eyes For Drugs, Scientific American. October 2005, pp. 50-57.