RUNNING MOLECULAR DYNAMICS SIMULATIONS WITH CHARMM: A BRIEF TUTORIAL

While you can probably write a reasonable program that carries out molecular dynamics (MD) simulations, it's often more efficient to use existing MD packages that are optimized to run on distributed supercomputing clusters. In this example we will use the CHARMM (Chemistry at HARvard Macromolecular Mechanics) integrator and force fields (www.charmm.org) to simulate a protein in both an implicit and an explicit solvent.

Before you get started, you will need to download and set up a few utilities:

(A) Access to a local terminal. If you're running Linux or OSX, a terminal utility is already built into your operating system. However, if you're running Windows, you will need to download and install PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html) and WinSCP (https://winscp.net/eng/download.php). PuTTY will be used to SSH and run commands in the terminal, while WinSCP will be used to transfer files to and from various machines.

(B) A user account on Knot. Knot is a supercomputing cluster located at the University of California, Santa Barbara. Your username on Knot will be tstc0#, where # represents your group number. For instance, if you were assigned to group 7, your Knot username would be tstc07. To check this in OSX/Linux, type 'ssh yourusername@knot.cnsi.ucsb.edu -p 43210'. For Windows users, you'll need to set up a new PuTTY profile that specifies knot.cnsi.ucsb.edu as the server name, 43210 as the port number, and your unique username as the user. When prompted for a password, type in SummerCharm2015 (that's Charm with one m). If you're able to log in to Knot successfully, go ahead and make a folder with your last name (e.g. 'mkdir mylastname'). For any subsequent work you perform on Knot, make sure to log in with your unique username, and only write to your own mylastname directory.
For a more complete list of terminal commands, check out http://www.dummies.com/how-to/content/how-to-use-basic-unix-commands-to-work-interminal.html. (Don't be fooled by the domain name!)

(C) VMD (Visual Molecular Dynamics). You can download VMD at http://www.ks.uiuc.edu/research/vmd/; however, you will need to register for a free account. Installation directions are available at http://www.ks.uiuc.edu/research/vmd/current/ig/ig.html. A detailed user guide for VMD can be found at http://www.ks.uiuc.edu/research/vmd/current/ug.pdf.

RUNNING MD WITH AN IMPLICIT SOLVENT

Here, we are going to simulate a 67-residue protein sequence found in the proto-oncogene tyrosine-protein kinase Fyn (PDB: 1NYG) in an implicit solvent. More information about this sequence can be found at http://www.rcsb.org/pdb/explore.do?structureid=1nyg. To help set this system up, we're going to enlist the help of CHARMM-GUI [1], a web server dedicated to setting up CHARMM simulations.

(1.) Go to http://www.charmm-gui.org and click on 'Input Generator' to the left. Then click on 'Implicit Solvent Modeller'. This should take you to http://www.charmm-gui.org/?doc=input/implicit
(2.) Choose the 'EEF1' implicit solvent model, and then enter '1NYG' under 'Download PDB file'. Click next.
(3.) We won't bother modifying any segment IDs (SEGIDs), names, or residue numbers, so click next once again.
(4.) Keep the default options here, and click next. (However, if you want to play around with the structure after the end of these exercises, feel free to mutate the structure at this step and compare it to the un-mutated protein structure.)
(5.) CHARMM-GUI should now have produced the files you need. Download them (in *.tgz format) on the right-hand side.
(6.) Go ahead and unzip the *.tgz file, and look through each of the files. Specifically, try opening step2_implicit.pdb in VMD by typing 'vmd step2_implicit.pdb' (in Windows, you'll need to open VMD separately and load the file manually). You should see a three-dimensional rendering of the protein, which you can move around and rotate. Also, try playing around with the molecular representations by clicking on Graphics > Representations from the top menu bar.
(7.) Upload your *.tgz file to Knot by typing 'scp -P 43210 -r charm-gui.tgz yourusername@knot.cnsi.ucsb.edu:/home/yourusername/yourlastname'. Note that scp takes the port as an uppercase -P, whereas ssh uses a lowercase -p. Make sure that you've already created the directory yourlastname before transferring files to it.
(8.) Now, log into Knot by typing 'ssh yourusername@knot.cnsi.ucsb.edu -p 43210' if you're running OSX or Linux. If you're running Windows, select the appropriate Knot profile in PuTTY and click connect.
(9.) You should initially be located in your home directory when you log into Knot, so navigate to your mylastname directory once you log in, which should contain the tgz file you just uploaded.
Unzip/untar the file by typing 'tar zxvf charm-gui.tgz' (or whatever name you chose for your tgz file), and then navigate to the directory it just created (usually called 'charmm-gui').
(10.) The first thing we'll want to do is edit the step2_implicit.inp CHARMM input file. Go ahead and change the value of 'nstep' from 100 to 5000. Time is counted in units of dt, which is 2 femtoseconds (0.002 ps) here. Therefore, 5000*dt = 5000*0.002 ps = 10 ps. Because the input file tells CHARMM to write to the trajectory every 500*dt (1 ps), that should give us 10 frames to work with.
(11.) Now that we're ready to run CHARMM, we need to do so on compute nodes that are optimized for computation, rather than on the login node that we're typing commands into. To submit our job to the Knot scheduler, copy /home/zlevine/submit to your working directory. This file will be used to submit simulation jobs on Knot and to request distributed computing resources.
(12.) After you copy the submit file to your working directory, edit the file and change the email address towards the top to your own email address (this will notify you when your job has begun running or has completed). Then, update the CHARMM_INPUT variable so that it has a value of 'step2_implicit.inp'. This will be the input file that CHARMM runs during submission.
(13.) Submit your job by typing 'qsub submit'. You can check the status of your job by typing `qstat -u yourusername`. Because the job is queued, you can log out and log back in to check the status of your jobs at any time without losing your progress.
(14.) When your job is done (i.e. when `qstat -u yourusername` returns no running jobs), you should see a new trajectory file called run.dcd. Since the trajectory is written in binary, you can only see what is in this file by downloading run.dcd and the initial pdb file (step1_pdbreader.pdb) to your computer, and loading them into VMD ('vmd step1_pdbreader.pdb -dcd run.dcd'). Using the slider in the main VMD menu, you can cycle back and forth between different trajectory frames, and see the protein wiggle in time.
(15.) Another option for extracting data from the trajectory file is to output its contents in a text-readable format. To do this, we can use another input script to load the trajectory into CHARMM, and output its contents to individual pdb files. Copy /home/zlevine/write_pdb.inp to your working directory, and run it by typing 'charmm < write_pdb.inp'. This should create 10 consecutively numbered pdb files, one for every 1 ps of our 10 ps trajectory.
(Note: This is one of the only times that we should directly call CHARMM from the Knot login node, since the computational requirements are small. When users perform too much computation on the shared login node, the administrative gods usually punish them by temporarily disabling their accounts. So, try to keep computation on the login node to a minimum.)
(16.) Finally, in order to analyze these pdb files, we can use scripts (written in, e.g., Python or Perl) to quickly extract the data we want. Go ahead and copy the Perl file /home/zlevine/analysis.pl to your working directory (which contains your newly created pdb files). This file will parse your pdb files and calculate various quantities from them. Once the script is copied, you can run it by typing `perl analysis.pl`. This will write the protein's end-to-end distance (Ree) and its radius of gyration (Rg) to the files Ree_log and Rg_log, respectively. We will want to compare these values to those derived from explicitly solvated proteins in the next section, and see how similar (or dissimilar) they are to one another.
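
For readers who would like to see what such an analysis involves, here is a minimal Python sketch of the two quantities. It is not the contents of analysis.pl, which is not reproduced in this tutorial. It assumes that Ree is measured between the first and last alpha-carbon (CA) atoms, and that Rg is the unweighted root-mean-square distance of the CA atoms from their centroid; the actual script may use all atoms or mass weighting. The function names are hypothetical.

```python
import math


def read_ca_coordinates(pdb_text):
    """Return a list of (x, y, z) tuples, one per alpha-carbon ATOM record."""
    coords = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name in 13-16, x/y/z in 31-38, 39-46, 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    return coords


def end_to_end_distance(coords):
    """Distance between the first and last CA atoms (assumed definition of Ree)."""
    return math.dist(coords[0], coords[-1])


def radius_of_gyration(coords):
    """Unweighted RMS distance of the atoms from their centroid (assumed definition of Rg)."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    mean_square = sum(math.dist(c, centroid) ** 2 for c in coords) / n
    return math.sqrt(mean_square)
```

To apply this to one of the pdb files written in step 15, read the file into a string (for example with open(path).read(), where path is whichever file name write_pdb.inp produced) and pass it to read_ca_coordinates. Both results are in angstroms, the length unit of PDB coordinates.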

Reference(s):
[1] Jo, S., Kim, T., Iyer, V. G. and Im, W. (2008), CHARMM-GUI: A web-based graphical user interface for CHARMM. J. Comput. Chem., 29: 1859-1865. doi: 10.1002/jcc.20945

RUNNING MD WITH AN EXPLICIT SOLVENT

This simulation will be similar to the implicit solvent section, but with the addition of discrete water molecules solvating the protein (versus a continuum of water). Make sure you run these new simulations in a separate folder from the previous section, since you don't want to accidentally overwrite files that you created earlier. For example, you might create an 'explicit' folder in your yourlastname directory.

(1.) As before, go to http://www.charmm-gui.org and click on 'Input Generator', but this time select 'Quick MD Simulator' on the left menu bar. This should take you to http://www.charmm-gui.org/?doc=input/mdsetup.
(2.) Enter '1NYG' for the PDB file, then click next on the lower right-hand side.
(3.) As before, we don't need to update the segment IDs or residue numbers, so go ahead and click next again.
(4.) Moreover, we don't need to add anything exotic, so click next.
(5.) Here, CHARMM-GUI allows us to download our files so far (as was the case for implicit water), but instead let's continue using the web server so that it generates more content for us. You can choose varying box sizes (in angstroms) and electrolyte concentrations here, but let's stick with the default values for now. Click next. (Note that because CHARMM-GUI does a lot of the initial heavy lifting server-side, it may take some time to progress to the next step. Tip: using CHARMM-GUI at 'unpopular' times will significantly speed up computation from one step to another.)
(6.) Eventually* you will get to the next step, where you can (once again) use the default values for invoking periodic boundary conditions. Click next here. (* You may start to see CHARMM-GUI slow down right about now. This is normal.)
(7.) Now, unselect all of the output formats except for 'CHARMM/OpenMM'. Keep the equilibration ensemble set to canonical (NVT), and the dynamics ensemble set to NPT. This equilibrates our water/protein system at constant volume first. Afterwards, we will allow our box volume to relax while a constant pressure (of 1 bar) is maintained in all directions. Click next.
(8.) Finally, when we arrive at the final step of CHARMM-GUI, we can download our zipped files in *.tgz format (on the right-hand side). This should contain all of the files you've generated so far.
(9.) Unzip the *.tgz file, and take a look at some of the files. In particular, load step3_pbcsetup.pdb into VMD by typing 'vmd step3_pbcsetup.pdb'. Notice the abundance of discrete water molecules, and the multiple ways you can interact with your pdb file (by changing, e.g., textures/color/viewpoints/spotlights/etc.).
(10.) Now we need to upload these files to Knot. As before, navigate to the directory where your tgz file is located, and type 'scp -P 43210 -r charmm-gui/ yourusername@knot.cnsi.ucsb.edu:/home/yourusername/yourlastname/explicit' if you're running OSX or Linux. Similarly, use WinSCP if you're running Windows. (I'm assuming that you created a separate 'explicit' directory for this part of the tutorial.)
(11.) Log into Knot (in your terminal) by typing 'ssh yourusername@knot.cnsi.ucsb.edu -p 43210'. Or in PuTTY, select the Knot profile and connect. Navigate to the 'yourlastname/explicit' directory, where you just uploaded your files.
(12.) Unzip/untar your file by typing 'tar -zxvf file.tgz'. Then navigate to the directory that you just created (most likely also named charmm-gui, hence the importance of using a separate directory). CHARMM-GUI generated the appropriate run input (.inp) files for us, but we still need to run the actual simulations on a distributed supercomputing cluster (like Knot).
(13.) Edit step4_equilibration.inp (using 'nano' or 'vi'). Insert 'BOMLEV -6' after the initial header (marked by asterisks). This allows us to proceed even if a certain number of errors are encountered (which only come up here because we are using a development version of CHARMM). In general though, it is not advisable to artificially ignore errors!
Additionally, insert 'open write unit 13 file name trajectory.dcd' immediately after 'open write unit 12 card name step4_equilibration.rst'. This will open a trajectory file that we will subsequently write to. Then, substitute 'iuncrd 13' in place of 'iuncrd -1', and 'nsavc 1000' for 'nsavc 0'. This will direct the coordinates to be written to the trajectory file (on unit 13) that we just opened, at a frequency of 1000*dt, or every 1 ps (in this exercise, dt = 1 femtosecond).
(14.) To run this simulation, copy /home/zlevine/submit to your CHARMM directory. Edit the file as you did with implicit water, and make sure that the CHARMM_INPUT variable is set to 'step4_equilibration.inp'.
(15.) Submit your job by typing 'qsub submit'. You can check the status of your job by typing `qstat -u yourusername`. Jobs can sometimes take time to queue and subsequently run, so you may have to wait 30-60 minutes for this step. However, because the job is queued, you can log out and check the status of your jobs at any time without losing your progress.
(16.) When your job is finished, it should have produced a trajectory.dcd file. This file contains the time evolution of your simulation over 25 ps.
(17.) Try downloading your trajectory file (and the initial pdb file, 'step3_pbcsetup.pdb') onto your personal computer. Then open the trajectory locally by typing 'vmd step3_pbcsetup.pdb -dcd trajectory.dcd'. You should be able to move the slider in the VMD control panel, and see the various water/protein structures move around. Notice the thermal fluctuations of individual water molecules.
(18.) Back on Knot, copy the 'write_pdb.inp' file from the implicit example, and add the 'BOMLEV -6' line after the initial header. Now, see if you can modify this script to extract 25 ps of information (i.e. 25 frames) from trajectory.dcd. Note that you will have to change some input file names that are hard-coded into the .inp file.
(19.) Run the input file by typing 'charmm < write_pdb.inp' to extract pdb files every 1 ps. You should have 25 files in total.
(20.) Copy /home/zlevine/analysis.pl and run it (as before) on the resulting PDB files. This will, once again, produce values for Ree and Rg.
(21.) How do the values for Ree and Rg in explicit water compare with those derived from implicit water models?
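
As a cross-check on the trajectory lengths quoted in both sections, the bookkeeping between the step count, the timestep, and the save frequency can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name is hypothetical, and the explicit-solvent step count of 25000 is inferred from the 25 frames saved every 1000 steps rather than taken from the input file, so check 'nstep' in your own step4_equilibration.inp.

```python
def trajectory_summary(n_steps, dt_ps, save_every):
    """Return (total_time_ps, n_frames, frame_spacing_ps) for an MD run.

    n_steps    -- number of integration steps (CHARMM 'nstep')
    dt_ps      -- timestep in picoseconds
    save_every -- steps between saved frames (CHARMM 'nsavc')
    """
    total_time_ps = n_steps * dt_ps
    n_frames = n_steps // save_every
    frame_spacing_ps = save_every * dt_ps
    return total_time_ps, n_frames, frame_spacing_ps


# Implicit solvent: 5000 steps of 2 fs, one frame every 500 steps.
implicit = trajectory_summary(n_steps=5000, dt_ps=0.002, save_every=500)

# Explicit solvent: 1 fs steps, one frame every 1000 steps.
# n_steps=25000 is an inferred value (25 frames x 1000 steps), not read from the input file.
explicit = trajectory_summary(n_steps=25000, dt_ps=0.001, save_every=1000)

for label, (total, frames, spacing) in (("implicit", implicit), ("explicit", explicit)):
    print(f"{label}: {total:.1f} ps total, {frames} frames, {spacing:.1f} ps between frames")
```

Keeping the timestep in picoseconds throughout avoids the femtosecond/picosecond mix-ups that are easy to make when converting step counts into simulated time.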