

A Guide to Condor
Joe Antognini
October 25, 2013

1 Condor is on Our Network

What is an Our Network? The computers in the OSU astronomy department are all networked together. In fact, they're networked together in the most intimate of ways. You will find that if you unplug your ethernet cable your computer will not work so good. Things that you take for granted on your computer will not work. Things like ls. So don't unplug your computer.

One of the consequences of this is that you can see the files on everyone's computer on the network. Just go to /home and you'll see all the computers on the network. If you have files on your computer that you don't want anyone to see, remember to change your permissions appropriately (chmod 600 foo).

Another consequence of this is that you can harness the power of anyone else's machine. All you have to do is ssh into their computer. Then you can run your favorite programs using their CPU. This is useful if they have a more powerful computer than you or you are already using all of your cores. (Though if you have more than a few jobs to run, you really should use Condor, which we'll get to in a bit.)

It's good form to ask the owner if you can use their computer if they're going to be using it while your job is running. After all, you don't want your jobs to interfere with their ability to watch cat videos on YouTube. Moreover, if you're going to use their computer, remember to use nice -n 19 foo_program. This will set your job to lowest priority so that it doesn't interfere with anything they're running.

If you don't like the owner of the computer you're using, you may be tempted to use nice to set the priority of your job to be as high as possible so that the owner can't use his or her computer at all. But as it turns out you can't set the priority to anything higher than the owner's default priority unless you're root. So you'll just have to politely ask David Will for the password to root.

One other thing to remember is that nice only limits your CPU usage. Some jobs don't use a lot of processing power, but spend a lot of time shuffling bits around the computer and are therefore I/O limited. If this is the case, running nice won't prevent you from slowing down your nice friend's computer. If you believe that your job will involve a lot of I/O, preface the command with ionice, as in ionice -c 3 foo_program. (Class 3 is the idle I/O class: your job gets disk time only when nothing else wants it.)
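The two politeness commands above can be combined into one small helper. This is only a sketch: the function name polite_run is made up for illustration, and it assumes the remote machine has nice installed (ionice is used only if it is present and permitted).

```shell
#!/bin/bash
# polite_run (hypothetical helper): run a command at the lowest CPU priority
# and, where possible, in the idle I/O class.
polite_run() {
    # Probe ionice with a no-op first, since not every system provides or permits it.
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
        ionice -c 3 nice -n 19 "$@"
    else
        # Fall back to CPU niceness only.
        nice -n 19 "$@"
    fi
}
```

After ssh'ing into the other machine and defining the function, you would run something like polite_run ./foo_program, where foo_program stands in for your own program.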

2 Who Is This Condor and What Is He Doing to My Computer?

ssh'ing into other people's computers to run your jobs is all well and good, and certainly better than running them all on your computer if you have a lot. But there's a better way: the Condor Way. Condor takes advantage of all the computers in the network to run large batches of computing jobs. All you do is submit all the jobs you want run to Condor, and then Condor will automagically distribute them to all the computers on the network. The jobs will then run on the spare CPU cycles of those computers.

2.1 Condor is opt-in

By default, your computer will not be on the Condor network; you have to opt in. You should do that. If you have large batches of jobs to run, Condor will help immensely. If you don't plan on running large batches of jobs, you should do it anyway for the benefit of those who do. By design, putting your computer on the Condor network should have no adverse consequences. Some people will claim that Condor slows down your computer. I think they're full of it. Having sixty Firefox tabs open with YouTube videos will make your computer slow.[1] The only thing Condor did was give them something to blame their slow computer on. But even if you're inclined to believe them, put your computer on the network anyway and see for yourself. If you think your computer is running unbearably slow, you can always ask David Will to remove you from the network later. (Don't feel guilty about it; it's easy for him to do that.)

2.2 Is Condor for you?

You will find Condor useful if you have a program you have to run a large number of times (10 or more), maybe with different arguments on each run. In order for this whole Condor thing to work, your program has to satisfy two constraints: (1) the program has to run independently on everyone else's computer, and (2) the program has to print all its output to stdout or stderr.

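As a rough illustration of constraint (2), here is what a well-behaved job might look like in bash. The function run_job and its arithmetic are invented stand-ins for a real computation. The point is only that results go to stdout and diagnostics go to stderr, with no files opened by the program itself.

```shell
#!/bin/bash
# run_job (hypothetical): stand-in for a real computation.
run_job() {
    local bar="$1"
    # Diagnostics go to stderr.
    echo "starting job with bar=${bar}" >&2
    # Results go to stdout; nothing is written to a file directly.
    echo "${bar} $((bar * bar))"
}
```

Calling run_job 3 prints "3 9" on stdout and the "starting job" message on stderr.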
To unpack these two constraints a bit: before submitting a batch of jobs to Condor, make sure that your program can run on everyone else's machine independently. This just means that your program shouldn't depend on any libraries or files that exist only on your own machine. If you have 100 jobs running across the network that are all calling for a file on your computer, your computer will be slow and none of the jobs will run quickly.

For a similar reason, make sure that your program is not writing its output directly to a file. Instead print your program's output to stdout. If you have 100 jobs across the network which are all trying to write files to your computer, none of them will be able to do so efficiently and your jobs will all run super slowly. Condor works best by dealing with stdout and stderr. Condor will collect all the output to stdout and save it up and then write it to your computer in batches so that the jobs aren't slowed down by waiting for your computer to write data.

[1] Seriously, though, Firefox has a memory leak, so you'll benefit from closing it occasionally.

3 How to Use Condor for Fun, ???, and Profit

So you're convinced. You see you and Condor living together happily ever after. More specifically, you have four thousand jobs you need to run before tomorrow. What do?

3.1 Submitting individual jobs

Each job is submitted individually to Condor. So the first thing you have to know to submit large batches to Condor is how to submit an individual job. The heart of the Condor submission process is the Condor submit file. A Condor submit file is a text file that looks something like this:

    Executable = program_foo
    Requirements = (OpSys == "LINUX" && Arch == "X86_64")
    Rank = Machine == "milkyway.astronomy.osu.edu"
    universe = vanilla
    arguments = --foo 123 --bar 456 --baz 789
    output = foo.dat
    error = foo.err
    log = foo.log
    queue 1

Okay, let's go over this. The first line (Executable) just says what program you want to run. Don't put arguments here. Save those for later. One thing to note here is that this program is going to run on someone else's computer. (Which is probably what you're hoping!) So make sure that your program doesn't require libraries that are only on your machine. Be sure that your program can run independently on everyone else's machines.

The Requirements line specifies that Condor should only put your jobs on 64-bit machines running Linux. If you want, you can edit this so that your jobs will only run on 32-bit machines or on both. (Though a lot of programs will only run on one kind of architecture, so test that out before submitting all your jobs.) There's not really much point in including 32-bit machines, though, since they don't have a lot of computing power on our network.
The only reason you might want to do it is if you don't have a whole lot of jobs you want to run (so you're not computing-power limited) and you do your code development on a 32-bit machine.

The Rank line lists machines that Condor should give priority to. Some computers on the network are more powerful than others, so you may want to have Condor preferentially put your jobs on certain computers. I've put Stanek's machine on here just as an example, but you can change this to whatever computers you want. You can also put multiple computers on this line by separating the computers' addresses like this:

    Rank = Machine == "foo.astronomy.osu.edu" || Machine == "bar.astronomy.osu.edu"

The universe line isn't important. Do not pay attention to the man behind the curtain.[2]

On the arguments line, just write down any arguments that you would supply to your program. The output line specifies the file to which Condor will write data printed to stdout. Similarly, the error line specifies the file to which Condor will write data printed to stderr. Finally, Condor itself will generate a log file which will contain information like when the job was submitted, when it started running, when it finished, etc. The log line specifies the file to which Condor will write this log. The last line is the queue line. This tells Condor how many times to run this job. If you want to run the exact same program 100 times, change this line to queue 100.

So that's the Condor submit file! Suppose you have saved it to a file called C_Submit. Now to submit this job to Condor, all you have to do is run condor_submit C_Submit.

[2] There are fancier versions of Condor that can do much cooler things. But it's a pain for David Will to install, so he hasn't done it. The vanilla universe just specifies that we are using the most basic version of Condor. But if enough people start using Condor, we might be able to convince David Will that it's worth his time to upgrade to a cooler version of Condor!

3.2 Submitting many jobs

So what if you want to submit a whole bunch of jobs with different arguments? Well, all you have to do is submit a whole bunch of individual jobs many times. The easiest way to do this is through a bash script. Suppose you wanted to run a program called program_foo 100 times with an argument --bar x where x varies from 1 to 100. Then you would write a bash script which generates the appropriate C_Submit file and then submits it to Condor. It should look something like this:

    #!/bin/bash
    i=1
    iend=100
    # -le (not -lt) so that the last job, i = 100, is included
    while [ $i -le $iend ]; do
        # Single quotes keep the parentheses, && and double quotes literal
        echo 'Executable = program_foo' > C_Submit
        echo 'Requirements = (OpSys == "LINUX" && Arch == "X86_64")' >> C_Submit
        echo 'Rank = Machine == "milkyway.astronomy.osu.edu"' >> C_Submit
        echo "universe = vanilla" >> C_Submit
        echo "arguments = --bar ${i}" >> C_Submit
        echo "output = foo_${i}.dat" >> C_Submit
        echo "error = foo_${i}.err" >> C_Submit
        echo "log = foo_${i}.log" >> C_Submit
        echo "queue 1" >> C_Submit
        condor_submit C_Submit
        # Brief pause between submissions
        sleep 0.2
        i=$((i + 1))
    done

One note here is that I've added a sleep command. If you try to submit jobs to Condor too rapidly, Condor can sometimes get confused.

3.3 A more elegant way of doing something similar

If your index runs from 0 up to n - 1 for some number n (say, 500), you don't have to explicitly write a loop. Instead, you can just submit a single Condor submit file which looks like this:

    Executable = program_foo
    Requirements = (OpSys == "LINUX" && Arch == "X86_64")
    Rank = Machine == "milkyway.astronomy.osu.edu"
    universe = vanilla
    arguments = --bar $(Process)
    output = foo_$(Process).dat
    error = foo_$(Process).err
    log = foo_$(Process).log
    queue 500

The queue line tells Condor to run this process 500 times, and $(Process) is a variable which runs from 0 to 499.

4 Some other loose ends

So that's it! There are a few other things that will be useful to know. You may want to check on the status of your jobs and on the status of the Condor network in general. The two commands that will be most useful are condor_q and condor_status. The condor_status command lists all the computers on the Condor network and says whether they're running a job or not. The other command is condor_q, which will list all the jobs which have been submitted to Condor. You may be only interested in checking to see how your jobs are getting along. In that case, type condor_q username to just display your own jobs. If you don't want to see every individual job, but just the total number you have left, pipe the output to tail -1.

Another command you might use on occasion is condor_run. If you have just one job you want to run on Condor (say, foo.sh), all you have to do is condor_run foo.sh. This can be useful if you want to test something out and you don't want to go to all the trouble of making a C_Submit file.

Finally, if you realize you've made a mistake and you want to get rid of your jobs, just type condor_rm username. This will remove all of your jobs. You can also remove individual jobs by typing condor_rm <Condor ID>.

4.1 Debugging

Sometimes you may submit a bunch of jobs only to find them all sitting idle. To see what's going wrong, the command condor_q -analyze is inordinately useful. In all probability, what has happened is that you have made a typo in the Requirements line of your C_Submit file such that none of the computers on the network fulfill your requirement. The condor_q -analyze command will suggest which requirements to change.

4.2 I/O heavy jobs

If your job involves a lot of I/O, the people will start to become restless and grumble. Condor only prevents your jobs from using too much CPU time. It won't do anything to prevent someone's computer from reading and writing a lot of data. If your job is I/O limited, you will slow down people's computers and people will hate you more than they already do. To get around this, change your executable to ionice and put your own program as an argument. As you might guess, ionice is like nice, but for I/O operations instead of CPU cycles. It has a different priority system though, so read the documentation.

4.3 A last note on long jobs

If your job takes longer than a day to run, you may run into problems. Users who haven't run jobs in a while (about a day or more) are given priority. If your job takes longer than a day, you might find your job booted to make room for a newer user.
This is generally not too big a deal since your job will start running again once the new user's job has finished running. But if two people both want to run jobs that take longer than a day, they'll alternately get kicked off to make room for each other and no one's jobs will get done. In those situations you should talk to the other person and come to some agreement in the Department Thunderdome.
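Going back to the ionice trick from Section 4.2, the following sketch prints a submit file that uses ionice as the executable and passes your own program as an argument. Several things here are assumptions rather than anything this guide specifies: the helper name write_ionice_submit, the location /usr/bin/ionice, the idle class (-c 3), and that your program's path is visible from every machine via the shared /home described in Section 1.

```shell
#!/bin/bash
# write_ionice_submit (hypothetical helper): print a Condor submit file that
# runs a program under ionice. Assumes ionice lives at /usr/bin/ionice.
write_ionice_submit() {
    local program="$1"   # path to your program on the shared filesystem
    local njobs="$2"     # how many copies to queue
    # \$(Process) is escaped so the shell leaves it for Condor to expand.
    cat <<EOF
Executable = /usr/bin/ionice
Requirements = (OpSys == "LINUX" && Arch == "X86_64")
universe = vanilla
arguments = -c 3 ${program} --bar \$(Process)
output = foo_\$(Process).dat
error = foo_\$(Process).err
log = foo_\$(Process).log
queue ${njobs}
EOF
}
```

You might use it as write_ionice_submit /home/yourmachine/program_foo 500 > C_Submit followed by condor_submit C_Submit, where the path is a placeholder for wherever your program actually lives.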

5 Acknowledgements

Sadly, I was so tired when writing a draft of this document that Ben Shappee managed to improve it. Rubab Khan also told me it would be a good idea to talk about condor_run. So I did.

The End.