COSC 6385 Computer Architecture - Homework, Fall 2008
1st Assignment

Rules
- Each team should deliver:
  - Source code (.c, .h, and Makefile files). Please: no .o files and no executables!
  - Documentation (.pdf, .doc, .tex, or .txt file)
- Deliver electronically to gabriel@cs.uh.edu
- Expected by Monday, October 13, 11:59pm
- In case of questions: ask the TAs first; if they don't know the answer, they will ask me. Ask early, not the day before the submission is due.
About the Project
- Given: the source code for a sequential image segmentation code (file cosc6385-hw.tar.gz).
- You can open the archive with: tar xzvf cosc6385-hw.tar.gz
- The archive contains, among others, the following files:
  - Makefile          /* To compile everything on Linux/Unix */
  - main.c            /* The only file that you have to modify! */
  - OutputStack.conf  /* A configuration file to be used for the hw */
  - OutputStack.raw   /* The raw image */
- A sequential code performing image segmentation on a multi-spectral image; code provided by Shishir Shah.
- Input: a flat image (no compression) and a configuration file.
- Start the application by:
  - Compiling: just type make
  - Running: allocate a node (see later in the lecture), then type ./multiscalegabor OutputStack.conf
Configuration file
OutputStack.raw   // name of the image
1040              // image height
1392              // image width
3                 // no. of segments to be created
1                 // smoothing flag, 0: no smoothing, 1: smoothing
0                 // write texture information to file, 0: no, 1: yes
0                 // fftw flag, 0: FFTW_ESTIMATE, 1: FFTW_MEASURE
4                 // no. of channels, e.g. bw: 1, color: 3

The source code
The source code consists of the following parts:
- Domain 1: Perform I/O operations (e.g. read image, write texture information, write result of segmentation) (Steps 4, 6c, and 11)
- Domain 2: Create a filter bank of Gabor filters (Step 5)
- Domain 3: Perform a convolution operation of each filter on the image (Steps 2, 6a, 6b, and 6d)
- Domain 4: Determine texture statistics and clustering (Steps 8 and 9)
- Domain 5: Perform spatial smoothing on the labels (Step 10)
Part 1
- Instrument the main file main.c in order to use hardware performance counters to determine the behavior of each Domain described on the previous page separately.
- The hardware performance counters should be based on the PAPI library. You could monitor, for example:
  - Level 2 cache hits and Level 2 cache misses
  - Number of floating point operations and integer instructions
  - Floating point performance
- Whether you can access these values will depend on the processor you are actually using!
- Please note that counter values might overflow. PAPI can handle that, but you have to include special function calls for it.

Part 2
- Run the modified code on the shark cluster.
- Generate graphs for at least 3 PAPI hardware counters, showing the values for each Domain separately.
- Please document (you can use PAPI to figure many of these things out!):
  - Processor type and frequency
  - Operating system (as precisely as possible)
  - Cache sizes
- Each team has a single account.
Part 3 (only for the two-person teams!)
- Generate an estimate of the cache usage of the original code (without PAPI calls in it) using the valgrind toolkit with cachegrind, e.g.
  valgrind --tool=cachegrind ./multiscalegabor OutputStack.conf
- If possible, compare the data produced by valgrind to the data obtained with PAPI.
- Note: the execution of the application under valgrind/cachegrind will be significantly slower than without it!

Notes
- The PAPI version installed on shark is 3.6.0.
- On the front-end node you can find tons of examples in C and Fortran on how to use PAPI in /opt/papi-3.6.0/share/examples/, e.g.:
  - src/ctests/avail.c: how to check on a processor whether a counter is available
  - src/ctests/high_level.c: how to use the high-level API of PAPI
  - src/ctests/memory.c: how to extract information about the memory subsystem (e.g. cache sizes)
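A typical cachegrind session for Part 3 might look like the following (the pid in the output file name is an example; use the one valgrind actually prints):

```sh
# Run the original, un-instrumented binary under the cachegrind
# simulator; this writes a cachegrind.out.<pid> file.
valgrind --tool=cachegrind ./multiscalegabor OutputStack.conf

# Summarize the simulated cache statistics per function.
cg_annotate cachegrind.out.12345
```

Note that cachegrind reports simulated cache events, while PAPI reports the real hardware counters, so the two sets of numbers will be close but not identical.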
1st Assignment
The documentation should contain:
- (Brief) problem description
- Solution strategy
- Results section:
  - Description of resources used
  - Description of measurements performed
  - Results (graphs + findings)

The document should not contain:
- Replication of the entire source code (that's why you have to deliver the sources)
- Screen shots of every single measurement you made; actually, no screen shots at all
- The slurm output files
How to use a cluster
- A cluster usually consists of a front-end node and compute nodes.
- You can log in to the front-end node using ssh (from Windows or Linux machines) with the login name and the password assigned to you.
- The front-end node is there for editing and compiling, not for running jobs! If 40 teams ran their jobs on the same processor, everything would stall!
- To allocate a node for interactive development:
  teamxy@shark:~> salloc -N 1 bash
  teamxy@shark:~> squeue
    JOBID PARTITION NAME USER  ST TIME NODES NODELIST(REASON)
      489      calc      smith R  0:02     1 shark08
  teamxy@shark:~> ssh shark08

How to use a cluster (II)
- Once your code is correct and you would like to do the measurements, you have to submit a batch job.
- The command you need is sbatch, e.g.
  sbatch -N 1 ./ImageAnalysis.sh
- Your job goes into a queue and will be executed as soon as a node is available.
- You can check the status of your job with squeue.
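As an illustration, a minimal batch script for the sbatch step could look like this (a sketch only; the actual ImageAnalysis.sh provided on the webpage may differ):

```sh
#!/bin/sh
# Hypothetical minimal batch script. All stdout/stderr from the run
# ends up in slurm-<jobid>.out in your home directory.
./multiscalegabor OutputStack.conf
```

Remember that the script has to be executable (chmod +x) before you submit it.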
How to use a cluster (III)
- The output of squeue gives you a job-id for your job.
- Once your job finishes, you will have a file called slurm-<jobid>.out in your home directory, which contains all the output of your printf statements, etc.
- Note: the batch script used for the job submission (e.g. ImageAnalysis.sh) has to be executable. This means that after you have downloaded it from the webpage and copied it to shark, you have to type:
  chmod +x ImageAnalysis.sh
- Please do not edit the ImageAnalysis.sh file on MS Windows. Windows uses different end-of-line markers than UNIX, and this confuses slurm when reading the file.

Notes
- PAPI project webpage: http://icl.cs.utk.edu/papi
- PAPI Programmer's guide: http://icl.cs.utk.edu/projects/papi/files/documentation/papi_prog_ref.pdf
- PAPI User's guide: http://icl.cs.utk.edu/projects/papi/files/documentation/papi_user_guide_306.pdf
- If you need hints on how to use a UNIX/Linux machine through ssh: http://www.cs.uh.edu/~gabriel/cosc4397_s06/parco_08_introductionunix.pdf
- How to use a cluster such as shark: http://pstl.cs.uh.edu/resources.html