Operating Systems, Assignment 2 Threads and Synchronization

Responsible TAs: Zohar and Matan

Assignment overview

The assignment consists of the following parts:
1) Kernel-level threads package
2) Synchronization primitives: mutex, condition variable (Mesa and Hoare)
3) A monitor solution for a synchronization problem (choosing grading slots)

Task 0: Running xv6

As always, begin by downloading our revision of xv6 from the os152 repository. Open a shell and change to the desired working directory. Execute the following command:

git clone http://www.cs.bgu.ac.il/~os152/xv6_2.git

This will create a new folder that will contain all of the project's files. Build xv6 by calling: make. Run xv6 on top of QEMU by calling: make qemu.

Kernel level threads (KLT)

Kernel level threads (KLT) are implemented and managed by the kernel. Our implementation of KLT will be based on a new struct you must define, called "thread". One key characteristic of threads is that all threads of the same process share the same virtual memory space. You will need to decide, for each field in the proc struct, whether it stays in proc or moves to the new thread struct. Be prepared to explain your choices in the frontal check. Thread creation is very similar to process creation in the fork system call but requires less initialization; for example, there is no need to create a new memory space.

1) We added a file named kthread.h which contains some constants, as well as the prototypes of the KLT package API functions. Review this file before you start.
2) Read this whole part thoroughly before you start implementing.
3) Understand what each of the fields in the proc struct means to the process.

Task 1: Kernel threads

To implement kernel threads as stated above, you will first have to convert the entire system to work with the new thread struct. After doing so, every process will have one initial thread running in it.
Once this part is done you will need to create the system calls that will allow the user to create and manage new threads.
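
One possible way to divide the fields between the two structs is sketched below. This is a simplified model that compiles on a regular host, not xv6 code: the choice of fields, the T_* state names, and the helper alloc_thread are assumptions made for illustration only, and NTHREAD is the per-process limit defined in Task 1.1. Your own split may differ, as long as you can justify it.

```c
#include <assert.h>
#include <stddef.h>

#define NTHREAD 16   /* per-process thread limit (see Task 1.1) */

/* Hypothetical thread states for this sketch. */
enum tstate { T_UNUSED, T_EMBRYO, T_RUNNABLE, T_RUNNING, T_SLEEPING, T_ZOMBIE };

struct proc;

/* Per-thread data: everything that describes one flow of execution. */
struct thread {
  int tid;               /* thread id, distinct from the pid */
  enum tstate state;     /* scheduling state of this thread */
  char *kstack;          /* every thread needs its own kernel stack */
  void *tf;              /* trap frame of this thread (opaque here) */
  void *context;         /* saved registers for context switches */
  void *chan;            /* channel the thread sleeps on, if any */
  struct proc *parent;   /* the process this thread belongs to */
};

/* Per-process data: everything shared by all threads of the process. */
struct proc {
  int pid;
  unsigned sz;                      /* size of the shared address space */
  void *pgdir;                      /* shared page table (opaque here) */
  int killed;                       /* set when the whole process must die */
  struct thread threads[NTHREAD];   /* thread table of this process */
};

/* Claim a free slot in p's thread table.
   Returns NULL when all NTHREAD slots are in use. */
struct thread *alloc_thread(struct proc *p, int tid)
{
  assert(p != NULL);
  for (int i = 0; i < NTHREAD; i++) {
    struct thread *t = &p->threads[i];
    if (t->state == T_UNUSED) {
      t->state = T_EMBRYO;
      t->tid = tid;
      t->parent = p;
      return t;
    }
  }
  return NULL;
}
```

In the kernel, a function like alloc_thread would also have to run under a lock, since two CPUs may try to create threads in the same process at the same time.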

Task 1.1: Moving to threads

This task involves going over a lot of kernel code and understanding every access to the proc field. Once this is done, figure out how to replace those accesses appropriately so that the code works with a new thread field. Another important issue is understanding how to synchronize changes in process state in a multi-CPU environment.

Add a constant to proc.h called NTHREAD, with the value 16. This will be the maximum number of threads each process can hold.

Every CPU holds CPU-specific data. This data is accessed through the same field name everywhere in the code, and the name resolves to the data of the CPU that is currently running. Looking at proc.h you can see two such fields: a pointer to the cpu struct, which contains CPU information, and a pointer to a proc struct, which is how you access the currently running process. You must change this so that the CPU can also access the thread struct of the currently running thread. The CPU locates this data using the gs register, so you need to set the memory address of the new thread field relative to the gs register of the current CPU. Below the cpu struct you can see how this is done for proc and cpu; try to explain why proc is at gs + 4.

As you have seen in assignment 1, xv6 implements a scheduling policy that goes over the list of processes and chooses the next RUNNABLE process in the array. You are expected to change the scheduler so that it goes over all the available threads of a specific process before proceeding to the next process. This is similar to the original scheduling policy, just over threads.

Handling a multi-CPU environment means that you will have to think about shared process information. For example, growproc is a function in proc.c which is responsible for obtaining more memory when the process asks for it. If it is not protected, two CPUs running threads of the same process that call this function at the same time may overwrite each other's update to the sz field and cause issues in the system.
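
The growproc hazard can be illustrated with a small host-side model. It is a sketch under assumptions, not xv6 code: the struct proc_model, its szlock field, and the function grow are hypothetical names, and the C11 atomic_flag only stands in for an xv6 spinlock with acquire and release.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical slice of the process struct: only the shared size field
   and a lock that guards it. */
struct proc_model {
  atomic_flag szlock;   /* stands in for an xv6 spinlock */
  unsigned sz;          /* shared by all threads of the process */
};

static void sz_lock(atomic_flag *l)
{
  while (atomic_flag_test_and_set(l)) {
    /* spin until the holder clears the flag */
  }
}

static void sz_unlock(atomic_flag *l)
{
  atomic_flag_clear(l);
}

/* Grow (or shrink, for negative n) the process size by n bytes.
   The read-modify-write of sz happens under the lock, so two threads
   of the same process can never both start from the same old value.
   Returns the size before the change. */
unsigned grow(struct proc_model *p, int n)
{
  assert(p != 0);
  sz_lock(&p->szlock);
  unsigned old = p->sz;
  p->sz = old + (unsigned)n;   /* in xv6 the memory is (de)allocated here */
  sz_unlock(&p->szlock);
  return old;
}
```

Without the lock, two threads could both read the same old value of sz and each write old + n, so that one of the two updates is lost.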
Consider synchronizing those shared process fields when they are accessed. Be prepared to explain why you did or didn't synchronize those accesses.

Expected behavior of existing system calls:
1) Fork should duplicate only the calling thread. If other threads exist in the process, they will not exist in the new process.
2) Exec should start running on a single thread of the new process. Notice that in a multi-CPU environment a thread might still be running on one CPU while another thread of the same process, running on another CPU, is attempting to perform exec. The thread performing exec should "tell" the other threads of the same process to destroy themselves upon exit from user mode, and only then complete the exec task. Hint: you can review the code that is performed when the proc->killed field is set and write your implementation similarly.
3) Exit should kill the process and all of its threads. Remember that while a thread running on one CPU is executing exit, other threads of the same process might still be running.

Task 1.2: Thread system calls

In this part of the assignment you will add system calls that will allow applications to create kernel threads. Implement the following new system calls:

int kthread_create( void*(*start_func)(), void* stack, uint stack_size );

Calling kthread_create will create a new thread within the context of the calling process. The newly created thread's state will be runnable. The caller of kthread_create must allocate a user stack for the new thread to use, whose size is defined by the MAX_STACK_SIZE macro in proc.h. This does not replace the kernel stack for the thread. start_func is a pointer to the entry function, which the thread will start executing. Upon success, the identifier of the newly created thread is returned. In case of an error, a non-positive value is returned.

The kernel thread creation system call on real Linux does not receive a user stack pointer. In Linux the kernel allocates the memory for the new thread stack. Since we haven't taught memory management yet, you will need to create the stack in user mode and send a pointer to its end in the system call. (Allocating the stack in the kernel, which we don't do in this assignment, means you will later need to manage virtual memory addresses when a stack is removed or created.)

int kthread_id();

Upon success, this function returns the calling thread's id. In case of error, a non-positive error identifier is returned. Remember, thread id and process id are not identical.

void kthread_exit();

This function terminates the execution of the calling thread. If called by a thread (even the main thread) while other threads exist within the same process, it shouldn't terminate the whole process. If it is the last running thread, the process should terminate. Each thread must explicitly call kthread_exit() in order to terminate normally.

int kthread_join(int thread_id);

This function suspends the execution of the calling thread until the target thread, indicated by the argument thread_id, terminates. If the thread has already exited, execution should not be suspended. If successful, the function returns zero. Otherwise, -1 should be returned to indicate an error.
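
The exit and join rules above can be summarized by a small decision model. It is a sketch that runs on a regular host, not kernel code: the types, the state names, and the functions model_exit and model_join are hypothetical, and treating a join on the caller's own id as an error is an assumption that the text does not spell out.

```c
#include <assert.h>

#define NTHREAD 16

enum tstate { T_UNUSED, T_RUNNABLE, T_ZOMBIE };

struct thread_model { int tid; enum tstate state; };
struct proc_model   { struct thread_model threads[NTHREAD]; };

/* Model of kthread_exit: mark thread `tid` as exited.
   Returns 1 if it was the last live thread (the process must now
   terminate), 0 if other threads keep the process alive, and -1 if
   the process has no live thread with that id. */
int model_exit(struct proc_model *p, int tid)
{
  assert(p != 0);
  int found = 0, others_alive = 0;
  for (int i = 0; i < NTHREAD; i++) {
    struct thread_model *t = &p->threads[i];
    if (t->state != T_RUNNABLE)
      continue;
    if (t->tid == tid) {
      t->state = T_ZOMBIE;
      found = 1;
    } else {
      others_alive = 1;
    }
  }
  if (!found)
    return -1;
  return others_alive ? 0 : 1;
}

enum join_result { JOIN_ERROR = -1, JOIN_DONE = 0, JOIN_MUST_WAIT = 1 };

/* Model of the decision kthread_join makes before it may sleep. */
enum join_result model_join(const struct proc_model *p, int self, int target)
{
  assert(p != 0);
  if (target == self)
    return JOIN_ERROR;               /* assumption: self-join is an error */
  for (int i = 0; i < NTHREAD; i++) {
    const struct thread_model *t = &p->threads[i];
    if (t->state == T_UNUSED || t->tid != target)
      continue;
    /* An already exited target must not suspend the caller. */
    return t->state == T_ZOMBIE ? JOIN_DONE : JOIN_MUST_WAIT;
  }
  return JOIN_ERROR;                 /* no such thread in this process */
}
```

In the kernel, JOIN_MUST_WAIT corresponds to putting the caller to sleep until the target thread exits, after which kthread_join returns 0.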
It is recommended that you write a user program to test all the system calls stated above.

Task 2: Synchronization primitives

In this task we will implement two types of synchronization primitives: mutex and condition variable. The mutex will be implemented in the kernel, and the condition variables will be implemented as a user library (in user space), based on mutexes. Your next task (task 3) will be to implement two different types of monitors (another synchronization mechanism). Each type is based on a different implementation of an appropriate condition variable. Hence, in this task, you are required to implement two types of condition variable: Mesa and Hoare.

Before implementing this task you should do the following:
1) Read about mutexes and condition variables in POSIX. Our implementation will emulate the behavior and API of pthread_mutex and pthread_cond. A nice tutorial is provided by Lawrence Livermore National Laboratory.
2) Examine the implementation of spinlocks in xv6's kernel (for example, the scheduler uses a spinlock). Spinlocks are used in xv6 in order to synchronize kernel code. Your task is to implement the required synchronization primitives as a kernel service to applications, via system calls. Locking and releasing can be based on the implementation of spinlocks to synchronize access to the data structures which you will create to support mutexes and condition variables.
3) Read about monitors and specifically about Hoare vs. Mesa monitors. A good tutorial can be found at Monitor, Monitor Types.

Task 2.1: mutex

Quoting from the pthreads tutorial by Lawrence Livermore National Laboratory:

Mutex is an abbreviation for "mutual exclusion". Mutex variables are one of the primary means of implementing thread synchronization and for protecting shared data when multiple writes occur. A mutex variable acts like a "lock" protecting access to a shared data resource. The basic concept of a mutex, as used in pthreads, is that only one thread can lock (or own) a mutex variable at any given time. Thus, even if several threads try to lock a mutex only one thread will be successful, and all other threads will get blocked until the mutex becomes available.

A mutex differs from a binary semaphore in its unlock semantics: only the thread which locked the mutex is allowed to unlock it, unlike binary semaphores, on which any thread can perform up.

Examine pthreads' implementation, and define the type kthread_mutex_t inside the xv6 kernel to represent a mutex object. You can use a global static array to hold the mutex objects in the system. The size of the array should be defined by the macro MAX_MUTEXES. The API for mutex functions is as follows:

int kthread_mutex_alloc();

Allocates a mutex object and initializes it; the initial state should be unlocked. The function should return the ID of the initialized mutex, or -1 upon failure.

int kthread_mutex_dealloc( int mutex_id );

De-allocates a mutex object which is no longer needed. The function should return 0 upon success and -1 upon failure (for example, if the given mutex is currently locked).
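
The global static array and the two functions described so far could be modeled as follows. This is a host-side sketch, not the required kernel implementation: the struct layout, the model_ function names, and the value 64 for MAX_MUTEXES are assumptions, and a real kernel version must protect the array with a spinlock.

```c
#include <assert.h>

#define MAX_MUTEXES 64   /* value assumed for this sketch only */

/* Hypothetical layout of one mutex object. */
struct kthread_mutex_model {
  int allocated;   /* is this slot in use? */
  int locked;      /* 1 while some thread owns the mutex */
  int owner_tid;   /* owning thread, meaningful only while locked */
};

static struct kthread_mutex_model mutexes[MAX_MUTEXES];

/* Returns the id of a freshly initialized, unlocked mutex,
   or -1 when every slot is already taken. */
int model_mutex_alloc(void)
{
  for (int id = 0; id < MAX_MUTEXES; id++) {
    if (!mutexes[id].allocated) {
      /* Invariant: a free slot is never locked, because dealloc
         refuses to free a locked mutex. */
      assert(!mutexes[id].locked);
      mutexes[id].allocated = 1;
      mutexes[id].locked = 0;
      mutexes[id].owner_tid = -1;
      return id;
    }
  }
  return -1;
}

/* Returns 0 on success; -1 for a bad id, an unallocated slot,
   or a mutex that is currently locked. */
int model_mutex_dealloc(int id)
{
  if (id < 0 || id >= MAX_MUTEXES)
    return -1;
  if (!mutexes[id].allocated || mutexes[id].locked)
    return -1;
  mutexes[id].allocated = 0;
  return 0;
}
```

Using the array index as the mutex ID keeps lookups trivial and makes it easy to validate an ID passed in from user space.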
int kthread_mutex_lock( int mutex_id );

This function is used by a thread to lock the mutex specified by the argument mutex_id. If the mutex is already locked by another thread, this call will block the calling thread (change the thread state to BLOCKED) until the mutex is unlocked.

int kthread_mutex_unlock( int mutex_id );

This function unlocks the mutex specified by the argument mutex_id if called by the owning thread. If there are any blocked threads, the longest waiting thread will acquire the mutex; this is called a fair mutex. An error code will be returned if the mutex was already unlocked, or if the mutex is owned by another thread. Note that the owner is not necessarily the thread that originally called kthread_mutex_lock: ownership can be passed to another thread by kthread_mutex_yieldlock, and the new owner is then the one that unlocks the mutex.

int kthread_mutex_yieldlock( int mutex_id1, int mutex_id2);

A Hoare condition variable requires the ability of one thread to yield a locked mutex directly to another thread. In order to do so, an appropriate system call is required. This function will provide this functionality by passing the lock mutex_id1 (locked by the calling thread) to one of the threads that are waiting on the lock mutex_id2. The call should pass both locks directly to one of the threads waiting on mutex_id2, and return 0 upon success. If the caller doesn't hold mutex_id1, it should fail and return -1. If there are no threads waiting on mutex_id2, the function should just take both locks from the caller (and return 0).

Implementation notes:
Unless stated otherwise (kthread_mutex_alloc returns the ID of the new mutex), all the above functions should return 0 upon success and a negative value upon failure.
No busy waiting should be used whatsoever, except for synchronizing access to the mutex array using short waits on a spinlock.
You are required to implement the exact API detailed here and in kthread.h. The graders will run automatic tests to validate your implementation.
You are allowed to add additional functions if you find it necessary for the implementation of the condition variables or the monitors. However, if you choose to do so, be ready to justify your decision in your frontal meeting with the grader.

Task 2.2: Condition variable

A condition variable is a synchronization mechanism that allows threads to suspend execution and relinquish the processor until some condition is satisfied. The basic operations on condition variables (CV) are:
Wait on the CV, suspending the thread's execution until another thread signals the CV and wakes up the waiting thread.
Signal a thread that is waiting on the CV.

A CV must always have an associated mutex, to avoid a race condition where a thread prepares to wait on a CV and another thread signals the CV just before the first thread actually starts waiting on it.
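
The race described above can be replayed deterministically, step by step, without real threads. The model below is only an illustration under assumptions: the struct world, the naive condition variable, and both replay functions are hypothetical, and the "consumer" and "producer" are just labeled steps executed in a fixed order.

```c
#include <assert.h>

/* A naive CV that only counts waiters and is not tied to any mutex. */
struct naive_cv { int waiters; };

struct world {
  int ready;              /* the condition the consumer waits for */
  struct naive_cv cv;
  int consumer_blocked;   /* 1 while the consumer is asleep on the CV */
};

/* Wake one waiter if there is one; otherwise the signal is lost. */
static void cv_signal(struct world *w)
{
  assert(w->cv.waiters >= 0);
  if (w->cv.waiters > 0) {
    w->cv.waiters--;
    w->consumer_blocked = 0;
  }
}

/* Register the consumer as a waiter and put it to sleep. */
static void cv_begin_wait(struct world *w)
{
  w->cv.waiters++;
  w->consumer_blocked = 1;
}

/* Interleaving that is possible when nothing ties the check to the wait.
   Returns 1: the consumer sleeps forever although ready == 1. */
int replay_without_mutex(void)
{
  struct world w = {0};
  int saw_ready = w.ready;   /* consumer: checks the condition, sees 0 */
  w.ready = 1;               /* producer: makes the condition true */
  cv_signal(&w);             /* producer: signals, but nobody waits yet */
  if (!saw_ready)
    cv_begin_wait(&w);       /* consumer: starts waiting on stale data */
  return w.consumer_blocked;
}

/* With a mutex held across check and wait, the producer cannot run in
   between, so only two orders remain. Both return 0 (not blocked). */
int replay_with_mutex(int consumer_first)
{
  struct world w = {0};
  if (consumer_first) {
    if (!w.ready)
      cv_begin_wait(&w);     /* check and wait as one atomic step */
    w.ready = 1;             /* producer runs afterwards */
    cv_signal(&w);           /* finds the waiter and wakes it */
  } else {
    w.ready = 1;             /* producer runs first */
    cv_signal(&w);
    if (!w.ready)
      cv_begin_wait(&w);     /* consumer sees ready == 1, never waits */
  }
  return w.consumer_blocked;
}
```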
When wait is called, it receives the mutex (in a locked state) as an argument and releases it atomically when the thread starts waiting, and locks it once again after that thread has received a signal (just before it returns from wait).

Mesa vs. Hoare:

There is a delicate difference between the Mesa and Hoare implementations of CVs. In a Mesa CV, when a thread calls signal it is guaranteed that the function will return immediately, and one of the threads (if any) waiting for the CV will receive the signal. However, the thread that received the signal may also have to wait for the mutex associated with the CV in order to continue running.

In Hoare's CV, the signaler yields the mutex. That is, once a thread sends a signal, it gives up control of the mutex associated with the CV, and passes it directly to the signaled thread. Hence, the signaler might be blocked for a while, until it regains control of that mutex. However, the thread receiving the signal is guaranteed to run immediately once it receives a signal, since the mutex is also passed to it directly. It is also guaranteed that after the thread has received the signal, the required condition for which it waits holds (Why? Why isn't this the situation in the Mesa implementation?). For further explanations and examples, read about Monitor Types.

Implementation:

Given a mutex, it is possible to implement a condition-variable library in user space. You are required to implement two such libraries, according to the APIs in mesa_cond.h and hoare_cond.h. Again, it is important to follow the API exactly in order to pass the automatic tests. First define the type mesa_cond_t in mesa_cond.h, and the type hoare_cond_t in hoare_cond.h.

The API for Mesa condition variable functions is as follows:

mesa_cond_t * mesa_cond_alloc();

Allocates a new condition variable and initializes it. The function should return a pointer to the initialized CV, or 0 upon failure.

int mesa_cond_dealloc(mesa_cond_t * cond);

De-allocates the condition variable object cond, which is no longer needed. Return 0 upon success and -1 upon failure (e.g., if there are threads waiting on cond).

int mesa_cond_wait(mesa_cond_t * cond, int mutex_id);

Blocks the calling thread until it is signaled. This routine should be called while the mutex is locked by the calling thread, and it will atomically release the mutex and put the thread to sleep. After a signal is received by the thread, it is awakened in a state where the mutex is locked for use by it. The thread is then responsible for unlocking the mutex when it is finished with it.

int mesa_cond_signal(mesa_cond_t * cond);

This routine is used to signal (wake up) another thread that is waiting on the condition variable. It should be called after the relevant mutex is locked, and it is the calling code's responsibility to unlock the mutex afterwards (so that the signaled thread will be able to complete its mesa_cond_wait call).

The API for Hoare condition variable functions is as follows:

hoare_cond_t * hoare_cond_alloc();

Exactly the same as mesa_cond_alloc().

int hoare_cond_dealloc(hoare_cond_t * cond);

Exactly the same as mesa_cond_dealloc().
int hoare_cond_wait(hoare_cond_t * cond, int mutex_id);

Similar to mesa_cond_wait. The only difference should be that once some thread calls signal, it also passes the mutex lock directly to the thread receiving the signal, hence that thread will run immediately after the signal is sent (this does not allow a third thread to lock the mutex between the call of signal and the awakening from wait). Return 0 upon success and -1 upon failure.

int hoare_cond_signal(hoare_cond_t * cond, int mutex_id);

Similar to mesa_cond_signal. Here the user should also pass a mutex_id, which is the mutex that should be passed directly from the signaler thread to the signaled thread. It must be called after the relevant mutex is locked. A call of this function may block the caller until the mutex is released by the signaled thread and locked again by this function, before it returns. Return 0 upon success and -1 upon failure.

Task 3: A Monitor solution for a Synchronization problem

Consider the following synchronization problem: in the BGU Operating Systems course, students have to register for grading slots in order to receive a grade for their assignment. Since the grader doesn't know how many students want to be graded, and she doesn't want slots to remain empty, she publishes only a few slots at a time. She waits for all slots to be taken by students, and only when all slots are taken does she publish a few more slots. This process continues until all students are done taking slots.

First, you are required to implement two types of monitors: mesa_slots_monitor and hoare_slots_monitor, which will implement the same API by the use of different condition variables (Mesa and Hoare, correspondingly).

Task 3.1: Monitors

The API for the slots monitors is as follows:

mesa/hoare_slots_monitor_t* mesa/hoare_slots_monitor_alloc();

Allocates a new monitor.

int mesa/hoare_slots_monitor_dealloc(mesa/hoare_slots_monitor_t* monitor);

Deallocates the given monitor.

int mesa/hoare_slots_monitor_addslots(mesa/hoare_slots_monitor_t* monitor, int n);

Adds n slots, and notifies the waiting students that there are free slots (so if some students are waiting for slots, they will take them and continue).
If the current number of slots is greater than zero, this function should block the caller until all slots are taken or until all students are done taking their slots.

int mesa/hoare_slots_monitor_takeslot(mesa/hoare_slots_monitor_t* monitor);

Takes one slot. If there are no slots, this function should block the caller until there is a free slot. If the last slot is taken, it should send a signal to notify that there are no more slots (so the grader will add more). Otherwise, if there are more free slots, it should notify the other students that there are free slots, which may result in a chain of signals from one student to another until all slots are taken.

int mesa/hoare_slots_monitor_stopadding(mesa/hoare_slots_monitor_t* monitor);

Should notify the grader to stop adding slots.
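
The decisions each monitor function has to make can be captured in a small state model, independent of the condition-variable flavor. It is a sketch under assumptions, not the required implementation: all names are hypothetical, and having addslots add nothing once stopadding was called is one reading of the text above.

```c
#include <assert.h>

/* Hypothetical monitor state: free slots and the stop request. */
struct slots_model { int free_slots; int stop; };

enum outcome { DONE, MUST_WAIT };
enum notify  { NOTIFY_NONE, NOTIFY_STUDENT, NOTIFY_GRADER };

/* A student tries to take one slot. `who` reports which side should
   be signaled when the call completes. */
enum outcome model_takeslot(struct slots_model *m, enum notify *who)
{
  *who = NOTIFY_NONE;
  if (m->free_slots == 0)
    return MUST_WAIT;                 /* block until slots are added */
  m->free_slots--;
  /* Last slot taken: tell the grader. Otherwise pass the news on to
     the next student, forming the chain of signals. */
  *who = (m->free_slots == 0) ? NOTIFY_GRADER : NOTIFY_STUDENT;
  return DONE;
}

/* The grader tries to publish n more slots. */
enum outcome model_addslots(struct slots_model *m, int n, enum notify *who)
{
  assert(n > 0);
  *who = NOTIFY_NONE;
  if (m->stop)
    return DONE;                      /* assumption: add nothing after stop */
  if (m->free_slots > 0)
    return MUST_WAIT;                 /* block until all slots are taken */
  m->free_slots += n;
  *who = NOTIFY_STUDENT;              /* wake a waiting student */
  return DONE;
}

/* The main thread asks the grader to stop adding slots. */
void model_stopadding(struct slots_model *m, enum notify *who)
{
  m->stop = 1;
  *who = NOTIFY_GRADER;               /* the grader may be blocked in addslots */
}
```

In a real monitor every function body runs with the monitor's mutex held, and MUST_WAIT corresponds to a call to the wait function of the matching condition variable. With a Mesa CV the condition has to be re-checked after every wakeup (a while loop), whereas with a Hoare CV the signaled thread can rely on the condition still holding.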

Implementation notes:
Make sure that you implement the API twice: once for each type of monitor. The different types of monitors must use the different types of condition variables.
The returned integer for all the above functions (except alloc) should be 0 upon success and -1 upon failure.

Task 3.2: Simulation

You are required to implement user programs called slots_mesa and slots_hoare that will simulate this scenario: once with the Mesa slots monitor and once with the Hoare slots monitor. Each version needs to be implemented with the appropriate monitor as mentioned above. Your programs should receive as parameters:
m - The number of students.
n - The number of slots published by the grader each time.

A new thread will be created for the grader itself and for each student. The program should let the grader thread add n slots at a time, and let each of the m student threads take a slot. Then the main thread of the program should:
1. Wait until all student threads are done.
2. Notify the grader thread to stop adding slots.
3. Wait for the grader thread to exit, and return.

Implementation notes:
Make sure you understand the difference between the Hoare and Mesa implementations of monitors and condition variables. You should be able to explain to the grader how these differences are reflected in your program.
Print some output that demonstrates the sequence of operations done by the different threads.

Simulation with KLT on multiple CPUs

The current configuration in the Makefile defines that xv6 runs on a single CPU. Change xv6's configuration to work with 2 CPUs instead of 1. Note that changes should be made to the Makefile to support multi-core.

You may have noticed that QEMU uses another piece of software called KVM. KVM (Kernel based virtual machine) is a virtualization infrastructure for the Linux kernel. KVM supports native virtualization on processors with hardware virtualization extensions [Wikipedia].
For KVM to exploit multiple CPUs on a multi-core computer, it is necessary to set the CPU virtualization flag in the BIOS (most modern CPUs support it). Labs 302/34 and 142/90 have these settings enabled; additional labs might also have them, but this is not guaranteed. If you work on your own computer, you'll have to apply the mentioned settings in your computer's BIOS. If you are running Linux on a virtual machine it might be a problem (nested virtualization), and we didn't test it in such a configuration. Note that except for the multi-core support (and some other features which we don't use for now), QEMU and KVM function without the BIOS modifications. You can develop on any computer which has QEMU installed, and then test on the lab computers which support hardware virtualization.

Submission guidelines

Assignment due date: 23:59, 10/05/2015 17/05/2015

Make sure that your Makefile is properly updated and that your code compiles with no warnings whatsoever. We strongly recommend documenting your code changes with remarks; these are often handy when discussing your code with the graders.

Due to our constrained resources, assignments are only allowed in pairs. Please note this important point and try to match up with a partner as soon as possible.

Submissions are only allowed through the submission system. To avoid submitting a large number of xv6 builds, you are required to submit an svn patch (i.e. a file which patches the original xv6 and applies all your changes).

Tip: although graders will only apply your latest patch file, the submission system supports multiple uploads. Use this feature often and make sure you upload patches of your current work even if you haven't completed the assignment.

Finally, you should note that graders are instructed to examine your code on lab computers only (!) - Test your code on lab computers prior to submission!!! Open a clean folder, download a clean version of xv6 from svn and apply your patch.