CSCI 380 -- Lab 4

Locks All the Way Down...

Handout

A tarfile can be acquired here: lab4-handout.tar

Part 0 (Background): Mutex

We've been looking at locks and mutual exclusion in class, so let's dive a little deeper. A lock, by definition, is either available or unavailable. If a lock is available, then it can be acquired (which changes the lock's state to unavailable). If a lock is unavailable, then only the thread that acquired it can release it (changing its state back to available). We use the terms wait and signal for acquiring and releasing, respectively.

def wait (lock l):
    while l.state == UNAVAILABLE:
        pass  # busy-wait until the lock looks available
    l.state = UNAVAILABLE

def signal (lock l):
    l.state = AVAILABLE

The problem with the example above arises when we have more than one thread running at the same time. Two threads may both observe that the lock is available, and both will then set the state to unavailable, each believing it alone holds the lock. In fact, for a correct implementation of a mutex, we need to rely on hardware support that ensures the check-and-set in the while-loop above happens atomically (acting like one instruction). Both Intel x86-64 and ARM provide such atomic instructions.


#define AVAILABLE   0
#define UNAVAILABLE 1

// attempts to set lock IFF it is available (to unavailable)
#define TRY_LOCK(Mutex) \
  __sync_lock_test_and_set (&(Mutex)->lock, UNAVAILABLE)

static void spin_lock (volatile struct mutex* m);

void mutex_wait (volatile struct mutex* m) {
  // if we fail to get the lock immediately, fall back to a spin lock
  if (TRY_LOCK(m)) {
    spin_lock (m);
  }
}

void mutex_signal (volatile struct mutex* m) {
  // sets lock to AVAILABLE
  __sync_lock_release (&m->lock); 
}

#define SPIN_LIMIT ((1 << 6) - 1)

static void spin_lock (volatile struct mutex* m) {
  int spins = 0;
  while (m->lock != AVAILABLE || TRY_LOCK(m)) {
    if (!(++spins & SPIN_LIMIT)) {
      // relinquish thread
      sched_yield ();
    }
  }
}

Part 1: Semaphores

Mutexes are really convenient for restricting a region of code so that only one thread may execute it at a time, but what if we want to allow up to some fixed number of threads in the same region at once? A mutex alone will not work, because it is only ever available or unavailable.

  1. We need to keep track of a count of how many threads are within a given region, so let's introduce a count.
  2. We need to safely update count, so we must introduce a mutex which guards it.
  3. We need to somehow "block" a thread from running once we have reached the maximum number of concurrent threads in a region, so we introduce another mutex called delay.

These three pieces make up our semaphore. For the first part of your lab, it is your responsibility to implement three functions which correctly implement a semaphore.

struct semaphore {
  struct mutex mutex;
  struct mutex delay;
  int count;
};

We have two different ways of implementing a semaphore. I am providing the pseudocode, so you will only need to translate the algorithm to an implementation in C.

Option 1:
def init (s : Semaphore, avail: Int):
  init (s.mutex)
  init (s.delay)
  wait (s.delay)
  s.count = avail

def wait (s : Semaphore):
  wait (s.mutex)
  s.count -= 1
  if s.count < 0:
    signal (s.mutex)
    wait (s.delay)
  signal (s.mutex)

def signal (s : Semaphore):
  wait (s.mutex)
  s.count += 1
  if s.count <= 0:
    signal (s.delay)
  else:
    signal (s.mutex)

Option 2:
def init (s : Semaphore, avail: Int):
  init (s.mutex)
  init (s.delay)
  if avail == 0:
    wait (s.delay)
  s.count = avail

def wait (s : Semaphore):
  wait (s.delay)
  wait (s.mutex)
  s.count -= 1
  if s.count > 0:
    signal (s.delay)
  signal (s.mutex)

def signal (s : Semaphore):
  wait (s.mutex)
  s.count += 1
  if s.count == 1:
    signal (s.delay)
  signal (s.mutex)

Note: whichever option you choose, you will be required in your writeup to include:

  1. The number of mutex wait()s and signal()s when the semaphore is available/unavailable within wait()
  2. The number of mutex wait()s and signal()s when the semaphore is available/unavailable within signal()
  3. Justification as to why you chose a particular implementation

Part 2: Making a Circular Buffer Queue Thread-Safe

A queue is a structure whose defining property is FIFO ordering: the first item added will always be the first item removed. Ours supports add, remove, and peek operations.

We can implement a queue as a circular buffer, which limits the number of elements that can exist in the queue at once. To make operations as fast as possible, we don't keep track of the size and always assume it is safe to add and remove.

I provide you with a functional implementation of queue within queue.h and queue.c. It is up to you to determine which synchronization primitives (mutex and/or semaphore) to add to your queue implementation. You will modify the queue.h header file to augment your struct queue definition. Then you will modify the queue.c implementation file to make add, remove, and peek thread safe.

You will then add to WRITEUP.txt your design rationale and discussion on performance of your thread-safe queue.

I provide a driver program, driver.c, which creates a specified number of "readers" and "writers" to remove/add values from a shared queue. I will use this driver to show experimental correctness (manual analysis is still required to prove correctness).

Putting It All Together

Command        Description
make           builds the program
make clean     removes all object files and the driver executable
make tidy      removes all dependency files, object files, and the driver executable
make handin    creates a tarball of semaphore.c, queue.h, queue.c, and WRITEUP.txt
./driver       runs the driver program

driver usage:

NOTE: the calls to fprintf() must stay within queue.c

Stress Testing:

By default, DIFFICULTY is set to 0 (a.k.a. easy mode)

You can explicitly set the difficulty up to 4 by changing your invocation to make:

make clean && DIFFICULTY=4 make

Submission

You can invoke make handin to generate a handin.tar file. Upload/submit this file to autolab.

Grading Criteria

This lab will be graded out of 100 points. Please note: only 25 points are autograded.