CSCI 362 — Lab 7b

Advanced Sorting

Due: July 7, 2019 @ 11:59PM

Handout

Grab the handout tarfile from here: http://cs.millersville.edu/~wkillian/2019/summer/files/csci362/lab7b-advanced-sorts.tar

Overview

This lab is split into two parts. The first part (worth 50 points) is implementing various sorting algorithms

h_sort
shell_sort
quick_sort
merge_sort
heap_sort
radix_sort
counting_sort

The second part is writing a "driver" application which will run the various sorts, time them, and report the total number of comparisons, swaps, and/or moves (if relevant).

Required Functions (45 points)

namespace SortUtils
{

struct Statistics {
  size_t compares, swaps, moves;
  ...
};

// ----------------------------------------------------------------------

// [5] h-sort on a range
// - implement what was discussed in class
// - Update "stat" -- number of compares and swaps should increase.
// - You will need to split the compound logical test into two parts
//
template<typename Iter, Requires (concepts::RandomAccessIterator, Iter)>
void
h_sort (Iter begin, Iter end, Statistics& stat, size_t h = 1);

// [5] -- Shell sort!
// - Use the Knuth sequence (as discussed in class)
// - ultimately call h_sort a bunch
//
template<typename Iter, Requires (concepts::RandomAccessIterator, Iter)>
void
shell_sort (Iter begin, Iter end, Statistics& stat);

// ----------------------------------------------------------------------

// extend your implementation of median3 from the last lab to count the
// number of comparisons and swaps.
//template<typename Iter>
Iter
median3 (Iter begin, Iter end, Statistics& stat);

// extend your implementation of partition from the last lab to count the
// number of comparisons and swaps.
//
template<typename Iter, typename Value>
std::pair<Iter, Iter>
partition (Iter first, Iter last, Value const& pivot, Statistics& stat)

// [5] -- recursive implementation of quicksort.
// - Special case to add:
//   If distance(begin, end) <= 10, just call insertion sort (h_sort = 1)
//
template<typename Iter, Requires (concepts::RandomAccessIterator, Iter)>
void
quick_sort (Iter begin, Iter end, Statistics& stat);

// ----------------------------------------------------------------------

// extend your implementation of merge from the last lab to count the
// number of comparisons and MOVES
//
// NOTE: whenever you have an assignment *out = *iter, you will want to say:
//       *out = std::move(*iter);
//
template<typename Iter, typename OIter,
         Requires (concepts::RandomAccessIterator, Iter),
         Requires (concepts::RandomAccessIterator, OIter)>
OIter
merge (Iter first1, Iter last1, Iter first2, Iter last2, OIter out, Statistics& stat);

// [5] -- recursive implementation of mergesort.
//
// NOTE: after the call to merge, remember to MOVE the range [out, out + N) back to begin
//       see: std::move (begin, end, dest) -- similar to std::copy()
// Be sure to add "N" to stat's count of moves
template<typename Iter, Requires (concepts::RandomAccessIterator, Iter)>
void
merge_sort (Iter begin, Iter end, Iter out, Statistics& stat);

// ----------------------------------------------------------------------

// [5] -- heap sort
// use std::make_heap and std::pop_heap to implement heap_sort
//
template<typename Iter, Requires (concepts::RandomAccessIterator, Iter)>
void
heap_sort (Iter begin, Iter end);

// ----------------------------------------------------------------------

// [10] LSD (least significant digit) radix sort.
//
// - Use a std::array of 10 std::vector's to perform the sort (one for
//   each decimal digit).
//
// - BONUS: make it work for any base by including the following line
//   near the top [+2pts]:
//       static constexpr size_t BASE = 10 /* but could be changed*/;
//
//   Note: you should only reference BASE and not "10". e.g. I should
//   be able to change the 10 to 2, 16, or any other number and it
//   should still properly sort.
//
template<typename Iter, Requires (concepts::RandomAccessIterator, Iter)>
void
radix_sort (Iter begin, Iter end, size_t d);

// ----------------------------------------------------------------------


// [10] Counting sort!
//
// - Do a SINGLE pass through to find the minimum and maximum values
// - figure out the size of the vector (hint: [min, max] contains how many?
// - allocate a vector of the appropriate size of type "int"
// - iterate through each value and increment the appropriate index (subtracting min)
// - compute the partial sum of the counts to determine which index to
//   place values
// - reverse iterate through the range [end - 1, begin] and:
//   1. determine the index of the value (hint: same as first loop)
//   2. get the value stored in the counts array at the index -- this is where
//      you need to copy the value
//   3. decrement the counts array element
//   4. assign the dereferenced iterator to "output"'s index
// - copy all elements from output back to [begin, end)
// - ONLY 2*(N-1) total comparisons of the values should be performed
//
template<typename Iter, Requires (concepts::RandomAccessIterator, Iter)>
void
counting_sort (Iter begin, Iter end);

} // end namespace SortUtils

Driver Application (35 points)

Input

a single integer N denoting the size of the vector to populate/generate
a single integer seed denoting the seed to use for random number generation
a single integer max denoting the maximum value for random numbers. We will generate numbers in the range [0, max)
a single integer d denoting the number of digits to use for radix sort

Vector generation

We will create random numbers by combining two pieces found within the C++ Standard Library

Random Engine: a source of random data. We will be using a std::mt19937_t (Mersenne Twister engine). The fancy number (19937) denotes the period of the random numbers generated $2^19937 - 1$
Distribution: how we mandate random numbers are distributed. Ideally, we want a uniform distribution. For this, we will use a std::uniform_int_distribution

std::mt19937_t gen (seed);                     // constructs a generator to use
std::uniform_int_distribution<int> dist(1, 6); // fair dice roll
int roll = dist(gen);                          // generates a single number [1, 6]

HINT: I recommend creating a function called "populateVector" that takes in the size and seed and returns a vector.

Timing / Sorting

I've included a Timer class for you to use. It's API is pretty simple:

class Timer {
public:
  Timer();            // default construct -- also "starts" the timer
  void start();       // starts the timer
  void stop();        // stops the timer
  double elapsedMs(); // gets the elapsed total time in milliseconds
};

For any sorting algorithm you wish to time, you'll need to do the following:

make a copy of the original vector
reset the Statistics instance by calling .reset()
start an instance of Timer
invoke the sorting algorithm
stop the instance of Timer
print out the relevant information

I recommend creating a scoped block:

// running std::sort
{
  std::vector<int> copy (original);
  Statistics stat;
  Timer t;
  std::sort (copy.begin(), copy.end());
  t.stop();
  // print code goes here
}
// repeat above block for other algorithms

Algorithm	Time	Comparisons	Swaps	Moves
`std::sort`	Yes
`shell_sort`	Yes	Yes	Yes
`quick_sort`	Yes	Yes	Yes
`merge_sort`	Yes	Yes		Yes
`heap_sort`	Yes
`radix_sort`	Yes
`counting_sort`	Yes

Sample Output

Note: whenever there is something in {}, I want you to print the value

N    ===> [read from user here]
seed ===> [read from user here]
max  ===> [read from user here]
d    ===> [read from user here]

Sorting {N} elements generated with a seed of {seed} in the range [0, {max})

std::sort
Time (ms):  xxxx.xx

shell_sort
Time (ms):  xxxx.xx
Comps:      xxxx
Swaps:      xxxx

quick_sort
Time (ms):  xxxx.xx
Comps:      xxxx
Swaps:      xxxx

merge_sort
Time (ms):  xxxx.xx
Comps:      xxxx
Moves:      xxxx

heap_sort
Time (ms):  xxxx.xx

radix_sort (d={d})
Time (ms):  xxxx.xx

counting_sort
Time (ms):  xxxx.xx

Analysis (20 points)

With the driver written, I want you to run your driver with a few different settings:

N = 10'000'000   seed = 161   max =     10'000   d = 5
N = 10'000'000   seed = 162   max =    100'000   d = 6
N = 10'000'000   seed = 362   max =  1'000'000   d = 7
N = 10'000'000   seed = 330   max = 10'000'000   d = 8
N = 20'000'000   seed = 340   max = 20'000'000   d = 8
N = 40'000'000   seed = 370   max = 40'000'000   d = 8
N = 80'000'000   seed = 380   max = 80'000'000   d = 8

Discuss your results. Specifically talk about how each algorithm performs with regard to (1) size and (2) max. Mention swaps/moves and comparisons if relevant. Include copy-pasted output in your discussion. Include your discussion in WRITEUP.txt

Running the autograder

Invoking make should create the autograder.

There can be a lot of output shown from the autograder. Some of it could be overwhelming at times. I recommend adding a couple of arguments to the autograder to help ease the pain.

./autograder -a

Will run the autograder and stop as soon as the first FAILURE is found

./autograder -t

Will list all of the tags for each of the tests. If you want to run a specific subset of tags you can do so by stating the following:

./autograder [access],[iterator]

Will run all tests that have the access and iterator tags

./autograder --failed

Will list all failed tests at the end of the autograder output

./autograder --passed

Will list all passed tests at the end of the autograder output

Style

Use good programming style:

Write comments where appropriate
Remember to include a comment block at the top of your program that includes:
- Your name
- The course
- The last date of modification
- The assignment name
- A brief description of what your program does
FORMAT YOUR CODE CONSISTENTLY

Submission

Since this consists of multiple files, a tarball will be required for submission.

invoking make handin should create handin.tar

Please submit handin.tar to autolab

Grading

This lab is graded out of 100 points.

[45pts] Unit tests
[35pts] Driver program -- this must have clean and clear formatting. Use good style!
[20pts] Writeup
NOTE: if your program does not compile/run, the highest score you will earn will be a 20/100