CMSC 476: Assignment 3

Prologue

“It was the best of times, it was the worst of times, it was the age of parallelism, it was the age of seriality…”

Overview

Refactor your Assignment 1b solution to use threads instead of processes. To achieve parallelism, spawn p std::thread-s, and have each reduce a span of elements (use the partitioning LOGIC from CLASS to ensure the load is balanced). Make the reduced value accessible to the master thread (somehow), which will print it along with the time it took to compute. As before, we want to allow an arbitrary transform reduction over any type, rather than hardcoding to int. You will use my UnaryFunction and BinaryFunction concepts. In addition, compute the reduction using std::transform_reduce with a parallel execution policy, and serially in the master thread.

We want higher quality random numbers, but we don’t want to wait a long time to generate them. Thus we will generate them in parallel. Use a range of [ 0, 4 ], but this time use std::mt19937 as the generator and std::random_device for the seed to ensure every run produces different numbers.

Use a vector<char>, an initial value of 5.0, a lambda transformer that computes std::sqrt as a double, and a combiner function with an appropriate version of std::plus. Ensure you test your logic with vector-s of various types, with various transformers and combiners.

Avoid races between threads, but do NOT use a mutex or other form of data synchronization since they are NOT necessary. You may add additional parameters to the function transformReduceOnThread, as long as they are sensible. Do NOT use global variables.

Put your reduction and generation code in ParallelReduce.hpp, and write main and I/O logic in a driver file.

Obtain timings as you did before, ensuring you compile with -O3 when you’re done debugging. Check for correctness using my equal function template.

Lastly, write a Makefile that will build your executable or clean up. Ensure ALL dependencies are properly listed. When I type make your project MUST build.

Input Specification

Input the number of child threads p and the vector size N. Assume 1 ≤ p ≤ 16 and Np . Read N as a string and convert it to an unsigned long, ignoring apostrophes.

Use PRECISELY the format below with the EXACT same SPACING and SPELLING.

p ==> <UserInput>

N ==> <UserInput>

Output Specification

Output the hardware concurrency (via std::thread), this time BEFORE we prompt for p, the parallel reduction result, the parallel time, the parallel transform_reduce algorithm result, the parallel transform_reduce algorithm time, the serial reduction result, and the serial time. Lastly, output whether YOUR parallel reduction result matches the serial reduction result (“yes” or “NO!”). For the times use milliseconds and LIMIT the decimal places to TWO.

Use PRECISELY the format below with the EXACT same SPACING and SPELLING. Each deviation in spacing or spelling will result in AT LEAST a twenty point deduction.

Sample Output

This output is for a vector of char, a double std::sqrt transformer, std::plus, and an initial value of 5.0.

(hardware concurrency = 4)
p ==> 4

N ==> 100'000'000

// sum:       122915065.2428683
// time:      xx.xx ms

// alg sum:   122915065.2428683
// alg time:  xx.xx ms

Serial sum:   122915065.24638017
Serial time:  xx.xx ms

Correct:      yes

Required Types, Concepts, and Functions

See source files.

What to Submit

Submit your C++ driver file, ParallelReduce.hpp, Timer.hpp, and a Makefile (named Makefile). If your Makefile doesn’t work you will receive a ZERO.

Ensure your parallel and serial results are equal.

Hints

Comments

How does the performance of this version compare to the one using processes?


Gary M. Zoppetti, Ph.D.