CMSC 476: Assignment 3
Prologue
“It was the best of times, it was the worst of times, it was the age of parallelism, it was the age of seriality…”
Overview
Refactor your Assignment 1b
solution to use threads instead of processes. To
achieve parallelism, spawn p std::thread-s,
and have each reduce a span of elements (use the partitioning LOGIC from
CLASS to ensure the load is balanced). Make the reduced value accessible
to the master thread (somehow), which will print it along with the time
it took to compute. As before, we want to allow an arbitrary transform
reduction over any type, rather than hardcoding to int. You
will use my UnaryFunction and BinaryFunction
concepts. In addition, compute the reduction using std::transform_reduce
with a parallel execution policy, and serially in the master thread.
We want higher quality random numbers, but we don’t want to wait a
long time to generate them. Thus we will generate them in parallel. Use
a range of [ 0, 4 ], but this time use
std::mt19937 as the generator and
std::random_device for the seed to ensure every run
produces different numbers.
Use a vector<char>, an initial value of
5.0, a lambda transformer that computes
std::sqrt as a double, and a combiner function
with an appropriate version of std::plus. Ensure you test
your logic with vector-s of various types, with various
transformers and combiners.
Avoid races between threads, but do NOT use a mutex or other form of
data synchronization since they are NOT necessary. You may add
additional parameters to the function
transformReduceOnThread, as long as they are sensible. Do
NOT use global variables.
Put your reduction and generation code in ParallelReduce.hpp, and write
main and I/O logic in a driver file.
Obtain timings as you did before, ensuring you compile with
-O3 when you’re done debugging. Check for correctness using
my equal function template.
Lastly, write a Makefile that will build your executable
or clean up. Ensure ALL dependencies are properly listed. When I type
make your project MUST build.
Input Specification
Input the number of child threads p and the vector size
N. Assume 1 ≤ p ≤ 16 and N ≥
p . Read N as a string and convert it to an
unsigned long, ignoring apostrophes.
Use PRECISELY the format below with the EXACT same SPACING and SPELLING.
p ==> <UserInput>
N ==> <UserInput>
Output Specification
Output the hardware concurrency (via std::thread), this
time BEFORE we prompt for p, the parallel reduction result,
the parallel time, the parallel transform_reduce algorithm
result, the parallel transform_reduce algorithm time, the
serial reduction result, and the serial time. Lastly, output whether
YOUR parallel reduction result matches the serial reduction result
(“yes” or “NO!”). For the times use milliseconds and
LIMIT the decimal places to TWO.
Use PRECISELY the format below with the EXACT same SPACING and SPELLING. Each deviation in spacing or spelling will result in AT LEAST a twenty point deduction.
Sample Output
This output is for a vector of char, a
double std::sqrt transformer, std::plus, and
an initial value of 5.0.
(hardware concurrency = 4)
p ==> 4
N ==> 100'000'000
// sum: 122915065.2428683
// time: xx.xx ms
// alg sum: 122915065.2428683
// alg time: xx.xx ms
Serial sum: 122915065.24638017
Serial time: xx.xx ms
Correct: yes
Required Types, Concepts, and Functions
See source files.
What to Submit
Submit your C++ driver file, ParallelReduce.hpp,
Timer.hpp, and a Makefile (named
Makefile). If your Makefile doesn’t work you will receive a
ZERO.
Ensure your parallel and serial results are equal.
Hints
Ensure you are using
clang-format. I will no longer repeat this.Compile with
-Walland remove ALL warnings.See my C++ Threads example.
Look into helper functions std::ref and std::cref to pass arguments by reference to a thread.
Compile with
-ltbbto link to the necessary libraries.
Comments
How does the performance of this version compare to the one using processes?
Gary M. Zoppetti, Ph.D.