CMSC 476: Assignment 1b
Overview
Refactor your Assignment 1 solution
to improve its generality and behavior. We want to allow an arbitrary
transform reduction over any type, rather than hardcoding to
int. Modify the UnaryFunction concept to allow
it to return a type convertible to ReturnType, and modify
the BinaryFunction concept to return a result convertible
to T. Then modify the prototypes for
transformReducePar and transformReduceOnProc
appropriately, so that they can take a generic span of
T const and a suitable combiner and transformer.
Set the A values and seed as before for the version you
submit. Use a vector<char>, an initial value of
5.0, a lambda transformer that computes
std::sqrt as a double, and a combiner function
with an appropriate version of std::plus. Ensure you test
your logic with vector-s of various types, with various
transformers and combiners.
No longer pass an initial value to
transformReduceOnProc, as the initial value should only be
combined ONCE in transformReducePar. As an
example, if I transform reduce the sequence [ 1, 3, 5 ]
with initial value 10, std::plus, and
std::identity, the result should be 10 + 1 + 3 + 5 = 19
(although the order of computation may differ).
This time input N as a string, and allow the user to use
apostrophes as a digit separator.
Obtain timings as you did before, ensuring you compile with
-O3 when you’re done debugging.
Input Specification
Input the number of child processes p and the vector
size N. Assume 1 ≤ p ≤ 16 and N ≥
p . Read N as a string and convert it to an
unsigned long, ignoring apostrophes.
Use PRECISELY the format below with the EXACT same SPACING and SPELLING.
p ==> <UserInput>
N ==> <UserInput>
Output Specification
Output the hardware concurrency (via std::thread),
parallel reduction result, the parallel time, the serial reduction
result, and the serial time. For the times use milliseconds and
LIMIT the decimal places to TWO.
Use PRECISELY the format below with the EXACT same SPACING and SPELLING. Each deviation in spacing or spelling will result in AT LEAST a ten point deduction.
Sample Output
This output is for a vector of char, a
double std::sqrt transformer, std::plus, and
an initial value of 5.0.
p ==> 4
(hardware concurrency = 4)
N ==> 100'000'000
// sum: 122915065.2428683
// time: 49.71 ms
Serial sum: 122915065.24638017
Serial time: 192.07 ms
Required Types, Concepts, and Functions
These are as before, with the following exceptions.
// Concept for binary function that takes two T-s and returns a
// type convertible to T
template<typename F, typename T>
concept BinaryFunction = // ...
// Concept for unary function that takes a T and returns a
// type convertible to S
// Determine template header
concept UnaryFunction = // ...
// Compute a parallel reduction on a contiguous subrange
// of elements of 'v' using process-level parallelism.
// Uses 'numProcesses' processes.
template<typename ParamType, typename ReturnType>
// Determine return type
transformReducePar (// Determine param types);
// You may NOT rearrange the parameters from the previous lab.
// You are only determining how to use the template types.
// Compute a reduction on a contiguous subrange of elements of 'v'.
// 'id': ID of the process, in the range [0, 'numProcesses').
// Use our partitioning logic so multiple cores run this function
// in parallel.
template<typename ParamType, typename ReturnType>
// Determine return type
transformReduceOnProc (// Determine param types);
// You may NOT rearrange the parameters from the previous lab.
// Remove "init" as it is no longer needed.What to Submit
Submit your C++ driver file AND
Timer.hpp. Do NOT rename
Timer.hpp.
Ensure your parallel and serial results match when reducing integers.
Hints
Ensure you are using
clang-format.Compile with
-Walland remove ALL warnings.
Comments
How is speedup affected by the transformer? Why?
Gary M. Zoppetti, Ph.D.