CMSC 476: Assignment 1b

Overview

Refactor your Assignment 1 solution to improve its generality and behavior. We want to allow an arbitrary transform reduction over any type, rather than hardcoding int. Modify the UnaryFunction concept so that the function may return any type convertible to ReturnType, and modify the BinaryFunction concept so that the function must return a result convertible to T. Then modify the prototypes for transformReducePar and transformReduceOnProc accordingly, so that they can take a generic span of T const (std::span<T const>) and a suitable combiner and transformer.

Set the A values and seed as before for the version you submit. Use a vector<char>, an initial value of 5.0, a lambda transformer that computes std::sqrt as a double, and a combiner that uses an appropriate specialization of std::plus. Ensure you test your logic with vector-s of various types, with various transformers and combiners.
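As a serial point of comparison, the required configuration can be expressed with the standard library's std::transform_reduce (C++17, <numeric>). This is a minimal sketch: the function name serialSqrtSum is hypothetical, and your own serial version may be written differently.

```cpp
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical serial reference for the submitted configuration:
// sqrt(c) as a double for each char, summed with std::plus<double>,
// starting from 5.0. Assumes the chars are non-negative (sqrt of a
// negative value is NaN).
double serialSqrtSum (const std::vector<char>& v)
{
  auto transformer = [] (char c) { return std::sqrt (static_cast<double> (c)); };
  // std::transform_reduce may regroup the additions, so for doubles the
  // last few digits can depend on the grouping.
  return std::transform_reduce (v.begin (), v.end (), 5.0, std::plus<double>{}, transformer);
}
```

For example, the chars 1, 4, and 9 transform to 1.0, 2.0, and 3.0, so the result is 5.0 + 6.0 = 11.0.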

No longer pass an initial value to transformReduceOnProc; the initial value should be combined exactly ONCE, in transformReducePar. For example, if I transform-reduce the sequence [ 1, 3, 5 ] with initial value 10, std::plus, and std::identity, the result should be 10 + 1 + 3 + 5 = 19 (although the order of computation may differ).

This time input N as a string, and allow the user to use apostrophes as a digit separator.

Obtain timings as you did before, ensuring you compile with -O3 when you’re done debugging.

Input Specification

Input the number of child processes p and the vector size N. Assume 1 ≤ p ≤ 16 and N ≥ p. Read N as a string and convert it to an unsigned long, ignoring apostrophes.

Use PRECISELY the format below with the EXACT same SPACING and SPELLING.

p ==> <UserInput>

N ==> <UserInput>

Output Specification

Output the hardware concurrency (via std::thread::hardware_concurrency), the parallel reduction result, the parallel time, the serial reduction result, and the serial time. Report times in milliseconds and LIMIT them to TWO decimal places.

Use PRECISELY the format below with the EXACT same SPACING and SPELLING. Each deviation in spacing or spelling will result in AT LEAST a ten point deduction.

Sample Output

This output is for a vector of char, a double std::sqrt transformer, std::plus, and an initial value of 5.0.

p ==> 4
(hardware concurrency = 4)

N ==> 100'000'000

// sum:       122915065.2428683
// time:      49.71 ms

Serial sum:   122915065.24638017
Serial time:  192.07 ms
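The two-decimal millisecond times map onto standard facilities: std::chrono::duration<double, std::milli> holds fractional milliseconds, and std::fixed with std::setprecision (2) does the rounding. The sketch below assumes std::chrono::steady_clock for timing; formatMillis and timeIt are hypothetical helpers, your Timer.hpp may already report milliseconds, and the labels and spacing around the numbers must still match the sample exactly.

```cpp
#include <chrono>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders an elapsed time as milliseconds with
// exactly two decimal places, e.g. "49.71 ms".
std::string formatMillis (std::chrono::duration<double, std::milli> elapsed)
{
  std::ostringstream out;
  out << std::fixed << std::setprecision (2) << elapsed.count () << " ms";
  return out.str ();
}

// Hypothetical helper: times a callable with a monotonic clock and
// returns the elapsed time as fractional milliseconds.
template<typename Work>
std::chrono::duration<double, std::milli> timeIt (Work work)
{
  auto start = std::chrono::steady_clock::now ();
  work ();
  return std::chrono::steady_clock::now () - start;
}
```

The hardware concurrency comes from std::thread::hardware_concurrency () in <thread>; note that it may return 0 when the value cannot be determined.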

Required Types, Concepts, and Functions

These are as before, with the following exceptions.

// Concept for binary function that takes two T-s and returns a
//   type convertible to T
template<typename F, typename T>
concept BinaryFunction = // ...

// Concept for unary function that takes a T and returns a
//   type convertible to S
// Determine template header
concept UnaryFunction = // ...

// Compute a parallel reduction on a contiguous subrange
//   of elements of 'v' using process-level parallelism.
// Uses 'numProcesses' processes.
template<typename ParamType, typename ReturnType>
// Determine return type
transformReducePar (/* Determine param types */);
// You may NOT rearrange the parameters from the previous lab.
// You are only determining how to use the template types.

// Compute a reduction on a contiguous subrange of elements of 'v'.
// 'id': ID of the process, in the range [0, 'numProcesses').
// Use our partitioning logic so multiple cores run this function
//   in parallel.
template<typename ParamType, typename ReturnType>
// Determine return type
transformReduceOnProc (/* Determine param types */);
// You may NOT rearrange the parameters from the previous lab.
// Remove "init" as it is no longer needed.

What to Submit

Submit your C++ driver file AND Timer.hpp. Do NOT rename Timer.hpp.

Ensure your parallel and serial results match exactly when reducing integers. Floating-point results may differ in the last few digits, as in the sample output.
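Integer results can be expected to match because integer addition (without overflow) is associative, whereas floating-point addition is not: regrouping the same values can change the rounded sum, and the parallel version necessarily groups its additions differently from the serial one. A minimal illustration (sumsDiffer is a hypothetical name):

```cpp
// Regrouping three doubles changes the correctly rounded result.
bool sumsDiffer ()
{
  double left = (0.1 + 0.2) + 0.3;   // 0.6000000000000001
  double right = 0.1 + (0.2 + 0.3);  // 0.6
  return left != right;
}
```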

Hints

Ensure you are using clang-format.
Compile with -Wall and remove ALL warnings.

Comments

How is speedup affected by the transformer? Why?

Gary M. Zoppetti, Ph.D.