Introduction

CSCI 362: Data Structures

Prof. William Killian

Why study Data Structures and Algorithms?

  • Course is required
  • Write more efficient code (in terms of space, speed, power)
  • Develop non-trivial software
  • Become a better problem solver

Data Structures

What is a data structure?

  • A particular way of organizing data in a computer so that it can be used efficiently Wikipedia

Examples?

  • Primitive types: bool, char, short, int, etc.
  • Composite types: array, struct/class, union
  • Abstract Data types: stack, queue, list, set, map

Algorithms

What is an algorithm?

  • A self-contained step-by-step set of operations to be performed (Wikipedia)

Examples?

  • Sequential search, binary search
  • Bubble sort, selection sort, merge sort
  • Inserting an element into a Binary Search Tree

Data Structures and Algorithms

  • Programs = Data Structures + Algorithms
  • Data Structures
    • (efficient) storage and manipulation of (large) data sets
    • Arrays (both static and dynamic), linked lists, linked trees
    • Used to implement abstract data types e.g. set, list
  • Algorithms
    • Operations on data
    • Insertion and removal of elements
    • Rearranging data (reverse, sort, merge)
    • Searching data

Complexity

Two types of complexity

  • Speed
    • relate operations to input size
  • Space
    • relate number of bytes to input size
    • example: space for linked list of ints vs array of ints

Big-O notation

  • Machine-independent means for specifying efficiency (complexity)
  • Concerned with *asymptotic behavior
  • Generally count most significant statements/operations and related to input size ($N$) using a runtime function $T$
    • example: if $T(N) = 9N^2 + 43N + 7$ then the algorithm is $O(N^2)$

We can subdivide complexity into three cases:

  • Best case
  • Average case
  • Worst case (default)

Example: linear search time complexity:

In [1]:
int linear_search (int A[], int N, int searchValue) {
    for (int i = 0; i < N; ++i) {
        if (A[i] == searchValue) {
            return i;
        }
    }
    return -1;
}
  • What should we count? In the best case? Worst case? Average case?
  • For worst case, $T(N) = ? = O(?)$
In [2]:
#include <iostream>
int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
In [3]:
linear_search (a, 10, 0)
Out[3]:
10
In [4]:
linear_search (a, 10, 1)
Out[4]:
0
In [5]:
linear_search (a, 10, 5)
Out[5]:
4
In [6]:
linear_search (a, 10, 10)
Out[6]:
9

C++ Standard Library

Languages include data structures/ADTs/algorithms

  • Java: Java Collections framework (ArrayList, TreeSet)
  • C++: C++ Standard Library (commonly referred to as Standard Template Library {STL})
  • C#: .NET class library

The C++ Standard Container Library provides:

Container classes

  • array, vector, deque, list, forward_list
  • set, multiset, map, multimap
  • unordered_set, unordered_multiset, unordered_map, unordered_multimap
  • stack, queue, priority_queue
  • string, valarray, bitset (container-like)

Lots of algorithms

  • accumulate
  • copy
  • sort
  • lower_bound, upper_bound
  • nth_element
  • partition

Storage Containers (Vectors)

  • Position-based
  • Data structure: dynamic array
In [7]:
#include <vector>
In [8]:
std::vector<int> v { 7, 4, 9, 3, 1};

for (size_t i = 0; i < v.size(); ++i) {
    std::cout << v[i] << " ";
}
7 4 9 3 1 
In [9]:
v.push_back(2);
v
Out[9]:
{ 7, 4, 9, 3, 1, 2 }
In [10]:
v.insert (v.begin(), 5);
v
Out[10]:
{ 5, 7, 4, 9, 3, 1, 2 }

Resizing a vector

In [11]:
v = {7, 4, 9, 1, 3 }
Out[11]:
{ 7, 4, 9, 1, 3 }
In [12]:
v.resize(8); // grow to 8 elements
v
Out[12]:
{ 7, 4, 9, 1, 3, 0, 0, 0 }
In [13]:
v.resize(3); // shrink to 3 elements
v
Out[13]:
{ 7, 4, 9 }
  • Vectors allow efficient subscripting: $O(1)$
  • Position-based indexing (why it's called a sequence container)
  • Which vector operations are slow?
In [14]:
v.shrink_to_fit();
std::cout << "Size: " << v.size() << '\n';
std::cout << "Capacity: " << v.capacity() << '\n';
v
Size: 3
Capacity: 3
Out[14]:
{ 7, 4, 9 }
In [15]:
v.insert(v.begin(), 10);
std::cout << "Size: " << v.size() << '\n';
std::cout << "Capacity: " << v.capacity() << '\n';
v
Size: 4
Capacity: 6
Out[15]:
{ 10, 7, 4, 9 }
In [16]:
v.erase(v.begin());
std::cout << "Size: " << v.size() << '\n';
std::cout << "Capacity: " << v.capacity() << '\n';
v
Size: 3
Capacity: 6
Out[16]:
{ 7, 4, 9 }

Storage Containers (Stacks and Queues)

a.k.a. Container Adapters

  • Stack (std::stack)
    • Header <stack>
    • Data Structure: dynamic array
  • Queue (std::queue)
    • Header <queue>
    • Data Structure: dynamic array
  • Priority Queue (std::priority_queue)
    • Header <queue>
    • Data Structure: dynamic array

Storage Containers (Maps, Sets)

std::set, std::map, std::multiset, std::multimap

  • Headers: <set> and <map>
  • Key-based data structure
  • Data structure: (balanced) Binary Search Tree

Question: Maximum number of comparisons for search?

01-bst.png

Abstract Data Types (ADTs)

  • Use ADT to specify a type
  • Abstract Data Types (ADTs)
    • collection of values (e.g. ints, Persons, Lists)
    • set of operations on the data with pre- and post- conditions
    • Note: what's missing?
  • Typically implement ADTs with a class
  • A class encapsulates two things:
    • data (data members {Java: instance variables})
    • operations (member functions {Java: methods})

ADT Operations Description

  • operationName -- action statement specifies:
    • input parameters
    • type of operations on elemenets of data structure
    • output parameter(s)
  • Preconditions:
    • necessary conditions that must apply to
      1. input parameters of operations
      2. current state of object
    • which then allows for successful execution of the operation
  • Postconditions:
    • changes in data structure caused by operation

Abstract Data Type Example: time24

Suppose we have a time24 ADT that has a duration method

duration$(t)$

  • time $t$ is input parameter of type time24
  • measures length of time from $this$ time to time $t$
  • returns result as time24 value
  • Precondition: $this \le t$
  • Postcondition: object is unaffected
In [18]:
time24 now {14, 45};
time24 later {16, 25};
std::cout << now.duration (later) << '\n';
1:40
In [19]:
now = {14, 45};
later = {12, 15};
std::cout << now.duration (later) << '\n';
-3:30
struct time24 {
    int hour;
    int minute;

    time24 duration(time24 other) const {
        // assert ((other.minute + 60 * other.hour) >= (minute + 60 * hour))
        if (minute > other.minute) {
            --other.hour;
            other.minute += 60;
        }
        other.hour -= hour;
        other.minute -= minute;
        return other;
    }

    friend std::ostream&
    operator<< (std::ostream& os, const time24 dur) {
        os << dur.hour << ':' << dur.minute;
        return os;
    }
};

Separating Definition from Implementation

Definition

struct time24
    int hour;
    int minute;

    time24
    duration(time24 other) const;

    friend std::ostream&
    operator<< (std::ostream& os, const time24 dur);
}

Implementation

time24
time24::duration(time24 other) const
{
    // ...
}

std::ostream&
operator<< (std::ostream& os, const time24 dur)
{
    // ...
}

Classes (Implementing ADTs)

class MyClass
{
public:
    // public member functions and (maybe) data members
    // Form an interface -- Choose wisely!
private:
    // implementation details go here!
    // - private data members
    // - private helper functions
};
  • public section visible to clients
  • private section visible to member functions only (and friends)
In [20]:
class C
{
public:
    // methods with access to 'field'
private:
    int field;
};

C c;
c.field = 3;
input_line_42:10:3: error: 'field' is a private member of 'C'
c.field = 3;
  ^
input_line_42:6:9: note: declared private here
    int field;
        ^
Interpreter Error: 
  • Private section contains implementation
    • Data values of the object
    • Utility (helper) functions that support the class
    • Only accessible to member functions (and friends)
  • Key object-oriented programming concept: Data Abstraction
    • Separation of implementation from interface
  • In C++, place class definition (interface) in a header file
    • Header files usually end in .h or .hpp (e.g. MyClass.h)
  • In C++, place class implementation in a source file
    • Header files usually end in .cpp (e.g. MyClass.cpp)
  • Clients of MyClass include MyClass.h
#include "MyClass.h"

Example: Rectangle class

  • Default constructor
    • default arguments
    • member initializers
  • Const member functions
    • compiler contract to not modify this
  • Method definitions are inlined
    • see (future) examples on website for non-inlined versions
    • uses the special scope resolution operator ::
class Rectangle
{
public:  
  // This is the constructor. It's a special function
  // that allows us to initialize a new instance of
  // a Rectangle with a specified length and width
  Rectangle (double length = 0.0, double width = 0.0)

    // below is a member initializer list. For each
    // member variable of our class, we should always
    // initialize them according to some reasonable
    // default
  : m_length (length)
  , m_width (width)
  {
    // in this case, our constructor "body" is empty
  }
// area is a const member function (denote the trailing const)
  // it offers a read-only view of the current class and calculates
  // the area of the rectangle
  double
  area() const
  {
    // inside we can access (read) any member variable
    return m_length * m_width;
  }

  double
  perimeter() const
  {
    return 2 * (m_length + m_width);
  }
double
  getLength() const
  {
    return m_length;
  }

  double
  getWidth() const
  {
    return m_width;
  }

  // mutator method which sets a new length and width
  double
  setSides (double length, double width) {
    m_length = length;
    m_width = width;
  }
// member variables (usually) go at the end of the class
  // definition. They should probably be marked as "private"
  // most of the time
private:
  double m_length;
  double m_width;
};

APIs

Application Programming Interface

  • ADTs + syntax
  • Function prototypes for all accessible functions
    • constructors are usually listed separately
  • Description of methods, sometimes with pre- and post-conditions
  • Complexities (sometimes, not always required/necessary)
  • NOTE: does not expose implementation details

RandomNumber

Constructors

RandomNumber (unsigned int seed = 0);
  • Sets the seed for the random number generator
  • Postconditions: With the default value 0, the system clock initializes the seed; otherwise, the user provides the seed for the generator

Operations

double frandom();
  • Return a real number $x \mid 0.0 \le x < 1.0$

 

int random();
  • Return a 32-bit random integer $m \mid 0 \le m < 2^{31} - 1$

 

int random(unsigned int n);
  • Return a 32-bit random integer $m \mid 0 \le m < n$

Use

RandomNumber rndA, rndB (7);
for (int i = 0; i < 5; ++i)
{
    // 0 <= item < 40
    int item = rndA.random (40);
    // 0.0 <= x < 1.0
    double x = rndB.frandom ();
    std::cout << item << "  " << x << '\n';
}

std::string

  • portion of string API follows
  • Strings are mutable
  • Some useful member functions:
    • find_first_of
    • find_last_of
    • find
    • substr
    • insert
    • erase
    • string::npos (static)
  • Free operator+() to concatenate