Binary Search Trees

CSCI 362: Data Structures

Prof. William Killian

Trees

  • k-ary trees
    • type of graph: undirected, acyclic, connected
    • $|E| = |V| - 1$
    • why is this the case?
  • Each node has a max of $k$ children
  • Most popular: $k = 2$
  • Used to implement ordered, value-based ADTs

Trees

Used to implement C++ Standard Library Ordered Associative Containers

  • std::set, std::multiset
  • std::map, std::multimap

Expression Trees

  • leaves are operands (constants or variables)
  • internal nodes contain operators
  • how the computer (and compilers) evaluate expressions

Trees

Terminology

Level, Depth, and Path Length

Full Trees

  • Full tree of depth $d$
    • Nodes at levels $< d$ have two children
    • How many total nodes exist?

$$\sum_{i=0}^{d}{2^i} = ?$$

Complete Trees

Complete tree of depth $d$

  • Full to depth $d - 1$
  • level $d$ is filled left-to-right
  • depth as a function of N?

$$ \begin{equation*} 2^d \le N < 2^{(d + 1)} \\ d = \lfloor lg (N) \rfloor \\ \end{equation*} $$

Non-complete tree

Non-complete tree

Binary Tree Definition

A binary tree $t$ is a finite set of nodes such that

  • $T$ is empty
  • $T$ consists of a root $R$ and exactly two distinct binary trees
    • left subtree $T_L$
    • right subtree $T_R$

Size = 9, Depth = 3

Example Tree #2

Size = 5, Depth = 4

Node Composition

In [3]:
template <typename T>
struct Node {
  Node (T const& v = T(), Node* l = nullptr, Node* r = nullptr)
  : data (v), left(l), right(r) { }
  ~Node();
  T data;
  Node* left;
  Node* right;
};

// what is this similar to?
In [12]:
// We should probably keep track of a parent

template <typename T>
struct BTNode {
  BTNode (T const& v = T(), BTNode* p = nullptr, BTNode* l = nullptr, BTNode* r = nullptr)
  : data (v), parent(p), left(l), right(r) { }
  T data;
  BTNode* parent;
  BTNode* left;
  BTNode* right;
};
In [5]:
// but first -- cleaning up a tree!
template <typename T>
Node<T>::~Node() {
  if (left != nullptr) {
    delete left;
  }
  if (right != nullptr) {
    delete right;
  }
}

Tree Traversals

Systematically explore every node

3 primary methods

  • Preorder -- node, left, right
  • In-order -- left, node, right
  • Postorder -- left, right, node
In [6]:
auto root =              /* root -> 3     */
  new Node<int>(3,       /*        / \    */
    new Node<int>(1,     /*       /   \   */
      nullptr,           /*      /     \  */
      new Node<int>(2)), /*     1       5 */
    new Node<int>(5,     /*      \     /  */
      new Node<int>(4),  /*       2   4   */
      nullptr));

In-order

In [7]:
template <typename T, typename Function>
void in_order (Node<T>* t, Function&& fn) {
  if (t != nullptr) {
    in_order (t->left, std::forward<Function>(fn));
    fn (t->data);  
    in_order (t->right, std::forward<Function>(fn));
  }
}
In [7]:
#include <iostream>
in_order (root, [] (int v) {
  std::cout << v << ' ';
});
1 2 3 4 5 

Pre-order

In [8]:
template <typename T, typename Function>
void pre_order (Node<T>* t, Function&& fn) {
  if (t != nullptr) {
    fn (t->data);  
    pre_order (t->left, std::forward<Function>(fn));
    pre_order (t->right, std::forward<Function>(fn));
  }
}
In [9]:
pre_order (root, [] (int v) {
  std::cout << v << ' ';
});
3 1 2 5 4 

Post-order

In [10]:
template <typename T, typename Function>
void post_order (Node<T>* t, Function&& fn) {
  if (t != nullptr) {
    post_order (t->left, std::forward<Function>(fn));
    post_order (t->right, std::forward<Function>(fn));
    fn (t->data);
  }
}
In [11]:
post_order (root, [] (int v) {
  std::cout << v << ' ';
});
2 1 4 5 3 

Recursive Functions

  • depth()
  • countLeaves()
  • countInterior()
  • size()
In [13]:
template <typename T>
size_t depth(Node<T>* t) {
  if (t == nullptr) {
    return -1;
  }
  // ...
}
In [14]:
template <typename T>
size_t countLeaves(Node<T>* t) {
  if (t == nullptr) {
    return 0;
  }
  // ...
}
In [15]:
template <typename T>
size_t countInterior(Node<T>* t) {
  if (t == nullptr) {
    return 0;
  }
  // ...
}
In [16]:
template <typename T>
size_t size(Node<T>* t) {
  if (t == nullptr) {
    return 0;
  }
  // ...
}

Binary Search Trees

For each node $N$

  • Keys in $N$'s left subtree are $<$ key of $N$
  • Keys in $N$'s right subtree are $>$ key of $N$

Duplicates?

  • Don't allow them? (set, map)
  • Each node has count?
  • Put them in left (or right) subtree (multiset, multimap)

Applications of Binary Search Trees

In [17]:
#include <vector>
#include <set>
#include <initializer_list>

std::vector<int> v {7, 3, 2, 5, 3, 2, 9, 3};
std::set<int> s {v.begin(), v.end()};
In [18]:
v
Out[18]:
{ 7, 3, 2, 5, 3, 2, 9, 3 }
In [19]:
s
Out[19]:
{ 2, 3, 5, 7, 9 }

Treesort

To sort a vector:

  1. insert elements into BST
  2. retrieve them using in-order traversal

Complexity?

  • Worst case --- $O(N^2)$
  • Average case --- $O(N lg(N))$
  • Worst-case std::(multi)set --- $O(N lg(N))$

std::set

Constructors
set();
set(set const&);
set(set&&);
set(InputIterator first, InputIterator last);
set(std::initializer_list<value_type>);
Assignment
set& operator= (set const&);
set& operator= (set&&);
set& operator= (std::initializer_list<value_type>);
Destructor
~set();
Iterators
iterator       begin()        noexcept;
const_iterator begin()  const noexcept;
const_iterator cbegin() const noexcept;
iterator       end()          noexcept;
const_iterator end()    const noexcept;
const_iterator cend()   const noexcept;
Capacity
bool      empty()    const noexcept;
size_type size()     const noexcept;
size_type max_size() const noexcept;
Insert (oh boy!)
std::pair<iterator, bool> insert (value_type const& value);
std::pair<iterator, bool> insert (value_type&& value);
iterator insert (const_iterator hint, value_type const& value);
iterator insert (const_iterator hint, value_type&& value);
void insert (InputIterator first, InputIterator last);
void insert (std::initializer_list<value_type>);
Erase
void      clear();
iterator  erase(const_iterator pos);
iterator  erase(const_iterator first, const_iterator last);
size_type erase(key_type& key);
Lookup
bool contains(value_type const& key) const;
size_type count(value_type const& key) const;
const_iterator find(value_type const& key) const;
iterator       find(value_type const& key);

Operations of a Binary Search Tree

template <typename T>
class Set {
public:
  struct TreeNode {
    T data;
    TreeNode* parent, *left, *right;
  };
  Set() : m_header(), m_size(0) {
    m_header.parent = m_header.left = m_header.right = &m_header;
  }
  iterator find (T const& v);
  std::pair<iterator, bool> insert (T const& v);
  size_t erase (T const& v);
private:
  TreeNode m_header;
  // parent -> root of tree
  // left   -> max of tree
  // right  -> min of tree
  size_t   m_size;
};

find()

iterator find (T const& value) {
  if (size() == 0)
    return end();
  TreeNode* parent = m_header->parent;
  while (parent != nullptr) {
    if (parent->data == value) {
      // ...
    }
    // ...
  }
  // ...
}

and recursively?

insert()

std::pair<iterator,bool> insert (T const& value) {
  if (size() == 0) {
    TreeNode* newNode = new TreeNode (value, &m_header, nullptr, nullptr);
    m_header.parent = m_header.left = m_header.right = newNode;
    ++m_size;
    return std::make_pair (iterator (newNode), true);
  }
TreeNode* p = &m_header;
  for (TreeNode* n = m_header.parent; n != nullptr; ) { // find where to insert
    p = n;
    if (value < n->data) {
      n = n->left;
    } else if (value > n->data) {
      n = n->right;
    } else { // equal
      return std::make_pair (iterator (n), false);   
    }
  }
  TreeNode* newNode = new TreeNode (value, p, nullptr, nullptr);
if (value < p->data) { // p is the parent -- find out where we inserted (left or right)
    p->left = newNode;
    if (p == m_header.right) m_header.right = newNode; // update new min
  } else {
    p->right = newNode;
    if (p == m_header.left) m_header.left = newNode; // update new max
  }
  ++m_size;
  return std::make_pair (iterator (newNode), true);
}

How would we do this recursively?

Useful Functions

TreeNode* findMinNode (TreeNode* t) {
  // Iterative version
  if (t == nullptr) return t;
  // ...
}

TreeNode* findMaxNode (TreeNode* t) {
  // Iterative version
  if (t == nullptr) return t;
  // ...
}
// Assumes t has a right subtree.
// Used to erase a node with 2 children
// Finds the *smallest* node greater than t
TreeNode* findSuccessorNode (TreeNode* t) {
  // ... 
}

Erase

  1. Leaf node (no replacement needed)
  2. Has only left child
  3. Has only right child
  4. Has both children
    • Find in-order successor (IOS) to use as replacement
    • Splice out IOS (will have only one child at most)
    • Splice in IOS in place of node to erase
Erase (leaf node)
TreeNode* & n = /* node to remove */;

// NOTE: n is a reference to a pointer

iterator iter (n);
++iter;

// NOT SHOWN: check if min or max -- if so, replace with parent

delete std::exchange (n, nullptr);
--m_size;

return iter;
Erase (only left)
TreeNode* & n = /* node to remove */;

iterator iter (n);
++iter;

// NOT SHOWN: check if max -- if so, replace with parent

delete std::exchange (n, n->left);
--m_size;

return iter;
Erase (only right)
TreeNode* & n = /* node to remove */;

// NOT SHOWN: check if min -- if so, replace with parent

delete std::exchange (n, n->right);
--m_size;

return iterator (findMinNode (n));
Erase (both children)
TreeNode* & n = /* node to remove */;
TreeNode* next = findSuccessorNode (n); // may have a right child

// splice out next
next->parent->left = next->right;
if (next->right != nullptr)
  next->right->parent = next->parent;

// splice in next where n was 
next->parent = n->parent;
next->left = n->left;
  // update right -- make sure it doesn't reference itself
next->right = (n->right == next) ? nullptr : n->right;

delete std::exchange (n, next); // automatically fixes parent's link to n
--m_size;

return iterator (n);
In [ ]: