My Take on UCSD's Basic Data Structures course (CSE 12)
Last lecture, we talked about the concepts underlying a binary search tree. This lecture, we’ll implement some features (and you’ll do the rest in your PA!)
Firstly, let’s define our class. Our in-class BST will only hold integers, though your PA uses templates.
We’ll define our methods as virtual
, so that we can do subtyping in the future if we choose (hint hint);
class BST {
struct BSTNode {
int element;
BSTNode *left, *right;
int height;
BSTNode(int el, BSTNode *lt = nullptr, BSTNode *rt = nullptr) : element(el), left(lt), right(rt) {
this->height = 0;
}
~BSTNode(void) { }
};
BSTNode *root;
virtual int height(BSTNode *t) const;
virtual void balance(BSTNode *& t);
virtual void insert(const int x, BSTNode *& t);
virtual BSTNode * find_min_rec(BSTNode *t) const;
virtual BSTNode * find_min_iter(BSTNode *t) const;
virtual bool contains(const int x, BSTNode *t) const;
public:
BST(void);
virtual ~BST(void);
virtual const int find_min(void) const;
virtual bool contains(const int x) const;
virtual void insert(const int x);
// other private and public methods omitted
// that's for you to do in your PA :)
};
Great! Any questions about our class as we’ve defined it so far?
Let’s start by implementing the private methods one by one. We’ll go in order from top to bottom.
int BST::height(BSTNode *t) const {
return (t) ? t->height : -1; // null node has height -1
}
void BST::balance(BSTNode *& t) {
if(!t) return; // base case
t->height = std::max(height(t->left), height(t->right)) + 1;
}
These are two pretty simple helper methods, but we need these to do insert
. Speaking of which, how do we approach insert
?
void BST::insert(const int x, BSTNode *& t) {
// base case
if(!t) {
t = new BSTNode(x); // why does this work?
} else if(x < t->element) {
insert(x, t->left);
} else if(t->element < x) {
insert(x, t->right);
}
// set height
this->balance(t);
}
void BST::insert(const int x) {
this->insert(x, this->root);
}
Does this insert implementation work?
Next, constructor and destructor:
BST::BST(void) : root(nullptr) { }
BST::~BST(void) {
this->empty(); // empty and 1 arg empty omitted
}
Easy enough, eh? Let’s do the find_min
functions next:
// this is given in your PA
const int BST::find_min(void) const {
if(this->root) return this->find_min_rec(this->root)->element;
else throw std::out_of_range("Tree has no elements.");
}
BSTNode * BST::find_min_rec(BSTNode *t) const {
// smallest is found by going left
if(!t->left) return t;
return find_min_rec(t->left);
}
BSTNode * BST::find_min_iter(BSTNode *t) const {
// smallest is found by going left
while(t->left) t = t->left;
return t;
}
As you can see, this function is pretty straight forward both iteratively and recursively.
As a fun fact, the recursive one would (if compiled with optimizations on) get turned into the iterative one by the compiler due to tail call optimization, but that is a subject for a programming languages course or a compilers course :)
Next up, we’ll do the pair of contains
functions:
bool BST::contains(const int x, BSTNode *t) const {
if(!t) return false; // base case
if(/* what goes here? */) return this->contains(x, /* what goes here? */);
if(/* what goes here? */) return this->contains(x, /* what goes here? */);
return true;
}
bool BST::contains(const int x) const {
return this->contains(x, this->root);
}
What’s the one biggest problem we have in our code above?
Consider the tree created by running our insert function with the following input (in order):
1, 2, 3, 4, 5, 6, 7
How about this input (in order): 4, 2, 1, 3, 6, 5, 7
In a naïve implementation, the tree can get so skewed due to insertion order that we end up with horrible runtimes ($O(n)$) again.
The key here is that we need to verify that our trees stay balanced throughout. The easiest way to handle this is to rebalance the tree as we insert to maintain some standard.
We’ll now discuss two commonly used methods, as well as go over some specialty trees that accomplish the same effect for different purposes.
Which of these is not a valid AVL tree?
Try doing AVL insertion for [1,7] in that order. What shape do you get?
Crucially, this means that being an AVL tree is a stricter property than being a Red-Black tree.
Is the following tree both a valid AVL and red-black, or just red-black?
Is the following tree both a valid AVL and red-black, or just red-black?
We won’t cover insertion or removal into a red-black tree (that’s for CSE 100!) but you can read about it here!
The book mentions a few more data structures; namely, B trees, B+ trees, and splay trees. We won’t cover those here, but feel free to check out chapters 4 and 12 of the book for B trees and splay tree implementations, and read the original splay tree paper here! Be warned, it’s a hefty paper.
That’s all folks! Good luck on midterm 2, and next time we’ll cover heaps and priority queues (with a special appearance from another BST type).