Java's Data Structures

20 April 2016
coding
This week, I will have a technical interview for a Software Engineering internship. To solve coding problems it is essential to remember the most frequently used data structures. It is even better to know how they are implemented. So, in order to structure my thoughts, I will cover the major data structures and their implementation in Java 8. Before we start, let us take a look at the Collection<E> interface of Java. All following classes implement this interface. However, they differ in the runtime of executing certain operations. In order to compare the data structures, we will take a look at their complexity of the following methods of Collection: add(E e) and contains(Object o). Furthermore, we will consider the complexity of add(E e, int index), remove(int index) and get(int index), given that these are available.

Array / ArrayList

When to use it
Given that we need a list data structure, the only other option is LinkedList. The main benefit of ArrayList is its constant access time. get(int index) runs in \(\mathcal{O}(1)\). Even add(int index) runs in amortized constant runtime. As we will see below, LinkedList has only a few advantages over ArrayList. So in most cases, an ArrayList is the better choice. Nevertheless, if we know that the list will never exceed a certain size, we can use a simple array and avoid suboptimal resizing.
Implementation
Arrays are implemented directly in Java, while I see ArrayList as "wrapper" around an array to extend its functionality in order to implement the Collection interface. An ArrayList always operates on a single array. On creation, the default ArrayList holds an array with 10 empty slots. When the array is full and a new element is being added, all elements are copied into a new array with 1.5 times the capacity.
Notes
  • It's always best to set the object's initial capacity to the largest capacity that your program will need.
  • The class Vector is the synchronized and thus thread safe version of ArrayList. It also differs in the growth factor.
Runtimes of Operations
add(E e) add(E e, int index) contains(Object o) get(int index) remove(int index)
$$\mathcal{O}(1)^*$$ $$\mathcal{O}(n)$$ $$\mathcal{O}(n)$$ $$\mathcal{O}(1)$$ $$\mathcal{O}(n)$$
\(*\) means an amortized runtime.

LinkedList

When to use it
In contrast to ArrayList, accessing elements by index is expensive for a LinkedList. In exchange, adding and removing elements at the front and end of the list runs in \(\mathcal{O}(1)\). Therefore, the LinkedList is suitable for queues, for example during Breadth First Search. Adding and removing elements while iterating through the list with an ListIterator is also \(\mathcal{O}(1)\).
Implementation
LinkedList is an implementation of a doubly-linked list.
Runtimes of Operations
add(E e) add(E e, int index) contains(Object o) get(int index) remove(int index)
$$\mathcal{O}(1)$$ $$\mathcal{O}(n)$$ $$\mathcal{O}(n)$$ $$\mathcal{O}(n)$$ $$\mathcal{O}(n)$$

HashMap

The basic purpose of maps is to provide a dictionary-like data structure consisting of key-value pairs. One key can only have one value. The three implementations in Java 8 mainly differ in the way, the inserted elements are ordered. A HashMap provides no ordering of the elements, but ensures fast access to the value of a key.
When to use it
If the order of the key-value pairs in the HashMap is not important and we want fast access and storing times, the HashMap is the optimal choice.
Implementation
The core of a HashMap is an array of buckets. By default this array has a capacity of 16. When a key-value pair is inserted, the key is hashed to an integer smaller than the capacity of the array. The pair is then saved in the array at the index given by the hash function. If two different keys get hashed to the same index, a collision occurs. Therefore, we store a singly-linked list of key-value pairs at each slot in the array. Inserted key-value pairs are appended to the list at the index given by the hashed key. When we retrieve the value to a key, we iterate through the list until we find the exact same key. Once the size of the HashMap exceeds a certain load factor (0.75 by default), the array's capacity doubles.
Runtimes of Operations
put(K k, V v) containsKey(K k) get(K k) remove(Object object)
$$\mathcal{O}(1)^*$$ $$\mathcal{O}(1)^*$$ $$\mathcal{O}(1)^*$$ $$\mathcal{O}(1)^*$$

LinkedHashMap

The LinkedHashMap is a mixture between a HashMap and a doubly-linked list storing the Entry objects in their insertion order. The runtimes all are \(\mathcal{O}(1)^*\), just like for the HashMap. However, we need more space and time for updating the linked list. Therefore, we should only use a LinkedHashMap, if we need a predictably iteration order given by the insertion time.

TreeMap

When to use it
When we need natural ordering of the keys.
Implementation
A TreeMap does not use hashing to organize its key-value pairs, but a Red-Black tree. This tree is sorted according to the natural ordering of its keys. Each entry of the tree stores the key-value pair. The Red-Black tree data structure causes \(log(n)\) time for the four major operations.
Runtimes of Operations
put(K k, V v) containsKey(K k) get(K k) remove(Object object)
$$\mathcal{O}(log(n))^*$$ $$\mathcal{O}(log(n))^*$$ $$\mathcal{O}(log(n))^*$$ $$\mathcal{O}(log(n))^*$$

HashSet / TreeSet

A set simply stores elements without caring about their order. It provides three major methods: add(E e), contains(E e) and remove(E e). In Java, sets are implemented by storing keys in a map and using new Object() as value. A HashSet uses a HashMap and a TreeSet uses a TreeMap, giving the possibility to iterate the objects in their natural order. Therefore the runtimes of the major operations are equal to the runtimes of the corresponding map.

PriorityQueue

When to use it
A PriorityQueue is used, when we need to sort a queue not by the insertion order but by the priority of the elements, for example patients in a hospital. By giving the PriorityQueue a comparator or letting E implement Comparable, we can specify the priority of each element. The major operations are add(E e), peek(), poll() and remove(E e). A famous application of the PriorityQueue is Dijkstra's algorithm.
Implementation
The PriorityQueue uses a heap to store its elements. However, the implementation in Java assumes that the priority of added elements does not change. Therefore, to perform the decreaseKey operation, for example in Dijkstra's algorithm, we need to delete and reinsert the element with the new priority.
Runtimes of Operations
add(E e) poll() remove(Object object)
$$\mathcal{O}(1)^*$$ $$\mathcal{O}(log(n))^*$$ $$\mathcal{O}(log(n))^*$$

Links

Comments