COMP6700: Lectures Block 4

Alexei B Khorev

April 2013

Topics of the Block 4

In this part we shall consider basic algorithms and data structures (DS), including their implementation in Java Collections Framework — JCF:

  1. Algorithms

  2. Data types (interfaces) and data structures (implementations)

  3. JCF: Java’s Collections Framework

  4. Collections interfaces

  5. Implementations of Collections interfaces

  6. Iterator and collection traversal

  7. Maps, hash functions and hash tables

  8. Trees as a data structure, Binary Search Trees

  9. Comparable objects

  10. Generic algorithms

  11. Copies and Views

Algorithms: Top-Down Design (Again)

Algorithms occupy an intermediate position between a solution and a method, both in terms of generality and detail. Their origin is mathematical (algebraic); their essence is the transformation of data (input → output) in a finite number of steps.

Writing computer code often involves solving algorithmic problems. One general technique for doing this is known as the top-down design (remember Block 1):

  1. Clearly state the problem that you are trying to solve
  2. Define the inputs required by the program and the outputs to be produced
  3. Decompose the program into classes and their associated methods
  4. Design the algorithm that you intend to implement for each method

    • Repeat every step for devising the solution used at the higher level (step-wise refinement)
  5. Turn the algorithm into pseudo-code (easily translatable into Java statements)
  6. Test the resulting program

The top-down design is often used together with the bottom-up design — translating an existing solution (like a mathematical one) into code and integrating separate parts into one working program. The bottom-up design may also involve data structure selection (or construction); this is called the data-driven design.

Algorithms: generality

A good algorithm has the virtue of generality: it can be used on a family of data structures, with the specific assumptions about them reduced to a minimum. An example — sorting an array of N floats:

sorting

This problem is a particular case of a more general sorting problem: Given an array of N > 0 objects, return an array which contains these N objects in ascending (descending) order.

It is assumed that the objects standing for array elements can be compared, using an unspecified criterion (in Java, it means that the object class implements the Comparable interface). When implemented with such generality, the algorithm is said to be generic.

Computer algorithms are characterised by:

  1. a large number of repetitive operations: large arrays, looping constructs
  2. a particular data model, in the case of sorting and searching (“data model” here means the type of operations which can be performed on the DS)
  3. correctness and efficiency, where correctness means that the right result is obtained for all possible problem instances (size of the data structure, state of the input — sorted, unsorted, random, nearly-sorted etc), while efficiency means how the time and computer resources needed to successfully carry out the algorithm scale with the problem size N (number of elements in the DS).

Algorithms: Complexity

To quantify the algorithm efficiency, one has to establish how T(N) — the number of atomic operations (which do not depend on N) needed to perform the algorithm — scales with the size of the problem N. The time of execution depends on the hardware platform and the state of the execution environment; it is not an invariant measure. Computer science uses the big-\(\mathcal{O}\) notation (pronounced “big-omicron” by eggheads, “big-o” by pub frequenters) to characterise the efficiency (complexity) of an algorithm. It gives the leading term of the scaling dependency:

\[\mbox{if}\;\lim_{N\to\infty}\frac{T(N)}{f(N)} = \mbox{const},\;\mbox{then}\; T(N) = \mathcal{O}\,(f\,(N)),\]

when N becomes large enough.
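
For example, take a hypothetical operation count (the concrete coefficients are only an illustration, not from the lectures): if an algorithm performs \(T(N) = 3N^{2} + 5N + 20\) atomic operations, then

\[\lim_{N\to\infty}\frac{T(N)}{N^{2}} = 3 = \mbox{const},\qquad\mbox{hence}\qquad T(N) = \mathcal{O}(N^{2}),\]

ie lower-order terms and constant factors are dropped and only the leading term survives.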

Complexity classes

The function f(N) usually has one of the following types of behaviour: constant \(\mathcal{O}(1)\), logarithmic \(\mathcal{O}(\log N)\), linear \(\mathcal{O}(N)\), “linearithmic” \(\mathcal{O}(N\log N)\), quadratic \(\mathcal{O}(N^{2})\), cubic \(\mathcal{O}(N^{3})\) or exponential \(\mathcal{O}(2^{N})\).

How complexity matters

alg_efficiency

Ο, Ω and Θ

Some rigour

To properly understand the \(\mathcal{O}\)-notation one should remember that \(\mathcal{O}(f(N))\) denotes not a single function, but a class of functions with the designated asymptotic behaviour. If we say that a function g(N) has an asymptotic of \(\mathcal{O}(f(N))\), \(g(N) = \mathcal{O}(f(N))\), this is not an equality, but rather the inclusion of g(N) in that class.

The statement \(g(N) = \mathcal{O}(f(N))\) has the meaning of an upper bound, ie the magnitude of g(N) is at most as big as const×f(N) (when N is large). At most! but it could also be (substantially) smaller. Hence, an algorithm should not be dismissed outright if it is characterised by the efficiency of, say, \(\mathcal{O}(N^{2})\) (like in sorting), just because its actual performance can be better. The opposite of an upper bound is a lower-bound estimate, ie the function magnitude is at least as large as const×f(N) (N is large). This asymptotic behaviour (a different set) is denoted with another symbol, \(\Omega\): \(g(N) = \Omega(f\,(N))\). The \(\Omega\)-properties of algorithms are harder to establish, but they are more valuable.

Even more valuable (and still harder to come by) are asymptotics with both lower and upper bounds: a function g(N) is such that, for sufficiently large N, there exist two constants C1 and C2 (C1 < C2) with \(C_{1}\cdot f(N) < g(N) < C_{2}\cdot f(N)\). Such asymptotic sets are denoted with the symbol \(\Theta:\; g(N) = \Theta(f(N))\).

“Greek” Asymptotes

  • asymptotics_omicron: bounded from above, \(g(n)=\mathcal{O}(f(n))\)
  • asymptotics_omega: bounded from below, \(g(n)=\Omega(f(n))\)
  • asymptotics_theta: bounded in between, \(g(n)=\Theta(f(n))\)

One can prove (something which is intuitively obvious) that:

\[g(n)=\mathcal{O}(f(n))\;\;\mbox{and}\;\; g(n)=\Omega(f(n)) \iff g(n)=\Theta(f(n))\]

The reason why the \(\mathcal{O}\)-notation is used predominantly (when \(\Omega\) would be more rigorous) is not that “programmers are stupid”; it is explained in a classic paper by Donald Knuth, “Big Omicron and Big Omega and Big Theta”, SIGACT News, Apr.–June 1976, pp. 18–24.

Linear Search

Problem: Given an unsorted array of N elements (doubles), find the smallest element in a subarray of this array.

Algorithm: (trivial; correctness follows from the facts that each element of the array undergoes a comparison with the tentative result, and that comparisons of integers and real numbers are transitive, ie from a > b and b > c it follows that a > c).

  1. Assume tentatively that the first number in the array is the smallest element
  2. Compare this number with every element in succession

    • If one of these subsequent numbers is smaller, replace the tentative choice by this element
  3. When all of the index locations have been exhausted, return the tentative result

/** @param data an array of doubles
  * @param lo the first index of the subarray
  * @param hi the last index of the subarray
  * @return value of the smallest element in data[lo..hi]  */
public static double smallest(double[] data, int lo, int hi) {
    assert (hi >= lo);
    int iSmallest = lo; // initial value
    for (int i=lo+1; i<=hi; i++)
        if (data[i] < data[iSmallest]) iSmallest = i;
    return data[iSmallest];
}

Complexity of this algorithm is \(\mathcal{O}(N)\), evidently, because we have to go from the first element to the very last one in search of the smallest (unsorted nature of the array is crucial).

SelectionSort

  1. In a loop which iterates a counter, i, over every index of an array in ascending order
  2. At every step, do the following

    • Compute the smallestInRange (the index of the smallest element) between the present array index, i, and the maximum array index. Call this index iSmallest
    • Interchange the array elements at i and iSmallest
/** @param data an array of doubles
  * smallestInRange() returns an index
  * of the smallest value */
public static void selectionSort(double[] data) {
   for (int i=0; i<data.length-1; i++) {
      int iSmallest = SelectionSortTest.
            smallestInRange(data,i,data.length-1);
      if (iSmallest != i) {   // swap only if a smaller element was found further right
         double temp = data[i];
         data[i] = data[iSmallest];
         data[iSmallest] = temp; }
   }
}

As the for-loop steps are executed, the array elements get rearranged (step 1 shows the initial array):

  • step 1: 8.0 5.0 3.0 7.0

  • step 2: 3.0 5.0 8.0 7.0

  • step 3: 3.0 5.0 8.0 7.0

  • step 4: 3.0 5.0 7.0 8.0

Complexity: the number of iterations, ~ N, multiplied by the cost of each smallestInRange() call, also ~ N, hence \(\mathcal{O}(N^{2})\). A more accurate count gives N²/2 comparisons, which is still \(\mathcal{O}(N^{2})\)! Note that, unlike InsertionSort, SelectionSort stays \(\mathcal{O}(N^{2})\) even for an almost sorted array: the swaps are avoided, but all the comparisons are still performed.

Algorithm correctness

Correctness and efficiency of algorithm implementations must be tested both theoretically (by examining the code) and experimentally (by running comprehensive test harnesses with all possible types of input data, and measuring the time-size dependence). The test harnesses (special testing programs — not necessarily written in Java — and input/output data) are assets of the development as much as the code itself. Correctness is the primary goal of testing, but efficiency can also be important (eg, to check a theoretical conclusion regarding the algorithm complexity).

To check the correctness of our implementation of SelectionSort do the following (example SelectionSortTest.java; a minimal sketch of such a harness follows this list):

  1. Create an array and initialise it with N numbers (include the cases N = 0 and N = 1!)
  2. Print the initial array, or run a routine to check whether it is ordered
  3. Run SelectionSort
  4. Print the sorted array, or run a routine to check whether it is ordered
  5. Repeat the above for many different sizes and initialisations:

    • sorted arrays
    • unsorted arrays
    • almost sorted, reverse-sorted arrays
    • arrays with randomly chosen elements
    • boundary cases — 0 or 1 element long arrays, very long arrays
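
A minimal sketch of such a harness (class and method names here are illustrative assumptions, not the actual SelectionSortTest.java; only randomly initialised arrays are shown, the other input types are added the same way):

public class SelectionSortHarnessSketch {
    // a hypothetical harness; run with "java -ea ..." so that assertions are enabled
    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random();
        int[] sizes = {0, 1, 2, 10, 1000};                   // include the boundary cases 0 and 1
        for (int n : sizes) {
            double[] data = new double[n];
            for (int i = 0; i < n; i++) data[i] = rnd.nextDouble();
            selectionSort(data);                             // the method under test
            assert isOrdered(data) : "not sorted for N = " + n;
        }
        System.out.println("all cases passed");
    }

    static boolean isOrdered(double[] a) {                   // checks the result instead of printing it
        for (int i = 1; i < a.length; i++)
            if (a[i-1] > a[i]) return false;
        return true;
    }

    static void selectionSort(double[] data) {               // as on the slide above
        for (int i = 0; i < data.length - 1; i++) {
            int iSmallest = i;
            for (int j = i + 1; j < data.length; j++)
                if (data[j] < data[iSmallest]) iSmallest = j;
            double tmp = data[i]; data[i] = data[iSmallest]; data[iSmallest] = tmp;
        }
    }
}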

Algorithm efficiency

To check the efficiency one has to:

  1. Work with large N (the upper limit is determined by the memory constraints)
  2. Perform time measurements to establish how the full time taken by the sorting method scales with N
  3. Consider “randomised” arrays (initialised by using a pseudo-random number generator) and limiting cases (fully and nearly sorted arrays, all or almost all equal elements etc). Often, the efficiency changes (deteriorates) in the limiting cases, eg, QuickSort (discussed later) on a sorted array (\(N\cdot\log_{2}N\to N^{2}\)), which may force one to reconsider the algorithm implementation.

Time measurements

  1. OK: use the Unix system utility command time -p when running the Java program (the full path is needed for some shells):

    /usr/bin/time -p java SelectionSortTest 

    Of the three times printed, “user time” gives the time in seconds taken by the program including “paging” interrupts.

  2. Better: use the Date class from the java.util package and dump out the times in milliseconds. The overhead here is the creation of, and calls to, Date objects. Also, in a time-sharing system, there is no way to find the time for your particular process alone. (In both previous approaches, the overhead effect flattens out for large N.)
  3. Best: call the system clock via System.currentTimeMillis(), or create a utility class similar to StopWatch.java to include such calls for calculating elapsed times.

Examples SelectionSortWithTiming.java, ArrayUtil.java and StopWatch.java
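
A minimal sketch of such a stop-watch utility and its use (an assumption of how it might look, not the actual StopWatch.java):

public class StopWatchSketch {
    private long startTime;                              // time of the last reset, in ms
    public void start()   { startTime = System.currentTimeMillis(); }
    public long elapsed() { return System.currentTimeMillis() - startTime; }

    // typical usage: time a sort for several sizes N
    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random();
        StopWatchSketch timer = new StopWatchSketch();
        for (int n = 1000; n <= 1000000; n *= 10) {
            double[] data = new double[n];
            for (int i = 0; i < n; i++) data[i] = rnd.nextDouble();
            timer.start();
            java.util.Arrays.sort(data);                 // stand-in for the sort under test
            System.out.println("N = " + n + ", time = " + timer.elapsed() + " ms");
        }
    }
}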

BubbleSort and GapSort

These are two elegant algorithms for solving the sorting problem. Neither is as “naïvely intuitive” as the selection sort or the insertion sort in the next section. The gap sort is an amazingly fast variant of the bubble sort. The essence of the BubbleSort algorithm:

  1. Run an inner loop which repeatedly steps through every element in the array from left to right comparing adjacent elements, and if the left element is larger than the right element, then the elements are swapped. The (new) right element then becomes the left element of the next comparison and so on.
  2. After one inner loop, the largest element is “floated” to the right hand side. The order of the other elements may have changed.
  3. After the second inner loop, the second largest element has floated right to be the data.length-2 element. The algorithm terminates when it makes one pass with no interchanges.

The gist of this (and other algorithms which try to improve the SelectionSort efficiency) is to minimise repetitive swaps. Yet, BubbleSort is still \(\mathcal{O}(N^{2})\) (demonstrate BubbleSortTest.java).

The analogous GapSort improves the efficiency, though the reason for this isn’t well understood. The algorithm achieves an intermediate performance between the standard \(\mathcal{O}(N^{2})\) and \(\mathcal{O}(N\cdot\log_{2}N)\) (the actual asymptotic fluctuates). GapSort compares elements which are separated by a “gap” number of elements (rather than one element as in the bubble sort). The initial gap is taken pretty large — approximately 3/4 of the size of the array. It is reduced in size every outer iteration by a “magic” factor (∼1.3). The intuition is that this gapping considerably reduces the number of repetitive swaps (demo with GapSortTest.java).
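
A minimal sketch of the idea (the initial gap and the shrink factor follow the description above; this is not the actual GapSortTest.java):

public static void gapSort(double[] data) {
    int gap = (3 * data.length) / 4;               // initial gap, roughly 3/4 of the array size
    if (gap < 1) gap = 1;
    boolean swapped = true;
    while (gap > 1 || swapped) {                   // finish with gap = 1 passes until no swaps occur
        swapped = false;
        for (int i = 0; i + gap < data.length; i++)
            if (data[i] > data[i + gap]) {         // compare elements "gap" positions apart
                double tmp = data[i];
                data[i] = data[i + gap];
                data[i + gap] = tmp;
                swapped = true;
            }
        if (gap > 1) gap = (int) (gap / 1.3);      // shrink the gap by the "magic" factor
        if (gap < 1) gap = 1;
    }
}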

InsertionSort

A simplified algorithm for the insertion sort (inspired by card playing):

  1. In the loop from left to right (index i = 0..(length-1)), set a temporary variable tmp = data[i]
  2. As we go along, the subarray on the left grows from 0 to length-1 and always remains sorted
  3. Insert tmp into the appropriate place k in the left subarray, k = 0..(i - 1), in such a way that it remains sorted
  4. Shift array items to the right when necessary. The left subarray is enlarged by 1.

The insertion can be done naively, or via binary search (because the array on left, in which the insertion is done, is sorted). The naïve version is demonstrated in InsertionSortTest.java.
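
A minimal sketch of the naïve version (the course’s InsertionSortTest.java may differ in detail):

public static void insertionSort(double[] data) {
    for (int i = 1; i < data.length; i++) {
        double tmp = data[i];                 // the element to insert into the sorted data[0..i-1]
        int k = i - 1;
        while (k >= 0 && data[k] > tmp) {     // shift larger elements one place to the right
            data[k + 1] = data[k];
            k--;
        }
        data[k + 1] = tmp;                    // the left subarray data[0..i] is sorted again
    }
}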

The efficiency of InsertionSort is still \(\mathcal{O}(N^{2})\) for generic (random) inputs. But for almost sorted inputs it becomes almost \(\mathcal{O}(N)\). This property of InsertionSort is valuable, since the famous and very efficient QuickSort (see below) suffers a deterioration in exactly such cases. By combining the two algorithms, an improved QuickSort can be implemented with much more robust performance.

Demo with SortingApplet.

A useful site with major sorting algorithms (animated like the above) including both time and space complexity analyses is Sorting Algorithm Animations.

Recursive methods

Methods which call themselves

A basic problem of calculating the factorial of N:

\[N\,! = N\cdot (N-1)\cdot (N-2)\cdot\ldots\cdot 2\cdot 1 = N\cdot (N-1)!\]

One implementation is “old fashioned” iterative:

public static int factorial(int n) {
   int r = 1;
   for (int i=1; i<=n; i++) 
      r=r*i;
   return r;
}

The recursive implementation:

public static int factorial(int n) {
   assert (n >= 0) : "The factorial is tricky if the argument is negative!";
   if (n == 0) return 1;
   return n * factorial(n-1);
}

Recursive algorithms: QuickSort

Recursion is a very elegant way of thinking about some algorithms and some recursive algorithms are very fast. However it is possible to write very slow recursive algorithms as well and some care is needed. (For example, you need to be aware that there is an overhead in making all of the method calls in the recursive factorial method.)

The most famous of all (and the first recursive algorithm) is QuickSort, invented by Tony Hoare around 1960, before recursion became widely familiar to computer professionals. The essence of the QuickSort algorithm is divide-and-conquer:

  1. select a pivot — index of an element whose value is between the min and max values of the array
  2. partition the array around the pivot — smaller elements go on the left, greater — to the right
  3. apply the QuickSort recursively to the left and the right subarrays
  4. recursion stops if subarray.length < 3

quicksort_inv

During the algorithm execution (after every partitioning is done) the following invariant must be maintained for an array x[a..b] with the pivot value t = x[m]:

for all i, a <= i < m: x[i] < x[m], and for all i, m < i <= b: x[i] >= x[m]

QuickSort: The pseudo-code

m = l; t = x[m]; // selecting pivot
for i = [l+1, u] //closed interval
   if x[i] < t     // before swapping, increment m to give the
      swap(++m, i) // left subarray a room for the new element
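
A minimal Java rendering of this partitioning scheme (a sketch only; the course’s QuickSort.java will differ in detail, and for simplicity the recursion here stops at subarrays of length 1):

public static void quickSort(double[] x, int l, int u) {
    if (l >= u) return;                          // a 0- or 1-element subarray is already sorted
    int m = l;                                   // m marks the end of the "< pivot" region
    double t = x[l];                             // pivot value (first element, as in the pseudo-code)
    for (int i = l + 1; i <= u; i++)
        if (x[i] < t) {                          // the element belongs to the left of the pivot
            m++;
            double tmp = x[m]; x[m] = x[i]; x[i] = tmp;
        }
    double tmp = x[l]; x[l] = x[m]; x[m] = tmp;  // place the pivot value in its final position
    quickSort(x, l, m - 1);                      // sort the left subarray
    quickSort(x, m + 1, u);                      // sort the right subarray
}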

This is how the “detailed” partitioning algorithm looks pictorially (the code is in QuickSort.java):

  1. during the loop execution

quicksort-1

  2. after the loop terminates

quicksort-2

  3. placing the pivot value in the right place

quicksort-3

  4. calling QuickSort on sub-arrays

quicksort-4

How Quicksort Shines

QuickSort is fast for generic (random-like) inputs; its efficiency is \(\mathcal{O}(N\cdot\log_{2}N)\).

quicksort_reg

How Quicksort Flounders

Yet, when an input becomes (almost) sorted, it degenerates to the “banal” \(\mathcal{O}(N^{2})\). Why?

  • Because the original array is (almost) sorted, partitioning results in one of the two subarrays having only one element (or just a few)
  • So the recursion goes, chipping off one element at a time
  • The number of steps at every level is still N, but the number of levels is also N
  • Hence the “snail”-like result: \(N\cdot\ell = N\cdot N = N^{2}\), where \(\ell\) is the number of levels (“disaster!”)

quicksort_deg

The actual analysis of a relatively complex algorithm such as QuickSort is rather complicated. Its weak points must be taken care of to prevent performance problems. That’s why an industrial strength implementation (found in good libraries) is more than 200 LOC long (instead of puny 20–30 in an “educational” case like ours.)

Robust QuickSort

Taken as-is, the above algorithm can be implemented as an efficient sorting routine with performance \(\mathcal{O}(N\,\log_{2} N)\) for random inputs. For special cases — arrays of identical elements, (almost) sorted arrays etc — the performance will be worse, \(\mathcal{O}(N^{2})\). Code tuning is needed:

  • select the pivot not from the first element of the input, but from the middle, or randomly — will improve sorting of already (nearly) sorted inputs
  • do partitioning with two inner loops going from the opposite ends towards each other — will improve the performance for “all-equal” inputs

two_way_quicksort

  • Two-way partitioning in quicksort

The main weakness of QuickSort is sorting a large number of short arrays at the end of the recursion chain. InsertionSort, on the other hand, is very effective if the input is an “array of subarrays” with each subarray unsorted, but all values in it greater than the largest value in the preceding subarray and not greater than the smallest value in the following subarray. Such an “array of subarrays” is what quicksort() produces if stopped “prematurely”, recursing only while subarray.length >= cutoff (with cutoff > 2). Completing the sorting with InsertionSort produces a robust routine:

QuickSort(data, 0, data.length - 1, cutoff);                        
InsertionSort(data);                                               

The hybrid QuickSort was used in the java.util.Arrays.sort() methods (until JDK 7).

MergeSort

The elegant merging algorithm begins with… well, merging two sorted arrays in a way which preserves the sorted order (the resulting “big” array is also sorted). The merging starts with choosing the smaller of the two arrays’ first elements. Then one looks at the next two candidate elements and again chooses the smaller. The coding for this is straightforward but a little bit long-winded. From time to time you will be selecting several elements from one of the arrays before you select any from the other (for an implementation see MergeSortTest.java).

public static void mergeSort(int[] a, int from, int to) {
   if (from == to) return;            // a one-element range is already sorted
   int mid = (from + to)/2;           // note: from + to could overflow for huge arrays
   mergeSort(a, from, mid);           // sort the left half
   mergeSort(a, mid + 1, to);         // sort the right half
   merge(a, from, mid, to);           // merge the two sorted halves
}
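
The merge(a, from, mid, to) step is not shown above; a minimal sketch using a temporary array (MergeSortTest.java may do this differently):

static void merge(int[] a, int from, int mid, int to) {
    int[] tmp = new int[to - from + 1];
    int i = from, j = mid + 1, k = 0;
    while (i <= mid && j <= to)                       // pick the smaller head of the two runs
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid) tmp[k++] = a[i++];               // copy what is left of the left run
    while (j <= to)  tmp[k++] = a[j++];               // copy what is left of the right run
    System.arraycopy(tmp, 0, a, from, tmp.length);    // write the merged run back into a
}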

merge_sort

The idea of the merge sort is that you divide the original array into two parts, sort them and then merge them. If you use recursion, then you can effectively keep dividing until the initial arrays have size 1, so that they are trivially sorted and all you need to do is merge them. Because it uses the same “divide-and-conquer” principle, the expected performance is the same as for QuickSort, \(\mathcal{O}(N\cdot\log_{2} N)\). A hybrid of MergeSort and InsertionSort — the so-called TimSort — has been used in Arrays.sort() for arrays of objects since JDK 7.

BinarySearch

This is another quintessential computational task — searching in a sorted database, the simplest case of which is a one-dimensional array. The algorithm seems almost trivial:

  1. find the middle index mid, compare value of its element with the target
  2. if the target is smaller than x[mid], apply the BinarySearch to the left subarray
  3. if the target == x[mid], Bingo!
  4. if the target is greater than x[mid], apply the BinarySearch to the right subarray
  • find position of target a = 3.7 in the array \(\to\)
  • if found, return the target’s index (here 2)
  • if not found return the special value -1

binary_search

Despite such an almost trivial principle, it’s notoriously hard to achieve a correct implementation. D. Knuth reports in Vol. 3 of The Art of Computer Programming that while the first BinarySearch algorithm was published in 1946, the first correct implementation was only achieved in 1962. The helpful (for getting it right) concept is a loop invariant — see the next slide and Ch. 4 of “Programming Pearls” by Jon Bentley.

BinarySearch is an \(\mathcal{O}(\log_{2}N)\) algorithm. It’s often used “inside” other algorithms; eg, the bisection method for finding roots of nonlinear equations is essentially a binary search.
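
A minimal recursive sketch of the four steps above (the later slides show Java’s iterative version):

public static int binarySearch(double[] x, double target, int lo, int hi) {
    if (lo > hi) return -1;                    // empty range: the target is not present
    int mid = lo + (hi - lo) / 2;              // overflow-safe midpoint (see the bug slide below)
    if (target < x[mid])
        return binarySearch(x, target, lo, mid - 1);   // search the left subarray
    else if (target > x[mid])
        return binarySearch(x, target, mid + 1, hi);   // search the right subarray
    else
        return mid;                            // Bingo!
}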

Rôle of invariants

Implementing an algorithm in (pseudo-)code is often a more complex task than it may seem, even for the simplest cases, like a binary search algorithm. Programs are like mathematical formulae. The latter can be (very) complex. Their validity is ensured by the application of mathematical axioms and rules of logic at every step of formula manipulation.

The algorithm correctness is ensured in a similar way with the addition of an invariant which is a formal expression of the algorithm purpose.

An algorithm’s invariant is a formal assertion about the program state which must be true at every stage of program execution. The invariant assertion involves the input, program variables and output. A (single-threaded) program usually contains: elementary statements (assignments), conditional statements (branching), loops and method (function) calls.

A program A built from these elements, \(A = A_{1}A_{2}\ldots A_{i}\ldots A_{n}\), will be correct if each of them preserves the invariant, and the program itself terminates. (Termination of loops and function calls should be proved independently.)

Invariants in BinarySearch

Correctness of a program (including an algorithm) is established by demonstrating that the full computational chain \(A = A_{1}A_{2}\ldots A_{i}\ldots A_{n}\) (provided it terminates) preserves the invariant. Deduction \(\{P_{i}\}A_{i}\{Q_{i}\}\) is performed using logic and “obvious” mathematics (Hoare’s logic and Dijkstra’s Weakest Precondition).

Choice of the invariant for a particular algorithm is more of an art than a science. In the case of a binary search in a linear, array-like DS, the invariant can be formulated as follows (as taken from Jon Bentley’s book “Programming Pearls”): mustbe(l, u) asserts that if the target t is in the array at all, then it must lie within the subarray x[l..u].

If the algorithm preserves the invariant (and if it terminates), we can be (“almost”) assured that the result is correct. The algorithm analysis (“walk-through”) must establish that the invariant is never broken at any execution step, and must include a proof that the algorithm halts at some point.

Breaking by blindness of obvious

Again

Here is the BinarySearch pseudo-code with the invariant:

01 { mustbe(0, n-1) }
02 l = 0; u = n-1;
03 { mustbe(l, u) }
04 loop
05   { mustbe(l, u) }
06   if l > u
07     { l > u && mustbe(l, u) }
08     { t is not in the array }
09     p = -1; break
10   { mustbe(l, u) && l <= u }
11   m = (l + u) / 2
12   { mustbe(l, u) &&
                    l <= m <= u }
13    case
14       x[m] < t:
15          { mustbe(l, u) && cantbe(0, m) }
16          { mustbe(m+1, u) }
17          l = m+1
18          { mustbe(l, u) }
19       x[m] == t:
20          { x[m] == t }
21          p = m; break
22       x[m] > t:
23          { mustbe(l, u) && cantbe(m, n) }
24          { mustbe(l, m-1) }
25          u = m-1
26          { mustbe(l, u) }
27    { mustbe(l, u) }

Latest Bug in BinarySearch

Bugs can stay hidden for a long time until the execution environment changes and they get “activated”. This happened to Java’s implementation of the binary search as used in the java.util.Arrays.binarySearch() methods:

1:     public static int binarySearch(int[] a, int key) { 
2:         int low = 0;  int high = a.length - 1; 
3:         while (low <= high) { 
4:             int mid = (low + high) / 2; // Where the problem lurks!
5:             int midVal = a[mid]; 
6:             if (midVal < key) 
7:                low = mid + 1;
8:            else if (midVal > key) 
9:                high = mid - 1;
10:            else 
11:                return mid; // key found 
12:         } 
13:         return -(low + 1);  // key not found. 
14:     } 

The bug struck in 2006, when search arrays became so large that the sum low + high overflowed the int range and produced a negative mid. The solution is to replace line 4 with:

int mid = low + ((high - low) / 2);   OR   int mid = (low + high) >>> 1;

Recursions vs Loops

When one is better than another

Other algorithms

The study of algorithms is at the core of Computer Science. “Simple fundamental algorithms is the ultimate portable software” (R. Sedgewick, J. Bentley).

Algorithm Intensive CS Fields

  1. Sorting and searching on various data structures
  2. Graphs and Networks (including data mining)
  3. Optimisation (incl. parse trees in compiling)
  4. Number theory including encryption (factorisation, elliptic curve arithmetic)
  5. Computational geometry and computer graphics
  6. Numerical computations (matrix multiplication and spectral analysis, Newton’s method etc)

A few references

Data types and data structures

Introduction

A common computational need is to group multiple elements into a single unit, known as a collection or container, to store and retrieve the elements, and to manipulate this group as a whole through a small set of operations defined in its interface. The most basic examples of such collection types are arrays. Classes like Vector and Hashtable (which we have met) are separate (raw) collection types. With generics, collection types were vastly expanded and organised into the Java Collections Framework, endowed with an architecture for representing and manipulating its constituents. The JCF tries to emulate the famous C++ STL. The framework is hosted in the java.util package. It has the following components:

Collection Interfaces

Collection Metaphors

Kent Beck (a well known software expert, the co-author of JUnit, and one of the Extreme Programming pioneers) remarks, in his book “Implementation Patterns”, that collections should be regarded as a first-class language construct (alongside variables), because they are multi-valued variables, and because their cardinality (“many”, meaning two or more) is, in fact, complementary to the other two fundamental cardinalities: zero (no field) and one (a usual field variable). The collection concept combines three metaphors: it is an object, it can hold multiple values at the same time, and it has the properties of a set.

Collections which you can encounter in a program can be classified according to their interface:

Collections

To put it simply — Collection objects store other objects (eg, ArrayList). We shall:

  1. re-enforce the OO idea of a separation between the interface to a collection and the implementation (DS) of that collection

  2. introduce (or review, in the case of arrays) new collection data types and the data structures which implement them:

    • collection data types: lists, queues, stacks, maps (dictionaries)
    • data structures: arrays, linked lists, hash tables, binary trees
  3. describe how collections are defined in Java Collections Framework

Collection interface

Before considering other implementations of the BookList.java interface, let us describe its basic operations. These basic operations (alongside the bulk operations) form the Collection<E> interface. The BookList interface is therefore a particular example of Collection<Book>. The linked list implementations of three of its methods are illustrated below:

  • int size(), boolean isEmpty(), boolean contains(Book b)
  • String toString(), Book get(int p), Iterator<Book> iterator()
  • boolean add(Book b), boolean remove(Book b), void insert(Book nB, int p)

add

remove

insert

All implementations of iterator() return the Iterator implementation given by the class BookListIterator.java.

Implementing with own Data Structures

The example of BookList.java defines an interface for the Library program for filing a collection of Book objects, processing and printing them. Its two implementations use different DS for internally storing the collection elements, but their usage is exactly the same (polymorphism).

  1. The BookListWithLL.java implementation uses a linked list. It is an example of a dynamic DS (its size changes during runtime). Each node contains a Book object and a link to the next node in the list (a minimal node sketch follows this list). A type like this is sometimes called self-referential because it contains a handle to an object of the same type. There is no limit on the number of elements in a list, and it’s easy to break and reconnect links; the cost is in traversing the list to find the place to break and reconnect them.
  2. The second implementation BookListWithArray.java uses an array DS, which provides random access to its elements. The details are much simpler.
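
A minimal sketch of such a self-referential node type (the names are assumptions for illustration, not taken from the actual BookListWithLL.java):

class BookNode {
    Book book;       // the data stored in this node
    BookNode next;   // link to the next node; null marks the end of the list

    BookNode(Book book, BookNode next) {
        this.book = book;
        this.next = next;
    }
}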

linked_list

The Collections API already provides all the required DS, but we implemented the BookList interface from scratch. Versions which use the API DS are discussed next.

Implementations of Collection interface via Inheritance

When a standard library offers an effective implementation of a collection type, it is more prudent to use it for implementing your own interface. We discuss two ways of using the ArrayList collection class to implement our BookList interface.

One approach is very convenient and allows a minimum of work because it uses

Inheritance

Make the class implement BookList and extend ArrayList, thus making the implementation class an ArrayList type. In software design this is called an “Is-A” relationship (between the classes) — by virtue of inheritance, BookListIsAL.java is an ArrayList. The trade-offs:

Implementations of Collection interface via Composition

Another approach, however, is often a better choice — it’s based on

Composition

Give the class which implements BookList a field holding an ArrayList object. In software design this is called a “Has-A” relationship — by virtue of composition (aka containment), BookListHasAL.java has an instance of the ArrayList class among its fields. The trade-offs:

Stacks

Queues and stacks are very popular interfaces. Both are widely used in systems level programming (for managing memory and processes) as well as in other applications.

A Stack<E> is a “last in first out” (LIFO) collection type which can be implemented by adding (with a push method) and extracting (with a pop method) from the head of a list. The interface:

  • int size();
  • boolean isEmpty();
  • int search(E el);// similar to indexOf()
  • E top(); // JCF calls it peek()
  • void push(E newEl);
  • void pop();
  • String toString();

stack

The Stack class in the Java Collections Framework has a slightly different interface. It extends the Vector class with operations which allow a Vector to be treated as a stack. Java regards Stack (as well as Vector) as a legacy class (on the way out), and suggests using ArrayList for implementing the Stack interface in applications. (What could be the reason for such a policy?)
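
A minimal sketch of an ArrayList-backed stack built by composition (the method set follows the interface above; this is an illustration, not a JCF class):

import java.util.ArrayList;

class ArrayListStack<E> {
    private final ArrayList<E> items = new ArrayList<E>();   // top of the stack = end of the list

    public int size()         { return items.size(); }
    public boolean isEmpty()  { return items.isEmpty(); }
    public E top()            { return items.get(items.size() - 1); }  // throws if the stack is empty
    public void push(E newEl) { items.add(newEl); }
    public void pop()         { items.remove(items.size() - 1); }      // throws if the stack is empty
    public int search(E el)   { return items.indexOf(el); }            // index from the bottom, or -1
    @Override
    public String toString()  { return items.toString(); }
}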

Queues

A Queue<E> is a “first in first out” (FIFO) type which can be implemented by adding objects (with an enqueue method) to the head of a list and by extracting them (with a dequeue method) from its tail. The interface:

  • int size();
  • boolean isEmpty();
  • E first();
  • E last();
  • void enqueue(E newEl);
  • void dequeue();
  • String toString();

queue

In the JCF, the Queue interface has slightly different names for the above operations, plus additional operations which allow this interface to be used as a stack, as well as to have a specific ordering (defined by a constructor with a Comparator parameter), for instance in the PriorityQueue class.

Lists

The List interface represents an ordered collection in which the user controls where each new element is inserted and can access and search any element by its index. Lists allow duplicate elements (unlike Sets). Two major implementations are the following classes (all generic): ArrayList and LinkedList.

deque

Iterator

List classes all implement (or inherit) the Iterable interface; its iterator() method returns an Iterator object, which defines three operations to step through (iterate over) the collection contents one element at a time. The methods: hasNext() (are there more elements?), next() (return the next element and advance) and remove() (remove the element last returned by next()).

If your own list implements the Iterable or Iterator interfaces, the iterator() method (from Iterable; it returns an Iterator object), or the Iterator methods listed above, can be used for the list traversal. The implementation of Iterable allows traversal with the for-each loop. However, this approach (see next slide) is not very safe when one has to filter (remove by a certain criterion) the collection elements, or to traverse multiple collections simultaneously. Examples — the BookListWithArray.java and BookListWithLL.java classes (each implements the “Iter” interfaces differently).

The alternative way to traverse a collection involves the Iterator, which is the only safe way to modify the collection along the way. (The Iterator-based “plain” for-loop is also the right way to traverse more than one collection simultaneously.) A correctly implemented iterator() method, which also requires an implementation of the Iterator<E> interface, must guarantee that when the collection gets modified the iterator will properly move through the subsequent elements of the collection. The slide below, “Traversing a Collection with Iterator”, demonstrates the two traversals, including the failure of one of them.

Traversing a Collection with for-each loop

Java 1.5 “new” language feature

To cut down on the amount of code when working with Collection types, Java 1.5 (Tiger) introduced the enhanced for-loop (borrowed from Perl):

for( type var : collection ) statement block    

Two examples:

for( String arg : args ) { // args is a String[]
   System.out.println( arg );
}
for (Book book : books) { // for BookListIsAL books implemented with ArrayList
   System.out.println( book );
}

A “for-each” loop can be used to iterate through arrays and most of the Java collection classes (ArrayLists, HashSets, etc) — anything that implements the Iterable interface (which all Collection types do). The example is in Library.java, the client program of the BookList.java types.

Once again, remember: a collection type which you define yourself must implement the Iterable interface to be amenable to for-each traversal.

The for-each loop can also be used on arrays, but it is not suitable if the traversal process uses the index explicitly, or one has to traverse in the opposite direction.

Traversing a Collection with Iterator

Safe removal while traversing

When we need to modify a collection while traversing it (eg, filter out elements based on some criterion), the safest way is to use an explicit Iterator in a standard for-loop and modify the collection only through the iterator. This can be subtle. The standard Collection classes (ArrayList, LinkedList) include a proper implementation of Iterator which allows such modification safely during a traversal:

BookListIsAL books = new BookListIsAL();
books.add(new Book("Java Software Solutions", false));
... ...
for (Iterator<Book> iter = books.iterator(); iter.hasNext(); ) {
    Book nextBook = iter.next();
    if (!nextBook.isChildren()) 
        iter.remove();// call "remove" only through Iterator reference!
}

The remove() method may be called only once per call to next() and throws an exception if this rule is violated. An attempt to achieve the same effect with the seemingly identical for-each loop — and a direct call to remove() — results in a run-time exception:

for (Book book: books) {                        // for-each hides the iterator,
    if (book.isChildren()) books.remove(book);  // and one cannot call remove()
}
... ... ...
Exception in thread "main" java.util.ConcurrentModificationException

Set and SortedSet

These are the extensions of the Collection type which disallow duplicates (identical elements): a repeated invocation of add(elem) with the same element elem (or with an elem1 such that elem.equals(elem1) returns true) returns false, and the collection remains unchanged. Set types, therefore, are sets in the mathematical sense (collections which do allow duplicates are sometimes called “bags”, or multisets).

The elements of a set are unordered. The subtype SortedSet (an extension of the Set interface) represents a collection whose elements can be compared — they implement the Comparable interface. The ordering allows additional methods to be introduced:

  • comparator() — returns the comparator associated with this sorted set (null if the natural ordering is used)
  • first(), last() — smallest and largest elements
  • subSet(eMin,eMax), headSet(eMax), tailSet(eMin) — subsets of elements in the corresponding ranges

set

Set is implemented by the HashSet class with a hash table as the DS. Content modification (add, remove) and testing (contains) are \(\mathcal{O}(1)\) operations. SortedSet is implemented by the TreeSet class with a binary tree as the DS. If the implementation can maintain a balanced tree, the search and modify operations are \(\mathcal{O}(\log_{2}N)\). Example — SetTest.java.
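
A small illustration (a sketch, not the actual SetTest.java):

import java.util.*;

public class SetDemo {
    public static void main(String[] args) {
        Set<String> names = new HashSet<String>(Arrays.asList("Ada", "Alan", "Ada"));
        System.out.println(names.size());          // 2 -- the duplicate "Ada" was rejected
        System.out.println(names.add("Alan"));     // false -- already present, set unchanged

        SortedSet<String> sorted = new TreeSet<String>(names);
        System.out.println(sorted.first() + " " + sorted.last()); // Ada Alan (natural order)
    }
}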

Map and SortedMap

Unlike all previous types from the JCF, the Map family of types does not extend the Collection interface: their contract is different — to represent not a collection of elements, but a correspondence between two collections. So, a map contains key/value pairs, with no duplicate keys (the keys form a set) and at most one value for each key. Map is a model of the mathematical abstraction called a function. The interface Map<K,V> operations are:

  • V put(K key, V value)
  • V get(K key) and V remove(K key)
  • boolean containsKey(K key)
  • boolean containsValue(V value)
  • public Set keySet() — returns a view of the set of keys
  • public Collection values() — returns a view of the collection of stored values

map

The SortedMap extension requires the set of keys to be sorted. Methods like firstKey() and lastKey() are added. The Map interface has two major implementation classes (similar to Set) — HashMap, which uses a hash table DS for the implementation (with similar \(\mathcal{O}(1)\) performance for put/get operations), and TreeMap, which implements SortedMap in a way similar to TreeSet (with \(\mathcal{O}(\log_{2}N)\) efficiency). Example — MapTest.java
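
A small illustration (a sketch, not the actual MapTest.java):

import java.util.*;

public class MapDemo {
    public static void main(String[] args) {
        Map<String, Integer> phone = new HashMap<String, Integer>();
        phone.put("Alice", 6125);
        phone.put("Bob", 6126);
        phone.put("Alice", 6127);                     // replaces the old value for the key "Alice"
        System.out.println(phone.get("Alice"));       // 6127
        System.out.println(phone.containsKey("Eve")); // false

        SortedMap<String, Integer> byName = new TreeMap<String, Integer>(phone);
        System.out.println(byName.firstKey());        // Alice -- keys are kept sorted
    }
}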

Bulk operations of Collection interface

The basic Collection operations were listed above (slide Collection Interface). They allow one to examine and manipulate the collection element by element. The bulk operations allow whole parts of the collection to be manipulated in one go: containsAll(), addAll(), removeAll(), retainAll() and clear().

Another category of operations are the array operations, toArray(), which convert a collection into an array.

How to copy arrays properly

Arrays are objects, therefore when one array is assigned to another, the two identifiers point to the same memory location (you remember this from Block-1, don’t you?). How does one copy the elements of an array into a different array?

int[] smallPrimes = {2,3,5,7,11,13};
int[] luckyNumbers = smallPrimes;
luckyNumbers[5] = 12;
System.out.println(smallPrimes[5]);// Prints 12

To achieve element-by-element copying, one has to use the System.arraycopy() method. To copy count elements from the source array starting at index from into the target array beginning at index to:

System.arraycopy(source,from,target,to,count);

arraycopy

int[] smallPrimes = {2,3,5,7,11,13}; 
int[] luckyNumbers = {1001,1002,1003,1004,1005,1006,1007}; 
System.arraycopy(smallPrimes,2,luckyNumbers,3,4); 
for (int i=0; i<luckyNumbers.length; i++) 
    System.out.println(i + ":" + luckyNumbers[i]);
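
The loop then prints the following (only luckyNumbers has been changed; smallPrimes is untouched):

0:1001
1:1002
2:1003
3:5
4:7
5:11
6:13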

Collections: summary of types

The interfaces of the Collections Framework and some of their implementations:

collections

Implementations

Data structures used in the (maximum-performance, general-purpose) implementations:

Interfaces   Hash table   Resizable array   Tree      Linked list
Set          HashSet      Makes no sense    TreeSet   Makes no sense
List         ?            ArrayList         ?         LinkedList
Deque        ?            ArrayDeque        ?         LinkedList
Map          HashMap      Makes no sense    TreeMap   Makes no sense

Essentially, there are only a few primary data structures which can be used for the implementation of the above interfaces (their nature is determined by the modern computer architecture — the von Neumann machine):

  1. resizable arrays — random access linear data structure, used in BookListWithArray.java
  2. linked list — sequential access linear data structure, used in BookListWithLL.java
  3. binary tree — recursive data structure with two (or more) self-references
  4. hash table — hybrid data structure

Hashtable: implementing Set and Map

An ordinary array can be thought of as a table or mapping between an integer index value and a data value. Arrays enable very efficient random access to the data, \(\mathcal{O}(1)\). The performance deteriorates when we want to store more elements than the original array can take — can we retain the high performance in this situation? Also, one often wants to index entries not by integers but, eg, by String names — can we do it with \(\mathcal{O}(1)\) efficiency (instead of resorting to linked lists)?

hash_table

A data structure which allows us to do such things is the hash table (HT) — “the great invention of Computer Science” (Kernighan, Pike). A HT is an array of linked lists plus a hash code function, which maps the data element to the index value of the array.

Normally, different keys will compute to different table indices (if the hashcode is well implemented), but if not — a hash collision — then we add a new entry to the bucket list. It is very important to keep the HT balanced and avoid letting bucket lists grow too long. For a well-balanced HT, the performance of the data entry/retrieval operations is \(\mathcal{O}(1)\); otherwise it becomes \(\mathcal{O}(N)\).

Balancing a hash table: hashCode()

To achieve the optimal performance (reduce the number of collisions), one should: provide a hashCode() which distributes the elements uniformly over the table, and keep the table’s load factor (the ratio of stored elements to buckets) low by resizing (rehashing) the table when necessary.

Here is an example of the hashCode() for a simple Item class:

class Item {
   private String description;
   private int partNumber;
      ...
   public int hashCode() {
      // note use of prime number multipliers
      return 13*description.hashCode() + 17*partNumber; 
   }
}   

How to write the hashCode() function

Hash codes are meant to generate array indices which are uniformly and randomly distributed over the size of the array. For this reason they work on an idea of “scrambling up” the values of the data being hashed, so that it is not possible to predict where a particular object will be stored. This is how hashCode() method for the String class might have been implemented:

\[hashCode(s) = \sum\limits_{i=0}^{n-1}\,s_{i}\cdot \mbox{31}^{(n-1-i)},\]

where n = s.length() and \(s_{i}\) is the code of the character s.charAt(i) (Exercise: implement this function as a Java method.) Very similar inputs evaluate to quite different outputs:

String Hash value String Hash value
Hello 69609650 Harry 69496448
hello 99162322 Hacker -2141031506

To write a good hash function for every class whose instances need to be stored in a hash table would be a daunting task. Thankfully, all major Java objects that we might need to use as keys (String, Integer, Double, Date) have good hash code functions. When you override hashCode() for your own type, do it using the principle exemplified on the previous slide (class Item), and only use two- (or more) digit primes (13, 17, 31…) as factors. (To learn more, one has to take a dedicated course in algorithms and/or computer cryptology.)

Trees

Trees are recursive data structure with more than one (two for binary trees) self-references. Of all standard DS they are most complex and interesting.

class Tree<E> {

    E stuff;
    Tree<E> left, right;

    public Tree(E e, Tree<E> l, Tree<E> r) {
        this.stuff = e;
        this.left = l;
        this.right = r;
    }
}

tree

There are different types of binary trees (see next slide) which differ in how the elements are arranged and what operations are defined. The most commonly used is a balanced binary search tree. This structure is defined recursively (as it should be), and it enables a binary decision (see below) to be made at each level, which reduces the search time to an \(\mathcal{O}(\log_{2}N)\) operation (similar to the binary search in a linear ordered array).

Types of Binary Trees

Multivalent trees can almost always be reduced to binary ones (albeit with a more complex node structure). Of all the data structures considered so far, binary trees are the most complex, diverse and interesting (for research). Their value as a DS is due to the fact that tree traversal is a key element of every operation, and for a well structured (balanced search, or amortised) tree the traversal consists only of a monotone descent: it takes \(\mathcal{O}(h)\) steps (h is the height), and \(h\sim\log_{2}N\).

There are many types of tree DS (all listed below are binary except for B-Trees):

Binary Search Trees

Binary Search Trees (BST) are the simplest case. They are used in the implementation of the SortedSet (TreeSet class) and SortedMap (TreeMap class) interfaces.

BST are defined in a simple way:

  • Every value in the left subtree is less than the root
  • Every value in the right subtree is greater than or equal to the root
  • The left and right subtrees are binary search trees

BSTs are the simplest and most often used type of binary tree. The example SearchTree.java defines the SearchTree class with the usual search-tree operations (get, add and remove an element — see the Operations on Trees slides below):

ordered_tree

Balanced Trees

It is important for a search tree to be balanced to guarantee the performance of its operations (listed on the previous slide) of \(\Omega(\log N)\). Obviously, this requires the tree to be everywhere “dense”, with every branch (path from the root to a leaf) having approximately the same length (the maximum of which is called the height). Such “dense” trees are usually called balanced. Balance is broken when some branches are abnormally long, with length \(\sim N\). If a search tree becomes strongly unbalanced, the efficiency of its operations deteriorates to \(\mathcal{O}(N)\).

Search trees are often built at run time, using data which stream from an outside source (as is done in the example SearchTree.java). If the source data are not skewed, the resulting search tree will likely be balanced, but this cannot be guaranteed. If unbalancing occurs, the tree can be adjusted to restore balanced-tree performance. This can be quite complex; examples of self-adjusting balanced trees include Red-Black Trees and Splay Trees.

There is a duality between algorithms and data structures. The temporal structure of an algorithm is dual to the “spatial” structure of data. When an algorithm performs in its generic fashion (with its “generic” efficiency, eg, QuickSort running at \(\Omega(N\log N)\)), the structure of the recursion calls (the temporal portrait, as it were) looks exactly like a balanced search tree — everywhere dense, without the anomaly of protruding separate branches. This is also how the “spatial” structure of a balanced tree looks, and it gives an optimal tree traversal performance of \(\mathcal{O}(\log_{2}N)\).

Temporal structure of one resembles structural (“spacial”) feature of another.

Heaps

There are two meanings of the term heap in Computer Science.

  1. A part of computer memory which is unused by the OS and other running programs, and from which each program can borrow. In Java, all explicitly created objects use the heap memory. If you declare a fixed size array with elements like literal strings, integers, enum constants etc, the required memory is allocated in advance. Otherwise, when a DS changes during the execution (when adding new elements), the memory comes from the heap. When an object is destroyed by the garbage collector, the memory is returned to the heap.

  2. An array data structure a[i], i = 0..(N-1), in which elements are placed on an imaginary tree in accordance with the following index rules for an element with index i:

    • Its parent’s index is \(\lfloor (i-1)/2 \rfloor\)
    • Its left child index is \(2 i + 1\)
    • Its right child index is \(2 i + 2\)

    The heap element a[i] also satisfies a special condition — a so-called heap property:

    • \(\mathtt{a[parent(i)]}\geq \mathtt{a[i]}\) — for max-heaps
    • \(\mathtt{a[parent(i)]}\leq \mathtt{a[i]}\) — for min-heaps

    The array heap tree is always (almost) complete — the length of all branches is either equal to the tree height h, or h-1 (this follows from the trivial mathematical fact that any number N satisfies the inequalities: \(2^{h} \leq N \leq 2^{h+1} - 1\) for some positive integer h).

Heaps visualised

For an arbitrary array a[i] of the length N, one can always heapify it — calculate relationships between the elements as prescribed by the heap property; the effect is like placing the elements on a tree.

Why are heaps useful? Or, to put it another way, what advantage does rearranging an array into a heap give? Max-heaps are used in the HeapSort sorting algorithm, which gives \(\Omega(N\log_{2}N)\) in-place sorting performance. This performance does not deteriorate, like QuickSort’s, on almost sorted inputs. Min-heaps are used in the implementation of priority queues.

An arbitrary array a[i] can be sorted by going through the following steps (a sketch of the sift-down routine underlying both steps follows the list):

  1. Heapify a. The cost of this operation is \(\mathcal{O}(N)\).

  2. Sort the heap-array using the HeapSort algorithm, which has efficiency \(\mathcal{O}(N\log_{2}N)\); it does not deteriorate like QuickSort (due to the heap property). The factor N comes from the array traversal, and the factor \(\log N\) from the heap property (in a heap, all access ops are \(\mathcal{O}(h)\), and \(h = \lfloor\log_{2}N\rfloor\)).
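
A minimal sketch of a max-heap sift-down, heapify and HeapSort (the index rules are those listed above; names and details are an illustration only):

static void siftDown(double[] a, int i, int n) {     // restore the max-heap property below index i
    while (true) {
        int left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left  < n && a[left]  > a[largest]) largest = left;
        if (right < n && a[right] > a[largest]) largest = right;
        if (largest == i) return;                    // a[i] is not smaller than its children
        double tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
        i = largest;                                 // continue sifting the swapped value down
    }
}

static void heapify(double[] a) {                    // O(N): sift down every non-leaf, bottom up
    for (int i = a.length / 2 - 1; i >= 0; i--) siftDown(a, i, a.length);
}

static void heapSort(double[] a) {                   // O(N log N), in place
    heapify(a);
    for (int end = a.length - 1; end > 0; end--) {
        double tmp = a[0]; a[0] = a[end]; a[end] = tmp;   // move the current maximum into place
        siftDown(a, 0, end);                              // restore the heap on the shrunk prefix
    }
}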

heap

Self-Balancing and Amortising Trees

If writing (adding and deleting nodes) into a tree is allowed, its initially balanced structure can be compromised, with the consequence of worsened behaviour — the access cost \(\mathcal{O}(h)\) (h is the tree height), instead of being \(\mathcal{O}(\log N)\), becomes \(\mathcal{O}(N)\). How to deal with this problem?

  1. Add additional markers to nodes for constraining the structure; when the constraint gets broken (after an add or delete), perform a local transform (called a rotation) to restore the constraint. A well-known example is Red-Black Trees, in which every node (in addition to the actual data) has a colour marker, “red” or “black”. The constraint:

    • the root is always black
    • all leaves are black, and all nodes (except the root) are either black or red
    • a red node’s children are always black
    • for each node, all monotone descending paths to leaves contain the same number of black nodes

    As mentioned above, the constraint guarantees that the tree remains approximately balanced — the lengths of descending paths from the same node differ by at most a factor of 2. The complexity of the RBT structure and its performance analysis is beyond our scope here. A valuable study of Red-Black Trees by R. Sedgewick includes Java code and animations.

  2. As an alternative to using additional markers, one can restructure the tree by moving the most recently accessed node towards the root. Such an operation is called splaying, and the trees which support it are called Splay Trees. The case for using Splay Trees is when the same elements are accessed repeatedly in close sequence (during which no large tree modification is performed). So although an initial access can be costly, \(\mathcal{O}(N)\), after splaying it becomes \(\mathcal{O}(1)\), and the average cost of one access gets amortised. It is possible to show that the average cost can be made the “usual” \(\mathcal{O}(\log_{2}N)\), but we shall not go into details.

Expression Trees

The above examples dealt with tree structures used to store data. Trees can also be used to represent complex constructs with well defined semantics (ie, they can be evaluated). Examples:

  • an arithmetic expression, eg, 7 * (42 - 3)
  • a text, eg, a computer program, Foo.java

A tree which represents a computer program is called a program tree graph, a parse tree, or an Abstract Syntax Tree (AST).

expression_tree

text := (3 + 5) * (9 - 6) --> tree

Expression trees have operators for their nodes, and values for their leaves. A process which creates an expression tree from an input string (a conventional arithmetic formula like the one above, or a computer program text) is called parsing. If the input string contains (syntax) errors, the parsing fails. Once built, an expression tree can be modified, optimised, transformed and evaluated. Compilation is an example of a parsing process. Parsing is done through the application of grammar rules, which define how to group chunks of the input string (tokens) into syntactic categories (nodes of the parse tree) and terminal symbols (leaves). A grammar is a set of recursive relationships:

<expression> ::= <number> | <expression> <operator> <expression> | (<expression>)
<operator> ::= + | - | * | / | %
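
A minimal sketch of an expression-tree node and its recursive evaluation (parsing is not shown; the tree for (3 + 5) * (9 - 6) is built by hand, and the class name Expr is an assumption for illustration):

class Expr {
    char op;           // operator for an inner node, unused ('\0') for a leaf
    double value;      // numeric value for a leaf
    Expr left, right;

    Expr(double value)                   { this.value = value; }
    Expr(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }

    double eval() {                       // evaluate the subtree rooted at this node
        if (left == null) return value;   // a leaf holds a plain number
        double a = left.eval(), b = right.eval();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            default:  throw new IllegalArgumentException("unknown operator " + op);
        }
    }
}
// Expr e = new Expr('*', new Expr('+', new Expr(3), new Expr(5)),
//                        new Expr('-', new Expr(9), new Expr(6)));
// e.eval() returns 24.0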

Out-of-Core Algorithms

When data are too big to fit into memory and reside on a secondary storage device (hard disk, database etc), the time to access them is 50,000–100,000 times longer (from ~50 nanoseconds to milliseconds). If an algorithm involves a data structure on the “disk” — a so-called out-of-core algorithm — it must minimise read and write operations (“disk ops”) on such a DS.

disk

A very suitable choice for an on-disk DS is the B-Tree. B-Trees are balanced like Red-Black Trees, but the number of children at every node can range from a few to virtually thousands. The larger the number of children (say a B-Tree has t of them on average), the shorter the tree height: \(h = \mathcal{O}(\log_{t}N)\), where N is the total number of nodes. Since the disk ops involve node accesses, the fewer of them performed, the faster the algorithm runs.

B-Trees

b-tree

B-Tree Definition details

  1. The tree node can have up to (t-1) sorted keys \(k_{i}\), which define t intervals: \((-\infty,k_{1}\rbrack\), \(\lbrack k_{i-1},k_{i}\rbrack,\; i=2..(t-1)\), and \(\lbrack k_{t-1},+\infty)\).

  2. When a value k is sought (inserted, deleted), the descent is performed into the subtree of a node which is represented by the i-th interval which contains \(k\in\lbrack k_{i-1},k_{i}\rbrack\)

  3. The maximum number of children t is determined by the size of memory (a page) which the RW head can “scoop” in one op. The scale of \(t\sim 10^{3}\).

  4. Standard search, add and delete operations can be defined with the performance \(\Omega(\log_{t}N)\)

Operations on Trees

Just as for other kinds of DS, some operations are quite simple when they are defined recursively. For example, the search-and-return operation (call it get(e)) is called on the whole tree with an argument e of the element type E stored in the tree nodes; if a node which contains data equal to e is found, that tree node (which has the type Tree<E>) is returned, and if not found, the null value is returned. The operation add(e) must preserve the search-tree property: it should go down to one of the leaves and add a left or right subtree into which the new element e will be “deposited”. The key aspect in both cases is the tree traversal, and it is quite simple since we use the search-tree property (a minimal recursive sketch follows).
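
A minimal recursive sketch of get(e) and add(e) on the Tree<E> type shown earlier, assuming the elements are Comparable (the actual SearchTree.java differs in detail):

static <E extends Comparable<E>> Tree<E> get(Tree<E> node, E e) {
    if (node == null) return null;                        // reached a missing subtree: not found
    int cmp = e.compareTo(node.stuff);
    if (cmp == 0) return node;                            // found the node holding e
    return (cmp < 0) ? get(node.left, e) : get(node.right, e);
}

// usage: root = add(root, e); the method returns the (possibly new) subtree root
static <E extends Comparable<E>> Tree<E> add(Tree<E> node, E e) {
    if (node == null) return new Tree<E>(e, null, null);  // empty spot found: create a new leaf
    if (e.compareTo(node.stuff) < 0)
        node.left = add(node.left, e);                    // smaller elements go into the left subtree
    else
        node.right = add(node.right, e);                  // greater-or-equal ones go to the right
    return node;
}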

Deleting a Tree Node

The example SearchTree.java presents the implementation of the first three operations, in which recursion plays a vital role — get(e) and add(e) are downright recursive, and remove(e) relies on get(e) to locate the node containing e. The details of remove(e) are somewhat tricky.

One has to consider four cases — three simple and a tricky one. When a node slated for deletion is found, it is pulled out of the tree, and its place is taken by another node which is carefully chosen in such a way that the resulting tree is still a search tree.

delete_case1 delete_case3

(Case 2 is not shown — it’s similar to Case 1, just with the roles of the right and left subtrees swapped.)

Deleting a Tree Node (concl)

The most difficult case is the generic one — when the deletable node z has a normal right subtree r, such that the right child is not the smallest element greater than that in the deletable node (as in Case 3). The delete operation requires a “surgery” to implant the node with the smallest such element, y, in place of z, rearranging the broken links to preserve the search property.

  • Gist of the algorithm:

    1. Find y, smallest in z.r
    2. Extract z and move y into its place; y.l (formerly null) becomes z.l
    3. The former y.r (call it x) now takes the place of y, with all links carefully adjusted

delete_case4

One should study the actual code in SearchTree.java to understand all the details and appreciate the subtlety. (Worry not — this is not going to be in the final exam! -:)

Non-Recursive Tree Search

Algorithms can be (non-)recursive, and DS can be (non-)recursive. Do all elements of this “\(2\times 2\)”-matrix make sense?

Comparable and Comparator

Comparing objects (elements in a collection) can be done in two ways: via their natural ordering, when the element class implements Comparable<T> and provides the compareTo() method, or via an external Comparator<T> object which provides the compare() method, for example:

class ItemComparator implements Comparator<Item> {
   public int compare(Item a, Item b) {
      String descrA = a.getDescription();
      String descrB = b.getDescription();
      return descrA.compareTo(descrB);
   }
}
ItemComparator comp = new ItemComparator();
TreeSet<Item> sortByDescription = new TreeSet<Item>(comp);  

In the example Library1.java, the same collection of Books is first sorted by their natural order, and then by the length of their title.

Algorithms: Standard operations

Generic algorithms, which operate on the collection classes, are provided as static methods of the Arrays and Collections (not to be confused with the Collection interface) classes. The Collections methods include sort(), binarySearch(), shuffle(), reverse(), min(), max(), nCopies() and others (method return values and parameters are bounded types, actually; consult the API).

Algorithms: examples

Various methods to manipulate arrays (such as sorting and searching) are contained in the Arrays class. When combined with the Collections class methods, one can achieve powerful results with just a few lines of code. The following program (from the Java Tutorial) prints out its arguments in alphabetical — natural for Strings — order (another example is Anagram.java):

public class Sort {
   public static void main(String[] args) {
      List<String> list = Arrays.asList(args);
      Collections.sort(list);
      System.out.println(list);
   }
}

Another example — a list with 100 elements each initialised to an Integer of value -1:

Integer init = new Integer(-1);
List<Integer> values = new ArrayList<Integer>(Collections.nCopies(100, init));

One more — the method Arrays.asList() (used above) returns a list view of the array:

Card[] cardDeck = new Card[52];
List<Card> cardList = Arrays.asList(cardDeck);

Because cardList is only a view of an underlying array, you can set the values of elements but you cannot add or remove them. The list view is useful if you want to pass a simple array to methods of the Collections class.

Reiterate

Major advantages of using the JCF (repeating the Java Tutorial)

No need to reinvent the wheel every time; the best (“well rounded”) wheels have already been invented! Use them.

Copies and Views

Collections can grow very large and consume a lot of memory, so making full copies of them is expensive; views give access to a collection (or a part of it) without copying its elements.

Views are normally created by calling a static factory method with the underlying collection object passed as a parameter. Technically speaking: “A view is an object of a class that implements one of the interface types in the JCF, and that permits restricted access to a data structure (CH).” The view object is like a shallow copy, with access to the elements of the original Collection object (hence, a modifiable view can be used to modify the original).

Collections Views

Some operations declared in Collection interface are marked as optional (add(), clear(), remove() and some bulk operations). This is because for certain implementations of the interface, these operations are deemed “destructive” — they attempt to modify the collection, and, if this is not consistent with the collection contract, they are not supported and their invocation throws UnsupportedOperationException.

boolean remove(E o) { throw new UnsupportedOperationException(); }

Also, “…some Collection implementations have restrictions on the elements that they may contain, like prohibiting null elements, and some have restrictions on the types of their elements. Attempting to add an ineligible element throws an unchecked exception, typically NullPointerException or ClassCastException. Attempting to query the presence of an ineligible element may throw an exception, or it may simply return false”, depending on the implementation (Java API docs).

The reason for including optional operations in the first place is to increase re-usability and flexibility of the Collection types.

The mechanism of views increases the number of concrete JCF classes by an integer factor (the number of views). The different views — normal (unsynchronised), synchronised, unmodifiable and checked wrappers — that can be obtained from a Collection (or Map) type behave as new sorts of collections (or maps), except that they are backed by the underlying collection (or map) data structure. Wrapper implementations delegate all their real work to a specified collection but add extra functionality on top of what this collection offers. This is another example of the Decorator pattern.

Collections Wrappers

The wrapper views are obtained via static factory methods (methods which return newly created objects) defined in Collections class of java.util package. The views and their advantages are: