Linked lists

Linked lists constitute one of the most basic and best-known data structures.

In this section we will study a typed variant of singly linked lists, which are, for instance,

Note

What is the point of studying such an elementary data structure that is already implemented and readily usable in the Scala collections API?

First, we want to get a first intuition how data structures are implemented in practice in the underlying API, and how the chosen data structure affects the efficiency of the basic operations that we can perform on the data structure, such as adding or deleting elements.

Second, we require practice with recursion, and making recursive implementations of list operations is an excellent primer to learning to think in terms of recursion (which can be difficult, so we really want to start with a very basic data structure!) and tail recursion.

We can define singly linked lists with elements of some type T, or T-lists for short, recursively as follows:

  1. The empty list, denoted by Nil, is a T-list.

  2. If l is a T-list and e is an element of type T, then Cons(e, l) is a T-list that contains the element e as its first element, followed by the elements in the list l.

  3. There are no other T-lists (this last requirement is usually implicit and not explicitly stated).

For example, let us assume that T is Int. Then

  • Nil is an Int-list representing an empty sequence [] of integers,

  • Cons(1, Nil) is an Int-list representing the sequence [1],

  • Cons(1, Cons(3, Cons(5, Nil))) is an Int-list for [1,3,5], and

  • Cons(1, Cons(Nil, Cons(5, Nil))) is not an Int-list.

Given a T-list Cons(e, l), we say that e is the head of the list and l is its tail. The empty list Nil does not have a head or a tail. As an example,

  • the head of the Int-list Cons(5, Nil) is 5 while the tail is Nil, and

  • for the String-list Cons("first", Cons("second" Cons("last", Nil))), the head is the string "first" and the tail is the list Cons("second" Cons("last", Nil)).

We now present different ways to implement lists and some basic operations on them. The recommended way is the A second implementation with case classes but we will start without case classes to illustrate some concepts and to appreciate case classes more when we observe how they make life easier. All the implementations here, such as the class scala.collection.immutable.List, provide an immutable data structure.

In the exercises, your task will be to implement more operations on lists.

A first implementation

Let us make our first implementation of lists. This first version is for introductory purposes, you will find the source code of the version in the corresponding home assignment. We first define an abstract base class for lists over some other type A, and declare some basic methods:

package linkedLists
import scala.annotation.tailrec

abstract class LinkedList[A] {
  /**
   * Returns true if the list is empty.
   */
  def isEmpty: Boolean
  /**
   * Returns the first element in the list
   * (throws an exception if the list is empty).
   */
  def head: A
  /**
   * Returns the list obtained by removing the first element from this list
   * (throws an exception if the list is empty).
   */
  def tail: LinkedList[A]
}

In this implementation, we want the head and tail methods to be included in the base class so that we can access them in methods that we implement at the base class level (in the other approaches below we can implement the same functionality without doing this).

The concrete classes for an empty list (Nil) and a nonempty list (Cons) extend the base class and implement the remaining methods:

class Nil[A]() extends LinkedList[A] {
  def isEmpty = true
  def head = throw new java.util.NoSuchElementException("head of empty list")
  def tail = throw new java.util.NoSuchElementException("tail of empty list")
}
class Cons[A](val head: A, val tail: LinkedList[A]) extends LinkedList[A] {
  def isEmpty = false
}

Finally, the companion objects provide the usual helper methods for constructing new list objects:

/** Companion object for an empty list */
object Nil {
  def apply[A]() = new Nil[A]()
}
/** Companion object for a non-empty list */
object Cons {
  def apply[A](head : A, tail : LinkedList[A]) = new Cons(head, tail)
}

We can now create, for example, a small linked list over String objects and fetch its first element:

scala> val l = Cons("First", Cons("Second", Nil()))
l: linkedLists.Cons[String] = Cons(First, Cons(Second, Nil()))

scala> l.head
res: String = First

(Actually, the second line will not look like that until we have implemented the toString method below.)

Let us study what happens when we prepend existing lists with new elements. For example, let us execute the following code:

val l = Cons(5, Cons(4, Cons(-7, Nil())))
val m = Cons(3, l)

Now m is a list of four Int values corresponding to the sequence [3, 5, 4, -7]. Observe that when we create the list Cons(3, l), the list l was not copied or even accessed at all. So what happens in the memory?

The statement

val l = Cons(5, Cons(4, Cons(-7, Nil())))

first creates four objects:

_images/list-objects.svg

Next, the statement

val m = Cons(3, l)

just adds one more Cons object, whose tail references to the first object in the list l:

_images/list-objects-cons.svg

That is, the new list shares all the objects in the first list.

Observe that it is not possible to create cyclic lists because the recursive definition is well-founded (must start from Nil). Furthermore, the data structure is immutable so we cannot modify the tail references of existing lists.

The contains method

Let us now implement our first non-trivial method on lists, namely one that checks whether an element e is contained in the list. The recursive reasoning behind the implementation is as follows:

  1. an empty list contains no elements;

  2. if the head of the list is the element e, then the list contains the element;

  3. otherwise, the list contains the element if the tail part of the list contains the element.

This leads to the following tail-recursive implementation which we include in the base class LinkedList:

  /**
   * Returns true if the list contains the element e.
   */
  @tailrec final def contains(e: A): Boolean = {
    if(isEmpty) false
    else if(head == e) true
    else tail.contains(e)
  }

Let us test the method on a small list:

scala> val l = Cons("First", Cons("Second", Nil()))
l: linkedLists.Cons[String] = Cons(First, Cons(Second, Nil()))

scala> l.contains("Second")
res: Boolean = true

scala> l.contains("Third")
res: Boolean = false

Recall that for the @tailrec annotation above, forcing the Scala compiler to check that the function is really tail recursive, we must import its definition at the beginning with

import scala.annotation.tailrec

As the method is free of side effects, we can also illustrate its action by “expanding” it:

Cons(1, Cons(2, Cons(3, Nil()))).contains(3) =
Cons(2, Cons(3, Nil())).contains(3) =
Cons(3, Nil()).contains(3) =
true

Or with a call-stack:

Cons(1, Cons(2, Cons(3, Nil()))).contains(3):
 return Cons(2, Cons(3, Nil())).contains(3)
Cons(2, Cons(3, Nil())).contains(3):
 return Cons(3, Nil()).contains(3)
Cons(3, Nil()).contains(3):
 return true

The length method

So far, so good. Let us now implement the length method returning the number of elements in the list. As lists were defined recursively, it is easy to define the result of the length method recursively, as well:

  1. the length of the empty list Nil is 0, and

  2. the length of a non-empty list Cons(e, l) is the length of the tail list l, plus \(1\).

A non-tail recursive version

The above definition immediately leads to our first implementation of length. We include the method signature definition

def length: Int

in the abstract base class LinkedList and then implement it with

def length = 0

in Nil and with

def length = 1 + tail.length

in Cons. This is a correct implementation, easy to understand, and works well for small lists. However, as we easily observe, the implementation is not tail-recursive. Indeed, in Cons one has to first compute the value _tail.length and then add 1 to it. Therefore, if we experiment with long lists (say, 20000 in one environment)

    val l = (1 to 20000).foldRight[LinkedList[Int]](Nil[Int]())((v: Int, m: LinkedList[Int]) => Cons(v, m))
    println("The length is: "+l.length)

the result is a java.lang.StackOverflowError.

Again we can illustrate the behaviour of this method by “expanding” it:

Cons(1, Cons(2, Cons(3, Nil()))).length =
1 + Cons(2, Cons(3, Nil())).length =
1 + (1 + Cons(3, Nil()).length) =
1 + (1 + (1 + Nil().length)) =
1 + (1 + (1 + 0)) =
1 + (1 + 1) =
1 + 2 =
3

Or with a call-stack:

Cons(1, Cons(2, Cons(3, Nil()))).length:
 val tmp1 = Cons(2, Cons(3, Nil())).length
  Cons(2, Cons(3, Nil())).length:
   val tmp2 = Cons(3, Nil()).length
    Cons(3, Nil()).length:
     val tmp3 = Nil().length
      Nil().length:
       return 0
     // tmp3 == 0
     return 1 + tmp3
   // tmp2 == 1
   return 1 + tmp2
 // tmp1 == 2
 return 1 + tmp1
3

An imperative version

The length method is simple and easily expressible in an imperative way by just going from the beginning of the list to the end with a while loop, counting the number of elements encountered in the way. That is, we implement the whole method in the abstract base class with the following code:

  def length: Int = {
    var result = 0
    var remaining = this
    while(!remaining.isEmpty) {
      result += 1
      remaining = remaining.tail
    }
    result
  }

Observe that the implementation does not change the state of the LinkedList object at any point during its execution; therefore, the class LinkedList remains immutable.

In fact, many of the methods in the classes in the scala.collection.immutable package are implemented in this way: one uses variables to compute intermediate results but the values stored in the object itself are not changed at any point (as required by immutability).

Note

In our programming exercise on lists, you are not allowed to use imperative code. In particular, var declarations are forbidden by a custom Scala compiler we use in the evaluation phase.

A tail-recursive version

Since we are practising recursion, let us implement the length method in a tail-recursive way. One way to do this is just to “implement” the while loop above with an inner function and then call the inner function with the appropriate initial values:

  def length: Int = {
    @tailrec def inner(remaining: LinkedList[A], result: Int): Int = {
      if(remaining.isEmpty) result
      else inner(remaining.tail, result+1)
    }
    inner(this, 0)
  }

Observe that the variables of the imperative version are now maintained and passed as parameters to the inner function. Such use of inner functions to implement an iteration is a very common and useful programming pattern. Let us illustrate the operation of the method with “expansion”

Cons(1, Cons(2, Cons(3, Nil()))).length =
inner(Cons(1, Cons(2, Cons(3, Nil()))), 0) =
inner(Cons(2, Cons(3, Nil())), 1) =
inner(Cons(3, Nil()), 2) =
inner(Nil(), 3) =
3

And as a call-stack

Cons(1, Cons(2, Cons(3, Nil()))).length:
 return inner(Cons(1, Cons(2, Cons(3, Nil()))), 0)
  inner(Cons(1, Cons(2, Cons(3, Nil()))), 0):
   return inner(Cons(2, Cons(3, Nil())), 1)
  inner(Cons(2, Cons(3, Nil())), 1):
   return inner(Cons(3, Nil()), 2)
  inner(Cons(3, Nil()), 2):
   return inner(Nil(), 3)
  inner(Nil(), 3):
   return 3

The reverse and toString methods

Let us practise tail-recursive functions with two more methods:

  • reverse that gives another list obtained by listing the elements in the current list in the reverse order, and

  • toString that gives a string representation of the list.

Again, we implement the methods in the base class LinkedList and use inner functions to perform the iteration over the elements in the list:

  /**
   * Returns the reversed version of this list
   */
  def reverse: LinkedList[A] = {
    @tailrec def inner(remaining: LinkedList[A], result: LinkedList[A]): LinkedList[A] = {
      if(remaining.isEmpty) result
      else inner(remaining.tail, Cons(remaining.head, result))
    }
    inner(this, Nil())
  }

  /**
   * A string representation of the list.
   */
  override def toString: String = {
    @tailrec def inner(remaining: LinkedList[A], prefix: String, postfix: String): String = {
      if(remaining.isEmpty) prefix + "Nil()" + postfix
      else inner(remaining.tail, prefix + "Cons(" + remaining.head.toString + ", ", postfix + ")")
    }
    inner(this, "", "")
  }

It is probably not immediately clear how the reverse method works, so let us illustrate it with “expansion”:

Cons(1, Cons(2, Cons(3, Nil()))).reverse =
inner(Cons(1, Cons(2, Cons(3, Nil()))), Nil()) =
inner(Cons(2, Cons(3, Nil())), Cons(1, Nil())) =
inner(Cons(3, Nil()), Cons(2, Cons(1, Nil()))) =
inner(Nil(), Cons(3, Cons(2, Cons(1, Nil())))) =
Cons(3, Cons(2, Cons(1, Nil())))

A second implementation with case classes

Our second implementation takes advantage of a feature in Scala: case classes. Such classes are special in the sense that

  • a companion object is automatically generated, and

  • more importantly, they export their constructor for pattern matching.

Let us see how case classes work by re-implementing our list example with case classes. We again have the abstract base class LinkedList[A] (in which we implement more methods in a while) but the main difference is that the classes Nil and Cons are declared as case classes:

abstract class LinkedList[A] {
  def isEmpty: Boolean
  def head: A
  def tail: LinkedList[A]
}

case class Nil[A]() extends LinkedList[A] {
  def isEmpty = true
  def head = throw new java.util.NoSuchElementException("head of empty list")
  def tail = throw new java.util.NoSuchElementException("tail of empty list")
}

case class Cons[A](val head: A, val tail: LinkedList[A]) extends LinkedList[A] {
  def isEmpty = false
}

Companion objects are automatically generated for case classes, enabling us to write, as earlier:

scala> val l = Cons("First", Cons("Second", Nil()))
l: linkedLists.Cons[String] = Cons(First, Cons(Second, Nil()))

scala> l.head
res: String = First

The contains method

We could still use our previous implementation of the contains method but we can now also write it with pattern matching on case classes:

  /**
   * Returns true if the list contains the element e.
   */
  @tailrec final def contains(e: A): Boolean = this match {
    case Nil() => false
    case Cons(h, t) => if (h == e) true else t.contains(e)
  }

Here the statement “this match { ... }” takes the object this (i.e., the list on which the method is called) and matches it with the cases listed in the curly brackets:

  1. The first case, case Nil(), checks whether this is an instance of the Nil class (i.e., whether the list is empty). If this is the case, then the “this match { ... }” statement evaluates to the value false.

  2. The second case, case Cons(h, t), checks whether this is an instance of the Cons class and that

  • its first constructor parameter value _head equals to h, and

  • its second constructor parameter value _tail equals to t.

As h and t are not defined in the scope, new values are reserved for them and they can match to any value. These matched values are then available in the value evaluation part “if (h == e) true else t.contains(e)” of the case, allowing us to check (i) if the head of the list equals to e, or (ii) if it does not, perform the tail-recursive call on the tail of the list.

For example, if this is the list val l = Cons("First", Cons("Second", Nil())), then the case case Cons(h, t) is matched with the value of h being the string "First" and the value of t being the object Cons("Second", Nil()).

The cases are evaluated in the order they are given in program text, and the first case that matches is evaluated (if none of the cases match, then scala.MatchError error is thrown automatically).

The length, reverse and toString methods

In a similar way, we can re-implement these methods more concisely with case classes:

  /**
   * Returns the length of the list (the number of elements in it).
   */
  def length: Int = {
    @tailrec def inner(remaining: LinkedList[A], result: Int): Int =
      remaining match {
        case Nil() => result
        case Cons(_, t) => inner(t, result + 1)
      }
    inner(this, 0)
  }

  /**
   * Returns the reversed version of this list
   */
  def reverse: LinkedList[A] = {
    @tailrec def inner(remaining: LinkedList[A], result: LinkedList[A]): LinkedList[A] =
      remaining match {
        case Nil() => result
        case Cons(h, t) => inner(t, Cons(h, result))
      }
    inner(this, Nil())
  }

  /**
   * A string representation of the list.
   */
  override def toString: String = {
    @tailrec def inner(remaining: LinkedList[A], prefix: String, postfix: String): String =
      remaining match {
        case Nil() => prefix + "Nil()" + postfix
        case Cons(h, t) => inner(t, prefix + "Cons(" + h.toString + ", ", postfix + ")")
      }
    inner(this, "", "")
  }

Observe the underline _ in the case case Cons(_, t) => inner(t, result + 1) in the inner function of the length method. This tells the Scala compiler that “we are not interested in the _head value of the list in this case, do not generate the code needed for matching it”. It also increases readability as we need not introduce variables that we do not actually use.