Linked lists

Linked lists constitute one of the most basic and best-known data structures.

In this section we will study a typed variant of singly linked lists, which are, for instance,

implemented in Scala as the scala.collection.immutable.List class, and
the basic data structure of the Lisp programming language.

Note

What is the point of studying such an elementary data structure that is already implemented and readily usable in the Scala collections API?

First, we want to get a first intuition how data structures are implemented in practice in the underlying API, and how the chosen data structure affects the efficiency of the basic operations that we can perform on the data structure, such as adding or deleting elements.

Second, we require practice with recursion, and making recursive implementations of list operations is an excellent primer to learning to think in terms of recursion (which can be difficult, so we really want to start with a very basic data structure!) and tail recursion.

We can define singly linked lists with elements of some type T, or T-lists for short, recursively as follows:

The empty list, denoted by Nil, is a T-list.
If l is a T-list and e is an element of type T, then Cons(e, l) is a T-list that contains the element e as its first element, followed by the elements in the list l.
There are no other T-lists (this last requirement is usually implicit and not explicitly stated).

For example, let us assume that T is Int. Then

Nil is an Int-list representing an empty sequence [] of integers,
Cons(1, Nil) is an Int-list representing the sequence [1],
Cons(1, Cons(3, Cons(5, Nil))) is an Int-list for [1,3,5], and
Cons(1, Cons(Nil, Cons(5, Nil))) is not an Int-list.

Given a T-list Cons(e, l), we say that e is the head of the list and l is its tail. The empty list Nil does not have a head or a tail. As an example,

the head of the Int-list Cons(5, Nil) is 5 while the tail is Nil, and
for the String-list Cons("first", Cons("second" Cons("last", Nil))), the head is the string "first" and the tail is the list Cons("second" Cons("last", Nil)).

We now present different ways to implement lists and some basic operations on them. The recommended way is the A second implementation with case classes but we will start without case classes to illustrate some concepts and to appreciate case classes more when we observe how they make life easier. All the implementations here, such as the class scala.collection.immutable.List, provide an immutable data structure.

In the exercises, your task will be to implement more operations on lists.

A first implementation

Let us make our first implementation of lists. This first version is for introductory purposes, you will find the source code of the version in the corresponding home assignment. We first define an abstract base class for lists over some other type A, and declare some basic methods:

package linkedLists
import scala.annotation.tailrec

abstract class LinkedList[A]:
  /**
   * Returns true if the list is empty.
   */
  def isEmpty: Boolean
  /**
   * Returns the first element in the list
   * (throws an exception if the list is empty).
   */
  def head: A
  /**
   * Returns the list obtained by removing the first element from this list
   * (throws an exception if the list is empty).
   */
  def tail: LinkedList[A]
end LinkedList

In this implementation, we want the head and tail methods to be included in the base class so that we can access them in methods that we implement at the base class level (in the other approaches below we can implement the same functionality without doing this).

The concrete classes for an empty list (Nil) and a nonempty list (Cons) extend the base class and implement the remaining methods:

/**
 * An empty linked list
 */
class Nil[A]() extends LinkedList[A]:
  def isEmpty = true
  def head = throw new java.util.NoSuchElementException("head of empty list")
  def tail = throw new java.util.NoSuchElementException("tail of empty list")
end Nil

/**
 * A non-empty linked list.
 * @param _head the first element in the list
 * @param _tail the rest of the list
 */
class Cons[A](val head: A, val tail: LinkedList[A]) extends LinkedList[A]:
  def isEmpty = false
end Cons

Finally, the companion objects provide the usual helper methods for constructing new list objects:

/** Companion object for an empty list */
object Nil:
  def apply[A]() = new Nil[A]()
end Nil
/** Companion object for a non-empty list */
object Cons:
  def apply[A](head : A, tail : LinkedList[A]) = new Cons(head, tail)
end Cons

We can now create, for example, a small linked list over String objects and fetch its first element:

scala> val l = Cons("First", Cons("Second", Nil()))
l: linkedLists.Cons[String] = Cons(First, Cons(Second, Nil()))

scala> l.head
res: String = First

(Actually, the second line will not look like that until we have implemented the toString method below.)

Let us study what happens when we prepend existing lists with new elements. For example, let us execute the following code:

val l = Cons(5, Cons(4, Cons(-7, Nil())))
val m = Cons(3, l)

Now m is a list of four Int values corresponding to the sequence [3, 5, 4, -7]. Observe that when we create the list Cons(3, l), the list l was not copied or even accessed at all. So what happens in the memory?

The statement

val l = Cons(5, Cons(4, Cons(-7, Nil())))

first creates four objects:

Next, the statement

val m = Cons(3, l)

just adds one more Cons object, whose tail references to the first object in the list l:

That is, the new list shares all the objects in the first list.

Observe that it is not possible to create cyclic lists because the recursive definition is well-founded (must start from Nil). Furthermore, the data structure is immutable so we cannot modify the tail references of existing lists.

The `contains` method

Let us now implement our first non-trivial method on lists, namely one that checks whether an element e is contained in the list. The recursive reasoning behind the implementation is as follows:

an empty list contains no elements;
if the head of the list is the element e, then the list contains the element;
otherwise, the list contains the element if the tail part of the list contains the element.

This leads to the following tail-recursive implementation which we include in the base class LinkedList:

  /**
   * Returns true if the list contains the element e.
   */
  @tailrec final def contains(e: A): Boolean =
    if isEmpty then false
    else if head == e then true
    else tail.contains(e)
  end contains

Let us test the method on a small list:

scala> val l = Cons("First", Cons("Second", Nil()))
l: linkedLists.Cons[String] = Cons(First, Cons(Second, Nil()))

scala> l.contains("Second")
res: Boolean = true

scala> l.contains("Third")
res: Boolean = false

Recall that for the @tailrec annotation above, forcing the Scala compiler to check that the function is really tail recursive, we must import its definition at the beginning with

import scala.annotation.tailrec

As the method is free of side effects, we can also illustrate its action by “expanding” it:

Cons(1, Cons(2, Cons(3, Nil()))).contains(3) =
Cons(2, Cons(3, Nil())).contains(3) =
Cons(3, Nil()).contains(3) =
true

Or with a call-stack:

Cons(1, Cons(2, Cons(3, Nil()))).contains(3):
 return Cons(2, Cons(3, Nil())).contains(3)
Cons(2, Cons(3, Nil())).contains(3):
 return Cons(3, Nil()).contains(3)
Cons(3, Nil()).contains(3):
 return true

The `length` method

So far, so good. Let us now implement the length method returning the number of elements in the list. As lists were defined recursively, it is easy to define the result of the length method recursively, as well:

the length of the empty list Nil is 0, and
the length of a non-empty list Cons(e, l) is the length of the tail list l, plus \(1\).

A non-tail recursive version

The above definition immediately leads to our first implementation of length. We include the method signature definition

abstract class LinkedList[A]:
   // Rest as in LinkedList above ...

   // Define a length method
   def length: Int

end LinkedList

in the abstract base class LinkedList and then implement it with

class Nil[A]() extends LinkedList[A]:
   // Rest as in Nil above ...

   // Length of Nil is 0
   def length = 0

end Nil

in Nil and with

class Cons[A](val head: A, val tail: LinkedList[A]) extends LinkedList[A]:
   // Rest as in Cons above ...

   // Length of Cons is 1 + length of its tail
   def length = 1 + tail.length

end Cons

in Cons. This is a correct implementation, easy to understand, and works well for small lists. However, as we easily observe, the implementation is not tail-recursive.

Indeed, in Cons one has to first compute the value tail.length and then add 1 to it.

Therefore, if we experiment with long lists (say, 20000 in one environment)

val x = (1 to 20000).foldRight(Nil[Int]())((v: Int, m: LinkedList[Int]) => Cons(v, m))
println("The length of x is " +x.length)

the result is a java.lang.StackOverflowError.

Again we can illustrate the behaviour of this method by “expanding” it:

Cons(1, Cons(2, Cons(3, Nil()))).length =
+ Cons(2, Cons(3, Nil())).length =
+ (1 + Cons(3, Nil()).length) =
+ (1 + (1 + Nil().length)) =
+ (1 + (1 + 0)) =
+ (1 + 1) =
+ 2 =
3

Or with a call-stack:

Cons(1, Cons(2, Cons(3, Nil()))).length:
 val tmp1 = Cons(2, Cons(3, Nil())).length
  Cons(2, Cons(3, Nil())).length:
   val tmp2 = Cons(3, Nil()).length
    Cons(3, Nil()).length:
     val tmp3 = Nil().length
      Nil().length:
       return 0
     // tmp3 == 0
     return 1 + tmp3
   // tmp2 == 1
   return 1 + tmp2
 // tmp1 == 2
 return 1 + tmp1
3

An imperative version

The length method is simple and easily expressible in an imperative way by just going from the beginning of the list to the end with a while loop, counting the number of elements encountered in the way. That is, we implement the whole method in the abstract base class with the following code:

def length: Int =
  var result = 0
  var remaining = this
  while !remaining.isEmpty do
    result += 1
    remaining = remaining.tail
  end while
  result
end length

Observe that the implementation does not change the state of the LinkedList object at any point during its execution; therefore, the class LinkedList remains immutable.

In fact, many of the methods in the classes in the scala.collection.immutable package are implemented in this way: one uses variables to compute intermediate results but the values stored in the object itself are not changed at any point (as required by immutability).

Note

In our programming exercise on lists, you are not allowed to use imperative code. In particular, var declarations are forbidden by a custom Scala compiler we use in the evaluation phase.

A tail-recursive version

Since we are practising recursion, let us implement the length method in a tail-recursive way. One way to do this is just to “implement” the while loop above with an inner function and then call the inner function with the appropriate initial values:

  def length: Int =
    @tailrec def inner(remaining: LinkedList[A], result: Int): Int =
      if remaining.isEmpty then result
      else inner(remaining.tail, result+1)
    end inner
    inner(this, 0)
  end length

Observe that the variables of the imperative version are now maintained and passed as parameters to the inner function. Such use of inner functions to implement an iteration is a very common and useful programming pattern. Let us illustrate the operation of the method with “expansion”

Cons(1, Cons(2, Cons(3, Nil()))).length =
inner(Cons(1, Cons(2, Cons(3, Nil()))), 0) =
inner(Cons(2, Cons(3, Nil())), 1) =
inner(Cons(3, Nil()), 2) =
inner(Nil(), 3) =
3

And as a call-stack

Cons(1, Cons(2, Cons(3, Nil()))).length:
 return inner(Cons(1, Cons(2, Cons(3, Nil()))), 0)
  inner(Cons(1, Cons(2, Cons(3, Nil()))), 0):
   return inner(Cons(2, Cons(3, Nil())), 1)
  inner(Cons(2, Cons(3, Nil())), 1):
   return inner(Cons(3, Nil()), 2)
  inner(Cons(3, Nil()), 2):
   return inner(Nil(), 3)
  inner(Nil(), 3):
   return 3

The `reverse` and `toString` methods

Let us practise tail-recursive functions with two more methods:

reverse that gives another list obtained by listing the elements in the current list in the reverse order, and
toString that gives a string representation of the list.

Again, we implement the methods in the base class LinkedList and use inner functions to perform the iteration over the elements in the list:

  /**
   * Returns the reversed version of this list
   */
  def reverse: LinkedList[A] = 
    @tailrec def inner(remaining: LinkedList[A], result: LinkedList[A]): LinkedList[A] =
      if remaining.isEmpty then result
      else inner(remaining.tail, Cons(remaining.head, result))
    end inner

    inner(this, Nil())
  end reverse

  /**
   * A string representation of the list.
   */
  override def toString: String =
    @tailrec def inner(remaining: LinkedList[A], prefix: String, postfix: String): String =
      if remaining.isEmpty then prefix + "Nil()" + postfix
      else inner(remaining.tail, prefix + "Cons(" + remaining.head.toString + ", ", postfix + ")")
    end inner

    inner(this, "", "")
  end toString

It is probably not immediately clear how the reverse method works, so let us illustrate it with “expansion”:

Cons(1, Cons(2, Cons(3, Nil()))).reverse =
inner(Cons(1, Cons(2, Cons(3, Nil()))), Nil()) =
inner(Cons(2, Cons(3, Nil())), Cons(1, Nil())) =
inner(Cons(3, Nil()), Cons(2, Cons(1, Nil()))) =
inner(Nil(), Cons(3, Cons(2, Cons(1, Nil())))) =
Cons(3, Cons(2, Cons(1, Nil())))

A second implementation with case classes

Our second implementation takes advantage of a feature in Scala: case classes. Such classes are special in the sense that

a companion object is automatically generated, and
more importantly, they export their constructor for pattern matching.

Let us see how case classes work by re-implementing our list example with case classes. We again have the abstract base class LinkedList[A] (in which we implement more methods in a while) but the main difference is that the classes Nil and Cons are declared as case classes:

abstract class LinkedList[A]: 
  def isEmpty: Boolean
  def head: A
  def tail: LinkedList[A]
end LinkedList

case class Nil[A]() extends LinkedList[A]:
  def isEmpty = true
  def head = throw new java.util.NoSuchElementException("head of empty list")
  def tail = throw new java.util.NoSuchElementException("tail of empty list")
end Nil

case class Cons[A](val head: A, val tail: LinkedList[A]) extends LinkedList[A]:
  def isEmpty = false
end Cons

Companion objects are automatically generated for case classes, enabling us to write, as earlier:

scala> val l = Cons("First", Cons("Second", Nil()))
l: linkedLists.Cons[String] = Cons(First, Cons(Second, Nil()))

scala> l.head
res: String = First

The `contains` method

We could still use our previous implementation of the contains method but we can now also write it with pattern matching on case classes:

  @tailrec final def contains(e: A): Boolean = 
    this match
      case Nil() => false
      case Cons(h, t) => if (h == e) then true else t.contains(e)
    end match
  end contains

Here the statement “this match ...” takes the object this (i.e., the list on which the method is called) and matches it with the cases listed in the curly brackets:

The first case, case Nil(), checks whether this is an instance of the Nil class (i.e., whether the list is empty). If this is the case, then the “this match ...” statement evaluates to the value false.
The second case, case Cons(h, t), checks whether this is an instance of the Cons class and that

its first constructor parameter value head equals to h, and

its second constructor parameter value tail equals to t.

As h and t are not defined in the scope, new values are reserved for them and they can match to any value. These matched values are then available in the value evaluation part “if (h == e) true else t.contains(e)” of the case, allowing us to check (i) if the head of the list equals to e, or (ii) if it does not, perform the tail-recursive call on the tail of the list.

For example, if this is the list val l = Cons("First", Cons("Second", Nil())), then the case case Cons(h, t) is matched with the value of h being the string "First" and the value of t being the object Cons("Second", Nil()).

The cases are evaluated in the order they are given in program text, and the first case that matches is evaluated (if none of the cases match, then scala.MatchError error is thrown automatically).

The `length`, `reverse` and `toString` methods

In a similar way, we can re-implement these methods more concisely with case classes:

  def length: Int = 
    @tailrec def inner(remaining: LinkedList[A], result: Int): Int =
      remaining match
        case Nil() => result
        case Cons(_, t) => inner(t, result + 1)
      end match
    end inner
    inner(this, 0)
  end length

  def reverse: LinkedList[A] =
    @tailrec def inner(remaining: LinkedList[A], result: LinkedList[A]): LinkedList[A] =
      remaining match 
        case Nil() => result
        case Cons(h, t) => inner(t, Cons(h, result))
      end match
    inner(this, Nil())
  end reverse

  override def toString: String = 
    @tailrec def inner(remaining: LinkedList[A], prefix: String, postfix: String): String =
      remaining match 
        case Nil() => prefix + "Nil()" + postfix
        case Cons(h, t) => inner(t, prefix + "Cons(" + h.toString + ", ", postfix + ")")
      end match
    inner(this, "", "")
  end toString

Observe the underline _ in the case case Cons(_, t) => inner(t, result + 1) in the inner function of the length method. This tells the Scala compiler that “we are not interested in the _head value of the list in this case, do not generate the code needed for matching it”. It also increases readability as we need not introduce variables that we do not actually use.

Linked lists

A first implementation

The contains method

The length method

A non-tail recursive version

An imperative version

A tail-recursive version

The reverse and toString methods

A second implementation with case classes

The contains method

The length, reverse and toString methods

The `contains` method

The `length` method

The `reverse` and `toString` methods

The `contains` method

The `length`, `reverse` and `toString` methods