Linked lists
Linked lists constitute one of the most basic and best-known data structures.
In this section we will study a typed variant of singly linked lists, which are, for instance,
implemented in Scala as the scala.collection.immutable.List class, and
the basic data structure of the Lisp programming language.
Note
What is the point of studying such an elementary data structure that is already implemented and readily usable in the Scala collections API?
First, we want to get a first intuition how data structures are implemented in practice in the underlying API, and how the chosen data structure affects the efficiency of the basic operations that we can perform on the data structure, such as adding or deleting elements.
Second, we require practice with recursion, and making recursive implementations of list operations is an excellent primer to learning to think in terms of recursion (which can be difficult, so we really want to start with a very basic data structure!) and tail recursion.
We can define singly linked lists with elements of some type T
,
or T
-lists for short, recursively as follows:
The empty list, denoted by
Nil
, is aT
-list.If
l
is aT
-list ande
is an element of typeT
, thenCons(e, l)
is aT
-list that contains the elemente
as its first element, followed by the elements in the listl
.There are no other
T
-lists (this last requirement is usually implicit and not explicitly stated).
For example, let us assume that T
is Int
. Then
Nil
is anInt
-list representing an empty sequence [] of integers,Cons(1, Nil)
is anInt
-list representing the sequence [1],Cons(1, Cons(3, Cons(5, Nil)))
is anInt
-list for [1,3,5], andCons(1, Cons(Nil, Cons(5, Nil)))
is not anInt
-list.
Given a T
-list Cons(e, l)
, we say that e
is the head of the list and l
is its tail.
The empty list Nil
does not have a head or a tail.
As an example,
the head of the
Int
-listCons(5, Nil)
is5
while the tail isNil
, andfor the
String
-listCons("first", Cons("second" Cons("last", Nil)))
, the head is the string"first"
and the tail is the listCons("second" Cons("last", Nil))
.
We now present different ways to implement lists and some basic operations on them. The recommended way is the A second implementation with case classes but we will start without case classes to illustrate some concepts and to appreciate case classes more when we observe how they make life easier. All the implementations here, such as the class scala.collection.immutable.List, provide an immutable data structure.
In the exercises, your task will be to implement more operations on lists.
A first implementation
Let us make our first implementation of lists.
This first version is for introductory purposes, you will find
the source code of the version in the corresponding home assignment.
We first define an abstract base class for lists over some other type A
,
and declare some basic methods:
package linkedLists
import scala.annotation.tailrec
abstract class LinkedList[A] {
/**
* Returns true if the list is empty.
*/
def isEmpty: Boolean
/**
* Returns the first element in the list
* (throws an exception if the list is empty).
*/
def head: A
/**
* Returns the list obtained by removing the first element from this list
* (throws an exception if the list is empty).
*/
def tail: LinkedList[A]
}
In this implementation, we want the head
and tail
methods
to be included in the base class so that we can access them in methods
that we implement at the base class level (in the other approaches below
we can implement the same functionality without doing this).
The concrete classes for an empty list (Nil
) and a
nonempty list (Cons
) extend the base class and implement
the remaining methods:
class Nil[A]() extends LinkedList[A] {
def isEmpty = true
def head = throw new java.util.NoSuchElementException("head of empty list")
def tail = throw new java.util.NoSuchElementException("tail of empty list")
}
class Cons[A](val head: A, val tail: LinkedList[A]) extends LinkedList[A] {
def isEmpty = false
}
Finally, the companion objects provide the usual helper methods for constructing new list objects:
/** Companion object for an empty list */
object Nil {
def apply[A]() = new Nil[A]()
}
/** Companion object for a non-empty list */
object Cons {
def apply[A](head : A, tail : LinkedList[A]) = new Cons(head, tail)
}
We can now create, for example, a small linked list over String
objects
and fetch its first element:
scala> val l = Cons("First", Cons("Second", Nil()))
l: linkedLists.Cons[String] = Cons(First, Cons(Second, Nil()))
scala> l.head
res: String = First
(Actually, the second line will not look like
that until we have implemented the toString
method below.)
Let us study what happens when we prepend existing lists with new elements. For example, let us execute the following code:
val l = Cons(5, Cons(4, Cons(-7, Nil())))
val m = Cons(3, l)
Now m
is a list of four Int
values corresponding to
the sequence [3, 5, 4, -7]. Observe that when we create the list
Cons(3, l)
, the list l
was not copied or even accessed
at all. So what happens in the memory?
The statement
val l = Cons(5, Cons(4, Cons(-7, Nil())))
first creates four objects:
Next, the statement
val m = Cons(3, l)
just adds one more Cons
object, whose tail references
to the first object in the list l
:
That is, the new list shares all the objects in the first list.
Observe that it is not possible to create cyclic lists because
the recursive definition is well-founded (must start from Nil
).
Furthermore, the data structure is immutable so we cannot modify
the tail references of existing lists.
The contains
method
Let us now implement our first non-trivial method on lists,
namely one that checks whether an element e
is contained in the list.
The recursive reasoning behind the implementation is as follows:
an empty list contains no elements;
if the head of the list is the element
e
, then the list contains the element;otherwise, the list contains the element if the tail part of the list contains the element.
This leads to the following tail-recursive implementation which we
include in the base class LinkedList
:
/**
* Returns true if the list contains the element e.
*/
@tailrec final def contains(e: A): Boolean = {
if(isEmpty) false
else if(head == e) true
else tail.contains(e)
}
Let us test the method on a small list:
scala> val l = Cons("First", Cons("Second", Nil()))
l: linkedLists.Cons[String] = Cons(First, Cons(Second, Nil()))
scala> l.contains("Second")
res: Boolean = true
scala> l.contains("Third")
res: Boolean = false
Recall that for the @tailrec
annotation above,
forcing the Scala compiler to check that the function is really tail recursive,
we must import its definition at the beginning with
import scala.annotation.tailrec
As the method is free of side effects, we can also illustrate its action by “expanding” it:
Cons(1, Cons(2, Cons(3, Nil()))).contains(3) =
Cons(2, Cons(3, Nil())).contains(3) =
Cons(3, Nil()).contains(3) =
true
Or with a call-stack:
Cons(1, Cons(2, Cons(3, Nil()))).contains(3):
return Cons(2, Cons(3, Nil())).contains(3)
Cons(2, Cons(3, Nil())).contains(3):
return Cons(3, Nil()).contains(3)
Cons(3, Nil()).contains(3):
return true
The length
method
So far, so good. Let us now implement the length
method
returning the number of elements in the list.
As lists were defined recursively, it is easy to define
the result of the length
method recursively, as well:
the length of the empty list
Nil
is 0, andthe length of a non-empty list
Cons(e, l)
is the length of the tail listl
, plus \(1\).
A non-tail recursive version
The above definition immediately leads to our first implementation of length
.
We include the method signature definition
def length: Int
in the abstract base class LinkedList
and then implement it with
def length = 0
in Nil
and with
def length = 1 + tail.length
in Cons
. This is a correct implementation, easy to understand, and
works well for small lists.
However, as we easily observe, the implementation is not tail-recursive.
Indeed, in Cons
one has to first compute the value _tail.length
and
then add 1 to it.
Therefore, if we experiment with long lists (say, 20000 in one environment)
val l = (1 to 20000).foldRight[LinkedList[Int]](Nil[Int]())((v: Int, m: LinkedList[Int]) => Cons(v, m))
println("The length is: "+l.length)
the result is a java.lang.StackOverflowError
.
Again we can illustrate the behaviour of this method by “expanding” it:
Cons(1, Cons(2, Cons(3, Nil()))).length =
1 + Cons(2, Cons(3, Nil())).length =
1 + (1 + Cons(3, Nil()).length) =
1 + (1 + (1 + Nil().length)) =
1 + (1 + (1 + 0)) =
1 + (1 + 1) =
1 + 2 =
3
Or with a call-stack:
Cons(1, Cons(2, Cons(3, Nil()))).length:
val tmp1 = Cons(2, Cons(3, Nil())).length
Cons(2, Cons(3, Nil())).length:
val tmp2 = Cons(3, Nil()).length
Cons(3, Nil()).length:
val tmp3 = Nil().length
Nil().length:
return 0
// tmp3 == 0
return 1 + tmp3
// tmp2 == 1
return 1 + tmp2
// tmp1 == 2
return 1 + tmp1
3
An imperative version
The length
method is simple and easily expressible in an imperative
way by just going from the beginning of the list to the end with
a while
loop, counting the number of elements encountered in the way.
That is, we implement the whole method in the abstract base class with
the following code:
def length: Int = {
var result = 0
var remaining = this
while(!remaining.isEmpty) {
result += 1
remaining = remaining.tail
}
result
}
Observe that the implementation does not change the state of
the LinkedList
object at any point during its execution; therefore, the class LinkedList
remains immutable.
In fact, many of the methods in the classes in the scala.collection.immutable package are implemented in this way: one uses variables to compute intermediate results but the values stored in the object itself are not changed at any point (as required by immutability).
Note
In our programming exercise on lists, you are not allowed
to use imperative code. In particular, var
declarations
are forbidden by a custom Scala compiler we use in
the evaluation phase.
A tail-recursive version
Since we are practising recursion, let us implement the length
method in a tail-recursive way. One way to do this is just to “implement”
the while
loop above with an inner function and then call the
inner function with the appropriate initial values:
def length: Int = {
@tailrec def inner(remaining: LinkedList[A], result: Int): Int = {
if(remaining.isEmpty) result
else inner(remaining.tail, result+1)
}
inner(this, 0)
}
Observe that the variables of the imperative version are now maintained and passed as parameters to the inner function. Such use of inner functions to implement an iteration is a very common and useful programming pattern. Let us illustrate the operation of the method with “expansion”
Cons(1, Cons(2, Cons(3, Nil()))).length =
inner(Cons(1, Cons(2, Cons(3, Nil()))), 0) =
inner(Cons(2, Cons(3, Nil())), 1) =
inner(Cons(3, Nil()), 2) =
inner(Nil(), 3) =
3
And as a call-stack
Cons(1, Cons(2, Cons(3, Nil()))).length:
return inner(Cons(1, Cons(2, Cons(3, Nil()))), 0)
inner(Cons(1, Cons(2, Cons(3, Nil()))), 0):
return inner(Cons(2, Cons(3, Nil())), 1)
inner(Cons(2, Cons(3, Nil())), 1):
return inner(Cons(3, Nil()), 2)
inner(Cons(3, Nil()), 2):
return inner(Nil(), 3)
inner(Nil(), 3):
return 3
The reverse
and toString
methods
Let us practise tail-recursive functions with two more methods:
reverse
that gives another list obtained by listing the elements in the current list in the reverse order, andtoString
that gives a string representation of the list.
Again, we implement the methods in the base class LinkedList
and
use inner functions to perform the iteration over the elements in the list:
/**
* Returns the reversed version of this list
*/
def reverse: LinkedList[A] = {
@tailrec def inner(remaining: LinkedList[A], result: LinkedList[A]): LinkedList[A] = {
if(remaining.isEmpty) result
else inner(remaining.tail, Cons(remaining.head, result))
}
inner(this, Nil())
}
/**
* A string representation of the list.
*/
override def toString: String = {
@tailrec def inner(remaining: LinkedList[A], prefix: String, postfix: String): String = {
if(remaining.isEmpty) prefix + "Nil()" + postfix
else inner(remaining.tail, prefix + "Cons(" + remaining.head.toString + ", ", postfix + ")")
}
inner(this, "", "")
}
It is probably not immediately clear how the reverse
method works,
so let us illustrate it with “expansion”:
Cons(1, Cons(2, Cons(3, Nil()))).reverse =
inner(Cons(1, Cons(2, Cons(3, Nil()))), Nil()) =
inner(Cons(2, Cons(3, Nil())), Cons(1, Nil())) =
inner(Cons(3, Nil()), Cons(2, Cons(1, Nil()))) =
inner(Nil(), Cons(3, Cons(2, Cons(1, Nil())))) =
Cons(3, Cons(2, Cons(1, Nil())))
A second implementation with case classes
Our second implementation takes advantage of a feature in Scala: case classes. Such classes are special in the sense that
a companion object is automatically generated, and
more importantly, they export their constructor for pattern matching.
Let us see how case classes work by re-implementing our list example
with case classes. We again have the abstract base class
LinkedList[A]
(in which we implement more methods in a while)
but the main difference is that the classes Nil
and Cons
are declared as case classes:
abstract class LinkedList[A] {
def isEmpty: Boolean
def head: A
def tail: LinkedList[A]
}
case class Nil[A]() extends LinkedList[A] {
def isEmpty = true
def head = throw new java.util.NoSuchElementException("head of empty list")
def tail = throw new java.util.NoSuchElementException("tail of empty list")
}
case class Cons[A](val head: A, val tail: LinkedList[A]) extends LinkedList[A] {
def isEmpty = false
}
Companion objects are automatically generated for case classes, enabling us to write, as earlier:
scala> val l = Cons("First", Cons("Second", Nil()))
l: linkedLists.Cons[String] = Cons(First, Cons(Second, Nil()))
scala> l.head
res: String = First
The contains
method
We could still use our previous implementation of the contains
method
but we can now also write it with pattern matching on case classes:
/**
* Returns true if the list contains the element e.
*/
@tailrec final def contains(e: A): Boolean = this match {
case Nil() => false
case Cons(h, t) => if (h == e) true else t.contains(e)
}
Here the statement “this match { ... }
” takes the object this
(i.e., the list on which the method is called) and matches it with
the cases listed in the curly brackets:
The first case,
case Nil()
, checks whetherthis
is an instance of theNil
class (i.e., whether the list is empty). If this is the case, then the “this match { ... }
” statement evaluates to the valuefalse
.The second case,
case Cons(h, t)
, checks whetherthis
is an instance of theCons
class and that
its first constructor parameter value
_head
equals toh
, andits second constructor parameter value
_tail
equals tot
.As
h
andt
are not defined in the scope, new values are reserved for them and they can match to any value. These matched values are then available in the value evaluation part “if (h == e) true else t.contains(e)
” of the case, allowing us to check (i) if the head of the list equals toe
, or (ii) if it does not, perform the tail-recursive call on the tail of the list.For example, if
this
is the listval l = Cons("First", Cons("Second", Nil()))
, then the casecase Cons(h, t)
is matched with the value ofh
being the string"First"
and the value oft
being the objectCons("Second", Nil())
.
The cases are evaluated in the order they are given in program text,
and the first case that matches is evaluated
(if none of the cases match, then scala.MatchError
error
is thrown automatically).
The length
, reverse
and toString
methods
In a similar way, we can re-implement these methods more concisely with case classes:
/**
* Returns the length of the list (the number of elements in it).
*/
def length: Int = {
@tailrec def inner(remaining: LinkedList[A], result: Int): Int =
remaining match {
case Nil() => result
case Cons(_, t) => inner(t, result + 1)
}
inner(this, 0)
}
/**
* Returns the reversed version of this list
*/
def reverse: LinkedList[A] = {
@tailrec def inner(remaining: LinkedList[A], result: LinkedList[A]): LinkedList[A] =
remaining match {
case Nil() => result
case Cons(h, t) => inner(t, Cons(h, result))
}
inner(this, Nil())
}
/**
* A string representation of the list.
*/
override def toString: String = {
@tailrec def inner(remaining: LinkedList[A], prefix: String, postfix: String): String =
remaining match {
case Nil() => prefix + "Nil()" + postfix
case Cons(h, t) => inner(t, prefix + "Cons(" + h.toString + ", ", postfix + ")")
}
inner(this, "", "")
}
Observe the underline _
in the case
case Cons(_, t) => inner(t, result + 1)
in the inner function
of the length
method. This tells the Scala compiler that
“we are not interested in the _head
value of the list in this case,
do not generate the code needed for matching it”.
It also increases readability as we need not introduce variables that
we do not actually use.