Collections and functions

CS-A1120 Programming 2

Lukas Ahrenberg

Department of Computer Science
Aalto University

Course calendar (reminder)

Module 2 - Programming abstractions and analysis

We have learned the basics of a computer, but

Efficient programming requires abstractions and models that are
- recurrent : need encountered often in practice
- useful : help control and encapsulate a problem
- efficient : for the computer to execute

High-level learning goals

Round 6: Collections and functions
Round 7: Efficiency
Round 8: Recursion

After this module, you

are familiar with Scala 3's collection framework
are able to program using the functional style
can employ recursive problem solving and data structures
can analyse the efficiency and complexity of a basic function

After this round, you …

can program in the functional style
are familiar with the use of anonymous functions
can explain properties of higher-order functions
can define function side effects and pure functions
can implement a basic iterator in Scala
are aware of the properties of Scala collections

Module 1 - Module 2 bridge

Getting from a high-level language (Scala, C++, Rust, …) to machine code?

We now know that binary machine code is the low-level program fed to a processor
You have programmed a basic processor in assembly, which is a thin human-readable mnemonic for machine code
- Instructions are more or less 1:1 to machine code instructions and mapped using an assembler
High-level languages use compilers, which transform a program in some readable form into byte code for a virtual machine or directly into machine code for a specific architecture.

Abstraction

Hardware implements the machinery for computation
Universality and programmability allow us to add programming constructs and abstractions
- → We do not have to program in assembly code
- Instead we use compilers or interpreters to derive machine code from language constructs [source code]
Control flow abstraction
- Allow reusable code through functions & subroutines
Data structure abstraction
- Add high-level structure to the fundamental level of words in memory
- Types, Traits, Structures, Classes …
Programming languages allow us to reason and analyse on a higher level
- Independent of hardware, but informed by its limitations

Data structure abstraction

Different Implementations and Data Structures are suitable for different kinds of functionality and data
Data structures usually work on different abstraction levels:
- on the fundamental level, everything will be stored in the sequential memory as words
- but logical/conceptual levels are added to structure and work with the data
For example, an array can be a fairly 'thin' abstraction (but is still an abstraction)
- Needed: start address, number of elements, size of element

mutable and immutable

In Scala we have var (a variable) and val (a value)
- var is mutable
- val is immutable
Scala collections (such as List, ArrayBuffer, …) also come in two flavours
mutable collections can be changed
immutable collections cannot
- a new collection reflecting the update is created
Immutable collections are the default in Scala (since v. 2.13)
- That is, if you simply say Seq(2,0,2,4) you will get (some) immutable sequence
- Exception: Array is a special collection, and always mutable (corresponds to Java arrays)

Reminder: Scala variables and parameters of objects are references, not the objects themselves!

This is why

import scala.collection.mutable.ArrayBuffer
val x = ArrayBuffer("a","b","c")
val y = x
println(s"x = $x")
println(s"y = $y")
x(0) = "CHANGED"
println(s"x = $x")
println(s"y = $y")

will print:

x = ArrayBuffer(a, b, c)
y = ArrayBuffer(a, b, c)
x = ArrayBuffer(CHANGED, b, c)
y = ArrayBuffer(CHANGED, b, c)

Much simplified view:

val y = x does NOT create a copy of the ArrayBuffer object, it simply creates a copy of the reference.
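If an independent copy is wanted, it has to be requested explicitly. A minimal sketch, not part of the original example: ArrayBuffer.from builds a new buffer with the same elements. The names aliasVersusCopy, original, alias and copy are chosen here for illustration only.

```scala
import scala.collection.mutable.ArrayBuffer

// Returns (first element seen through the alias, first element of the copy)
def aliasVersusCopy(): (String, String) =
  val original = ArrayBuffer("a", "b", "c")
  val alias = original                   // copies the reference only
  val copy = ArrayBuffer.from(original)  // new buffer with the same elements
  original(0) = "CHANGED"                // visible through alias, not through copy
  (alias(0), copy(0))

val result = aliasVersusCopy()   // ("CHANGED", "a")
```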

Heap and Stack memory

Operating systems and programming languages provide abstractions for how programs use the underlying system memory
These make it possible to provide, e.g., dynamic structures (such as buffers) or subroutines
Most modern systems divide memory into the stack and the heap

(There are also data structures called stacks and heaps, we will return to them in later courses.)

The Heap

Is an (unstructured) block of memory
When we request heap memory, for example by creating an object, e.g. an ArrayBuffer, in Scala
- The system allocates (reserves) a portion of the heap for this and gives us the reference
- When the object is not needed any more the memory can be freed up for future use
- Heap memory management can be tricky, there are many different approaches, e.g.
  - Scala, Java, Python, … do the work for you through garbage collection
  - Rust relies on an ownership system to avoid the overhead of garbage collection
  - In C, it is up to you to allocate and return memory to the system
Heap memory is useful for dynamic structures or objects which we may want to pass between functions

The Stack

Is a region of memory that is managed in a last-in-first-out manner
When new stack memory (a stack frame) is allocated
- it is always put directly after ('on top of') the previous one in the stack
- but, the frames can only be freed in the reverse order (the top one first)
Stack memory is restrictive, but also much simpler and can be faster
- Languages generally do stack allocation automatically, e.g. when calling a function
- Stack memory is used to save the function state and return address

Imperative and functional programming

Scala supports both imperative and functional programming styles

Imperative programming

Uses statements
- Think commands
  - First, do this
  - then do that…
Changes values (program state)

val v = Vector(1.0, 2.5, 3.0)
var i = 0
var sumOfSquares = 0.0
while i < v.length do
  sumOfSquares += v(i)*v(i)
  i=i+1

Functional programming

Expressions of functions
- In mathematics:
  - Application: $y = f\left(x\right)$
  - Composition: $h = g \circ f$
Transforms values

val v = Vector(1.0, 2.5, 3.0)
val sumOfSquares = v.map(x=>x*x).sum

The functional style

Generally uses immutable data types and val
Loops often implemented with recursion
Characteristics are, for example
- Higher order functions
- Anonymous functions

val v = Vector(1.0, 2.5, 3.0)
// In the following,
// map is a higher-order function
// taking the anonymous function
// x=>x*x as a parameter
val squares = v.map(x=>x*x)
val sumOfSquares = squares.sum

Functional programming does not influence what we can program, but how we program it.
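As an illustration of "loops often implemented with recursion", the imperative while loop from earlier could be sketched as a recursive helper. This is one possible formulation; the helper name loop and its parameters are assumptions made for this example.

```scala
import scala.annotation.tailrec

// Sum of squares without var or while: the loop state (index and
// running total) is passed along as parameters instead of being mutated.
def sumOfSquares(v: Vector[Double]): Double =
  @tailrec
  def loop(i: Int, acc: Double): Double =
    if i >= v.length then acc
    else loop(i + 1, acc + v(i) * v(i))
  loop(0, 0.0)

val viaRecursion = sumOfSquares(Vector(1.0, 2.5, 3.0))   // 16.25
val viaMap = Vector(1.0, 2.5, 3.0).map(x => x * x).sum   // 16.25
```

Both versions compute the same value; only the way the computation is expressed differs.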

Anonymous functions

(Also known as lambda functions, lambdas, or function literals)

A function we don't bother 'giving a name', only a body

Why? Sometimes the function is only needed once

val v = Vector(3,4,5,2,8,7)
val evens = v.filter(x => x%2 == 0)

Structure: (parameter) => (function body),
- e.g. x => x%2 == 0

Scala also allows for a simplified _ notation:

val v = Vector(3,4,5,2,8,7)
val evens = v.filter(_%2 == 0)

Anonymous functions are values, and can be assigned to val and var:

val isEven = (x: Int) => (x%2 == 0)
val v = Vector(3,4,5,2,8,7)
val evens = v.filter(isEven)

Function types

In the Scala REPL:

scala> val isEven = (x: Int) => (x%2 == 0)
val isEven: Int => Boolean = Lambda$2155/0x00000008409b1040@418ee41b

Note the type of isEven: Int => Boolean
The form A => B denotes a function type
So, Int => Boolean means that the val holds a reference to a function which takes an Int and returns a Boolean
Compare to how we in mathematics write $f:A \mapsto B$ for some function that maps a value in domain A to the co-domain B
- For example $f:\mathbb{Z}\mapsto \left\{0,1\right\}$,
- or, why not, $\mbox{isEven} : \mbox{Int} \mapsto \mbox{Boolean}$?

Higher order functions

Now that functions can be treated like values
- (in programming language lingo we say that functions are first class)
We can write functions that take other functions as parameters, or have functions as return values
Such functions are called higher order functions
For example, the filter method in Seq is higher order, as we have seen
- Signature: def filter(pred: (A) => Boolean): Seq[A]
- Takes the predicate function pred then uses that to determine which elements to filter
- For example:
```
  scala> Seq(1,2,3).filter(x => x % 2 == 0)
  val res13: Seq[Int] = List(2)
```
- Where we provide the anonymous function x => x%2 == 0 as pred
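Higher-order functions are not limited to library methods. A small sketch of writing our own, one taking a function as a parameter and one returning a function; all names here (applyTwice, compose, addThree, timesTwo) are made up for the example.

```scala
// Takes a function as a parameter and applies it twice
def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

// Returns a function: "first f, then g"
def compose(f: Int => Int, g: Int => Int): Int => Int =
  x => g(f(x))

val addThree = (x: Int) => x + 3
val timesTwo = (x: Int) => x * 2

val a = applyTwice(addThree, 10)                 // 16
val addThenDouble = compose(addThree, timesTwo)  // Int => Int
val b = addThenDouble(1)                         // (1 + 3) * 2 = 8
```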

Function objects

What is this sorcery? How can functions (=program code) be given as a parameter to another function?

Well, if there can be a parameter of type Int, why couldn't there be one of type Int => Boolean?

Internally in Scala, this works because functions are objects
- (and you can send an object as a parameter, right?)
For example, there is a trait Function1 for single argument functions, so we could do:

object isEven extends Function1[Int,Boolean]:
  def apply(x: Int): Boolean = (x % 2 == 0)
end isEven

val v = Vector(3,4,5,2,8,7)
val evens = v.filter(isEven)

(The construction val isEven: Int => Boolean = x=>x%2 == 0 is 'syntactic sugar'.)
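A short sketch of what "functions are objects" means in practice: calling a function value is a call to its apply method, and the object carries other methods too, such as andThen from Function1. The names sugar, explicit, asObject and isOdd are illustrative.

```scala
val isEven: Int => Boolean = x => x % 2 == 0

val sugar = isEven(4)            // true
val explicit = isEven.apply(4)   // true, the same call written out

// The function type Int => Boolean is Function1[Int, Boolean]
val asObject: Function1[Int, Boolean] = isEven

// Being an object, it has methods: andThen builds a new function
val isOdd = isEven.andThen(b => !b)
val three = isOdd(3)             // true
```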

Using collections in a functional style

scala.collection offers many functional utility methods
For example: forall, foreach, map, flatMap, groupBy, foldLeft, zip which you can read about in the course material
For the exercises you may also want to consult the Scala documentation for such traits as Seq and Map and read about other useful methods such as splitAt, keys, filter, toSet, toMap, toVector, indexWhere, take, min, and max (where they are available)
If you need a refresher, chapter 6 of the CS-A1110 material has a lot to offer
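A few of the methods mentioned above, applied to a small made-up word list. This is only a taste; the course material and Scala documentation describe each method fully.

```scala
val words = Vector("apple", "avocado", "banana", "blueberry", "cherry")

val allLong = words.forall(w => w.length >= 5)   // true
val lengths = words.map(w => w.length)           // Vector(5, 7, 6, 9, 6)

// flatMap: each element maps to a collection, the results are concatenated
val expanded = Vector(1, 2, 3).flatMap(x => Vector(x, x * 10))
// Vector(1, 10, 2, 20, 3, 30)

// groupBy: a Map from key (here the first letter) to the matching elements
val byInitial = words.groupBy(w => w.head)
// 'a' -> Vector(apple, avocado), 'b' -> Vector(banana, blueberry), 'c' -> Vector(cherry)

// foldLeft: start value 0, then combine with each element in turn
val totalLength = words.foldLeft(0)((acc, w) => acc + w.length)   // 33

// zip: pair up elements position by position
val paired = words.zip(lengths)   // Vector((apple,5), (avocado,7), ...)
```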

Side effects

A function has side effects if it does anything other than take a value and return a result (computed somehow).

For example, it has side effects if it:

causes some other observable interaction with its environment
- e.g. prints something,
- writes to a file,
- causes an exception
changes the state of some externally visible object (or global variable)
calls some other method/function that has side effects

(The fact that execution takes time and uses memory is not considered a side effect.)

Side effects, for example

println, has side effects, because it prints a string to a console.
def f(a: Int, b: Int): Int = a+b is side effects free, because it simply calculates a value.

However,

def g(a: Int, b: Int): Int =
  println(s"a = $a, b = $b")
  a+b

has side effects. (Why?)

Changing the state of something mutable inside the function is also considered a side effect:
```
var x = 0
def h(v: Int): Int =
  if v > x then
    x = v
  x
```
has a side effect (Why?)
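Updating a variable with a returned value is not considered a side effect of the expression that computed the value
```
var x = 0
val v = 5 // some input value
x = if v > x then v else x
```
The expression if v > x then v else x does not have side effects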
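One way to remove the side effect from h, sketched under the assumption that the caller is allowed to hold the state: the current maximum is passed in and the new maximum is returned. The name maxPure and the sample inputs are hypothetical.

```scala
// Side-effect-free variant of h: nothing outside the function is modified
def maxPure(current: Int, v: Int): Int =
  if v > current then v else current

// The caller decides what to do with the results, here by folding over inputs
val inputs = Seq(3, 9, 4)
val finalMax = inputs.foldLeft(0)(maxPure)   // 9
```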

Which of the following functions have side effects?

def printDivisibleBy3(S: Seq[Int]): Unit =
  S.filter(x=> (x % 3) == 0).foreach(x=>println(x))

def cSum(A : Array[Int]): Array[Int] =
  val B: Array[Int] = A
  var i = 1
  while i < A.length do
    B(i) = B(i-1) + A(i)
    i += 1
  B

def zipWithReverse(S: Seq[Int]): Seq[(Int,Int)] =
  S.zip(S.reverse)

import scala.util.Random
def addNoise(S: Seq[Double]): Seq[Double] =
  S.map(x => x + Random.nextGaussian())

https://presemo.aalto.fi/prog2

Pure functions

A function is said to be pure if

it does not have side effects, and

evaluating it with the same argument, always gives the same result

Are the following functions pure, and if not why?
- def f(a: Int, b: Int) = if a > b then a else b
- println
- Random.nextGaussian
- java.time.LocalDateTime.now
https://presemo.aalto.fi/prog2

Closure

In the following (slow) test for prime numbers:

def isPrimeSlow(n: Int): Boolean =
  require(n > 1)
  (2 until n).forall(divisor => n % divisor != 0)

the anonymous function divisor => n % divisor != 0, does not have n as a parameter
- n is a free variable in the closure of the anonymous function
The closure of a function consists of the function itself and the referencing environment (its scope)
A function can be passed as an argument to, or returned from, another function, and still has access to its closure
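A minimal sketch of both points, with made-up names (makeAdder, addTen, closureSeesUpdates): a returned function keeps access to its free variable, and the closure refers to the variable itself rather than to a snapshot of its value.

```scala
// n is a free variable of the returned anonymous function
def makeAdder(n: Int): Int => Int =
  x => x + n

val addTen = makeAdder(10)
val r1 = addTen(5)   // 15: n = 10 is still accessible after makeAdder has returned

// The closure sees later updates to a captured var
def closureSeesUpdates(): (Int, Int) =
  var factor = 2
  val scale = (x: Int) => x * factor
  val before = scale(10)   // 20
  factor = 3
  val after = scale(10)    // 30
  (before, after)
```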

Example: Prefixing messages (don't do it like this)

Assume we want a function prefixPrintln that prints a string but inserts a line number and a user-configurable prefix first.
For example with the prefix set to "A>" and line numbers starting at 1, calling prefixPrintln("Hello") should print [1]A>Hello, and if we made the same call again in the same session, we would expect [2]A>Hello.

// These var:s exist in the global scope
// [for the sake of demo, in general a bad idea]
var prefix = "A>"
var lines = 0
// println on the format [<lines>]<prefix><s>
def prefixPrintln(s: String) = 
  lines +=1
  println(s"[$lines]$prefix$s")

scala> prefixPrintln("Hello")
[1]A>Hello

scala> prefixPrintln("World")
[2]A>World

var's declared in the global scope

scala> prefix = "B>"
prefix: String = B>

scala> lines = 4324324
lines: Int = 4324324

scala> prefixPrintln("Test")
[4324325]B>Test

Updating prefix or lines changes how the function behaves
Global var is a bad idea

Example: Using objects for encapsulation

The object-oriented way to solve this is with a class where prefix is given as parameter and lines is encapsulated

class PrefixPrinter(prefix: String):
  private var lines = 0L
  // apply method lets us call objects using ()
  def apply(s: String) =
    lines += 1
    println(s"[$lines]$prefix$s")

scala> val prefixPrintln = PrefixPrinter("A>")
val prefixPrintln: PrefixPrinter =
  PrefixPrinter@7be7c052

scala> prefixPrintln("Hello")
[1]A>Hello

scala> prefixPrintln("World")
[2]A>World

This works fine, but we can achieve the same thing using only functions

Example: Using Closures

We can let lines and prefix be in the closure of a function that we return from a factory function:

// Returns a println like function that prefixes everything
// with [n]`prefix`, where n is the line number,
// starting at 1
def createPrefixPrint(prefix: String): String => Unit =
  var lines = 0
  s =>                               // s is the parameter
       lines+=1                      // first increase
       println(s"[$lines]$prefix$s") // then print

scala> val prefixPrintln = createPrefixPrint("A>")
val prefixPrintln: String => Unit = Lambda$1682/0x0000000840835040@25e796fe

scala> prefixPrintln("Hello")
[1]A>Hello

scala> prefixPrintln("World")
[2]A>World

The return type of factory createPrefixPrint is the function String => Unit
- That is, the value returned is itself a function which takes a parameter of type String and returns Unit (nothing)
- The reason for Unit is that this is the return type of println
The return value is the anonymous function starting s => which is of type String => Unit

Multiple parameter lists

Multiple parameters to a Scala function are usually given in a single parameter list, like a tuple. For example:
```
def sumTwo(a:Int, b:Int): Int = a + b
```
- Called as e.g. sumTwo(5,6)
But, we can also use multiple parameter lists to do the same thing:
```
def sumTwo(a:Int)(b:Int): Int = a + b
```
- Called as e.g. sumTwo(5)(6)
You have used this before, for example with the foldLeft method found in any iterable Scala collection.
Mathematically, this can be thought of as
- def sumTwo(a:Int, b:Int): Int corresponds to $\mbox{sumTwo} : \mbox{Int} \times \mbox{Int} \mapsto \mbox{Int}$
- def sumTwo(a:Int)(b:Int): Int corresponds to $\mbox{sumTwo} : \mbox{Int} \mapsto \left( \mbox{Int} \mapsto \mbox{Int} \right) $
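For reference, this is how the two parameter lists of foldLeft look in use, followed by a small self-made function with three parameter lists. The function applyN is a hypothetical example, not a library method.

```scala
val xs = Vector(1, 2, 3, 4)

// First list: the start value. Second list: how to combine.
val total = xs.foldLeft(0)((acc, x) => acc + x)     // 10
val asText = xs.foldLeft("")((acc, x) => acc + x)   // "1234"

// Own definition: apply f to x, n times in a row
def applyN(n: Int)(f: Int => Int)(x: Int): Int =
  (1 to n).foldLeft(x)((acc, _) => f(acc))

val eight = applyN(3)(_ * 2)(1)   // 1 -> 2 -> 4 -> 8
```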

Currying (good to know)

Changing a function of multiple parameters into a sequence of single-parameter functions is sometimes called currying (after the logician Haskell Curry).
- This terminology is discouraged for multiple parameter lists in Scala as the implementation is technically different
- There is a method called curried which converts a function of multiple parameters to a sequence of lambda functions
- The call site of the resulting function sequence looks identical to multiple parameter lists

scala> def sumTwo(a:Int, b:Int): Int = a + b
def sumTwo(a: Int, b: Int): Int
                                                                                                                                        
scala> val sumTwoCurry = sumTwo.curried
val sumTwoCurry: Int => Int => Int =
  scala.Function2$$Lambda$1815/0x0000000840899840@225ac5e7
                                                                                                                                        
scala> sumTwoCurry(10)(11)
val res0: Int = 21
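The conversion also works in the other direction. A sketch using a function value (rather than a def) together with Function.uncurried from the standard library; the names curriedSum, backAgain and byHand are chosen for this example.

```scala
val sumTwo = (a: Int, b: Int) => a + b           // (Int, Int) => Int
val curriedSum = sumTwo.curried                  // Int => Int => Int
val backAgain = Function.uncurried(curriedSum)   // (Int, Int) => Int

val r1 = sumTwo(10, 11)       // 21
val r2 = curriedSum(10)(11)   // 21
val r3 = backAgain(10, 11)    // 21

// The curried form written by hand, as a function returning a function
val byHand: Int => Int => Int = a => b => a + b
val r4 = byHand(10)(11)       // 21
```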

Partial application

Curried functions and functions with multiple parameter lists allow for partial application, which adds programming flexibility
- Meaning that the function is called with only some of the parameters, thereby returning a new function instead of the value
For example, given def sumTwo(a:Int)(b:Int): Int = a + b
- We can bind the first argument, a, to some value (for example 5)
- Example in the REPL to show type:
```
scala> val fivePlus = sumTwo(5)
val fivePlus: Int => Int = $Lambda$1348/0x00000008406e6840@5381cdb6
```
The above returns a new anonymous function which is essentially (b: Int) => 5 + b.

Here is fivePlus used in the REPL:

scala> fivePlus(3)
val res: Int = 8

scala> fivePlus(7)
val res: Int = 12
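Partially applied functions fit naturally where a higher-order method expects a function. A small sketch; between and isDigit are hypothetical names introduced for the example.

```scala
def sumTwo(a: Int)(b: Int): Int = a + b

// sumTwo(5) is a function Int => Int, so it can be handed to map
val shifted = Seq(1, 2, 3).map(sumTwo(5))   // List(6, 7, 8)

// Fix the configuration first, supply the data later
def between(lo: Int, hi: Int)(x: Int): Boolean = lo <= x && x <= hi

val isDigit: Int => Boolean = between(0, 9)
val digits = Seq(-3, 4, 12, 7).filter(isDigit)   // List(4, 7)
```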

Partial application quiz

Assume we define the following to concatenate strings:

def conc(x: String)(y: String): String = x + y
val partial = conc("A")

What is the type of partial?

What is the result of running partial("B")?

What is the result of running partial("C") + partial("B")?

https://presemo.aalto.fi/prog2

Call by

Scala (and many other languages) has more than one way of evaluating arguments given to functions

Call-by-value ("eager" or "strict" evaluation): the arguments are evaluated before executing the function
Call-by-name ("loose" or "non-strict" evaluation): the arguments are evaluated each time they are needed
Call-by-need ("lazy" evaluation): the arguments are evaluated when first needed, but at most once
Call-by-value is standard in Scala
Call-by-name and call-by-need are useful, for example, when
- We may want to provide function blocks to be executed by the function
- It is costly to calculate the parameter (and the function only uses it sometimes)

A parameter can be made 'call-by-name' by using => first in the type definition. Let's create another function that prints a string with a prefix. Let's have it use multiple parameter lists, one for the prefix and another for the message string. Moreover, make the prefix call-by-name:

// prefix is call-by-name, and will be evaluated when needed (when it is printed)
// msg is call-by-value and stays the same as when logMessage is called
def logMessage(prefix: => String)(msg: String): Unit = println(s"$prefix$msg")

Note the parameter prefix has the type => String.
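The three strategies can be told apart by counting how often the argument expression is evaluated. The sketch below is an illustration with made-up names (evaluations, expensive, byValue, byName, byNeed, ignores, evaluationsDuring); call-by-need is emulated here with a lazy val inside a call-by-name function.

```scala
var evaluations = 0
def expensive(): Int =
  evaluations += 1   // side effect used only to count evaluations
  42

def byValue(x: Int): Int = x + x      // argument evaluated before the call
def byName(x: => Int): Int = x + x    // argument evaluated at each use
def byNeed(x: => Int): Int =
  lazy val cached = x                 // evaluated at first use, then remembered
  cached + cached
def ignores(x: => Int): Int = 0       // argument never used, never evaluated

// Runs one call and returns (its result, number of evaluations of `expensive`)
def evaluationsDuring(call: => Int): (Int, Int) =
  evaluations = 0
  val result = call
  (result, evaluations)

val valueRun = evaluationsDuring(byValue(expensive()))    // (84, 1)
val nameRun = evaluationsDuring(byName(expensive()))      // (84, 2)
val needRun = evaluationsDuring(byNeed(expensive()))      // (84, 1)
val unusedRun = evaluationsDuring(ignores(expensive()))   // (0, 0)
```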

Call-by-name vs call-by-value

Call-by-name (note prefix: => String)

// Import now, which gives the current time
import java.time.LocalDateTime.now

// Log message function, note that prefix is call-by-name
def logMessage(prefix: => String)(msg: String): Unit = 
  println(s"$prefix$msg")

// Create a log function that prefixes everything
// with the current time
val logThis = logMessage(now().toString + ": ")

// Use it to log a few things
logThis("Captain's log")
logThis("Star-date...")
logThis("I can't remember!")

Output:

2025-03-29T16:49:40.486410873: Captain's log
2025-03-29T16:49:40.486757394: Star-date...
2025-03-29T16:49:40.486890486: I can't remember!

Note that the last decimals in the time change!
prefix is evaluated each time it is needed by println

Call-by-value (note prefix: String)

// Import now, which gives the current time
import java.time.LocalDateTime.now

// Log message function, note that prefix is call-by-value
def logMessage(prefix: String)(msg: String): Unit = 
  println(s"$prefix$msg")

// Create a log function that prefixes everything
// with the current time
val logThis = logMessage(now().toString + ": ")

// Use it to log a few things
logThis("Captain's log")
logThis("Star-date...")
logThis("I can't remember!")

Output:

2025-03-29T16:55:24.388120025: Captain's log
2025-03-29T16:55:24.388120025: Star-date...
2025-03-29T16:55:24.388120025: I can't remember!

Note that the last decimals in the time stay constant!
prefix is evaluated once, which is when logThis is created

Iterators

An Iterator can be used to access the elements of a collection one by one
It can mimic the behaviour of collections and has many of the same methods
Iterator is a mutable object
- (It has side effects)
- Two basic operations:
  - next(): A
    returns the next element in the iterator
  - hasNext: Boolean
    tells us if there are more elements
Use the iterator method on a Scala collection to get one
Iterators are useful when we want to
- Traverse the elements in a collection
- Generate a sequence without storing all values

Getting an iterator and asking for next() :

val s = Seq("Hey","You") // Make sequence
val it = s.iterator // Get iterator
// Access one by one
println(it.next())  // "Hey"
println(it.next())  // "You"
println(it.next())  // This throws an exception

Note: java.util.NoSuchElementException

In a loop as long as hasNext is true :

val s = Seq("Hey","You") // Make sequence
val it = s.iterator // Get iterator
// Access one by one, as long as there are more
while it.hasNext do
  println(it.next())

Can use the 'normal' collection methods :

val s = Seq(1,2,3,4)
// Sum all even numbers
s.iterator.filter(_%2 == 0).sum

Why bother? :

// Sum all even numbers in range
val r =(1L to 100000000L)
// Will give java.lang.OutOfMemoryError
r.filter(_%2 ==0).sum

Note: java.lang.OutOfMemoryError: Java heap space

The same with iterator :

// Sum all even numbers in range
val r =(1L to 100000000L)
r.iterator.filter(_%2 ==0).sum
// Will give 2500000050000000
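Iterators can also generate values on demand without any underlying collection. A sketch using Iterator.from and Iterator.iterate from the standard library; the example also shows that an iterator is used up by traversing it. The names squares, powers and consumed are illustrative.

```scala
// Squares below 100, generated on demand from an unbounded iterator
val squares = Iterator.from(1).map(x => x * x).takeWhile(_ < 100).toList
// List(1, 4, 9, 16, 25, 36, 49, 64, 81)

// Powers of two: start value and step function
val powers = Iterator.iterate(1)(_ * 2).take(8).toVector
// Vector(1, 2, 4, 8, 16, 32, 64, 128)

// An iterator is consumed by traversal
def consumed(): (Int, Boolean) =
  val it = Seq(1, 2, 3).iterator
  val total = it.sum     // traverses all elements
  (total, it.hasNext)    // (6, false): nothing left afterwards
```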

Iterators - making your own

While we often use iterators provided by existing collections, sometimes we want to write our own
- For example if we implement a new kind of data structure, or
- want to generate a custom sequence of data without first storing it
An iterator in Scala inherits the Iterator trait, so we must specify the two methods
next(): A
to provide the next value if there is one
- Note the explicit (). This is used in Scala for methods that do not take parameters but have side effects
hasNext: Boolean
to tell a caller if there is a next element to get

Iterators - example

Let us make an Iterator that calculates a numerical sequence ('orbit') using the following rule:

Take a positive integer.

If the number is even divide it by two, if it is odd then triple it and add one. Repeat.
More formally: given some $x_0 \in \mathbb{Z}^{+}$, define

\begin{equation} x_{n+1} = \begin{cases} x_n/2 & \mbox{if $x_n$ is even}\\ 3 x_n + 1 & \mbox{if $x_n$ is odd}\\ \end{cases} \end{equation}
For example, $x_0 = 5$ will give the sequence 5→16→8→4→2→1
In mathematics there is a conjecture (a famous one by Collatz) that this sequence will eventually reach 1, for every possible initial value.
If we assume the conjecture is true we can write a Scala Iterator that gives every value in the sequence until it reaches 1 (which we'll exclude for brevity).

Iterators - example cont.

Given some $x_0 \in \mathbb{Z}^{+}$, define $x_{n+1} =\begin{cases}x_n/2 & \mbox{if $x_n$ is even}\\3 x_n + 1 & \mbox{if $x_n$ is odd}\\\end{cases}$

// A function which returns a new Iterator
// representing the sequence of numbers
// starting at `start` and ending when
// the sequence reaches 1 (not included)
def orbitIterator(start: BigInt) : Iterator[BigInt] = 
  require(start>0, "Must start from a positive value")
  // new is used to create a new object
  new Iterator[BigInt]:
    //`x` is the current value, initially `start`
    private var x = start

    // Our implementation of `hasNext`
    // Orbit ends if we reach 1
    def hasNext: Boolean = 
      (x != 1)

    // Our implementation of `next()`
    // Calculated according to rule
    def next(): BigInt =
      // Temporarily hold the current value
      val result = x
      // Update current number
      x = if x%2 == 0 then x/2 else 3*x+1 
      // Return previous value
      result

// Let's test:
// Explicitly with while
val o5 = orbitIterator(5)
while o5.hasNext do
  println(o5.next())

// Or more compactly with for
for x <- orbitIterator(5) do
  println(x)

Results (from either call):

5
16
8
4
2

Iterators quiz

Assume we define a function to create an iterator over the bits in a Byte:

def bitsIt(w: Byte): Iterator[Boolean] =
  new Iterator[Boolean]:
    private var i: Int = 8
    def hasNext: Boolean = (i > 0)
    def next(): Boolean =
      i = i - 1
      (((w>>>i) & 0x1) == 0x1)

Now, if we use bitsIt as

val w: Byte = 7
var n = 0
for b <- bitsIt(w) do
  if b then n+=1

What is the value in n after the loop finishes?

Reusable collections

Common patterns and abstractions
- Sequences, Sets, Maps
Included as standard libraries in many modern programming languages

Scala collections

Common and uniform framework for working with data

Figure from the Scala 3 book

Comes in two flavours…

Immutable

Mutable

Figures from the Scala 3 book

So many collections…

mutable and immutable versions
Many different classes and traits for each kind of collection
- Sequences : e.g. Vector, List, ArraySeq,…
- Sets : e.g. Set, SortedSet, BitSet,…
- Map : e.g. Map, HashMap, ListMap, …

Take a few from the immutable branch…

scala> import scala.collection.immutable._
import scala.collection.immutable._

scala> Vector(1,2,3,4).sum
val res1: Int = 10

scala> List(1,2,3,4).sum
val res2: Int = 10

scala> ArraySeq(1,2,3,4).sum
val res4: Int = 10

They seem to do the same thing, at least in the case of sum. So what is different?

Traits and Classes

High-level traits describe the abstract interface
The interface is implemented in the concrete class
For example, Map is an abstract trait, while HashMap and ListMap are two different concrete classes implementing it
- (They differ in the underlying data structure implementation, but have the same interface)
Most traits have companion objects which allow instantiation; for example:

scala> import scala.collection.immutable._
import scala.collection.immutable.*

scala> val m = Map("a"->1, "b"->2, "c"->8) // Getting a Map, but which kind?
val m: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 8)

Traits and Classes

But which concrete class is actually used? (Unless you explicitly create one with new)

scala> m.getClass // As created on previous slide
val res1: Class[? <: Map[String, Int]] = class scala.collection.immutable.Map$Map3

A class called Map3, OK, is this always the case?

Map("a"-> 1, "b"-> 2, "c"-> 8, "f"-> 9, "k"-> -1).getClass
val res2: Class[? <: Map[String, Int]] = class scala.collection.immutable.HashMap

(Apparently not, why?)

Internal implementation

Some classes implement the same trait, but use different data structures or algorithms internally
There are differences in performance for common operations
Usually no 'one size fits all' when it comes to working with data
It can be important to know these characteristics when programming some applications
For example, ListBuffer and ArrayBuffer both implement Buffer, but
- ListBuffer is fast to add an element at the beginning (and at the end), but 'slow' to index
- ArrayBuffer is fast to index, but 'slow' if you need to add an element to the beginning
See the performance characteristics of the different collection types in the Scala collections documentation
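Because both classes implement Buffer, code written against the trait works with either one; only the cost of each operation differs. A minimal sketch, where the function name exercise is an assumption made for this example:

```scala
import scala.collection.mutable.{ArrayBuffer, Buffer, ListBuffer}

// Works with any Buffer: prepends 0, appends 4, then reads index 2
def exercise(b: Buffer[Int]): (Int, List[Int]) =
  b.prepend(0)   // cheap for ListBuffer, 'slow' for ArrayBuffer
  b.append(4)
  (b(2), b.toList)   // indexing: cheap for ArrayBuffer, 'slow' for ListBuffer

val fromList = exercise(ListBuffer(1, 2, 3))     // (2, List(0, 1, 2, 3, 4))
val fromArray = exercise(ArrayBuffer(1, 2, 3))   // (2, List(0, 1, 2, 3, 4))
```

The results are identical; the choice between the two classes is about performance, not behaviour.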

We will talk more of how to analyse performance next round

Much more about this in the course CS-A1140 Data Structures and Algorithms

mutable collections

Classes in scala.collection.mutable are mutable - they can be changed (updated) after creation.

scala> val x = scala.collection.mutable.Seq(1,2,3,4)
val x: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2, 3, 4)

scala> x(1) = 7 // We can change value at index 1

scala> x // Look, updated!
val res1: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 7, 3, 4)

immutable collections

Classes in scala.collection.immutable are immutable - they cannot be updated after creation

scala> val y = scala.collection.immutable.Seq(1,2,3,4)
val y: Seq[Int] = List(1, 2, 3, 4)

scala> y(1) = 7 // This won't work, look:
       ^
       error: value update is not a member of Seq[Int]
       did you mean updated?

scala> val z = y.updated(1,7) // Can update index 1 to 7, but
val z: Seq[Int] = List(1, 7, 3, 4) // `updated` returns a new object

scala> y // Look, y is still the same
val res1: Seq[Int] = List(1, 2, 3, 4)

Why both mutable and immutable?

Different programming styles and use-cases
Mutability allows updating in place (e.g. for keeping state)
- But, requires a certain level of discipline to use as a programmer
- E.g. sensitive to update order
Immutability somewhat restricts how we work with the collection
- Mappings/transforms of the data, rather than state
- Can be less prone to certain kinds of bugs
- Useful for functional and concurrent/parallel programming

Are immutable collections inefficient?

Not necessarily (although, they can be made so)
Remember – we get a new collection (data structure), not necessarily a completely new set of data!
- Because the data is immutable, parts of it could be shared
Well designed data structures can share substructures, making the required changes small
- We will see examples of this in later rounds
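A small illustration of sharing, using the immutable List: prepending creates one new cell that points to the existing list, which is reused as the tail. The reference comparison eq checks that it is the very same object. The names base and extended are chosen for the example.

```scala
val base = List(2, 3, 4)
val extended = 1 :: base   // new list List(1, 2, 3, 4)

val baseUnchanged = base == List(2, 3, 4)   // true: base is not modified
val sharesTail = extended.tail eq base      // true: the tail is base itself, not a copy
```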

Exercises

Exam grader (collections and functional style)
Frequencies (collections and functional style)
Call-by-name and closures
Sequence–generating iterators
(Challenge) Image statistics (functional style)
(Challenge) Bus schedules (using iterators)
(Challenge) Church numerals

Hint: Read the examples in the course material and on these slides!
Hint: Be familiar with the Scala documentation for collections such as Seq, Set, and Map

Note: Programming restrictions
- Some exercises forbid the use of specific constructions or data types
  - (For example var, Array)
- The grader checks for the forbidden constructions and automatically gives 0 points
- The unit tests do not check for this (they only verify correctness, not how the solution is implemented)
- Make sure to read the instructions and source code carefully

Nice to know: Pattern matching

It is quite common that a collection contains tuples of data
In which case it is convenient to use pattern matching expressions
- No need for constructions involving _._1 and friends

For example, swapping elements

scala> val l = List(("apple",3.5), ("banana", 2.1), ("orange", 2.5))
val l: List[(String, Double)] = List((apple,3.5), (banana,2.1), (orange,2.5))

scala> l.map((x,y) => (y,x)) // Swap positions
val res0: List[(Double, String)] = List((3.5,apple), (2.1,banana), (2.5,orange))

scala> l.map((_,x) => x).sum // Sum of second elements
val res1: Double = 8.1

The previous syntax works with basic tuples. For nested tuples a case expression is necessary:

// Zip previous list with its index
scala> val dataIndex = l.zipWithIndex
val dataIndex: List[((String, Double), Int)] = List(((apple,3.5),0),
    ((banana,2.1),1), ((orange,2.5),2))

// Pick out indices of values with p < 3; note the { case }
scala> dataIndex.filter{case ((n,p),i) => p < 3}.map((_,i)=>i)
val res3: List[Int] = List(1, 2)

Nice to know: pattern matching - unapply

Pattern matching case expressions will work for any object with an unapply method
Created automatically for case classes:

scala> case class Person (fname:String, lname:String)

scala> val people = Seq(Person("Gwen", "Stacy"),
  Person("Miles", "Morales"), Person("Peter", "Parker"))

scala> people.map{case Person(_,last) => last}
val res11: Seq[String] = List(Stacy, Morales, Parker)

So, what does this one do?

scala> people.filter{case Person(f,l) => f.head != l.head}
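Case classes get unapply automatically, but the method can also be written by hand for any object. A sketch of such an extractor, assuming a hypothetical object Email that splits strings of the form user@domain; it is not part of the Scala library.

```scala
// Hypothetical extractor: matches strings with exactly one "@" separator
object Email:
  def unapply(s: String): Option[(String, String)] =
    s.split("@") match
      case Array(user, domain) => Some((user, domain))
      case _ => None   // no match: the pattern simply fails

val addresses = Seq("gwen@example.org", "not-an-address", "miles@example.com")

// collect keeps only the elements for which the pattern matches
val domains = addresses.collect { case Email(_, domain) => domain }
// List(example.org, example.com)
```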