Collections and functions

CS-A1120 Programming 2

Lukas Ahrenberg

Department of Computer Science
Aalto University

Course calendar (reminder)

course-calendar-2026.svg

Module 2 - Programming abstractions and analysis

We have learned the basics of a computer, but

  • Efficient programming requires abstractions and models that are
    • recurrent : need encountered often in practice
    • useful : help control and encapsulate a problem
    • efficient : for the computer to execute

High-level learning goals

  • Round 6: Collections and functions
  • Round 7: Efficiency
  • Round 8: Recursion

After this module, you

  • are familiar with Scala 3's collection framework
  • are able to program using the functional style
  • can employ recursive problem solving and data structures
  • can analyse the efficiency and complexity of a basic function

After this round, you …

  • can program in the functional style
  • are familiar with the use of anonymous functions
  • can explain properties of higher-order functions
  • can define side effects and pure functions
  • can implement a basic iterator in Scala
  • are aware of the properties of Scala collections

Module 1 - Module 2 bridge

Getting from a high-level language (Scala, C++, Rust, …) to machine code?

  • We now know that binary machine code is the low-level program fed to a processor
  • You have programmed a basic processor in assembly, which is a thin human-readable mnemonic for machine code
    • Instructions are more or less 1:1 to machine code instructions and mapped using an assembler
  • High-level languages use compilers, which transform a program in some readable form into bytecode for a virtual machine or directly into machine code for a specific architecture.

scala-to-hardware.svg

Abstraction

  • Hardware implements the machinery for computation
  • Universality and programmability allow us to add programming constructs and abstraction
    • We do not have to program in assembly code
    • Instead we use compilers or interpreters to derive machine code from language constructs [source code]
  • Control flow abstraction
    • Allow reusable code through functions & subroutines
  • Data structure abstraction
    • Add high-level structure to the fundamental level of words in memory
    • Types, Traits, Structures, Classes …
  • Programming languages allow us to reason and analyse on a higher level
    • Independent of hardware, but informed by its limitations

Data structure abstraction

  • Different implementations and data structures are suitable for different kinds of functionality and data
  • Data structures usually work on different abstraction levels:
    • on the fundamental level, everything will be stored in the sequential memory as words
    • but logical/conceptual levels are added to structure and work with the data
  • For example, an array can be a fairly 'thin' abstraction (but is still an abstraction)
    • Needed: start address, number of elements, size of element

mutable and immutable

  • In Scala we have var (a variable) and val (a value)
    • var is mutable
    • val is immutable
  • Scala collections (such as List, ArrayBuffer, …) also come in two flavours
  • mutable collections can be changed
  • immutable collections cannot
    • a new collection reflecting the update is created
  • Immutable collections are the default in Scala (since v. 2.13)
    • That is, if you simply say Seq(2,0,2,4) you will get (some) immutable sequence
    • Exception: Array is a special collection, and always mutable (corresponds to Java arrays)

Reminder: Scala variables and parameters of objects are references, not the objects themselves!

This is why

import scala.collection.mutable.ArrayBuffer
val x = ArrayBuffer("a","b","c")
val y = x
println(s"x = $x")
println(s"y = $y")
x(0) = "CHANGED"
println(s"x = $x")
println(s"y = $y")

will print:

x = ArrayBuffer(a, b, c)
y = ArrayBuffer(a, b, c)
x = ArrayBuffer(CHANGED, b, c)
y = ArrayBuffer(CHANGED, b, c)

Much simplified view:
object-refs.svg

val y = x does NOT create a copy of the ArrayBuffer object, it simply creates a copy of the reference.
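
If an independent copy is what we want, one option is to build a new buffer from the old one. The sketch below contrasts an alias with such a copy; the function name aliasVersusCopy is made up for this example, and ArrayBuffer.from is just one of several ways to copy a collection.

```scala
import scala.collection.mutable.ArrayBuffer

// Returns what the alias and the copy see at index 0 after `original` is changed
def aliasVersusCopy(): (String, String) =
  val original = ArrayBuffer("a", "b", "c")
  val alias = original                  // copies only the reference
  val copy = ArrayBuffer.from(original) // builds a new buffer with the same elements
  original(0) = "CHANGED"
  (alias(0), copy(0))

val seen = aliasVersusCopy() // ("CHANGED", "a")
```

The alias observes the change because it refers to the same object; the copy does not.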

Heap and Stack memory

  • Operating systems and programming languages provide abstractions for how programs use the underlying system memory
  • To provide e.g. dynamic structures (such as buffers) or subroutines
  • Most modern systems divide memory into the stack and the heap

(There are also data structures called stacks and heaps, we will return to them in later courses.)

The Heap

  • Is an (unstructured) block of memory
  • When we request heap memory, for example by creating an object, e.g. an ArrayBuffer, in Scala
    • The system allocates (reserves) a portion of the heap for this and gives us the reference
    • When the object is not needed any more the memory can be freed up for future use
    • Heap memory management can be tricky, there are many different approaches, e.g.
      • Scala, Java, Python, … do the work for you through garbage collection
      • Rust relies on an ownership system to avoid the overhead of garbage collection
      • In C, it is up to you to allocate and return memory to the system
  • Heap memory is useful for dynamic structures or objects which we may want to pass between functions

lego-heap.webp

The Stack

  • Is a region of memory that is managed in a last-in-first-out (LIFO) manner
  • When new stack memory (a stack frame) is allocated
    • it is always put directly after ('on top of') the previous one in the stack
    • but, the frames can only be freed in the reverse order (the top one first)
  • Stack memory is restrictive, but also much simpler and can be faster
    • Languages generally do stack allocation automatically, e.g. when calling a function
    • Stack memory is used to save the function state and return address

lego-stack.webp

Imperative and functional programming

Scala supports both imperative and functional programming styles

Imperative programming

  • Uses statements
    • Think commands
      • First, do this
      • then do that…
  • Changes values (program state)
val v = Vector(1.0, 2.5, 3.0)
var i = 0
var sumOfSquares = 0.0
while i < v.length do
  sumOfSquares += v(i)*v(i)
  i=i+1

Functional programming

  • Expressions of functions
    • In mathematics:
      • Application: \(y = f\left(x\right)\)
      • Composition: \(h = g \circ f\)
  • Transforms values
val v = Vector(1.0, 2.5, 3.0)
val sumOfSquares = v.map(x=>x*x).sum
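
The mathematical notions of application and composition have direct counterparts on Scala function values. The following is a small sketch; the names square, addOne, h and hAlt are chosen only for this example.

```scala
val square: Double => Double = x => x * x
val addOne: Double => Double = x => x + 1.0

// Composition: h = addOne ∘ square, that is h(x) = addOne(square(x))
val h: Double => Double = addOne.compose(square)
// andThen reads left to right and builds the same function
val hAlt: Double => Double = square.andThen(addOne)

val y = h(3.0) // application: 3.0 * 3.0 + 1.0 = 10.0
```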

The functional style

  • Generally uses immutable data types and val
  • Loops often implemented with recursion
  • Characteristics are, for example
    • Higher order functions
    • Anonymous functions

val v = Vector(1.0, 2.5, 3.0)
// In the following,
// map is a higher-order function
// taking the anonymous function
// x=>x*x as a parameter
val squares = v.map(x=>x*x)
val sumOfSquares = squares.sum  

Functional programming does not influence what we can program, but how we program it.
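
To illustrate the remark that loops are often implemented with recursion, here is one possible recursive version of the sum of squares. It is only a sketch for small inputs (Round 8 returns to recursion properly), and the function name is reused from the example above for readability.

```scala
// Recursive sum of squares: no var and no while loop
def sumOfSquares(xs: Vector[Double]): Double =
  if xs.isEmpty then 0.0
  else xs.head * xs.head + sumOfSquares(xs.tail)

val total = sumOfSquares(Vector(1.0, 2.5, 3.0)) // 1.0 + 6.25 + 9.0 = 16.25
```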

Anonymous functions

(Also known as lambda functions, lambdas, or function literals)

  • A function we don't bother 'giving a name', only a body
  • Why? Sometimes the function is only needed once

    val v = Vector(3,4,5,2,8,7)
    val evens = v.filter(x => x%2 == 0)
    
  • Structure: (parameter) => (function body),
    • e.g. x => x%2 == 0
  • Scala also allows for a simplified _ notation:

    val v = Vector(3,4,5,2,8,7)
    val evens = v.filter(_%2 == 0)
    
  • Anonymous functions are values, and can be assigned to val and var:

    val isEven = (x: Int) => (x%2 == 0)
    val v = Vector(3,4,5,2,8,7)
    val evens = v.filter(isEven)
    

Function types

In the Scala REPL:

scala> val isEven = (x: Int) => (x%2 == 0)
val isEven: Int => Boolean = Lambda$2155/0x00000008409b1040@418ee41b
  • Note the type of isEven: Int => Boolean
  • The form A => B denotes a function type
  • So, Int => Boolean means that the val holds a reference to a function which takes an Int and returns a Boolean
  • Compare to how we in mathematics write \(f:A \mapsto B\) for some function that maps a value in domain A to the co-domain B
    • For example \(f:\mathbb{Z}\mapsto \left\{0,1\right\}\),
    • or, why not, \(\mbox{isEven} : \mbox{Int} \mapsto \mbox{Boolean}\)?

map.svg

Higher order functions

  • Now that functions can be treated like values
    • (in programming language lingo we say that functions are first class)
  • We can write functions that take other functions as parameters, or have functions as return values
  • Such functions are called higher order functions
  • For example, the filter method in Seq is higher order, as we have seen
    • Signature: def filter(pred: (A) => Boolean): Seq[A]
    • Takes the predicate function pred then uses that to determine which elements to filter
    • For example:

        scala> Seq(1,2,3).filter(x => x % 2 == 0)
        val res13: Seq[Int] = List(2)                                    
      
    • Where we provide the anonymous function x => x%2 == 0 as pred
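
We can also write higher-order functions ourselves. The sketch below shows both directions mentioned above: a function taking a function as a parameter and a function returning one. The names applyTwice and adder are made up for this illustration.

```scala
// Takes a function as a parameter: applies f two times to x
def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

// Returns a function: the result adds n to its argument
def adder(n: Int): Int => Int = x => x + n

val addThree = adder(3)
val result = applyTwice(addThree, 10) // (10 + 3) + 3 = 16
```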

Function objects

What is this sorcery? How can functions (=program code) be given as a parameter to another function?

Well, if there can be a parameter of type Int, why couldn't there be one of type Int => Boolean?

  • Internally in Scala, this works because functions are objects
    • (and you can send an object as a parameter, right?)
  • For example, there is a trait Function1 for single argument functions, so we could do:
object isEven extends Function1[Int,Boolean]:
  def apply(x: Int): Boolean = (x % 2 == 0)
end isEven

val v = Vector(3,4,5,2,8,7)
val evens = v.filter(isEven)

(The construction val isEven: Int => Boolean = x=>x%2 == 0 is 'syntactic sugar'.)
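
One way to see that a function value really is an object is to call its apply method explicitly. This is a small sketch; the names direct, viaApply and sameType are chosen only for this example.

```scala
val isEven: Int => Boolean = x => x % 2 == 0

val direct = isEven(4)         // the usual call syntax
val viaApply = isEven.apply(4) // what the call syntax stands for
// Int => Boolean is shorthand for Function1[Int, Boolean]
val sameType: Function1[Int, Boolean] = isEven
```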

Using collections in a functional style

  • scala.collection offers many functional utility methods
  • For example: forall, foreach, map, flatMap, groupBy, foldLeft, zip which you can read about in the course material
  • For the exercises you may also want to consult the Scala documentation for such traits as Seq and Map and read about other useful methods such as splitAt, keys, filter, toSet, toMap, toVector, indexWhere, take, min, and max (where they are available)
  • If you need a refresher, chapter 6 of the CS-A1110 material has a lot to offer
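
As a quick reminder of what some of these methods do, here is a small sketch on made-up data (the word list and all val names are assumptions for this illustration):

```scala
val words = Vector("apple", "avocado", "banana", "blueberry", "cherry")

// groupBy: collect the words under their first letter
val byInitial: Map[Char, Vector[String]] = words.groupBy(_.head)
// foldLeft: accumulate the total number of characters, starting from 0
val totalLength = words.foldLeft(0)((acc, w) => acc + w.length)
// zip: pair each word with a running number
val numbered = words.zip(1 to words.length)
// forall: check a property of every element
val allLowerCase = words.forall(w => w == w.toLowerCase)
```

Here byInitial('a') is Vector(apple, avocado), totalLength is 33, and numbered starts with (apple,1).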

Side effects

A function has side effects if it does anything other than take a value and return a result (computed somehow).

For example, it has side effects if it:

  • causes some other observable interaction with its environment
    • e.g. prints something,
    • writes to a file,
    • throws an exception
  • changes the state of some externally visible object (or global variable)
  • calls some other method/function that has side effects

(The fact that execution takes time and uses memory is not considered a side effect.)

Side effects, for example

  • println has side effects, because it prints a string to the console.
  • def f(a: Int, b: Int): Int = a+b is free of side effects, because it simply calculates a value.
  • However,

    def g(a: Int, b: Int): Int =
      println(s"a = $a, b = $b")
      a+b
    

    has side effects. (Why?)

  • Changing the state of something mutable inside the function is also considered a side effect:

    var x = 0
    def h(v: Int): Int =
      if v > x then
        x = v
      x
    

    has a side effect (Why?)

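  • Computing a new value and letting the caller store it is not a side effect of the computation:

    var x = 0
    val v = 5 // some example input
    x = if v > x then v else x
    

    The expression if v > x then v else x does not have side effects; the assignment happens where the result is used.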

Which of the following functions have side effects?

def printDivisibleBy3(S: Seq[Int]): Unit =
  S.filter(x=> (x % 3) == 0).foreach(x=>println(x))
def cSum(A : Array[Int]): Array[Int] =
  val B: Array[Int] = A
  var i = 1
  while i < A.length do
    B(i) = B(i-1) + A(i)
    i += 1
  B
def zipWithReverse(S: Seq[Int]): Seq[(Int,Int)] =
  S.zip(S.reverse)
import scala.util.Random
def addNoise(S: Seq[Double]): Seq[Double] =
  S.map(x => x + Random.nextGaussian())

https://presemo.aalto.fi/prog2


Pure functions

A function is said to be pure if

  1. It does not have side effects, and
  2. evaluating it with the same argument, always gives the same result
  • Are the following functions pure, and if not why?

    • def f(a: Int, b: Int) = if a > b then a else b
    • println
    • Random.nextGaussian
    • java.time.LocalDateTime.now

    https://presemo.aalto.fi/prog2


Closure

In the following (slow) test for prime numbers:

def isPrimeSlow(n: Int): Boolean =
  require(n > 1)
  (2 until n).forall(divisor => n % divisor != 0)

  • the anonymous function divisor => n % divisor != 0, does not have n as a parameter
    • n is a free variable of the anonymous function; its value comes from the enclosing scope, which is part of the closure
  • The closure of a function consists of the function itself and the referencing environment (its scope)
  • A function can be passed as an argument to, or returned from, another function, and still has access to its closure
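
The last point can be sketched with a small factory function. The returned function still refers to n after the factory call has finished; the names isDivisibleBy and divisibleBy3 are made up for this example.

```scala
// Factory: the returned function refers to n, which is not one of its parameters
def isDivisibleBy(n: Int): Int => Boolean =
  candidate => candidate % n == 0

val divisibleBy3 = isDivisibleBy(3)       // n = 3 is kept in the closure
val hits = (1 to 10).filter(divisibleBy3) // 3, 6, 9
```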

Example: Prefixing messages (don't do it like this)

  • Assume we want a function prefixPrintln that prints a string but inserts a line number and a user-configurable prefix first.
  • For example with the prefix set to "A>" and line numbers starting at 1, calling prefixPrintln("Hello") should print [1]A>Hello, and if we made the same call again in the same session, we would expect [2]A>Hello.

// These vars exist in the global scope
// [for the sake of the demo; in general a bad idea]
var prefix = "A>"
var lines = 0
// println in the format [<lines>]<prefix><s>
def prefixPrintln(s: String) = 
  lines +=1
  println(s"[$lines]$prefix$s")
scala> prefixPrintln("Hello")
[1]A>Hello

scala> prefixPrintln("World")
[2]A>World

  • The vars are declared in the global scope, so anyone can change them:

    scala> prefix = "B>"
    prefix: String = B>
    
    scala> lines = 4324324
    lines: Int = 4324324
    
    scala> prefixPrintln("Test")
    [4324325]B>Test
    
  • Updating prefix or lines changes how the function behaves
  • Global var is a bad idea

Example: Using objects for encapsulation

  • The object-oriented way to solve this is with a class where prefix is given as parameter and lines is encapsulated
class PrefixPrinter(prefix: String):
  private var lines = 0L
  // apply method lets us call objects using ()
  def apply(s: String) =
    lines += 1
    println(s"[$lines]$prefix$s")
scala> val prefixPrintln = PrefixPrinter("A>")
val prefixPrintln: PrefixPrinter =
  PrefixPrinter@7be7c052

scala> prefixPrintln("Hello")
[1]A>Hello

scala> prefixPrintln("World")
[2]A>World

This works fine, but we can achieve the same thing using only functions

Example: Using Closures

  • We can let lines and prefix be in the closure of a function that we return from a factory function:
// Returns a println like function that prefixes everything
// with [n]`prefix`, where n is the line number,
// starting at 1
def createPrefixPrint(prefix: String): String => Unit =
  var lines = 0
  s =>                               // s is the parameter
       lines+=1                      // first increase
       println(s"[$lines]$prefix$s") // then print
scala> val prefixPrintln = createPrefixPrint("A>")                                                                         
val prefixPrintln: String => Unit = Lambda$1682/0x0000000840835040@25e796fe
                                                                                                                           
scala> prefixPrintln("Hello")
[1]A>Hello
                                                                                                                           
scala> prefixPrintln("World")
[2]A>World

  • The return type of the factory createPrefixPrint is the function type String => Unit
    • That is, the value returned is itself a function which takes a parameter of type String and returns Unit (no meaningful value)
    • The reason for Unit is that this is the return type of println
  • The return value is the anonymous function starting s => which is of type String => Unit

Multiple parameter lists

  • Multiple parameters to a Scala function are usually given in a single parameter list (conceptually, a tuple of arguments). For example:

    def sumTwo(a:Int, b:Int): Int = a + b
    
    • Called as e.g. sumTwo(5,6)
  • But, we can also use multiple parameter lists to do the same thing:

    def sumTwo(a:Int)(b:Int): Int = a + b
    
    • Called as e.g. sumTwo(5)(6)
  • You have used this before, for example with the foldLeft method found in any iterable Scala collection.
  • Mathematically, this can be thought of as
    • def sumTwo(a:Int, b:Int): Int corresponds to \(\mbox{sumTwo} : \mbox{Int} \times \mbox{Int} \mapsto \mbox{Int}\)
    • def sumTwo(a:Int)(b:Int): Int corresponds to \(\mbox{sumTwo} : \mbox{Int} \mapsto \left( \mbox{Int} \mapsto \mbox{Int} \right) \)

Currying (good to know)

  • Turning a function of multiple parameters into a sequence of single-parameter functions is sometimes called currying (after the logician Haskell Curry).
    • This terminology is discouraged for multiple parameter lists in Scala, as the implementation is technically different
    • There is a method called curried which converts a function of multiple parameters into a sequence of lambda functions
    • The call site of the resulting function sequence looks identical to that of multiple parameter lists
scala> def sumTwo(a:Int, b:Int): Int = a + b
def sumTwo(a: Int, b: Int): Int
                                                                                                                                        
scala> val sumTwoCurry = sumTwo.curried
val sumTwoCurry: Int => Int => Int =
  scala.Function2$$Lambda$1815/0x0000000840899840@225ac5e7
                                                                                                                                        
scala> sumTwoCurry(10)(11)
val res0: Int = 21
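
For comparison, a curried function can also be written out by hand as a lambda that returns a lambda. The sketch below places it next to the result of curried; the names sumTwoByHand, sumTwoFunction and sumTwoDerived are chosen only for this illustration.

```scala
// Curried form written by hand: a function that returns a function
val sumTwoByHand: Int => Int => Int = a => b => a + b

// The same shape derived from a two-parameter function value
val sumTwoFunction: (Int, Int) => Int = (a, b) => a + b
val sumTwoDerived: Int => Int => Int = sumTwoFunction.curried

val x = sumTwoByHand(10)(11)  // 21
val y = sumTwoDerived(10)(11) // 21
```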

Partial application

  • Curried functions and functions with multiple parameter lists add programming flexibility through partial application,
    • meaning that the function is called with only some of its arguments, and returns a new function instead of the final value
  • For example, given def sumTwo(a:Int)(b:Int): Int = a + b

    • We can bind the first argument, a, to some value (for example 5)
    • Example in the REPL to show type:
    scala> val fivePlus = sumTwo(5)
    val fivePlus: Int => Int = $Lambda$1348/0x00000008406e6840@5381cdb6
    
  • The above returns a new anonymous function which is essentially (b: Int) => 5 + b.
  • Here is fivePlus used in the REPL:

    scala> fivePlus(3)
    val res: Int = 8
    
    scala> fivePlus(7)
    val res: Int = 12
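
Two related idioms, sketched under the assumption that both variants of sumTwo are available under different names (sumTwoLists is a name made up here to keep them apart): a function with a single parameter list can be partially applied with an underscore, and a partially applied function can be passed straight to a higher-order method.

```scala
def sumTwo(a: Int, b: Int): Int = a + b      // single parameter list
def sumTwoLists(a: Int)(b: Int): Int = a + b // multiple parameter lists

// With one parameter list, an underscore marks the argument left open
val fivePlus: Int => Int = sumTwo(5, _)
// With multiple parameter lists, the partial call can be passed directly
val shifted = Vector(1, 2, 3).map(sumTwoLists(5)) // Vector(6, 7, 8)
```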
    

Partial application quiz

Assume we define the following to concatenate strings:

def conc(x: String)(y: String): String = x + y
val partial = conc("A")

What is the type of partial?

What is the result of running partial("B")?

What is the result of running partial("C") + partial("B")?

https://presemo.aalto.fi/prog2


Call by

Scala (and many other languages) has more than one way of evaluating arguments given to functions

  • Call-by-value ("eager" or "strict" evaluation): the arguments are evaluated once, before the function body is executed
  • Call-by-name ("loose" or "non-strict" evaluation): an argument is evaluated each time it is used in the function body, and not at all if it is never used
  • Call-by-need ("lazy" evaluation): like call-by-name, but an argument is evaluated at most once and the result is reused
  • Call-by-value is standard in Scala
  • Call-by-name and call-by-need are useful for example when
    • We may want to provide function blocks to be executed by the function
    • It is costly to calculate the parameter (and the function only uses it sometimes)

A parameter can be made 'call-by-name' by writing => before its type. Let's create another function that prints a string with a prefix. Let's have it use multiple parameter lists, one for the prefix and another for the message string. Moreover, make the prefix call-by-name:

// prefix is call-by-name, and will be evaluated when needed (when it is printed)
// msg is call-by-value and stays the same as when logMessage is called
def logMessage(prefix: => String)(msg: String): Unit = println(s"$prefix$msg")

Note the parameter prefix has the type => String.
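
The three strategies can be compared by counting how often an argument expression is evaluated. The sketch below is only an illustration: all names are made up, and since Scala has no call-by-need parameters as such, call-by-need is imitated here by storing a call-by-name parameter in a lazy val.

```scala
// Counts how often the argument expression is evaluated under each strategy
def evaluationCounts(): (Int, Int, Int) =
  var evaluations = 0
  def expensive(): Int =
    evaluations += 1
    42

  def byValue(x: Int): Int = x + x   // argument evaluated before the call
  def byName(x: => Int): Int = x + x // argument evaluated at each use
  def byNeed(x: => Int): Int =
    lazy val cached = x              // evaluated at first use, then remembered
    cached + cached

  byValue(expensive())
  val valueCount = evaluations
  evaluations = 0
  byName(expensive())
  val nameCount = evaluations
  evaluations = 0
  byNeed(expensive())
  val needCount = evaluations
  (valueCount, nameCount, needCount)

val counts = evaluationCounts() // (1, 2, 1)
```

Each function uses its parameter twice, so only the call-by-name version evaluates the argument twice.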

Call-by-name vs call-by-value

Call-by-name (note prefix: => String)

// Import now, which gives the current time
import java.time.LocalDateTime.now

// Log message function, note that prefix is call-by-name
def logMessage(prefix: => String)(msg: String): Unit = 
  println(s"$prefix$msg")

// Create a log function that prefixes everything
// with the current time
val logThis = logMessage(now().toString + ": ")

// Use it to log a few things
logThis("Captain's log")
logThis("Star-date...")
logThis("I can't remember!")

Output:

2025-03-29T16:49:40.486410873: Captain's log
2025-03-29T16:49:40.486757394: Star-date...
2025-03-29T16:49:40.486890486: I can't remember!
  • Note that the last decimals in the time change!
  • prefix is evaluated each time it is needed by println

Call-by-value (note prefix: String)

// Import now, which gives the current time
import java.time.LocalDateTime.now

// Log message function, note that prefix is call-by-value
def logMessage(prefix: String)(msg: String): Unit = 
  println(s"$prefix$msg")

// Create a log function that prefixes everything
// with the current time
val logThis = logMessage(now().toString + ": ")

// Use it to log a few things
logThis("Captain's log")
logThis("Star-date...")
logThis("I can't remember!")

Output:

2025-03-29T16:55:24.388120025: Captain's log
2025-03-29T16:55:24.388120025: Star-date...
2025-03-29T16:55:24.388120025: I can't remember!
  • Note that the last decimals in the time stay constant!
  • prefix is evaluated once, which is when logThis is created
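
Call-by-name is also what makes it possible to pass a block of code to be executed by a function, as mentioned earlier. Below is a minimal sketch of a home-made control structure; the names repeat and countTo are hypothetical.

```scala
// body is call-by-name: it is run again each time `repeat` refers to it
def repeat(times: Int)(body: => Unit): Unit =
  var i = 0
  while i < times do
    body
    i += 1

// Usage: the block in braces is passed unevaluated and executed n times
def countTo(n: Int): Int =
  var counter = 0
  repeat(n) { counter += 1 }
  counter

val three = countTo(3) // 3
```

Had body been call-by-value, the block would have been evaluated once, before repeat started, and the counter would end at 1.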

Iterators

  • An Iterator can be used to access the elements of a collection one by one
  • It can mimic the behaviour of collections and has many of the same methods
  • Iterator is a mutable object
    • (It has side effects)
    • Two basic operations:
      • next(): A
        returns the next element in the iterator
      • hasNext: Boolean
        tells us if there are more elements
  • Use the iterator method on a Scala collection to get one
  • Iterators are useful when we want to

    • Traverse the elements in a collection
    • Generate a sequence without storing all values

Getting an iterator and asking for next() :

val s = Seq("Hey","You") // Make sequence
val it = s.iterator // Get iterator
// Access one by one
println(it.next())  // "Hey"
println(it.next())  // "You"
println(it.next())  // This throws an exception

Note: java.util.NoSuchElementException

In a loop as long as hasNext is true :

val s = Seq("Hey","You") // Make sequence
val it = s.iterator // Get iterator
// Access one by one, as long as there are more
while it.hasNext do
  println(it.next())

Can use the 'normal' collection methods :

val s = Seq(1,2,3,4)
// Sum all even numbers
s.iterator.filter(_%2 == 0).sum

Why bother? :

// Sum all even numbers in range
val r =(1L to 100000000L)
// May give java.lang.OutOfMemoryError (depending on the JVM heap size)
r.filter(_%2 ==0).sum

Note: java.lang.OutOfMemoryError: Java heap space

The same with iterator :

// Sum all even numbers in range
val r =(1L to 100000000L)
r.iterator.filter(_%2 ==0).sum
// Will give 2500000050000000
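
Because elements are produced one at a time, an iterator can even describe a sequence without an end, as long as we only ask for finitely many elements. A small sketch (the function name is made up for this example):

```scala
// Iterator.from(1) describes 1, 2, 3, ... without end; nothing is stored up front
def firstSquaresDivisibleBy3(count: Int): List[Int] =
  Iterator.from(1)
    .map(n => n * n)    // 1, 4, 9, 16, ...
    .filter(_ % 3 == 0) // keep 9, 36, 81, ...
    .take(count)        // stop after `count` elements
    .toList

val squares = firstSquaresDivisibleBy3(3) // List(9, 36, 81)
```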

Iterators - making your own

  • While we often use iterators provided by existing collections, sometimes we want to write our own
    • For example if we implement a new kind of data structure, or
    • want to generate a custom sequence of data without first storing it
  • An iterator in Scala extends the Iterator trait, so we must define its two abstract methods
  • next(): A
    to provide the next value if there is one
    • Note the explicit (). This is used in Scala for methods that do not take parameters but have side effects
  • hasNext: Boolean
    to tell a caller if there is a next element to get

Iterators - example

  • Let us make an Iterator that calculates a numerical sequence ('orbit') using the following rule:

    Take a positive integer.

    If the number is even divide it by two, if it is odd then triple it and add one. Repeat.

  • More formally: given some \(x_0 \in \mathbb{Z}^{+}\), define

    \begin{equation} x_{n+1} = \begin{cases} x_n/2 & \mbox{if $x_n$ is even}\\ 3 x_n + 1 & \mbox{if $x_n$ is odd}\\ \end{cases} \end{equation}
  • For example, \(x_0 = 5\) will give the sequence 5→16→8→4→2→1
  • In mathematics there is a conjecture (a famous one by Collatz) that this sequence will eventually reach 1, for every possible initial value.
  • If we assume the conjecture is true we can write a Scala Iterator that gives every value in the sequence until it reaches 1 (which we'll exclude for brevity).

Iterators - example cont.

Given some \(x_0 \in \mathbb{Z}^{+}\), define \(x_{n+1} =\begin{cases}x_n/2 & \mbox{if $x_n$ is even}\\3 x_n + 1 & \mbox{if $x_n$ is odd}\\\end{cases}\)

// A function which returns a new Iterator
// representing the sequence of numbers
// starting at `start` and ending when
// the sequence reaches 1 (not included)
def orbitIterator(start: BigInt) : Iterator[BigInt] = 
  require(start>0, "Must start from a positive value")
  // new is used to create a new object
  new Iterator[BigInt]:
    //`x` is the current value, initially `start`
    private var x = start

    // Our implementation of `hasNext`
    // Orbit ends if we reach 1
    def hasNext: Boolean = 
      (x != 1)

    // Our implementation of `next()`
    // Calculated according to rule
    def next(): BigInt =
      // Temporarily hold the current value
      val result = x
      // Update current number
      x = if x%2 == 0 then x/2 else 3*x+1 
      // Return previous value
      result

// Let's test:
// Explicitly with while
val o5 = orbitIterator(5)
while o5.hasNext do
  println(o5.next())
// Or more compactly with for
for x <- orbitIterator(5) do
  println(x)                          

Results (from either call):

5
16
8
4
2
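
For comparison, the same orbit can probably be expressed with the iterator combinators from the standard library instead of a hand-written Iterator. This is only an alternative sketch, not a replacement for the exercise; the name orbitViaIterate is made up here.

```scala
// Same orbit as orbitIterator, sketched with Iterator.iterate and takeWhile
def orbitViaIterate(start: BigInt): Iterator[BigInt] =
  require(start > 0, "Must start from a positive value")
  Iterator
    .iterate(start)(x => if x % 2 == 0 then x / 2 else 3 * x + 1)
    .takeWhile(_ != 1) // stop before 1, which is excluded

val orbit5 = orbitViaIterate(5).toList     // List(5, 16, 8, 4, 2)
val orbit6Length = orbitViaIterate(6).size // 6, 3, 10, 5, 16, 8, 4, 2 gives 8
```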

Iterators quiz

Assume we define a function to create an iterator over the bits in a Byte:

def bitsIt(w: Byte): Iterator[Boolean] =
  new Iterator[Boolean]:
    private var i: Int = 8
    def hasNext: Boolean = (i > 0)
    def next(): Boolean =
      i = i - 1
      (((w>>>i) & 0x1) == 0x1)

Now, if we use bitsIt as

val w: Byte = 7
var n = 0
for b <- bitsIt(w) do
  if b then n+=1

What is the value in n after the loop finishes?

Reusable collections

  • Common patterns and abstractions
    • Sequences, Sets, Maps
  • Included as standard libraries in many modern programming languages

Scala collections

Common and uniform framework for working with data

collections-diagram-213.svg

Figure from the Scala 3 book

Comes in two flavours…

Immutable collections-immutable-diagram-213.svg
Mutable collections-mutable-diagram-213.svg
Figures from the Scala 3 book

So many collections…

  • mutable and immutable versions
  • Many different classes and traits for each kind of collection
    • Sequences : e.g. Vector, List, ArraySeq,…
    • Sets : e.g. Set, SortedSet, BitSet,…
    • Map : e.g. Map, HashMap, ListMap, …

Take a few from the immutable branch…

scala> import scala.collection.immutable._
import scala.collection.immutable._

scala> Vector(1,2,3,4).sum
val res1: Int = 10

scala> List(1,2,3,4).sum
val res2: Int = 10

scala> ArraySeq(1,2,3,4).sum
val res4: Int = 10

collections-immutable-diagram-213.svg

They seem to do the same thing, at least in the case of sum. So what is different?

Traits and Classes

  • High-level traits describe the abstract interface
  • The interface is implemented in the concrete class
  • For example, Map is an abstract trait, while HashMap and ListMap are two different concrete classes implementing it
    • (They differ in the underlying data structure implementation, but have the same interface)
  • Most traits have companion objects which allow instantiation; for example:

collections-immutable-diagram-213.svg

scala> import scala.collection.immutable._
import scala.collection.immutable.*

scala> val m = Map("a"->1, "b"->2, "c"->8) // Getting a Map, but which kind?
val m: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 8)

Traits and Classes

But which concrete class is actually used? (Unless you explicitly create one with new)

scala> m.getClass // As created on previous slide
val res1: Class[? <: Map[String, Int]] = class scala.collection.immutable.Map$Map3

A class called Map3, OK, is this always the case?

scala> Map("a"-> 1, "b"-> 2, "c"-> 8, "f"-> 9, "k"-> -1).getClass
val res2: Class[? <: Map[String, Int]] = class scala.collection.immutable.HashMap

(Apparently not, why?)
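
If a particular implementation is wanted, it can be requested through the companion object of the concrete class. The sketch below (val names made up for this example) also shows that maps with the same entries compare as equal regardless of the class behind them.

```scala
import scala.collection.immutable.{HashMap, ListMap}

// Ask for a specific implementation through its companion object
val hashed: Map[String, Int] = HashMap("a" -> 1, "b" -> 2)
val listed: Map[String, Int] = ListMap("a" -> 1, "b" -> 2)

// Same interface and same content, so the maps compare as equal ...
val sameContent = hashed == listed
// ... even though the objects are instances of different classes
val sameClass = hashed.getClass == listed.getClass
```

Here sameContent is true while sameClass is false.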

Internal implementation

  • Some classes implement the same trait, but use different data structures or algorithms internally
  • There are differences in performance for common operations
  • Usually no 'one size fits all' when it comes to working with data
  • It can be important to know these characteristics when programming some applications
  • For example, ListBuffer and ArrayBuffer both implement Buffer, but
    • ListBuffer is fast to add an element at the beginning (and at the end), but 'slow' to index
    • ArrayBuffer is fast to index, but 'slow' if you need to add an element to the beginning
  • See the 'Performance Characteristics' page of the Scala collections documentation for the different collection types
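
Because both classes implement Buffer, code can be written against the trait and the implementation chosen by the caller. The following sketch uses a made-up helper, prependUpTo; it shows that the results are the same, while the cost of each prepend differs between the two implementations as described above.

```scala
import scala.collection.mutable.{ArrayBuffer, Buffer, ListBuffer}

// Works on any Buffer: the caller chooses the implementation
def prependUpTo(buffer: Buffer[Int], n: Int): Buffer[Int] =
  var i = 0
  while i < n do
    buffer.prepend(i) // cheap for ListBuffer, shifts elements in ArrayBuffer
    i += 1
  buffer

val viaList = prependUpTo(ListBuffer.empty[Int], 3)   // ListBuffer(2, 1, 0)
val viaArray = prependUpTo(ArrayBuffer.empty[Int], 3) // ArrayBuffer(2, 1, 0)
```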

We will talk more about how to analyse performance in the next round
Much more about this in the course CS-A1140 Data Structures and Algorithms

mutable collections

Classes in scala.collection.mutable are mutable - they can be changed (updated) after creation.

scala> val x = scala.collection.mutable.Seq(1,2,3,4)
val x: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2, 3, 4)

scala> x(1) = 7 // We can change value at index 1

scala> x // Look, updated!
val res1: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 7, 3, 4)

immutable collections

Classes in scala.collection.immutable are immutable - they cannot be updated after creation

scala> val y = scala.collection.immutable.Seq(1,2,3,4)
val y: Seq[Int] = List(1, 2, 3, 4)

scala> y(1) = 7 // This won't work, look:
       ^
       error: value update is not a member of Seq[Int]
       did you mean updated?

scala> val z = y.updated(1,7) // Can update index 1 to 7, but
val z: Seq[Int] = List(1, 7, 3, 4) // `updated` returns a new object

scala> y // Look, y is still the same
val res1: Seq[Int] = List(1, 2, 3, 4)

Why both mutable and immutable?

  • Different programming styles and use-cases
  • Mutability allows updating in place (e.g. for keeping state)
    • But, requires a certain level of discipline to use as a programmer
    • E.g. sensitive to update order
  • Immutability somewhat restricts how we work with the collection
    • Mappings/transforms of the data, rather than state
    • Can be less prone to certain kinds of bugs
    • Useful for functional and concurrent/parallel programming

Are immutable collections inefficient?

  • Not necessarily (although, they can be made so)
  • Remember – we get a new collection (data structure), not necessarily a completely new set of data!
    • Because the data is immutable, parts of it could be shared
  • Well designed data structures can share substructures, making the required changes small
    • We will see examples of this in later rounds
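
A first, small instance of such sharing can be observed with the immutable List. In this sketch (val names chosen for the example), prepending an element creates one new cell whose tail is the existing list, so no elements are copied.

```scala
val original = List(2, 3, 4)
// Prepending creates one new cell that points to the existing list
val extended = 1 :: original

// `eq` compares references: the tail of the new list is the old list itself
val sharesTail = extended.tail eq original
// The original is unchanged
val originalIntact = original == List(2, 3, 4)
```

Both sharesTail and originalIntact are true.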

Exercises

  1. Exam grader (collections and functional style)
  2. Frequencies (collections and functional style)
  3. Call-by-name and closures
  4. Sequence–generating iterators
  5. (Challenge) Image statistics (functional style)
  6. (Challenge) Bus schedules (using iterators)
  7. (Challenge) Church numerals

  • Hint: Read the examples in the course material and on these slides!
  • Hint: Be familiar with the Scala documentation for collections such as Seq, Set, and Map

  • Note: Programming restrictions
    • Some exercises forbid the use of specific constructions or data types
      • (For example var, Array)
    • The grader checks for the forbidden constructions and automatically gives 0 points
    • The unit tests do not check for this (they only verify correctness, not how the solution is implemented)
    • Make sure to read the instructions and source code carefully

Nice to know: Pattern matching

  • It is quite common that a collection contains tuples of data
  • In which case it is convenient to use pattern matching expressions
    • No need for constructions involving _._1 and friends

For example, swapping elements

scala> val l = List(("apple",3.5), ("banana", 2.1), ("orange", 2.5))
val l: List[(String, Double)] = List((apple,3.5), (banana,2.1), (orange,2.5))

scala> l.map((x,y) => (y,x)) // Swap positions
val res0: List[(Double, String)] = List((3.5,apple), (2.1,banana), (2.5,orange))

scala> l.map((_,x) => x).sum // Sum of second elements
val res1: Double = 8.1

The previous syntax works with basic (non-nested) tuples. For nested tuples a case expression is necessary:

// Zip previous list with its index
scala> val dataIndex = l.zipWithIndex
val dataIndex: List[((String, Double), Int)] = List(((apple,3.5),0),
    ((banana,2.1),1), ((orange,2.5),2))

// Pick out indices of values with p < 3; note the { case }
scala> dataIndex.filter{case ((n,p),i) => p < 3}.map((_,i)=>i)
val res3: List[Int] = List(1, 2)

Nice to know: pattern matching - unapply

  • Pattern matching case expressions will work for any object with an unapply method
  • Created automatically for case classes:
scala> case class Person (fname:String, lname:String)

scala> val people = Seq(Person("Gwen", "Stacy"),
  Person("Miles", "Morales"), Person("Peter", "Parker"))

scala> people.map{case Person(_,last) => last}
val res11: Seq[String] = List(Stacy, Morales, Parker)

So, what does this one do?

scala> people.filter{case Person(f,l) => f.head != l.head}
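
The remark that case expressions work for any object with an unapply method can be illustrated with a hand-written extractor. The sketch below is a made-up example: the object Email and the sample addresses are hypothetical, and the string splitting is deliberately naive.

```scala
// Hypothetical extractor: splits "user@domain" into its two parts
object Email:
  def unapply(s: String): Option[(String, String)] =
    s.split("@") match
      case Array(user, domain) => Some((user, domain))
      case _                   => None

val addresses = Seq("ada@example.org", "not-an-address", "alan@example.com")
// collect keeps only the elements for which the pattern matches
val domains = addresses.collect { case Email(_, domain) => domain }
// List(example.org, example.com)
```

Returning None from unapply means that the pattern does not match, which is why the second string is skipped by collect.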