Abstractions for easy parallelism

Modern programming languages usually come with handy abstractions encapsulating the concepts introduced earlier in this chapter. In Scala, two of the most important are parallel collections, and Futures and Promises.

Parallel collections

As you undoubtedly have noticed, collections are important components in most Scala programs. We saw some examples already during the warm-up exercises in Round 1, and you have since used lists, ranges, arrays, etc. many times throughout this course.

A quite common pattern is that we have data stored in a collection which we wish to map, fold, or filter some function over. By default such an operation is sequential, but as we saw in the previous section certain operations could benefit from parallelization.

Scala provides a handy parallel collections module which hides much of the underlying details of parallelization and provides the same high-level interface as many of the normal collections. As long as you keep the caveats of parallelization in mind these can be a useful addition to your programmer’s toolkit.

Warning

The code examples for parallel collections used in this section and the reference article generally do not work from the console due to limitations in the Scala REPL. To play with the par example below you can download the accompanying source package and edit and run the program in it.

For a project configured to use parallel collections it is sufficient to invoke the .par method on a standard, sequential type to convert it to a parallel one. (But note that the data is copied, which could create an overhead in your program.)
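Since Scala 2.13 the parallel collections live in a separate module, so a project must declare it as a dependency before .par becomes available. In sbt this might look as follows (the version number is an assumption; check for the current release):

```scala
// build.sbt -- parallel collections are published as a separate module
// since Scala 2.13. The version shown is an assumption; use the latest.
libraryDependencies +=
  "org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.4"
```

With this in place, `import scala.collection.parallel.CollectionConverters._` in your source files enables the .par conversion used below.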

For instance, if we had a vector, v, of integers, we could filter it to find numbers which are divisible by 3 but not by 5:

v.filter(i => (i % 3 == 0) && (i % 5 != 0))

This operation could be performed in parallel by first copying the values to a parallel collection:

// Same data as above but first create a parallel collection from v
v.par.filter(i => (i % 3 == 0) && (i % 5 != 0)) // Filter in parallel

Whether parallelizing an operation actually yields a speedup depends on many factors, including the type of operation and the number of elements in the collection.

If you would like to play with the small example above, you can find the following program in the accompanying code package:

import scala.collection.parallel.CollectionConverters._

import timer._

object ParExample {
  def main(args: Array[String]): Unit = {
    val n = 100000000
    val v = (1 to n).toVector // A normal, sequential vector of numbers

    // Pick out numbers divisible by 3, but not by 5
    val (x, wallX) = measureWallClockTime(v.filter(i => (i % 3 == 0) && (i % 5 != 0)))

    // Same operation but in parallel, note the '.par'
    val (y, wallY) = measureWallClockTime(v.par.filter(i => (i % 3 == 0) && (i % 5 != 0)))

    println(s"Sequential time - wall: ${wallX} s.")
    println(s"Par time        - wall: ${wallY} s.")
  }
}

Here we also measure wall clock time and print the result. Running the program on an Intel i7-6700K with 32 GiB RAM yields the following result:

Sequential time - wall: 0.827656085 s.
Par time        - wall: 0.487061796 s.

As you will recall from Measuring run time, this measurement is not necessarily very precise (wall time can depend on many factors), but here we do appear to see an improvement when filtering the data in the parallel collection.

It is important to note that this may not be the case on your own computer. Run the program to check!

To understand why, note that this simple example is a fairly light task for a modern computer, even though it is performed on a huge set of numbers and may require a lot of memory. The behaviour can therefore depend on the computer hardware and on the value of n. For example, changing to n = 1000 we get:

Sequential time - wall: 0.003097761 s.
Par time        - wall: 0.047233308 s.

The sequential operation is faster in this case because parallel execution comes with a setup overhead, which dominates for small data sets and quick operations.

However, as the data set and the computational demand grow, parallel collections can be very beneficial.
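To make this concrete, here is a sketch (assuming the parallel collections module is on the classpath, as in the program above) where each element requires real work, so .par tends to pay off even for moderately sized collections. The Collatz function used here is just an arbitrary, deliberately expensive example of our own choosing:

```scala
import scala.collection.parallel.CollectionConverters._

object HeavyMapExample {
  // A deliberately expensive per-element function: count the number of
  // steps the Collatz map takes to reach 1.
  def collatzSteps(n0: Long): Int = {
    var n = n0
    var steps = 0
    while (n != 1) {
      n = if (n % 2 == 0) n / 2 else 3 * n + 1
      steps += 1
    }
    steps
  }

  def main(args: Array[String]): Unit = {
    val v = (1L to 200000L).toVector
    val seq = v.map(collatzSteps)      // sequential map
    val par = v.par.map(collatzSteps)  // parallel map over the same data

    // map preserves element order on parallel collections,
    // so both produce the same result.
    println(seq == par.seq)            // prints true
  }
}
```

Because each call to collatzSteps is independent of the others, the work splits cleanly across threads, which is exactly the situation where parallel collections shine.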

In one of the assignments you will use parallel collections to speed up matrix multiplication, which has the potential for a much better speedup.

Before you do that, however, have a look at the parallel collections documentation on the Scala language web pages for a further introduction. (Pay attention to the examples of odd behaviour listed under Semantics; it is up to you as a programmer to be aware of these.)
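One of those semantic pitfalls is worth illustrating directly: reducing a parallel collection with a non-associative operator can give different results from run to run, because the order in which partial results are combined depends on how the collection is split. A small sketch (again assuming the parallel collections module is on the classpath):

```scala
import scala.collection.parallel.CollectionConverters._

object NonAssociativeExample {
  def main(args: Array[String]): Unit = {
    val xs = (1 to 1000).toVector

    // Subtraction is not associative, so the parallel reduce is
    // nondeterministic -- its result depends on how the work is split.
    val seqResult = xs.reduce(_ - _)     // fixed left-to-right order
    val parResult = xs.par.reduce(_ - _) // may differ between runs

    println(s"sequential: $seqResult, parallel: $parResult")

    // Addition is associative (and commutative), so here both agree.
    println(xs.sum == xs.par.sum)        // prints true
  }
}
```

The rule of thumb: operations you hand to reduce or fold on a parallel collection should be associative, or the result is unspecified.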

Futures and Promises

As observed previously, if we encode the dependencies between values and functions as a dependency graph, we can identify independent parts of a program and have them operate concurrently.

The concepts of Futures and Promises in Scala are programming abstractions of these ideas. A Future allows you to declare that a value in your program is computed asynchronously and will complete eventually - in the future relative to the point where you declared it. You can also make a Promise that a value will be made available (at some point), while having other code depend on the future of this promised value.

Futures and Promises allow for a high-level approach to concurrent programming. For example, your program may need to send out a number of concurrent requests to some online services and merge the data as it becomes available, or have some computation take place when player input becomes available while updating a game scene, or parallelize a multi-stage computation with complicated dependencies. Futures and Promises provide one approach to achieving concurrency in situations such as these.

The following is a very brief introduction to these two concepts. You are encouraged to consult the external references given at the end of each section for a more complete understanding.

Note

As you read the examples in this section and related articles, pay special attention to asynchronous blocks of program code. It is common for programmers of all levels of experience to think sequentially when reading source code, for example, to assume that all operations happen in order, one by one. However, this can be deceiving. For example, a Future will begin execution on its own as soon as it is declared, while the rest of the program continues.

Futures

A Future value is declared by using the Future trait and specifying how the value is computed. For example, this Future value is simply the sum of two numbers:

import scala.concurrent.Future

val f = Future {
  2 + 3
}

This means that the program should start computing f concurrently with the rest of the program, and that the result will be available in the future, when f completes. Before completion the value cannot be accessed.

Of course, f as declared above only adds two constants - not much of a computation - so it is highly likely that you won’t notice the concurrency when you play around with it. The idea is of course that it takes time to compute or at least retrieve the Future value. For the purpose of instruction, we can fake this by telling the thread executing the Future to sleep for a bit.

To play a bit with Futures and see how they perform concurrently you can try the following.

Start a console and set it up for concurrency:

scala> import scala.concurrent._
import scala.concurrent._

scala> implicit val ec: scala.concurrent.ExecutionContext = scala.concurrent.ExecutionContext.global
ec: scala.concurrent.ExecutionContext = scala.concurrent.impl.ExecutionContextImpl@1aa955dc

Note

Concurrent programs require an execution context to operate. This represents the details of how the concurrent operations are carried out. In a program, the global execution context can be imported by import ExecutionContext.Implicits.global. If you try the examples in a console, you can declare implicit val ec: scala.concurrent.ExecutionContext = scala.concurrent.ExecutionContext.global to make the session use the global context.

Now we declare the same simple Future as before, but force the thread executing it to sleep for 5 seconds before completing:

val f = Future {
  Thread.sleep(5000)
  2 + 3
}

When executing this in the console the REPL will reply:

f: scala.concurrent.Future[Int] = Future(<not completed>)

And, if you attempt to print f within the next five seconds you will see that it still has not completed:

scala> print(f)
Future(<not completed>)

But after that the computation has succeeded and the value is available:

scala> print(f)
Future(Success(5))

Now you may ask “OK, how can I make the rest of my program wait for the Future to complete?” The answer is that you can, but that you should not. At least not explicitly. What you really want to do instead is to use the Future value in your subsequent code, which will create further Futures, and a representation of the value dependencies as discussed in the previous section.

Useful transformations here are map and flatMap, as well as for comprehensions. There is also onComplete, which allows you to register a callback function to execute when the Future completes.
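The following sketch (assuming the global execution context) shows these transformations in action: map derives a new Future from an existing one, a for comprehension combines several Futures, and onComplete registers a callback. The blocking Await at the end is only there so the example terminates; in real programs you would keep chaining transformations instead:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

object FutureTransformExample {
  def main(args: Array[String]): Unit = {
    val a = Future { 2 + 3 }   // completes with 5
    val b = a.map(_ * 10)      // a new Future, completes with 50

    // A for comprehension desugars to flatMap/map and expresses
    // the value dependency: c needs both a and b.
    val c = for {
      x <- a
      y <- b
    } yield x + y              // completes with 55

    // Register a callback to run when c completes.
    c.onComplete {
      case Success(v)  => println(s"Got $v")
      case Failure(ex) => println(s"Failed: $ex")
    }

    // Blocking is discouraged in real programs; we only do it here
    // so main does not return before c is done.
    println(Await.result(c, 2.seconds))  // prints 55
  }
}
```

Note that none of the transformations block: each one just declares another node in the dependency graph, and the runtime completes them as their inputs become available.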

Note

See the reference material for further examples of how to work with Futures, how to transform them, and how to combine them. Reading Futures and Promises as part of the Scala documentation is a very good way to prepare for the exercises.


Promises

A Promise is a statement about a Future value - a promise that the value will be made available by some computation in the future. A Promise has a Future, but the Promise itself does not specify how this Future will be completed - this is up to some other part of your program.

In contrast to a bare Future which states how to arrive at the value and thus can start immediately, a Promise simply states that this value will be filled in.

Try the following in your console (after setting it up for concurrency as you did in the Futures example above):

scala> val p = Promise[Int]()
p: scala.concurrent.Promise[Int] = Future(<not completed>)

You have now declared a Promise of a Future integer p. As you can see the promised Future is not complete (and it will stay this way unless we complete it, by declaring success, or break our Promise by declaring failure).

The promised Future can be accessed by the .future method, but as long as the Promise is not fulfilled (or broken) this future is not completed:

scala> p.future
res0: scala.concurrent.Future[Int] = Future(<not completed>)

When it is time to fulfill our Promise we can use the .success method to fill in a value. You can try this in the console as well:

scala> p.success(55)
res1: p.type = Future(Success(55))

Now the Promise has succeeded with value 55 and p.future is completed.

It is not only possible to keep a Promise and succeed, but also to break it and declare failure (using .failure and providing an exception) when something goes wrong, which allows for error handling.
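A sketch of breaking a Promise, and of how downstream code can handle it (recover is one of several failure-handling transformations on Future; the fallback value -1 is just an arbitrary choice for illustration):

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object BrokenPromiseExample {
  def main(args: Array[String]): Unit = {
    val p = Promise[Int]()

    // Break the promise by completing it with an exception.
    p.failure(new RuntimeException("could not deliver"))

    // Downstream code can handle the failure, e.g. with recover,
    // which maps the exception to a fallback value.
    val withFallback = p.future.recover {
      case _: RuntimeException => -1
    }

    println(Await.result(withFallback, 2.seconds))  // prints -1
  }
}
```

Without the recover step, Await.result (or any other attempt to use the value) would rethrow the exception the Promise was broken with.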

Crucially, a Promise can be completed while working on some other Future, as illustrated here. First we promise to deliver:

scala> val q = Promise[String]()
q: scala.concurrent.Promise[String] = Future(<not completed>)

Then we declare a Future, which, as a side effect will also complete q:

val h = Future {
  q.success("Started")
  Thread.sleep(5000)
  "Ended"
}

You can check that q.future succeeds as soon as h starts executing (and that h itself completes five seconds later).

These examples are quite simple for the purpose of illustration, and to provide something which is easy to play with at the console, but hopefully they have conveyed the basic idea.
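A classic, slightly less toy-like use of a Promise is racing several Futures and keeping whichever finishes first. Since only the first completion of a Promise counts, tryComplete lets every contender attempt to complete it safely. This is a sketch of the idea; the standard library's Future.firstCompletedOf does essentially the same thing:

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object RaceExample {
  // Complete the returned Future with whichever input finishes first.
  def race[T](a: Future[T], b: Future[T]): Future[T] = {
    val p = Promise[T]()
    // tryComplete ignores any completion after the first one,
    // so the two callbacks cannot conflict.
    a.onComplete(p.tryComplete)
    b.onComplete(p.tryComplete)
    p.future
  }

  def main(args: Array[String]): Unit = {
    val slow = Future { Thread.sleep(2000); "slow" }
    val fast = Future { Thread.sleep(100); "fast" }
    println(Await.result(race(slow, fast), 5.seconds))  // prints fast
  }
}
```

Note that the Promise itself never says how it will be completed; the two onComplete callbacks are the "other part of your program" that fulfils it.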

Note

Further examples of Promises are provided in the Scala documentation for Futures, which you are encouraged to study next to prepare for the exercises.