Our first multithreaded program in Scala

Let us now get our hands dirty with multithreading in Scala. A common pattern in concurrent programming is that a master thread starts a number of worker threads that proceed independently with the work that the master has delegated to them.

Our first multithreaded program will use the main thread to start three worker threads that will report their progress to the console.

In Scala, the code that a new thread of execution should run is prepared by extending the trait Runnable (the Java interface java.lang.Runnable, which Scala treats as a trait) and providing an implementation for its abstract method run(). This implementation should contain the code we want to run in a thread. When run() returns, the thread stops.

Here is the code that we want to run in the worker threads:

class SumWorker(name: String, target: Int, report: Int) extends Runnable {
  def run(): Unit = {

    // This method contains the code we want to run in a thread

    // ... say, we want to take the sum of the integers i = 1,2,...,target,
    //     and issue a progress report after every "report" integers

    var s = 0L   // a Long, since the sum can exceed Int.MaxValue
    var i = 1
    while (i <= target) {
      s = s + i
      if (i % report == 0) {
        println("Worker %s reporting at i = %d".format(name, i))
      }
      i = i + 1
    }
    println("Worker %s is done and reports sum s = %d".format(name, s))

    // The thread stops when run() returns, i.e. here
  }
}

Observe that you can copy and paste the code above to the Scala console and nothing happens yet. Indeed, nothing should happen since we are simply declaring a new class.

Threads in Scala are objects instantiated from the class Thread (java.lang.Thread). To get a new thread started, we first prepare (construct) the thread by passing a Runnable object (that contains the code we want to run) to the constructor. Let us prepare three SumWorker runnables and three threads to run the workers:

val wa = new SumWorker("SumA", 100000000, 10000000) // prepare worker A
val wb = new SumWorker("SumB", 100000000, 10000000) // prepare worker B
val wc = new SumWorker("SumC", 100000000, 10000000) // prepare worker C

val ta = new Thread(wa)         // prepare a thread for worker A
val tb = new Thread(wb)         // prepare a thread for worker B
val tc = new Thread(wc)         // prepare a thread for worker C

Again observe that nothing happens yet if you copy and paste what is above to the console. The threads are now ready to run, but not yet running.
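To make the difference between a prepared thread and a running one visible, we can ask a Thread for its state with getState(), which returns a value of the enumeration Thread.State. The sketch below (ThreadStates and lifecycle are our own illustrative names) records the state before start() and after join(), which is introduced in the Synchronization section below. We only sample these two points, since the state while the thread runs depends on timing.

```scala
object ThreadStates {
  // Returns the state of a thread before start() and after join()
  def lifecycle(): (Thread.State, Thread.State) = {
    val t = new Thread(new Runnable {
      def run(): Unit = println("Worker running")
    })
    val before = t.getState() // NEW: constructed, but not yet started
    t.start()                 // from here on, t runs asynchronously
    t.join()                  // block until t's run() has returned
    val after = t.getState()  // TERMINATED: t has exited
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = lifecycle()
    println(s"before start(): $before, after join(): $after")
  }
}
```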

We can start a thread by calling its start() method. Copy and paste the following to the console:

// Start threads of workers a, b, and c.
// One line with ; in between commands so we do not have to press
// return three times in the console
ta.start() ; tb.start() ; tc.start()
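Note that we call start() on each thread, not run(). Calling run() on a Thread object is an ordinary method call: the Runnable executes in the calling thread, and no new thread is created. The following sketch (StartVersusRun and WhoRuns are hypothetical names) records the name of the thread that actually executes the Runnable, using Thread.currentThread().getName().

```scala
object StartVersusRun {
  // A Runnable that remembers the name of the thread that executed it
  class WhoRuns extends Runnable {
    var runner = ""
    def run(): Unit = {
      runner = Thread.currentThread().getName()
    }
  }

  // Returns (calling thread, thread seen via run(), thread seen via start())
  def compare(): (String, String, String) = {
    val caller = Thread.currentThread().getName()

    val w1 = new WhoRuns
    val t1 = new Thread(w1, "worker-1")
    t1.run()    // a plain method call: run() executes in the calling thread

    val w2 = new WhoRuns
    val t2 = new Thread(w2, "worker-2")
    t2.start()  // run() executes in the new thread named "worker-2"
    t2.join()   // wait for it, so that w2.runner has been written

    (caller, w1.runner, w2.runner)
  }

  def main(args: Array[String]): Unit = {
    val (caller, viaRun, viaStart) = compare()
    println(s"caller: $caller, via run(): $viaRun, via start(): $viaStart")
  }
}
```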

The threads are now running, and we observe reports from the running threads at the console:

Worker SumB reporting at i = 10000000
Worker SumA reporting at i = 10000000
Worker SumB reporting at i = 20000000
Worker SumC reporting at i = 10000000
Worker SumA reporting at i = 20000000
Worker SumC reporting at i = 20000000
Worker SumB reporting at i = 30000000
Worker SumA reporting at i = 30000000
Worker SumC reporting at i = 30000000
Worker SumB reporting at i = 40000000
Worker SumA reporting at i = 40000000
Worker SumC reporting at i = 40000000
Worker SumB reporting at i = 50000000
Worker SumA reporting at i = 50000000
Worker SumC reporting at i = 50000000
Worker SumB reporting at i = 60000000
Worker SumA reporting at i = 60000000
Worker SumC reporting at i = 60000000
Worker SumB reporting at i = 70000000
Worker SumC reporting at i = 70000000
Worker SumA reporting at i = 70000000
Worker SumB reporting at i = 80000000
Worker SumC reporting at i = 80000000
Worker SumB reporting at i = 90000000
Worker SumA reporting at i = 80000000
Worker SumC reporting at i = 90000000
Worker SumB reporting at i = 100000000
Worker SumB is done and reports sum s = 5000000050000000
Worker SumA reporting at i = 90000000
Worker SumC reporting at i = 100000000
Worker SumC is done and reports sum s = 5000000050000000
Worker SumA reporting at i = 100000000
Worker SumA is done and reports sum s = 5000000050000000
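Although the workers finish in some arbitrary order, each of them reports the correct sum. With target n = 10^8, the closed form for the sum of the first n integers gives exactly the reported value, which is far beyond Int.MaxValue = 2147483647; this is why SumWorker accumulates into a Long (s = 0L):

```latex
\sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \qquad
\sum_{i=1}^{10^{8}} i = \frac{10^{8}\,(10^{8}+1)}{2} = 5\,000\,000\,050\,000\,000
```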

Observe how the threads run concurrently (on a multicore machine, typically in parallel) and independently of each other. In particular, the workers finish in the order B, C, A, which appears arbitrary and is not the order in which the threads were started.

In fact, if you run the code above at the console yourself (which you should do!), you are likely to witness a different ordering from what is above.

What is going on?

Asynchronous execution

By default, threads execute asynchronously. That is to say, as programmers we cannot assume anything about how the executions of different threads are related to each other in time, unless we explicitly introduce dependencies between the threads through synchronization.

Before proceeding with synchronization, let us look at a second example that gives a more vivid illustration of asynchronous execution.

Let us start 10 threads that send greetings to the console. First, a class for greetings:

class Greeter(id: Int, num: Int) extends Runnable {
  def run() : Unit = {
    for(i <- 0 until num) {
      print("%d".format(id)) // greet by printing a single digit
    }
  }
}

Then we create ten greeters, one thread per greeter, and start all the threads:

val n = 10               // ten greeters -- one for each digit 0,1,...,9
val num_greetings = 200  // each greeter sends 200 greetings
val greeters = (0 until n).map(id => new Greeter(id, num_greetings))
val threads = greeters.map(new Thread(_))
threads.foreach(_.start())

Copy and paste the code above to the console. What you should observe is a sequence of 2000 digits 0,1,…,9. That is, 200 greetings from each of the 10 threads. Repeat the copy-and-paste a few times, and observe how you get a different sequence every time.

Here is an example of one repeat:

04011111111111111111111111111111100000444444444444444444444444444444444444444444444444487778088889999866661222133333355553112626668889999999900007444477090000088866621212222113232222533332222221611118666808888897774497777888066661112333333535555552222166666610111118788894449999878888717777011116222266533333356662251555555555500007778789999984444999700005551222222263333363633333333222221215550079444448444448999977011111152222336222233351111007171111977777788889499999888887111070555500032222226666666222233305000077775755555188888181888888199994491999999858557070000733332222262666663733333050555550888989000091119499108585858376262737385858588505050519194915050583837327226273785050505141949494915058737232363232323737387858501914149095959878383838383838383826268237393939393935040144045353535353535353535397282826262628793535444010140404535959597982826282729535340101434343459727272782868686267979797979545431010303454494767628267649599390301030395456565752778725252649390101934646265757875756524393133909093931425252626278787878787872626262545414141434949494049493915252678787878686252151593949094345454541268676767676767682114153939090909035551541412822676726814153030390501048686828282828787272646666464015955555555539393951510101014146278787878726641405050509993939054141627878727676141505593939501040467627272787828282828286404010593935010146862622777864646464610105050303030309093535353164648782827224661359595909005031313136424242478742424636101159595951010606030234343737878834202026262621595951515252604343434848484747483030303030303625251595555551212111162222300000000333333333848787434343434343020061515951556020203434478743404202020651919191515151515151616024343787343420260601519595106626242387832324260606060611595161012424338788888424040106565960000142887872721010169616102787201010101010106960606061662727277777787876767606060606969666070777887866667979797976868789879767698967678988767689868686868786969696987898689897978986898987969796767976767699977979797979797979799797999999

The ten threads are writing their greetings independently of each other, in an essentially arbitrary order. Thus, as programmers we really cannot assume anything about how the threads run in relation to each other, unless we take care to synchronize their execution.
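While the order of the digits changes from run to run, their counts do not: every greeter always sends exactly 200 greetings. The sketch below is a variant of Greeter (GreeterCount, CountingGreeter, greetAll and counts are our own names) that appends its digit to a shared java.lang.StringBuffer instead of printing it. StringBuffer.append is synchronized, so concurrent appends are safe, and the main thread waits for all greeters with join(), described in the Synchronization section below, before reading the result.

```scala
object GreeterCount {
  // Variant of Greeter: appends its digit to a shared buffer instead of printing.
  // StringBuffer.append is synchronized, so concurrent appends do not corrupt it.
  class CountingGreeter(id: Int, num: Int, out: StringBuffer) extends Runnable {
    def run(): Unit = {
      for (_ <- 0 until num) out.append(id)
    }
  }

  // Starts n greeters, waits for all of them, and returns everything they wrote
  def greetAll(n: Int, numGreetings: Int): String = {
    val out = new StringBuffer
    val threads = (0 until n).map(id => new Thread(new CountingGreeter(id, numGreetings, out)))
    threads.foreach(_.start())
    threads.foreach(_.join()) // wait until every greeter has exited
    out.toString
  }

  // How many times each character occurs in s
  def counts(s: String): Map[Char, Int] =
    s.groupBy(identity).map { case (c, cs) => c -> cs.length }

  def main(args: Array[String]): Unit = {
    val s = greetAll(10, 200)
    println(s)                                    // the order varies from run to run
    println(counts(s).toSeq.sorted.mkString(" ")) // the counts do not
  }
}
```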

Synchronization

Synchronization refers to actions taken in a thread to enforce dependencies between the executions of different threads.

Two natural reference points in the execution of a thread are when the thread starts and when the thread exits.

The action of starting a thread by calling its start() method is perhaps the simplest form of synchronization between threads. Namely, the thread that calls start() and the thread that starts are not independent in their executions: everything the calling thread does before calling start() takes place before the started thread begins to execute. This synchronization is, however, rather weak, since after one thread starts the other, the two threads execute asynchronously unless further synchronization actions are taken.
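The ordering provided by start() goes in one direction only. In the sketch below (StartOrdering and timeline are hypothetical names), both threads log events into a thread-safe list obtained from java.util.Collections.synchronizedList. The event logged before start() always comes first, whereas "main after start()" may appear before or after "worker runs" depending on the run. The call to join(), described next, is there only so that the log is complete before it is read.

```scala
object StartOrdering {
  // Logs events from the main thread and one worker thread, in the order they happen
  def timeline(): Seq[String] = {
    // A thread-safe list, since two threads add to it
    val log = java.util.Collections.synchronizedList(new java.util.ArrayList[String]())
    val t = new Thread(new Runnable {
      def run(): Unit = { log.add("worker runs"); () }
    })
    log.add("main before start()") // always logged before the worker runs
    t.start()
    log.add("main after start()")  // may land before or after "worker runs"
    t.join()                       // wait so that the log is complete
    (0 until log.size()).map(i => log.get(i))
  }

  def main(args: Array[String]): Unit =
    println(timeline().mkString(" -> "))
}
```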

The action of joining with a thread, that is, waiting until the thread exits, is another form of synchronization between threads. A thread joins with another thread by calling its join()-method, which returns after the thread exits. Joining is a very strong form of synchronization because the execution of the calling thread is suspended (blocked) until the other thread exits. That is, we are enforcing that the execution of the calling thread can resume only after the execution of the other thread has taken place.
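Joining is what lets a master thread collect results from its workers. The sketch below adapts SumWorker into a hypothetical SumResultWorker that stores its sum in a field instead of printing it; the main thread starts all workers, joins each of them, and only then reads the sums. Under the Java memory model, all actions of a thread happen-before another thread returns from join() on it, so the plain var sum needs no further synchronization here.

```scala
object JoinWorkers {
  // Like SumWorker, but stores its result instead of printing it
  class SumResultWorker(target: Int) extends Runnable {
    var sum = 0L // written by the worker thread, read by main after join()
    def run(): Unit = {
      var i = 1
      while (i <= target) {
        sum += i
        i += 1
      }
    }
  }

  // Starts one worker per target, joins them all, and returns their sums
  def sumAll(targets: Seq[Int]): Seq[Long] = {
    val workers = targets.map(new SumResultWorker(_))
    val threads = workers.map(new Thread(_))
    threads.foreach(_.start())
    threads.foreach(_.join()) // main blocks here until every worker has exited
    workers.map(_.sum)        // safe: each worker's writes are visible after its join()
  }

  def main(args: Array[String]): Unit =
    println(sumAll(Seq(10, 100, 1000)))
}
```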

Many other forms of synchronization exist, but for now we will be content with starting and joining threads. The more challenging exercises will explore some further possibilities for synchronization.

Why synchronization?

While it would be great if our threads could execute completely independently of each other, in most cases our threads need to use common resources, such as memory or operating system services.

For example, one thread may want to set (write) the value of a variable and another thread may want to use (read) the value. Without synchronization between the two threads we have no control over whether the thread that reads will obtain the value of the variable (a) as it was before the write, or (b) after the write took place:

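[Figure: without synchronization, the reading thread may obtain the value (a) before or (b) after the writing thread's write (thread-async.png)]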
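Even start() and join() alone are enough to decide which of the outcomes (a) and (b) occurs. In the sketch below (ReadBeforeOrAfterWrite, Writer, readBeforeWrite and readAfterWrite are our own names), reading the variable before the writer thread has been started forces outcome (a), and reading after join() forces outcome (b). Reading between start() and join() would be the unsynchronized case, in which either value may be obtained.

```scala
object ReadBeforeOrAfterWrite {
  // A writer thread's code: sets the shared variable x
  class Writer extends Runnable {
    var x = 0
    def run(): Unit = { x = 42 }
  }

  // Outcome (a): read before the writer thread has been started
  def readBeforeWrite(): Int = {
    val w = new Writer
    val t = new Thread(w)
    val seen = w.x // the writer has not started, so it cannot have written yet
    t.start()
    t.join()
    seen
  }

  // Outcome (b): join() guarantees the read happens after the write
  def readAfterWrite(): Int = {
    val w = new Writer
    val t = new Thread(w)
    t.start()
    t.join()       // blocks until run() has returned, so x = 42 has happened
    w.x
  }

  def main(args: Array[String]): Unit =
    println(s"(a) read before write: ${readBeforeWrite()}, (b) read after write: ${readAfterWrite()}")
}
```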

With synchronization we can gain control over how the threads execute in relation to each other. Our intent in this round, however, is not to enter into the intricacies of thread synchronization. (Concurrent programming in its full generality is a subtle and difficult topic that easily deserves a course or two on its own.)