Measuring run time

Note

To follow the contents on this page, you may want to consult the source code package this project (if you didn’t already download it at the start of the chapter).

Let us start from the very basics. That is, we want to measure how much time is spent in executing some function. In Scala, one possibility is to use the System.nanoTime function, which returns, in nanoseconds, the current wall clock time, as measured by the Java Virtual Machine, using the timing resources available in the underlying hardware.

Warning

Caveat on timing precision and granularity. Even if the processor clock cycle is typically less than one nanosecond and thus in principle enables rather precise measurements of time, in many cases the available timing granularity and precision from functions such as System.nanoTime is considerably worse. That is, even if the unit of measure for System.nanoTime is nanoseconds, the actual timing in the underlying function implementation and the operating system may only take place at, say, microsecond or millisecond granularity. This depends in general on the operating system and on the hardware on which you are running Scala (the JVM).

Let us put System.nanoTime into use and build the following timer package:

package object timer {
  /**
   * Runs the argument function f and measures the wall clock time spent in it in seconds.
   * Returns a pair consisting of
   * - the return value of the function call and
   * - the time spent in executing the function.
   */
  def measureWallClockTime[T](f: => T): (T, Double) = {
    val start = System.nanoTime
    val r = f
    val end = System.nanoTime
    val t = (end - start) / 1000000000.0
    (r, t)
  }
}

Open up a console and import the package contents:

scala> import timer._

Now we can measure how much time it takes to compute the sum of squares of the first thousand natural numbers:

scala> measureWallClockTime((1 to 1000).foldLeft(0)((s,v) => s+(v*v)))
res: (Int, Double) = (333833500,0.001154755)

Here \(333833500\) is the sum and it took \(0.001154755\) seconds to compute it. (Note that the time reported on your computer may be different, of course.)

Recall that the “f: => T” -construct is the “call-by-name” construction that we studied in the Functions section, allowing us to pass code to other functions as argument (or more precisely, the code is automatically translated by the Scala compiler into a function object that is then passed as an argument).

From wall clock time to CPU time. One difficulty with System.nanoTime is that it measures wall clock time, that is, the real physical time elapsed during execution. We can witness this by using the Thread.sleep functionality that pauses the current thread’s execution for some time (specified in milliseconds):

scala> measureWallClockTime(Thread.sleep(1000))
res: (Unit, Double) = ((),1.000215994)

Our measurement thus indicates that the execution took one second, even if in reality the actual execution was paused. A further complication occurs with the fact that our computer may be (and in general is) executing multiple processes at the same time (that is, the computer is multitasking), so the reported wall clock time can be substantially higher than the time actually spent executing the function whose running time we want to measure.

Instead of wall clock time, in Scala one way to measure the desired CPU time – the time actually spent by the CPU in executing the function – is to use the ManagementFactory functionality available in Java:

import java.lang.management.{ ManagementFactory, ThreadMXBean }

package object timer {
  val bean: ThreadMXBean = ManagementFactory.getThreadMXBean()
  def getCpuTime = if (bean.isCurrentThreadCpuTimeSupported()) bean.getCurrentThreadCpuTime() else 0
  /**
   * Runs the argument function f and measures the user+system time spent in it in seconds.
   * Accuracy depends on the system, preferably not used for runs taking less than 0.1 seconds.
   * Returns a pair consisting of
   * - the return value of the function call and
   * - the time spent in executing the function.
   */
  def measureCpuTime[T](f: => T): (T, Double) = {
    val start = getCpuTime
    val r = f
    val end = getCpuTime
    val t = (end - start) / 1000000000.0
    (r, t)
  }
}

Now we witness what we want:

scala> measureCpuTime(Thread.sleep(1000))
res: (Unit, Double) = ((),0.0)

Measuring short run times: the repetition trick. As the precision and timing granularity of getThreadCpuTime depends on the system and is not necessarily in the nanosecond scale, a common technique to measure short run times is to repeat the same execution many times, measure the total CPU time, and report the average time over the repetitions as output. (This naturally requires that our function is pure, that is, free of side effects and otherwise always behaves in the same way so that the repeated executions are essentially identical.)

/**
 * The same as measureCpuTime but the function f is applied repeatedly
 * until a cumulative threshold time use is reached (currently 0.1 seconds).
 * The time returned is the cumulative time divided by the number of repetitions.
 * Therefore, better accuracy is obtained for very small run-times.
 * The function f should be side-effect free!
 */
def measureCpuTimeRepeated[T](f: => T): (T, Double) = {
  val start = getCpuTime
  var end = start
  var runs = 0
  var r: Option[T] = None
  while (end - start < 100000000L) {
    runs += 1
    r = Some(f)
    end = getCpuTime
  }
  val t = (end - start) / (runs * 1000000000.0)
  (r.get, t)
}

Let us look at a few examples:

scala> val l = (1 to 100000).toList
l: List[Int] = List(1, 2, 3, 4, ...

scala> measureCpuTime { l.sum }
res: (Int, Double) = (705082704,0.0)

scala> measureCpuTimeRepeated { l.sum }
res: (Int, Double) = (705082704,9.70873786407767E-4)

Note

Beyond our simple tools here, there exist dedicated profiling tools that enable more detailed measurements on which functions the run time was actually spent. (*)

For example:

  • We can use a JVM profiling tool such as VisualVM to display a a function-level run-time profile at the end of the run.

  • The cachegrind skin of the valgrind tool can profile any executable program. (The amount of detail shown in the output depends whether debugging information was available in the executable binary or not.)

  • Many commercial integrated development environments (IDEs) have support for profilers.

  • Processor vendors in many cases supply dedicated profiling software that can analyze the detailed performance of a specific processor microarchitecture to identify architectural bottlenecks in execution.

A detailed study of profiling and advanced performance optimization easily deserves a course on its own, such as CS-E4580 Programming Parallel Computers, so we will not enter into detailed discussion here.