Scaling up from a single computer

The previous round introduced us to computation as a concurrent phenomenon that takes place simultaneously across a myriad of devices, from individual smartphones to all the computers on the Internet.

In this round we will explore programming abstractions that scale up, that is, abstractions for writing programs that can run anywhere from an individual computer to industrial-scale infrastructure.

As an example of what is meant by industrial scale, consider the Google Hamina Data Center. This old paper mill (designed by Alvar Aalto) nowadays houses rows and rows of server racks, each rack with tens of computers, for a total of tens of thousands of networked servers.

Google Hamina is an example of a warehouse-scale computer (WSC).

With the appropriate abstractions a WSC can be programmed as effortlessly as a single computer.

Computation becomes a commodity

A driving force behind the development of industrial-scale computing has been modern Internet services: web search, streaming, commerce, social media, and so forth. But extensive computing resources are also required by the sciences (the search for the Higgs boson, for example), by the arts (movie special effects, for example), and by business analytics.

An application still needs to be programmed, however. This raises the fundamental question of how to program it easily, independently of the scale of the hardware it runs on.

Networking computers together to form a cluster is therefore not enough to make industrial-scale computing possible. It also requires systems and libraries that distribute both processes and data seamlessly over the computers, and these must be accessible to programmers.

The development of such programming abstractions and techniques has progressed immensely over the last 25 years, and today it is possible to write programs that can scale almost seamlessly from a single computer up to a whole data center.
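To make the idea of a scalable abstraction concrete, here is a minimal sketch in Python. It is an illustration only, not taken from any particular cluster framework: the function names (count_words, partition, total_words) and the map_fn parameter are hypothetical. The computation is written as a map over partitions of the data followed by a reduce, and the caller decides where the map step runs.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(lines):
    """Map step: count the words in one partition of the input."""
    return sum(len(line.split()) for line in lines)


def partition(items, n_parts):
    """Split a list into n_parts roughly equal slices."""
    return [items[i::n_parts] for i in range(n_parts)]


def total_words(lines, map_fn=map, n_parts=4):
    """Count words by mapping over partitions and reducing the partial counts.

    map_fn (a hypothetical parameter) decides where the work runs;
    the program logic itself does not change.
    """
    partial_counts = map_fn(count_words, partition(lines, n_parts))
    return reduce(lambda a, b: a + b, partial_counts, 0)


lines = ["to be or not to be", "that is the question"] * 1000

# Sequential run using the built-in map.
sequential = total_words(lines)

# Same program, with the map step handed to a pool of worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = total_words(lines, map_fn=pool.map)
```

The thread pool here only stands in for "more workers"; in CPython it will not speed up this particular computation. The point is the shape of the program: because the work is expressed as independent map steps plus a reduce, the same logic could in principle be handed to an executor that spreads the partitions over many machines.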

Data centers usually also provide resources such as virtualized machine instances: simulated computers that do not necessarily correspond to the physical hardware of the data center. This allows the hardware capabilities of the emulated machine to be configured, and makes resource allocation and time sharing possible.

Thus, computing resources have become a commodity that can be bought and scaled up on demand. Furthermore, since the resources are virtualized, they can be allocated, used, and released with little effort and at low cost.

In particular, this enables developers to start small and then scale up to industrial scale essentially on demand.